Use the PGI Profiler (also known as PGPROF®) to analyze your application's performance.

There are two profiling modes: command-line profiling and visual (graphical) profiling. To use command-line mode, open a shell and enter the commands shown in this guide. To use graphical mode, type pgprof in a command shell. You can follow this guide in either mode.


Using the PGI Profiler consists of two basic steps: profiling your application, then analyzing the profile. Both steps can be done in either mode. In command-line mode, two distinct commands are used to profile and to analyze. In graphical mode, both profiling and analysis can be done in the same session. You can also save a profile from the command line or from within graphical mode, and then analyze the saved profile in graphical mode by using the File | Import option.
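
In command-line mode, the two steps can be wrapped in a small shell function. The following is only a sketch: profile_and_report and the PGPROF variable are hypothetical names introduced here, only the -o and -i options described in this guide are used, and passing the application's own arguments after the executable is an assumption about the command line.

```shell
# Hypothetical wrapper around the two command-line steps.
# PGPROF can be overridden to point at a specific profiler binary.
PGPROF=${PGPROF:-pgprof}

profile_and_report() {
    # $1 = executable to profile, $2 = profile file to write.
    # Remaining arguments are passed to the executable (assumption).
    exe=$1
    prof=$2
    shift 2

    # Step 1: run the application under the profiler and save the profile.
    "$PGPROF" -o "$prof" "$exe" "$@" || return 1

    # Step 2: analyze the saved profile.
    "$PGPROF" -i "$prof"
}
```

With this in place, profile_and_report a.out a.prof would run the same two commands used throughout this guide.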

  How long does it take my application to run?

Profile a.out and save the performance results in the file a.prof.

$ pgprof -o a.prof a.out
                

The pgprof --cpu-profiling-mode top-down option orients the call tree to show main at the top and the functions it called below.

$ pgprof --cpu-profiling-mode top-down -i a.prof
                
======== CPU profiling result (top down):
Time(%)      Time  Name
 97.36%  35.2596s  main
 97.36%  35.2596s  | MAIN_
 32.02%  11.5976s  |   swim_mod_calc3_
 29.98%  10.8578s  |   swim_mod_calc2_
 25.93%  9.38965s  |   swim_mod_calc1_
  6.82%  2.46976s  |   swim_mod_inital_
  1.76%  637.36ms  |   | __fvd_sin_vex_256
  1.73%  625.98ms  |   | | __fvd_sin_vex
  1.51%  546.31ms  |   | __fvd_cos_vex_256
  1.48%  534.93ms  |   | | __fvd_cos_vex
  0.06%  22.763ms  |   | __fvd_cos_vex
  0.03%  11.381ms  |   | __fvd_sin_vex
  0.03%  11.381ms  |   | _mp_penter
  0.03%  11.381ms  |   |   _mp_cpenter
  0.03%  11.381ms  |   |     _mp_create_team
  0.03%  11.381ms  |   |       _mp_barrierw
  1.79%  648.74ms  |   swim_mod_calc3z_
  0.44%  159.34ms  |   _mp_pexit
  0.44%  159.34ms  |     _mp_cpexit
  0.44%  159.34ms  |       _mp_barrierw
  2.64%  956.04ms  _mp_slave
  2.64%  956.04ms    _mp_cslave
  2.64%  956.04ms      _mp_barrier_tw
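
To turn a report like the one above into a single number, one option is to add up the time of the top-level entries of the call tree (main and _mp_slave in this example). The sketch below assumes the report text has already been captured and is supplied on standard input; total_seconds is a hypothetical helper, and it only understands the s, ms, and us suffixes that appear in this guide.

```shell
# Hypothetical helper: sum the time of the top-level call-tree entries.
# Reads a CPU profiling report (text) from standard input.
total_seconds() {
    awk '
        # Top-level entries have exactly two spaces between the time
        # and the name, and the name is not a "|" tree marker.
        /^ *[0-9.]+% +[0-9.]+(s|ms|us)  [^ |]/ {
            t = $2
            if      (t ~ /ms$/) { sub(/ms$/, "", t); t = t / 1000 }
            else if (t ~ /us$/) { sub(/us$/, "", t); t = t / 1000000 }
            else                { sub(/s$/,  "", t); t = t + 0 }
            total += t
        }
        END { printf "%.2f\n", total }
    '
}
```

For the report above, the two top-level entries are 35.2596s and 956.04ms, so the helper prints 36.22.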
                
In graphical mode:

  • To start a new profiling session, after launching the PGI Profiler, open the File menu and select New Session.
  • In the dialog box, browse to the executable file you want to profile. Then add any command line arguments with which to launch it.
  • Click Next then Finish.
  • In the CPU Details tab, click on the "Show the top-down (callers first) call tree view" button as shown below.

[Figure: PGPROF GUI views]

  What part of my application takes the longest time to run?

Profile a.out and save the performance results in the file a.prof.

$ pgprof -o a.prof a.out
                

The pgprof --cpu-profiling-mode bottom-up option orients the call tree to show each function followed by the functions that called it, working backwards to main.

$ pgprof --cpu-profiling-mode bottom-up -i a.prof
                
======== CPU profiling result (bottom up):
Time(%)      Time  Name
 32.02%  11.5976s  swim_mod_calc3_
 32.02%  11.5976s  | MAIN_
 32.02%  11.5976s  |   main
 29.98%  10.8578s  swim_mod_calc2_
 29.98%  10.8578s  | MAIN_
 29.98%  10.8578s  |   main
 25.93%  9.38965s  swim_mod_calc1_
 25.93%  9.38965s  | MAIN_
 25.93%  9.38965s  |   main
  3.43%  1.24057s  swim_mod_inital_
  3.43%  1.24057s  | MAIN_
  3.43%  1.24057s  |   main
  2.64%  956.04ms  _mp_barrier_tw
  2.64%  956.04ms  | _mp_cslave
  2.64%  956.04ms  |   _mp_slave
  1.79%  648.74ms  swim_mod_calc3z_
  1.79%  648.74ms  | MAIN_
  1.79%  648.74ms  |   main
  1.76%  637.36ms  __fvd_sin_vex
  1.73%  625.98ms  | __fvd_sin_vex_256
  1.73%  625.98ms  | | swim_mod_inital_
  1.73%  625.98ms  | |   MAIN_
  1.73%  625.98ms  | |     main
  0.03%  11.381ms  | swim_mod_inital_
  0.03%  11.381ms  |   MAIN_
  0.03%  11.381ms  |     main
  1.54%  557.69ms  __fvd_cos_vex
  1.48%  534.93ms  | __fvd_cos_vex_256
  1.48%  534.93ms  | | swim_mod_inital_
  1.48%  534.93ms  | |   MAIN_
  1.48%  534.93ms  | |     main
  0.06%  22.763ms  | swim_mod_inital_
  0.06%  22.763ms  |   MAIN_
  0.06%  22.763ms  |     main
  0.47%  170.72ms  _mp_barrierw
  0.44%  159.34ms  | _mp_cpexit
  0.44%  159.34ms  | | _mp_pexit
  0.44%  159.34ms  | |   MAIN_
  0.44%  159.34ms  | |     main
  0.03%  11.381ms  | _mp_create_team
  0.03%  11.381ms  |   _mp_cpenter
  0.03%  11.381ms  |     _mp_penter
  0.03%  11.381ms  |       swim_mod_inital_
  0.03%  11.381ms  |         MAIN_
  0.03%  11.381ms  |           main
  0.38%  136.58ms  MAIN_
  0.38%  136.58ms  | main
  0.03%  11.381ms  __fvd_cos_vex_256
  0.03%  11.381ms  | swim_mod_inital_
  0.03%  11.381ms  |   MAIN_
  0.03%  11.381ms  |     main
  0.03%  11.381ms  __fvd_sin_vex_256
  0.03%  11.381ms    swim_mod_inital_
  0.03%  11.381ms      MAIN_
  0.03%  11.381ms        main
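
Because the bottom-up report lists functions in descending order of time, the hottest functions are simply its first top-level entries. As a sketch, assuming the report text is supplied on standard input, the hypothetical helper hottest_functions below drops the caller lines and keeps the first few functions.

```shell
# Hypothetical helper: print the first N top-level entries of a
# bottom-up CPU profiling report read from standard input.
hottest_functions() {
    # $1 = number of entries to print (default 3).
    # Top-level entries have exactly two spaces before the name and
    # no "|" tree marker; caller lines are indented further.
    awk '/^ *[0-9.]+% +[0-9.]+(s|ms|us)  [^ |]/ { print $1, $2, $3 }' |
        head -n "${1:-3}"
}
```

Applied to the report above with an argument of 3, this would list swim_mod_calc3_, swim_mod_calc2_, and swim_mod_calc1_ with their percentages and times.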
                
In graphical mode:

  • To start a new profiling session, after launching the PGI Profiler, open the File menu and select New Session.
  • In the dialog box, browse to the executable file you want to profile. Then add any command line arguments with which to launch it.
  • Click Next then Finish.
  • In the CPU Details tab, click on the "Show the bottom-up (callees first) call tree view" button as shown below.

[Figure: PGPROF GUI views]

  How can I visualize both CPU and GPU performance data?

Profile a.out and save the performance results in the file a.prof.

$ pgprof -o a.prof a.out
                

Then display the contents of the output file.

$ pgprof -i a.prof
                

The results are broken into four sections:

  1. GPU kernel execution profile.
  2. CUDA API execution profile.
  3. OpenACC execution profile.
  4. CPU execution profile.
====== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 38.14%  1.41393s        20  70.696ms  70.666ms  70.731ms  calc2_198_gpu
 31.11%  1.15312s        18  64.062ms  64.039ms  64.083ms  calc3_273_gpu
 23.35%  865.68ms        20  43.284ms  43.244ms  43.325ms  calc1_142_gpu
  5.17%  191.78ms       141  1.3602ms  1.3120us  1.6409ms  [CUDA memcpy HtoD]
...

======== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 92.65%  3.49314s        62  56.341ms  1.8850us  70.771ms  cuStreamSynchronize
  3.78%  142.36ms         1  142.36ms  142.36ms  142.36ms  cuDevicePrimaryCtxRetain
...

======== OpenACC (excl):
Time(%)      Time     Calls       Avg       Min       Max  Name
 36.27%  1.41470s        20  70.735ms  70.704ms  70.773ms  acc_wait@swim-acc-data.f:223
 29.60%  1.15449s        18  64.138ms  64.114ms  64.159ms  acc_wait@swim-acc-data.f:302
 22.22%  866.66ms        20  43.333ms  43.294ms  43.376ms  acc_wait@swim-acc-data.f:169
  9.06%  353.49ms         1  353.49ms  353.49ms  353.49ms  acc_update@swim-acc-data.f:402
...

======== CPU profiling result (bottom up):
Time(%)      Time  Name
 59.09%  8.55785s  cudbgGetAPIVersion
 59.09%  8.55785s  | start_thread
 59.09%  8.55785s  |   clone
 25.75%  3.73007s  cuStreamSynchronize
 25.75%  3.73007s  | __pgi_uacc_cuda_wait
 25.75%  3.73007s  |   __pgi_uacc_computedone
 10.38%  1.50269s  |     swim_mod_calc2_
 10.38%  1.50269s  |     | MAIN_
 10.38%  1.50269s  |     |   main
  8.54%  1.23625s  |     swim_mod_calc3_
  8.54%  1.23625s  |     | MAIN_
  8.54%  1.23625s  |     |   main
  6.48%  937.85ms  |     swim_mod_calc1_
  6.48%  937.85ms  |     | MAIN_
  6.48%  937.85ms  |     |   main
  0.37%  53.287ms  |     swim_mod_calc3z_
  0.37%  53.287ms  |       MAIN_
  0.37%  53.287ms  |         main
  6.03%   873.9ms  swim_mod_inital_
...
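
When only one of the four sections is of interest, the combined report can be split on its "=====" header lines. The sketch below assumes the report text is supplied on standard input; report_section is a hypothetical helper, and the header text it matches is taken from the sample output above.

```shell
# Hypothetical helper: print one section of a combined report.
# The report is read from standard input.
report_section() {
    # $1 = text to look for in the section header, e.g. "API calls".
    awk -v want="$1" '
        # A header line starts a new section; keep it only if it matches.
        /^=+ / { keep = (index($0, want) > 0) }
        # Print every line that belongs to the wanted section.
        keep
    '
}
```

For example, report_section "OpenACC" would print only the OpenACC execution profile, header included.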
                
In graphical mode:

  • To start a new profiling session, after launching the PGI Profiler, open the File menu and select New Session.
  • In the dialog box, browse to the executable file you want to profile. Then add any command line arguments with which to launch it.
  • Click Next then Finish.
  1. The Timeline view will show the profiled events ordered by the time they occurred.
  2. The GPU Details tab lists the performance details for each GPU kernel.
  3. The CPU Details tab shows the CPU call tree.
  4. The Properties tab shows the details of the events selected on the timeline.

[Figure: PGPROF GUI views]

  What information can the compiler tell me about how my application is structured for performance?

Line-level profiling is not currently available on OpenPOWER. On that platform, use the graphical mode to view compiler messages.

The PGI Profiler can show you information about how your program was compiled. First add the following option when compiling and linking:

-Minfo=ccff
 		

Profile a.out and save the performance results in the file a.prof.

$ pgprof -o a.prof a.out
                

Now show the profile with the CCFF information and break down the performance results by line.

$ pgprof --cpu-profiling-show-ccff on -i a.prof
                
======== CPU profiling result (bottom up):
Time(%)      Time  Name
  3.81%  1.42602s  swim_mod_calc3_ (src/swim-omp.f:259 0xe27)
                 [CCFF] (MSGINTENSITY) Intensity = 1.00
                 [CCFF] (MSGVECT) Generated 3 alternate versions of the loop
                 [CCFF] (MSGVECT) Generated vector sse code for the loop
                 [CCFF] (MSGPREFETCH) Generated 9 prefetch instructions for the loop
  3.81%  1.42602s  | MAIN_ (src/swim-omp.f:423 0x3d1)
  3.81%  1.42602s  |   main (0x44)
  3.74%  1.40205s  swim_mod_calc1_ (src/swim-omp.f:140 0xe0e)
                 [CCFF] (MSGINTENSITY) Intensity = 1.93
                 [CCFF] (MSGLRE) 2 loop-carried redundant expressions removed with 2
                                 operations and 4 arrays
                 [CCFF] (MSGVECT) Generated 5 alternate versions of the loop
                 [CCFF] (MSGVECT) Generated vector sse code for the loop
                 [CCFF] (MSGPREFETCH) Generated 6 prefetch instructions for the loop
  3.74%  1.40205s  | MAIN_ (src/swim-omp.f:384 0x398)
                |  [CCFF] (MSGINTENSITY) Intensity = 1.00
                |  [CCFF] (MSGINTENSITY) Intensity = 1.00
                |  [CCFF] (MSGNEGVECT) Loop not vectorized/parallelized: contains call
  3.74%  1.40205s  |   main (0x44)
  3.58%  1.34214s  swim_mod_calc3_ (src/swim-omp.f:259 0xe01)
                 [CCFF] (MSGINTENSITY) Intensity = 1.00
                 [CCFF] (MSGVECT) Generated 3 alternate versions of the loop
                 [CCFF] (MSGVECT) Generated vector sse code for the loop
                 [CCFF] (MSGPREFETCH) Generated 9 prefetch instructions for the loop
  3.58%  1.34214s  | MAIN_ (src/swim-omp.f:423 0x3d1)
  3.58%  1.34214s  |   main (0x44)
  3.46%   1.2942s  swim_mod_calc3_ (src/swim-omp.f:259 0xde6)
                 [CCFF] (MSGINTENSITY) Intensity = 1.00
                 [CCFF] (MSGVECT) Generated 3 alternate versions of the loop
                 [CCFF] (MSGVECT) Generated vector sse code for the loop
                 [CCFF] (MSGPREFETCH) Generated 9 prefetch instructions for the loop
  3.46%   1.2942s  | MAIN_ (src/swim-omp.f:423 0x3d1)
  3.46%   1.2942s  |   main (0x44)
  3.07%   1.1504s  swim_mod_calc3_ (src/swim-omp.f:259 0xdd7)
                 [CCFF] (MSGINTENSITY) Intensity = 1.00
                 [CCFF] (MSGVECT) Generated 3 alternate versions of the loop
                 [CCFF] (MSGVECT) Generated vector sse code for the loop
                 [CCFF] (MSGPREFETCH) Generated 9 prefetch instructions for the loop
  3.07%   1.1504s  | MAIN_ (src/swim-omp.f:423 0x3d1)
  3.07%   1.1504s  |   main (0x44)
  2.98%  1.11445s  swim_mod_calc3_ (src/swim-omp.f:259 0xdc4)
                 [CCFF] (MSGINTENSITY) Intensity = 1.00
                 [CCFF] (MSGVECT) Generated 3 alternate versions of the loop
                 [CCFF] (MSGVECT) Generated vector sse code for the loop
                 [CCFF] (MSGPREFETCH) Generated 9 prefetch instructions for the loop
  2.98%  1.11445s  | MAIN_ (src/swim-omp.f:423 0x3d1)
  2.98%  1.11445s  |   main (0x44)
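
In a long report, the loops the compiler could not vectorize are often the most interesting entries. As a sketch, assuming the report text is supplied on standard input, the hypothetical helper missed_vectorization below pairs each MSGNEGVECT message with the source location of the profile entry it belongs to. It relies only on the line layout visible in the sample output above.

```shell
# Hypothetical helper: list source locations whose CCFF feedback says
# the loop was not vectorized. Reads the report from standard input.
missed_vectorization() {
    awk '
        # Remember the most recent entry that carries "(file:line 0xaddr)".
        /\([^ ()]+:[0-9]+ 0x[0-9a-f]+\)/ {
            loc = $0
            sub(/^.*\(/, "", loc)     # drop everything up to the last "("
            sub(/ 0x.*$/, "", loc)    # drop the address and closing ")"
        }
        # Report each negative vectorization message with that location.
        /\[CCFF\] \(MSGNEGVECT\)/ {
            msg = $0
            sub(/^.*\(MSGNEGVECT\) */, "", msg)
            print loc ": " msg
        }
    '
}
```

For the sample output above, this would report src/swim-omp.f:384 with the message "Loop not vectorized/parallelized: contains call".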
                

To see the same compiler information in graphical mode, first add the following option when compiling and linking:

-Minfo=ccff
  		

Then profile.

  • To start a new profiling session, after launching the PGI Profiler, open the File menu and select New Session.
  • In the dialog box, browse to the executable file you want to profile. Then add any command line arguments with which to launch it.
  • Click Next then Finish.
  • On the CPU Details tab, click the "Show the code structure view" button.
  • Double-click on a file entry, function name or line number to bring up that file or specific line in the source code view.
  • To the left of the line numbers in the source code view you’ll see a series of compiler notes; hover the mouse over a note to bring up the feedback provided by the compiler.

[Figure: PGPROF source code browser]

For more information about the PGI Profiler, use the '--help' option in command-line mode, or use the Help | Contents menu item in graphical mode.
