PGI Tools Questions

Debugging with PGDBG

What should I know about debugging unified binaries?

In unified binaries (programs compiled with '-tp=x64'), subprogram names may be modified in order to distinguish the AMD64 version of the subprogram from the EM64T version. For example, given subprogram 'sub', the PGI compilers may generate a subprogram named sub.pgi.uni.1 (the AMD64 version) and sub.pgi.uni.2 (the EM64T version). In this case, debuggers will probably not recognize subprogram 'sub', and the processor-specific name will have to be used instead. The following bullet items describe how to work around some of the issues caused by this renaming.


When I compile using -g and -Mconcur, line number information appears to be missing for non-parallelized functions like "main". Is this to be expected?

Yes. The -Mconcur, -Mvect, -fast, and -fastsse options all result in optimization levels greater than -O0. At best, debug information for optimized code will only contain line numbers for 'basic blocks'. A basic block can consist of more than one statement. Note that debug information for lexical blocks is not produced with -Mvect and -Mconcur. It doesn't matter if vectorization or parallelization wasn't performed; it's just the presence of the option which disables the generation of lexical block information.


I am seeing messages printed to the Program I/O window of the PGDBG GUI that do not originate from my program.

All output from the child processes of the GUI is redirected to the Program I/O window. The 'shell' and 'edit' commands, for example, will send their output to the Program I/O window. Some PGDBG error messages may also appear here (e.g. failed reads/writes to program memory).


I am not able to call a Fortran subroutine/function.

In PGDBG 5.2, the call command does not support Fortran subroutines with pointer or deferred-shape array arguments or return values. Prior to release 5.2, PGDBG did not support any Fortran array arguments or return values.


I am having trouble getting F90 module information in pgdbg.

Suppose you have a source file, foo.f90, containing 'MODULE XX'. Compilation of foo.f90 results in two files: 'XX.mod' and 'foo.o'. Suppose another F90 file refers to the module by using the statement 'USE XX'. In some cases, that file can be compiled and linked without linking to foo.o.

However, when compiling foo.f90 for debug with the '-g' option, the debug information generated for 'MODULE XX' is located in the object file foo.o. If an executable refers to 'MODULE XX' with the statement 'USE XX', and is not linked to foo.o, the debug information for the module will be unavailable.

If you want to access module information from PGDBG, make sure that the executable is linked with the object files ('.o' files) generated from the source files containing your modules and that the files were compiled with '-g'.


When I recompile my program, the source window of the GUI shows the old program source.

If you recompile your program, you must reload the program into PGDBG using the 'debug' command, and (in release 5.2) update the source panel by entering control-L (^L) or selecting the menu item Options->Refresh.


How do you start pgdbg with a startup file?

PGDBG can be invoked with a startup file (called 'startup' in the example below) as follows:

pgdbg -s startup  a.out

Can I read the SSE XMM registers from pgdbg?

If you are using PGDBG to debug a program running on an X86 processor, set the PGDBG_SSE environment variable to "on" to make the XMM registers visible inside the debugger.

% setenv PGDBG_SSE on

$ PGDBG_SSE=on ; export PGDBG_SSE

If you are using PGDBG to debug a program running on an AMD64 processor, the XMM registers are visible by default.

If you are using PGDBG 4.1-2, the XMM registers are available, but only if the program is linked -Bstatic.


My program is taking a relatively long amount of time to execute under the control of the debugger.

Performance when executing under control of PGDBG can be affected in two scenarios:

  1. You are using one of the watch family of commands: watch, watchi, trace, tracei, track, and tracki. These commands may single-step the program instruction by instruction, watching for some condition. Slow performance in this scenario is expected behavior.
  2. You are debugging a program compiled with -mp or -Mconcur, and the PGDBG OpenMP event handler is enabled (it is disabled by default in release 5.2 and later). Slow performance in this scenario is expected behavior.

    When the OpenMP event handler is enabled, PGDBG adds breakpoints at certain locations in your program to detect various OpenMP events (such as the start and end of a parallel region). PGDBG's OpenMP event handler is most useful when you are debugging a parallel region, or passing in and out of parallel regions with the line-level debugging commands such as 'next' and 'step'.

    If you are continuing over many parallel regions, the program will run much faster under PGDBG if the OpenMP event handler is disabled. Use the pgienv command to disable the OpenMP event handler:

    pgdbg> pgienv omp off
    

    NOTE: The OpenMP event handler should only be enabled/disabled while the program is stopped in a serial region.


Why doesn't the debugger always single step over Windows system calls?

On Windows, the stepi/nexti PGDBG commands time out when executed on a blocked system call. When attempting to step over the blocked system call, the program counter does not advance. Notice in the following example that the second stepi leaves the program stopped at the same address as the first stepi.

pgdbg [all] 0> stepi
[0] Stopped at 0x78EF181A, function ZwWaitForMultipleObjects
0x78EF181A:  C3                         ret

pgdbg [all] 0> stepi
[0] Stopped at 0x78EF181A, function ZwWaitForMultipleObjects
0x78EF181A:  C3                         ret

To simulate the effect of single stepping the program, set a breakpoint at the return address and then continue the program, per the following typescript:

pgdbg [all] 0> p ((void **)$rsp)[0]
(void *) 0x78D6CF8B
pgdbg [all] 0> breaki 0x78D6CF8B
[all] (1)breakpoint set at: ReleaseSemaphore address: 0x78D6CF8B
1
pgdbg [all] 0> cont
[0] Breakpoint at 0x78D6CF8B, function ReleaseSemaphore
0x78D6CF8B:  8B F8                      movl   %eax,%edi

pgdbg [all] 0>

Debugging MPI Applications with PGDBG

How do I invoke PGDBG to debug my MPI program?

PGDBG must be invoked via MPIRUN to debug a parallel MPI program. The MPI program must be linked with the MPICH library and contain the appropriate calls to MPI routines (e.g. MPI_Init).

% pgcc -g -o program prog.c -lmpich
% mpirun -np 4 -dbg=pgdbg program

Prior to release 5.2, PGDBG was not able to attach to a running MPI program.


Are there restrictions on what types of MPI applications PGDBG can debug?

PGDBG makes several assumptions about MPI applications:


What do I have to do to my custom MPI distribution to make it work with PGDBG?

IMPORTANT: PGDBG will work automatically with the MPICH library that is shipped with the CDK. No modifications to the CDK version of MPIRUN are needed.

Follow the instructions below only if you want to use PGDBG with a distribution of MPICH other than the one provided in the CDK.

If your version of MPIRUN supports the '-dbg' option

  1. Copy mpirun_dbg.pgdbg from $PGI/linux86/bin into the bin directory containing the MPIRUN script files you will be using.
  2. You must modify MPIRUN if you wish to use PGDBG with the following MPIRUN switches:
    • -stdout
    • -stderr
    • -nolocal
    • -stdin

    To support the switches listed, edit mpirun.ch_p4 script file as shown by the "diffs" below. Add the changes in the second file (denoted by the '>' prefix) to your mpirun.ch_p4 script file:

    % diff mpirun.ch_p4 mpirun.ch_p4.modified
    204c204
    < . $MPIRUN_HOME/mpirun_dbg.$debugger -progname $prognamemain -p4pg $p4pgfile_master -p4wd $p4workdir -cmdlineargs "$cmdLineArgs"
    ---
    > $MPIRUN_HOME/mpirun_dbg.$debugger -progname $prognamemain -p4pg $p4pgfile_master -p4wd $p4workdir -cmdlineargs "$cmdLineArgs"
    

If your version of MPIRUN supports the '-debug' option (not '-dbg')

  1. Edit your mpirun.ch_p4 script file to use PGDBG. Below is an example of how the MPIRUN scripts from MPICH 1.1.2 were modified to work with PGDBG. Make the changes in the second file (denoted by the '>' prefix) in your mpirun.ch_p4 script file:

    % diff mpirun.ch_p4 mpirun.ch_p4.modified
    192c192
    < echo "ignore USR1" >> $dbgfile
    ---
    > echo "ignore 10" >> $dbgfile
    198c198
    <   echo "stop in MPI_Init" >> $dbgfile
    ---
    >   echo "break main" >> $dbgfile
    %

  2. Add a "-pgdbg" option to your mpirun.args script file. Modify mpirun.args to include the changes in the second file.

    % diff mpirun.args mpirun.args.modified
    112a113
    > -pgdbg   Start the first process under pgdbg where possible
    405a407,410
    >   ;;
    > -pgdbg)
    >   debugger="pgdbg"
    >   commandfile="-s %f"
    %

    In the absence of the MPIRUN '-dbg' option, PGDBG is invoked using the -pgdbg option as shown.

    % mpirun -np 4 -pgdbg program <args ...>

  3. The MPIRUN scripts will need further modification to support the following options for use with the -pgdbg option.

    • -stdout
    • -stderr
    • -nolocal
    • -stdin

    Instructions for making such modifications are not included here.


What does PGDBG do differently when the MPIRUN -nolocal option is used?

When PGDBG is invoked using MPIRUN without '-nolocal', the target application processes are spawned on the local host as well as on remote hosts. The debugger is launched on the local host, and a lightweight debug server (named 'pgserv') is launched on every host where a target process is running (both local and remote).

When PGDBG is invoked using MPIRUN with '-nolocal', the target application processes are spawned only on remote hosts, as determined by the 'machines.LINUX' file. The debugger is launched on the local host, and pgserv is launched only on the hosts where target processes are running (i.e. only on remote hosts). This allows the debugger to be isolated from the application under debug, in order to (for example) prevent resource contention between the debugger and the application.


How do I use a script file with PGDBG when it is invoked with MPIRUN?

PGDBG options (-s,-text,-dbx,...) are not available when PGDBG is invoked via MPIRUN. The default init file ${HOME}/.pgdbgrc (or ./.pgdbgrc if it exists), in conjunction with the 'script' command, can be used in most cases to provide the same functionality. The script file '${PGI}/linux86/bin/mpirun_dbg.pgdbg' could also be edited to add any of these arguments.


Why do I not see output written to standard output by some of my MPI processes?

By default, output from non-initial MPI processes is block buffered, so output may not be flushed to stdout until some time after it is written. This is a characteristic of MPICH. See fflush(3) for a workaround.
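
One way to apply the fflush(3) workaround is sketched below in C. The helper names report_rank and use_line_buffering are hypothetical and are not part of MPICH or PGDBG; the sketch only assumes the standard C I/O library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print a message tagged with a process rank and
 * flush the stream right away, so the line does not sit in a block
 * buffer. Returns 0 on success, a nonzero value on error. */
int report_rank(FILE *out, int rank, const char *msg)
{
    assert(out != NULL && msg != NULL);
    if (fprintf(out, "[rank %d] %s\n", rank, msg) < 0)
        return -1;
    return fflush(out);  /* fflush returns 0 on success, EOF on error */
}

/* Hypothetical alternative: switch a stream to line buffering once.
 * setvbuf must be called before any other operation on the stream. */
int use_line_buffering(FILE *out)
{
    assert(out != NULL);
    return setvbuf(out, NULL, _IOLBF, BUFSIZ);
}
```

Calling report_rank(stdout, rank, "entering solver") after each important step keeps the output of every process current while you step through the program. Calling use_line_buffering(stdout) at the top of main, before any output, avoids the explicit flush for messages that end in a newline.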


The debugger adds double quotes around the command line program arguments to my MPI program.

To fix this, add the following line (the one just after the comment line) to your mpirun_dbg.pgdbg file:

...
#undo quoting done in mpirun.args
cmdLineArgs="`eval echo $cmdLineArgs`"

cmdPgdbgRun="run $cmdLineArgs -p4pg $p4pgfile -p4wd $p4workdir -mpichtv $cmdPgdbgRun"
echo "$cmdPgdbgRun" >> $dbgfile 
...


Using SSH (secure shell) as PGDBG's interprocess communication mechanism.

PGDBG uses a remote debug server to debug each node of a distributed MPI program. PGDBG communicates with these servers using 'rsh' by default. To force PGDBG to use 'ssh', set the following environment variable on the host system:

   % setenv PGRSH 'ssh'  

   $ PGRSH=ssh ; export PGRSH

To avoid having to type your ssh key for each remote process and debug server, an 'ssh-agent' can be set up. See your ssh documentation.

Note that PGDBG will use 'rsh' or 'ssh' only when debugging a remote process.


Why does PGDBG fail to attach to processes in my MPI application?

There is a system-imposed limit on the number of sockets that can be created on any specific instance of the Linux operating system. PGDBG uses sockets for inter-process communication between the debugger and lightweight debug servers that run on each node of the cluster. If the system runs out of socket resources, the message below will be printed.

  poll: protocol failure in circuit setup
  - accept: Bad file descriptor
  ERROR: unable to attach to (PID 2763, HOST <hostname>)

  [New Process (PID 26200, HOST <hostname>) IGNORED]  

PGDBG will then ignore any application processes that are not yet controlled by the debugger, and those processes will not return from MPI_Init.

Debugging OpenMP and Multi-Threaded Applications with PGDBG

I cannot access OpenMP private variables while using PGDBG to debug my Fortran program.

Accessing the values of Fortran private variables using PGDBG is not currently supported.

NOTE: The above statement applies only to debug information emitted by PGI compilers. All code generated by PGI compilers is fully compliant with the OpenMP standard. Only the ability to inspect and modify the values of private variables using PGDBG (and any other debugger) is restricted.


I cannot access OpenMP private variables while using PGDBG to debug my C program.

Private variables in C must be declared in the enclosing lexical block of the parallel region in order for them to be visible using PGDBG.

Currently, only the following form of private declaration is supported by PGDBG:

#pragma omp parallel
{
int i;
....
}

In the above case, i would be visible inside PGDBG for each thread. However, in the following example, i is not visible inside PGDBG.

int i;
#pragma omp parallel private(i)
{
....
}

NOTE: The above statement applies only to debug information emitted by PGI compilers. All code generated by PGI compilers is fully compliant with the OpenMP standard. Only the ability to inspect and modify the values of private variables using PGDBG (and any other debugger) is restricted.


Why does PGDBG show that all my OpenMP threads are killed during execution?

Calling omp_set_num_threads(n) from the target program may cause all non-initial threads to exit, followed by the creation of the 'n-1' new threads requested by the call to omp_set_num_threads(n). This is expected behavior.


Why does PGDBG show that my OpenMP/MPI application receives SIGHUP?

Calling omp_set_num_threads(n) from hybrid OpenMP/MPI applications may result in the use of the SIGHUP signal by MPI_Finalize. This is expected behavior.


The threads of my program appear to block forever (hang) in a system call.

If you are running some threads while leaving other threads in a stopped state, the running threads may be blocked trying to synchronize with the stopped threads. For example, a thread must acquire a lock in order to write to stdout. I/O routines, pthread initialization, and OpenMP barriers are all examples of places where such issues may arise.

In many cases, the best solution is to set a breakpoint in the path of all threads at some source location and continue all threads to that breakpoint. The 'sync' command can be used for this, or you can enable the PGDBG OpenMP event handler.


What do you mean by PGDBG OpenMP event handling?

The OpenMP run-time support performs internal synchronization when a program enters or leaves a parallel region, and may perform synchronization within a parallel region. Because of this, PGDBG users may encounter unexpected behavior in which it appears that the program under debug blocks forever, or "hangs".

PGDBG is capable of coordinating threads into and out of OpenMP parallel regions and over OpenMP barriers and synchronization points, in order to preserve line level debugging. This facility is disabled by default, since it may slow program performance significantly. OpenMP event handling can be enabled using the 'pgienv' command.

pgdbg> pgienv omp on

NOTE: The OpenMP event handler should only be enabled/disabled while the program is stopped in a serial region.

Unless you enable OpenMP event handling, you must explicitly advance all threads; otherwise, some threads may busy-wait in OpenMP library calls, waiting for other threads to run.

See the online PGDBG User's Guide for a description of PGDBG's OpenMP support, and an introduction to important 'pgienv' environment variables that you can use to configure the behavior of PGDBG (e.g. 'threadstop', 'threadwait', 'omp',...).

Profiling with PGPROF: (5.2 and later)

Why does pgprof show execution times that are 10X the actual run time?

When using sample-based profiling (e.g., gmon.out), the number of samples per second is defined by sysconf(_SC_CLK_TCK). However, on some systems the value returned is not the same as the actual sampling rate. It will usually be off by a factor of 10. You can determine if your system has this problem by using "time a.out" and comparing it to the total time shown in the profile.
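
The check described above can be sketched in C. The function names are hypothetical; only sysconf(_SC_CLK_TCK) comes from the text, and the sketch assumes a POSIX system. The ratio is a quick sanity check, not a correction that PGPROF applies.

```c
#include <assert.h>
#include <unistd.h>

/* Clock ticks per second as reported by the system. The text above
 * notes that on some systems this differs from the real sampling rate. */
long reported_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Ratio of the total time shown in the profile to the time measured
 * with "time a.out". A value near 10 suggests the mismatch described
 * above; a value near 1 suggests the profile times can be trusted. */
double profile_to_measured_ratio(double profile_seconds,
                                 double measured_seconds)
{
    assert(measured_seconds > 0.0);
    return profile_seconds / measured_seconds;
}
```

For example, a profile total of 50 seconds against a measured run time of 5 seconds gives a ratio of 10.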

Profiling with PGPROF: (5.1 and earlier)

In pgprof, I get the message 'Cannot convert string "-dt-interface user-medium-r-normal-m'.

This problem has not been solved in PGPROF for releases prior to 5.2. There is a workaround.

  1. Save the current X Window resources

    xrdb -query >current.xrdb.resources
  2. Copy current resources for editing

    cp current.xrdb.resources  pgprof.xrdb.resources
  3. Remove all lines in pgprof.xrdb.resources with references to '-dt-interface'
  4. Replace the resources

    xrdb -load pgprof.xrdb.resources
  5. Now try pgprof.

When you need to return to the normal resources, reload the saved copy:

xrdb -load current.xrdb.resources


PGPROF: "Warning: File pgprof123.out not found; data may be incomplete"

If you are using PGPROF 4.1-1 or earlier, and you are profiling more than two nodes, you will have to edit your pgprof.out file as in the following example.

For example, when profiling a 4 node program, 4 pgprof output files will be generated:

pgprof.out
pgprof1.out
pgprof2.out
pgprof3.out

At the end of the pgprof.out file edit the lines starting with "i" to point to the correct pgprof#.out files.

Change from:

i pgprof1.out
i pgprof12.out
i pgprof123.out

to:

i pgprof1.out
i pgprof2.out
i pgprof3.out
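
For runs with many nodes, the corrected lines can be generated rather than typed by hand. The C sketch below is a minimal illustration: the function names are hypothetical, and it assumes the include lines have exactly the "i pgprof#.out" form shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `line` starts with "i " (an include line as shown
 * above), otherwise 0. */
int is_include_line(const char *line)
{
    assert(line != NULL);
    return strncmp(line, "i ", 2) == 0;
}

/* Writes the corrected include line for `node` (1, 2, 3, ...) into buf.
 * Returns 0 on success, -1 if buf is too small to hold the line. */
int format_include_line(char *buf, size_t size, int node)
{
    assert(buf != NULL && node >= 1);
    int n = snprintf(buf, size, "i pgprof%d.out", node);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A small driver could copy pgprof.out line by line, replacing the k-th line for which is_include_line returns 1 with the result of format_include_line for node k.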
