In unified binaries (programs compiled with '-tp=x64'), subprogram names may be modified in order to distinguish the AMD64 version of the subprogram from the EM64T version. For example, given subprogram 'sub', the PGI compilers may generate a subprogram named sub.pgi.uni.1 (the AMD64 version) and sub.pgi.uni.2 (the EM64T version). In this case, debuggers will probably not recognize subprogram 'sub', and the processor-specific name will have to be used instead. The following bullet items describe how to work around some of the issues caused by this renaming.
Setting breakpoints in Fortran and C code works similarly for the general case. Given the following Fortran example compiled with -g -tp x64:
      program prog
      call func()
      end

      subroutine func()
      end
Breakpoints can be set on the alternate function versions generated for each function in the program. To set breakpoints for 'func' in the above example, use
pgdbg> b "func.pgi.uni.1"
breakpoint set at: func.pgi.uni.1_ line: "prog.f"@6 address: 0x401252
pgdbg> b "func.pgi.uni.2"
breakpoint set at: func.pgi.uni.2_ line: "prog.f"@6 address: 0x401264
A breakpoint can also be set for 'func' itself. The breakpoint is set at the first assembly instruction in the function. Run (or continue) to the breakpoint, then 'next' into the function's alternate version:
pgdbg> b func
WARNING: Function func_ is not compiled using -g
breakpoint set at: func_ address: 0x401220
pgdbg> run
Breakpoint at 0x401220, function func_
401220: 83 3D 79 47 10 0 1 cmpl $0x1,0x104779(%rip)
pgdbg> n
Breakpoint at 0x401220, function func_
401220: 83 3D 79 47 10 0 1 cmpl $0x1,0x104779(%rip)
pgdbg> n
Stopped at 0x401252, function func.pgi.uni.1_, file prog.f, line 6
#6: end
Note that "breaki func" or "break {addr func}" can also be used to set breakpoints at the beginning of a function.
Also note that stepping into a function in a unified binary does not work reliably. It is best to set a breakpoint on the function and 'continue', or to 'stepi' at the function call.
To set breakpoints on module-contained subprograms or internal procedures, the full unified binary names of the subprograms must be used. For the following code snippet,
module mm
contains
  subroutine sub()
    call inner_sub()
  contains
    subroutine inner_sub()
    end subroutine
  end subroutine
end module
these are valid breakpoints:
pgdbg> b "sub.pgi.uni.1"
pgdbg> b "sub.pgi.uni.2"
pgdbg> b "inner_sub.pgi.uni.1"
pgdbg> b "inner_sub.pgi.uni.2"
Breaking on "sub" or "inner_sub" is not allowed.
There are known problems debugging Fortran subprograms containing multiple entry points. In code compiled -g, locals may not be recognized as being in scope. In code compiled -g -tp x64, both locals and dummy arguments may not be recognized as being in scope.
There are two main routines produced for C/C++ programs, main.pgi.uni.1 and main.pgi.uni.2. A dwarf2 entry is not produced for main, so pgdbg emits this warning when debugging a unified C/C++ binary:
NOTE: Can't find main function compiled -g
The program can still be debugged, however.
To set a breakpoint on a class-contained subprogram, the subprogram's mangled name must be used. To set a breakpoint in the subprogram "foo" in the following class:
class myclass {
private:
int a;
public:
int foo(int b) { a = b; return 0; }
};
use myclass::foo's mangled name plus unified binary extension:
pgdbg> b "foo__7myclassFi.pgi.uni.1"
pgdbg> b "foo__7myclassFi.pgi.uni.2"
Yes. The -Mconcur, -Mvect, -fast, and -fastsse options all result in optimization levels greater than -O0. At best, debug information for optimized code will only contain line numbers for 'basic blocks'. A basic block can consist of more than one statement. Note that debug information for lexical blocks is not produced with -Mvect and -Mconcur. It doesn't matter if vectorization or parallelization wasn't performed; it's just the presence of the option which disables the generation of lexical block information.
All output from the child processes of the GUI is redirected to the Program I/O window. The 'shell' and 'edit' commands, for example, will send their output to the Program I/O window. Some PGDBG error messages may also appear here (e.g. failed reads/writes to program memory).
In PGDBG 5.2, the call command does not support Fortran subroutines with pointer or deferred-shape array arguments or return values. Prior to release 5.2, PGDBG did not support any Fortran array arguments or return values.
Suppose you have a source file, foo.f90, containing 'MODULE XX'. Compilation of foo.f90 results in two files: 'XX.mod' and 'foo.o'. Suppose another F90 file refers to the module by using the statement 'USE XX'. In some cases, that file can be compiled and linked without linking to foo.o.
However, when compiling foo.f90 for debug with the '-g' option, the debug information generated for 'MODULE XX' is located in the object file foo.o. If an executable refers to 'MODULE XX' with the statement 'USE XX', and is not linked to foo.o, the debug information for the module will be unavailable.
If you want to access module information from PGDBG, make sure that the executable is linked with the object files ('.o' files) generated from the source files containing your modules and that the files were compiled with '-g'.
If you recompile your program, you must reload the program into PGDBG using the 'debug' command, and (in release 5.2) update the source panel by entering control-L (^L) or selecting the menu item Options->Refresh.
PGDBG can be invoked with a startup file (called 'startup' in the example below) as follows:
pgdbg -s startup a.out
If you are using PGDBG to debug a program running on an X86 processor, set the PGDBG_SSE environment variable to "on" to make the XMM registers visible inside the debugger.
% setenv PGDBG_SSE on
$ PGDBG_SSE=on ; export PGDBG_SSE
If you are using PGDBG to debug a program running on an AMD64 processor, the XMM registers are visible by default.
If you are using PGDBG 4.1-2, the XMM registers are available, but only if the program is linked -Bstatic.
Performance when executing under control of PGDBG can be affected in two scenarios:
You are debugging a program compiled with -mp or -Mconcur, and the PGDBG OpenMP event handler is enabled (it is disabled by default in release 5.2 and later). Slow performance in this scenario is expected behavior.
When the OpenMP event handler is enabled, PGDBG adds breakpoints at certain locations in your program to detect various OpenMP events (such as the start and end of a parallel region). PGDBG's OpenMP event handler is most useful when you are debugging a parallel region, or passing in and out of parallel regions with the line-level debugging commands such as 'next' and 'step'.
If you are continuing over many parallel regions, the program will run much faster under PGDBG if the OpenMP event handler is disabled. Use the pgienv command to disable the OpenMP event handler:
pgdbg> pgienv omp off
NOTE: The OpenMP event handler should only be enabled/disabled while the program is stopped in a serial region.
On Windows, the stepi and nexti PGDBG commands time out when executed on a blocked system call. When you attempt to step over the blocked system call, the program counter does not advance. Notice in the following example that the second stepi leaves the program stopped at the same address as the first stepi.
pgdbg [all] 0> stepi
[0] Stopped at 0x78EF181A, function ZwWaitForMultipleObjects
0x78EF181A: C3 ret
pgdbg [all] 0> stepi
[0] Stopped at 0x78EF181A, function ZwWaitForMultipleObjects
0x78EF181A: C3 ret
To simulate the effect of single stepping the program, set a breakpoint at the return address and then continue the program, per the following typescript:
pgdbg [all] 0> p ((void **)$rsp)[0]
(void *) 0x78D6CF8B
pgdbg [all] 0> breaki 0x78D6CF8B
[all] (1)breakpoint set at: ReleaseSemaphore address: 0x78D6CF8B
1
pgdbg [all] 0> cont
[0] Breakpoint at 0x78D6CF8B, function ReleaseSemaphore
0x78D6CF8B: 8B F8 movl %eax,%edi
pgdbg [all] 0>
PGDBG must be invoked via MPIRUN to debug a parallel MPI program. The MPI program must be linked with the MPICH library and contain the appropriate calls to MPI routines (e.g. MPI_Init).
% pgcc -g -o program prog.c -lmpich
% mpirun -np 4 -dbg=pgdbg program
Prior to release 5.2, PGDBG was not able to attach to a running MPI program.
PGDBG makes several assumptions about MPI applications:
IMPORTANT: PGDBG will work automatically with the MPICH library that is shipped with the CDK. No modifications to the CDK version of MPIRUN are needed.
Follow the instructions below only if you want to use PGDBG with a distribution of MPICH other than the one provided in the CDK.
If your version of MPIRUN supports the '-dbg' option
To support the switches listed, edit mpirun.ch_p4 script file as shown by the "diffs" below. Add the changes in the second file (denoted by the '>' prefix) to your mpirun.ch_p4 script file:
% diff mpirun.ch_p4 mpirun.ch_p4.modified
204c204
< . $MPIRUN_HOME/mpirun_dbg.$debugger -progname $prognamemain -p4pg $p4pgfile_master -p4wd $p4workdir -cmdlineargs "$cmdLineArgs"
---
> $MPIRUN_HOME/mpirun_dbg.$debugger -progname $prognamemain -p4pg $p4pgfile_master -p4wd $p4workdir -cmdlineargs "$cmdLineArgs"
If your version of MPIRUN supports the '-debug' option (not '-dbg')
Edit your mpirun.ch_p4 script file to use PGDBG. Below is an example of how the MPIRUN scripts from MPICH 1.1.2 were modified to work with PGDBG. Make the changes in the second file (denoted by the '>' prefix) in your mpirun.ch_p4 script file:
% diff mpirun.ch_p4 mpirun.ch_p4.modified
192c192
< echo "ignore USR1" >> $dbgfile
---
> echo "ignore 10" >> $dbgfile
198c198
< echo "stop in MPI_Init" >> $dbgfile
---
> echo "break main" >> $dbgfile
%
Add a "-pgdbg" option to your mpirun.args script file. Modify mpirun.args to include the changes in the second file.
% diff mpirun.args mpirun.args.modified
112a113
> -pgdbg Start the first process under pgdbg where possible
405a407,410
> ;;
> -pgdbg)
> debugger="pgdbg"
> commandfile="-s %f"
%
In the absence of the MPIRUN '-dbg' option, PGDBG is invoked using the -pgdbg option as shown.
% mpirun -np 4 -pgdbg program <args ...>
The MPIRUN scripts will need further modification to support the following options for use with the -pgdbg option.
Instructions for making such modifications are not included here.
When PGDBG is invoked using MPIRUN without '-nolocal', the target application processes are spawned on the local host as well as on remote hosts. The debugger is launched on the local host, and a lightweight debug server (named 'pgserv') is launched on every host where a target process is running (both local and remote).
When PGDBG is invoked using MPIRUN with '-nolocal', the target application processes are spawned only on remote hosts, as determined by the 'machines.LINUX' file. The debugger is launched on the local host, and pgserv is launched only on the hosts where target processes are running (i.e. only on remote hosts). This allows the debugger to be isolated from the application under debug, in order to (for example) prevent resource contention between the debugger and the application.
PGDBG options (-s,-text,-dbx,...) are not available when PGDBG is invoked via MPIRUN. The default init file ${HOME}/.pgdbgrc (or ./.pgdbgrc if it exists), in conjunction with the 'script' command, can be used in most cases to provide the same functionality. The script file '${PGI}/linux86/bin/mpirun_dbg.pgdbg' could also be edited to add any of these arguments.
By default, output from non-initial MPI processes is block buffered, so output may not be flushed to stdout until some time after it is written. This is a characteristic of MPICH. See fflush(3) for a workaround.
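The fflush(3) workaround can be sketched in C as follows. This is a minimal illustration, not code from the PGI or MPICH distributions: the helper names report_progress and use_line_buffering are hypothetical, and the rank is passed in as a plain integer so that the sketch does not depend on MPI.

```c
#include <stdio.h>

/* Hypothetical helper: write one status line and flush it immediately,
 * so the text is not held in a block buffer until the process exits.
 * Returns 0 on success and -1 on failure. */
int report_progress(FILE *out, int rank, int step)
{
    if (fprintf(out, "[rank %d] finished step %d\n", rank, step) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Hypothetical alternative: switch a stream to line buffering once.
 * setvbuf must be called before any other operation on the stream.
 * Returns 0 on success and -1 on failure. */
int use_line_buffering(FILE *out)
{
    return setvbuf(out, NULL, _IOLBF, 0) == 0 ? 0 : -1;
}
```

In a real program, report_progress(stdout, rank, step) would be called wherever timely output matters, or use_line_buffering(stdout) would be called once at the start of main. Flushing after every write costs some performance, so it is probably best limited to debugging runs.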
To fix this add the following line (the one just after the comment line) to your mpirun_dbg.pgdbg file:
...
#undo quoting done in mpirun.args
cmdLineArgs="`eval echo $cmdLineArgs`"
cmdPgdbgRun="run $cmdLineArgs -p4pg $p4pgfile -p4wd $p4workdir -mpichtv $cmdPgdbgRun"
echo "$cmdPgdbgRun" >> $dbgfile
...
PGDBG uses a remote debug server to debug each node of a distributed MPI program. PGDBG communicates with these servers using 'rsh' by default. To force PGDBG to use 'ssh' set the following environment variable on the host system:
% setenv PGRSH 'ssh'
$ PGRSH=ssh ; export PGRSH
To avoid having to type your ssh key for each remote process and debug server, an 'ssh-agent' can be set up. See your ssh documentation.
Note that PGDBG will use 'rsh' or 'ssh' only when debugging a remote process.
There is a system-imposed limit on the number of sockets that can be created on any specific instance of the Linux operating system. PGDBG uses sockets for inter-process communication between the debugger and lightweight debug servers that run on each node of the cluster. If the system runs out of socket resources, the message below will be printed.
poll: protocol failure in circuit setup - accept: Bad file descriptor
ERROR: unable to attach to (PID 2763, HOST <hostname>)
[New Process (PID 26200, HOST <hostname>) IGNORED]
PGDBG will then ignore any application processes that are not yet controlled by the debugger, and those processes will not return from MPI_Init.
Accessing the values of Fortran private variables using PGDBG is not currently supported.
NOTE: The above statement applies only to debug information emitted by PGI compilers. All code generated by PGI compilers is fully compliant with the OpenMP standard. Only the ability to inspect and modify the values of private variables using PGDBG (and any other debugger) is restricted.
Private variables in C must be declared in the enclosing lexical block of the parallel region in order for them to be visible using PGDBG.
Currently, only the following form of private declaration is supported by PGDBG:
#pragma omp parallel
{
int i;
....
}
In the above case, i would be visible inside PGDBG for each thread. However, in the following example, i is not visible inside PGDBG.
int i;
#pragma omp parallel private(i)
{
....
}
NOTE: The above statement applies only to debug information emitted by PGI compilers. All code generated by PGI compilers is fully compliant with the OpenMP standard. Only the ability to inspect and modify the values of private variables using PGDBG (and any other debugger) is restricted.
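As a fuller illustration of the supported form, the sketch below declares every per-thread variable inside the parallel block, so that each one would be expected to be visible in PGDBG for each thread. The function name sum_with_block_scoped_privates is hypothetical. The code assumes an OpenMP-enabled compile (for example with -mp); if OpenMP is not enabled, the pragmas are ignored and the function runs serially with the same result.

```c
#include <stdio.h>

/* Hypothetical example: sum the integers 0..n-1 using variables that are
 * private because they are declared inside the parallel block, rather
 * than through a private(...) clause. */
long sum_with_block_scoped_privates(int n)
{
    long total = 0;               /* shared by all threads */

    #pragma omp parallel
    {
        int i;                    /* declared in the block: one copy per thread */
        long partial = 0;         /* per-thread partial sum */

        #pragma omp for
        for (i = 0; i < n; i++)
            partial += i;

        #pragma omp atomic
        total += partial;         /* combine the per-thread results safely */
    }
    return total;
}
```

Calling sum_with_block_scoped_privates(10) from main and stopping inside the loop gives a place to inspect i and partial in each thread.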
Calling omp_set_num_threads(n) from the target program may cause all non-initial threads to exit, followed by the creation of the 'n-1' new threads requested by the call to omp_set_num_threads(n). This is expected behavior.
Calling omp_set_num_threads(n) from hybrid OpenMP/MPI applications may result in the use of the SIGHUP signal by MPI_Finalize. This is expected behavior.
If you are running some threads while leaving other threads in a stopped state, the running threads may be blocked trying to synchronize with the stopped threads. For example, a thread must acquire a lock in order to write to stdout. I/O routines, pthread initialization, and OpenMP barriers are all examples of places where such issues may arise.
In many cases, the best solution is to set a breakpoint in the path of all threads at some source location and continue all threads to that breakpoint. The 'sync' command can be used for this, or you can enable the PGDBG OpenMP event handler.
The OpenMP run-time support performs internal synchronization when a program enters or leaves a parallel region, and may perform synchronization within a parallel region. Because of this, PGDBG users may encounter unexpected behavior in which it appears that the program under debug blocks forever, or "hangs".
PGDBG is capable of coordinating threads into and out of OpenMP parallel regions and over OpenMP barriers and synchronization points, in order to preserve line level debugging. This facility is disabled by default, since it may slow program performance significantly. OpenMP event handling can be enabled using the 'pgienv' command.
pgdbg> pgienv omp on
NOTE: The OpenMP event handler should only be enabled/disabled while the program is stopped in a serial region.
Unless you enable OpenMP event handling, you must explicitly advance all threads.
See the online PGDBG User's Guide for a description of PGDBG's OpenMP support, and an introduction to important 'pgienv' environment variables that you can use to configure the behavior of PGDBG (e.g. 'threadstop', 'threadwait', 'omp',...).
When using sample-based profiling (e.g., gmon.out), the number of samples per second is defined by sysconf(_SC_CLK_TCK). However, on some systems the value returned is not the same as the actual sampling rate. It will usually be off by a factor of 10. You can determine if your system has this problem by using "time a.out" and comparing it to the total time shown in the profile.
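The check described above can be sketched in C. This is only an illustration under stated assumptions: the helper names reported_ticks_per_second and profile_time_ratio are hypothetical, and the two times passed to profile_time_ratio are values you read yourself from the profile and from "time a.out".

```c
#include <unistd.h>

/* Value the system reports for clock ticks per second,
 * or -1 if sysconf cannot determine it. */
long reported_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Hypothetical helper: ratio of the total time shown in the profile to
 * the time measured with "time a.out". A ratio near 1 suggests the
 * reported rate matches the real sampling rate; a ratio near 10 or 0.1
 * suggests the factor-of-10 mismatch. Returns 0.0 if the measured time
 * is not positive. */
double profile_time_ratio(double profile_seconds, double measured_seconds)
{
    if (measured_seconds <= 0.0)
        return 0.0;
    return profile_seconds / measured_seconds;
}
```

For example, a profile total of 50 seconds against a measured run time of 5 seconds gives a ratio of 10, which would point to the mismatch.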
This problem has not been solved in PGPROF for releases prior to 5.2. There is a workaround.
Save the current xwindow resources
xrdb -query >current.xrdb.resources
Copy current resources for editing
cp current.xrdb.resources pgprof.xrdb.resources
Remove all lines in pgprof.xrdb.resources with references to '-dt-interface'
Replace the resources
xrdb -load pgprof.xrdb.resources
Now try pgprof.
When you need to return to normal resources,
xrdb -load current.xrdb.resources
If you are using PGPROF 4.1-1 or earlier, and you are profiling more than two nodes, you will have to edit your pgprof.out file as in the following example.
For example, when profiling a 4 node program, 4 pgprof output files will be generated:
pgprof.out pgprof1.out pgprof2.out pgprof3.out
At the end of the pgprof.out file edit the lines starting with "i" to point to the correct pgprof#.out files.
Change from:
i pgprof1.out
i pgprof12.out
i pgprof123.out
to:
i pgprof1.out
i pgprof2.out
i pgprof3.out
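For runs with many nodes, the corrected lines could be generated rather than typed by hand. The sketch below formats one such line from a node index. The helper name format_pgprof_include is hypothetical, and the line format is assumed only from the example above.

```c
#include <stdio.h>

/* Hypothetical helper: write the corrected line for output file number
 * 'node' (1, 2, 3, ...) into buf, for example "i pgprof2.out".
 * Returns 0 on success, or -1 if node is less than 1 or buf is too small. */
int format_pgprof_include(char *buf, size_t size, int node)
{
    int n;

    if (node < 1)
        return -1;
    n = snprintf(buf, size, "i pgprof%d.out", node);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling it for node values 1 through 3 produces the three corrected lines for the 4 node case.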