
3 PGHPF Runtime Options


This chapter describes features of the PGHPF runtime library that the programmer controls using runtime options or by setting environment variables. The PGHPF runtime library uses transport independent calls and is implemented using different library versions for different targets. The transport independent calls interact with a transport dependent interface supporting various communications protocols. The transport dependent communication protocols currently supported include:

There are different versions of the PGHPF runtime libraries for each supported transport mechanism. Depending on your hardware and the product or products you purchased from PGI, your PGHPF compilation tools may include libraries for one or more transport mechanisms. From the HPF programmer's point of view, the differences between versions of the PGHPF runtime library have little effect on program development. The differences are:

If no communication/runtime option is specified on the compile and link lines, the defaults listed in Table 3-1 are used:

Table 3-1 Default Runtime Options

Platform              Default    Other Supported Options
Compaq AlphaCluster   -Mmpi      -Mrpm
CRAY T3E              -Msmp      Custom/shmem
CRAY PVP              -Mrpm      none
IA-32/Linux           -Mrpm      -Msmp, -Mmpi
IA-32/NT              -Mrpm      -Msmp
IA-32/Solaris         -Mrpm      -Msmp, -Mmpi
IBM RS6000/SP         -Mmpi      -Mrpm
Intel ASCI Red        -Mmpi      none
MIPS/IRIX             -Mrpm      -Msmp, -Mmpi
NEC SX4/SX5           -Mmpi      none
PA-RISC/HP-UX         -Mmpi      -Mrpm, -Msmp
SPARC/Solaris         -Mrpm      -Msmp, -Mmpi

3.1 Runtime Options and Environment Variables

This section describes the runtime options and environment variables for executable HPF programs linked with the PGHPF runtime libraries. Every runtime option may be specified either on the command line or as an environment variable. The environment variable corresponding to a PGHPF runtime option -xxx is PGHPF_XXX. Runtime command-line options override values specified by environment variables.

The following sections describe the options for each PGHPF runtime library. Many options are shared between libraries, but the same option can have a slightly different meaning depending on the runtime library in use.

3.1.1 Using Runtime Options and Environment Variables

In order to support program-specific command-line input arguments, PGHPF runtime options must be prefaced with -pghpf. All options prior to -pghpf are treated as input arguments to the HPF program, and all options subsequent to -pghpf are treated as PGHPF runtime options. An example using command line options for a program compiled and linked using -Mrpm:

% a.out <program_args> -pghpf -np 8 -stat all 

An example using environment variables:

% setenv PGHPF_NP 8
% setenv PGHPF_STAT all
% a.out <program_args>

Options may also be specified using command line format in the PGHPF_OPTS environment variable:

% setenv PGHPF_OPTS "-np 8 -stat all"

Command line options override environment variables and individual environment variables override the PGHPF_OPTS environment variable.
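The precedence rules above can be modeled with a short Python sketch. It is only an illustration of the documented ordering, not the runtime library's actual parser: the function names are hypothetical, and the simple option parser assumes each option takes at most one argument that does not itself begin with a dash.

```python
import shlex

def split_args(argv):
    """Split arguments at the first -pghpf marker: (program args, runtime args)."""
    if "-pghpf" in argv:
        i = argv.index("-pghpf")
        return argv[:i], argv[i + 1:]
    return argv, []

def parse_opts(tokens):
    """Turn ['-np', '8', '-stat', 'all'] into {'np': '8', 'stat': 'all'}."""
    opts, name = {}, None
    for tok in tokens:
        if tok.startswith("-"):
            name = tok.lstrip("-")
            opts[name] = ""          # option given without an argument
        elif name is not None:
            opts[name] = tok
            name = None
    return opts

def resolve(argv, env):
    """Merge settings from lowest to highest precedence:
    PGHPF_OPTS, then individual PGHPF_* variables, then the command line."""
    _, runtime_args = split_args(argv)
    merged = parse_opts(shlex.split(env.get("PGHPF_OPTS", "")))
    for key, value in env.items():
        if key.startswith("PGHPF_") and key != "PGHPF_OPTS":
            merged[key[len("PGHPF_"):].lower()] = value
    merged.update(parse_opts(runtime_args))
    return merged

settings = resolve(
    ["input.dat", "-pghpf", "-np", "4"],
    {"PGHPF_NP": "8", "PGHPF_OPTS": "-np 2 -stat all"},
)
print(settings)
```

In this model the command line supplies np, overriding both PGHPF_NP and PGHPF_OPTS, while stat is taken from PGHPF_OPTS because nothing of higher precedence sets it.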

3.1.2 Generic PGHPF Runtime Options

The following PGHPF runtime options are valid on all platforms regardless of the runtime library used.

Table 3-2 Generic PGHPF Runtime Options and Variables

Option                     Environment Variable           Purpose
-no_stop_message           PGHPF_NO_STOP_MESSAGE          Disables FORTRAN STOP message display.
-np <num>                  PGHPF_NP <num>                 Specify the number of HPF processors or processes to use.
-prof <arg>                PGHPF_PROF <arg>               Specify type of data written to the pgprof.out file.
-sigmsg <arg>              PGHPF_SIGMSG <arg>             Used to manipulate signal handling.
-stat <arg>                PGHPF_STAT <arg>               Display runtime statistics after the program has completed execution.
-system_clock_rate <n>     PGHPF_SYSTEM_CLOCK_RATE <n>    Sets the value of COUNT_RATE for the F90 subroutine SYSTEM_CLOCK.
-trace [n]                 PGHPF_TRACE [n]                For programs compiled and linked with -Mprof, prints a trace of the user subprograms called.
-V                         PGHPF_V                        Print version information about the runtime library linked into the executable.
-version                   PGHPF_VERSION                  A synonym for -V.
-zmem { yes | no }         PGHPF_ZMEM { yes | no }        Specify whether dynamically allocated memory is initialized to zeroes.


Disable printing of STOP message using -no_stop_message

Disables the default FORTRAN STOP message display when a STOP statement with no string is executed.

Specify number of HPF processors using -np <num>

Specifies the number of HPF processors or processes to use; this option is not valid for RPM1 and is unnecessary for programs using MPI communications and invoked using mpirun or CRAY T3E programs invoked using mpprun.

The runtime option -np num and the environment variable PGHPF_NP num specify the number of processors to use in an HPF executable. When neither of these options is specified, the default value for the number of processors is set to one (1). The value supplied for num must be a positive integer. For example, to compile and run the HPF program test1.hpf using four processors using RPM, issue the following commands:

$ pghpf -o test1 test1.hpf
$ test1 -pghpf -np 4

If your program uses the PROCESSORS directive and references the intrinsic NUMBER_OF_PROCESSORS, then the value supplied to the -np runtime option (or the PGHPF_NP environment variable) will be the number of processors for the current HPF execution. For example:

!HPF$ PROCESSORS NUM_PROC(NUMBER_OF_PROCESSORS())

If your program does not use an intrinsic to calculate the number of processors available, but uses an explicit PROCESSORS directive with a value, then the -np value should be at least as great as the value used in the program's PROCESSORS directive. For example if the value used in the PROCESSORS directive is four:

!HPF$	PROCESSORS NUM_PROC(4)

then the -pghpf -np argument must be greater than or equal to four. If you supply a value less than four, your program will abort with an error message. If your program does not use a PROCESSORS directive, the program runs on the number of processors specified in the -np runtime option.

Specify data written to pgprof.out file using -prof { all | average | none }

When a program is compiled and linked with the -Mprof=func or -Mprof=lines options, this option tells the runtime library to write the trace file pgprof.out with data for all processors, with data averaged over all processors, or not to write the file at all. The default is to write data for all processors, which can result in very large trace files for large programs executed on large numbers of processors.

Control signal handling using -sigmsg { yes | all | n[,n]... }

By default, runtime signals that abnormally halt execution (such as a segmentation fault) are processed by the signal handler in the runtime system, which prints an appropriate message. With a yes or all argument, this switch enables the signal handler for all known signals, including some that do not cause process termination. With a list of integer values, the signal handler is enabled for those signals.

Display statistics using -stat { mem | mems | cpu | cpus | msg | msgs | all | alls }

Display runtime statistics after the program has completed execution. The cpus, mems, msgs, and alls arguments display information for each processor, while cpu, mem, msg, and all display only summary information. The arguments are:

mem | mems
Display memory usage
cpu | cpus
Display CPU execution time
msg | msgs
Display message statistics if the program was linked with -Mstats.
all | alls
Display all of the above information

On most systems, the -stat runtime option or the PGHPF_STAT environment variable cause runtime statistics to be displayed when program execution is complete.

The "s" versions provide information for all processors running the program on a per-processor basis. Options without the "s" provide summary information aggregated across all processors. The PGHPF_STAT environment variable allows any of the following arguments corresponding to those available for the -stat runtime option.

PGHPF_STAT {cpu|mem|msg|all|cpus|mems|msgs|alls}

In order to see message statistics you must specify the -Mstats option on the pghpf command line during linking (refer to Chapter 2, PGHPF Compiler Options, for details). Enabling the message statistics collection with this option may slightly reduce performance on some systems, as it links in an instrumented version of the PGHPF runtime libraries to gather message statistics.

Viewing execution time statistics using -stat { cpu | cpus }

Assume you are running a PGHPF program which uses the SMP or RPM runtime libraries, and wish to display the CPU-related execution statistics. If the executable is named test2, use the command:

% test2 -pghpf -np 8 -stat cpus
cpu        real      user       sys     ratio   node
0*         5.22      4.91      0.08       96%      0
1          5.11      4.87      0.07       97%      1
2          5.11      4.86      0.04       96%      2
3          5.12      4.87      0.02       96%      3
4          5.10      4.86      0.00       95%      4
5          5.01      4.87      0.01       97%      5
6          5.09      4.86      0.02       96%      6
7          5.09      4.86      0.02       96%      7
min        5.01      4.86      0.00
avg        5.11      4.87      0.03
max        5.22      4.91      0.08
total      5.22     38.96      0.26     7.52x

A program that uses the MPI runtime libraries is executed similarly, but using mpirun or whatever command is appropriate for your system (see section 3.1.3 below).

The first eight lines of the output above show information for each processor. The first column shows the logical processor number. The asterisk (*) in the first column indicates that logical processor 0 is printing the information. The second column shows the real or elapsed time in seconds. The third column shows the user CPU time in seconds. The fourth column shows the system CPU time in seconds. The fifth column shows the percentage of the elapsed time that the CPU was actively executing the HPF program. This is calculated using the formula:


(user time + system time) / real time

The last column shows the processors' hostnames or system-dependent processor identification number; on some systems this is the physical processor number or node, on others it is the socket or process ID.

The three lines before the last show the minimum, average, and maximum of the real, user, and system times reported. The last line shows the maximum elapsed time, the total user and system times, and the speedup factor. The speedup factor is calculated from the values on that last line using the formula:


(user time + system time) / real time.
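As a worked check of the two formulas, the following Python sketch recomputes the ratio for logical processor 0 and the speedup factor from the sample output above. The function names are illustrative only and are not part of the PGHPF runtime.

```python
def cpu_ratio(real, user, system):
    """Share of elapsed time one processor spent executing the program."""
    return (user + system) / real

def speedup(max_real, total_user, total_system):
    """Total CPU time across all processors divided by the maximum elapsed time."""
    return (total_user + total_system) / max_real

# Values copied from the sample -stat cpus output.
print(f"{cpu_ratio(5.22, 4.91, 0.08):.0%}")   # processor 0
print(f"{speedup(5.22, 38.96, 0.26):.2f}x")   # "total" line
```

The ratio for processor 0 rounds to the 96% shown in the sample. The speedup recomputed from the printed values is about 7.51, slightly below the 7.52x in the sample; presumably the runtime computes it from times that have not yet been rounded to two decimals.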

On most systems the -stat cpu runtime option or the PGHPF_STAT environment variable set to cpu displays only the minimum, average, maximum, and total time for the CPU statistics. For example:

% test2 -pghpf -np 8 -stat cpu
cpu        real      user       sys     ratio   node
min        5.01      4.86      0.00
avg        5.11      4.87      0.03
max        5.22      4.91      0.08
total      5.22     38.96      0.26     7.52x

If only one processor executes the program, the minimum, average, and maximum times are not shown. All times are acquired from the host operating system using calls such as gettimeofday(), getrusage(), and times(). Note that the output is not always accurate. For example, the sum of user and system time for a single processor may occasionally exceed the real time.

Viewing memory statistics using -stat { mem | mems }

Assume you are running an RPM or SMP program and wish to display the memory usage execution statistics. If the executable is named test2, use the command:

% test2 -pghpf -np 2 -stat mems

A program that uses the MPI runtime libraries is executed similarly, but using mpirun or whatever command is appropriate for your system (see section 3.1.3 below).

The mems option displays output showing the columns: heap used, page faults, signals received, voluntary switches, involuntary switches and res size as shown in the following screen dump:

memory   local  global  res size  pag flts  pag flts  voluntary  involunt
          heap    heap   (pages)     minor     major   switches  switches
0*        57KB     1KB         0       229       210          0         0
1         61KB     1KB         0       209       180          0         0
total    117KB     1KB         0       438       390          0         0

The second column, local heap, shows the local heap used by each node. The third column, global heap, shows the global heap used by each node; this area is shared among the nodes. The fourth column, res size (pages), shows the average resident set size of your program in memory. This can sometimes be inaccurate depending on the implementation of getrusage() on your system. The fifth column, pag flts minor, shows the total number of page faults that did not require I/O (i.e. the page was in the free list and had not yet migrated to disk). The sixth column, pag flts major, shows the total number of page faults that did require I/O to disk. The seventh column, voluntary switches, shows the total number of times your job gave up the processor voluntarily, typically while waiting on an I/O operation or another resource. The eighth column, involunt switches, shows the total number of times your job was switched out while actively executing, for example because its time slice expired.

The fourth through eighth columns show information returned by getrusage(). Some systems do not fully support getrusage(); these systems display zero for unsupported fields. Refer to your system's man pages for more information. One line is displayed for each node present. The last line is the same as the line displayed by the option -stat mem. On most systems the -stat mem runtime option or the PGHPF_STAT environment variable set to mem displays only the totals for the memory statistics.

Viewing message statistics using -stat { msg | msgs }

To run a program (for example, test2) and see the message-related execution statistics, use the command:

% test2 -pghpf -np 2 -stat msgs

A program that uses the MPI runtime libraries is executed similarly, but using mpirun or whatever command is appropriate for your system (see section 3.1.3 below).

The message statistics will be zero, unless the -Mstats option is specified on the PGHPF compiler command line (refer to Chapter 2, PGHPF Compiler Options, for details). Enabling the message statistics collection with this option may slightly reduce performance on some systems. The -stat msgs option will display the following:

messages  send   send   send  recv   recv   recv  copy   copy   copy
           cnt  total    avg   cnt  total    avg   cnt  total    avg
0*        1001  782KB  799 B  1000  782KB  800 B  1000  782KB  800 B
1         1000  782KB  800 B  1001  782KB  799 B  1000  782KB  800 B
total     2001    3MB  800 B  2001    3MB  800 B  2000    3MB  800 B

One line is displayed for each node. The second column, send cnt, contains the total number of messages sent by each node. The third column, send total, contains the total number of bytes sent by each node. The fourth column, send avg, contains the average message size. The fifth through seventh columns contain the same information for receives, and the eighth through tenth columns contain the same information for local memory-to-memory copies on each node. Memory-to-memory copies are sometimes performed to move data into a temporary array during a given loop or computation.

Viewing all statistics using -stat { all | alls }

The -stat all option is equivalent to specifying the cpu, mem, and msg options together, and -stat alls is equivalent to specifying cpus, mems, and msgs together. Unless the -Mstats option is specified on the pghpf command line, the message statistics will be zero (refer to the preceding section for details).

Set SYSTEM_CLOCK resolution using -system_clock_rate <count_rate>

Used to set the value of COUNT_RATE for the F90 subroutine SYSTEM_CLOCK. The default is 1000 for machines with 4-byte integers, and 1,000,000 for machines with 8-byte integers, except on the CRAY T3E where the value cannot be changed and is defined by the system.

Print a trace of user subprograms using -trace [ n ]

If the program is compiled and linked with -Mprof=func or -Mprof=lines, this option will print a trace of the user subprograms called. With the optional argument n, the trace will only be printed for logical processor n; the default is to print the trace for each processor.

Display runtime version information using -V

This option causes version information about the runtime library linked into the executable program to be printed to stderr upon execution.

Zero out dynamically allocated memory using -zmem { yes | no }

With a yes argument, causes the runtime to initialize any dynamically allocated memory to zero; dynamically allocated memory is used for ALLOCATE statements and for distributed arrays. The default is to not initialize dynamically allocated memory.

3.1.3 Running PGHPF -Mmpi programs

MPI is the Message Passing Interface de facto standard, a library-based standard for message-passing programming on shared- and distributed-memory computers. Many hardware system vendors have highly optimized implementations of MPI available on their systems. In these cases, PGHPF generally links against these vendor-supplied libraries by default when the -Mmpi compile/link option is specified. PGHPF also works with MPICH, a public domain implementation of MPI from Argonne National Laboratory and Mississippi State University that is available on most platforms. See the PGHPF installation and release notes for more information on the version of MPI targeted by the PGHPF runtime libraries on your system.

PGHPF-compiled programs built using the -Mmpi compile/link option can be treated as "normal" MPI executables. They should be invoked using the mechanism provided by your MPI implementation. It is assumed here that an mpirun command is used, as is the case with MPICH and several of the vendor-supplied versions of MPI. The command line format for executing a PGHPF-compiled MPI program is:


% mpirun <mpi_opts> a.out <user_opts> -pghpf <hpf_opts>

where:

mpirun
is the command used to execute MPI programs
<mpi_opts>
are the options to the mpirun command
a.out
is the executable program
<user_opts>
are the program's options
<hpf_opts>
are any of the valid options to the PGHPF runtime library

Table 3-3 shows the valid PGHPF -Mmpi runtime library options (-pghpf options).

Table 3-3 MPI-specific Runtime Library Options and Variables

Option                 Environment Variable         Purpose
-maxxfer size          PGHPF_MAXXFER size           Specify the maximum message size; larger messages are broken up.
-minxfer size          PGHPF_MINXFER size           Specify the minimum unbuffered message size.
-unsafe { yes | no }   PGHPF_UNSAFE { yes | no }    Specify that MPI asynchronous communications should be used.


Specify maximum message size using -maxxfer <num>

The -maxxfer <num> MPI runtime option specifies the maximum message size to be sent in a given MPI_SEND or MPI_RECV call (these calls occur within the PGHPF -Mmpi runtime libraries and are not user-visible). Messages of size greater than or equal to num bytes are broken up and sent as multiple smaller messages. On some switched distributed-memory systems, this option can enable more efficient use of aggregate communications bandwidth by maximizing the amount of data in the communications network at a given time.

Specify minimum unbuffered message size using -minxfer <num>

The -minxfer <num> MPI runtime option specifies the minimum message size, in bytes, that will not be buffered. Messages of size greater than or equal to num bytes are sent as individual messages. Consecutive messages of size less than num bytes that are being sent to the same destination are buffered. The default value is 2048 bytes. Experimenting with this value may improve performance on some systems.
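The buffering rule can be illustrated with a small Python model. This is a sketch under stated assumptions, not the runtime's implementation: the manual does not say when a buffer is flushed, so the model assumes a buffer is sent as soon as the destination changes or a message at or above the threshold arrives. The function name plan_sends and the (destination, size) tuples are hypothetical.

```python
def plan_sends(messages, minxfer=2048):
    """Group (destination, size_in_bytes) messages into transfers.

    Messages >= minxfer go out individually; consecutive smaller messages
    to the same destination are combined into one buffered transfer.
    Returns a list of (destination, [sizes]) in send order.
    """
    transfers = []
    pending_dest, pending = None, []

    def flush():
        # Assumption: a buffer is sent when it can no longer be extended.
        nonlocal pending_dest, pending
        if pending:
            transfers.append((pending_dest, pending))
        pending_dest, pending = None, []

    for dest, size in messages:
        if size >= minxfer:
            flush()
            transfers.append((dest, [size]))
        else:
            if pending and dest != pending_dest:
                flush()
            pending_dest = dest
            pending.append(size)
    flush()
    return transfers

print(plan_sends([(1, 100), (1, 200), (1, 4096), (2, 50), (1, 60)]))
```

With the default threshold, the two small messages to destination 1 are combined, the 4096-byte message is sent on its own, and the remaining small messages form separate transfers because their destinations differ. Lowering or raising minxfer in the model shows how the number of transfers changes, which is the effect the option lets you experiment with.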

Enable asynchronous MPI communications using -unsafe { yes | no }

By default synchronous MPI sends/receives are used to perform inter-process communications in PGHPF-compiled -Mmpi programs. Specifying -unsafe yes enables the use of asynchronous MPI sends/receives. This option can improve performance, but is unreliable on many systems due to the large number of messages that must be buffered in many PGHPF-compiled programs. Most programs run faster with this option, some may run slower, and some may hang.

3.1.4 Running PGHPF -Msmp programs

PGHPF programs compiled and linked using the -Msmp option use a shared-memory runtime system and a global heap to hold all HPF-distributed data. In this case, a global heap is always used regardless of whether -heapz is specified. The -heapz runtime option can be used to alter the default heap size, which is 8 Mbytes. PGHPF programs that use SMP communications do not use library calls to implement processor-to-processor communication. Rather, each processor fetches/stores remote data by reference from/to the shared global heap as needed without interrupting the processor that "owns" the data. The address calculations required to implement these communications are somewhat more expensive than normal linear Fortran address calculations, but in general communication overhead is much lower in SMP programs than in corresponding versions of the program that use library-based two-sided messaging (MPI or RPM).

The command line format for executing a PGHPF program compiled/linked using -Msmp on all platforms other than the CRAY T3E is:


% a.out <user_opts> -pghpf <hpf_opts>

where:

a.out
is the name of the executable program
<user_opts>
are the program's options
<hpf_opts>
are any of the valid options to the PGHPF runtime library

The command line format for executing a PGHPF program compiled/linked using -Msmp on the CRAY T3E is:


% mpprun <mpprun_opts> a.out <user_opts> -pghpf <hpf_opts>

where:

mpprun
is the command used to execute CRAY T3E programs
<mpprun_opts>
are the valid options to the mpprun command
a.out
is the name of the executable program
<user_opts>
are the program's options
<hpf_opts>
are any of the valid options to the PGHPF runtime library

Table 3-4 lists the runtime options available for SMP programs.

Table 3-4 SMP-specific Runtime Library Options and Variables

Option               Environment Variable      Purpose
-debugger path       PGHPF_DEBUGGER path       (Not valid on CRAY T3E) Specify the default debugger.
-g [ n | all ]       PGHPF_G [ n | all ]       (Not valid on CRAY T3E) Enable parallel debugging.
-heapinit <value>    PGHPF_HEAPINIT <value>    (Not valid on CRAY T3E) Specify the value to which shared global heap elements are initialized at startup.
-heapz <size>        PGHPF_HEAPZ <size>        (Not valid on CRAY T3E) Specify the shared global heap size (default is 8 Mbytes).
-noheapinit          PGHPF_NOHEAPINIT          (Not valid on CRAY T3E) Do not initialize the shared global heap.


Debug node-local Fortran intermediate files using -g and -debugger

The -g SMP runtime option or PGHPF_G environment variable causes a node-local Fortran debugger to be invoked upon program startup. This option can be used to debug PGHPF-generated intermediate Fortran source code. The debugger is specified by setting the -debugger runtime option or the PGHPF_DEBUGGER environment variable to the pathname of the debugger. For example,

% setenv PGHPF_DEBUGGER /usr/ucb/dbx

If more than one debugger is present, entering commands is difficult. A better method for debugging is to create a shell script to create a window for each debugger. For example, create a script named smpdbg (any name will do) containing the following.

exec xterm -e /usr/ucb/dbx $1

And then set PGHPF_DEBUGGER to smpdbg. All PGHPF runtime options must be specified on the command line that invokes the program. The user-supplied runtime arguments must be specified with each debugger's run command. The file smpdbg must be accessible in the current PATH.

The optional n and all arguments to the -g option and PGHPF_G variable select the processes to debug. The integer n specifies a logical process number, which should be between 0 and one less than the number of processors. The all argument specifies that all processes are debugged.

Enlarge the default SMP heap size using -heapz <size>

The SMP runtime option -heapz size specifies the size of the shared-memory segment that should be created to hold HPF-distributed data. In the presence of the -heapz option, all SMP processes allocate all HPF-distributed data within the shared-memory segment, referred to as a global heap. Communication between processes is performed using direct memory accesses within the global heap.



NOTE


The use of the SMP runtime library will generally improve performance on shared-memory systems where there is a one-to-one correspondence between logical and physical processors. In general, you should never use the SMP runtime library with more processes than available processors, as performance will generally be much worse than with socket-based communications using RPM.

The -heapz size command line option and PGHPF_HEAPZ size environment variable specify the size of the shared global heap. The default size is 8 Megabytes. The size may be specified with a 'k' or 'm' suffix; the 'k' specifies kilobytes, and 'm' specifies megabytes and these letters can be lower or upper case. For example, specifying 4M is the same as specifying 4194304.
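The suffix rule can be checked with a few lines of Python. This is only a sketch of the documented convention, with a hypothetical helper name; it assumes 'k' means 1024 bytes and 'm' means 1024 x 1024 bytes, which is consistent with 4M being equal to 4194304.

```python
def parse_heap_size(text):
    """Convert a -heapz style size such as '4M', '512k', or '4194304' to bytes."""
    text = text.strip()
    multipliers = {"k": 1024, "m": 1024 * 1024}
    suffix = text[-1:].lower()          # suffix letters are case-insensitive
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)                    # plain byte count, no suffix

print(parse_heap_size("4M"))
print(parse_heap_size("4194304"))
```

Both calls produce the same byte count, matching the equivalence stated above.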

If the program attempts to allocate more memory than available in the shared global heap (in the event that the aggregate size of HPF-distributed data exceeds the size of the global heap), a message is displayed and the program aborts. A rough estimate of the shared global heap size required can be determined by running the program with the -stat mems option and looking at the "heap used" value. Refer to section 3.2, Execution Information, for more information on the -stat option.



NOTE


The -heapz option is not available or required on the CRAY T3E. Local portions of distributed arrays are allocated in local memory on each node. One-sided remote fetches/stores are used to access data on remote nodes.

Initialize the shared global heap using -heapinit <value>

If the -heapz runtime switch is specified, a shared global heap is used for communication between HPF processes running on the same multi-processor. Normally, this heap is initialized to NaNs; this switch specifies a different value to use to initialize the heap.

Specify an uninitialized shared global heap using -noheapinit

Specifies that the shared global heap should be uninitialized.

3.1.5 Running PGHPF -Mrpm programs

RPM is the PGI Real Parallel Machine system. RPM supports HPF process spawning and communication among HPF processes on a group of homogeneous hosts, that is, hosts that are all of the same machine type running the same operating system. By default, RPM processes use UNIX sockets for communication. Processes on the same host, for example on a multiprocessor shared-memory system, may optionally use shared global memory for communication. PGHPF RPM programs that run on multiple hosts, each of which has multiple processors, will communicate using sockets when necessary and shared memory where possible.

Upon program startup, RPM creates all processes needed to run the program. By default it uses rsh to create remote processes. It also establishes socket connections between each pair of processes (this phase is functionally equivalent to MPI). The processes then exchange hostname and process ID information, and determine how many other processes of the same job reside on the same host. If there is only one process on the host, the process will always use sockets for communication. If there is more than one HPF process on a given host, and the -heapz RPM runtime option is used, communication between processes on the same host will be performed using a fast shared-memory mechanism.

The command line format for executing a PGHPF program compiled/linked using -Mrpm is:


% a.out <user_opts> -pghpf <hpf_opts>

where:

a.out
is the name of the executable program
<user_opts>
are the program's options
<hpf_opts>
are any of the valid options to the PGHPF runtime library

Table 3-5 lists the runtime options available for RPM programs.

Table 3-5 RPM-specific Runtime Library Options and Variables

Option               Environment Variable      Purpose
-host hostargs       PGHPF_HOST hostargs       Specify the hosts where additional processes are to be spawned.
-curhost hostname    PGHPF_CURHOST hostname    Specify alternate name for the current host.
-debugger path       PGHPF_DEBUGGER path       Specify the default node-local debugger.
-g [ n | all ]       PGHPF_G [ n | all ]       Enable parallel debugging of PGHPF-generated intermediate Fortran files using a node-local debugger.
-heapinit <value>    PGHPF_HEAPINIT <value>    Specify the value to which shared global heap elements are initialized at startup.
-heapz <size>        PGHPF_HEAPZ <size>        Specify the shared global heap size.
-minxfer size        PGHPF_MINXFER size        Specify minimum unbuffered message size.
-mount arg           PGHPF_MOUNT arg           For systems that use auto mounted file systems, this option fixes a problem where the RPM library is not able to resolve NFS pathnames.
-noheapinit          PGHPF_NOHEAPINIT          Do not initialize the shared global heap.
-rsh shell           PGHPF_RSH shell           Specify an alternate shell to be used in spawning processes on remote systems.


Specify hosts using -host [ hostname[:n] | ,-file=path | ,-v | ,-dyn ]...

The RPM runtime option -host and the PGHPF_HOST environment variable specify the hosts where HPF compute processes will be spawned. All items listed after -host must be comma separated. The first process always runs on the host where the program was invoked. If no -host option is specified, all processes run on the host where the program was invoked.

The following examples assume the program is started on a host named moe. The simplest form is a comma-separated list of hostnames:

% setenv PGHPF_HOST moe,larry,curly,bill

The hosts are chosen in the order specified. If two compute processes were specified, moe and larry would be chosen. If the program requested five processes, the hosts would be chosen in the following order: moe, larry, curly, bill, and moe.

A hostname can be appended with a power factor, for example:

% setenv PGHPF_HOST moe:50,larry:150,curly,bill:25

When any host has a power factor specified, the hosts are chosen in the order of the specified power factor. The default power factor is 100. In the above example, curly has a power factor of 100, moe has a power factor of 50, larry has a power factor of 150 and bill has a power factor of 25. The power factor specifies the following order for host selection: larry, curly, moe, bill. If multiple hosts have the same power factor, the order in which they are chosen is undefined.
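The two selection orders described so far can be reproduced with a short Python model. It is a sketch, not RPM's algorithm: the function names are hypothetical, it handles only hostname[:n] items (not -file, -v, or -dyn), it does not model the rule that the first process runs on the invoking host, and wrapping around the list when power factors are present is an assumption carried over from the example without power factors. Hosts with equal power factors keep their listed order here, whereas the actual order is undefined.

```python
from itertools import cycle, islice

DEFAULT_POWER = 100  # power factor used when a host has no ":n" suffix

def parse_hosts(spec):
    """Parse 'moe:50,larry:150,curly' into [(name, power), ...].

    Also reports whether any host carried an explicit power factor."""
    hosts, explicit = [], False
    for item in spec.split(","):
        name, sep, power = item.partition(":")
        if sep:
            explicit = True
        hosts.append((name, int(power) if sep else DEFAULT_POWER))
    return hosts, explicit

def choose_hosts(spec, nprocs):
    """Return the host for each of nprocs processes, in selection order."""
    hosts, explicit = parse_hosts(spec)
    if explicit:
        # Highest power factor first once any power factor is given.
        hosts = sorted(hosts, key=lambda h: h[1], reverse=True)
    return [name for name, _ in islice(cycle(hosts), nprocs)]

print(choose_hosts("moe,larry,curly,bill", 5))
print(choose_hosts("moe:50,larry:150,curly,bill:25", 4))
```

The first call follows the listed order and wraps back to moe for the fifth process; the second yields larry, curly, moe, bill, matching the order given for the power-factor example.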

The hostnames and optional power factors can be read from a file. Assuming the file hosts contained 3 lines:

moe 50
larry 150
curly

The following setting of PGHPF_HOST would be identical to the previous example:

% setenv PGHPF_HOST -file=hosts,bill:25

Multiple hostnames and -file options can be specified in any order desired. The order is significant only if no host has a power factor.

The -dyn option changes each host's effective power factor based on that host's current load average. The calculation is:

current_power = power / (load+1)

With the -dyn RPM runtime option, the calculated current_power factor is used to choose hosts. If a host fails to respond to the load average request, that host is ignored. This option can take some time to process since each host's load average is requested. Hosts are timed out after 10 seconds and ignored.
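To make the -dyn calculation concrete, the Python sketch below applies the formula to the hosts from the earlier example. The load averages are invented for illustration, the function names are hypothetical, and a load of None stands in for a host that did not answer the load average request.

```python
def current_power(power, load):
    """Effective power factor under -dyn: power / (load + 1)."""
    return power / (load + 1)

def rank_by_current_power(hosts):
    """hosts maps name -> (power, load); load None means no response.

    Returns responding hosts ordered by effective power, highest first."""
    live = {
        name: current_power(power, load)
        for name, (power, load) in hosts.items()
        if load is not None            # unresponsive hosts are ignored
    }
    return sorted(live, key=live.get, reverse=True)

sample = {
    "moe": (50, 0.0),      # idle: effective power stays 50
    "larry": (150, 4.0),   # heavily loaded: 150 / 5 = 30
    "curly": (100, None),  # no response within the timeout
    "bill": (25, 0.0),     # idle: effective power stays 25
}
print(rank_by_current_power(sample))
```

In this hypothetical case the heavily loaded larry drops below the idle moe even though its static power factor is three times larger, and curly is left out entirely.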

Normally, all of the above is done silently, and you will see no output to your screen other than that generated by your HPF program. The -v RPM runtime option displays informative messages listing the host on which each compute process is running.

A final example:

% setenv PGHPF_HOST -dyn,-file=hosts,bill:25,-v

The order of the -file, -dyn, and -v options is not important.

Specifying alternate spawning hostname using -curhost <hostname>

When systems are interconnected by multiple networks, the hosts often have different names on different networks. Any of a remote host's alternative names may be specified with the -host option. The -curhost option is used to specify an alternative name for the current or local host on which the executable is invoked. The -host and -curhost options together can be used to specify the alternate hostnames that in turn select a different network.

Debug node-local Fortran intermediate files using -g and -debugger

The -g RPM runtime option or PGHPF_G environment variable causes a node Fortran debugger to be invoked upon program startup. This option can be used to debug PGHPF-generated intermediate Fortran source code. The debugger is selected by setting the -debugger runtime option or the PGHPF_DEBUGGER environment variable to the pathname of the debugger. For example,

% setenv PGHPF_DEBUGGER /usr/ucb/dbx

If more than one debugger is present, entering commands is difficult. A better method is to write a shell script that opens a separate window for each debugger. For example, create a script named rpmdbg (any name will do) containing the following:

exec xterm -e /usr/ucb/dbx $1

Then set PGHPF_DEBUGGER to rpmdbg. The file rpmdbg must be accessible in the current PATH. All PGHPF runtime options must be specified on the command line that invokes the program. The user-supplied runtime arguments must be specified with each debugger's run command.

The optional arguments n and all to the -g option and the PGHPF_G variable specify debugging parameters. The integer n specifies a logical process number (this should be between 0 and the number of processors). The all argument specifies that all processes are to be debugged.

Specify minimum unbuffered message size using -minxfer <num>

The -minxfer <num> RPM runtime option specifies the minimum message size, in bytes, that will not be buffered. Messages of size greater than or equal to num bytes are sent as individual messages. Consecutive messages of size less than num bytes that are being sent to the same destination are buffered. The default value is 2048 bytes. Experimenting with this value may improve performance on some systems.
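
The threshold rule can be sketched in Python as follows. This is an illustration only: the function plan_sends is hypothetical, and the choice to flush buffered small messages before a large one is an assumption, since the text does not say when the runtime flushes its buffer.

```python
def plan_sends(sizes, minxfer=2048):
    """Group consecutive message sizes (bytes) bound for one destination.

    Returns a list of batches; each batch is a list of sizes sent together.
    """
    batches, pending = [], []
    for size in sizes:
        if size >= minxfer:
            if pending:             # assumed: flush small messages first
                batches.append(pending)
                pending = []
            batches.append([size])  # large message is sent individually
        else:
            pending.append(size)    # small message is held in the buffer
    if pending:
        batches.append(pending)
    return batches

print(plan_sends([100, 200, 4096, 512, 2048]))
```

Under these assumptions, the two leading small messages travel together, while the 4096-byte and 2048-byte messages are each sent on their own.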

Specify NFS mount path using -mount dir0:replace0[,dir1:replace1]...

When NFS file systems are mounted at different points on different hosts, pathnames valid on one host are not necessarily valid on other hosts. If file systems are not mounted at the same directories on remote systems running RPM processes, the remote RPM processes will not be able to find the executable file or input files. This can usually be corrected by specifying the -mount RPM runtime option or the PGHPF_MOUNT environment variable.

The value specified is a list of match strings and replacement strings.

-mount match0:replace0,match1:replace1,...

For example:

% a.out -pghpf -mount /home/u:/home/1,/home/g:/home/1

Any files in the /home/u subdirectory of the spawning host will be successfully found if they exist with the same relative pathname in the /home/1 subdirectory on the remote host. Likewise for /home/g.
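
The match/replace behavior described above can be sketched in Python. This is not the RPM code: the function remap_path is hypothetical, and it assumes that a match string is compared against leading path components and that the first matching pair wins, neither of which the text states.

```python
def remap_path(path, mount):
    """Rewrite path using a -mount style list 'match0:replace0,match1:replace1'."""
    for pair in mount.split(","):
        match, replace = pair.split(":", 1)
        # Assumed: match only on whole leading path components
        if path == match or path.startswith(match.rstrip("/") + "/"):
            return replace + path[len(match):]
    return path  # no pair matched; path is used unchanged

mount = "/home/u:/home/1,/home/g:/home/1"
print(remap_path("/home/u/data/in.dat", mount))  # /home/1/data/in.dat
```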

Specify remote shell using -rsh <shellname>

The -rsh option and the PGHPF_RSH environment variable specify the name of the shell used for creating remote processes. It is normally "rsh". If your favorite remote shell is "ksh", you could specify that it be used as follows:

% a.out -pghpf -rsh ksh

Enable fast shared-memory communications using -heapz <size>

If there is more than one process on the same host, for example on a multiprocessor shared-memory system, RPM can create a shared-memory segment to enable faster communications. In this case, process-to-process communications are performed using simple memory-to-memory copies rather than socket calls. In most cases, this is much more efficient (in particular whenever there is a one-to-one mapping between HPF processes and physical processors). The RPM runtime option -heapz size specifies the size of the shared-memory segment that should be created.

In the presence of the -heapz option, all RPM processes allocate all HPF-distributed data within the shared-memory segment, referred to as a global heap. Communication between processes on the same host is implemented as a memory-to-memory copy within the global heap. Note that even in the presence of -heapz, communication between processes on different hosts is performed using sockets.



NOTE


The use of a shared global heap will generally improve performance on systems where there is a one-to-one correspondence between logical and physical processors. In general, you should never use the -heapz option with more processes than available processors, as performance will generally be much worse than with socket-based communications.

The -heapz size command line option and PGHPF_HEAPZ size environment variable specify the size of the shared global heap. The default size is zero, which forces all processes to use sockets for communication. The size may be specified with a 'k' or 'm' suffix, in lower or upper case: 'k' specifies kilobytes and 'm' specifies megabytes. For example, specifying 4M is the same as specifying 4194304.
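
The suffix arithmetic can be checked with a short Python sketch. The function parse_heap_size is hypothetical; the multiplier of 1024 per step is inferred from the 4M = 4194304 example above rather than stated explicitly in the text.

```python
def parse_heap_size(text):
    """Convert a -heapz style size such as '4M', '512k', or '4194304' to bytes."""
    multipliers = {"k": 1024, "m": 1024 * 1024}  # inferred from 4M = 4194304
    suffix = text[-1:].lower()                   # suffix may be either case
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)                             # plain byte count

print(parse_heap_size("4M"))  # 4194304
```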

If the program attempts to allocate more memory than is available in the shared global heap, a message is displayed and the program aborts. A rough estimate of the shared global heap size required can be determined by running the program with the -stat mems option and looking at the "heap used" value. Refer to section 3.2, Execution Information, for more information on the -stat option.

The -heapz option is recommended when all of the processes of a program are run on a single shared memory multi-processor system. However, if the processes of a program run on multiple systems, each group of processes on a single system will use a shared global heap for communication. Sockets are still used for communication between processes on different systems.

Initialize the shared global heap using -heapinit <value>

If the -heapz runtime switch is specified, a shared global heap is used for communication between HPF processes running on the same multi-processor. Normally, this heap is initialized to NaNs; the -heapinit switch specifies a different value with which to initialize the heap.

Specify an uninitialized shared global heap using -noheapinit

The -noheapinit switch specifies that the shared global heap should be left uninitialized.

3.1.6 Running PGHPF -Mrpm1 programs

RPM1 is the PGI Real Parallel Machine system for a single processor. RPM1 is sometimes useful for debugging RPM programs. Programs linked using -Mrpm1 perform all of the initialization required for an RPM program, but run on a single processor without performing a UNIX fork() operation. RPM1 eases debugging by allowing a single node version of an HPF program to run and work in a manner very similar to a multi-node program, but with a simplified debugging environment in which a normal serial debugger can be used.

Table 3-6 lists the runtime options available for RPM1 programs. Each of these options is used in a fashion identical to that described above for RPM programs in section 3.1.5, Running PGHPF -Mrpm programs.

Table 3-6 RPM1-specific Runtime Library Options and Variables

Option            Environment Variable     Purpose

-debugger path    PGHPF_DEBUGGER path      Specify the default debugger.

-g [n | all]      PGHPF_G [n | all]        Enable parallel debugging.
