
8 HPF Extrinsics


HPF extrinsic procedures allow an HPF programmer to call non-HPF procedures (for example, procedures compiled using an F77 node compiler) or local HPF procedures. At the extrinsic procedure boundary, procedure arguments are mapped from the paradigm of the caller to the appropriate format for the called procedure. The EXTRINSIC prefix on a subroutine or function INTERFACE block declares the interface to use when calling the extrinsic procedure. To call a subroutine or function as an extrinsic, the HPF calling program must supply an explicit interface using an INTERFACE block. The result of calling an HPF extrinsic subroutine or function without an explicit interface is undefined.

By default, an HPF program executes in a single-threaded "HPF Global" execution model. In this model, the program logically executes under a single thread of control; for example, a PRINT statement results in only one line of output, regardless of the number of processors executing the program. However, data parallel statements and constructs (FORALLs, array syntax statements, DO INDEPENDENT loops, etc.) are executed in parallel using as many physical processors as are available to the program, depending on whether and how data is distributed. This is a key distinction between HPF programs and, for example, message-passing programs, which have multiple logical threads of control.

There are times when it is useful to escape from the HPF Global execution model to a multi-threaded "local" execution model, or perhaps a "serial" execution model. For example, this is useful for coding small portions of an HPF program using MPI message-passing, or for calling highly optimized library routines such as assembly-coded FFTs. PGHPF supports calls to extrinsics executing in the following local execution models:

* SERIAL - a single logical thread of control where one copy of the procedure is conceptually executing and the underlying target hardware is treated as a uni-processor.

* LOCAL - multiple threads-of-control, one per processor, where each thread is executing the same procedure; this model is often called SPMD (single program, multiple data).

The PGHPF compiler supports the EXTRINSIC keyword with extrinsic types HPF_LOCAL, HPF_SERIAL, F90_LOCAL, F90_SERIAL, F77_LOCAL, and F77_SERIAL. The HPF_CRAFT extrinsic type is supported only on the CRAY T3E. See the HPF_CRAFT User's Guide for more information on this extrinsic type. Examples of using HPF extrinsic procedures are shown in the sections below.

8.1 Extrinsic HPF_LOCAL

In the HPF_LOCAL extrinsic model, the extrinsic subprogram is compiled using the PGHPF compiler and has access to all language and most library features of HPF. However, it executes in a multi-threaded fashion with each processor operating only on the data "owned" by that processor. This includes all replicated data as well as the local portions of HPF-distributed arrays.

An HPF_LOCAL extrinsic addresses the local portions of HPF-distributed arrays using a local index space that spans only the data local to the processor, rather than the global index space. For example, in an HPF Global program unit, a 1000-element array distributed over 4 processors in block fashion is viewed as a single array whose elements 1 through 250 reside on the first processor, 251 through 500 on the second, and so on. While the data is physically distributed, the single logical HPF Global thread of control can access any element of the array using its "global" index and variable name.

In an HPF_LOCAL extrinsic, the local portion of each distributed array is accessed as if it were a complete array in its own right. For example, the array above would be viewed as 4 separate 250-element arrays, one per processor, each indexed from 1 to 250. The dummy array to which each local array section is passed is therefore accessed using a completely separate (local) index space on each processor. This programming model is identical to the standard MPI SPMD message-passing model.
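The relationship between the global and local index spaces for a BLOCK distribution can be sketched in a few lines of C. This is an illustration only, not part of PGHPF: the function names are hypothetical, processors are assumed to be numbered from 0 (as pghpf_myprocnum reports them), and array indices are 1-based as in Fortran.

```c
#include <assert.h>

/* Hypothetical helpers, not PGHPF routines. */

/* Block size for n elements over nprocs processors, rounded up. */
int block_size(int n, int nprocs) {
    assert(n > 0 && nprocs > 0);
    return (n + nprocs - 1) / nprocs;
}

/* 0-based processor number that owns 1-based global index g. */
int block_owner(int g, int n, int nprocs) {
    assert(g >= 1 && g <= n);
    return (g - 1) / block_size(n, nprocs);
}

/* 1-based local index of 1-based global index g on its owner. */
int block_local_index(int g, int n, int nprocs) {
    assert(g >= 1 && g <= n);
    return (g - 1) % block_size(n, nprocs) + 1;
}
```

For the 1000-element array over 4 processors, global element 251 is local element 1 on processor 1.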

A called HPF_LOCAL procedure may use the underlying communication primitives upon which the HPF runtime is based (for example, MPI) to communicate between processors. In addition, there are generic PGHPF send and receive routines that can be used with any of the PGHPF runtime libraries. The set of generic routines, described in section 8.8, is supported on all systems and may be expanded in the future.

For example, the HPF global program DOTP below defines an interface to the DOTP_BLK HPF local extrinsic procedure. Note that the interface block contains DISTRIBUTE and ALIGN directives applied to the dummy arguments. These are required, and determine how the arrays will be distributed upon entry to the extrinsic. It is not required that these directives match the HPF global distribution of the array. However, if they do not match, a potentially substantial performance penalty is incurred due to the re-mapping of the arrays at the procedure boundary (both upon entry and exit). If the local and global distributions do match, no re-mapping or argument copying will occur:

      PROGRAM DOTP
      INTEGER*4 N
      PARAMETER (N=1024)
      REAL*8 X(N), Y(N), A
C
!HPF$ DISTRIBUTE (BLOCK) :: X
!HPF$ ALIGN (:) WITH X(:) :: Y
C
      INTERFACE
        EXTRINSIC (HPF_LOCAL) SUBROUTINE DOTP_BLK
     &    (RANK, SHAPE, N, X, Y, A)
        INTEGER*4, INTENT(IN) :: RANK
        INTEGER*4, INTENT(IN) :: SHAPE(:)
        INTEGER*4, INTENT(IN) :: N
        REAL*8, INTENT(IN) :: X(:)
        REAL*8, INTENT(IN) :: Y(:)
        REAL*8, INTENT(OUT) :: A
!HPF$ DISTRIBUTE (BLOCK) :: X
!HPF$ ALIGN (:) WITH X(:) :: Y
        END SUBROUTINE DOTP_BLK
      END INTERFACE
C
      X = 1.0D0
      Y = 2.0D0
      CALL DOTP_BLK (SIZE(PROCESSORS_SHAPE()),
     &               PROCESSORS_SHAPE(),
     &               N, X, Y, A)
      PRINT *, "The dot product of X and Y is: ", A
C
      RETURN
      END

The called extrinsic HPF local routine DOTP_BLK is shown below. Again, note that the INTERFACE block for the extrinsic must contain data distribution directives that specify how the data should be distributed upon entry to the extrinsic. NOTE: If no data distribution directives are supplied, PGHPF will replicate all arguments across all processors. Note also, since the data is distributed, the called routine must determine which data it owns, and explicitly handle the communications and computations on that portion of the data.

DOTP_BLK computes the dot product of global vectors X and Y and returns the result in A on each processor. Each processor determines which portions of X and Y it owns, computes the dot product of the local portion, and then performs the necessary communication to complete the dot product on each processor.

      EXTRINSIC (HPF_LOCAL) SUBROUTINE DOTP_BLK
     &  (RANK, SHAPE, N, X, Y, A)
      INTEGER*4 RANK, SHAPE(:), N
      REAL*8 X(:), Y(:), A
C
      INCLUDE '/usr/pgi/<arch>/include/pglocal.f'
      INTEGER MAXCPUS
      PARAMETER (MAXCPUS = 2048)
      INTEGER MYCPU, NCPUS, COORD(7)
      INTEGER I, J
C BLKSZ is declared explicitly; under implicit typing it would be REAL.
      INTEGER BLKSZ, MYCT
      DOUBLE PRECISION TA(0:MAXCPUS-1)
C
C Get my processor number and number of processors.
C
      MYCPU = PGHPF_MYPROCNUM()
      NCPUS = PGHPF_NPROCS()
C
C Determine processor arrangement information.
C
      CALL PGHPF_PROCNUM_TO_COORD (MYCPU, RANK, SHAPE, COORD)
C
C Check for error conditions.
C
      IF (RANK .NE. 1) THEN
        PRINT *, "DOTP_BLK: Processor arrangement must be rank 1"
        STOP
      ENDIF
C
      IF (N .LE. 0) RETURN
C
      IF (SHAPE(1) .GT. MAXCPUS) THEN
        PRINT *, "DOTP_BLK: # CPUs must be less than:",MAXCPUS+1
        STOP
      ENDIF
C
C Determine how many elements reside on this processor
C
      BLKSZ = (N + SHAPE(1) - 1) / SHAPE(1)
      MYCT = MIN((N - MYCPU * BLKSZ), BLKSZ)
      MYCT = MAX(MYCT,0)
C
C Use TA to hold the intermediate results and do the
C local dot product
C
      TA(MYCPU) = 0.0D0
      DO I = 1, MYCT
        TA(MYCPU) = TA(MYCPU) + X(I) * Y(I)
      ENDDO
C
C Broadcast the results to all other processors: each processor
C in turn sends its partial sum while all the others receive it
C
      IF (SHAPE(1) .GT. 1) THEN
        DO I = 0, SHAPE(1) - 1
          IF (I .EQ. MYCPU) THEN
            DO J = 0, SHAPE(1) - 1
              IF (J .NE. MYCPU) THEN
                CALL PGHPF_CSEND (J,TA(MYCPU),1,1,PGLCL_REAL8)
              ENDIF
            ENDDO
          ELSE
            CALL PGHPF_CRECV (I,TA(I),1,1,PGLCL_REAL8)
          ENDIF
        ENDDO
      ENDIF
C
C Complete global sum of intermediate results
C
      A = 0.0D0
      DO I = 0, SHAPE(1) - 1
        A = A + TA(I)
      ENDDO
C
      RETURN
      END
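The BLKSZ and MYCT arithmetic in DOTP_BLK is the part most easily gotten wrong when N is not a multiple of the number of processors. The following C sketch mirrors that calculation so the edge cases can be checked in isolation; the function name local_count is hypothetical and is not a PGHPF routine.

```c
#include <assert.h>

/* Number of elements of an n-element BLOCK-distributed array that
   reside on processor mycpu (0-based) out of nprocs processors.
   Hypothetical helper mirroring BLKSZ and MYCT in DOTP_BLK. */
int local_count(int n, int nprocs, int mycpu) {
    assert(n >= 0 && nprocs > 0 && mycpu >= 0 && mycpu < nprocs);
    int blksz = (n + nprocs - 1) / nprocs;  /* block size, rounded up */
    int myct = n - mycpu * blksz;           /* elements from my first index on */
    if (myct > blksz) myct = blksz;         /* MIN(..., BLKSZ) */
    if (myct < 0) myct = 0;                 /* MAX(MYCT, 0) */
    return myct;
}
```

With N = 1024 on 4 processors every processor holds 256 elements. With N = 10 on 4 processors the block size is 3, so the last processor holds only 1 element, and with N = 5 on 4 processors the last processor holds none.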

When using EXTRINSIC(HPF_LOCAL), the extrinsic is an HPF local program unit and must be compiled using PGHPF. The .o file produced by compiling the extrinsic using PGHPF with the -c option can then be linked with the HPF calling program by including it on the pghpf link line.

For example, if dotp_blk.F is the local HPF routine, compile and link it into the main program as follows:

% pghpf -c dotp_blk.F
% pghpf dotp.hpf dotp_blk.o

Although the communications in the above example are performed using the generic PGHPF extrinsic communication routines (see section 8.8 below), you can also call MPI routines directly (MPI_SEND, MPI_RECV, etc.), provided you compile and link your PGHPF program using the -Mmpi option. In this case MPI_INIT and MPI_FINALIZE are called by the PGHPF runtime libraries at program startup and shutdown, so there is no need to call them again from within the body of the extrinsic.

An HPF_LOCAL routine may also use the HPF_LOCAL_LIBRARY procedures to query global arguments or for determining processor information. The supported HPF_LOCAL_LIBRARY routines are found in Appendix C of the PGHPF Reference Manual.

8.2 Extrinsic HPF_SERIAL

An HPF_SERIAL extrinsic is declared in a fashion similar to that shown for HPF_LOCAL procedures in the previous section. An HPF_SERIAL routine must be compiled using PGHPF and will execute on only one processor. The caller treats the extrinsic HPF_SERIAL procedure the same as an identically coded HPF procedure, although performance may differ. HPF_SERIAL extrinsics are useful for implementing portions of an application that are inherently serial and have no significant impact on the performance of the HPF program as a whole.

The routine executes on only one processor. Dummy arrays and common block arrays referenced within the called HPF_SERIAL routine are mapped to that single processor at the call site and redistributed as needed upon return from the routine.

8.3 Extrinsic F90_LOCAL

An extrinsic F90_LOCAL routine is similar to an HPF_LOCAL routine with the exception that the language of the local routine is Fortran 90. An F90_LOCAL routine must be compiled by PGHPF using the -Mf90 compiler switch. A separate F90 node compiler cannot be used, because many incompatibilities (modules, the implementation of allocatable arrays, etc.) make it impractical to link object files compiled using an F90 node compiler into a PGHPF-compiled main program.

As with HPF_LOCAL, an F90_LOCAL routine may use the underlying communication primitives or may use the generic PGHPF send and receive routines. These routines are described below in section 8.8.

8.4 Extrinsic F90_SERIAL

An F90_SERIAL extrinsic routine must be compiled using the PGHPF -Mf90 option and will execute on only one processor. The caller treats the extrinsic F90_SERIAL procedure the same as an identically coded F90 procedure, although performance may differ. F90_SERIAL extrinsics are useful for implementing portions of an application that are inherently serial and have no significant impact on the performance of the HPF program as a whole.

The routine executes on only one processor. Dummy arrays and common block arrays referenced within the called F90_SERIAL routine are mapped to that single processor at the call site and redistributed as needed upon return from the routine.

8.5 Extrinsic F77_LOCAL

An extrinsic F77_LOCAL routine is similar to an HPF_LOCAL routine with the exception that the language of the local routine is FORTRAN 77. An F77_LOCAL routine can be compiled by PGHPF using the -Mf90 compiler switch, or can be compiled using the target system's native FORTRAN 77 compiler. Alternatively, subprograms existing in pre-compiled object libraries can be called as F77_LOCAL extrinsics as long as they adhere to the calling conventions of FORTRAN 77. In particular, C routines with matching arguments can be called as F77_LOCAL extrinsics.

As with HPF_LOCAL, an F77_LOCAL routine may use the underlying communication primitives or may use the generic PGHPF send and receive routines. These routines are described below in section 8.8.

8.6 Extrinsic F77_SERIAL

An F77_SERIAL extrinsic routine can be compiled using the PGHPF -Mf90 option, or, on systems for which PGHPF operates in translator mode, can be compiled using the target system's native FORTRAN 77 compiler. The caller treats the extrinsic F77_SERIAL procedure the same as an identically coded F77 procedure. F77_SERIAL extrinsics are useful for implementing portions of an application that are inherently serial and have no significant impact on the performance of the HPF program as a whole.

The routine executes on only one processor. Dummy arrays and common block arrays referenced within the called F77_SERIAL routine are mapped to that single processor at the call site and redistributed as needed upon return from the routine.

8.7 Extrinsic HPF_CRAFT

The HPF_CRAFT extrinsic type is available only on the CRAY T3E. An HPF_CRAFT routine must be compiled using PGHPF with the -Mcraft option, and will execute on the target processors using the CRAFT execution model. Refer to the HPF_CRAFT User's Guide for more information on this extrinsic type.

8.8 PGHPF Generic Communication Routines

The generic PGHPF local communication routines are available on all systems. Assuming PGHPF is installed in the directory /usr/pgi on your system, the data types for the C interfaces to the generic local communication routines are defined in the file /usr/pgi/arch/include/pglocal.h, where arch is your system's architecture (for example, sp2, sgi, t3e, or linux86). The data types for the Fortran interfaces are defined in /usr/pgi/arch/include/pglocal.f.

Send/receive non-character data

These routines allow a local extrinsic program unit to send or receive non-character data. These routines block until the data is delivered.

Fortran interface:

integer cpu, cnt, str, typ
integer adr(*)
call pghpf_csend(cpu, adr, cnt, str, typ)
call pghpf_crecv(cpu, adr, cnt, str, typ)

C interface:

void __hpf_csend(int cpu, void *adr, int cnt, int str, int typ)
void __hpf_crecv(int cpu, void *adr, int cnt, int str, int typ)

The cpu argument is the PGHPF processor number for the remote partner, adr is the local data address, cnt is the number of data items to transfer, typ is the data type, and str is the stride between each item in the local array (in item units).

For performance reasons, data transferred by pghpf_csend and pghpf_crecv is not necessarily buffered, so a send may not complete until the matching receive has been posted. Local extrinsic routines should be coded in such a way that processors "pair off" when exchanging messages. When one processor calls pghpf_csend, the partner processor must call pghpf_crecv. A simple way to decide who sends first is to compare the processor numbers, for example:

      me = pghpf_myprocnum()
      if (partner .lt. me) then
         call pghpf_csend(partner, x, ...)
         call pghpf_crecv(partner, y, ...)
      else
         call pghpf_crecv(partner, y, ...)
         call pghpf_csend(partner, x, ...)
      end if

Send/receive Fortran character data

These routines are used to send or receive character data. These routines block until the data is delivered.

Fortran interface:

integer cpu, cnt, str
character*(*) adr(*)
call pghpf_csendchar(cpu, adr, cnt, str)
call pghpf_crecvchar(cpu, adr, cnt, str)

The cpu argument is the PGHPF processor number for the remote partner, adr is the local data address, cnt is the number of character items to transfer, and str is the stride between each item in the local character array (in item units). Each character item is a fixed-length sequence of characters.

Note that pghpf_csend and pghpf_crecv do not allow a processor to send a message to itself. The code must handle this case if it can arise in the user's algorithm. For example, the preceding example could be extended as shown here:

      me = pghpf_myprocnum()
      if (partner .eq. me) then
         y = x
      else if (partner .lt. me) then
         call pghpf_csend(partner, x, ...)
         call pghpf_crecv(partner, y, ...)
      else
         call pghpf_crecv(partner, y, ...)
         call pghpf_csend(partner, x, ...)
      end if
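The ordering rule can be isolated from the communication calls and checked on its own. The C sketch below only encodes the decision (copy locally, send first, or receive first); it does not call the PGHPF routines, and the names exchange_plan and plan_exchange are hypothetical.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum exchange_plan { COPY_LOCALLY, SEND_THEN_RECV, RECV_THEN_SEND };

/* Decide how processor `me` should exchange data with `partner`,
   following the comparison of processor numbers described above. */
enum exchange_plan plan_exchange(int me, int partner) {
    assert(me >= 0 && partner >= 0);
    if (partner == me) return COPY_LOCALLY;   /* no message to self */
    if (partner < me)  return SEND_THEN_RECV; /* higher number sends first */
    return RECV_THEN_SEND;                    /* lower number receives first */
}
```

Because the two members of a pair always get opposite plans, every blocking send is matched by a receive already posted on the other side, which is what keeps unbuffered transfers from deadlocking.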

8.9 PGHPF Generic Query Routines

Whether a local extrinsic communicates through the underlying communication support routines or through the PGHPF generic communication routines, it can use a few query routines that are common to both. This section covers these common routines.

Get number of processors

This routine returns the PGHPF runtime's notion of the number of processors for the current execution of the program.

Fortran interface:

integer pghpf_nprocs
external pghpf_nprocs
nprocs = pghpf_nprocs()

C interface:

int __hpf_nprocs()
nprocs = __hpf_nprocs()

Get my processor number

Returns the PGHPF runtime's notion of the current processor ID; this will be between 0 and pghpf_nprocs()-1.

Fortran interface:

integer pghpf_myprocnum
external pghpf_myprocnum
myprocnum = pghpf_myprocnum()

C interface:

int __hpf_myprocnum()
myprocnum = __hpf_myprocnum()

Translate PGHPF processor number to processor grid coordinates

Fortran interface:

integer procnum, rank, shape(rank), coord(rank)
call pghpf_procnum_to_coord(procnum, rank, shape, coord)

C interface:

void __hpf_procnum_to_coord
(int procnum, int rank, int *shape, int *coord)

The rank and shape arguments describe the processor grid. The PGHPF processor number given by procnum is translated to grid coordinates returned in coord. Grid coordinates are integers between 1 and the size of the corresponding grid dimension. If the processor number is outside the bounds of the processor grid, zeroes are returned in coord.

Translate processor grid coordinates to PGHPF processor number

Fortran interface:

integer procnum, rank, shape(rank), coord(rank)
integer pghpf_coord_to_procnum
external pghpf_coord_to_procnum
procnum = pghpf_coord_to_procnum(rank, shape, coord)

C interface:

int __hpf_coord_to_procnum(int rank, 
int *shape, int *coord)

The rank and shape arguments describe the processor grid. The processor grid coordinates in coord are translated to a PGHPF processor number. Grid coordinates are integers between 1 and the size of the corresponding grid dimension. If the coordinates are outside the bounds of the processor grid, -1 is returned.
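To make the two translations concrete, here is a self-contained C sketch of a mapping that satisfies the conventions stated above: 0-based processor numbers, 1-based coordinates, zeroes for an out-of-range processor number, and -1 for out-of-range coordinates. It is not the PGHPF implementation. In particular, it assumes the first grid dimension varies fastest (Fortran array element order), which the text above does not state, and the sketch_ names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the processor-number-to-coordinate query.
   ASSUMPTION: the first grid dimension varies fastest. */
void sketch_procnum_to_coord(int procnum, int rank,
                             const int *shape, int *coord) {
    assert(rank > 0);
    int total = 1;
    for (int d = 0; d < rank; d++) total *= shape[d];
    if (procnum < 0 || procnum >= total) {
        for (int d = 0; d < rank; d++) coord[d] = 0; /* out of bounds */
        return;
    }
    for (int d = 0; d < rank; d++) {
        coord[d] = procnum % shape[d] + 1;           /* 1-based coordinate */
        procnum /= shape[d];
    }
}

/* Hypothetical stand-in for the inverse query; returns -1 when any
   coordinate lies outside the grid. Same ordering assumption. */
int sketch_coord_to_procnum(int rank, const int *shape, const int *coord) {
    assert(rank > 0);
    int procnum = 0, stride = 1;
    for (int d = 0; d < rank; d++) {
        if (coord[d] < 1 || coord[d] > shape[d]) return -1;
        procnum += (coord[d] - 1) * stride;
        stride *= shape[d];
    }
    return procnum;
}
```

Under this ordering assumption, processor 3 in a 2 x 3 grid has coordinates (2, 2), and translating those coordinates back yields 3.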

8.10 PGHPF MPI Query Routines

For PGHPF programs that use MPI as the underlying communications protocol, the following additional routine is available.

Translate HPF processor number to MPI processor identifier

This routine translates the PGHPF processor number to the processor identifier used by MPI.

Fortran interface:

integer pghpf_tid
external pghpf_tid
itid = pghpf_tid(iprocnum)

C interface:

int __hpf_tid(int procnum)
tid = __hpf_tid(procnum)

