This guide is intended to help PGI customers download and replace the AMD Core Math Library in their installation for use with the PGI compilers.
The version of libacml current with this page is 2.5.0. This guide describes how to update libacml for users who wish to obtain the newer version.
Information about libacml can be found at AMD Core Math Library 2.5.
From the ACML web page:
"The AMD Core Math Library (ACML) incorporates BLAS, LAPACK and FFT routines, which are designed to be used by a wide range of software developers to obtain excellent performance from their applications running on AMD platforms. The highly optimized library contains numeric functions for mathematical, engineering, scientific and financial applications. ACML is available both as a 32-bit library, for compatibility with legacy x86 applications, and as a 64-bit library that is designed to fully exploit the large memory space and improved performance offered by the new AMD64 architecture. ACML is supported on both Linux and Microsoft® Windows® operating systems."
Various types of ACML libs are available at the ACML download page.
Download one or more of the Linux *.tgz packages, or the Windows package (acml-2-5-0-win32.exe), into a working directory. Each *.tgz file must be gunzipped and untarred. Then run the install script (Linux) or acml-2-5-0-win32.exe (Windows) to accept the license and install the libraries.
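The Linux unpacking step can be sketched as below. This is a minimal illustration, not part of the ACML distribution: `unpack_acml` is a hypothetical helper, the package file name in the usage comment is a placeholder, and the exact name of the install script should be checked in the unpacked listing.

```shell
# Sketch: unpack a downloaded ACML Linux package into a scratch directory.
# unpack_acml is an illustrative helper name, not an ACML or PGI tool.
unpack_acml() {
    # $1 = path to the downloaded .tgz, $2 = scratch directory to unpack into
    mkdir -p "$2" || return 1
    # -z gunzips and -x extracts in a single step
    tar -xzf "$1" -C "$2" || return 1
    # list what was unpacked so the install script can be located
    ls "$2"
}

# Usage (hypothetical file name):
#   unpack_acml ./acml-linux-package.tgz /tmp/acml-unpack
#   cd /tmp/acml-unpack
#   ./install    # the script this guide calls "install"; accept the license when prompted
```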
The 32-bit libraries come in several variants for CPU types with and without SSE instructions, and both the 32-bit and 64-bit libraries are also provided in an OpenMP version. For 64-bit Opteron/Athlon64 configurations (not recommended for Xeon EM64T), a highly tuned assembly-level version of libacml, called libacml_mv, is also provided. Choose the version that matches your target so that you get the best possible performance.
For 32-bit applications, code and libraries compiled for machines without SSE1 or SSE2 instructions will also run on upward-compatible machines that have some or all of the SSE instructions. All 64-bit CPU types have SSE1 and SSE2 instructions. The libraries that use SSE1 and SSE2 instructions typically perform better.
The syntax, example, and table below illustrate a simple compile and link of a Fortran program with PGF90. They assume the ACML libraries are installed in /opt/acml2.5.0.
Using ACML:
Syntax:
pgf90 foo.f -o foo -Mcache_align -tp cpu_type -Iinc_dir -Llib_dir -llib_name
Example:
pgf90 foo.f -o foo -Mcache_align -tp k8-64 -I/opt/acml2.5.0/pgi64/include -L/opt/acml2.5.0/pgi64/lib -lacml
| Addr Size | cpu_type | SSE1/SSE2 | inc_dir | lib_dir | lib_name |
|---|---|---|---|---|---|
| 32 | px, p6, athlon | no/no | /opt/acml2.5.0/pgi32_nosse/include | /opt/acml2.5.0/pgi32_nosse/lib | acml |
| 32 | athlonxp or piii | yes/no | /opt/acml2.5.0/pgi32_noSSE2/include | /opt/acml2.5.0/pgi32_noSSE2/lib | acml |
| 32 | p7, k8-32 | yes/yes | /opt/acml2.5.0/pgi32/include | /opt/acml2.5.0/pgi32/lib | acml |
| 64 | k8-64 or p7-64 | yes/yes | /opt/acml2.5.0/pgi64/include | /opt/acml2.5.0/pgi64/lib | acml |
| 64 | k8-64 only | yes/yes | /opt/acml2.5.0/pgi64/include | /opt/acml2.5.0/pgi64/lib | acml_mv |
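The mapping in the table can be captured in a small shell helper that prints the matching pgf90 command line. This is only a sketch: `acml_subdir`, `acml_compile_cmd`, and the `ACML_ROOT` variable are hypothetical names introduced here, and the directory names are taken directly from the table above.

```shell
# Sketch: print the pgf90 command for a given -tp target, following the table.
# ACML_ROOT, acml_subdir and acml_compile_cmd are illustrative names only.
ACML_ROOT="${ACML_ROOT:-/opt/acml2.5.0}"

acml_subdir() {
    # $1 = cpu_type as passed to -tp
    case "$1" in
        px|p6|athlon)   echo "pgi32_nosse" ;;
        athlonxp|piii)  echo "pgi32_noSSE2" ;;
        p7|k8-32)       echo "pgi32" ;;
        k8-64|p7-64)    echo "pgi64" ;;
        *) echo "unknown cpu_type: $1" >&2; return 1 ;;
    esac
}

acml_compile_cmd() {
    # $1 = source file, $2 = cpu_type, $3 = library name (defaults to acml)
    sub=$(acml_subdir "$2") || return 1
    echo "pgf90 $1 -o ${1%.*} -Mcache_align -tp $2 -I$ACML_ROOT/$sub/include -L$ACML_ROOT/$sub/lib -l${3:-acml}"
}

# Prints the same command as the example above
acml_compile_cmd foo.f k8-64
```

Note that the helper does not enforce the restriction that acml_mv is for k8-64 only; passing `acml_mv` as the third argument with any other target is left to the caller to avoid.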
To make things simpler, if you intend to compile for only one configuration of CPU type and CPU count, you can copy the specific library from the installation directory into the lib or libso area of the PGI compiler installation (e.g. /usr/pgi/linux86/5.2/lib). The acml.h header file can be placed in the PGI include area (e.g. /usr/pgi/linux86/5.2/include). This eliminates the need for -Iinc_dir -Llib_dir above.
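As a rough sketch of that copy step, the helper below copies one library file and acml.h into a PGI release directory. `copy_acml_to_pgi` is a hypothetical name, and the default library file name `libacml.a` is an assumption; check the lib directory of your ACML configuration for the actual file names, and use the libso area instead for a shared library.

```shell
# Sketch: copy one ACML library and its header into the PGI compiler tree.
# copy_acml_to_pgi is an illustrative helper, not a PGI or ACML tool.
copy_acml_to_pgi() {
    # $1 = ACML configuration directory, e.g. /opt/acml2.5.0/pgi64
    # $2 = PGI release directory, e.g. /usr/pgi/linux86/5.2
    # $3 = library file name (libacml.a is an assumed default)
    lib="${3:-libacml.a}"
    cp "$1/lib/$lib" "$2/lib/" || return 1
    cp "$1/include/acml.h" "$2/include/" || return 1
    echo "copied $lib and acml.h into $2"
}

# Usage (typically needs write permission on the PGI tree):
#   copy_acml_to_pgi /opt/acml2.5.0/pgi64 /usr/pgi/linux86/5.2
#   pgf90 foo.f -o foo -Mcache_align -tp k8-64 -lacml
```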
For platforms with multi-core CPUs or multiple CPUs, multi-threaded libraries supporting OpenMP interfaces are provided. Link against these mp versions of libacml and libacml_mv to take advantage of multi-CPU configurations. They create their own multi-threaded parallel regions, which honor the OpenMP environment variables. Note: If you call an ACML routine from inside an OpenMP parallel region that you have created yourself, you should NOT use the multi-threaded ACML routines, because in that case your parallel region creates the threads, not the ACML routine.
Using ACML with OpenMP:
Syntax:
pgf90 foo.f -o foo -Mcache_align -mp -tp cpu_type -Iinc_dir -Llib_dir -llib_name
| Addr Size | cpu_type | SSE1/SSE2 | inc_dir | lib_dir | lib_name |
|---|---|---|---|---|---|
| 32 | px, p6, athlon | no/no | /opt/acml2.5.0/pgi32_nosse/include | /opt/acml2.5.0/pgi32_nosse/lib | acml |
| 32 | athlonxp or piii | yes/no | /opt/acml2.5.0/pgi32_noSSE2/include | /opt/acml2.5.0/pgi32_noSSE2/lib | acml |
| 32 | p7, k8-32 | yes/yes | /opt/acml2.5.0/pgi32_mp/include | /opt/acml2.5.0/pgi32_mp/lib | acml |
| 64 | k8-64 or p7-64 | yes/yes | /opt/acml2.5.0/pgi64_mp/include | /opt/acml2.5.0/pgi64_mp/lib | acml |
| 64 | k8-64 only | yes/yes | /opt/acml2.5.0/pgi64_mp/include | /opt/acml2.5.0/pgi64_mp/lib | acml_mv |
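For the 64-bit OpenMP case, the command line and a run with an explicit thread count might look like the sketch below. `acml_mp_compile_cmd` and `ACML_ROOT` are hypothetical names, the target is fixed to k8-64 for brevity, and `OMP_NUM_THREADS` is the standard OpenMP environment variable for the thread count; the value 4 is only an example.

```shell
# Sketch: print the pgf90 command for the multi-threaded 64-bit ACML (k8-64 target).
# ACML_ROOT and acml_mp_compile_cmd are illustrative names only.
ACML_ROOT="${ACML_ROOT:-/opt/acml2.5.0}"

acml_mp_compile_cmd() {
    # $1 = source file, $2 = library name: acml or acml_mv (defaults to acml)
    echo "pgf90 $1 -o ${1%.*} -Mcache_align -mp -tp k8-64 -I$ACML_ROOT/pgi64_mp/include -L$ACML_ROOT/pgi64_mp/lib -l${2:-acml}"
}

acml_mp_compile_cmd foo.f

# To build and run (requires pgf90 and the ACML installation):
#   eval "$(acml_mp_compile_cmd foo.f)"
#   OMP_NUM_THREADS=4 ./foo
```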
Each of the directories pgi32, pgi64, etc. contains an examples directory with tests that check the correctness of the installation. Edit the GNUMakefile so that the ACMLDIR variable points to where the installation resides. To execute the examples, run:
gmake clean
gmake
The following is a list of known issues when using the PGI compiler with libacml.