Guide to Updating ACML

This guide is intended to help PGI customers download and replace the AMD Core Math Library in their installation for use with the PGI compilers.


Version Information

This page is current with libacml version 2.5.0. It describes how to update libacml for users who wish to obtain this newer version.

Application Notes

Information about libacml can be found at AMD Core Math Library 2.5.

From the ACML web page:

"The AMD Core Math Library (ACML) incorporates BLAS, LAPACK and FFT routines, which are designed to be used by a wide range of software developers to obtain excellent performance from their applications running on AMD platforms. The highly optimized library contains numeric functions for mathematical, engineering, scientific and financial applications. ACML is available both as a 32-bit library, for compatibility with legacy x86 applications, and as a 64-bit library that is designed to fully exploit the large memory space and improved performance offered by the new AMD64 architecture. ACML is supported on both Linux and Microsoft® Windows® operating systems."

Obtaining the Library

Various types of ACML libs are available at the ACML download page.

Configuration and Set-up Information

Download one or more of the Linux *.tgz packages or the Windows package (acml-2-5-0-win32.exe) into a working directory. On Linux, gunzip and untar each *.tgz file and then execute the install file; on Windows, execute acml-2-5-0-win32.exe. Either one asks for license acceptance and then installs the libraries.
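The Linux unpacking step can be sketched as a small shell function. This is a minimal sketch, not part of the ACML package: the function name unpack_acml and both directory arguments are hypothetical, and the package file names are left to a *.tgz glob because they depend on what you downloaded.

```shell
# Unpack every ACML *.tgz package found in a download directory.
# unpack_acml and its two directory arguments are hypothetical names.
unpack_acml() {
    download_dir=$1   # where the *.tgz packages were saved
    unpack_dir=$2     # scratch area to extract into
    mkdir -p "$unpack_dir" || return 1
    for pkg in "$download_dir"/*.tgz; do
        [ -f "$pkg" ] || continue     # skip if the glob matched nothing
        # gunzip and untar in one pipeline
        gzip -dc "$pkg" | (cd "$unpack_dir" && tar -xf -) || return 1
    done
}

# Example: unpack_acml "$HOME/downloads" "$HOME/acml-unpacked"
```

After unpacking, execute the install file from the unpacked area to accept the license and install the libraries.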

Libraries are provided for several 32-bit CPU types (with and without SSE instructions) and for 64-bit CPUs, and both the 32-bit and 64-bit sets also include an OpenMP version of the library. For 64-bit Opteron/Athlon64 configurations, a highly tuned assembly-level version of libacml, called libacml_mv, is also provided (not recommended for Xeon EM64T). Choose the version that matches your target CPU to get the best performance. For 32-bit applications, code and libraries compiled for machines without SSE1 or SSE2 instructions also run on upward-compatible machines that have some or all of the SSE instructions. All 64-bit CPU types have SSE1 and SSE2 instructions. The libraries that use the SSE1 and SSE2 instructions typically perform better.
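As a rough illustration of the selection rule above, the following sketch maps an address size and a list of CPU feature flags to the ACML directory names used in this guide. The function name acml_variant is hypothetical, and reading the flags from /proc/cpuinfo is an assumption that holds only on Linux.

```shell
# Map CPU capabilities to the ACML directory names listed in this guide.
# $1 = address size (32 or 64), $2 = space-separated CPU feature flags.
# acml_variant is a hypothetical helper, not part of ACML or PGI.
acml_variant() {
    addr_size=$1
    flags=" $2 "                     # pad so whole words can be matched
    if [ "$addr_size" = 64 ]; then
        echo pgi64                   # all 64-bit CPUs have SSE1 and SSE2
        return
    fi
    case $flags in
        *" sse2 "*) echo pgi32 ;;          # SSE1 and SSE2
        *" sse "*)  echo pgi32_noSSE2 ;;   # SSE1 only
        *)          echo pgi32_nosse ;;    # neither
    esac
}

# On Linux the flags of the first CPU can be read like this:
# flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
# acml_variant 32 "$flags"
```

The result names only the directory; on k8-64 targets you may still prefer to link acml_mv from that directory instead of acml.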

Building

The following syntax and example show a simple compile and link of a Fortran program with PGF90™, and the table lists the values to use for each configuration. Assume the ACML libraries are installed in /opt/acml2.5.0.

Using ACML:
Syntax:
pgf90 foo.f -o foo -Mcache_align -tp cpu_type -Iinc_dir -Llib_dir -llib_name
Example:
pgf90 foo.f -o foo -Mcache_align -tp k8-64 -I/opt/acml2.5.0/pgi64/include -L/opt/acml2.5.0/pgi64/lib -lacml

Addr Size  cpu_type          SSE1/SSE2  inc_dir                              lib_dir                          lib_name
32         px, p6, athlon    no/no      /opt/acml2.5.0/pgi32_nosse/include   /opt/acml2.5.0/pgi32_nosse/lib   acml
32         athlonxp or piii  yes/no     /opt/acml2.5.0/pgi32_noSSE2/include  /opt/acml2.5.0/pgi32_noSSE2/lib  acml
32         p7, k8-32         yes/yes    /opt/acml2.5.0/pgi32/include         /opt/acml2.5.0/pgi32/lib         acml
64         k8-64 or p7-64    yes/yes    /opt/acml2.5.0/pgi64/include         /opt/acml2.5.0/pgi64/lib         acml
64         k8-64 only        yes/yes    /opt/acml2.5.0/pgi64/include         /opt/acml2.5.0/pgi64/lib         acml_mv

To simplify things when you intend to compile for only one configuration (one CPU type and CPU count), you can copy that configuration's library from the ACML installation directory into the lib or libso directory of the PGI compiler installation (e.g. /usr/pgi/linux86/5.2/lib) and copy the acml.h header file into the PGI include directory (e.g. /usr/pgi/linux86/5.2/include). This eliminates the need for the -Iinc_dir and -Llib_dir options above.
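The copy described above might look like the following sketch. The function name install_into_pgi is hypothetical, the libacml.* glob is an assumption about how the library files are named (it matches libacml.a or libacml.so but not libacml_mv), and the sketch copies into lib only, not libso.

```shell
# Copy one ACML configuration into the PGI compiler tree so that
# -Iinc_dir and -Llib_dir are no longer needed on the command line.
# install_into_pgi is a hypothetical helper name.
install_into_pgi() {
    acml_dir=$1   # e.g. /opt/acml2.5.0/pgi64
    pgi_dir=$2    # e.g. /usr/pgi/linux86/5.2
    # libacml.* picks up whichever static or shared files are present
    cp "$acml_dir"/lib/libacml.* "$pgi_dir/lib/" || return 1
    cp "$acml_dir/include/acml.h" "$pgi_dir/include/" || return 1
}

# Example (usually needs write permission to the PGI tree):
# install_into_pgi /opt/acml2.5.0/pgi64 /usr/pgi/linux86/5.2
```

After the copy, the earlier example shortens to pgf90 foo.f -o foo -Mcache_align -tp k8-64 -lacml.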

For platforms with multi-core CPUs or multiple CPUs, multi-threaded libraries supporting OpenMP interfaces are provided. Link these mp versions of libacml and libacml_mv to take advantage of multi-CPU configurations. These versions create their own multi-threaded parallel regions, which honor the OpenMP environment variables. Note: If you call an ACML routine from inside an OpenMP parallel region that you have created, you should NOT use the multi-threaded ACML routines, because the threads are created by your parallel region and not by the ACML routine.

Using ACML with OpenMP:
Syntax:
pgf90 foo.f -o foo -Mcache_align -mp -tp cpu_type -Iinc_dir -Llib_dir -llib_name

Addr Size  cpu_type          SSE1/SSE2  inc_dir                              lib_dir                          lib_name
32         px, p6, athlon    no/no      /opt/acml2.5.0/pgi32_nosse/include   /opt/acml2.5.0/pgi32_nosse/lib   acml
32         athlonxp or piii  yes/no     /opt/acml2.5.0/pgi32_noSSE2/include  /opt/acml2.5.0/pgi32_noSSE2/lib  acml
32         p7, k8-32         yes/yes    /opt/acml2.5.0/pgi32_mp/include      /opt/acml2.5.0/pgi32_mp/lib      acml
64         k8-64 or p7-64    yes/yes    /opt/acml2.5.0/pgi64_mp/include      /opt/acml2.5.0/pgi64_mp/lib      acml
64         k8-64 only        yes/yes    /opt/acml2.5.0/pgi64_mp/include      /opt/acml2.5.0/pgi64_mp/lib      acml_mv
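To make the OpenMP syntax concrete, the sketch below fills it in from a row of the table and prints the resulting command instead of running it. The helper name mp_compile_cmd is hypothetical; the install area /opt/acml2.5.0 is the one assumed throughout this guide.

```shell
# Print (do not run) the pgf90 command line for an OpenMP build.
# mp_compile_cmd is a hypothetical helper name.
acml_root=/opt/acml2.5.0     # install area assumed in this guide

mp_compile_cmd() {
    src=$1        # Fortran source file, e.g. foo.f
    cpu_type=$2   # value for -tp, e.g. k8-64
    variant=$3    # directory under acml_root, e.g. pgi64_mp
    lib_name=$4   # acml or acml_mv
    echo "pgf90 $src -o ${src%.*} -Mcache_align -mp -tp $cpu_type" \
         "-I$acml_root/$variant/include -L$acml_root/$variant/lib -l$lib_name"
}

mp_compile_cmd foo.f k8-64 pgi64_mp acml
```

Since the mp libraries honor the OpenMP environment variables, the thread count of the resulting program can presumably be set at run time with the standard OMP_NUM_THREADS variable, for example OMP_NUM_THREADS=2 ./foo.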

As with the single-threaded libraries, if you intend to compile for only one configuration, you can copy the chosen mp library into the lib or libso directory of the PGI compiler installation (e.g. /usr/pgi/linux86/5.2/lib) and the acml.h header file into the PGI include directory (e.g. /usr/pgi/linux86/5.2/include). This eliminates the need for the -Iinc_dir and -Llib_dir options above.

Verifying Correctness

Each installed configuration directory (pgi32, pgi64, etc.) contains an examples directory with tests that check the correctness of the installation. Edit the GNUmakefile in that directory so that the ACMLDIR variable points to where the installation resides. To execute the examples, run:

gmake clean
gmake

Known Issues and Limitations

The following is a list of known issues when using the PGI compilers with libacml.
