PGI Guide to GotoBLAS2

This guide is intended to help build and test the GotoBLAS2 libraries using the PGI 2011 compilers.

Version Information

This guide was created for the current release of the GotoBLAS libraries, packaged as GotoBLAS2-1.13. The information applies to both x64 processors running 64-bit Linux and x86 processors running 32-bit Linux.

Application Notes

GotoBLAS2 is freely available from

From the website:

"GotoBLAS2 has been released by the Texas Advanced Computing Center as open source software under the BSD license. This product is no longer under active development by TACC, but it is being made available to the community to use, study, and extend. GotoBLAS2 uses new algorithms and memory techniques for optimal performance of the BLAS routines. The changes in this final version target new architecture features in microprocessors and interprocessor communication techniques; also, NUMA controls enhance multi-threaded execution of BLAS routines on node. The library features optimal performance on the following platforms:

Intel Nehalem and Atom systems
VIA Nanoprocessor
AMD Shanghai and Istanbul

The library includes the following features:

  • Configurations for a variety of hardware platforms
  • Incorporation of features of many ISAs (Instruction Set Architecture)
  • Implementation of NUMA controls to assure best process affinity and memory policy
  • Dynamic detection of multiple architecture components, which can be included in a single binary (for binary distributions)"

The original development stream—GotoBLAS—does not appear to be available for download from this website.

Obtaining the Source Code

GotoBLAS2-1.13 source code can be downloaded directly from 



Building and Testing GotoBLAS2

  1. Untar the GotoBLAS2 package:

      tar -xvzf GotoBLAS2-1.13.tar.gz
      cd GotoBLAS2
  2. Set the environment, build and test the code:

      make CC=pgcc FC=pgfortran


The completed libraries will be in the build directory and will need to be manually moved to the desired installation directory.
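
The manual move described above can be sketched as a small shell helper. This is an illustration only: the function name install_gotoblas is hypothetical, and the libgoto2* file pattern is an assumption about how the built libraries are named, so check the contents of your build directory before relying on it.

```shell
# Hypothetical helper: copy built GotoBLAS2 libraries into an install directory.
# Assumes the library files in the build directory match the pattern libgoto2*.
install_gotoblas() {
    build_dir="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    found=0
    for lib in "$build_dir"/libgoto2*; do
        [ -e "$lib" ] || continue
        cp -P "$lib" "$dest_dir"/ || return 1   # -P copies symbolic links as links
        found=1
    done
    # Fail if nothing matched, so an empty install does not go unnoticed.
    [ "$found" -eq 1 ]
}

# Example call (both paths are assumptions):
# install_gotoblas "$HOME/GotoBLAS2" "$HOME/lib/gotoblas2"
```

The helper returns a non-zero status when no matching file is found, which usually means the build did not complete or the naming assumption does not hold.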

Known Issues and Limitations

GotoBLAS2 is built by default at the -O2 optimization level. Users may see slightly improved performance by editing the file "Makefile.rule" and changing COMMON_OPT to -fast:

   COMMON_OPT += -fast

The build process should be able to determine if your system is 32 bits or 64 bits. Should the build process get confused, you can add either BINARY=64 or BINARY=32 to the make command.
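
As a rough sketch of how the BINARY setting might be chosen by hand, the snippet below maps the machine name reported by uname -m to 64 or 32 and prints the resulting make command instead of running it. The function name binary_for_machine and the list of machine names are assumptions made for illustration.

```shell
# Sketch: choose a BINARY value from a machine name as reported by `uname -m`.
# The machine names handled here are an assumption, not an exhaustive list.
binary_for_machine() {
    case "$1" in
        x86_64|amd64) echo 64 ;;
        i?86)         echo 32 ;;
        *)            return 1 ;;
    esac
}

# Print the make command for review rather than running it directly.
if bits=$(binary_for_machine "$(uname -m)"); then
    echo "make CC=pgcc FC=pgfortran BINARY=$bits"
fi
```

On a machine type the sketch does not recognize, nothing is printed and BINARY has to be chosen manually.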

GotoBLAS2 can also support multiple architectures in one binary by uncommenting DYNAMIC_ARCH=1 in the file "Makefile.rule".
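
Uncommenting the setting can be done in any text editor. For scripted builds, one possible approach is sketched below. It assumes the setting appears in "Makefile.rule" as a commented line of the form "# DYNAMIC_ARCH = 1", and the helper name enable_setting is hypothetical.

```shell
# Sketch: uncomment a "NAME = value" setting in a makefile fragment.
# Assumes the setting is present as a line such as "# DYNAMIC_ARCH = 1".
enable_setting() {
    file="$1"
    name="$2"
    tmp="$file.tmp.$$"
    # Strip the leading "#" and any whitespace from the matching line only.
    sed "s/^#[[:space:]]*\($name[[:space:]]*=.*\)$/\1/" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Example, run from the GotoBLAS2 source directory:
# enable_setting Makefile.rule DYNAMIC_ARCH
```

Other commented settings in the file are left unchanged, since the pattern matches only the named variable.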

All of the GotoBLAS libraries are threaded, and by default they attempt to use all of the processors available on the target computer. To control the number of threads, set the OMP_NUM_THREADS or GOTO_NUM_THREADS environment variable. The two variables behave identically except with respect to priority. GOTO_NUM_THREADS is particularly useful for OpenMP users who want to control both the number of OpenMP tasks and the number of GotoBLAS threads.
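
A minimal sketch of the OpenMP case follows. The thread counts and the application name my_app are assumptions chosen for illustration. This guide does not state which variable takes priority, so the sketch sets both explicitly.

```shell
# Assumed scenario: an OpenMP application with 4 OpenMP threads,
# where each BLAS call should use a single GotoBLAS thread.
export OMP_NUM_THREADS=4
export GOTO_NUM_THREADS=1

# ./my_app   # hypothetical application linked against GotoBLAS2

echo "OpenMP threads: $OMP_NUM_THREADS, GotoBLAS threads: $GOTO_NUM_THREADS"
```

Setting only one of the two variables is also possible when the application itself does not use OpenMP.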

As of the end of 2011, GotoBLAS2 had not yet been updated to explicitly support the new AVX-capable processors from Intel or AMD. To build on an AMD AVX-enabled Interlagos processor, specify the non-AVX AMD Barcelona target:

  make CC=pgcc FC=pgfortran TARGET=BARCELONA
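
The choice of target can be sketched as a check on the CPU description that Linux exposes in /proc/cpuinfo. The helper name suggest_make is hypothetical, and matching on the "AuthenticAMD" vendor string and the "avx" flag is an assumption about how an Interlagos system reports itself. This guide names no target for Intel AVX processors, so the sketch adds TARGET only in the AMD case.

```shell
# Sketch: suggest a make command from a cpuinfo-style file.
# Only the AMD AVX case (TARGET=BARCELONA) comes from this guide.
suggest_make() {
    cpuinfo="${1:-/proc/cpuinfo}"
    cmd="make CC=pgcc FC=pgfortran"
    if grep -q 'AuthenticAMD' "$cpuinfo" &&
       grep '^flags' "$cpuinfo" | grep -qw 'avx'; then
        cmd="$cmd TARGET=BARCELONA"
    fi
    # Print the command for review instead of running it.
    echo "$cmd"
}

# suggest_make    # with no argument, reads /proc/cpuinfo
```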