This guide was created for the current release of the GotoBLAS libraries, packaged as GotoBLAS2-1.13. It applies both to x64 processors running 64-bit Linux and to x86 processors running 32-bit Linux.
GotoBLAS2 is freely available from TACC.
From the website:
"GotoBLAS2 has been released by the Texas Advanced Computing Center as open source software under the BSD license. This product is no longer under active development by TACC, but it is being made available to the community to use, study, and extend. GotoBLAS2 uses new algorithms and memory techniques for optimal performance of the BLAS routines. The changes in this final version target new architecture features in microprocessors and interprocessor communication techniques; also, NUMA controls enhance multi-threaded execution of BLAS routines on node. The library features optimal performance on the following platforms:
Intel Nehalem and Atom systems
VIA Nanoprocessor
AMD Shanghai and Istanbul
The library includes the following features:
The original development stream—GotoBLAS—does not appear to be available for download from this website.
GotoBLAS2-1.13 source code can be downloaded from TACC.
None.
Untar the GotoBLAS2 package:
tar -xvzf GotoBLAS2-1.13.tar.gz
cd GotoBLAS2
Set the environment, build and test the code:
make CC=pgcc FC=pgfortran
The completed libraries are placed in the build directory and must be moved manually to the desired installation directory.
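The manual installation step can be wrapped in a small shell function. This is only a sketch: the function name install_goto and the library file pattern libgoto2* are assumptions, so check the file names the build actually produced before relying on it. If the build creates convenience symbolic links, cp -P copies them as links rather than duplicating the archive.

```shell
#!/bin/sh
# Hypothetical helper: copy the GotoBLAS2 libraries from the build directory
# into PREFIX/lib. Assumption: the library files are named libgoto2*.
install_goto() {
    if [ $# -ne 2 ]; then
        echo "usage: install_goto BUILD_DIR PREFIX" >&2
        return 2
    fi
    src=$1
    prefix=$2
    mkdir -p "$prefix/lib" || return 1
    for lib in "$src"/libgoto2*; do
        # If the glob matched nothing, it is left unexpanded and does not exist.
        if [ ! -e "$lib" ] && [ ! -L "$lib" ]; then
            echo "install_goto: no libgoto2* files in $src" >&2
            return 1
        fi
        # -P copies symbolic links as links instead of following them (GNU cp).
        cp -P "$lib" "$prefix/lib/" || return 1
    done
}
```

For example, after make CC=pgcc FC=pgfortran has finished, running install_goto . /opt/gotoblas2 from the directory holding the libraries would copy them into /opt/gotoblas2/lib (the prefix is only an illustration).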
GotoBLAS2 is built by default at the -O2 optimization level. Users may see slightly improved performance by editing the file "Makefile.rule" and changing COMMON_OPT to -fast:
COMMON_OPT += -fast
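The Makefile.rule edit can also be scripted. The sketch below rests on an assumption: that the -O2 default appears on an uncommented line beginning with COMMON_OPT. The function name use_fast is hypothetical. It keeps a backup copy as Makefile.rule.bak and reports failure if no matching line was changed, in which case the file should be edited by hand.

```shell
#!/bin/sh
# Hypothetical helper: switch COMMON_OPT from -O2 to -fast in Makefile.rule.
# Assumption: -O2 appears on an uncommented line that starts with COMMON_OPT.
use_fast() {
    rule=${1:-Makefile.rule}
    # Only lines beginning with COMMON_OPT are touched; a backup goes to $rule.bak.
    sed -i.bak '/^COMMON_OPT/s/-O2/-fast/' "$rule" || return 1
    # Confirm the edit took effect rather than failing silently.
    if ! grep -q '^COMMON_OPT.*-fast' "$rule"; then
        echo "use_fast: no COMMON_OPT line with -O2 in $rule; edit it by hand" >&2
        return 1
    fi
}
```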
The build process should be able to determine whether your system is 32-bit or 64-bit. If it guesses wrong, add either BINARY=64 or BINARY=32 to the make command.
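As a sketch, the BINARY setting can be derived from the machine name reported by uname -m. The function name goto_binary_flag is hypothetical, and the mapping covers only the two architectures this guide discusses. Note that uname -m describes the kernel, which can differ from the userland (for example, a 32-bit distribution running on a 64-bit kernel), so an explicit BINARY value is safer when in doubt.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (default: uname -m) to a BINARY= flag.
# uname -m reports the kernel architecture, which may differ from the userland.
goto_binary_flag() {
    case "${1:-$(uname -m)}" in
        x86_64) echo "BINARY=64" ;;
        i?86)   echo "BINARY=32" ;;
        *)      echo "goto_binary_flag: unrecognized machine; omit BINARY" >&2
                return 1 ;;
    esac
}
```

A possible invocation is make CC=pgcc FC=pgfortran $(goto_binary_flag); if the machine is not recognized, the substitution is empty and the build falls back to its own detection.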
GotoBLAS2 can also support multiple architectures in one binary by uncommenting DYNAMIC_ARCH=1 in the file "Makefile.rule".
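Uncommenting the setting can be done with sed. This sketch assumes the option appears in Makefile.rule as a commented-out line of the form "# DYNAMIC_ARCH = 1"; the function name enable_dynamic_arch is hypothetical. It keeps a backup as Makefile.rule.bak and returns success only if an active DYNAMIC_ARCH = 1 line is present afterwards.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the DYNAMIC_ARCH = 1 setting in Makefile.rule.
# Assumption: the setting appears as a commented-out line such as "# DYNAMIC_ARCH = 1".
enable_dynamic_arch() {
    rule=${1:-Makefile.rule}
    # Strip the leading "#" and any spaces after it; a backup goes to $rule.bak.
    sed -i.bak 's/^#[[:space:]]*\(DYNAMIC_ARCH[[:space:]]*=[[:space:]]*1\)/\1/' "$rule" || return 1
    # Succeed only if an active DYNAMIC_ARCH = 1 line is now present.
    grep -q '^DYNAMIC_ARCH[[:space:]]*=[[:space:]]*1' "$rule"
}
```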
All of the GotoBLAS libraries are threaded, and by default they attempt to use all of the processors available on the target computer. To control the number of threads, set either the OMP_NUM_THREADS or the GOTO_NUM_THREADS environment variable. The two variables have the same effect on GotoBLAS except for priority: when both are set, GOTO_NUM_THREADS takes precedence. This makes GOTO_NUM_THREADS particularly useful for OpenMP users who want to control the number of OpenMP threads and the number of GotoBLAS threads independently.
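As a sketch, a small wrapper can set both variables for a single run, for example giving an OpenMP program eight OpenMP threads while limiting GotoBLAS to one thread per call so that the two levels of threading do not oversubscribe the processors. The function run_with_threads and the program ./my_openmp_app are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with separate OpenMP and GotoBLAS thread counts.
# Usage: run_with_threads OMP_THREADS GOTO_THREADS command [args...]
run_with_threads() {
    if [ $# -lt 3 ]; then
        echo "usage: run_with_threads OMP_THREADS GOTO_THREADS command [args...]" >&2
        return 2
    fi
    omp=$1
    goto=$2
    shift 2
    # The assignments apply only to the environment of this one command.
    OMP_NUM_THREADS=$omp GOTO_NUM_THREADS=$goto "$@"
}
```

For example, run_with_threads 8 1 ./my_openmp_app runs the program with OMP_NUM_THREADS=8 and GOTO_NUM_THREADS=1 without changing the variables in the calling shell.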
As of the end of 2011, GotoBLAS2 had not been updated to explicitly support the new AVX-capable processors from Intel or AMD. To build on an AVX-enabled AMD Interlagos processor, specify the non-AVX AMD Barcelona target:
make CC=pgcc FC=pgfortran TARGET=BARCELONA
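Where the same build script runs on several machines, the Barcelona fallback can be selected automatically. This sketch assumes that an AMD processor reports AuthenticAMD as its vendor and lists avx among its flags in /proc/cpuinfo; the function name goto_make_args is hypothetical. It covers only the AMD case described above and otherwise leaves target detection to the build.

```shell
#!/bin/sh
# Hypothetical helper: choose make arguments for GotoBLAS2-1.13 with the PGI compilers.
# Assumption: on an AMD processor whose /proc/cpuinfo flags include "avx"
# (such as Interlagos), the non-AVX BARCELONA target is requested explicitly.
goto_make_args() {
    cpuinfo=${1:-/proc/cpuinfo}
    args="CC=pgcc FC=pgfortran"
    # grep -w matches "avx" only as a whole word, so "avx2" by itself does not match.
    if grep -q 'AuthenticAMD' "$cpuinfo" && grep -qw 'avx' "$cpuinfo"; then
        args="$args TARGET=BARCELONA"
    fi
    echo "$args"
}
```

The result is meant to be word-split on the command line, as in make $(goto_make_args), so the substitution is deliberately left unquoted.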