12/02/2009

BLAS & LAPACK - Math Kernel for Scientists

1. The Standard Interface

BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage) are standard interfaces for linear algebra routines. BLAS covers basic vector and matrix operations such as dot products and matrix-matrix multiplication; LAPACK builds on BLAS to provide higher-level routines for linear systems, least-squares problems, and eigenvalue problems.
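
To give a flavour of the interface, here is a minimal C sketch that multiplies two small matrices with the BLAS routine dgemm, called through the CBLAS bindings; any of the implementations below can provide it. The header name and link flags vary by implementation (something like gcc example.c -lcblas -lblas for the reference build), so treat the build details as an assumption.

    /* Multiplying two 2x2 matrices, C = alpha*A*B + beta*C, via CBLAS. */
    #include <stdio.h>
    #include <cblas.h>

    int main(void) {
        double A[4] = {1.0, 2.0,
                       3.0, 4.0};   /* 2x2, row-major */
        double B[4] = {5.0, 6.0,
                       7.0, 8.0};
        double C[4] = {0.0, 0.0,
                       0.0, 0.0};

        /* dgemm = double-precision GEneral Matrix-Matrix multiply */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,        /* M, N, K       */
                    1.0, A, 2,      /* alpha, A, lda */
                    B, 2,           /* B, ldb        */
                    0.0, C, 2);     /* beta, C, ldc  */

        /* expected: C = [19 22; 43 50] */
        printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }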

2. The Various Implementations

The reference BLAS [2] is the baseline implementation of the BLAS standard. It is usually slower than machine-optimised versions, but can be used when no optimised library is available.

It is available from http://www.netlib.org/blas/.
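
Because the reference BLAS is written in Fortran, C code often calls its routines through the Fortran symbols directly. A minimal sketch, assuming the common Unix name-mangling convention (trailing underscore, all arguments passed by reference; this varies by compiler and platform):

    /* Calling the Fortran reference BLAS directly from C. */
    #include <stdio.h>

    /* Fortran: DDOT(N, DX, INCX, DY, INCY) returns the dot product. */
    extern double ddot_(const int *n, const double *dx, const int *incx,
                        const double *dy, const int *incy);

    int main(void) {
        double x[3] = {1.0, 2.0, 3.0};
        double y[3] = {4.0, 5.0, 6.0};
        int n = 3, inc = 1;

        /* 1*4 + 2*5 + 3*6 = 32 */
        printf("ddot = %g\n", ddot_(&n, x, &inc, y, &inc));
        return 0;
    }

Linked with something like gcc ddot_example.c -lblas -lgfortran, this should print 32.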

The reference LAPACK [1] is the baseline implementation of the LAPACK standard. Its performance depends heavily on the underlying BLAS implementation, since LAPACK delegates its computational kernels to BLAS calls.

It is available from http://www.netlib.org/lapack/.
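
As a small illustration of what LAPACK adds on top of BLAS, the following sketch solves a 2x2 linear system with DGESV (LU factorisation with partial pivoting), again assuming the Fortran calling convention described above; note that LAPACK expects column-major storage:

    /* Solving A x = b with LAPACK's DGESV (LU with partial pivoting).
       LAPACK expects column-major storage. */
    #include <stdio.h>

    extern void dgesv_(const int *n, const int *nrhs, double *a,
                       const int *lda, int *ipiv, double *b,
                       const int *ldb, int *info);

    int main(void) {
        double A[4] = {3.0, 1.0,    /* first column of A = [3 1; 1 2] */
                       1.0, 2.0};   /* second column                  */
        double b[2] = {9.0, 8.0};   /* right-hand side, overwritten by x */
        int n = 2, nrhs = 1, ipiv[2], info;

        dgesv_(&n, &nrhs, A, &n, ipiv, b, &n, &info);

        if (info == 0)
            printf("x = (%g, %g)\n", b[0], b[1]);  /* expected (2, 3) */
        else
            printf("dgesv failed, info = %d\n", info);
        return 0;
    }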

The Intel MKL (Math Kernel Library) implements the BLAS and LAPACK functionality, among other things (such as FFTs). It is optimised for Intel CPUs.

It is available from http://software.intel.com/en-us/intel-mkl/.

The AMD ACML (AMD Core Math Library) is AMD's optimised implementation of BLAS and LAPACK, and also offers other functionality (e.g. FFTs).

It is available from http://developer.amd.com/acml.jsp.

The Goto BLAS [3][4] is a very fast BLAS library, probably the fastest on the
x86 architecture.

It is available from http://www.tacc.utexas.edu/software_modules.php.

Its main contributor is Kazushige Gotō, who is famous for writing hand-optimised assembly routines for supercomputing and PC platforms that outperform the best compiler-generated code. Some news reports about him: "Writing the Fastest Code, by Hand, for Fun: A Human Computer Keeps Speeding Up Chips", "The Human Code".

ATLAS (Automatically Tuned Linear Algebra Software) [5] provides the BLAS and a subset of LAPACK. At build time it empirically tunes its code for the machine on which it is compiled.

It is available from http://math-atlas.sourceforge.net/.

3. Extensions to Cluster Systems (Distributed Memory Parallel Computers)

The BLACS [6] (Basic Linear Algebra Communication Subprograms) form the communication layer used by ScaLAPACK. The BLACS themselves run on top of PVM (Parallel Virtual Machine) or MPI (Message Passing Interface).

The BLACS are available from http://www.netlib.org/blacs/.
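
Before any ScaLAPACK call, the BLACS are used to arrange the processes into a 2D grid. A minimal sketch, assuming the C wrappers (Cblacs_*) of a BLACS built on top of MPI, launched with mpirun on four processes:

    /* Arranging processes into a 2x2 BLACS grid (run with 4 processes). */
    #include <stdio.h>

    void Cblacs_pinfo(int *mypnum, int *nprocs);
    void Cblacs_get(int context, int request, int *value);
    void Cblacs_gridinit(int *context, const char *order,
                         int nprow, int npcol);
    void Cblacs_gridinfo(int context, int *nprow, int *npcol,
                         int *myrow, int *mycol);
    void Cblacs_gridexit(int context);
    void Cblacs_exit(int error_code);

    int main(void) {
        int iam, nprocs, ctxt;
        int nprow = 2, npcol = 2, myrow, mycol;

        Cblacs_pinfo(&iam, &nprocs);    /* my rank and process count     */
        Cblacs_get(0, 0, &ctxt);        /* obtain default system context */
        Cblacs_gridinit(&ctxt, "Row", nprow, npcol);  /* 2x2, row-major  */
        Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

        printf("process %d of %d is at grid position (%d,%d)\n",
               iam, nprocs, myrow, mycol);

        Cblacs_gridexit(ctxt);
        Cblacs_exit(0);
        return 0;
    }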

ScaLAPACK [7] is a library for linear algebra on distributed memory architectures. It provides distributed-memory analogues of many BLAS and LAPACK routines, spreading matrices across the aggregate memory of the machine in a two-dimensional block-cyclic layout while keeping the calling interface close to the standard BLAS and LAPACK routines.

ScaLAPACK is available from http://www.netlib.org/scalapack/.
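
To give a feel for how matrices are distributed, the following sketch computes how much of a global matrix lands on each process and fills in the ScaLAPACK array descriptor. The grid variables are assumed to come from the BLACS example above; the matrix and block sizes are made-up example values:

    /* Local sizes and array descriptor for a block-cyclically
       distributed m x n matrix; ctxt, myrow, mycol, nprow, npcol are
       assumed to come from a BLACS grid as set up above. */
    extern int numroc_(const int *n, const int *nb, const int *iproc,
                       const int *isrcproc, const int *nprocs);
    extern void descinit_(int *desc, const int *m, const int *n,
                          const int *mb, const int *nb,
                          const int *irsrc, const int *icsrc,
                          const int *ictxt, const int *lld, int *info);

    void make_descriptor(int ctxt, int myrow, int mycol,
                         int nprow, int npcol) {
        int m = 1000, n = 1000;   /* global matrix size (example values)  */
        int mb = 64, nb = 64;     /* blocking factors of the distribution */
        int izero = 0, info;
        int desc[9];

        /* numroc = "number of rows or columns" owned by this process */
        int mloc = numroc_(&m, &mb, &myrow, &izero, &nprow);
        int nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
        int lld = (mloc > 1) ? mloc : 1;   /* local leading dimension */

        descinit_(desc, &m, &n, &mb, &nb, &izero, &izero,
                  &ctxt, &lld, &info);

        /* a buffer of mloc * nloc doubles now holds this process's
           share; desc accompanies it into the P-routines (e.g. pdgesv_) */
    }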

PLAPACK [8] is also a library for linear algebra on distributed memory architectures. Unlike ScaLAPACK, it adopts an object-based coding style, similar in spirit to the one popularized by the Message Passing Interface (MPI), with the aim of simplifying the coding of parallel linear algebra algorithms compared to more traditional approaches; see [9] for a comparison.

PLAPACK is available from http://www.cs.utexas.edu/~plapack/.

[References]

[1] LAPACK Users' Guide
[2] Basic Linear Algebra Subprograms for Fortran Usage
[3] Anatomy of High-Performance Matrix Multiplication
[4] High-Performance Implementation of the Level-3 BLAS
[5] Automated Empirical Optimization of Software and the ATLAS Project
[6] A User's Guide to the BLACS
[7] ScaLAPACK: A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers
[8] PLAPACK: Parallel Linear Algebra Libraries Design Overview
[9] PLAPACK vs ScaLAPACK
