Autotuners have been gaining popularity as an effective approach to producing high-quality portable code. The goal of autotuning systems is to automate the task of hand-tuning code for a given machine. In practice, several autotuning systems have closely approached or even exceeded the performance of hand-tuned code.
Roughly speaking, autotuners optimize the performance of a set of library kernels by generating many variants of a given kernel and benchmarking each variant by running on the target platform. The search process effectively tries many or all optimizations and hence may take hours to complete on the target platform. However, the search often needs to be performed only once, when the library is installed. The resulting code is often several times faster than naive implementations, and a single autotuner can be used to generate high-quality code for a wide variety of machines.
To date, most of the work in this area has focused on autotuning computational kernels from applications in scientific computing, such as linear algebra and signal processing. The following are examples.
Dense Linear Algebra
ATLAS is an autotuning system for the Basic Linear Algebra Subprograms (BLAS), the standard library for basic dense linear algebra kernels like dot products, matrix multiplication, and triangular solve. ATLAS is widely used and is faster than vendor-supplied BLAS on many platforms.
The first autotuning system for dense matrix multiply was PHiPAC. ATLAS uses many additional optimization techniques beyond those included in the original PHiPAC implementation, and also provides a complete implementation for the entire BLAS.
The Formal Linear Algebra Method Environment (FLAME) project focuses on code-generation for dense linear algebra algorithms. Like the SPIRAL project, FLAME derives implementations from high-level specifications.
Sparse Linear Algebra
OSKI: Optimized Sparse Kernel Interface
OSKI (project webpage) is an autotuning system for sparse linear algebra kernels, including sparse matrix-vector multiply (SpMV) and sparse triangular solve, among others. The main challenge in tuning these kernels is the need for matrix-dependent tuning at run-time, instead of only off-line tuning as has been used with ATLAS and PHiPAC. OSKI tunes by combining off-line benchmarking with efficient modeling and matrix analysis at run-time. The basic tuning technique was originally proposed in the SPARSITY system for SpMV; OSKI extends SPARSITY with new tuning techniques and new kernels.
Spectral Methods / Signal Processing
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data. Pre-optimzed wisdom is one of the features of FFTW.