Sparse Linear Algebra
From View
←Older revision | Newer revision→
Contents |
Description
Sparse matrix algorithms are used when input matrices have such a large number of zero entries that it becomes advantageous, for storage or efficiency reasons, to “squeeze” them out of the matrix representation. Compressed data structures, keeping only the non-zero entries and their indices, are the norm here. As an example, consider sparse matrix-vector multiplication. In the standard compressed row format, the code is
do i=1,n
do j=row_start(i),row_start(i+1)-1
y(i) = y(i) + val(j)*x(col_index(j))
enddo
enddo
Uniprocessor Mapping
The codes are quite complex with graph algorithm components that attempt to exploit the structure of the non-zeros. In addition there are a high number of integer operations and indexed accesses. Modern algorithms on cache-based machines also attempt to take advantage of dense blocks so that the high-performing Level 3 operations can be used.
Parallel Mapping
The dependence structure of many algorithms is extremely intricate because of the desire to only compute on non-zero entries.
Elaboration
Variations:
o Row Compressed Format (CSR): Compressed Sparse Row
o Block-Compressed Sparse Row (BCSR)
Examples:
o SpMV
o NAS-CG: (http://www.nas.nasa.gov/Software/NPB/)
o OSKI (bebop.cs.berkeley.edu/oski/)
o SuperLU (crd.lbl.gov/~xiaoye/SuperLU/) (http://crd.lbl.gov/~oliker/papers/siamreview02.pdf)
AutoTuning:
o OSKI (bebop.cs.berkeley.edu/oski/)
Specialized Languages:
o Possibly MatLab? (not obvious)
Specialized Hardware:
o Scatter-gather vector architectures are a good example of hardware that works well for sparse linear algebra.
How Current Computer Architectures Fail:
o Block Sizes: The BeBoP group does automatic searches for the optimal blocking arrangement for BCSR matrix encoding. (as shown by Jim Demmel’s diagrams in Dave’s slides).
o Stream Prefetch and Stanzas: For the CSR (row-compressed) sparse matrix encoding, it typically takes a comparatively long stream of consecutive stride-one memory accesses before the stream prefetch fully engages (as measured by the STRIAD benchmark in http://crd.lbl.gov/~oliker/papers/msp_2005.pdf). The result is poor efficiency for the CSR encoding unless explicit prefetches are inserted.
