Sparse Linear Algebra

From View

Jump to: navigation, search
Sparse linear algebra matrix diagram.
Sparse linear algebra matrix diagram.



Sparse matrix algorithms are used when input matrices have such a large number of zero entries that it becomes advantageous, for storage or efficiency reasons, to “squeeze” them out of the matrix representation. Compressed data structures, keeping only the non-zero entries and their indices, are the norm here. As an example, consider sparse matrix-vector multiplication. In the standard compressed row format, the code is

do i=1,n
  do j=row_start(i),row_start(i+1)-1
    y(i) = y(i) + val(j)*x(col_index(j))

Uniprocessor Mapping

The codes are quite complex with graph algorithm components that attempt to exploit the structure of the non-zeros. In addition there are a high number of integer operations and indexed accesses. Modern algorithms on cache-based machines also attempt to take advantage of dense blocks so that the high-performing Level 3 operations can be used.

Parallel Mapping

The dependence structure of many algorithms is extremely intricate because of the desire to only compute on non-zero entries.



o Row Compressed Format (CSR): Compressed Sparse Row

o Block-Compressed Sparse Row (BCSR)


o SpMV

o NAS-CG: (

o OSKI (

o SuperLU ( (


o OSKI (

Specialized Languages:

o Possibly MatLab? (not obvious)

Specialized Hardware:

o Scatter-gather vector architectures are a good example of hardware that works well for sparse linear algebra.

How Current Computer Architectures Fail:

o Block Sizes: The BeBoP group does automatic searches for the optimal blocking arrangement for BCSR matrix encoding. (as shown by Jim Demmel’s diagrams in Dave’s slides).

o Stream Prefetch and Stanzas: For the CSR (row-compressed) sparse matrix encoding, it typically takes a comparatively long stream of consecutive stride-one memory accesses before the stream prefetch fully engages (as measured by the STRIAD benchmark in The result is poor efficiency for the CSR encoding unless explicit prefetches are inserted.

Personal tools