Dense Linear Algebra
From View
←Older revision | Newer revision→
Contents |
Description
These are the classic vector and matrix operations, traditionally divided into Level 1 (vector/vector), Level 2 (matrix/vector), and Level 3 (matrix/matrix) operations. Data is typically laid out as a contiguous array and computations on elements, rows, columns, or matrix blocks are the norm. For example, matrix multiplication is commonly expressed as
do i=1,n
do j=1,n
do k=1,n
a(i,j) = a(i,j) + b(i,k)*c(k,j)
enddo
enddo
enddo
Uniprocessor Mapping
These codes are a very natural match for vector machines with data accesses mostly having unit and constant strides. For cache based machines, the development of block algorithms has resulted in performance for Level 3 operations that can approach the peak speed of the machine. Thus, many algorithms have been recast in order to exploit matrix/matrix operations as much as possible.
Parallel Mapping
In addition to memory hierarchy issues, data distribution for load balance becomes a priority here. For this reason 2-d block cyclic distributions have become the preferred ones for many matrix algorithms. Further, the use of computation/communication overlap has permitted these algorithms to scale very well, as exhibited by the tremendous performance of the Top 500 High Performance Linpack benchmark.
Elaboration
Credit
Original text by Parry Husbands.
