Structured Grids

Figure: Structured Grid computational organization
Figure: Structured Grid application example

Description

Data is arranged in a regular multidimensional grid (most commonly 2D or 3D, sometimes 4D, but rarely higher). Computation proceeds as a sequence of grid update steps. At each step, all points are updated using values from a small neighborhood around each point. The general form of a structured grid computation is as follows.

$$A^{t+1}[i,j] = F\big(\,A^{t}[i',j'] : (i',j') \in N(i,j)\,\big)$$

where $N(i,j)$ is a small, fixed neighborhood of the point $(i,j)$.

An example of a structured grid computation is the Jacobi method for solving the Poisson problem, where u is the grid being updated, f is a constant right-hand-side grid, and h is the mesh spacing:

$$u^{t+1}[i,j] = \tfrac{1}{4}\Big(u^{t}[i-1,j] + u^{t}[i+1,j] + u^{t}[i,j-1] + u^{t}[i,j+1] - h^2 f[i,j]\Big)$$

The updates are logically concurrent, but in practice are implemented as a sequential sweep through the computational domain. Updates may be in place, overwriting the previous version, or use two versions of the grid. In many cases the updates can be performed in parallel, if two versions of the grid are used or updates alternate (e.g., using a red-black pattern) so that values being updated are not being used as neighbors in the same step. These codes have a high degree of parallelism, and data access patterns are regular and statically determinable.
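The sketch below illustrates one such update step in C: a single Jacobi sweep for the 2D Poisson problem, using two copies of the grid so that every read refers to the previous step's values. The function and variable names (`jacobi_sweep`, `N`, `h`) are illustrative, not taken from any particular code.

```c
#include <stddef.h>

/* One Jacobi sweep on an N x N interior grid with a one-cell boundary layer.
 * Reading from u_old and writing to u_new guarantees that no update observes
 * a value produced in the same step. */
void jacobi_sweep(size_t N, double h,
                  const double *u_old, const double *f, double *u_new)
{
    size_t stride = N + 2;   /* row length, including the two boundary cells */
    for (size_t i = 1; i <= N; i++) {
        for (size_t j = 1; j <= N; j++) {
            size_t k = i * stride + j;
            u_new[k] = 0.25 * (u_old[k - stride] + u_old[k + stride] +
                               u_old[k - 1]      + u_old[k + 1]      -
                               h * h * f[k]);
        }
    }
}
```

In a full solver, the caller swaps the `u_old` and `u_new` pointers after each sweep; an in-place red-black variant instead updates the "red" points and then the "black" points within a single array.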

Uniprocessor Mapping

The code is highly vectorizable, using unit-stride or constant-stride memory accesses. The points can be visited in an order (both within one update step and across multiple sequential update steps) that provides spatial locality, to make good use of long cache lines, and temporal locality, to allow cache reuse. Spatial locality is very high, and either hardware or software prefetching is effective given the predictable addressing pattern. Temporal locality is more limited. Within one update step, the amount of temporal locality depends on the size of the neighborhood, as each data value is accessed once by each neighborhood that contains it. For large meshes, some neighboring values may not be preserved in cache due to cache-line aliasing, and even when they are preserved, each value is used only a constant number of times. (For example, a point in a 3D mesh with 6 immediate neighbors may contribute to only 7 arithmetic operations per point loaded from main memory.) In some cases there is additional temporal locality across update steps: the value of a point several steps ahead depends only on the current values within a correspondingly larger neighborhood, so a subset of the grid can be advanced multiple time steps using only locally produced data, although this complicates the logic for loop control and address generation. Exploiting this is somewhat rare in practice, as most algorithms require other computations, such as updates to boundary values or other data structures, between steps.
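As an illustrative sketch of one common locality optimization (an assumption, not something prescribed by the text above), the sweep can be tiled along the unit-stride dimension so that the three rows touched by the 5-point stencil stay resident in cache while a block of columns is processed. The tile width `TJ` is a hypothetical tuning parameter.

```c
#include <stddef.h>

#define TJ 256   /* tile width in the unit-stride dimension; machine-dependent */

/* Column-tiled variant of the Jacobi sweep.  For very long rows, the untiled
 * sweep may evict rows i-1 and i from cache before row i+1 needs them;
 * tiling keeps the working set to roughly 3 * TJ values per block of columns. */
void jacobi_sweep_tiled(size_t N, double h,
                        const double *u_old, const double *f, double *u_new)
{
    size_t stride = N + 2;
    for (size_t jj = 1; jj <= N; jj += TJ) {
        size_t jmax = (jj + TJ - 1 <= N) ? jj + TJ - 1 : N;
        for (size_t i = 1; i <= N; i++) {
            for (size_t j = jj; j <= jmax; j++) {
                size_t k = i * stride + j;
                u_new[k] = 0.25 * (u_old[k - stride] + u_old[k + stride] +
                                   u_old[k - 1]      + u_old[k + 1]      -
                                   h * h * f[k]);
            }
        }
    }
}
```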

Parallel Mapping

Figure: Ghost-cell exchange for a 2D structured grid calculation. Data is copied from the cells along the red lines and written into the ghost cells of the neighboring processors, which are highlighted with blue lines. The figure shows a ghost-cell depth of 1, but deeper ghost-cell regions can be used, trading extra bandwidth for the ability to hide more latency.

Each processor can be statically assigned a contiguous subgrid and can perform each update step locally, independently of other nodes. Each node only has to communicate and synchronize with its neighbors on the grid, exchanging data from the boundary of its sub-grid at the end of each step. The parallel efficiency is determined by the surface-to-volume ratio of the subgrids, and the width of the surface is determined by the width of the neighborhood surrounding each point. The redundant cells along the outer boundary of each subgrid are termed "ghost zones" (or ghost cells). Latency can be hidden by increasing the depth of the ghost zones and exchanging more data less frequently.
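A minimal sketch of the ghost-cell exchange, assuming a one-dimensional decomposition of the 2D grid into horizontal slabs and a ghost depth of 1; the function name and slab layout are assumptions for illustration. Each process swaps its first and last owned rows with the ghost rows of its neighbors using `MPI_Sendrecv`; ranks on the outer boundary pass `MPI_PROC_NULL`, which makes the corresponding transfer a no-op.

```c
#include <mpi.h>

/* Exchange depth-1 ghost rows with the neighbors above and below.
 * Local layout: row 0 is the top ghost row, rows 1..local_rows are owned,
 * row local_rows+1 is the bottom ghost row; each row holds row_len doubles. */
void exchange_ghost_rows(double *u, int local_rows, int row_len,
                         int up, int down, MPI_Comm comm)
{
    /* Send the first owned row up; receive the bottom ghost row from below. */
    MPI_Sendrecv(&u[1 * row_len],                row_len, MPI_DOUBLE, up,   0,
                 &u[(local_rows + 1) * row_len], row_len, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);

    /* Send the last owned row down; receive the top ghost row from above. */
    MPI_Sendrecv(&u[local_rows * row_len],       row_len, MPI_DOUBLE, down, 1,
                 &u[0],                          row_len, MPI_DOUBLE, up,   1,
                 comm, MPI_STATUS_IGNORE);
}
```

A 2D decomposition would add a similar exchange for the left and right ghost columns (typically using an MPI derived datatype for the strided column data), and a ghost depth greater than 1 simply sends that many rows at a time.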

Variant: Adaptive Mesh Refinement

Figure: Adaptive Mesh Refinement diagram
Figure: Adaptive Mesh Refinement application

The grid may be refined by overlaying higher-resolution grids on areas of interest, and the refinement process can proceed recursively and adaptively over time. The logical neighborhood of a given point may lie on another grid patch at the same level of refinement or at a different level, which complicates indexing and reduces spatial locality. A significant fraction of the run time, if not of the arithmetic work, may be spent on the boundaries of the patches and on the prolongation and restriction operations that interpolate data from coarse grids to refined grids and back again. On a parallel machine, this requires some form of dynamic load balancing to assign roughly equal amounts of work to each processor while minimizing the communication between them.
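As a purely illustrative sketch of the prolongation and restriction operations (in 1D, with a refinement factor of 2, cell-centered data, and made-up function names), restriction averages pairs of fine cells into a coarse cell, while prolongation interpolates coarse values back onto the fine cells; real AMR codes apply analogous multi-dimensional operators on the boundaries of each refined patch.

```c
/* Restriction: fine -> coarse, averaging each pair of fine cells. */
void restrict_to_coarse(const double *fine, double *coarse, int n_coarse)
{
    for (int i = 0; i < n_coarse; i++)
        coarse[i] = 0.5 * (fine[2 * i] + fine[2 * i + 1]);
}

/* Prolongation: coarse -> fine, piecewise-linear interpolation of
 * cell-centered values (the one-sided cases at the ends fall back to
 * piecewise-constant injection). */
void prolong_to_fine(const double *coarse, double *fine, int n_coarse)
{
    for (int i = 0; i < n_coarse; i++) {
        double left  = (i > 0)            ? coarse[i - 1] : coarse[i];
        double right = (i < n_coarse - 1) ? coarse[i + 1] : coarse[i];
        fine[2 * i]     = coarse[i] + 0.25 * (left  - coarse[i]);
        fine[2 * i + 1] = coarse[i] + 0.25 * (right - coarse[i]);
    }
}
```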

Elaboration

Credits

Original text by Krste Asanovic. Additions by Kathy Yelick, John Shalf, and Ras Bodik.
