N-Body methods involve calculations that depend on interactions between many discrete points. There are many variations of the basic concept: in particle-particle methods, every point depends on all others, leading to an O(N^2) calculation. Hierarchical particle methods combine forces or potentials from multiple points to reduce the computational complexity to O(N log N) for Barnes-Hutt or O(N) for Fast Multipole.
This dwarf does not cover certain particle methods, such as particle-in-cell codes. PIC codes rely on a uniform grid of cells: individual particles interact with the grid (for example, depositing charges or forces); a structured grid calculation then processes the grid; finally, the grid interacts with individual particles, updating their position or other characteristics. Since there are no direct particle-particle interactions of any sort, PIC codes are not an example of the N-Body dwarf.
In a full N^2 algorithm, floating computation will quickly swamp memory bandwidth requirements. A simple approach will iterate through every point, adding forces from every other point with multiple unit-stride accesses through a large array. Cache-blocking can be used to increase the benefits of caching, but because the points can move, the structure will have to be rebalanced occasionally.
A better algorithm will use a divide-and-conquer strategy. Generally, these strategies rely on the ability to treat multiple distant points as a single point. The idea can be applied recursively: a distant group of points may be treated as a single point; that group may be further divided into subgroups for the purposes of performing the calculation for points within the group, and so on.
A recursive mapping of the points in this fashion is often represented as a tree. A large space may be broken into multiple subspaces; any of those subspaces that contain more than a single point may be further broken into multiple sub-subspaces, and so on. The result will be a tree that reflects the position of all points.
A common instance of hierarchical N-Body methods is described in "A Hierarchical O(n log n) force calculation algorithm", J. Barnes and P. Hut, Nature, v. 324, December 1986. The high-level version of the algorithm involves first building a tree as described above; then, for each subsquare, computing the center of mass and total mass that it contains; and finally, computing the force for each particle.
The force is calculated by using a ratio of the size of a subsquare to the the distance from the current particle to the center of mass of the subsquare. If the ratio is below a threshold, then the calculation is performed using the center of mass and total mass; if the ratio is above the threshold, then the calculation uses the sub-subsquares.
Fast Multipole Method
A faster, more accurate method is described in "Rapid Solution of Integral Equations of Classical Potential Theory", V. Rokhlin, J. Comp. Phys. v. 60, 1985 and "A Fast Algorithm for Particle Simulations", L. Greengard and V. Rokhlin, J. Comp. Phys. v. 73, 1987. This method, known as the Fast Multipole Method (or FMM), differs from Barnes-Hut by computing potentials at every point (rather than forces), and by using a fixed set of boxes with more information rather than a varying number of boxes with fixed information.
One of the most important issues with running parallel N-Body methods is load balancing (including dynamic rebalancing). As bodies move within the simulation, the hierarchy that is exploited in the advanced N-Body methods becomes more difficult to harness efficiently.
o You can get a particle/particle code from the Berkeley CS267 homeworks. E.g., see the Sharks and Fish at: (http://www.cs.berkeley.edu/~demmel/cs267/Sharks_and_Fish/ )
o MINDY (molecular dynamics code)
o IMPACT: Accelerator Modeling
o Bode and Ostriker: TPM (Tree Particle Mesh) cosmology code (http://www.astro.princeton.edu/~bode/TPM/)
o NBody/Barnes Hut in: http://crd.lbl.gov/~oliker/papers/jpdc_02.pdf
o UPIC is specialized framework for creating PIC codes (not a language, but offers the right subset of semantics to act a s toolkit for building these codes).
o MD-GRAPE: (http://www.research.ibm.com/grape/): Specialized for molecular dynamics calculations.
How Current Computer Architectures Fail:
o Pointer-chasing (Particle Tree codes): The tree codes can use a linked list representation of the particles. The list is used to represent the restricted locality of influence of each particle (the distance on this list is typically related to the locality of influence). Prefetch cannot help accelerate link-following in such lists, so such operations tend to be bounded by the round-trip latency to main memory. Transforming the code to support gang-lookups (multiple simultaneous outstanding pointer-lookups) currently requires considerable code transformation (could be supported by massive multithreading, but requires a lot of threads to hide the memory latency) Dedicated hardware for following linked lists that is located in DRAM can improve efficiency by reducing the latency of such operations, but may be difficult to get commodity memory vendors to support.
o More details here: http://www.cs.berkeley.edu/~demmel/cs267/lecture26/lecture26.html Most of the hierarchical n-body method discussion was adapted from this page.