Parallel Machines and Parameters
Introduction
- Microbenchmarks
- References
- Pathscale's white papers
- They compare versions of their low-latency interconnect with others.
- GAMMA: The Genoa Active Message MAchine
- Shows what Ethernet can do.
- Myrinet 10GigE
- Motherboard benchmarks
- From Myrinet: shows which commercial x86 motherboards can sustain 10Gb speeds, with wide variation in achievable network bandwidth.
- Recent (Nov 06) deployment of Opterons with benchmarks
Parallel Machines
| Machine Name | Bassi | DaVinci | Jacquard | Seaborg |
| Machine Type | IBM p575 | SGI Altix 350 | Opteron Cluster | IBM SP |
| Location | NERSC | NERSC | NERSC | NERSC |
| Nodes | 111 | 1 | 356 | 380 |
| Processors per Node | 8 (dual core) | 32 | 2 | 16 |
| Memory per Node | 32GB | 192GB | 6GB | 16GB(312), 32GB(64), 64GB(4) |
| Memory per Processor | 4GB | 6GB | 3GB | 1-4GB |
| Network | Federation | SGI NUMAlink | Infiniband | IBM Colony |
| Processor Type | 1.9GHz POWER5 | 1.4GHz Itanium2 | 2.2GHz Opteron | 0.375GHz POWER3 |
| DRAM bandwidth (GB/s) | | | | |
| Peak compute per processor (GFLOP/s) | 7.6 | 5.6 | 4.4 | 1.5 |
| gamma (seconds/flop) | 1.32e-10 | 1.79e-10 | 2.27e-10 | 6.67e-10 |
| Latency - intra-node (us) | 1.84 | 9.54 | ||
| Latency - inter-node (us) | 5.53 | | 5.70 | 24.65 |
| Bandwidth - intra-node (MB/s) | 7487.15 | 563.72 | ||
| Bandwidth - inter-node (MB/s) | 1704.28 | | 821.89 | 360.05 |
alpha/beta/gamma Methodology
The latency and bandwidth parameters in the table were measured on 11/16/06 with OSU's MPI latency and bandwidth benchmarks (see the "Benchmarks" section of [1]). The latency test sends a message of n bytes, and the other side responds with a message of the same size. The bandwidth test sends a window of overlapped messages and waits for all of them to complete; the receiver then sends a 1-byte ack signaling that every message has arrived.
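As a rough illustration of how timings from the two tests become table entries, the sketch below converts an elapsed time into a latency or a bandwidth. It makes no MPI calls. The function names and the sample timings are hypothetical, and two conventions are assumed rather than taken from this page: that the reported latency is one-way (half the average round-trip time) and that 1 MB means 10^6 bytes.

```python
def one_way_latency_us(elapsed_seconds, iterations):
    """Latency test: each iteration is one send plus the matching reply.

    Assumption: the reported figure is half the average round-trip time.
    """
    round_trip_seconds = elapsed_seconds / iterations
    return (round_trip_seconds / 2.0) * 1e6


def bandwidth_mb_per_s(elapsed_seconds, iterations, window, msg_bytes):
    """Bandwidth test: each iteration sends `window` overlapped messages,
    then waits for the 1-byte ack from the receiver.

    Assumption: 1 MB = 10**6 bytes; the ack's single byte is not counted.
    """
    total_bytes = iterations * window * msg_bytes
    return total_bytes / elapsed_seconds / 1e6


# Hypothetical timings, chosen only to exercise the arithmetic.
print(one_way_latency_us(elapsed_seconds=0.01106, iterations=1000))
print(bandwidth_mb_per_s(elapsed_seconds=4.0, iterations=100,
                         window=64, msg_bytes=1 << 20))
```

Overlapping a window of messages keeps the link busy, so the per-message startup cost is amortized and the result approaches the sustainable transfer rate.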
The latency in the table is the latency at small message sizes, and the bandwidth is the highest bandwidth achieved across the tested message sizes. Because achieved bandwidth varies with message size, performance models that use beta will probably be more accurate if they look up beta by message size instead of using the single peak value.
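A minimal sketch of such a lookup follows. It assumes beta is the inverse bandwidth in seconds per byte and that message time is modeled as alpha + n * beta(n). The intermediate sample points are hypothetical placeholders, not measurements; only the 5.53 us latency and the 1704.28 MB/s peak are taken from the table's inter-node rows. A real table would hold one entry per message size reported by the bandwidth benchmark.

```python
from bisect import bisect_right

# (message size in bytes, bandwidth in MB/s), sorted by size.
# HYPOTHETICAL values except the last entry, which is a peak from the table.
BANDWIDTH_SAMPLES = [
    (1, 0.5),
    (1024, 250.0),
    (65536, 1200.0),
    (1048576, 1704.28),
]
SAMPLE_SIZES = [size for size, _ in BANDWIDTH_SAMPLES]

ALPHA_SECONDS = 5.53e-6  # small-message latency from the table


def beta_seconds_per_byte(n_bytes):
    """Return inverse bandwidth for the largest sampled size <= n_bytes."""
    idx = max(bisect_right(SAMPLE_SIZES, n_bytes) - 1, 0)
    mb_per_s = BANDWIDTH_SAMPLES[idx][1]
    return 1.0 / (mb_per_s * 1e6)


def message_time_seconds(n_bytes):
    """Alpha-beta estimate using the size-dependent beta."""
    return ALPHA_SECONDS + n_bytes * beta_seconds_per_byte(n_bytes)


for n in (8, 2048, 1 << 20):
    print(n, message_time_seconds(n))
```

A step lookup is the simplest choice; interpolating between neighboring samples would be a reasonable refinement.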
The “gamma” parameter is 1/(flop rate), in seconds per flop. The “flop rate” used to compute the values of gamma in the table is the peak rate: the clock rate multiplied by the number of flops per cycle.
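The calculation can be sketched as below. The clock rates come from the table. The flops-per-cycle figures are not stated on this page; they are inferred here by dividing each machine's GFLOP/s entry by its clock rate, so treat them as assumptions.

```python
# name -> (clock rate in GHz, flops per cycle)
# Flops per cycle are INFERRED from the table (GFLOP/s divided by GHz).
MACHINES = {
    "Bassi": (1.9, 4),
    "DaVinci": (1.4, 4),
    "Jacquard": (2.2, 2),
    "Seaborg": (0.375, 4),
}


def flop_rate(clock_ghz, flops_per_cycle):
    """Peak flop rate in flops per second."""
    return clock_ghz * 1e9 * flops_per_cycle


def gamma_seconds_per_flop(clock_ghz, flops_per_cycle):
    """gamma = 1 / (flop rate)."""
    return 1.0 / flop_rate(clock_ghz, flops_per_cycle)


for name, (ghz, fpc) in MACHINES.items():
    gflops = flop_rate(ghz, fpc) / 1e9
    gamma = gamma_seconds_per_flop(ghz, fpc)
    print(f"{name}: {gflops:.1f} GFLOP/s, gamma = {gamma:.2e} s/flop")
```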
Note that the Bassi values differ from those published by NERSC. Were the same environment variables set? Similarly, how do Seaborg's numbers compare to NERSC's scalability report? And you've measured Jacquard's bandwidth as much higher than the interconnect's peak of 620 MB/s...
