Parallel Machines and Parameters


Parallel Machines

Machine Name | Bassi | DaVinci | Jacquard | Seaborg
Machine Type | IBM p575 | SGI Altix 350 | Opteron Cluster | IBM SP
Nodes | 111 | 1 | 356 | 380
Processors per Node | 8 (dual core) | 32 | 2 | 16
Memory per Node | 32GB | 192GB | 6GB | 16GB(312),
Memory per Processor | 4GB | 6GB | 3GB | 1-4GB
Network | Federation | SGI NUMAlink | Infiniband | IBM Colony
Processor Type | 1.9GHz POWER5 | 1.4GHz Itanium2 | 2.2GHz Opteron | 0.375GHz POWER3
DRAM bandwidth (GB/s)
compute per processor (GFLOP/s) | 7.6 | 5.6 | 4.4 | 1.5
gamma (seconds/flop) | 1.32e-10 | 1.79e-10 | 2.28e-10 | 6.67e-10
Latency - intra-node (us) 1.84 9.54
Latency - inter-node (us) 5.53 5.70 24.65
Bandwidth - intra-node (MB/s) 7487.15 563.72
Bandwidth - inter-node (MB/s) 1704.28 821.89 360.05

alpha/beta/gamma Methodology

The latency and bandwidth parameters in the table were measured with OSU's MPI latency and bandwidth benchmarks (see the "Benchmarks" section of [1]) on 11/16/06. The latency test is a ping-pong: one side sends a message of n bytes and the other side responds with a message of the same size. The bandwidth test sends a window of overlapped messages, waits for all of them to finish, and then the receiver sends a 1-byte ack signaling that all the messages have arrived.
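The arithmetic behind the two tests can be sketched as follows. This is a minimal illustration and not the benchmark code itself: the function names and the timings are hypothetical, and the convention 1 MB = 1e6 bytes is an assumption.

```python
def one_way_latency_us(total_seconds, iterations):
    """One-way latency in microseconds from a timed ping-pong loop.

    Each iteration is a full round trip (send n bytes, receive n bytes),
    so the elapsed time per iteration is halved.
    """
    return total_seconds / (2 * iterations) * 1e6


def window_bandwidth_mb_s(total_seconds, iterations, window, message_bytes):
    """Bandwidth in MB/s (assuming 1 MB = 1e6 bytes) for a windowed send test.

    Each iteration sends `window` overlapped messages of `message_bytes`
    and ends when the 1-byte ack arrives; the ack itself is not counted.
    """
    total_bytes = iterations * window * message_bytes
    return total_bytes / total_seconds / 1e6


# Hypothetical timings, chosen only to illustrate the arithmetic.
lat = one_way_latency_us(total_seconds=0.0111, iterations=1000)
bw = window_bandwidth_mb_s(total_seconds=2.0, iterations=100,
                           window=64, message_bytes=1 << 20)
print(f"latency: {lat:.2f} us, bandwidth: {bw:.2f} MB/s")
```

Halving the round-trip time assumes the two directions are symmetric, which is the usual reading of a ping-pong result.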

The latencies in the table are those measured at small message sizes, and the bandwidths are the highest achieved over all message sizes. Performance models that use beta will probably be more accurate if they look up beta by message size instead of using this single peak value.
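A size-dependent lookup of this kind might look like the sketch below. It takes beta to be the inverse of bandwidth (seconds per byte), which is an assumption about the model. The sample points are hypothetical placeholders rather than measurements from these machines, and the names `SAMPLES`, `bandwidth_for` and `message_time` are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical (message size in bytes, bandwidth in MB/s) samples, sorted by size.
# Replace these with the per-size output of the bandwidth benchmark.
SAMPLES = [(1, 0.5), (1024, 200.0), (65536, 700.0), (1 << 20, 820.0)]


def bandwidth_for(n_bytes, samples=SAMPLES):
    """Step lookup: bandwidth of the largest sampled size <= n_bytes.

    Sizes below the first sample fall back to the first sample.
    """
    sizes = [size for size, _ in samples]
    i = bisect_right(sizes, n_bytes) - 1
    return samples[max(i, 0)][1]


def message_time(n_bytes, alpha_us, samples=SAMPLES):
    """Estimated transfer time in seconds: alpha + n * beta(n),

    where beta(n) = 1 / bandwidth(n) in seconds per byte.
    """
    bytes_per_second = bandwidth_for(n_bytes, samples) * 1e6
    return alpha_us * 1e-6 + n_bytes / bytes_per_second


# Example: a 1 KB message with a hypothetical 5 us latency.
print(message_time(1024, alpha_us=5.0))
```

A step lookup is the simplest choice; interpolating between neighboring samples would be a reasonable refinement if the measured curve is smooth.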

The “gamma” parameter is 1/(flop rate) in seconds. The “flop rate” used to compute the values of gamma in the table is the clock rate multiplied by the number of flops per cycle.
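As a check on this definition, the sketch below recomputes gamma from the clock rates in the table. The flops-per-cycle figures are not stated on this page; they are inferred here by dividing each tabulated GFLOP/s value by the clock rate, so treat them as an assumption.

```python
# Clock rates (GHz) are taken from the table. Flops per cycle are inferred
# from (GFLOP/s) / (GHz) in the table, not stated in the text.
MACHINES = {
    "Bassi": (1.9, 4),      # 1.9GHz POWER5
    "DaVinci": (1.4, 4),    # 1.4GHz Itanium2
    "Jacquard": (2.2, 2),   # 2.2GHz Opteron
    "Seaborg": (0.375, 4),  # 0.375GHz POWER3
}


def gamma(clock_ghz, flops_per_cycle):
    """Seconds per flop: 1 / (clock rate * flops per cycle)."""
    flop_rate = clock_ghz * 1e9 * flops_per_cycle
    return 1.0 / flop_rate


for name, (clock, fpc) in MACHINES.items():
    print(f"{name}: {clock * fpc:.1f} GFLOP/s, "
          f"gamma = {gamma(clock, fpc):.2e} s/flop")
```

Under these assumptions the Bassi, DaVinci and Seaborg values match the table after rounding. Jacquard comes out at about 2.27e-10, slightly below the tabulated 2.28e-10.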

Note that the Bassi values differ from those published by NERSC. Were the same environment variables set? Similarly, how do Seaborg's numbers compare to NERSC's scalability report? And you've measured Jacquard's bandwidth as much higher than the interconnect's peak of 620 MB/s...
