This webpage completes Assignment 0 for CS240A, taught during Spring '07 at UC Santa Barbara. It introduces a real-world application of parallel computing: the simulation of an N-body system, implemented by Liu and Bhatt using the Barnes-Hut algorithm on a Connection Machine supercomputer. Short descriptions of the problem and the target machine and an overview of the results are given. References include the original paper by Liu and Bhatt, a technical description, images and a short history of Connection Machines, and art based on the N-body problem.
The N-body problem is the name of a class of problems that have driven the development of dynamical systems theory throughout its history. The problem is stated as follows:
An ensemble of N bodies in space is given. The influence of each body on the movement of the other bodies is modeled by an interaction force (traditionally, gravity). Given the initial positions and velocities of the bodies, find their positions and velocities at time T.
The relevance of the N-body problem is not only theoretical but practical as well. Applications can be found at the macroscale (astrophysics, satellite positioning), the mesoscale (complex fluid dynamics), and the micro- and nanoscale (molecular dynamics, materials science).
A naive implementation would compute the velocity and position of each body by summing the influence of every other body on it. This yields an algorithmic complexity of O(N^2), which is feasible only for a small number of bodies. Liu and Bhatt have adapted the more efficient Barnes-Hut algorithm for execution on a parallel machine.
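The naive approach described above can be sketched in a few lines. This is only an illustration in two dimensions, not code from the paper; the function name `direct_accelerations`, the unit gravitational constant `G`, and the softening length `EPS` are assumptions made for the example.

```python
import math

G = 1.0      # gravitational constant in arbitrary units (assumption)
EPS = 1e-3   # softening length that avoids division by zero (assumption)

def direct_accelerations(positions, masses):
    """Return the acceleration of every body by summing over all other bodies."""
    n = len(positions)
    acc = []
    for i in range(n):                 # outer loop over bodies: N iterations
        xi, yi = positions[i]
        ax = ay = 0.0
        for j in range(n):             # inner loop over partners: N iterations
            if i == j:
                continue               # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            r2 = dx * dx + dy * dy + EPS * EPS
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * masses[j] * dx * inv_r3
            ay += G * masses[j] * dy * inv_r3
        acc.append((ax, ay))
    return acc
```

The two nested loops make the quadratic cost visible: doubling the number of bodies quadruples the number of pairwise evaluations per time step.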
The Barnes-Hut algorithm can be divided into two conceptually different tasks: organizing the bodies in a bookkeeping tree and computing the acceleration of each body. In sequential implementations, the cost of the first task is negligible compared to the cost of the second. In a parallel implementation, the bodies' data are distributed across processors, so maintaining the bookkeeping tree is nontrivial and its computational cost increases.
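The two tasks can be illustrated with a minimal sequential sketch in two dimensions. It is not the parallel implementation by Liu and Bhatt; the names `Node`, `build_tree`, and `acceleration`, the opening parameter `theta`, the softening `eps`, and the unit gravitational constant are all assumptions made for the example. The sketch also assumes that no two bodies share the same position.

```python
import math

class Node:
    """Square cell of the bookkeeping tree: centre (cx, cy), half-width half."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0            # total mass inside the cell
        self.mx = self.my = 0.0    # mass-weighted position sums
        self.body = None           # (x, y, m) when the cell is a one-body leaf
        self.children = None       # four sub-cells once the cell has been split

    def _child(self, x, y):
        return self.children[(1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)]

    def insert(self, x, y, m):
        if self.body is None and self.children is None:
            self.body = (x, y, m)              # empty leaf: store the body
        else:
            if self.children is None:          # occupied leaf: split into four
                h = self.half / 2
                self.children = [Node(self.cx + sx * h, self.cy + sy * h, h)
                                 for sy in (-1, 1) for sx in (-1, 1)]
                old, self.body = self.body, None
                self._child(old[0], old[1]).insert(*old)
            self._child(x, y).insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

def build_tree(bodies):
    """Task 1: organize bodies, given as (x, y, m) tuples, in a tree."""
    xs = [b[0] for b in bodies]
    ys = [b[1] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Node((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for b in bodies:
        root.insert(*b)
    return root

def acceleration(node, x, y, theta=0.5, eps=1e-3):
    """Task 2: acceleration at (x, y) due to all bodies stored under node."""
    if node.mass == 0.0:
        return 0.0, 0.0                        # empty cell
    if node.body is not None and node.body[0] == x and node.body[1] == y:
        return 0.0, 0.0                        # skip self-interaction
    dx = node.mx / node.mass - x               # offset to the centre of mass
    dy = node.my / node.mass - y
    d2 = dx * dx + dy * dy
    # A leaf, or a cell that looks small from (x, y), acts as one point mass.
    if node.children is None or (2 * node.half) ** 2 < theta * theta * d2:
        r2 = d2 + eps * eps
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        return node.mass * dx * inv_r3, node.mass * dy * inv_r3
    ax = ay = 0.0
    for child in node.children:                # otherwise open the cell
        cax, cay = acceleration(child, x, y, theta, eps)
        ax += cax
        ay += cay
    return ax, ay
```

With `theta=0` every cell is opened and the result matches direct summation; larger values of `theta` replace more distant groups of bodies by their centre of mass, trading accuracy for speed.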
Parallel implementation brings an additional problem: load balancing. Not every body requires the same number of computations in the Barnes-Hut algorithm. The load that each body places on the processors varies in time and cannot be predicted without actually solving the problem.
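One common heuristic for this kind of load balancing is to reuse the work measured for each body in the previous time step as an estimate for the next one. The sketch below illustrates that idea only; it is not claimed to be the scheme used by Liu and Bhatt. The function name `partition_by_cost` and the assumption that the bodies are already arranged in some fixed order with positive costs are hypothetical.

```python
from itertools import accumulate

def partition_by_cost(costs, n_procs):
    """Split bodies 0..n-1 into n_procs contiguous ranges of similar total cost.

    costs[i] is the work measured for body i in the previous time step,
    for example the number of interactions it needed (assumed positive).
    Returns a list of (start, end) index pairs with end exclusive.
    """
    total = sum(costs)
    prefix = list(accumulate(costs))       # running total of the costs
    bounds = [0]
    i = 0
    for p in range(1, n_procs):
        target = total * p / n_procs       # ideal cumulative cost after p ranges
        while i < len(costs) and prefix[i] < target:
            i += 1
        bounds.append(min(i + 1, len(costs)))   # body i closes range p
    bounds.append(len(costs))
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, `partition_by_cost([1, 1, 1, 1, 4, 4], 2)` gives the first processor five bodies and the second only one, because the last bodies are much more expensive than the first four.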
The implementation described by Liu and Bhatt was initially developed for the Connection Machine CM-5E. Connection Machines were a series of parallel machines developed during the 1980s and early 1990s. They are characterized by a large number of processing units connected by an efficient interconnect called a fat-tree network. The CM-5E model consists of up to 16,384 SPARC vector processors. The simulations in the presented paper were performed on a configuration with 256 processing nodes. The architecture of the CM-5E is classified as DM-MIMD, which stands for Distributed Memory, Multiple Instruction (Stream), Multiple Data (Stream). Each processing node has 128 MB of memory and a peak computing performance of 160 Mflop/s. A sustained rate of 77 Mflop/s per processor (19.7 Gflop/s total) was reported, with peak performance during the simulation reaching 113 Mflop/s per processor (29.9 Gflop/s total).
A similar machine, differing only in its number of processors (128), was ranked 63rd in the 1994 Top 500 list. Its last appearance was in 1997, when it was ranked 493rd. CM-5 machines were very popular in the mid-1990s, but during the same period the vendor, Thinking Machines, changed its business strategy and terminated further hardware research and development.
The simulation code was written in C with the Connection Machine CMMD library (v. 3.0). The vector processors were programmed in CDPEAC, which interfaces C code with the DPEAC assembly language for the vector processors.
Although the code was benchmarked on the CM-5E, the authors report developing a platform-independent framework for N-body computations. The framework was tested on a network of UltraSPARC-II workstations: four computation nodes, each with 128 MB of memory, connected by a network with a bandwidth of 100 Mbps per node. This implementation is based on MPI.
Most of the discussion following the presentation of results in the paper is devoted to comparing the efficiency of the bookkeeping implementation with previous parallel implementations. Overall speedup is reported only for the platform-independent framework, where speedups of 3.10 to 3.5 are reported for different types of problems (gravitational field interaction and fluid filament interaction) and different problem sizes (8 k to 256 k bodies).
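To put the reported speedups in perspective, they can be converted to parallel efficiency, that is, speedup divided by the number of processors. The short calculation below assumes the speedups were measured on all four workstation nodes; the function name `parallel_efficiency` is only illustrative.

```python
def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / n_procs

N_NODES = 4  # workstation nodes in the test network (assumed to be all in use)

for s in (3.10, 3.5):
    eff = parallel_efficiency(s, N_NODES)
    print(f"speedup {s:.2f} on {N_NODES} nodes -> efficiency {eff:.1%}")
```

Under that assumption, the efficiency lies between roughly 77% and 88%.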
The authors set out to convince the reader that they efficiently implemented the bookkeeping part of the Barnes-Hut algorithm for the N-body problem. The performance diagrams show that this is correct with respect to problem-size scaling: the part of the execution time devoted to bookkeeping only doubles as the problem size quadruples, whereas the force-computation portion quadruples for the same increase. However, the cost of bookkeeping arises not from the problem size but as a side effect of parallelization. Additional performance results showing how the bookkeeping share of the computation scales with the number of processors would add to the relevance of the results. Nevertheless, the authors present their implementation transparently and show that their algorithm attempts to maximize the amount of information transmitted per message exchange.
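The scaling observation can be turned into a rough growth exponent. If the time behaves like N^k, then k is the ratio of the logarithms of the time factor and the size factor. This is a crude power-law reading of a single pair of growth factors, not a fit to the paper's data; the function name `scaling_exponent` is an assumption.

```python
import math

def scaling_exponent(size_factor, time_factor):
    """Exponent k in time ~ N**k implied by one pair of growth factors."""
    return math.log(time_factor) / math.log(size_factor)

bookkeeping_k = scaling_exponent(4, 2)   # time doubles when N quadruples
force_k = scaling_exponent(4, 4)         # time quadruples when N quadruples

print(f"bookkeeping: k = {bookkeeping_k:.2f}, force computation: k = {force_k:.2f}")
```

Read this way, bookkeeping grows roughly like the square root of the problem size, while force computation grows roughly linearly over the measured range.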
A rather large percentage of the maximum theoretical performance (48% sustained and 71% peak) was reported for the implementation, which suggests that the algorithm was well parallelized. Computation of the individual force contributions remains a bottleneck for this problem. This task is difficult to parallelize for an individual body; however, the paper argues that the parallelization overhead is low in this particular implementation. Consequently, the individual processing units can devote most of their time to computing forces and accelerations, which is the task a sequential machine would spend most of its time on as well.
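The percentages follow from the per-processor rates quoted in the machine description: 77 Mflop/s sustained and 113 Mflop/s at best, against a theoretical peak of 160 Mflop/s per node. The snippet below simply redoes that arithmetic; the names are illustrative.

```python
PEAK_PER_NODE = 160.0  # Mflop/s, theoretical peak per processing node (from the text)

def fraction_of_peak(rate_mflops):
    """Share of the theoretical per-node peak reached by a measured rate."""
    return rate_mflops / PEAK_PER_NODE

sustained = fraction_of_peak(77.0)    # sustained rate per processor
best = fraction_of_peak(113.0)        # best rate seen during the simulation

print(f"sustained: {sustained:.0%}, best: {best:.0%}")
```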
In the presented paper, the authors tried to improve the parallel implementation of different segments of the Barnes-Hut algorithm for simulating the N-body problem. They succeeded in showing that their implementation was superior to competing implementations of the time. However, experimental results showing how their implementation scales with the number of processors would make the presented results more relevant.
My name is Marko Budisic. I am a first-year graduate student in the Department of Mechanical Engineering. Professor Igor Mezic serves as my advisor. My work is in the theory of dynamical systems and possible applications of dynamical systems tools to real-life problems. Lately, this has come to mean systems of atoms on a substrate, common in materials science. I am also interested in control systems theory and pretty much anything connected to computers.