

Zen and the Art of Beowulf Clusters





Robert G. Brown
Duke University Physics Department
rgb@phy.duke.edu



November 8, 2006



Principles of Zen



Principles of Beowulf Clusters

(an obvious connection, right...? :-)



How to build a Parallel Cluster

There are a few details, of course. To design an optimal cluster for any given task one has to understand parallel computations and how to match them to cluster design.



So what ARE Parallel Computations?

Figure 1: Parallelization of a Task
\begin{figure}
\centerline{\epsfbox{parallel_work.eps}}
\end{figure}



Amdahl's Law

The speedup $S$ experienced running a task on $P$ processors is less than or equal to:

\begin{equation}
S \le \frac{T_s + T_p}{T_s + T_p/P}
\end{equation}

where $T_s$ is the time the program spends doing ``serial work'' and $T_p$ is the time spent doing ``parallelizable work'', which is split up across $P$ processors.

This is a limiting result, not horribly useful quantitatively except to tell you when there is no point in parallelizing something at all. We can do much better.
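For a quick illustrative example (the numbers are chosen for convenience, not drawn from any benchmark): suppose $T_s = 10$ and $T_p = 90$ in arbitrary units. Then
\begin{displaymath}
S \le \frac{10 + 90}{10 + 90/P} ,
\end{displaymath}
which gives $S \le 5.3$ for $P = 10$ and $S \le 10$ as $P \rightarrow \infty$. No matter how many processors you throw at the job, the 10% serial fraction caps the speedup at 10.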



For example, we can account for the time spent communicating between processors, the time spent setting things up, and changes in the times to perform various tasks with different algorithms. Defining things like:

${\bf T_s}$ The original single-processor serial time.
${\bf T_{is}}$ The (average) additional serial time spent doing things like IPCs, setup, and so forth, per processor, in all parallelized tasks.
${\bf T_p}$ The original single-processor parallelizable time.
${\bf T_{ip}}$ The (average) additional time spent by each processor doing just the setup and work that it does in parallel. This may well include idle time, which is often important enough to be accounted for separately.
we can obtain improved estimates of the speedup:
\begin{equation}
T_{\rm tot}(P) = T_s + P\,T_{is} + T_p/P + T_{ip}.
\end{equation}

or
\begin{equation}
S \approx \frac{T_s + T_p}{T_s + P\,T_{is} + T_p/P + T_{ip}}.
\end{equation}
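As a minimal sketch of what this improved estimate predicts (the parameter values below are illustrative assumptions, not measurements), the following C fragment evaluates $S(P)$ for a handful of processor counts:

\begin{verbatim}
#include <stdio.h>

/* Improved speedup estimate:
 *   S(P) = (Ts + Tp) / (Ts + P*Tis + Tp/P + Tip)
 * All parameter values here are illustrative assumptions.
 */
double speedup(double Ts, double Tis, double Tp, double Tip, int P)
{
   return (Ts + Tp) / (Ts + P * Tis + Tp / P + Tip);
}

int main(void)
{
   double Ts = 10.0, Tis = 1.0, Tp = 1000.0, Tip = 1.0;
   int P;

   for (P = 1; P <= 128; P *= 2)
      printf("P = %3d   S = %6.2f\n", P, speedup(Ts, Tis, Tp, Tip, P));
   return 0;
}
\end{verbatim}

With these assumed numbers the estimate peaks at $S \approx 13.6$ near $P = 32$ and then declines as the $P\,T_{is}$ term takes over; that turnover is exactly what the scaling figures in the next section explore for a range of parameter choices.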



All You Need to Know About Code Granularity

Granularity is typically somewhat controllable. Network speed and latency, the scaling of computation relative to communication as a function of problem size, CPU/memory speed, and program organization are all control variables.

Fine-grained tasks are ``bad'' for scaling to many nodes $N$; coarse-grained tasks are ``good''.

Beware nonlinearities! CPU/cache/memory/disk bottlenecks can create ``superlinear speedup'' (for example, when splitting a job makes each node's working set suddenly fit into cache) and apparently violate Amdahl's Law!



Huh? Whaddideesay?

In all the figures below, $T_s = 10$ (which sets our basic scale, if you like) and $T_p = 10, 100, 1000, 10000, 100000$. In the first three panels we just vary $T_{is} = 0, 1, 10$ with $T_{ip} = 1$ fixed. The last panel again uses $T_{is} = 1$, but this time with a quadratic dependence $P^2\,T_{is}$ in the serial overhead.

\begin{figure}
\centerline{\epsfbox{bscaling.Tis0.eps}\epsfbox{bscaling.Tis1.eps}}
\end{figure}

\begin{figure}
\centerline{\epsfbox{bscaling.Tis10.eps}\epsfbox{bscaling.Tis1.Psq.eps}}
\end{figure}
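To see why the quadratic case in the last panel is so much worse, here is a purely illustrative loop (same assumed parameters as that panel: $T_s = 10$, $T_{is} = 1$, $T_{ip} = 1$, $T_p = 10^5$) that evaluates the speedup estimate with a $P^2\,T_{is}$ overhead term:

\begin{verbatim}
#include <stdio.h>

int main(void)
{
   /* Illustrative parameters matching the last scaling panel (assumed). */
   double Ts = 10.0, Tis = 1.0, Tip = 1.0, Tp = 100000.0;
   int P;

   for (P = 1; P <= 1024; P *= 2) {
      /* Quadratic communication/setup overhead: P^2 * Tis. */
      double S = (Ts + Tp) / (Ts + (double) P * P * Tis + Tp / P + Tip);
      printf("P = %4d   S = %8.2f\n", P, S);
   }
   return 0;
}
\end{verbatim}

The estimated speedup climbs to roughly 24 near $P = 32$ and then collapses below 1 by $P = 1024$: overhead that grows faster than linearly in $P$ eventually makes adding nodes counterproductive.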



Designs: NOW/COW/Beowulf

The goal is to optimize overall performance per dollar. In order of the increasingly fine program granularity they can support, at increasing cost and decreasing general-purpose utility, the basic designs are a NOW (Network of Workstations), a COW (Cluster of Workstations), and a true beowulf.

Schematics for the general designs follow, first a ``true beowulf'' and then a workstation cluster.



A True Beowulf


\begin{figure}
\centerline{\epsfbox{beowulf.eps}}
\end{figure}



A Workstation Cluster


\begin{figure}
\centerline{\epsfbox{cluster.eps}}
\end{figure}



Node Design and Cost

The following are some possible node configurations and prices:

The cheapest barebones clusters for learning and experimentation can cost surprisingly little. On www.clustermonkey.net, for example, you can find an article on a ``value cluster'': an 8-node cluster that cost $2500 total in 2005. This is easily within the reach of individuals, clubs, or small schools.



Turnkey Clusters

Turnkey clusters can make sense if you are building a very specialized cluster and need help designing and installing it. A turnkey integrator will typically resell the hardware components to you at more or less standard retail prices, marked up to cover their ``integration fee'' for designing the cluster, installing the clusterware on it, and so forth. This ends up being anywhere from a 20% markup over OTC prices on up.



Cluster Networks



Parallel Program Support



Simple Example: xep (PVM Mandelbrot Set)

On a good day, this will work as a demo...
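The Mandelbrot set makes a classic coarse-grained demo because every pixel depends only on its own coordinate, so whole rows of the image can be farmed out to nodes with almost no communication. As a minimal sketch of the per-pixel kernel (not the actual xep source), in C:

\begin{verbatim}
/* Count iterations of z -> z^2 + c before |z| exceeds 2 (or we give up).
 * Each pixel depends only on its own c = (cre, cim), so rows of the
 * image can be handed to different nodes (via PVM, MPI, ...) and
 * computed completely independently.
 */
int mandel_iterations(double cre, double cim, int max_iter)
{
   double zre = 0.0, zim = 0.0;
   int i;

   for (i = 0; i < max_iter; i++) {
      double zre2 = zre * zre, zim2 = zim * zim;
      if (zre2 + zim2 > 4.0)
         break;
      zim = 2.0 * zre * zim + cim;
      zre = zre2 - zim2 + cre;
   }
   return i;
}
\end{verbatim}

In a typical master/worker layout the master hands out row indices and collects finished rows; the large ratio of computation per row to the few bytes exchanged per row is exactly what makes this kind of task scale so well.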



Physical Infrastructure Requirements



Physical Infrastructure Costs

Note well that the recurring costs of operating a node can compete with the cost of the node itself! This favors buying relatively expensive nodes and retiring them quickly once they are obsolete!
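A back-of-the-envelope illustration (the power draw and electric rate are assumptions for the sake of the arithmetic, not measurements): a node that draws around 200 W around the clock consumes roughly
\begin{displaymath}
0.2\ {\rm kW} \times 8766\ {\rm hours/year} \approx 1750\ {\rm kWh/year},
\end{displaymath}
or on the order of $175/year at $0.10 per kWh, before adding cooling, rack space, and administration. Over a three-year service life that recurring cost is a significant fraction of the purchase price of an inexpensive node.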



Administrative Infrastructure

In summary: Min, $\sim$1 hour a day on average for a ``good'' 100+ node cluster; Max, a full-time job and then some for a ``bad'' cluster (depending on luck, hardware reliability, your general admin skills, your cluster admin skills, user support requirements, and the availability of cluster expertise in a distributed support environment).



Conclusions
Total Cost of Ownership (TCO) covers a very wide range, wide enough to provoke TCO fistfights in bars.

Still, beowulfish clusters often yield staggering productivity and efficiency, generally 3-10x better cost/benefit than ``big iron'' of comparable power. So literally everybody is buying or building them.



References and Resources



Conclusion and Example




Koan: Zen Wulf

Computers are linked
Fleeting packets join their power
Data Satori!



