HPC and HTC Concepts
Key Concepts
High Performance Computing (HPC)
High Throughput Computing (HTC)
AKA "Supercomputing"
The Computer
Actual Rackable Server
Faster computing
Reduce time to result
Increase accuracy
Increase throughput
Faster computing
Pressure from computing requirements
Pressure from physics
Pressure from Physics
Physics limits GHz speed
Heat and power-draw increase super-linearly with speed
Pressure from computing requirements
Bottlenecks everywhere
Limits of RAM
Parallelism
Serial Process:
- A process in which its sub-processes happen sequentially in time.
- Only one sub-process is active at any given time.
Parallel Process:
- Process in which multiple sub-processes can be active simultaneously.
Parallelism
Serial Process:
- Speed depends only on the rate at which each sub-process will occur (e.g. processing unit clock speed).
Parallel Process:
- Speed depends on execution rate of each sub-process AND how many sub-processes can be made to occur simultaneously.
Parallel for Capacity
Example: Searching database of web-sites for some text (i.e. Google)
Searching sequentially through large volumes of text too time-consuming.
- Multiple servers hold different pages.
- Each server can report the result of each individual search.
- The more servers you add the quicker the search is.
Parallel for Capability
Example: Large scale weather simulation
Detailed description for atmosphere too large to run on today's desktop or server PCs.
- Multiple servers are needed to hold all grid data in memory.
- Servers need to quickly communicate to synchronise work over entire grid
- Communication between servers can become a bottleneck.
Parallelism
All computers today are designed for parallel execution
But...
Not all computation can be done in parallel!
Amdahl's Law
1 Processor
Amdahl's Law
2 Processors
Amdahl's Law
10 Processors
Amdahl's Law
- Amdahl's law will always be a limiting factor
$\frac{1}{\left(1-P\right)+\frac{P}{N}}$
- Some applications need to be re-written, algorithms re-thought to harness ever more parallel machines.
Amdahl's Law
- If 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20x, no matter how many processors are used.
There are many parallel paradigms
Processor farms, pipelining, divide/conquer, geometric decomposition, cellular automata, algorithmic parallelism
- From a systems perspective:
- Shared memory
- Message passing
SMP (Symmetric Multi-Processing)
SMP (Symmetric Multi-Processing)
Beowulf Cluster
Infiniband Cluster
NUMA (Non-Uniform Memory Access)
Interconnect comparisons
Interconnect | Typical MPI latency (microseconds) | Typical bandwidth (MB/s) |
---|---|---|
1Gbs Ethernet | 60-90 | 90 |
10Gbs Ethernet | 12-20 | 800 |
Infiniband | 2-4 | 250-1200 |
NUMALink 4 | ~1 | 3000 |
QPI | ~0.5 | 20000 |
GPU accelerator
Only a subset of problems can be optimised
Not all problems can be parallelised
Not all parallel problems can be ported to accelerators
In summary
Developments in CPU architecture still continue: Moore’s law still valid
Improvements to CPU architecture consist of increasing internal parallelism: hyper-threading, wider SIMD units, more cores per chip
Accelerators are bringing more computing cycles to the table (albeit of a special type).
And that is just the compute side of the story
Data storage and Network infrastructure need to keep up with computation and other data producing technologies...
upgrading network infrastructure
re-thinking data storage
- SSD’s
- parallel file systems
- Multi-tiered storage
- Map-Reduce data platforms
Supercomputing
Used efficiently, Supercomputers let you get more done faster.
They can be useful for a large number of types of work.
Applications
The Cluster
Overview
The Node
Compute Node
- Just computes - little else
- Private IP address - no user access
Login Node
- User login
- Interaction with job scheduler
- Public IP address - connects to external network
Generic Software stack
Real applications
FEA - Finite Element Analysis
- The simulation of hard physical materials, e.g. metal, plastic
- Crash test, product design, suitability for purpose
- Examples: MSC Nastran, Ansys, LS-Dyna, Abaqus, ESI PAMCrash, Radioss
CFD - Computational Fluid Dynamics
- The simulation of soft physical materials, gases and fluids
- Engine design, airflow, oil reservoir modelling
- Examples: Fluent, Star-CD, CFX
Real applications
Geophysical Sciences
- Seismic Imaging - taking echo traces and building a picture of the sub-earth geology
- Reservoir Simulation - CFD specific to oil asset management
- Examples: Omega, Landmark VIP and Pro/Max, Geoquest Eclipse
Life Sciences
- Understanding the living world - genome matching, protein folding, drug design, bioinformatics, organic chemistry
- Examples: BLAST, Gaussian, LAMMPS, Trinity, Amber, NAMD
Real applications
High Energy Physics
- Understanding the atomic and sub-atomic world
- Software from Fermi-Lab or CERN, or home-grown
Financial Modelling
- Meeting internal and external financial targets particularly regarding investment positions
- VaR - Value at Risk - assessing the impact of economic and political factors on the bank’s investment portfolio
- Trader Risk Analysis - what is the risk on a trader’s position, a group of traders
Operating system
The vast majority of Clusters in the world use some flavour of Unix or Linux for their OS.
The most common form of interaction with these systems is a "shell" or "command line".
(Which we are going to learn about using Legion’s Login nodes.)
Operating system
Clusters are very frequently used as a shared facility.
As such, work needs to be scheduled via a batch system.
Jobs are queued and prioritised based on requested resources.
(These are the focus of tomorrow’s session.)