calculate_pi
- Calculates π by numerically integrating a curve.
\(\int_{0}^{1}\frac{4}{1+x^2}\text{d}x=\pi\)
cd ~/Scratch
cp -r /shared/ucl/apps/examples/calculate_pi_dir ./
cd calculate_pi_dir
make
./calculate_pi
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd
./calculate_pi
#$ -l h_rt=0:15:00
#$ -l memory=1M
#$ -l tmpfs=10G
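Combined into a complete job script, these resource requests might look like this (a sketch using the example values above, not recommended settings):

```shell
#!/bin/bash -l
# Sketch: the earlier script plus the resource requests shown above
#$ -l h_rt=0:15:00
#$ -l memory=1M
#$ -l tmpfs=10G
#$ -cwd
./calculate_pi
```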
| Command | Function |
|:---:|:---|
| qsub | submit job |
| qstat | view queue status and job info |
| qdel | stop & delete a job |
| qrsh | start an interactive session |
$ qsub submit.sh
Your job 3521045 ("submit.sh") has been submitted
$ qsub -terse submit.sh
3521045
Special comments (#$) are options for qsub.
Check man qsub for the full list.
Every cluster is a little different.
$ qstat
job-ID prior name user state submit/start at
-----------------------------------------------------------------
3521045 0.00000 submit.sh ccaaxxx qw 01/14/2014 14:51:54
| Letter | Status |
|---|---|
| q | queued |
| w | waiting |
| r | running |
| E | error |
| t | transferring |
| h | held |
qstat -j 3521045
(gives a lot of output)
(Demo)
Most common problem: the job was not started from ~/Scratch (and thus its working directory is not writable).
Note that qstat -j cuts off the end of the error message; try e.g. qexplain 53893 to see the full error message.
$ qdel 3521045
ccaaxxx has deleted job 3521045
Exercise: Run the simple calculate_pi program as a job.
Exercise: To see what environment variables are set by the scheduler, try making a job script that runs env and puts the output in a file.
Exercise: sort the file, and compare it to your current environment to see what has changed.
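A minimal sketch for the env exercise (the output filename is an illustrative choice, not from the course notes):

```shell
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd
# Write the job's environment to a file, sorted so it can be
# diffed against "env | sort" run on the login node
env | sort > scheduler_env.$JOB_ID.txt
```

Back on the login node, you can then compare with something like: env | sort | diff - scheduler_env.<jobid>.txt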
In your Scratch directory, copy the /shared/ucl/apps/examples/openmp_pi_dir directory, build it with make, and try running it:
cd ~/Scratch
cp -r /shared/ucl/apps/examples/openmp_pi_dir ./
cd openmp_pi_dir
make
./openmp_pi
Request multiple cores on one machine with:
#$ -pe smp 4
Set OMP_NUM_THREADS=4 to tell OpenMP to use only the 4 cores you requested with #$ -pe smp 4, instead of all of them.
Exercise: Try modifying the script from before to run the new program.
Exercise: Run versions with 1, 2, 3, and 4 cores, and compare the timings.
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe smp 4
#$ -cwd
./openmp_pi
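For the 1, 2, 3, and 4 core comparison, one way to avoid editing the thread count in two places is SGE's $NSLOTS variable, which holds the number of slots granted by -pe (a sketch, not the only approach):

```shell
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe smp 4
#$ -cwd
# $NSLOTS is set by the scheduler to the slot count from -pe above,
# so the thread count always matches the reservation when you edit it
OMP_NUM_THREADS=$NSLOTS ./openmp_pi
```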
cd ~/Scratch
cp -r /shared/ucl/apps/examples/mpi_pi_dir ./
cd mpi_pi_dir
make
./mpi_pi
# This won't always work on clusters
#$ -pe mpi 36
This requests 36 cores spread across machines; the scheduler writes the list of allocated nodes into a machines file. Note that each requested core gets the amount of memory requested.
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe mpi 4
#$ -cwd
gerun ./mpi_pi
Exercise: Try modifying the script from before to run the new program, using 4, 8, 12, and 24 cores and the mpi
parallel environment.
#$ -t 3 <- (only runs one task, number 3)
#$ -t 1-3
#$ -t 1-7:2
This queues an array of jobs which only differ in how the $SGE_TASK_ID
variable is set.
Exercise: Try modifying the serial job script (calculate_pi
) to run 4 jobs as an array.
Exercise: calculate_pi
can take an argument to tell it how many steps to use. Try using this with $SGE_TASK_ID
to run using 300, 500, and 700 steps.
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -t 1-4
#$ -cwd
./calculate_pi ${SGE_TASK_ID}0
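For the 300, 500, and 700-step variant, one possible mapping from $SGE_TASK_ID to a step count is simple shell arithmetic (a sketch, not the only approach):

```shell
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -t 1-3
#$ -cwd
# Task IDs 1, 2, 3 map to 300, 500, 700 steps
steps=$((100 + 200 * SGE_TASK_ID))
./calculate_pi $steps
```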
The Lustre parallel filesystem performs worst when creating and writing to lots of little files.
Arrays of jobs often create files like this.
To help performance, run this type of job using the local storage on the node, and copy the files over when the job is complete.
Local Storage: $TMPDIR
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -t 1-40000
#$ -cwd
cd $TMPDIR
$HOME/my_programs/make_lots_of_files \
--some-option=$SGE_TASK_ID
Then either:
cp * $SGE_O_WORKDIR
or
cp -r $TMPDIR $SGE_O_WORKDIR
Or, better for lots of files:
cd $SGE_O_WORKDIR
tar -czf $JOB_ID.$SGE_TASK_ID.tar.gz $TMPDIR
zip -r $JOB_ID.$SGE_TASK_ID.zip $TMPDIR
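Put together, a sketch of the whole pattern (the program path and option are the hypothetical example above; archiving with tar is one of the variants shown):

```shell
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -t 1-40000
#$ -cwd
# Create the many small files on node-local storage, not Lustre
cd $TMPDIR
$HOME/my_programs/make_lots_of_files --some-option=$SGE_TASK_ID
# Pack everything into one archive back in the submission directory
tar -czf $SGE_O_WORKDIR/$JOB_ID.$SGE_TASK_ID.tar.gz .
```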
The modules system helps set up your environment for applications.
Run module avail to see what modules exist.
$ module avail
--- /shared/ucl/apps/modulefiles/core ----
gerun rcps-core/1.0.0
mrxvt/0.5.4 screen/4.2.1
[...]
(Demo)
Most modules add one or more programs to your $PATH.
$ htop
bash: htop: command not found
$ module load htop
$ htop
You will see a colourful interactive process viewer.
$ module unload htop
$ htop
bash: htop: command not found
$ module show htop
(Demo)
Some modules depend on or conflict with other modules:
Module 'a' depends on one of the module(s) 'b'
Module 'a' conflicts with the currently loaded module(s) 'b'
(Demo)
e.g. r/recommended loads a collection of other modules and then the R module.
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd
module unload compilers mpi
module load r/recommended
R --no-save --slave <<EOF >r.output.$JOB_ID
runif(50,0,1)
EOF
(generates a bunch of random numbers)
Other systems (e.g. Emerald) may use a slightly different scheduler system, so the scripts can be slightly different -- consult the relevant documentation.
| SGE | PBS |
|---|---|
| #$ -pe mpi 24 | #PBS -l nodes=2:ppn=12 |
| #$ -pe smp 12 | #PBS -l nodes=1:ppn=12 |
| #$ -l h_rt=1:00:00 | #PBS -l walltime=1:00:00 |
| #$ -l memory=4G | #PBS -l mem=4gb |
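As an illustration, a PBS version of the serial calculate_pi job might look like this (a sketch; consult the target system's documentation for the exact directives):

```shell
#!/bin/bash -l
#PBS -l walltime=0:10:00
#PBS -l nodes=1:ppn=1
# PBS starts jobs in $HOME; change to the submission directory
cd $PBS_O_WORKDIR
./calculate_pi
```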
Legion: https://wiki.rc.ucl.ac.uk/mediawiki-1.23.9/images/a/ad/Legion_ref_sheet.pdf