
Introduction to High Performance and High Throughput Computing

First Day

Exercise

Create and copy a directory tree. Use the command "man cp" for more information.

Okay, so, to create the directories:

mkdir -p one_dir/two_dir/three_dir

Then if you try to copy them, you'll see:

$ cp one_dir one_dir_copy
cp: omitting directory `one_dir'

If you check the man page, you'll see:

       -R, -r, --recursive
              copy directories recursively

Which means you should use:

$ cp -r one_dir one_dir_copy
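
To check that the whole tree really was copied, you can list the copy with find. This is a minimal sketch that works in a throwaway directory made by mktemp; the file example.txt is a hypothetical addition, only there to show that files are copied along with the directories.

```shell
#!/bin/bash
# Work in a temporary directory so none of your real files are touched.
demo_dir=$(mktemp -d)

mkdir -p "$demo_dir/one_dir/two_dir/three_dir"
touch "$demo_dir/one_dir/two_dir/three_dir/example.txt"   # hypothetical file

# -r makes cp descend into the directory instead of omitting it.
cp -r "$demo_dir/one_dir" "$demo_dir/one_dir_copy"

# List every directory and file inside the copy.
copied=$(cd "$demo_dir" && find one_dir_copy | sort)
echo "$copied"

rm -r "$demo_dir"
```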

Exercise

Write a script that creates five directories named calculation_?, where ? is a number.

mk_dirs.sh

#!/bin/bash

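# The four methods below are alternatives: keep only one, because mkdir
# reports an error if a directory already exists.

# Simple method without a loop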
mkdir calculation_1 calculation_2 calculation_3 calculation_4 calculation_5

# Method using loop with all terms written out
for i in 1 2 3 4 5
do
  mkdir calculation_$i
done

# Method using loop running seq to get terms
for i in $(seq 1 5)
do
  mkdir calculation_$i
done

# Method using loop with C syntax
for (( i=1; i<=5; i++ ))
do
  mkdir calculation_$i
done
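
If you run mk_dirs.sh with all four methods left in, only the first one creates anything, and the later mkdir calls complain that the directories already exist. One possible way around that, assuming you want the script to be safe to run twice, is mkdir -p. The sketch below uses a temporary directory and a hypothetical function name, make_dirs.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)

make_dirs() {
  for i in $(seq 1 5)
  do
    # -p: create the directory if needed, and stay quiet if it already exists
    mkdir -p "$demo_dir/calculation_$i"
  done
}

make_dirs
make_dirs                 # the second run succeeds too, because of -p
second_run_status=$?

created=$(ls "$demo_dir")
echo "$created"

rm -r "$demo_dir"
```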

Exercise

Write a parent_script.sh that creates and executes the child_script.sh.

Write a parent_script.sh that creates and executes 10 different child_script.sh files that each print out their individual number.

Okay, so first:

parent_script.sh

#!/bin/bash

echo "Parent script running."

echo "Making child script..."
cat <<EOF >child_script.sh
#!/bin/bash
echo "Child script running."
EOF

chmod +x child_script.sh

echo "Going to run child script."
./child_script.sh
echo "Child script finished."

Second:

parent_script.sh

#!/bin/bash

echo "Parent script running."


for i in $(seq 10)
do
  echo "Making child script number $i in child_script_$i.sh ..."
  # The here-document body and its closing EOF must stay unindented
  cat <<EOF >child_script_$i.sh
#!/bin/bash
echo "Child script $i running."
EOF

  chmod +x child_script_$i.sh
  ./child_script_$i.sh
done
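
Each child prints its own number because the EOF delimiter is not quoted, so the parent fills in $i while it is writing the file. If the delimiter is quoted, the text is written out literally instead. A small sketch of the difference, using a hypothetical value of 7 for i and hypothetical file names:

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
i=7   # hypothetical value standing in for the loop counter

# Unquoted delimiter: $i is expanded now, while the parent writes the file.
cat <<EOF >"$demo_dir/expanded.sh"
echo "Child script $i running."
EOF

# Quoted delimiter: the text is written as-is, so $i would only be
# looked up later, when the child script itself runs.
cat <<'EOF' >"$demo_dir/literal.sh"
echo "Child script $i running."
EOF

expanded=$(cat "$demo_dir/expanded.sh")
literal=$(cat "$demo_dir/literal.sh")
echo "$expanded"
echo "$literal"

rm -r "$demo_dir"
```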

Exercise

In ~/Scratch...

  • Create the following directory tree:

work
work/input_data
work/results
work/program

mkdir work work/input_data work/results work/program

  • Create the file "input.txt" with some random lines in it.

echo "random line 1" >input.txt
echo "random line 2" >>input.txt

  • Move the file to input_data and rename it in the same command to control01.txt.

mv input.txt work/input_data/control01.txt

  • Create the directory tree in one line only: work/experiment/results/report

mkdir -p work/experiment/results/report

  • Delete all directory trees created in one single command without explicit reference to any of the directory and file names except "work".

rm -r work

Exercise

Change the permission of a full directory tree with one single chmod command (look in the man pages for more information).

chmod uses -R, not -r, for recursive action, because chmod -r would be read as a mode change that removes read permission. So this is:

chmod -R g+w a_dir

For example, this gives group write permission to a_dir itself and to every file and directory inside it.
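
To convince yourself that the change reached everything in the tree, you can count the entries that have the group write bit set. This is only a sketch: it builds a small hypothetical tree in a temporary directory, and uses find with -perm -020 (020 being the octal value of the group write bit).

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a_dir/sub_dir"
touch "$demo_dir/a_dir/sub_dir/data.txt"   # hypothetical file

# Start from a known state with no group write permission anywhere.
chmod -R g-w "$demo_dir/a_dir"

# One command changes a_dir, sub_dir, and data.txt.
chmod -R g+w "$demo_dir/a_dir"

# Count entries that have at least the group write bit (octal 020) set.
with_group_write=$(find "$demo_dir/a_dir" -perm -020 | wc -l | tr -d ' ')
echo "Entries with group write permission: $with_group_write"

rm -r "$demo_dir"
```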

When typing the command "ls /sh", press the tab key after typing "/sh". What happens?

It should auto-complete as far as it can without being ambiguous. On Legion and Aristotle, this should expand to /shared.

Exercise

Create a "Hello world"-like script using command line tools and execute it.

Copy and alter your script to redirect output to a file using >.

Alter your script to use >> instead of >. What effect does this have on its behaviour?

So, one file:

hello_world.sh

#!/bin/bash
echo "Hello World!"

Then:

hello_world_redirect_1.sh

#!/bin/bash
echo "Hello world!" > output.txt

Then:

hello_world_redirect_2.sh

#!/bin/bash
echo "Hello world!" >> output_append.txt

output.txt will be overwritten each time the script runs, while output_append.txt gains one more line on every run. This is the difference between > (which empties the file first) and >> (which appends to it).
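
You can see the same effect without running the scripts by hand several times. A minimal sketch, writing to a temporary directory rather than your current one:

```shell
#!/bin/bash
demo_dir=$(mktemp -d)

# Do the same two redirections three times over.
for run in 1 2 3
do
  echo "Hello world!" > "$demo_dir/output.txt"           # replaces the contents
  echo "Hello world!" >> "$demo_dir/output_append.txt"   # adds to the end
done

overwrite_lines=$(wc -l < "$demo_dir/output.txt" | tr -d ' ')
append_lines=$(wc -l < "$demo_dir/output_append.txt" | tr -d ' ')
echo "output.txt has $overwrite_lines line(s)"
echo "output_append.txt has $append_lines line(s)"

rm -r "$demo_dir"
```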

Exercise

Use seq 1 75 > numbers.txt to generate a file containing a list of numbers. Use the head, tail, and less commands to look at it, then use grep to search it for a number.

Use a combination of head and tail to print one exact line, chosen by its line number.

$ seq 1 75 > numbers.txt
$ less numbers.txt
$ head numbers.txt
1
2
3
4
5
6
7
8
9
10
$ tail numbers.txt
66
67
68
69
70
71
72
73
74
75
$ head -n 5 numbers.txt
1
2
3
4
5
$ head -n 5 numbers.txt | tail -n 1
5
$ grep 0 numbers.txt
10
20
30
40
50
60
70
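
The head and tail combination can be wrapped up so it works for any line number. The sketch below uses a hypothetical function name, nth_line, and also shows two related options: sed -n can print a line in one step, and grep -x matches whole lines only, whereas plain grep matches any line containing the pattern.

```shell
#!/bin/bash
numbers_file=$(mktemp)
seq 1 75 > "$numbers_file"

# Print line N of a file: head keeps the first N lines,
# then tail keeps only the last of those.
nth_line() {
  head -n "$1" "$2" | tail -n 1
}

line_42=$(nth_line 42 "$numbers_file")

# sed can do the same: -n turns off automatic printing, 42p prints line 42.
sed_42=$(sed -n '42p' "$numbers_file")

# Plain grep matches any line containing a 7; -c counts those lines.
loose_count=$(grep -c 7 "$numbers_file")

# -x only matches lines that are exactly "7".
exact_match=$(grep -x 7 "$numbers_file")

echo "Line 42 via head and tail: $line_42"
echo "Line 42 via sed: $sed_42"
echo "Lines containing a 7: $loose_count"
echo "Lines that are exactly 7: $exact_match"

rm "$numbers_file"
```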

Exercise

Using two nested scripts, show that the value of an exported variable in the environment where you launch the scripts propagates all the way down to the second script.

So, two files:

parent.sh

#!/bin/bash

echo Setting variables in parent script:

variable_1=ham
variable_2=eggs

export variable_1

echo "variable_1: $variable_1"
echo "variable_2: $variable_2"

echo Running child script...

./child.sh

echo Child script finished, printing out parent script variable values again:
echo "variable_1: $variable_1"
echo "variable_2: $variable_2"

child.sh

#!/bin/bash

echo Child script variable values:
echo "variable_1: $variable_1"
echo "variable_2: $variable_2"

If you run these, you should see that only the exported variable_1 reaches the child script, while variable_2 is empty there:

$ ./parent.sh
Setting variables in parent script:
variable_1: ham
variable_2: eggs
Running child script...
Child script variable values:
variable_1: ham
variable_2:
Child script finished, printing out parent script variable values again:
variable_1: ham
variable_2: eggs
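
The same thing can be tried quickly without creating any files, by using bash -c to start a child shell. This is just a sketch of that shortcut; the square brackets are only there to make an empty value visible.

```shell
#!/bin/bash
variable_1=ham
variable_2=eggs

# Only variable_1 is placed in the environment passed to child processes.
export variable_1

# The single quotes stop this shell from expanding the variables,
# so the child shell is the one that looks them up.
child_output=$(bash -c 'echo "variable_1: [$variable_1] variable_2: [$variable_2]"')
echo "$child_output"
```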

Exercise

Use the command env to discover more.

You should see a long list of variables something like this:

LC_PAPER=en_GB.UTF-8
MODULE_VERSION_STACK=3.2.6
SSH_CONNECTION=128.41.10.106 61693 144.82.108.231 22
MODULESHOME=/shared/ucl/apps/modules/3.2.6/Modules/3.2.6
LESSOPEN=||/usr/bin/lesspipe.sh %s
CC=icc
HOSTNAME=login06

Exercise

Using $1 and $2, write a script that prints both arguments to the screen.

So, for bonus points and to show some other related variables, here's a little extra:

cmd_args.sh

#!/bin/bash

echo "This is the first argument:  $1"
echo "This is the second argument: $2"
echo "This is the number of arguments: $#"
echo "And this is all the arguments: $@"
echo "\$* and \$@ do almost the same thing, but you probably always want to use \$@"
echo "\$* gets expanded slightly differently and can break arguments"
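
To see how the expansions differ, here is a small sketch that counts how many arguments get passed on in each case. The function names count_args and show_difference, and the file names used as arguments, are all made up for this illustration.

```shell
#!/bin/bash

# Report how many arguments this function received.
count_args() {
  echo "$#"
}

show_difference() {
  # "$@" passes each original argument on as its own word.
  with_at=$(count_args "$@")
  # "$*" joins all the arguments into one single word.
  with_star=$(count_args "$*")
  # Without quotes, arguments containing spaces get split apart.
  unquoted=$(count_args $@)
  echo "at=$with_at star=$with_star unquoted=$unquoted"
}

# Two arguments, each containing a space.
result=$(show_difference "first file.txt" "second file.txt")
echo "$result"
```

Only the quoted "$@" form passes on the two arguments as they were given.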

Second Day

Environment within a Job

Exercise

To see what environment variables are set by the scheduler, try making a job script that runs env and puts the output in a file.

You can take the job script from earlier in the notes as a starting point here, which looks like this:

calculate_pi.sh

#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd

./calculate_pi

Then we just want to change it to run env and redirect the standard output (stdout) to a file, using the > operator, like this:

env_to_file.sh

#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd

env >env.output.from_job

Submit this using qsub and you should get a file env.output.from_job containing all the environment variables defined in the shell the job runs in.

Exercise

Now sort the file, and compare it to your current environment to see what has changed.

The method shown here is only one of a few possible ways to do this.

Let’s first sort the file we got from the last step, by putting the file as an argument to the sort command, and redirecting the output into a new file:

sort env.output.from_job >env.output.from_job.sorted

Then get a sorted file containing your current environment:

env | sort >env.output.current.sorted

Now compare the two, side-by-side:

sdiff env.output.from_job.sorted env.output.current.sorted
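
sdiff shows every line, including the ones that are the same in both files (sdiff -s leaves those out). Another option is comm, which can print only the lines unique to one sorted file. The sketch below uses two short made-up files as stand-ins for the real sorted outputs; their contents and names are hypothetical.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)

# Hypothetical, shortened stand-ins for the two sorted environment files.
printf '%s\n' 'HOME=/home/user' 'JOB_ID=12345' 'SHELL=/bin/bash' \
  > "$demo_dir/from_job.sorted"
printf '%s\n' 'HOME=/home/user' 'SHELL=/bin/bash' 'TERM=xterm' \
  > "$demo_dir/current.sorted"

# comm expects sorted input. -23 hides columns 2 and 3, leaving only
# the lines that appear in the first file alone.
only_in_job=$(comm -23 "$demo_dir/from_job.sorted" "$demo_dir/current.sorted")

# -13 leaves only the lines that appear in the second file alone.
only_in_current=$(comm -13 "$demo_dir/from_job.sorted" "$demo_dir/current.sorted")

echo "Only in the job environment: $only_in_job"
echo "Only in the current environment: $only_in_current"

rm -r "$demo_dir"
```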

Requesting Threads

Exercise

Try modifying the script from before to run the new program.

Exercise

Run versions with 1, 2, 3, and 4 cores, and compare the timings.

As before, we can take the previous script and edit it a bit, adding in the new option to request some more cores, getting something like this:

openmp_pi.sh

#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe smp 4
#$ -cwd

./openmp_pi

At this point you can just submit four separate jobs, changing the number after -pe smp to 1, 2, 3, and 4 in turn.

You should find that the time taken is roughly inversely proportional to the number of cores you request: doubling the cores should roughly halve the run time.
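
Rather than editing the script by hand four times, you could generate the four job scripts with a loop, in the same way as the parent and child scripts from the first day. This is only a sketch: the file names openmp_pi_1.sh to openmp_pi_4.sh are made up, the files go into a temporary directory, and the qsub line is left as a comment so that nothing is submitted.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)

for cores in 1 2 3 4
do
  # \$ keeps the scheduler's "#$" prefix literal; $cores is filled in now.
  cat <<EOF >"$demo_dir/openmp_pi_$cores.sh"
#!/bin/bash -l
#\$ -l h_rt=0:10:00
#\$ -pe smp $cores
#\$ -cwd

./openmp_pi
EOF
  # On the cluster you would then submit each script, for example:
  # qsub "$demo_dir/openmp_pi_$cores.sh"
done

script_count=$(ls "$demo_dir" | wc -l | tr -d ' ')
pe_line=$(grep -- '-pe smp' "$demo_dir/openmp_pi_3.sh")
echo "Generated $script_count scripts"
echo "Core request in openmp_pi_3.sh: $pe_line"

rm -r "$demo_dir"
```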

Multinode jobs

Exercise

Try modifying the script from before to run the new program, using 8 cores and the mpi parallel environment.

Since you’re given the 4 core version, it’s just a matter of changing the 4 to an 8. This should give you:

mpi_pi.sh

#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe mpi 8
#$ -cwd

gerun ./mpi_pi

You should find again that increasing this number reduces the time taken, with diminishing returns past a certain point. The output will tell you which nodes your job runs on, so you can tell whether it’s run on more than one actual node or not.

Requesting an Array Job

Exercise

Try modifying the serial job script for calculate_pi to run 4 jobs as an array.

So, all we have to do is add in the array job option to the serial job script, to get:

array_pi.sh

#!/bin/bash -l
#$ -t 1-4
#$ -l h_rt=0:10:00
#$ -cwd

./calculate_pi

Exercise

calculate_pi can take an argument to tell it how many steps to use. Try controlling this with $SGE_TASK_ID.

We’re only using 4 jobs here, and 4 steps of calculating pi is hardly any calculation, so we can do more steps either by altering the way the array tasks are numbered:

# Gives 100,200,300,400 in SGE_TASK_ID variable
#$ -t 100-400:100

Or by adding some zeroes onto the end of the variable where it's used in the script:

array_pi-vary_steps.sh

#!/bin/bash -l
#$ -t 1-4
#$ -l h_rt=0:10:00
#$ -cwd

./calculate_pi ${SGE_TASK_ID}00
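
Appending zeroes only gives multiples of 100. If you want some other scaling, shell arithmetic is one option. In this sketch the function name steps_for_task and the multiplier 250 are arbitrary choices made for illustration, and the loop just stands in for the task IDs the scheduler would supply.

```shell
#!/bin/bash

# Turn a task ID into a step count. $(( )) does integer arithmetic,
# so this multiplies the ID rather than appending digits to it.
steps_for_task() {
  echo $(( $1 * 250 ))
}

# Stand-in for the four array tasks.
for task_id in 1 2 3 4
do
  echo "Task $task_id would use $(steps_for_task "$task_id") steps"
done

steps_task_3=$(steps_for_task 3)

# In the job script itself, the last line might then look like:
# ./calculate_pi "$(steps_for_task "$SGE_TASK_ID")"
```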

Module Prerequisites

Exercise

Successfully load the latest Graphviz module (graphviz/2.38.0/gnu-4.9.2).

This is mostly just an exercise in recognising what the modules system is telling you about which module requires which. Once you work out all the dependencies, you end up typing:

module unload compilers
module load compilers/gnu/4.9.2
module load swig/3.0.5/gnu-4.9.2
module load qt/4.8.6/gnu-4.9.2
module load ghostscript/9.16/gnu-4.9.2
module load python/2.7.9
module load lua/5.3.1
module load perl/5.22.0
module load graphviz/2.38.0/gnu-4.9.2