First Day
Exercise
Create and copy a directory tree. Use the command "man cp" for more information
Okay, so, to create the directories:
mkdir -p one_dir/two_dir/three_dir
Then if you try to copy them, you'll see:
$ cp one_dir one_dir_copy
cp: omitting directory `one_dir'
If you check the man page, you'll see:
-R, -r, --recursive
copy directories recursively
Which means you should use:
$ cp -r one_dir one_dir_copy
Exercise
Write a script that creates five directories named calculation_?, where ? is a number.
mk_dirs.sh
#!/bin/bash
# Dumb method without loop
mkdir calculation_1 calculation_2 calculation_3 calculation_4 calculation_5
# Method using loop with all terms written out
for i in 1 2 3 4 5
do
mkdir calculation_$i
done
# Method using loop running seq to get terms
for i in $(seq 1 5)
do
mkdir calculation_$i
done
# Method using loop with C syntax
for (( i=1; i<=5; i++ ))
do
mkdir calculation_$i
done
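For completeness, bash brace expansion gives an even shorter one-liner (a bash feature, not something the exercise requires):
mkdir calculation_{1..5}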
Exercise
Write a parent_script.sh that creates and executes the child_script.sh.
Write a parent_script.sh that creates and executes 10 different child_script.sh scripts that print out their individual numbers.
Okay, so first:
parent_script.sh
#!/bin/bash
echo "Parent script running."
echo "Making child script..."
cat <<EOF >child_script.sh
#!/bin/bash
echo "Child script running."
EOF
chmod +x child_script.sh
echo "Going to run child script."
./child_script.sh
echo "Child script finished."
Second:
parent_script.sh
#!/bin/bash
echo "Parent script running."
for i in $(seq 10)
do
echo "Making child script number $i in child_script_$i.sh ..."
cat <<EOF >child_script_$i.sh
#!/bin/bash
echo "Child script $i running."
EOF
chmod +x child_script_$i.sh
./child_script_$i.sh
done
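Note that the heredoc delimiter (EOF) is deliberately left unquoted, so $i is expanded by the parent while each child script is being written, hard-coding its number into the file. If you quoted the delimiter, the children would all contain a literal $i instead, as in this small sketch (quoting_demo.sh is just a made-up name for illustration):
cat <<'EOF' >quoting_demo.sh
#!/bin/bash
# Because the delimiter above is quoted, $i is written out literally, not expanded
echo "Child script $i running."
EOF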
Exercise
In ~/Scratch ...
- Create the following directory tree:
work
work/input_data
work/results
work/program
mkdir work work/input_data work/results work/program
- Create the file "input.txt" with some random lines in it.
echo "random line 1" >input.txt
echo "random line 2" >>input.txt
- Move the file to input_data and rename it in the same command to control01.txt
mv input.txt work/input_data/control01.txt
- Create the directory tree in one line only:
work/experiment/results/report
mkdir -p work/experiment/results/report
- Delete all directory trees created in one single command without explicit reference to any of the directory and file names except "work".
rm -r work
Exercise
Change the permission of a full directory tree with one single chmod command (look in the man pages for more information).
Because chmod -r would be interpreted as removing read permission from a file, chmod only uses the capital -R for recursive action, so this is:
chmod -R g+w a_dir
for example, to give group write permissions to every file in a directory and that directory itself.
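To check the result afterwards (a_dir here is just the example name from above):
ls -ld a_dir
ls -lR a_dir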
Exercise
When typing the command "ls /sh", press the tab key after typing "/sh". What happens?
It should auto-complete as far as it can without being ambiguous. On Legion and Aristotle, this should expand to /shared.
Exercise
Create a "Hello world"-like script using command line tools and execute it.
Copy and alter your script to redirect output to a file using >.
Alter your script to use >> instead of >. What effect does this have on its behaviour?
So, one file:
hello_world.sh
#!/bin/bash
echo "Hello World!"
Then:
hello_world_redirect_1.sh
#!/bin/bash
echo "Hello world!" > output.txt
Then:
hello_world_redirect_2.sh
#!/bin/bash
echo "Hello world!" >> output_append.txt
output.txt will be overwritten each time, while output_append.txt will be appended to, because of the difference between > and >>.
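One quick way to see the difference for yourself (assuming both scripts above are in the current directory and executable):
./hello_world_redirect_1.sh
./hello_world_redirect_1.sh
./hello_world_redirect_2.sh
./hello_world_redirect_2.sh
wc -l output.txt output_append.txt
After running each script twice, output.txt should still contain a single line, while output_append.txt should contain two.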
Exercise
Use seq 1 75 > numbers.txt to generate a file containing a list of numbers. Use the head, tail, and less commands to look at it, then use grep to search it for a number.
Use a combination of head and tail to get an exact line number.
$ seq 1 75 > numbers.txt
$ less numbers.txt
$ head numbers.txt
1
2
3
4
5
6
7
8
9
10
$ tail numbers.txt
66
67
68
69
70
71
72
73
74
75
$ head -n 5 numbers.txt
1
2
3
4
5
$ head -n 5 numbers.txt | tail -n 1
5
$ grep 0 numbers.txt
10
20
30
40
50
60
70
Exercise
Using two nested scripts, show that the value of an exported variable in the environment where you launch the scripts propagates all the way down to the second script.
So, two files:
parent.sh
#!/bin/bash
echo Setting variables in parent script:
variable_1=ham
variable_2=eggs
export variable_1
echo "variable_1: $variable_1"
echo "variable_2: $variable_2"
echo Running child script...
./child.sh
echo Child script finished, printing out parent script variable values again:
echo "variable_1: $variable_1"
echo "variable_2: $variable_2"
child.sh
#!/bin/bash
echo Child script variable values:
echo "variable_1: $variable_1"
echo "variable_2: $variable_2"
If you run these, you should see:
$ ./parent.sh
Setting variables in parent script:
variable_1: ham
variable_2: eggs
Running child script...
Child script variable values:
variable_1: ham
variable_2:
Child script finished, printing out parent script variable values again:
variable_1: ham
variable_2: eggs
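The thing to notice is that variable_1 reaches the child script because it was exported, while variable_2 does not. If you wanted variable_2 to propagate as well, you would export it too, for example by adding this line to the parent script:
export variable_2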
Exercise
Use the command env to discover more.
You should see a long list of variables something like this:
LC_PAPER=en_GB.UTF-8
MODULE_VERSION_STACK=3.2.6
SSH_CONNECTION=128.41.10.106 61693 144.82.108.231 22
MODULESHOME=/shared/ucl/apps/modules/3.2.6/Modules/3.2.6
LESSOPEN=||/usr/bin/lesspipe.sh %s
CC=icc
HOSTNAME=login06
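That list is usually too long to read comfortably, so it can help to filter it; for example, to pick out one of the variables shown above:
env | grep HOSTNAME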
Exercise
Using $1 and $2, write a script that prints both variables to the screen.
So, for bonus points and to show some other related variables, here's a little extra:
cmd_args.sh
#!/bin/bash
echo "This is the first argument: $1"
echo "This is the second argument: $2"
echo "This is the number of arguments: $#"
echo "And this is all the arguments: $@"
echo "\$* and \$@ do almost the same thing, but you probably always want to use \$@"
echo "\$* gets expanded slightly differently and can break arguments"
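Running it with a couple of example arguments (these particular values are just for illustration) shows how they are picked up:
./cmd_args.sh ham "spam and eggs"
Quoting the second argument keeps "spam and eggs" together as a single argument, so $# reports 2 rather than 4.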
Second Day
Environment within a Job
Exercise
To see what environment variables are set by the scheduler, try making a job script that runs env and puts the output in a file.
You can take the job script from earlier in the notes as a starting point here, which looks like this:
calculate_pi.sh
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd
./calculate_pi
Then we just want to change it to run env
and redirect the standard output (stdout) to a file, using the >
operator, like this:
env_to_file.sh
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -cwd
env >env.output.from_job
Submit this using qsub
and you should have a file env.output.from_job
containing all the environment variables defined in the shell the job runs in.
Exercise
Now sort the file, and compare it to your current environment to see what has changed.
The method shown here is only one of a few possible ways to do this.
Let’s first sort the file we got from the last step, by putting the file as an argument to the sort
command, and redirecting the output into a new file:
sort env.output.from_job >env.output.from_job.sorted
Then get a sorted file containing your current environment:
env | sort >env.output.current.sorted
Now compare the two, side-by-side:
sdiff env.output.from_job.sorted env.output.current.sorted
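If the side-by-side output is too wide for your terminal, a plain diff of the two sorted files shows the same differences in a more compact form:
diff env.output.from_job.sorted env.output.current.sorted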
Requesting Threads
Exercise
Try modifying the script from before to run the new program.
Exercise
Run versions with 1, 2, 3, and 4 cores, and compare the timings.
As before, we can take the previous script and edit it a bit, adding in the new option to request some more cores, getting something like this:
openmp_pi.sh
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe smp 4
#$ -cwd
./openmp_pi
At this point you can just submit 4 different jobs, setting the number of cores requested by the smp option in each.
You should find that the time taken decreases roughly in proportion to the number of cores you request.
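Rather than editing the script for each run, one option (assuming your scheduler behaves like a standard SGE install) is to override the directive when you submit, since options given on the qsub command line take precedence over the #$ lines in the script:
for n in 1 2 3 4
do
    qsub -pe smp $n openmp_pi.sh
done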
Multinode jobs
Exercise
Try modifying the script from before to run the new program, using 8 cores and the mpi parallel environment.
Since you’re given the 4 core version, it’s just a matter of changing the 4 to an 8. This should give you:
mpi_pi.sh
#!/bin/bash -l
#$ -l h_rt=0:10:00
#$ -pe mpi 8
#$ -cwd
gerun ./mpi_pi
You should find again that increasing this number reduces the time taken, with diminishing returns past a certain point. The output will tell you which nodes your job runs on, so you can tell whether it’s run on more than one actual node or not.
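If you also want to check the node list from inside the job itself, one way (an extra step, not something the exercise asks for) is to print the hostfile that SGE provides to parallel-environment jobs by adding this line to the job script:
# List the nodes (and slots per node) allocated to this job
cat $PE_HOSTFILE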
Requesting an Array Job
Exercise
Try modifying the serial job script for calculate_pi to run 4 jobs as an array.
So, all we have to do is add in the array job option to the serial job script, to get:
array_pi.sh
#!/bin/bash -l
#$ -t 1-4
#$ -l h_rt=0:10:00
#$ -cwd
./calculate_pi
Exercise
calculate_pi can take an argument to tell it how many steps to use. Try controlling this with $SGE_TASK_ID.
We’re only using 4 jobs here, and 4 steps of calculating pi is hardly any calculation, so we can do more steps either by altering the way the array tasks are numbered:
# Gives 100,200,300,400 in SGE_TASK_ID variable
#$ -t 100-400:100
Or by adding some zeroes onto the end of the variable where it's used in the script:
array_pi-vary_steps.sh
#!/bin/bash -l
#$ -t 1-4
#$ -l h_rt=0:10:00
#$ -cwd
./calculate_pi ${SGE_TASK_ID}00
Module Prerequisites
Exercise
Successfully load the latest Graphviz module (
graphviz/2.38.0/gnu-4.9.2
).
This is mostly just an exercise in recognising what the modules system is telling you about what requires what. Once you work out all the dependencies, you end up needing to type:
module unload compilers
module load compilers/gnu/4.9.2
module load swig/3.0.5/gnu-4.9.2
module load qt/4.8.6/gnu-4.9.2
module load ghostscript/9.16/gnu-4.9.2
module load python/2.7.9
module load lua/5.3.1
module load perl/5.22.0
module load graphviz/2.38.0/gnu-4.9.2
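Once everything has loaded without complaint, you can check the result with something like:
module list
dot -V
dot is one of the Graphviz layout programs, so if the module is set up correctly this should print the Graphviz version.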