This course page was updated until March 2022 when I left Durham University. For future updates, please visit the new version of the course pages.
Accessing Durham Supercomputing facilities #
Most of the exercises in this course will require that you use one of Durham’s supercomputing systems. This will either be Hamilton, or (for some Physics students) COSMA. If you’re signed up on the course, you should have been given access to Hamilton semi-automatically (I signed everyone up).
Access to Hamilton #
For many of the exercises in the course, we will be using the Hamilton supercomputer. If you are registered for the course on Blackboard, you should have been given access to Hamilton.
Since we’ll be logging in a lot, I provide some tips on how to configure ssh for swifter logins.
Access to COSMA #
Some of you may have access to the Physics-run COSMA system. You can also use this system; the same information about setting up ssh logins applies.
The rest of the guide is a quick start on using supercomputing systems where you must compile code and then submit it via a batch scheduler. The focus is on Hamilton, but COSMA uses the same scheduling system, so most things will work with only minor changes.
Supercomputing Durham: Hamilton Quick Start Guide #
This is adapted from a quickstart guide by Tobias Weinzierl.
It is intended to get you up and running on the Hamilton supercomputer quickly. It is not a replacement for the official documentation.
Logging in and transferring code #
You access Hamilton via ssh with
$ ssh USERNAME@hamilton.dur.ac.uk
where USERNAME is your CIS username. When writing commands to execute in the shell, I use a $ to indicate the prompt; you should not type this character. Since we’ll do this a lot, you can also see some tips on how to configure ssh for swifter logins.
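As a concrete example, here is a minimal sketch of an entry you could add to ~/.ssh/config on your local machine; the alias name hamilton is just an illustrative choice, and USERNAME is again your CIS username.
Host hamilton
    HostName hamilton.dur.ac.uk
    User USERNAME
With this in place, ssh hamilton is equivalent to the longer command above.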
If you can’t log in to Hamilton, contact me so we can sort things out.
Hamilton doesn’t mount any of the Durham shared drives, so you have to transfer any files you need manually. You can do this with scp. For example, running the following on your local machine
$ scp somefile.c USERNAME@hamilton.dur.ac.uk:
copies somefile.c into your home directory on Hamilton. The other option is to download files directly while you are logged in to Hamilton. Some of the exercises in the course provide more details on how to do this.
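The same scp pattern also works in reverse for fetching results back to your machine. As a sketch, with hypothetical file and directory names:
$ scp USERNAME@hamilton.dur.ac.uk:results.dat .
$ scp -r mycode/ USERNAME@hamilton.dur.ac.uk:
The first command copies results.dat from your Hamilton home directory into the current directory on your local machine; the second copies a whole local directory (mycode/) to Hamilton using the -r (recursive) flag.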
Compilation environment #
As is common with supercomputers, there are many different compiler versions available on Hamilton. These are managed with environment modules so that different Hamilton users can control which compilers and tools they get.
Often in this course we’ll use the Intel compiler, for which we need to load two modules:
$ module load intel/xe_2018.2
$ module load gcc/9.3.0
This makes the Intel compiler tools available and loads a recent version of gcc. After executing these commands, you can check the versions you have:
$ icc --version
icc (ICC) 18.0.2 20180210
Copyright (C) 1985-2018 Intel Corporation. All rights reserved.
$ gcc --version
gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The exercises will typically enumerate the modules you need.
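If you are unsure what is installed, or what you currently have loaded, the environment modules system provides standard subcommands for this; for example:
$ module avail
$ module list
$ module unload gcc/9.3.0
Here module avail lists the modules available on the system, module list shows what is loaded in your current session, and module unload removes a module you no longer want.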
Running code #
When we log in to the Hamilton system, we access the “login” node. You can use this to compile code and run some profile-analysis programs; however, DO NOT use it to run your simulations. In common with most supercomputers, Hamilton consists of two parts.
- Login nodes (this is where we’ve been so far);
- Compute nodes (this is where you want to run your code).
To run code on the compute nodes we need to submit a job to the scheduler. This program takes care of allocating simulations to the compute nodes to maximise throughput for all users of the system. On Hamilton, the scheduler is called Slurm.
To use the scheduler we have to create a script that provides a recipe for Hamilton to run our code. This is a shell script containing specially formatted comments that pass options to the scheduler. The individual exercises contain some examples, as does the Hamilton documentation. Here is a simple example for a serial job.
#!/bin/bash
#SBATCH --job-name="myjob"
#SBATCH -o myjob.%A.out
#SBATCH -e myjob.%A.err
#SBATCH -p test.q
#SBATCH -t 00:05:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=24
#SBATCH --mail-type=ALL
#SBATCH --mail-user=YOUREMAIL@durham.ac.uk
source /etc/profile.d/modules.sh
module load intel/xe_2018.2
module load gcc/9.3.0
./myexecutable
Some things to note. This is a shell script, executed with bash (as indicated by the shebang line). Lines beginning with #SBATCH are parsed by the job submission command sbatch and are used to provide options to it. Here we selected a particular queue, test.q, and said the job will run for a maximum of five minutes (-t 00:05:00). The other options control the size of the job and where output is sent. Run man sbatch on the Hamilton login node to see details of these flags.
A typical reason your job might fail is that you did not load the necessary modules, so don’t forget to do so! Loading the modules in the script is also useful for reproducibility, since it records exactly the software you used to produce your results.
After all the comments we have the commands that will be run on the compute node. The first makes the module command available. We then load the same modules we used during compilation. Potentially we might also load other modules here to gain access to profiling commands. Finally comes the command to run our code (here named myexecutable).
Having saved this submission script (say as myjob.slurm), we submit it with sbatch:
$ sbatch myjob.slurm
This job is now submitted to the queue and will run when a slot is available.
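Anything the program prints ends up in the files named by the -o and -e options in the script, with the %A placeholder replaced by the job ID that Slurm assigns. For example, if the job were given the (made-up) ID 12345, you could inspect the output with:
$ cat myjob.12345.out
$ cat myjob.12345.err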
You can see what jobs you currently have in the queue with
$ squeue -u $USER
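If you spot a mistake after submitting, you can remove a pending or running job from the queue with scancel, passing the job ID that squeue reports (again, 12345 is a made-up example):
$ scancel 12345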
For more details on using Hamilton, you’re encouraged to check their documentation. We’ll also recapitulate aspects of this guide when we carry out the exercises.