This course page was updated until March 2022 when I left Durham University. For future updates, please visit the new version of the course pages.
Parallelisation of a simple loop #
As usual, we’ll be running these exercises on Hamilton or COSMA, so remind yourself of how to log in and transfer code if you need to.
Obtaining the code #
The code for this exercise is in the code/add_numbers subdirectory of the repository.
We’ll be working in the openmp subdirectory.
Working from the repository: switch back to your starting branch again and create a new branch for this exercise.
Parallelising the loop #
Compile and run the code with OpenMP enabled.
Try running with different numbers of threads. Does the runtime change?
You should use a reasonably large value for the problem size (the number of values to add).
Check the add_numbers routine in add_numbers.c. Annotate it with appropriate OpenMP pragmas to parallelise the loop.
Does the code now have different runtimes when using different numbers of threads?
This code can be parallelised using a simple parallel for. We annotate the for loop with:
#pragma omp parallel for default(none) shared(n_numbers, numbers) reduction(+:result) schedule(static)
If I do this, I see that the code now takes less time with more threads.
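To show where the pragma above sits, here is a minimal sketch of an annotated summation routine. The function signature and the types are assumptions for illustration only; the actual routine in add_numbers.c may look different. Only the names n_numbers, numbers and result are taken from the pragma itself.

```c
#include <stddef.h>

/* Sum the first n_numbers entries of numbers.
 * NOTE: this signature is an assumption, not the one from add_numbers.c. */
double add_numbers(const double *numbers, size_t n_numbers)
{
  double result = 0.0;
  /* Each thread accumulates into its own private copy of result.
   * The reduction clause combines the private copies with + when the
   * loop finishes, so there is no race on result. */
#pragma omp parallel for default(none) shared(n_numbers, numbers) reduction(+:result) schedule(static)
  for (size_t i = 0; i < n_numbers; i++) {
    result += numbers[i];
  }
  return result;
}
```

If the compiler is invoked without OpenMP enabled, the pragma is ignored and the loop simply runs serially, which is a convenient way to check that the parallel version still produces the same sum.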
Different schedules #
Experiment with different loop schedules. Which work best? Which work worst?
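One possible way to try several schedules without recompiling each time is the schedule(runtime) clause, which defers the choice to the OMP_SCHEDULE environment variable. The sketch below assumes a summation loop over an array of doubles; the function name add_numbers_runtime and its signature are hypothetical and not part of the exercise code.

```c
#include <stddef.h>

/* Hypothetical variant of the summation loop whose schedule is chosen
 * when the program runs rather than when it is compiled. */
double add_numbers_runtime(const double *numbers, size_t n_numbers)
{
  double result = 0.0;
  /* schedule(runtime) reads the schedule from the OMP_SCHEDULE
   * environment variable, for example "static", "dynamic,1000"
   * or "guided". */
#pragma omp parallel for default(none) shared(n_numbers, numbers) reduction(+:result) schedule(runtime)
  for (size_t i = 0; i < n_numbers; i++) {
    result += numbers[i];
  }
  return result;
}
```

With this in place, setting OMP_SCHEDULE (together with OMP_NUM_THREADS) in the job script before each run selects the schedule. If OMP_SCHEDULE is unset, the schedule used is implementation defined.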
Produce a strong scaling plot for the computation as a function of the number of threads using the different schedules you investigated.
Don’t forget to do this on a compute node (by submitting a job script) to avoid timing variability.
What do you observe?
This is what I get for some different schedules when computing on a vector of one million numbers. Note that I did not repeat the runs, so these timings are not averaged to smooth out timing variability.
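When turning measured runtimes into a strong scaling plot, it can help to plot speedup or parallel efficiency rather than raw time. The helpers below are a small sketch of the standard definitions; the function and parameter names are made up for illustration.

```c
/* Speedup on p threads relative to one thread: S(p) = T(1) / T(p). */
double speedup(double t_one, double t_p)
{
  return t_one / t_p;
}

/* Parallel efficiency: E(p) = S(p) / p.
 * A value of 1.0 corresponds to ideal strong scaling. */
double efficiency(double t_one, double t_p, int n_threads)
{
  return speedup(t_one, t_p) / (double)n_threads;
}
```

For a fixed problem size, ideal strong scaling means the speedup grows linearly with the number of threads, so the efficiency stays at 1. Efficiencies well below 1 usually point to overheads such as scheduling cost or limited memory bandwidth.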