# Simple MPI parallelism #

In this exercise we’re going to compute an approximation to the value of π using a simple Monte Carlo method. We do this by noticing that if we randomly throw darts at a square, the fraction of the time they will fall within the incircle approaches π.

Consider a square with side-length $2r$ and an inscribed circle with radius $r$.

The ratio of areas is

$$ \frac{A_\text{circle}}{A_\text{square}} = \frac{\pi r^2}{4 r^2} = \frac{\pi}{4}. $$

If we therefore draw $X$ uniformly at random from the distribution $\mathcal{U}(0, r) \times \mathcal{U}(0, r)$, then the probability that \(X\) is in the circle is

$$ p_\text{in} = \frac{\pi}{4}. $$

We can therefore approximate π by picking $N_\text{total}$ random points and counting the number, $N_\text{circle}$, that fall within the circle

$$ \pi_\text{numerical} = 4 \frac{N_\text{circle}}{N_\text{total}} $$

## Obtaining the code #

The code for this exercise lives in the `code/parallel/pi/`

subdirectory in the repository, as `pi.py`

.

## Working from the repository

I provide a simple serial implementation that uses numpy to generate the random numbers.

## Exercise

Adapt the code so that it runs for a range of different choices of the number of samples,

`N`

. Plot the error in the estimated value of $\pi$ as a function of`N`

.What relationship do you observe between the accuracy of the approximate result and

`N`

?## Hint

A double-precision approximation to $\pi$ is available as`numpy.pi`

.

## Parallelisation with MPI #

We’re now going to parallelise this computation with MPI.

If you’re running on Hamilton don’t forget to load the right modules

`python/3.6.8 gcc/8.2.0 intelmpi/gcc/2019.6`

The code already imports `mpi4py`

, but does not distribute the work.

## Exercise

Adapt the

`run`

function so that the total samples are (approximately) evenly distributed between all the ranks in the given communicator.You’ll now have each process computing a partial answer, so combine them with

`MPI_Allreduce`

.

### Parallelising the random number generation #

Running this code in parallel presents us with a slight problem. We need to think about how to provide statistically independent random number streams on the different processes.

Fortunately, modern versions of numpy have us covered. Their documentation describes how to obtain random numbers in parallel.

## Exercise

Replace the use of the`default_rng`

generator with a Generator that will produce a different stream on each process. Remember that the`comm.rank`

is unique to each process in the communicator.

### Benchmarking #

We’ll now briefly look at how this code scales, by carrying out some simple strong and weak scaling tests.

## Exercise

Use

`MPI.Wtime()`

to measure the length of time the`run`

function takes on each process.Use the maximum over all processes for your plots.

- Produce a strong scaling plot using a total of $N=10^8$ points.
- Produce weak scaling plots using $N=10^4$, $N=10^5$, $N=10^6$, and $N=10^7$ points per process.
What observations do you make about the scaling behaviour?