This is an exercise from the Extras part of the UvA HPC course tutorial of 2021-01-22.
In this advanced part of our HPC Cloud tutorial we ask you to play around with a parallel processing technique on a shared-memory system. For this purpose, we will be running a Monte Carlo simulation to calculate an approximation of the value of π.
NOTE:
You are now in the advanced section of the workshop. You have your laptop and an Internet connection. We expect you to find out on your own about things that we explain only briefly or not at all but which you think you need. For example, if we were you, at this point we would have already googled for several things:
- Monte Carlo simulation
- Monte Carlo pi
- OpenMP cheatsheet
We provide you with an implementation of that simulation using OpenMP. You will be asked to perform multiple runs of each program, so that fluctuations caused by e.g. the network can be averaged out. The output of each program includes the run time measured as wall-clock, user and system time.
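If the Monte Carlo approach is new to you: the idea is to sample many points in the unit square and count how many fall inside the quarter circle of radius 1; that fraction approaches π/4. The following is a minimal serial sketch of that idea in C, written only for this explanation; the gridpi programs you will download below may generate their points differently (the name and the POINTS_ON_AXIS parameter suggest a regular grid rather than random samples) and will certainly look different.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const long samples = 10000000;               /* number of random points to draw */
    long inside = 0;

    srand(42);                                   /* fixed seed: reproducible runs */
    for (long i = 0; i < samples; i++) {
        double x = (double)rand() / RAND_MAX;    /* random point in the unit square */
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)                /* does it fall inside the quarter circle? */
            inside++;
    }

    printf("pi is approximately %f\n", 4.0 * (double)inside / (double)samples);
    return 0;
}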
This exercise will let you use OpenMP within a single multicore VM, starting from a serial implementation and moving on to different parallel implementations. Please observe whether the differences between the scenarios below are significant.
Create a template that will use your existing Course Image, launch a VM from that template, and install the build tools on it:
sudo apt-get update && sudo apt-get install build-essential
Optionally, verify that gcc and GNU make are installed, and check their versions, with gcc -v and make -v respectively.
Download and unpack the exercise code, then list its contents:
wget http://doc.hpccloud.surfsara.nl/UvA-20210122/code/gridpi-mp.tar
tar -xvf gridpi-mp.tar
cd gridpi-mp/
ls -l
gridpi-serial.c calculates π in a simple, serial implementation. Have a look inside the file, e.g. cat gridpi-serial.c
Compile the gridpi-serial.c program:
gcc -std=c99 -Wall -Werror -pedantic gridpi-serial.c -o gridpi-serial
Run it a few times and note the reported times:
./gridpi-serial
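To help you read the file, this is roughly the shape such a serial, grid-based approximation usually has: visit every point of a regular grid in the unit square and count those that lie inside the quarter circle. This is our own sketch, not the actual contents of gridpi-serial.c:

#include <stdio.h>

#define POINTS_ON_AXIS 20000   /* grid resolution per axis; the real value in the file may differ */

int main(void)
{
    long inside = 0;
    const double step = 1.0 / POINTS_ON_AXIS;

    /* visit every point of a POINTS_ON_AXIS x POINTS_ON_AXIS grid in the unit square */
    for (int i = 0; i < POINTS_ON_AXIS; i++) {
        for (int j = 0; j < POINTS_ON_AXIS; j++) {
            double x = (i + 0.5) * step;       /* midpoint of the grid cell */
            double y = (j + 0.5) * step;
            if (x * x + y * y <= 1.0)          /* inside the quarter circle? */
                inside++;
        }
    }

    /* the fraction of grid points inside the quarter circle approximates pi/4 */
    printf("pi is approximately %f\n",
           4.0 * (double)inside / ((double)POINTS_ON_AXIS * POINTS_ON_AXIS));
    return 0;
}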
Food for brain b1:
- Do you see significant differences between real and user time? Can you explain?
gridpi-mp-simple.c is a first pass at adapting gridpi-serial.c to use OpenMP. Compile the gridpi-mp-simple.c program:
gcc -std=c99 -Wall -Werror -pedantic -fopenmp gridpi-mp-simple.c -lm -o gridpi-mp-simple
Run it, again a few times:
./gridpi-mp-simple
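As a reference while you compare the two sources: a first, naive OpenMP version of such a counting loop typically wraps the loop in a parallel region and protects the shared counter with a critical section. The snippet below illustrates that pattern with the same hypothetical grid loop as above; gridpi-mp-simple.c may well differ in its details.

#include <stdio.h>
#include <omp.h>

#define POINTS_ON_AXIS 20000

int main(void)
{
    long inside = 0;                           /* shared between all threads */
    const double step = 1.0 / POINTS_ON_AXIS;

    #pragma omp parallel
    {
        #pragma omp single
        printf("running with %d threads\n", omp_get_num_threads());

        /* the iterations of the outer loop are divided over the threads */
        #pragma omp for
        for (int i = 0; i < POINTS_ON_AXIS; i++) {
            for (int j = 0; j < POINTS_ON_AXIS; j++) {
                double x = (i + 0.5) * step;   /* x and y are private: declared inside the loop */
                double y = (j + 0.5) * step;
                if (x * x + y * y <= 1.0) {
                    /* naive: every update of the shared counter is serialised */
                    #pragma omp critical
                    inside++;
                }
            }
        }
    }

    printf("pi is approximately %f\n",
           4.0 * (double)inside / ((double)POINTS_ON_AXIS * POINTS_ON_AXIS));
    return 0;
}

With the usual defaults, OpenMP starts one thread per available core, so a printed thread count like the one above is a quick way to answer the first question below.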
Food for brain c1:
- How many threads are running?
- Can you explain the differences in the code between this file and that of the previous exercise? In particular:
- What runs in parallel? What not?
- Which variables are used where?
gridpi-mp-alt.c tries to optimise the approach of gridpi-mp-simple.c. Compile the gridpi-mp-alt.c program:
gcc -std=c99 -Wall -Werror -pedantic -fopenmp gridpi-mp-alt.c -lm -o gridpi-mp-alt
Run it a few times:
./gridpi-mp-alt
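A common way to remove the contention of the naive version is to let every thread count into a private variable and combine the per-thread totals only once, at the end of the parallel region. Again, this is a sketch of that general idea, not necessarily what gridpi-mp-alt.c actually does:

#include <stdio.h>
#include <omp.h>

#define POINTS_ON_AXIS 20000

int main(void)
{
    long inside = 0;                           /* shared total */
    const double step = 1.0 / POINTS_ON_AXIS;

    #pragma omp parallel
    {
        long local_inside = 0;                 /* private per-thread counter: no contention in the loop */

        #pragma omp for
        for (int i = 0; i < POINTS_ON_AXIS; i++) {
            for (int j = 0; j < POINTS_ON_AXIS; j++) {
                double x = (i + 0.5) * step;
                double y = (j + 0.5) * step;
                if (x * x + y * y <= 1.0)
                    local_inside++;
            }
        }

        /* combine the per-thread totals once, instead of once per point */
        #pragma omp atomic
        inside += local_inside;
    }

    printf("pi is approximately %f\n",
           4.0 * (double)inside / ((double)POINTS_ON_AXIS * POINTS_ON_AXIS));
    return 0;
}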
Food for brain d1:
- How many threads are running?
- Can you explain the differences in the code between this file and those of previous exercises b) and c)? In particular:
- What runs in parallel? What not?
- Which variables are used where?
gridpi-mp-reduction.c uses another approach to optimise gridpi-mp-simple.c. Compile the gridpi-mp-reduction.c program:
gcc -std=c99 -Wall -Werror -pedantic -fopenmp gridpi-mp-reduction.c -lm -o gridpi-mp-reduction
Run it a few times:
./gridpi-mp-reduction
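The file name suggests OpenMP's reduction clause, which automates the combining step: each thread accumulates into a private copy of the counter and the runtime adds the copies together when the loop finishes. A sketch of that pattern, using the same hypothetical grid loop as before:

#include <stdio.h>
#include <omp.h>

#define POINTS_ON_AXIS 20000

int main(void)
{
    long inside = 0;
    const double step = 1.0 / POINTS_ON_AXIS;

    /* reduction(+:inside): every thread gets a private copy of inside,
       and OpenMP sums the copies automatically at the end of the loop */
    #pragma omp parallel for reduction(+:inside)
    for (int i = 0; i < POINTS_ON_AXIS; i++) {
        for (int j = 0; j < POINTS_ON_AXIS; j++) {
            double x = (i + 0.5) * step;
            double y = (j + 0.5) * step;
            if (x * x + y * y <= 1.0)
                inside++;
        }
    }

    printf("pi is approximately %f\n",
           4.0 * (double)inside / ((double)POINTS_ON_AXIS * POINTS_ON_AXIS));
    return 0;
}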
Food for brain e1:
- How many threads are running?
- Can you explain the differences in the code between this file and those of previous exercises b), c) and d)? In particular:
- What runs in parallel? What not?
- Which variables are used where?
Food for brain e2:
Replace your VM with one that has more cores (hint: make a new template or update the current one). Then run some batches of each of the exercises b), c), d) and e) again; the snippet below shows one way to check how many cores and threads the new VM actually offers.
- How do times with more cores compare to those before?
- Does the performance scale for all of the implementations? Is there a number of cores beyond which scaling further ceases to make sense? Can you explain?
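With the usual defaults, OpenMP uses one thread per core it can see, and you can override that with the OMP_NUM_THREADS environment variable (e.g. OMP_NUM_THREADS=4 ./gridpi-mp-reduction). Here is a tiny helper of our own, not part of the downloaded code, to check what the new VM exposes:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* processors the OpenMP runtime can see on this VM */
    printf("available processors: %d\n", omp_get_num_procs());
    /* threads a parallel region would use by default
       (override with the OMP_NUM_THREADS environment variable) */
    printf("default thread count: %d\n", omp_get_max_threads());
    return 0;
}

Compile it like the other OpenMP programs, i.e. with gcc -fopenmp.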
This section contains extra questions that we thought would be nice for you to investigate, and we invite you to work on them, or at least think about them, even after the workshop has finished.
Bonus1: Make a batch of several runs (e.g. 100) per exercise (b, c, d and e) and calculate the average runtime and standard deviation. What do you observe?
(hint: make a table with one row per exercise, one column for the average time and one for the standard deviation you measured).
Bonus2: Play around with the parameters in the source files (e.g. POINTS_ON_AXIS). Any insight?
(hint: add an extra column to the table for each parameter you change).
Bonus3: Can you draw some curves (graphs) with the measurements you have gathered? How do they compare?
NOTE: Do not forget to shut down your VM when you are done with your performance tests.
If you want more of the advanced exercises on the HPC Cloud, see Extras.