Assignment 3
Due Date
- November 28th (this assignment can be done in pairs)
Problem Statement
Your goal is to parallelize a toy particle simulator on XSEDE's Bridges-2 supercomputer, using the GPUs attached to its nodes. Similar particle simulators are used in mechanics, biology, astronomy, etc.; this one reproduces the behaviour shown in the accompanying animation.
The range of interaction forces (cutoff) is limited as shown in grey for a selected particle. Density is set sufficiently low so that given \(n\) particles, only \(O(n)\) interactions are expected.
Suppose we have code that runs in time \(T = O(n)\) on a single processor. Your task is to write parallel code that uses the GPU accelerators to reduce this running time.
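The \(O(n)\) behaviour comes from the cutoff: a particle only needs to examine particles within one cutoff of it. One common way to exploit this is to sort particles into square bins at least one cutoff wide and then scan only the 3×3 block of bins around each particle. The assignment does not prescribe this method; it is an assumption here. The C++ sketch below shows a serial version of that counting sort. The names Particle, BinnedParticles, build_bins and count_neighbours are hypothetical, and the sketch assumes positions lie in [0, size) × [0, size). On the GPU, the three steps typically map to a counting kernel that uses atomicAdd, a prefix sum, and a scatter kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal particle type; the starter code's own struct may differ.
struct Particle {
    double x, y;
};

// Particle indices grouped by bin: bin b owns sorted_ids[bin_start[b] .. bin_start[b + 1]).
struct BinnedParticles {
    int bins_per_side = 0;
    double bin_width = 0.0;
    std::vector<int> bin_start;   // bins_per_side * bins_per_side + 1 offsets
    std::vector<int> sorted_ids;  // one entry per particle
};

// Bin of a particle, assuming positions lie in [0, size) x [0, size).
int bin_of(const Particle& p, const BinnedParticles& b) {
    int bx = std::min(static_cast<int>(p.x / b.bin_width), b.bins_per_side - 1);
    int by = std::min(static_cast<int>(p.y / b.bin_width), b.bins_per_side - 1);
    return by * b.bins_per_side + bx;
}

// Counting sort into square bins at least one cutoff wide, so every interacting
// pair sits in the same bin or in one of the 8 surrounding bins.
BinnedParticles build_bins(const std::vector<Particle>& parts, double size, double cutoff) {
    BinnedParticles b;
    b.bins_per_side = std::max(1, static_cast<int>(size / cutoff));
    b.bin_width = size / b.bins_per_side;
    int nbins = b.bins_per_side * b.bins_per_side;

    // 1. Count particles per bin.
    std::vector<int> count(nbins, 0);
    for (const Particle& p : parts) ++count[bin_of(p, b)];

    // 2. Exclusive prefix sum: starting offset of each bin.
    b.bin_start.assign(nbins + 1, 0);
    for (int i = 0; i < nbins; ++i) b.bin_start[i + 1] = b.bin_start[i] + count[i];

    // 3. Scatter each particle index into its bin's slice.
    std::vector<int> next(b.bin_start.begin(), b.bin_start.end() - 1);
    b.sorted_ids.resize(parts.size());
    for (int i = 0; i < static_cast<int>(parts.size()); ++i)
        b.sorted_ids[next[bin_of(parts[i], b)]++] = i;
    return b;
}

// Neighbours of particle i within the cutoff, found by scanning only the 3x3 block of bins.
int count_neighbours(const std::vector<Particle>& parts, const BinnedParticles& b,
                     int i, double cutoff) {
    int bin = bin_of(parts[i], b);
    int bx = bin % b.bins_per_side, by = bin / b.bins_per_side;
    int found = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = bx + dx, ny = by + dy;
            if (nx < 0 || ny < 0 || nx >= b.bins_per_side || ny >= b.bins_per_side) continue;
            int nb = ny * b.bins_per_side + nx;
            for (int k = b.bin_start[nb]; k < b.bin_start[nb + 1]; ++k) {
                int j = b.sorted_ids[k];
                if (j == i) continue;
                double ddx = parts[j].x - parts[i].x, ddy = parts[j].y - parts[i].y;
                if (ddx * ddx + ddy * ddy < cutoff * cutoff) ++found;
            }
        }
    }
    return found;
}
```

At low density, each 3×3 scan touches O(1) particles on average. Building the bins and visiting all interacting pairs is therefore O(n) overall.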
Correctness and Performance
A simple correctness check is provided, which computes the minimal distance between any two particles over the entire simulation. In a correct simulation this distance stays above 0.4 (as a fraction of the cutoff), with typical values between 0.7 and 0.8. In a simulation where particles don't interact correctly it falls below 0.4, with typical values between 0.01 and 0.05.
Running these checks inside the GPU code adds too much overhead, so an autocorrect executable is provided that checks the output txt file for the values mentioned above.
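The check described above can be sketched as a small post-processing helper. This is a hypothetical illustration, not the provided autocorrect executable. It assumes the particle positions have already been read from the output file into memory, because the file format is not described here. It reports the smallest pairwise distance as a fraction of the cutoff and applies the 0.4 threshold. The names Point, min_distance_over_cutoff and passes_min_distance_check are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D position; the real output file format is not shown here.
struct Point {
    double x, y;
};

// Smallest distance between any two particles over all saved frames,
// divided by the cutoff. Brute force O(n^2) per frame: fine for a
// post-processing sanity check on small runs, far too slow inside the timed loop.
double min_distance_over_cutoff(const std::vector<std::vector<Point>>& frames, double cutoff) {
    double best = std::numeric_limits<double>::infinity();
    for (const auto& frame : frames)
        for (std::size_t i = 0; i < frame.size(); ++i)
            for (std::size_t j = i + 1; j < frame.size(); ++j) {
                double dx = frame[i].x - frame[j].x;
                double dy = frame[i].y - frame[j].y;
                best = std::min(best, std::sqrt(dx * dx + dy * dy) / cutoff);
            }
    return best;
}

// Threshold from the assignment text: correct runs stay above 0.4 of the cutoff.
bool passes_min_distance_check(double min_ratio) {
    return min_ratio > 0.4;
}
```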
The job-bridges-* scripts we provide use a small number of particles (2000) so that the \(O(n^2)\) algorithm can finish execution. The final code, however, will be tested with the particle counts used in the auto-bridges-gpu script (note that you have to uncomment the serial runs and the autograder executable).
Grading
Your grade for part 2 will depend on the scaling and speedup achieved by your GPU code on Bridges-2. The final grade is the sum of the scaling score and the speedup score.
- GPU Scaling will be tested by fitting multiple runs of the code to a line on a log/log plot and calculating the slope of that line. This determines whether your code attains the desired \(O(n)\) complexity rather than the starting \(O(n^2)\). With an achieved result of \(O(n^x)\) you will receive:
  - if \(x\) is between 1.2 and 1.4, a scaling score between 40 and 0 that decreases linearly as \(x\) grows (e.g. 1.3 gives a score of 20)
  - if \(x\) is below 1.2, a scaling score of 40
- GPU Speedup will be tested by comparing the runs with the serial \(O(n)\) code and averaging the speedup over a range of particle counts. Depending on the average speedup, the score will be:
  - if the speedup is between 2 and 4, a score between 0 and 40 that grows linearly with it (e.g. 3 gives a score of 20)
  - if the speedup is between 4 and 8, a score between 40 and 60 that grows linearly with it (e.g. 7 gives a score of 55)
  - if the speedup is above 8, a score of 60
  - if the GPU scaling score is 0 (i.e. the code is still \(O(n^2)\)), the speedup score is set to 0
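The scoring rules above reduce to a little arithmetic. The C++ sketch below illustrates them under stated assumptions and is not the official grader. It fits the log/log slope by ordinary least squares and treats an arithmetic mean of per-size speedups as "the average". It also assigns 0 to slopes above 1.4 and to speedups below 2, cases the text implies but does not spell out. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares slope of log(time) against log(n): the x in O(n^x).
double loglog_slope(const std::vector<double>& n, const std::vector<double>& t) {
    const double m = static_cast<double>(n.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        double x = std::log(n[i]), y = std::log(t[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}

// Scaling score: 40 at or below x = 1.2, falling linearly to 0 at x = 1.4.
// Slopes above 1.4 are assumed to score 0.
double scaling_score(double x) {
    if (x <= 1.2) return 40.0;
    if (x >= 1.4) return 0.0;
    return 40.0 * (1.4 - x) / 0.2;
}

// Average speedup; a plain arithmetic mean of per-size ratios is assumed here.
double average_speedup(const std::vector<double>& serial, const std::vector<double>& gpu) {
    double sum = 0;
    for (std::size_t i = 0; i < serial.size(); ++i) sum += serial[i] / gpu[i];
    return sum / static_cast<double>(serial.size());
}

// Speedup score: 0..40 over [2, 4], 40..60 over [4, 8], 60 above 8,
// and 0 whenever the scaling score is 0. Speedups below 2 are assumed to score 0.
double speedup_score(double s, double scaling) {
    if (scaling == 0.0 || s <= 2.0) return 0.0;
    if (s <= 4.0) return 20.0 * (s - 2.0);
    if (s <= 8.0) return 40.0 + 5.0 * (s - 4.0);
    return 60.0;
}
```

For example, run times that grow 10× for every 10× increase in n give a slope of 1 and the full 40 scaling points. Run times that grow 100× give a slope of 2, the \(O(n^2)\) baseline.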
Source files
The starting files with their descriptions can be found in the starting code GPU folder.
To download all the files directly to Bridges-2 you can use the following command:
$ wget --no-check-certificate https://portal.nersc.gov/project/mp309/xsede-cs267-hw2-gpu-02-2021.tar.gz
Login, Compilation and Job submission
The easiest way to access the machines is to log in with your own ssh client to login.xsede.org and from there ssh into the correct machine. For this assignment we will be using Bridges-2.
To unarchive the files from the tar archive use the following command:
$ tar -xvf xsede-cs267-hw2-gpu-02-2021.tar.gz
The Makefile uses the CUDA compiler. You must first load CUDA with one of the commands below, then simply type make to compile the code.
$ module load cuda
or
$ source modules.sh
To submit jobs on the Bridges-2 system you will use the SLURM interface, for example:
$ sbatch job-bridges-gpu
To check the status of running jobs you can use the following squeue command, where $USER should be replaced with your username:
$ squeue -u $USER
For more details on sbatch commands, please see the Bridges-2 documentation page.
Submission Instructions
You will submit your files via Gradescope. Your submission should be a single zip archive named hw3.zip. It should contain the following (please use exactly these formats and naming conventions):
- serial.cu and gpu.cu
- Makefile, only if you modified it
- hw3.pdf, your write-up
Your write-up should contain:
- The names of the people in your group (and each member's contribution)
- A plot in log-log scale showing your parallel code's performance, and a description of the methods you used to achieve it
- A description of the synchronization you used in the GPU implementation
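Synchronization typically shows up where many GPU threads update shared data, such as per-bin particle counts. The C++ sketch below is only a CPU-side analogy; a GPU version would rely on CUDA's atomicAdd and kernel boundaries instead. Several std::thread workers increment shared std::atomic<int> counters with fetch_add, and all threads are joined before anyone reads the totals. The function name parallel_bin_counts is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Count how many items fall into each bin using several CPU threads.
// Different threads may hit the same bin at the same time, so each counter is a
// std::atomic<int> updated with fetch_add (the CPU analogue of CUDA's atomicAdd).
std::vector<int> parallel_bin_counts(const std::vector<int>& bin_of_item, int nbins, int nthreads) {
    std::vector<std::atomic<int>> counts(nbins);
    for (auto& c : counts) c.store(0, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles items t, t + nthreads, t + 2 * nthreads, ...
            for (std::size_t i = t; i < bin_of_item.size(); i += nthreads)
                counts[bin_of_item[i]].fetch_add(1, std::memory_order_relaxed);
        });
    }
    // join() acts as the barrier: no one reads the counts until every increment is done,
    // much as a kernel launched later in the same CUDA stream starts only after the previous one finishes.
    for (auto& w : workers) w.join();

    std::vector<int> result(nbins);
    for (int b = 0; b < nbins; ++b) result[b] = counts[b].load();
    return result;
}
```

Without the atomic, two threads incrementing the same counter could lose updates. Without the join, a later phase such as the prefix sum could read incomplete counts.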