Slurm Job script examples#
In the following, example batch scripts for different types of jobs are given. Note that all of them have to be adapted to your specific application and the target system. You can find more examples in the cluster-specific or application-specific documentation.
Parallel jobs#
How parallel jobs should be submitted depends on the type of parallelization (OpenMP, MPI) and architecture (single-node, multi-node).
Job script with OpenMP#
OpenMP applications can only make use of one node, therefore --nodes=1 and --ntasks=1 are necessary. The number of allocated CPUs (--cpus-per-task) and therefore the number of OpenMP threads (OMP_NUM_THREADS) depends on the system and has to be adjusted accordingly.
OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS explicitly in your script. It should match the number of cores requested via --cpus-per-task.
For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved with the environment variables OMP_PLACES=cores and OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.
In this example, the executable will be run using 64 OpenMP threads (i.e. one per physical core) for a total job walltime of 1 hour.
#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --time=1:00:00
#SBATCH --export=NONE
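# --export=NONE starts the job with a clean environment, not inherited from the
# submitting shell; unsetting SLURM_EXPORT_ENV below ensures that the environment
# set up in this script is passed on when the application is launched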
unset SLURM_EXPORT_ENV
# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./application
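If you want to use the thread pinning described above, the corresponding environment variables can be set in the job script right before the application is started, for example:
# pin OpenMP threads to physical cores (see above)
export OMP_PLACES=cores
export OMP_PROC_BIND=true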
Job script with MPI#
Define the number of MPI processes that should be started via the number of nodes (--nodes) and the number of MPI processes per node (--ntasks-per-node).
Alternatively, you can also specify the total number of MPI processes via --ntasks. However, this does not guarantee that the processes are evenly distributed, which might lead to load imbalances if --ntasks does not correspond to "full" nodes!
In this example, the executable will be run on 4 nodes with 72 MPI processes per node for a total job walltime of 2 hours.
#!/bin/bash -l
#
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=72
#SBATCH --time=2:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
srun ./application
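If you use the --ntasks alternative described above instead, a request for the same total of 288 MPI processes (4 x 72) might look like the following sketch; note that Slurm is then free to place the tasks on any suitable set of nodes:
#!/bin/bash -l
#
#SBATCH --ntasks=288
#SBATCH --time=2:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
srun ./application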
Job script for hybrid MPI/OpenMP#
Adjust the number of MPI processes via --nodes and --ntasks-per-node and the corresponding number of OpenMP threads per MPI process via --cpus-per-task according to the available hardware configuration and the needs of your application.
OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.
For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved with the environment variables OMP_PLACES=cores and OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.
Warning
In recent Slurm versions, the value of --cpus-per-task is no longer automatically propagated to srun, which can lead to errors at application startup. This value has to be set manually via the environment variable SRUN_CPUS_PER_TASK.
In this example, the executable will be run on 2 nodes using 2 MPI processes per node with 8 OpenMP threads each (i.e. one per physical core) for a total job walltime of 4 hours.
#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
#SBATCH --time=4:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# for Slurm version >22.05: cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun ./hybrid_application
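As an alternative to exporting SRUN_CPUS_PER_TASK, the value can also be passed directly to srun on the command line; a minimal sketch of the launch line:
# equivalent to setting SRUN_CPUS_PER_TASK
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./hybrid_application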
GPU jobs#
GPUs are only available in TinyGPU and Alex. To submit a job to one of those clusters, you have to specify the number of GPUs that should be allocated to your job. This is done via the Slurm option --gres=gpu:<number_gpus_per_node>. The available number of GPUs per node varies between 4 and 8, depending on the cluster and node configuration.
Single-node job#
The number of requested GPUs has to be less than or equal to the available number of GPUs per node. In this case, the compute nodes are not allocated exclusively but are shared among several jobs; the GPUs themselves are always granted exclusively. The corresponding share of the host system's resources (CPU cores, RAM) is automatically allocated.
If you are using hybrid OpenMP/MPI or pure MPI code, adjust --cpus-per-task or --ntasks according to the above examples. Do not specify more cores/tasks than available for the number of requested GPUs!
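As an illustration, a minimal single-node GPU job script might look like the following sketch; the executable name ./gpu_application is a placeholder, and the number of GPUs, tasks, and walltime have to be adjusted to your application and the target cluster:
#!/bin/bash -l
#
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
./gpu_application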