
Slurm Job script examples

In the following, example batch scripts for different types of jobs are given. Note that all of them have to be adapted to your specific application and the target system. You can find more examples in the cluster-specific or application-specific documentation.

Parallel jobs

How parallel jobs should be submitted depends on the type of parallelization (OpenMP, MPI) and architecture (single-node, multi-node).

Job script with OpenMP

OpenMP applications can only make use of a single node; therefore, --nodes=1 and --ntasks=1 are required. The number of allocated CPUs, --cpus-per-task, and thus the number of OpenMP threads (OMP_NUM_THREADS) depends on the system and has to be adjusted accordingly.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS explicitly in your script. It should match the number of cores requested via --cpus-per-task.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by setting the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.
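
For example, the pinning can be enabled by adding the following lines to the job script before the application is started (a minimal sketch; the most suitable placement policy may depend on your application):

export OMP_PLACES=cores
export OMP_PROC_BIND=true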

In this example, the executable will be run using 64 OpenMP threads (i.e. one per physical core) for a total job walltime of 1 hour.

#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --time=1:00:00
#SBATCH --export=NONE

# --export=NONE starts the job with a clean environment; unset SLURM_EXPORT_ENV
# so that the environment defined in this script is passed on to the processes it starts
unset SLURM_EXPORT_ENV

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 

./application

Job script with MPI

Define the number of MPI processes that should be started via the number of nodes --nodes and the number of MPI processes per node --ntasks-per-node.

Alternatively, you can also specify the total number of MPI processes via --ntasks. However, this does not guarantee that the processes are evenly distributed across the nodes, which might lead to load imbalances if --ntasks does not correspond to "full" nodes.
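
For illustration, on nodes with 72 cores (as in the example below), the following request corresponds to four full nodes and is therefore equivalent to specifying --nodes=4 and --ntasks-per-node=72; adapt the numbers to your system:

# total number of MPI processes; 288 tasks fill four 72-core nodes
#SBATCH --ntasks=288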

In this example, the executable will be run on 4 nodes with 72 MPI processes per node.

#!/bin/bash -l
#
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=72
#SBATCH --time=2:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

srun ./application

Job script for hybrid MPI/OpenMP

Adjust the number of MPI processes via --nodes and --ntasks-per-node and the corresponding number of OpenMP threads per MPI process via --cpus-per-task according to the available hardware configuration and the needs of your application.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by setting the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

Warning

In recent Slurm versions, the value of --cpus-per-task is no longer automatically propagated to srun, which can lead to errors when the application is started. The value therefore has to be set manually via the environment variable SRUN_CPUS_PER_TASK.

In this example, the executable will be run on 2 nodes using 2 MPI processes per node with 8 OpenMP threads each (i.e. one per physical core) for a total job walltime of 4 hours.

#!/bin/bash -l

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
#SBATCH --time=4:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# for Slurm version >22.05: cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

srun ./hybrid_application

GPU jobs

GPUs are only available in TinyGPU and Alex. To submit a job to one of these clusters, you have to specify the number of GPUs that should be allocated to your job. This is done via the Slurm option --gres=gpu:<number_gpus_per_node>. The number of GPUs available per node is either 4 or 8, depending on the cluster and node configuration.

Single-node job

The number of requested GPUs has to be less than or equal to the number of GPUs available per node. In this case, the compute nodes are not allocated exclusively but are shared among several jobs; the GPUs themselves are always granted exclusively. The corresponding share of the host system's resources (CPU cores, RAM) is allocated automatically.

If you are using hybrid OpenMP/MPI or pure MPI code, adjust --cpus-per-task or --ntasks according to the above examples. Do not specify more cores/tasks than are available for the number of requested GPUs! A sketch of a multi-GPU MPI job is given after the single-GPU example below.

#!/bin/bash -l
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

./cuda_application
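
For a pure MPI code that uses several GPUs on one node, the request could look like the following sketch. This assumes a node with 4 GPUs, one MPI process per GPU, and a hypothetical binary ./cuda_mpi_application; adapt the numbers to the cluster you are using.

#!/bin/bash -l
#SBATCH --gres=gpu:4
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

srun ./cuda_mpi_application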