# VASP
Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modeling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
## Availability / Target HPC systems
VASP requires an individual license. The license prohibits us from simply installing it and letting everyone use it; we have to check the eligibility of each VASP user individually.
## Notes
- Parallelization and optimal performance:
    - (try to) always use full nodes (`PPN=20` for Meggie); `NCORE=5` or `NCORE=10` together with `PPN=20` gives optimal performance in almost all cases; in general, `NCORE` should be a divisor of `PPN` (a minimal INCAR fragment illustrating these tags follows after this list)
    - OpenMP parallelization is supposed to supersede `NCORE`
    - use `KPAR` if possible
- Compilation:
    - use `-Davoidalloc`
    - use the Intel toolchain and MKL
    - in case of very large jobs with high memory requirements, add `-heap-arrays 64` to the Fortran flags before compilation (only possible with Intel `ifort`)
- Filesystems:
    - Occasionally, VASP users have reported failing I/O on Meggie's `$FASTTMP` (`/lxfs`); this might be a problem with Lustre and Fortran I/O. Please try the fix described at https://github.com/RRZE-HPC/getcwd-autoretry-preload
    - Since VASP does not do parallel MPI I/O, `$WORK` is more appropriate than `$FASTTMP`
    - For medium-sized jobs, even node-local `/dev/shm/` might be an option
- Walltime limit:
    - VASP can only be gracefully stopped by creating the file `STOPCAR` (https://www.vasp.at/wiki/index.php/STOPCAR); an example `STOPCAR` is shown after this list, and its automatic creation is shown in the example scripts below
- On Fritz: At the moment we provide VASP 5.4.x and VASP 6.3.x modules to eligible users. The module `vasp6/6.3.0-hybrid-intel-impi-AVX2-with-addons` includes DFTD4, libbeef, and sol_compat/VASPsol.
- On Alex: At the moment we provide VASP 6.2.3 and 6.3.0 in two different module flavors to eligible users:
    - `vasp/6.x.y-nccl` -- NCCL stands for NVIDIA Collective Communication Library and is basically a library for direct GPU-to-GPU communication. However, NCCL allows only one MPI rank per GPU. In 6.2.1 you can disable NCCL via the input file, but sadly the test suite will still fail.
    - `vasp/6.x.y-nonccl` -- in certain cases, one MPI rank per GPU is not enough to saturate a single A100. When you use multiple ranks per GPU, you should also use the so-called MPS server. See "Multi-Process Service (MPS daemon)" above on how to start MPS even in case of multiple GPUs.
    - VASP 6.3.0 has been compiled with the new HDF5 support.
- For benchmarking VASP on Fritz and Alex, you can watch: HPC Cafe: VASP Benchmarks
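A minimal INCAR fragment illustrating the parallelization tags from the list above; the values are only an example for full Meggie nodes with `PPN=20` and have to be adapted to your system size and k-point mesh:

```
NCORE = 5    ! cores working together on one orbital; choose a divisor of PPN (5 or 10 for PPN=20)
KPAR  = 4    ! number of k-points treated in parallel; the total number of MPI ranks should be divisible by KPAR
```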
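For reference, the `STOPCAR` written by the example scripts below contains one of the following two lines; see the linked VASP wiki page for the exact semantics of the two tags:

```
LSTOP = .TRUE.     ! written 1800 s before the walltime limit in the Meggie example below
LABORT = .TRUE.    ! written 600 s before the walltime limit in the Meggie example below
```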
## Sample job scripts
### Parallel Intel MPI job on Meggie
```bash
#!/bin/bash -l
#
#SBATCH --nodes=4
#SBATCH --tasks-per-node=20
#SBATCH --time=24:00:00
#SBATCH --job-name=my-vasp
#SBATCH --mail-user=my.mail
#SBATCH --mail-type=ALL
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

# enter submit directory
cd $SLURM_SUBMIT_DIR

# load modules
module load intel
module load intelmpi
module load mkl

# set PPN and pinning
export PPN=20
export I_MPI_PIN=enable

# define executable:
VASP=/path-to-your-vasp-installation/vasp

# create STOPCAR with LSTOP 1800 s before reaching the walltime limit
lstop=1800
# create STOPCAR with LABORT 600 s before reaching the walltime limit
labort=600

# automatically detect how much time this batch job requested and adjust the
# sleep times accordingly
TIMELEFT=$(squeue -j $SLURM_JOBID -o %L -h)
HHMMSS=${TIMELEFT#*-}
[ $HHMMSS != $TIMELEFT ] && DAYS=${TIMELEFT%-*}
IFS=: read -r HH MM SS <<< $HHMMSS
# squeue may print fewer than three fields; shift the values accordingly
[ -z "$SS" ] && { SS=$MM; MM=$HH; HH=0; }
[ -z "$SS" ] && { SS=$MM; MM=0; }

# timer for LSTOP = .TRUE.
SLEEPTIME1=$(( ( ( ${DAYS:-0} * 24 + 10#${HH} ) * 60 + 10#${MM} ) * 60 + 10#$SS - $lstop ))
echo "Available runtime: ${DAYS:-0}-${HH:-0}:${MM:-0}:${SS}, sleeping for up to $SLEEPTIME1, thus reserving $lstop for clean stopping/saving results"
# timer for LABORT = .TRUE.
SLEEPTIME2=$(( ( ( ${DAYS:-0} * 24 + 10#${HH} ) * 60 + 10#${MM} ) * 60 + 10#$SS - $labort ))
echo "Available runtime: ${DAYS:-0}-${HH:-0}:${MM:-0}:${SS}, sleeping for up to $SLEEPTIME2, thus reserving $labort for clean stopping/saving results"

# write the STOPCAR in the background shortly before the walltime limit is reached
(sleep ${SLEEPTIME1} ; echo "LSTOP = .TRUE." > STOPCAR) &
lstoppid=$!
(sleep ${SLEEPTIME2} ; echo "LABORT = .TRUE." > STOPCAR) &
labortpid=$!

mpirun -ppn $PPN $VASP

# kill the sleep timers in case VASP finished before the walltime limit
pkill -P $lstoppid
pkill -P $labortpid
```
### Hybrid OpenMP/MPI job (multi-node) on Fritz
```bash
#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=multinode
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

module load vasp6/6.3.2-hybrid-intel-impi-AVX512

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
export OMP_PLACES=cores
export OMP_PROC_BIND=true

srun /apps/vasp6/6.3.2-hybrid-intel-AVX512/bin/vasp_std > output_filename
```
## Performance tests for VASP-6 on Fritz
The calculations were performed using the binary from the module `vasp6/6.3.2-hybrid-intel-impi-AVX512` for the ground-state structure of sodium chloride (rock salt), downloaded from The Materials Project. In order to enforce the same number of SCF iterations and ensure convergence, which in turn could be relevant to the tasks and calculations considered by VASP, we set `NELMIN=26` and `NELM=26`.
- Single-point calculations with the PBE exchange-correlation functional
- Supercell containing 64 atoms
- 2x2x2 k-points
- `ALGO=FAST`, `ENCUT=500`, `PREC=High`, `LREAL=Auto`, `LPLANE=True`, `NCORE=4`, `KPAR=4`

Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes used in each calculation, i.e. Tref/(T*nodes). Tref is the time of the calculation on one node with MPI only.
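To make this definition concrete, here is a small helper (purely illustrative; the function name and the numbers in the usage line are made up, not measured values) that evaluates Tref/(T*nodes):

```bash
# per-node speedup = Tref / (T * nodes)
#   Tref  : run time on one node with MPI only (reference)
#   T     : run time of the run being compared
#   nodes : number of nodes used in that run
pernode_speedup() {
    local tref=$1 t=$2 nodes=$3
    echo "scale=3; $tref / ($t * $nodes)" | bc -l
}

# usage with made-up numbers: a 4-node run finishing in 320 s against a 1200 s single-node reference
pernode_speedup 1200 320 4    # prints .937
```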
- Single-point calculations with the PBE exchange-correlation functional
- Supercell containing 512 atoms
- No k-points
- `ALGO=FAST`, `ENCUT=500`, `PREC=High`, `LREAL=Auto`, `LPLANE=True`, `NCORE=4`

Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes used in each calculation, i.e. Tref/(T*nodes). Tref is the time of the calculation on one node with MPI only.
- Single-point calculations with the HSE06 exchange-correlation functional
- Supercell containing 64 atoms
- 2x2x2 k-points
- `ALGO=Damped`, `TIME=0.4`, `ENCUT=500`, `PREC=High`, `LREAL=Auto`, `LPLANE=True`, `NCORE=4`, `KPAR=4`
- Please note that in the hybrid OpenMP/MPI execution of VASP for HSE06 calculations, the default OpenMP stack size is insufficient and you should explicitly increase it, otherwise your run might crash. The calculations in this section were run with `export OMP_STACKSIZE=500m` added to the submit script (see the snippet at the end of this section).

Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes used in each calculation, i.e. Tref/(T*nodes). Tref is the time of the calculation on one node with MPI only.
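As a concrete illustration of the note above, the only change relative to the hybrid Fritz script in the previous section is one additional export (500m is the value used for these tests; adjust it if your runs still crash):

```bash
# add to the hybrid OpenMP/MPI submit script for HSE06 runs, next to the other OMP_* exports
export OMP_STACKSIZE=500m   # increase the per-thread OpenMP stack; the default is too small for HSE06
```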