
VASP#

Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modeling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

Availability / Target HPC systems#

VASP requires an individual license. The VASP license does not allow us to simply install the software and let everyone use it; we have to check the license status of each VASP user individually.

Notes#

  • Parallelization and optimal performance:
    • (try to) always use full nodes (PPN=20 for Meggie)
    • NCORE=5 or NCORE=10 together with PPN=20 gives optimal performance in almost all cases; in general, NCORE should be a divisor of PPN (see the sketch at the end of these notes)
    • OpenMP parallelization is supposed to supersede NCORE
    • use KPAR if possible
  • Compilation:
    • use -Davoidalloc
    • use Intel toolchain and MKL
    • in case of very large jobs with high memory requirements add -heap-arrays 64 to Fortran flags before compilation (only possible for Intel ifort)
  • Filesystems:
    • Occasionally, VASP users have reported failing I/O on Meggie's $FASTTMP (/lxfs); this might be a problem with Lustre and Fortran I/O. Please try the fix described here: https://github.com/RRZE-HPC/getcwd-autoretry-preload
    • Since VASP does not do parallel MPI I/O, $WORK is more appropriate than $FASTTMP
    • For medium-sized jobs, even node-local /dev/shm might be an option
  • Walltime limit: use VASP's STOPCAR mechanism to stop the calculation cleanly before the job reaches its walltime limit; the Meggie sample job script below shows how to automate this.

  • On Fritz: At the moment we provide VASP 5.4.x and VASP 6.3.x modules to eligible users. The module vasp6/6.3.0-hybrid-intel-impi-AVX2-with-addons includes DFTD4, libbeef, and sol_compat/VASPsol.

  • On Alex: At the moment we provide VASP 6.2.3 and 6.3.0 modules in two variants to eligible users:

    • vasp/6.x.y-nccl -- NCCL stands for NVIDIA Collective Communications Library and is basically a library for direct GPU-to-GPU communication. However, NCCL allows only one MPI rank per GPU. In 6.2.1 you can disable NCCL via the input file, but sadly the test suite will still fail.
    • vasp/6.x.y-nonccl -- in certain cases, one MPI rank per GPU is not enough to saturate a single A100. When you use multiple ranks per GPU, you should also use the so-called MPS server. See "Multi-Process Service (MPS daemon)" above for how to start MPS even in the case of multiple GPUs.

    The VASP 6.3.0 modules have been compiled with support for HDF5.

  • For benchmarking VASP on Fritz and Alex, you can watch: HPC Cafe: VASP Benchmarks
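
As a minimal illustration of the parallelization notes above, the relevant INCAR tags could be set as follows for a hypothetical 4-node Meggie job with PPN=20, i.e. 80 MPI ranks (the values are examples chosen for this job size, not a general recommendation):

# sketch: append the parallelization tags to an existing INCAR from the job script
# KPAR=4 splits the 80 ranks into 4 k-point groups of 20 ranks each;
# NCORE=10 is a divisor of the 20 ranks per group (and of PPN=20)
cat >> INCAR << 'EOF'
NCORE = 10
KPAR  = 4
EOF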

Sample job scripts#

parallel Intel MPI job on Meggie#

#!/bin/bash -l
#
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20
#SBATCH --time=24:00:00
#SBATCH --job-name=my-vasp
#SBATCH --mail-user=my.mail
#SBATCH --mail-type=ALL
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV

#enter submit directory
cd $SLURM_SUBMIT_DIR

#load modules
module load intel
module load intelmpi
module load mkl

#set PPN and pinning
export PPN=20
export I_MPI_PIN=enable

#define executable:
VASP=/path-to-your-vasp-installation/vasp

#write LSTOP = .TRUE. to STOPCAR 1800 s before reaching the walltime limit
lstop=1800

#write LABORT = .TRUE. to STOPCAR 600 s before reaching the walltime limit
labort=600

#automatically detect how much time this batch job requested and adjust the 
# sleep accordingly 
TIMELEFT=$(squeue -j $SLURM_JOBID -o %L -h)
HHMMSS=${TIMELEFT#*-}
[ "$HHMMSS" != "$TIMELEFT" ] && DAYS=${TIMELEFT%-*}
IFS=: read -r HH MM SS <<< "$HHMMSS"
# if squeue reported fewer than three fields, shift the values accordingly
[ -z "$SS" ] && { SS=$MM; MM=$HH; HH=0; }
[ -z "$SS" ] && { SS=$MM; MM=0; }
#timer for STOP = .TRUE.
SLEEPTIME1=$(( ( ( ${DAYS:-0} * 24 + 10#${HH} ) * 60 + 10#${MM} ) * 60 + 10#$SS - $lstop ))
echo "Available runtime: ${DAYS:-0}-${HH:-0}:${MM:-0}:${SS}, sleeping for up to $SLEEPTIME1, thus reserving $lstop for clean stopping/saving results"

#timer for LABORT = .TRUE.
SLEEPTIME2=$(( ( ( ${DAYS:-0} * 24 + 10#${HH} ) * 60 + 10#${MM} ) * 60 + 10#$SS - $labort ))
echo "Available runtime: ${DAYS:-0}-${HH:-0}:${MM:-0}:${SS}, sleeping for up to $SLEEPTIME2, thus reserving $labort for clean stopping/saving results"

(sleep ${SLEEPTIME1} ; echo "LSTOP = .TRUE." > STOPCAR) &
lstoppid=$!
(sleep ${SLEEPTIME2} ; echo "LABORT = .TRUE." > STOPCAR) &
labortpid=$!

mpirun -ppn $PPN $VASP 

pkill -P $lstoppid
pkill -P $labortpid
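
For example, with the requested --time=24:00:00 (86400 s) the script above sleeps for 84600 s before writing LSTOP = .TRUE. and for 85800 s before writing LABORT = .TRUE. into STOPCAR, i.e. VASP is asked to stop at the next ionic step 30 minutes before, and to abort at the next electronic step 10 minutes before, the scheduler kills the job.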

Hybrid OpenMP/MPI job (multi-node) on Fritz#

#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=multinode
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load vasp6/6.3.2-hybrid-intel-impi-AVX512

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
export OMP_PLACES=cores
export OMP_PROC_BIND=true

srun vasp_std > output_filename
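
In this example, each node runs 4 MPI ranks with 18 OpenMP threads each, i.e. 4 x 18 = 72 cores and thus a full Fritz node. If you change --ntasks-per-node, adjust --cpus-per-task so that the product still matches the number of cores per node.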

Performance tests for VASP 6 on Fritz#

The calculations were performed with the binary from the module vasp6/6.3.2-hybrid-intel-impi-AVX512 for the ground-state structure of sodium chloride (rock salt), downloaded from The Materials Project. To enforce the same number of SCF iterations in every run while still ensuring convergence, we set NELMIN=26 and NELM=26.

  • Single point calculations with PBE exchange-correlation functional
  • Supercell containing 64 atoms
  • 2x2x2 k-points
  • ALGO=FAST, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4, KPAR=4
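
For orientation, the parameters listed above correspond to an INCAR roughly like the following (a sketch assembled from the listed settings, not the verbatim benchmark input; POSCAR, KPOINTS, and POTCAR are not shown):

# sketch: write a benchmark-like INCAR via a heredoc
cat > INCAR << 'EOF'
ALGO   = Fast
ENCUT  = 500
PREC   = High
LREAL  = Auto
LPLANE = .TRUE.
NCORE  = 4
KPAR   = 4
NELMIN = 26
NELM   = 26
EOF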

[Plot VASP_NaCl064_PBE: per-node speedup for the 64-atom NaCl supercell with PBE]

Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes used in the calculation, i.e. Tref / (T * nodes), where Tref is the time of the calculation on a single node with MPI only. This definition applies to all plots in this section.
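
As a purely illustrative example with made-up numbers: if the MPI-only single-node reference takes Tref = 1000 s and a run on 4 nodes takes T = 300 s, the per-node speedup is 1000 / (300 * 4) ≈ 0.83, i.e. roughly 83 % parallel efficiency relative to the reference run.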

  • Single point calculations with PBE exchange-correlation functional
  • Supercell containing 512 atoms
  • No k-points
  • ALGO=FAST, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4

[Plot VASP_NaCl512_PBE: per-node speedup for the 512-atom NaCl supercell with PBE]


  • Single point calculations with HSE06 exchange-correlation functional
  • Supercell containing 64 atoms
  • 2x2x2 k-points
  • ALGO=Damped, TIME=0.4, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4, KPAR=4
  • Please note that for hybrid OpenMP/MPI runs of HSE06 calculations, the default OpenMP stack size is insufficient and you should increase it explicitly, otherwise your run might crash. The calculations in this section were run with export OMP_STACKSIZE=500m added to the submit script (see the snippet below).
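
In the hybrid Fritz job script shown above, this amounts to one additional export next to the other OpenMP settings:

# increase the per-thread OpenMP stack size for hybrid HSE06 runs
# (500m is the value used for the benchmarks in this section)
export OMP_STACKSIZE=500m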

[Plot VASP_NaCl064_HSE: per-node speedup for the 64-atom NaCl supercell with HSE06]


Further information#