# VASP
Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modeling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
## Availability / Target HPC systems
VASP requires an individual license. The VASP license prohibits us from simply installing it for everyone to use; we have to individually verify the license status of each VASP user.
## Notes
- Parallelization and optimal performance:
    - (try to) always use full nodes (72 cores per node on Fritz)
    - `NCORE=2/4/8/18/36` results in optimal performance in most cases on Fritz; in general, `NCORE` should be a divisor of the number of MPI processes per node (PPN)
    - OpenMP parallelization is supposed to supersede `NCORE`
    - use `KPAR` if possible
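As an illustration of these settings, a hypothetical `INCAR` fragment for a full 72-core Fritz node; the concrete values are assumptions for illustration, not a recommendation for every system:

```
# INCAR fragment (sketch): 72 MPI ranks per node
NCORE = 4    # a divisor of the 72 ranks per node
KPAR  = 4    # works best when the number of k-points is divisible by KPAR
```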
- Compilation:
    - use `-Davoidalloc`
    - use the Intel toolchain and MKL
    - in case of very large jobs with high memory requirements, add `-heap-arrays 64` to the Fortran flags before compilation (only possible for Intel `ifort`)
- Filesystems:
    - Occasionally, VASP users have reported failing I/O on `$FASTTMP` (`/lxfs`); this might be a problem with Lustre and Fortran I/O. Please try the fix described here: https://github.com/RRZE-HPC/getcwd-autoretry-preload
    - Since VASP does not do parallel MPI I/O, `$WORK` is more appropriate than `$FASTTMP`
    - For medium-sized jobs, even node-local `/dev/shm/` might be an option
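A minimal sketch of staging a job in node-local `/dev/shm`; the helper name and file layout are assumptions for illustration, not part of the cluster documentation:

```shell
#!/bin/bash
# stage_job: copy input files into a scratch directory
# (sketch; the function name is an assumption for illustration only)
stage_job() {
  local dest=$1; shift
  mkdir -p "$dest"
  cp "$@" "$dest"
}

# Typical use inside a job script (not executed here):
#   job=/dev/shm/vasp.$SLURM_JOB_ID        # RAM-backed, node-local
#   stage_job "$job" INCAR POSCAR POTCAR KPOINTS
#   cd "$job" && srun ... >output
#   cp OUTCAR CONTCAR "$SLURM_SUBMIT_DIR"  # copy results back before exit!
```

Note that `/dev/shm` is node-local memory and is cleared when the job ends, so results must be copied back to a persistent filesystem before the job finishes.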
- Walltime limit:
    - VASP can only be gracefully stopped by creating the file `STOPCAR` (https://www.vasp.at/wiki/index.php/STOPCAR); automatic creation is shown in the example scripts below
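One way to automate this is a background timer that writes `STOPCAR` shortly before the walltime limit; a sketch under assumptions (the helper name and the amount of slack are illustrative, see the example scripts for the actual method used on the clusters):

```shell
#!/bin/bash
# schedule_stopcar: create STOPCAR after a delay so VASP can stop
# gracefully before the walltime limit (sketch; helper name assumed)
schedule_stopcar() {
  local seconds_until_stop=$1
  (
    sleep "$seconds_until_stop"
    # LSTOP = .TRUE. makes VASP stop after the current ionic step
    echo "LSTOP = .TRUE." > STOPCAR
  ) >/dev/null 2>&1 &
  echo $!   # timer PID, so it can be killed if VASP finishes early
}

# Typical use for a 1 h walltime, with 5 min of slack (not executed here):
#   timer=$(schedule_stopcar $(( 3600 - 300 )))
#   srun vasp_std >output_filename
#   kill "$timer" 2>/dev/null
```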
- On Fritz: At the moment we provide VASP 5.4.x and VASP 6.3.x modules to eligible users. The module `vasp6/6.3.0-hybrid-intel-impi-AVX2-with-addons` includes DFTD4, libbeef, and sol_compat/VASPsol.
- On Alex: At the moment we provide two different VASP 6.2.3 and 6.3.0 modules to eligible users:
    - `vasp/6.x.y-nccl` -- NCCL stands for NVIDIA Collective Communication Library and is basically a library for direct GPU-to-GPU communication. However, NCCL only allows one MPI rank per GPU. In 6.2.1 you can disable NCCL via the input file, but sadly the test suite will still fail.
    - `vasp/6.x.y-nonccl` -- in certain cases, one MPI rank per GPU is not enough to saturate a single A100. When you use multiple ranks per GPU, you should also use the so-called MPS server. See "Multi-Process Service (MPS daemon)" above on how to start MPS even in the case of multiple GPUs. The VASP 6.3.0 module has been compiled with the new support for HDF5.
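A sketch of preparing MPS inside a job script, under assumptions (the per-job directory layout and helper name are illustrative; consult the cluster's MPS documentation for the exact procedure):

```shell
#!/bin/bash
# setup_mps_dirs: create per-job pipe/log directories for the CUDA MPS
# daemon (sketch; the directory naming scheme is an assumption)
setup_mps_dirs() {
  export CUDA_MPS_PIPE_DIRECTORY="${TMPDIR:-/tmp}/mps-pipe.$1"
  export CUDA_MPS_LOG_DIRECTORY="${TMPDIR:-/tmp}/mps-log.$1"
  mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"
}

# Typical use on the compute node (requires a GPU allocation; not run here):
#   setup_mps_dirs "$SLURM_JOB_ID"
#   nvidia-cuda-mps-control -d            # start the MPS daemon
#   srun vasp_std >output_filename        # multiple MPI ranks per GPU
#   echo quit | nvidia-cuda-mps-control   # stop the daemon afterwards
```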
- For benchmarking VASP on Fritz and Alex, you can watch the HPC Cafe talk: VASP Benchmarks
## Sample job scripts
### Hybrid OpenMP/MPI job (multi-node) on Fritz
```bash
#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=multinode
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load vasp6/6.3.2-hybrid-intel-impi-AVX512

# enter submit directory
cd $SLURM_SUBMIT_DIR

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
export OMP_PLACES=cores
export OMP_PROC_BIND=true

srun /apps/vasp6/6.3.2-hybrid-intel-AVX512/bin/vasp_std >output_filename
```
### Performance tests for VASP-6 on Fritz
The calculations were performed using the binary from the module `vasp6/6.3.2-hybrid-intel-impi-AVX512` for the ground-state structure of sodium chloride (rock salt), downloaded from The Materials Project. In order to enforce the same number of SCF iterations and ensure convergence, which in turn could be relevant to the tasks and calculations considered by VASP, we set `NELMIN=26` and `NELM=26`.
- Single point calculations with the PBE exchange-correlation functional
- Supercell containing 64 atoms
- 2x2x2 k-points

`ALGO=FAST, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4, KPAR=4`
Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes used in each calculation, i.e. Tref/(T*nodes). Tref is the time of the calculation on one node with MPI only.
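As a sanity check, this definition can be computed directly; the timings below are made-up illustration values, not measured results:

```shell
#!/bin/bash
# per_node_speedup: Tref / (T * nodes); 1.000 means perfect scaling
# (sketch; the helper name is an assumption for illustration)
per_node_speedup() {
  awk -v tref="$1" -v t="$2" -v n="$3" \
    'BEGIN { printf "%.3f\n", tref / (t * n) }'
}

per_node_speedup 480 130 4   # e.g. 480 s on 1 node vs. 130 s on 4 nodes -> 0.923
per_node_speedup 100 100 1   # the reference run itself -> 1.000
```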
- Single point calculations with the PBE exchange-correlation functional
- Supercell containing 512 atoms
- No k-points

`ALGO=FAST, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4`
- Single point calculations with the HSE06 exchange-correlation functional
- Supercell containing 64 atoms
- 2x2x2 k-points

`ALGO=Damped, TIME=0.4, ENCUT=500, PREC=High, LREAL=Auto, LPLANE=True, NCORE=4, KPAR=4`

Please note that in the hybrid OpenMP/MPI execution of VASP for HSE06 calculations, the default stack memory for OpenMP is insufficient and you should explicitly increase the value, otherwise your run might crash. The calculations in this section were run with `export OMP_STACKSIZE=500m` added to the submit script.