
Quantum Espresso#

Quantum Espresso is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

Availability / Target HPC systems#

  • parallel computers: main target machines
  • throughput cluster Woody: might be useful for small systems and manually distributed phonon calculations

Notes on parallelization in general#

  • please note that QE has five command-line arguments that can be passed to the binary at run time: -nimage, -npools, -nband, -ntg, -ndiag (shorthands: -ni, -nk, -nb, -nt, -nd). They can influence the run time considerably; see the example command after this list.
  • try to stick to one k-point per node
  • do not use Hyperthreading (it is disabled on most NHR@FAU systems anyway)
  • use image parallelization (-ni), e.g. for NEB or phonon calculations
  • ask for help with the parallelization of phonon calculations
  • use the gamma-point version (K_POINTS gamma) instead of K_POINTS automatic when only the Γ point is needed
  • k-point parallelization:
    • 1 k-point per node, e.g. -nk #nodes
    • -nk must be a divisor of #MPI tasks
  • -nd for #bands > 500
  • use -nt 2, 5, or 10 only as a last resort, and only if nr3 < #MPI tasks (nr3 is the third dimension of the FFT mesh)
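
As an illustration of these flags, here is a sketch of a launch line; the values of -nk and -nd and the file names are assumptions for illustration, not recommendations:

# hypothetical example: 4 nodes with one k-point each and a 2x2 ScaLAPACK sub-group
# -nk must divide the total number of MPI tasks
srun pw.x -nk 4 -nd 4 -i input.in > output_filename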

Sample job scripts#

MPI job (single-node) on Fritz#

#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=72
#SBATCH --partition=singlenode
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load qe/7.1
# pure MPI run: --cpus-per-task is not requested, so use one OpenMP thread per MPI task
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

srun pw.x -i input.in >output_filename
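
To submit, assuming the script above is saved as qe_single_node.sh (the file name is only an example):

sbatch qe_single_node.sh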

Hybrid OpenMP/MPI job (multi-node) on Fritz#

#!/bin/bash -l
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=multinode
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load qe/7.1

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# newer Slurm versions do not pass --cpus-per-task from sbatch to srun automatically
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
# pin OpenMP threads to physical cores
export OMP_PLACES=cores
export OMP_PROC_BIND=true

srun pw.x -i input.in >output_filename
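
As a quick sanity check, the requested layout of 4 MPI tasks with 18 OpenMP threads each fills the 72 cores of a Fritz node. A line like the following, placed before the srun call, only prints the layout from standard Slurm environment variables:

echo "$SLURM_JOB_NUM_NODES nodes x $SLURM_NTASKS_PER_NODE tasks/node x $SLURM_CPUS_PER_TASK threads/task"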

Performance tests for Quantum Espresso 7.1 on Fritz#

We performed the calculations using the binary from the qe/7.1 module for the ground-state structure of sodium chloride, namely rock salt, downloaded from The Materials Project. All wave-function optimizations of our single-point runs converged in 14 iterations without enforcing the number of SCF iterations. The calculations were performed at the level of the PBE exchange-correlation functional with a PAW pseudopotential file (downloaded from PseudoDojo) that has nine valence electrons for sodium and seven for chlorine.

  • System:
    • Single point calculations
    • Supercell containing 512 atoms
    • Gamma point k-points
    • ecutwfc = 36.0, ecutrho = 144.0, conv_thr = 1.0d-11, mixing_beta = 0.7
    • None of the performance-related arguments mentioned at the top of this page was used, so the program falls back to default choices that are not necessarily optimal. For example, for our system QE chose the ScaLAPACK distributed-memory diagonalization with a sub-group of 8×8 processes, which is not an optimal setup. In this benchmark we only compare relative run times for different combinations of MPI processes and OpenMP threads; a good choice of the QE performance parameters is system dependent and a complicated task in itself, so we did not tune them. Nevertheless, we encourage users to tune the five parameters in production runs, in particular for computationally demanding runs or large sets of similar small-scale runs; see the sketch below for one way to organize such a scan. The following graph should therefore be read as the qualitative behavior of the parallel performance of QE.
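
A minimal sketch of such a parameter scan, assuming a batch job like the ones above and purely illustrative -nd values (not recommendations for this benchmark):

# hypothetical scan over the ScaLAPACK sub-group size set by -nd
for nd in 1 16 64; do
    srun pw.x -nd ${nd} -i input.in > out_nd${nd}
done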

Figure: per-node speedup of QE 7.1 on Fritz for different combinations of MPI processes and OpenMP threads.

Per-node speedup is defined as the reference time divided by the product of the run time and the number of nodes used, i.e. Tref / (T * nodes), where Tref is the run time on a single node with MPI parallelization only.
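
For illustration with made-up numbers: if the single-node MPI-only reference run takes Tref = 1200 s and an 8-node run takes T = 200 s, the per-node speedup is 1200 / (200 * 8) ≈ 0.75, i.e. the 8-node run scales sub-linearly.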

Further information#