Compiler#
This page provides an overview of the installed compilers and their usage on our clusters.
Overview#
compiler | GCC | LLVM | Intel Classic | Intel oneAPI | NVHPC | Nvidia CUDA |
---|---|---|---|---|---|---|
environment module | `gcc` | `llvm` | `intel` | `intel` | `nvhpc` | `cuda` |
C | `gcc` | `clang` | `icc` | `icx` | `nvc` | `nvcc` |
C++ | `g++` | `clang++` | `icpc` | `icpx` | `nvc++` | `nvcc` |
Fortran | `gfortran` | `flang` | `ifort` | `ifx` | `nvfortran` | |
optimize for current host | `-march=native` | `-march=native` | Intel host: `-march=native` or `-xHost`; AMD host: `-mfma -mavx2` | Intel host: `-march=native` or `-xHost`; AMD host: `-mfma -mavx2` | `-tp=native` | GPU: `-arch=native`; CPU: `-Xcompiler -march=native` (1) |
enable OpenMP | `-fopenmp` | `-fopenmp` | `-qopenmp` | `-qopenmp` or `-fiopenmp` (2) | `-mp` | t.b.d. |
vendor documentation | GCC | Clang | Intel (3) | Intel (3) | Nvidia | Nvidia |
Notes:

- (1) `-Xcompiler` passes the following option to the host compiler. Adapt `-march=native` to the flag your host compiler uses to automatically detect the host's CPU type.
- (2) The compiler also recognizes the deprecated option `-fopenmp`. However, `-fopenmp` enables the LLVM OpenMP runtime instead of the Intel OpenMP runtime (with possible Intel extensions).
- (3) Intel does not provide stable links to the latest documentation, so you have to search for it yourself in the Intel Documentation Library.
Missing compiler module or version#
If a compiler module is missing or the version you need is not available, you can either

- install the compiler locally, e.g. build it manually or via Spack, or
- contact us at hpc-support@fau.de and request an installation on the whole cluster (we will decide whether it makes sense).
Compiling for architectures supporting AVX-512#
The frequency of Intel processors can decrease when AVX(2) or AVX-512 instructions are used. Typically, AVX-512 floating-point instructions cause a larger frequency reduction than AVX(2) instructions.
For this reason, compilers might generate code that only uses the lower half of the AVX-512 registers (the `ymm` part) instead of the full width (`zmm`).
To force GCC, Clang, and the Intel Classic/oneAPI compilers to use the full-width `zmm` registers, use the following flags:

- GCC, Clang, Intel oneAPI: `-mprefer-vector-width=512`
- Intel Classic: `-qopt-zmm-usage=high`
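For illustration, invocations might look like the following sketch; the target-architecture flags and the source file name `stream.c` are placeholders, not a recommendation:

```shell
# GCC/Clang/Intel oneAPI: prefer full 512-bit (zmm) vectors
gcc -O3 -march=skylake-avx512 -mprefer-vector-width=512 stream.c -o stream
# Intel Classic: allow high zmm usage
icc -O3 -xCORE-AVX512 -qopt-zmm-usage=high stream.c -o stream
```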
In general, we cannot recommend which instruction set to target or whether the use of full-width registers is beneficial on an AVX-512-capable system. You have to run your own benchmarks for your application.
Optimization flags#
All compilers support the optimization level `-O3`, which typically allows for high and safe optimizations.
With `-Ofast`, compilers typically enable non-standard-compliant floating-point optimizations that can deliver more performance than `-O3`.
However, this can lead to errors, exceptions, or divergence of your simulations, depending on the numerical stability of your application. If in doubt, consult the application's vendor/authors and the compiler's documentation.
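As a sketch (`app.c` is a placeholder source file):

```shell
gcc -O3 app.c -o app_safe      # high, standard-compliant optimizations
gcc -Ofast app.c -o app_fast   # additionally enables non-compliant FP optimizations (-ffast-math)
```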
Targeting multiple architectures#
Targeting multiple CPU micro-architectures can be handy if an application should be optimized for a cluster whose partitions have different micro-architectures.
Intel
Intel compilers can generate multiple code paths optimized for different CPU micro-architectures in one binary. This increases the size of the binary.
For compilation, specify:

- `-march=...`: the oldest supported architecture
- `-ax...`: a comma-separated list of all newer architectures to be supported
For example:
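A possible invocation is sketched below; the source file name is a placeholder, and the exact accepted architecture names may differ between compiler versions, so check your compiler's documentation:

```shell
# Baseline: Ice Lake Server; extra optimized code path: Sapphire Rapids
icx -O3 -march=icelake-server -axSAPPHIRERAPIDS app.c -o app
```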
This produces a binary that requires at least the Ice Lake Server micro-architecture, but also contains an optimized code path for the Sapphire Rapids micro-architecture.
GCC and LLVM
Generating multiple optimized code paths for different micro-architectures is not (easily) possible with GCC and LLVM.
However, at compile time you can specify a minimum supported architecture with `-march=...` and instruct the compiler to tune for a newer one with `-mtune=...`.
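A sketch of such an invocation (`app.c` is a placeholder source file):

```shell
# Minimum supported ISA: Ice Lake Server; tuned for Sapphire Rapids
gcc -O3 -march=icelake-server -mtune=sapphirerapids app.c -o app
```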
In this example, the binary requires at least the Ice Lake Server micro-architecture and will be tuned for the Sapphire Rapids micro-architecture.
GNU Compiler Collection#
module: gcc/<version>
`gcc` of the OS#
Without loading a `gcc` module, the `gcc` shipped with the OS is used, which is rather old.
Intel and Intel Classic compilers#
module: intel/<version>
oneAPI DPC++ compiler#
The `intel` module also provides the oneAPI DPC++ compiler `dpcpp`.
Compiling for AMD systems#
The Intel and Intel Classic compilers might not generate the best code when using `-march=native` or `-xHost` on an AMD system. On AMD systems supporting AVX2, we recommend using the following flags:
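A sketch of such an invocation (`app.c` is a placeholder source file):

```shell
# Enable FMA and AVX2 explicitly instead of -march=native / -xHost on AMD nodes
icx -O3 -mfma -mavx2 app.c -o app
```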
Endianness conversion for Fortran#
Little-endian (LE) byte order is used by x86-based processors: the least-significant byte (LSB) of a multi-byte word is stored first. This format is also used for unformatted Fortran data files.
To transparently import big-endian (BE) files, e.g. files produced on IBM Power or NEC SX systems, the Intel Fortran compilers can convert the endianness automatically during read and write operations, even for individual Fortran units.
Setting the environment variable `F_UFMTENDIAN` enables the conversion. Examples of possible values for `F_UFMTENDIAN` are:
F_UFMTENDIAN | treat input/output as |
---|---|
`big` | BE |
`little` | LE, the default |
`big:10,20` | LE, except for units 10 and 20 |
`"big;little:8"` | BE, except for unit 8 |
To treat input and output as big-endian:
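For example, in a bash/zsh shell:

```shell
# Treat all unformatted Fortran I/O as big-endian
export F_UFMTENDIAN=big
echo "$F_UFMTENDIAN"
```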
NVHPC compilers#
module: nvhpc/<version>
The Nvidia HPC compilers (NVHPC) were formerly known as the PGI compilers.
Nvidia CUDA compilers#
module: cuda/<version>
The Nvidia CUDA C and C++ compiler driver `nvcc` compiles CUDA code.
For the host part, `nvcc` relies on an installed host compiler. By default this is `gcc`, but other host compilers are supported as well; see the Nvidia documentation.
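The host compiler can be selected explicitly; a sketch (`main.cu` is a placeholder source file):

```shell
# -ccbin selects the host compiler for the CPU part of the CUDA code
nvcc -ccbin g++ main.cu -o main
```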
Nvidia GPU capabilities#
card | compute capability | functional capability (FC) | virtual architecture (VA) |
---|---|---|---|
V100 | 7.0 | sm_70 | compute_70 |
A100 | 8.0 | sm_80 | compute_80 |
A40 | 8.6 | sm_86 | compute_86 |
GeForce RTX 2080 Ti | 7.5 | sm_75 | compute_75 |
GeForce RTX 3080 | 8.6 | sm_86 | compute_86 |
More information can be found at Nvidia:
- Your GPU Compute Capability
- CUDA C++ Programming Guide, 16. Compute Capabilities
- NVIDIA CUDA Compiler Driver NVCC, 5. GPU Compilation
Compiling for certain capabilities#
Capabilities can be specified via the `-gencode` flag:
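The general form is sketched below; `VA` and `FC` are placeholders explained next, and `main.cu` is a placeholder source file:

```shell
nvcc -gencode arch=VA,code=FC main.cu -o main
```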
where `VA` is the virtual architecture and `FC` the functional capability.
It is possible to specify multiple virtual architectures by repeating the `-gencode` flag. For example:
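A sketch targeting the V100 and A100 capabilities from the table above (`main.cu` is a placeholder source file):

```shell
# Binary code for sm_70 and sm_80, plus PTX (compute_80) for forward compatibility
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=\"sm_80,compute_80\" main.cu -o main
```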
Multiple values for `code` must be wrapped in `\"`.