Test cluster#
The test and benchmark cluster is an environment for porting software to new CPU architectures and for running benchmark tests. It comprises a variety of nodes with different processors, clock speeds, memory speeds, memory capacities, and numbers of CPU sockets. There is no high-speed network, and MPI parallelization is restricted to a single node. The usual NFS filesystems are available.
This is a testing ground. Any job may be canceled without prior notice.
System overview#
This is a quick overview of the systems, including their host names (frequencies are nominal values); NDA systems are not listed:
Hostname | CPU | RAM | Accelerators
---|---|---|---
applem1studio | Apple M1 Ultra 20-Core Processor | 64 GiB RAM | 64-Core GPU
aquavan1 | Dual AMD EPYC 9474F CPU (96 cores) | 2304 GiB RAM (24x 96 GB DDR5-5600) | 8x AMD MI300X (192 GiB HBM3)
broadep2 | Dual Intel Xeon "Broadwell" E5-2697 v4 CPU (2x 18 cores + SMT) @ 2.30 GHz | 128 GiB RAM |
casclakesp2 | Dual Intel Xeon "Cascade Lake" Gold 6248 CPU (2x 20 cores + SMT) @ 2.50 GHz | 384 GiB RAM |
euryale | Dual Intel Xeon "Broadwell" E5-2620 v4 CPU (2x 8 cores) @ 2.10 GHz | 64 GiB RAM | AMD RX 6900 XT (16 GB)
genoa1 | Dual AMD EPYC 9654 "Genoa" CPU (2x 96 cores + SMT) @ 2.40 GHz | 768 GiB RAM |
genoa2 | Dual AMD EPYC 9354 "Genoa" CPU (2x 32 cores + SMT) @ 3.25 GHz | 768 GiB RAM | Nvidia L40 (48 GiB GDDR6), Nvidia L40S (48 GiB GDDR6)
gracehop1 | Nvidia Grace Hopper GH200 (72 cores) | 480 GiB RAM | Nvidia H100 (96 GiB HBM3)
gracesup1 | Nvidia Grace Superchip (2x 72 cores) | 480 GiB RAM |
hasep1 | Dual Intel Xeon "Haswell" E5-2695 v3 CPU (2x 14 cores + SMT) @ 2.30 GHz | 64 GiB RAM |
icx32 | Dual Intel Xeon "Ice Lake" Platinum 8358 CPU (2x 32 cores + SMT) @ 2.60 GHz | 256 GiB RAM | Nvidia L4 (24 GB)
icx36 | Dual Intel Xeon "Ice Lake" Platinum 8360Y CPU (2x 36 cores + SMT) @ 2.40 GHz | 256 GiB RAM |
ivyep1 | Dual Intel Xeon "Ivy Bridge" E5-2690 v2 CPU (2x 10 cores + SMT) @ 3.00 GHz | 64 GiB RAM |
lukewarm | Dual Ampere Altra Max M128-30, ARM aarch64 (2x 128 cores) @ 2.8 GHz | 512 GB RAM (DDR4-3200) |
medusa | Dual Intel Xeon "Cascade Lake" Gold 6246 CPU (2x 12 cores + SMT) @ 3.30 GHz | 192 GiB RAM | Nvidia GeForce RTX 2070 SUPER (8 GiB GDDR6), Nvidia GeForce RTX 2080 SUPER (8 GiB GDDR6), Nvidia Quadro RTX 5000 (16 GiB GDDR6), Nvidia Quadro RTX 6000 (24 GiB GDDR6)
milan1 | Dual AMD EPYC 7543 "Milan" CPU (2x 32 cores + SMT) @ 2.8 GHz | 256 GiB RAM | AMD MI210 (64 GiB HBM2e)
naples1 | Dual AMD EPYC 7451 "Naples" CPU (2x 24 cores + SMT) @ 2.3 GHz | 128 GiB RAM |
optane1 | Dual Intel Xeon "Ice Lake" Platinum 8362 CPU (2x 32 cores + SMT) @ 2.80 GHz | 256 GiB RAM, 1024 GiB Optane memory |
rome1 | Single AMD EPYC 7452 "Rome" CPU (32 cores + SMT) @ 2.35 GHz | 128 GiB RAM |
rome2 | Dual AMD EPYC 7352 "Rome" CPU (2x 24 cores + SMT) @ 2.3 GHz | 256 GiB RAM | AMD MI100 (32 GiB HBM2)
skylakesp2 | Intel Xeon "Skylake" Gold 6148 CPU (2x 20 cores + SMT) @ 2.40 GHz | 96 GiB RAM |
warmup | Dual Cavium/Marvell "ThunderX2" CN9980, ARM aarch64 (2x 32 cores + 4-way SMT) @ 2.20 GHz | 128 GiB RAM |
bergamo1 | Dual AMD EPYC 9754 "Bergamo" CPU (2x 128 cores + SMT) @ 2.25 GHz | 1.5 TiB RAM |
saprap2 | Dual Intel Xeon "Sapphire Rapids" Platinum 8470 CPU (2x 52 cores + SMT) @ 2.0 GHz | 512 GiB RAM |
GPU availability#
Technical specifications of the reasonably recent GPUs available at NHR@FAU (either in the test cluster or in TinyGPU):
GPU | RAM | BW [GB/s] | Ref Clock [GHz] | #SM (or #CU) / #Cores | TDP [W] | SP [TFlop/s] | DP [TFlop/s] | Host | Host CPU (base clock frequency)
---|---|---|---|---|---|---|---|---|---
Nvidia GeForce GTX 1080 | 8 GB GDDR5 | 320 | 1.607 | 20 / 2560 | 180 | 8.87 | 0.277 | tg03x (JupyterHub) | Intel Xeon Broadwell E5-2620 v4 (8 C, 2.10 GHz)
Nvidia GeForce GTX 1080 Ti | 11 GB GDDR5 | 484 | 1.48 | 28 / 3584 | 250 | 11.34 | 0.354 | tg04x (JupyterHub) | Intel Xeon Broadwell E5-2620 v4 (2x 8 C, 2.10 GHz)
Nvidia GeForce RTX 2070 Super | 8 GB GDDR6 | 448 | 1.605 | 40 / 2560 | 215 | 9.06 | 0.283 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30 GHz)
Nvidia Quadro RTX 5000 | 16 GB GDDR6 | 448 | 1.62 | 48 / 3072 | 230 | 11.15 | 0.48 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30 GHz)
Nvidia GeForce RTX 2080 Super | 8 GB GDDR6 | 496 | 1.65 | 48 / 3072 | 250 | 11.15 | 0.348 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30 GHz)
Nvidia GeForce RTX 2080 Ti | 11 GB GDDR6 | 616 | 1.35 | 68 / 4352 | 250 | 13.45 | 0.42 | tg06x (TinyGPU) | Intel Xeon Skylake Gold 6134 (2x 8 cores + SMT, 3.20 GHz)
Nvidia Quadro RTX 6000 | 24 GB GDDR6 | 672 | 1.44 | 72 / 4608 | 260 | 16.31 | 0.51 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30 GHz)
Nvidia GeForce RTX 3080 | 10 GB GDDR6X | 760 | 1.440 | 68 / 8704 | 320 | 29.77 | 0.465 | tg08x (TinyGPU) | Intel Xeon Ice Lake Gold 6226R (2x 32 cores + SMT, 2.90 GHz)
Nvidia Tesla V100 (PCIe, passive) | 32 GB HBM2 | 900 | 1.245 | 80 / 5120 | 250 | 14.13 | 7.066 | tg07x (TinyGPU) | Intel Xeon Skylake Gold 6134 (2x 8 cores + SMT, 3.20 GHz)
Nvidia A40 (passive) | 48 GB GDDR6 | 696 | 1.305 | 84 / 10752 | 300 | 37.42 | 1.169 | genoa2 | AMD Genoa 9354 (2x 32 cores + SMT, 3.25 GHz)
Nvidia A100 (SXM4/NVLink, passive) | 40 GB HBM2 | 1555 | 1.410 | 108 / 6912 | 400 | 19.5 | 9.7 | tg09x (TinyGPU) | AMD Rome 7662 (2x 64 cores, 2.0 GHz)
Nvidia L40 (passive) | 48 GB GDDR6 | 864 | 0.735 | 142 / 18176 | 300 | 90.52 | 1.414 | genoa2 | AMD Genoa 9354 (2x 32 cores + SMT, 3.25 GHz)
AMD Instinct MI100 (PCIe Gen4, passive) | 32 GB HBM2 | 1229 | 1.502 | 120 / 7680 | 300 | 21.1 | 11.5 | rome2 | AMD Rome 7352 (2x 24 cores + SMT, 2.3 GHz)
AMD Radeon VII | 16 GB HBM2 | 1024 | 1.4 | 60 / 3840 | 300 | 13.44 | 3.36 | interlagos1 | AMD Interlagos Opteron 6276
AMD Instinct MI210 (PCIe Gen4, passive) | 64 GB HBM2e | 1638 | 1.0 | 104 / 6656 | 300 | 22.6 | 22.6 | milan1 | AMD Milan 7543 (2x 32 cores + SMT, 2.8 GHz)
AMD Instinct MI300X (PCIe Gen5, OAM) | 192 GB HBM3 | 5300 | 1.0 | 304 / 19456 | 750 | 163.44 | 81.72 | aquavan1 | AMD Genoa 9474F (2x 48 cores, 3.6 GHz)
Accessing the test cluster#
Access to the test cluster is restricted. Contact hpc-support@fau.de to request access.
See configuring connection settings or SSH in general for how to configure your SSH connection. You might configure a proxy jump via `csnhr.nhr.fau.de` and the use of an SSH private key for the test cluster's frontend node `testfront.rrze.fau.de`.
If successfully configured, the test cluster can be accessed via SSH by:
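For example (a minimal sketch; with the configuration template below in place, the proxy jump and private key are picked up automatically):

```
ssh testfront.rrze.fau.de
```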
An SSH configuration template, similar to the general template, can be used.
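A sketch of such an entry in `~/.ssh/config` (user name and key path are placeholders; adapt them to your account):

```
Host testfront testfront.rrze.fau.de
    HostName testfront.rrze.fau.de
    User <your-hpc-account>
    # private key registered for NHR@FAU systems (path is only an example)
    IdentityFile ~/.ssh/id_ed25519_nhr
    ProxyJump csnhr.nhr.fau.de
```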
Software#
The frontend runs Ubuntu 22.04 LTS.
All software on NHR@FAU systems, e.g. (commercial) applications, compilers, and libraries, is provided using environment modules. These modules are used to set up a custom environment when working interactively or inside batch jobs.
For available software see:
Most software is centrally installed using Spack. By default, only a subset of the packages installed via Spack is shown. To see all installed packages, load the `000-all-spack-pkgs` module.
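For example (the package name in the last line is a placeholder):

```
module load 000-all-spack-pkgs   # make all Spack-installed packages visible
module avail                     # list the modules that are now available
module load <package>            # load the package you need
```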
You can install software yourself by using the user-spack functionality.
Containers, e.g. Docker, are supported via Apptainer.
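As a sketch (the image and commands are only illustrative), a Docker image can be pulled and executed with Apptainer like this:

```
# convert a Docker Hub image to a SIF file and run a command inside it
apptainer pull ubuntu22.sif docker://ubuntu:22.04
apptainer exec ubuntu22.sif cat /etc/os-release
```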
Filesystems#
On all front ends and nodes the filesystems `$HOME`, `$HPCVAULT`, and `$WORK` are mounted.
For details see the filesystems documentation.
The nodes have local hard disks of very different capacities and speeds. Do not expect a production environment.
Batch processing#
Resources are controlled through the batch system Slurm.
Access to the `nda` partition is restricted and benchmark results must not be published without further consideration.
Maximum job runtime is 24 hours.
The currently available nodes can be listed using:
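For example (`sinfo` alone is sufficient; the format string is an illustrative choice that also shows the feature names usable for node selection):

```
sinfo                      # partitions and node states
sinfo -o "%n %f %c %m"     # host name, feature names, CPU count, memory per node [MB]
```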
To select a node, you can either use the host name or a feature name from `sinfo` with `salloc` or `sbatch`:

- by feature name: `--nodes=1 --constraint=featurename ...`
- by host name: `-w hostname ...`
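For example (the feature name `icx` and the job script name are placeholders):

```
sbatch --nodes=1 --constraint=icx job_script.sh   # any node carrying the (assumed) feature "icx"
sbatch --nodes=1 -w icx36 job_script.sh           # the specific host icx36
```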
Specific constraints#
target | Slurm constraint | description
---|---|---
hardware performance counters | `hwperf` | Enables access to hardware performance counters; required for `likwid-perfctr`.
NUMA balancing | `numa_off` | Disables NUMA balancing, which is enabled by default.
transparent huge pages | `thp_always` | Always use transparent huge pages; the default is `madvise`.
Specify a constraint with `-C <CONSTRAINT>` or `--constraint=<CONSTRAINTS>` when using `salloc` or `sbatch`. Multiple constraints require quoting and are concatenated via `&`, e.g. `-C "hwperf&thp_always"`.
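For example, an interactive one-hour job on `broadep2` with hardware performance counters and transparent huge pages enabled (node and runtime are arbitrary choices):

```
salloc --nodes=1 -w broadep2 -C "hwperf&thp_always" --time=01:00:00
```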
Interactive job#
The environment from the calling shell, like loaded modules, will be inherited by the interactive job.
Interactive jobs can be requested by using `salloc` instead of `sbatch` and specifying the respective options on the command line:
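A minimal sketch, here requesting the node `icx32` for one hour:

```
salloc --nodes=1 -w icx32 --time=01:00:00
```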
This will give you a shell on the requested node for the specified amount of time.
Batch job#
The following job script will allocate the node `icx36` for 6 hours:
#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH -w icx36
#SBATCH --time=06:00:00
#SBATCH --export=NONE
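# --export=NONE gives the job a clean environment; unsetting SLURM_EXPORT_ENV
# lets srun inherit the job's environment (e.g. the modules loaded below)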
unset SLURM_EXPORT_ENV
module load ...
./a.out
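The script can then be submitted with `sbatch`, e.g. `sbatch job_script.sh` (the file name is arbitrary).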
Attach to a running job#
See the general documentation on batch processing.