Test cluster#

The test and benchmark cluster is an environment for porting software to new CPU architectures and running benchmark tests. It comprises a variety of nodes with different processors, clock speeds, memory speeds, memory capacity, number of CPU sockets, etc. There is no high-speed network, and MPI parallelization is restricted to one node. The usual NFS filesystems are available.

This is a testing ground. Any job may be canceled without prior notice.

System overview#

This is a quick overview of the systems, including their host names. Frequencies are nominal values, and NDA systems are not listed:

Hostname	CPU	RAM	Accelerators
`applem1studio`	Apple M1 Ultra 20-Core Processor	64 GiB RAM	64-Core GPU
`aquavan1`	Dual AMD EPYC 9474F CPU (2x 48 cores)	2304 GiB RAM (24x 96 GB DDR5-5600)	8x AMD MI300X (192 GiB HBM3)
`aquavan2`	Quad AMD Instinct MI300A (24 Core)	512 GiB RAM (4x 128 GB HBM3)	4x AMD MI300A
`broadep2`	Dual Intel Xeon "Broadwell" CPU E5-2697 v4 (2x 18 cores + SMT) @ 2.30GHz	128 GiB RAM
`casclakesp2`	Dual Intel Xeon "Cascade Lake" Gold 6248 CPU (2x 20 cores + SMT) @ 2.50GHz	384 GiB RAM
`euryale`	Dual Intel Xeon "Broadwell" CPU E5-2620 v4 (2x 8 cores) @ 2.10GHz	64 GiB RAM	AMD RX 6900 XT (16 GB)
`genoa1`	Dual AMD EPYC 9654 "Genoa" CPU (2x 96 cores + SMT) @ 2.40GHz	768 GiB RAM
`genoa2`	Dual AMD EPYC 9354 "Genoa" CPU (2x 32 cores + SMT) @ 3.25GHz	768 GiB RAM	Nvidia L40 (48 GiB GDDR6) Nvidia L40s (48 GiB GDDR6)
`genoa3`	Dual AMD EPYC 9684X "Genoa" CPU (2x 96 cores) @ 1.935GHz	768 GiB RAM
`gracehop1`	Nvidia Grace Hopper GH200 (72 cores)	480 GiB RAM	Nvidia H200 (96 GiB HBM3)
`gracesup1`	Nvidia Grace Superchip (2x 72 cores)	480 GiB RAM
`hasep1`	Dual Intel Xeon "Haswell" E5-2695 v3 CPU (2x 14 cores + SMT) @ 2.30GHz	64 GiB RAM
`icx32`	Dual Intel Xeon "Ice Lake" Platinum 8358 CPU (2x 32 cores + SMT) @ 2.60GHz	256 GiB RAM	Nvidia L4 (24 GB)
`icx36`	Dual Intel Xeon "Ice Lake" Platinum 8360Y CPU (2x 36 cores + SMT) @ 2.40GHz	256 GiB RAM
`ivyep1`	Dual Intel Xeon "Ivy Bridge" E5-2690 v2 CPU (2x 10 cores + SMT) @ 3.00GHz	64 GiB RAM
`lukewarm`	Dual ARM Ampere Altra Max M128-30, ARM aarch64 (2x 128 cores) @ 2.8 GHz	512 GB RAM (DDR4-3200)
`medusa`	Dual Intel Xeon "Cascade Lake" Gold 6246 CPU (2x 12 cores + SMT) @ 3.30GHz	192 GiB RAM	Nvidia Geforce RTX 2070 SUPER (8 GiB GDDR6) Nvidia Geforce RTX 2080 SUPER (8 GiB GDDR6) Nvidia Quadro RTX 5000 (16 GiB GDDR6) Nvidia Quadro RTX 6000 (24 GiB GDDR6)
`milan1`	Dual AMD EPYC 7543 "Milan" CPU (2x 32 cores + SMT) @ 2.8 GHz	256 GiB RAM	AMD MI210 (64 GiB HBM2e)
`naples1`	Dual AMD EPYC 7451 "Naples" CPU (2x 24 cores + SMT) @ 2.3 GHz	128 GiB RAM
`optane1`	Dual Intel Xeon "Ice Lake" Platinum 8362 CPU (2x 32 cores + SMT) @ 2.80 GHz	256 GiB RAM 1024 GiB Optane Memory
`rome1`	Single AMD EPYC 7452 "Rome" CPU (32 cores + SMT) @ 2.35 GHz	128 GiB RAM
`rome2`	Dual AMD EPYC 7352 "Rome" CPU (2x 24 cores + SMT) @ 2.3 GHz	256 GiB RAM	AMD MI100 (32 GiB HBM2) ~~AMD MI210 (64 GiB HBM2)~~ broken
`skylakesp2`	Dual Intel Xeon "Skylake" Gold 6148 CPU (2x 20 cores + SMT) @ 2.40GHz	96 GiB RAM
`warmup`	Dual Cavium/Marvell "ThunderX2" (ARMv8) CN9980, ARM aarch64 (2x 32 cores + 4-way SMT) @ 2.20 GHz	128 GiB RAM
`bergamo1`	Dual AMD EPYC 9754 "Bergamo" CPU (2x 128 cores + SMT) @ 2.25GHz	1.5 TiB RAM
`saprap2`	Dual Intel Xeon "SapphireRapids" Platinum 8470 CPU (2x 52 cores + SMT) @ 2.0GHz	512 GiB RAM
`turin1`	Dual AMD EPYC 9555 "Turin" CPU (2x 64 cores + SMT) @ 3.2GHz	768 GiB RAM
`turin2`	Dual AMD EPYC 9825 "Turin" CPU (2x 144 cores + SMT) @ 2.2GHz	256 GiB RAM
`granrap2`	Dual Intel Xeon 6 Performance 6787P (2x 86 cores + SMT) @ 2.0GHz	512 GiB RAM
`spark1` (currently not available)	Nvidia GB10 CPU (Cortex-X925 + Cortex-A725), 20 cores	128 GiB RAM	Nvidia GB10 (shared RAM/VRAM)
`blackultra1`	Dual Intel Xeon 6740P (2x 48 cores + SMT) @ 2.1GHz	2048 GiB RAM	8x NVIDIA B300 SXM6 AC

GPU availability#

Technical specifications of the reasonably recent GPUs available at NHR@FAU (in either the test cluster or TinyGPU):

GPU	RAM	BW [GB/s]	Ref Clock [GHz]	#SM (or #CU) / #Cores	TDP [W]	SP [TFlop/s]	DP [TFlop/s]	Host	Host CPU (base clock frequency)
Nvidia Geforce GTX1080	8 GB GDDR5	320	1.607	20 / 2560	180	8.87	0.277	`tg03x` (JupyterHub)	Intel Xeon Broadwell E5-2620 v4 (8 C, 2.10GHz)
Nvidia Geforce GTX1080Ti	11 GB GDDR5	484	1.48	28 / 3584	250	11.34	0.354	`tg04x` (JupyterHub)	Intel Xeon Broadwell E5-2620 v4 (2x 8 C, 2.10GHz)
Nvidia Geforce RTX2070Super	8 GB GDDR6	448	1.605	40 / 2560	215	9.06	0.283	`medusa`	Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30GHz)
Nvidia Quadro RTX5000	16 GB GDDR6	448	1.62	48 / 3072	230	11.15	0.48	`medusa`	Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30GHz)
Nvidia Geforce RTX2080Super	8 GB GDDR6	496	1.65	48 / 3072	250	11.15	0.348	`medusa`	Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30GHz)
Nvidia Geforce RTX2080Ti	11 GB GDDR6	616	1.35	68 / 4352	250	13.45	0.42	`tg06x` (TinyGPU)	Intel Xeon Skylake Gold 6134 (2x 8 Cores + SMT, 3.20GHz)
Nvidia Quadro RTX6000	24 GB GDDR6	672	1.44	72 / 4608	260	16.31	0.51	`medusa`	Intel Xeon Cascade Lake Gold 6246 (2x 12 C, 3.30GHz)
Nvidia Geforce RTX3080	10 GB GDDR6X	760	1.440	68 / 8704	320	29.77	0.465	`tg08x` (TinyGPU)	Intel Xeon Ice Lake Gold 6226R (2x 32 cores + SMT, 2.90GHz)
Nvidia Tesla V100 (PCIe, passive)	32 GB HBM2	900	1.245	80 / 5120	250	14.13	7.066	`tg07x` (TinyGPU)	Intel Xeon Skylake Gold 6134 (2x 8 Cores + SMT, 3.20GHz)
Nvidia A100 (SXM4/NVLink, passive)	40 GB HBM2	1555	1.410	108 / 6912	400	19.5	9.7	`tg09x` (TinyGPU)	AMD Rome 7662 (2x 64 Cores, 2.0GHz)
Nvidia L40 (passive)	48 GB GDDR6	864	0.735	142 / 18176	300	90.52	1.414	`genoa2`	AMD Genoa 9354 (2x 32 cores + SMT, 3.25 GHz)
Nvidia L40s (passive)	48 GB GDDR6	864	1.11	142 / 18176	300	91.61	1.431	`genoa2`	AMD Genoa 9354 (2x 32 cores + SMT, 3.25 GHz)
Nvidia GB10 (currently not available)	128 GB LPDDR5x	273	1.665			31.03	15.51	`spark1`	Nvidia GB10 CPU (Cortex-X925 + Cortex-A725)
AMD Instinct MI100 (PCIe Gen4, passive)	32 GB HBM2	1229	1.502	120 / 7680	300	21.1	11.5	`rome2`	AMD Rome 7352 (2x 24 cores + SMT, 2.3 GHz)
AMD Radeon VII	16 GB HBM2	1024	1.4	60 / 3840	300	13.44	3.36	`interlagos1`	AMD Interlagos Opteron 6276
AMD Instinct MI210 (PCIe Gen4, passive)	64 GB HBM2e	1638	1.0	104 / 6656	300	22.6	22.6	`milan1`	AMD Milan 7543 (2x 32 cores + SMT, 2.8 GHz)
AMD Instinct MI300X (PCIe Gen5, OAM)	192 GB HBM3	5300	1.0	304 / 19456	750	163.44	81.72	`aquavan1`	AMD Genoa 9474F (2x 48 cores, 3.6 GHz)
AMD Instinct MI300A	128 GB HBM3	5300	2.1	228 / 14592	550	122.6	61.3	`aquavan2`	AMD Zen4 (4x 24 Core, 3.7 GHz)
NVIDIA B300 SXM6 AC	269 GB HBM3e	1800	1.7	160 / 20480	1100	144	72	`blackultra1`	Intel Xeon 6740P (2x 48 Core, 2.1 GHz)

Accessing the Test cluster#

Access to the test cluster is restricted. Contact hpc-support@fau.de to request access.

To set up your SSH connection, see the documentation on configuring connection settings or on SSH in general.

You may want to configure a proxy jump via csnhr.nhr.fau.de and the use of an SSH private key for the test cluster's frontend node, testfront.rrze.fau.de.

Once this is configured, you can access the test cluster via SSH with:

ssh testfront.rrze.fau.de

An SSH configuration template, similar to the general template:

Host testfront.rrze.fau.de testfront.nhr.fau.de testfront.rrze.uni-erlangen.de
    HostName testfront.rrze.uni-erlangen.de
    User <HPC account>
    ProxyJump csnhr.nhr.fau.de
    IdentityFile ~/.ssh/id_ed25519_nhr_fau
    IdentitiesOnly yes
    PasswordAuthentication no
    PreferredAuthentications publickey
    ForwardX11 no
    ForwardX11Trusted no
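
To check that the template is picked up before connecting, OpenSSH can print the effective configuration for a host with `ssh -G`. The sketch below filters that output for the jump host. The helper name `proxyjump_of` is made up for this example, and it assumes the template has been placed in `~/.ssh/config`.

```shell
# Hypothetical helper: extract the ProxyJump value from `ssh -G <host>` output
# read on stdin. `ssh -G` prints one lowercase "keyword value" pair per line.
proxyjump_of() {
    awk 'tolower($1) == "proxyjump" { print $2 }'
}

# Only query the local SSH client if one is installed.
if command -v ssh >/dev/null 2>&1; then
    ssh -G testfront.rrze.uni-erlangen.de | proxyjump_of
fi
```

With the template in place this should print the jump host, csnhr.nhr.fau.de. Empty output suggests that no `Host` pattern matched the name given on the command line.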

Software#

The frontend runs Ubuntu 22.04 LTS.

All software on NHR@FAU systems, e.g. (commercial) applications, compilers, and libraries, is provided using environment modules. These modules are used to set up a custom environment when working interactively or inside batch jobs.

For available software see:

Most software is centrally installed using Spack. By default, only a subset of packages installed via Spack is shown. To see all installed packages, load the 000-all-spack-pkgs module. You can install software yourself by using the user-spack functionality.

Containers, e.g. Docker, are supported via Apptainer.

Filesystems#

On all front ends and nodes the filesystems $HOME, $HPCVAULT, and $WORK are mounted. For details see the filesystems documentation.

The nodes have local hard disks of very different capacities and speeds. Do not expect a production environment.

Batch processing#

Resources are controlled through the batch system Slurm.

Access to the nda partition is restricted and benchmark results must not be published without further consideration.

Maximum job runtime is 24 hours.
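
When time limits are computed in a script, a small guard can catch values above this limit before submission. A minimal sketch, assuming the limit is passed in whole minutes; `slurm_time` is a made-up helper name, not a Slurm command.

```shell
# Hypothetical helper: convert whole minutes to the hh:mm:ss form used by
# --time and refuse anything above the 24-hour (1440-minute) limit.
slurm_time() {
    minutes="$1"
    case "$minutes" in
        ''|*[!0-9]*|0*)
            echo "time limit must be a positive integer (minutes)" >&2
            return 1 ;;
    esac
    if [ "$minutes" -gt 1440 ]; then
        echo "time limit exceeds 24 hours" >&2
        return 1
    fi
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

slurm_time 90     # prints 01:30:00
slurm_time 1440   # prints 24:00:00
```

The result can be passed on directly, for example as `--time="$(slurm_time 90)"`.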

The currently available nodes can be listed using:

sinfo -o "%.14N %.9P %.11T %.4c %.8z %.6m %.35f"

To select a node, you can either use the host name or a feature name from sinfo with salloc or sbatch:

--nodes=1 --constraint=featurename ...
-w hostname ...
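
To find out which nodes offer a given feature, the `sinfo` output can be filtered. The following is a sketch only: `nodes_with_feature` is a made-up helper, and `featurename` is the same placeholder as above.

```shell
# Hypothetical helper: read "<node> <feature,feature,...>" lines on stdin
# and print the nodes that advertise the requested feature.
nodes_with_feature() {
    awk -v want="$1" '{
        n = split($2, feats, ",")
        for (i = 1; i <= n; i++)
            if (feats[i] == want) { print $1; break }
    }'
}

# On the cluster, sinfo can supply that input
# (-h: no header, -N: one line per node, %N: node name, %f: features).
if command -v sinfo >/dev/null 2>&1; then
    sinfo -h -N -o "%N %f" | nodes_with_feature featurename | sort -u
fi
```

The `sort -u` removes duplicates, since a node that belongs to several partitions is listed once per partition.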

Specific constraints#

target	Slurm constraint	description
hardware performance counters	`hwperf`	Enables access to hardware performance counters, required for `likwid-perfctr`.
NUMA balancing	`numa_off`	Disables NUMA balancing, which is enabled by default.
transparent huge pages	`thp_always`	Always uses transparent huge pages; the default setting is `madvise`.

Specify a constraint with -C <CONSTRAINT> or --constraint=<CONSTRAINT> when using salloc or sbatch. Multiple constraints must be quoted and joined with &, e.g. -C "hwperf&thp_always".
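
The quoting matters because an unquoted `&` would be read by the shell as the background operator. When the list of constraints is built in a script, it can be joined programmatically. A minimal sketch; `join_constraints` is a made-up helper name.

```shell
# Hypothetical helper: join constraint names with "&", meaning that all of
# them must be satisfied. The subshell keeps the IFS change local.
join_constraints() {
    (IFS='&'; printf '%s\n' "$*")
}

constraints=$(join_constraints hwperf thp_always)

# Print the resulting command line, with the quotes it needs, for review.
printf 'salloc --nodes=1 -C "%s" --time=01:00:00\n' "$constraints"
```

This prints `salloc --nodes=1 -C "hwperf&thp_always" --time=01:00:00`, which can be pasted into a shell as is.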

Interactive job#

The environment from the calling shell, like loaded modules, will be inherited by the interactive job.

Interactive jobs can be requested by using salloc instead of sbatch and specifying the respective options on the command line:

salloc --nodes=1 -w hostname --time=hh:mm:ss

This will give you a shell on the node hostname for the specified amount of time.
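
From that shell you can check whether a requested constraint such as `thp_always` or `numa_off` took effect. The sketch below reads the standard Linux kernel interfaces for both settings. It is an assumption that the constraints are reflected there, since this page does not say so, and `active_thp_mode` is a made-up helper name.

```shell
# Hypothetical helper: the kernel marks the active THP mode with square
# brackets, e.g. "always [madvise] never"; print just that mode.
active_thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
numa_file=/proc/sys/kernel/numa_balancing

# Both files exist only on Linux kernels built with the respective feature.
if [ -r "$thp_file" ]; then
    echo "THP mode: $(active_thp_mode < "$thp_file")"
fi
if [ -r "$numa_file" ]; then
    # 0 means automatic NUMA balancing is disabled.
    echo "NUMA balancing: $(cat "$numa_file")"
fi
```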

Batch job#

The following job script will allocate node icx36 for 6 hours:

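#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH -w icx36
#SBATCH --time=06:00:00
# do not export the environment of the submitting shell into the job
#SBATCH --export=NONE

# let srun inherit the environment set up in this script
unset SLURM_EXPORT_ENV

module load ...

./a.out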
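
To run the same script on several nodes for comparison, the node can be overridden per submission, because options given on the `sbatch` command line take precedence over `#SBATCH` lines in the script. A minimal dry-run sketch; `sweep_commands`, the script name `job.sh`, and the `bench-` job names are illustrative.

```shell
# Hypothetical helper: print one sbatch command per host (dry run).
# Command-line options override the matching #SBATCH lines in the script.
sweep_commands() {
    script="$1"
    shift
    for host in "$@"; do
        printf 'sbatch -w %s --job-name=bench-%s %s\n' "$host" "$host" "$script"
    done
}

# Review the commands first; append "| sh" to actually submit them.
sweep_commands job.sh icx36 genoa1 rome1
```

Printing the commands instead of running them keeps the helper safe to try on any machine, including ones without Slurm.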

Attach to a running job#

See the general documentation on batch processing.