
VASP

Description

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP is a licensed program, with the license sold on a per-research-group basis. Users need to be members of a research group that has purchased a license. New licensees need to contact CHPC to request to be added to the list of groups authorized to use VASP; please provide proof of purchase with this request. There are two kinds of licenses. Research groups that purchased a VASP license in the past (prior to ~2020) likely own a version 5.x license, while more recent licensees get the version 6.x license, which is current. CHPC manages versions 5.x and 6.x in separate user groups. Groups that own only a version 5.x license are only allowed to use that version, and will need to pay an upgrade fee (1500 Euros as of January 2023) to be able to use version 6.x.

Using VASP

CHPC provides version 5.4.4 as the last of the 5.x series, and updates the 6.x series as needed. Note that in version 5.4.4, OpenMP based parallelism is not optimal, while OpenMP is important for performance on the Notchpeak AMD nodes. For better performance on the AMD nodes, it is necessary to use version 6.x.

Because of the sub-optimal OpenMP performance, version 5.4.4 is offered without OpenMP support. On the other hand, versions 6.x, which have better OpenMP performance, especially on the AMD nodes, are offered with OpenMP support.

To use VASP 5.x:

module load intel-oneapi-compilers/2021.4.0 intel-oneapi-mpi/2021.1.1 vasp/5.4.4
mpirun -np $SLURM_NTASKS vasp_std
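
For reference, a complete Slurm batch script for the VASP 5.x run above might look like the sketch below. The partition, account, core count, and wall time are placeholder assumptions that need to be adapted to your allocation:

#!/bin/bash
#SBATCH --job-name=vasp5
#SBATCH --partition=notchpeak        # placeholder partition - use one from your allocation
#SBATCH --account=mygroup            # placeholder account name
#SBATCH --nodes=1
#SBATCH --ntasks=32                  # one MPI task per requested CPU core
#SBATCH --time=24:00:00

# run from the directory that contains INCAR, POSCAR, POTCAR and KPOINTS
cd $SLURM_SUBMIT_DIR

module load intel-oneapi-compilers/2021.4.0 intel-oneapi-mpi/2021.1.1 vasp/5.4.4
mpirun -np $SLURM_NTASKS vasp_std

The script is then submitted with the sbatch command.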

To use VASP 6.x on the Intel nodes, it is not necessary to use OpenMP threading; pure MPI parallelization with one MPI process per core works best:

module load gcc/11.2.0  openmpi/4.1.6 vasp/6.4.2
mpirun -x OMP_NUM_THREADS=1 -np $SLURM_NTASKS vasp_std

On the Notchpeak AMD nodes, it is advantageous to use multiple OpenMP threads per MPI process. In the performance evaluation detailed below, we have determined that on the 64-core AMD node the best performance is achieved with 8 MPI processes, each running 8 OpenMP threads:

module load gcc/11.2.0  openmpi/4.1.6 vasp/6.4.2
mpirun -x OMP_NUM_THREADS=8 -np 8 vasp_std
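
In a Slurm batch script, this hybrid layout can be expressed through --ntasks and --cpus-per-task, so the thread count does not have to be hard-coded. A minimal sketch, with placeholder partition and account names, could look like:

#!/bin/bash
#SBATCH --partition=notchpeak        # placeholder partition - adjust to your allocation
#SBATCH --account=mygroup            # placeholder account name
#SBATCH --nodes=1
#SBATCH --ntasks=8                   # 8 MPI tasks ...
#SBATCH --cpus-per-task=8            # ... times 8 OpenMP threads = 64 cores on the AMD Milan node
#SBATCH --time=24:00:00

cd $SLURM_SUBMIT_DIR

module load gcc/11.2.0 openmpi/4.1.6 vasp/6.4.2
mpirun -x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -np $SLURM_NTASKS vasp_std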

GPU binaries

VASP can run on Nvidia GPUs, using OpenACC for GPU offloading. See the VASP documentation for details and recommendations on how to run. As of this writing, we have versions 6.4.2 and 6.5.1 built with GPU support. They are built with the Nvidia HPC compilers, whose newer versions require newer CPU generations than those found in the Lonepeak cluster. For this reason, we need to add an extra module path before loading the NVHPC, OpenMPI, and VASP modules:

module use /uufs/chpc.utah.edu/sys/modulefiles/spack/linux-rocky8-x86_64/Core/linux-rocky8-sandybridge
module load nvhpc/24.3 openmpi/5.0.3-gpu vasp/6.5.1-gpu
mpirun -x OMP_NUM_THREADS=4 -np 2 vasp_std

The GPU build of VASP uses one GPU per MPI task; thus in the example above, we are using two MPI tasks and two GPUs. To better utilize the CPU parts of the code (some parts still run on CPUs), it is advantageous to use multiple OpenMP threads per MPI task; in this case, we are using four CPU cores (four OpenMP threads) per MPI task. Your ratio of GPUs to MPI tasks to OpenMP threads will depend on the CPU and GPU layout of the node you run on.
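
As an illustration, a Slurm batch script for the two-GPU, two-MPI-task layout above might look like the following sketch; the GPU partition and account names are placeholders, and the GPU request uses the generic --gres=gpu:N form:

#!/bin/bash
#SBATCH --partition=notchpeak-gpu    # placeholder GPU partition - adjust to your allocation
#SBATCH --account=mygroup-gpu        # placeholder account name
#SBATCH --nodes=1
#SBATCH --ntasks=2                   # one MPI task per GPU
#SBATCH --cpus-per-task=4            # OpenMP threads per MPI task for the CPU parts of the code
#SBATCH --gres=gpu:2                 # request two GPUs on the node
#SBATCH --time=24:00:00

cd $SLURM_SUBMIT_DIR

module use /uufs/chpc.utah.edu/sys/modulefiles/spack/linux-rocky8-x86_64/Core/linux-rocky8-sandybridge
module load nvhpc/24.3 openmpi/5.0.3-gpu vasp/6.5.1-gpu
mpirun -x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -np $SLURM_NTASKS vasp_std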

Note that at present, CHPC does not have many GPU nodes of the same kind on Notchpeak connected by a high-speed InfiniBand network, and the GPU nodes on Granite are not connected optimally for high-speed GPU-to-GPU communication. We therefore don't recommend running the GPU build of VASP distributed over more than one node.

Special VASP builds

There are numerous plugins that allow VASP to produce additional output or provide additional functionality. We build these modified VASP binaries on request; the following are available as of this writing.

Wannier90

Wannier90 computes maximally-localised Wannier functions (MLWF). VASP has a special option to include Wannier90. We have VASP versions 5.4.4 and 6.5.1 built with Wannier90, available for CPU only as:

module load gcc/15.1.0  openmpi/5.0.8 vasp/6.5.1-wannier
mpirun -x OMP_NUM_THREADS=8 -np 8 vasp_std

Transition State Tools for VASP (VTST)

VTST allows finding saddle points and evaluating transition state theory (TST) rate constants with VASP. We have it built for CPU-only VASP, available as:

module load gcc/11.2.0  openmpi/4.1.4 vasp/6.2.1.VTST
mpirun -x OMP_NUM_THREADS=8 -np 8 vasp_std

Performance notes

For reference, below are some timings for VASP 6.3.2 on the AMD Milan 64-core and Intel Ice Lake 56-core Notchpeak nodes, running 20 SCF iterations of a 744-particle system. The runtimes are in seconds (lower is better), and the scaling is the single-core runtime divided by the runtime at the given core count.

First, for pure MPI parallelization with no OpenMP threads:

AMD Milan CPU                          Intel Ice Lake CPU
CPUs   Runtime (s)   Scaling           CPUs   Runtime (s)   Scaling
  64       505.36      14.82             56       244.31      22.49
  32       523.64      14.31             28       306.64      17.91
  16       623.27      12.02             14       512.80      10.71
   8      1014.69       7.38              8       808.78       6.79
   4      1881.89       3.98              4      1562.55       3.52
   2      3715.09       2.02              2      2835.92       1.94
   1      7491.70       1.00              1      5493.43       1.00

Comparing the fastest whole-node runtimes, the Intel node performs about 2x as fast as the AMD node. The parallel scaling is better on the Intel CPU as well.

Now let's look at combining fewer MPI tasks with more OpenMP threads, while still loading all the CPU cores on the node. Due to its design, the AMD CPU architecture should benefit from this more. In the table below, "p" stands for MPI tasks and "t" for OpenMP threads, so for example "8p8t" uses 8 MPI tasks, each running 8 OpenMP threads.

AMD Milan CPU                 Intel Ice Lake CPU
Layout   Runtime (s)          Layout   Runtime (s)
64p1t        505.36           56p1t        244.31
32p2t        472.61           28p2t        249.31
16p4t        390.01           14p4t        253.50
 8p8t        345.34            8p7t        239.17

Notice that the AMD CPU benefits significantly from the OpenMP threading, which is why we recommend using it there. On the Intel CPU, threading is slightly faster, but the difference is close to the noise level, which is why we think it is not necessary to use OpenMP threading on the Intel CPUs, at least up to the Ice Lake generation.

Last Updated: 12/23/25