
VASP

Description

The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP is a licensed program, with the license sold on a per-research-group basis. Users need to be members of a research group that has purchased a license. New licensees need to contact CHPC to request to be added to the list of groups authorized to use VASP; please provide proof of purchase with this request. There are two kinds of licenses. Research groups that purchased a VASP license in the past (prior to ~2020) likely own a version 5.x license, while more recent licensees get the version 6.x license, which is current. CHPC manages versions 5.x and 6.x in separate user groups. Groups that own only a version 5.x license are only allowed to use that version, and will need to pay an upgrade fee (1500 Euros as of January 2023) to be able to use version 6.x.

Using VASP

CHPC provides version 5.4.4 as the last of the 5.x series, and updates the 6.x series as needed. Note that in version 5.4.4, OpenMP based parallelism is not optimal, while OpenMP is important for performance on the Notchpeak AMD nodes. For better performance on the AMD nodes, it is necessary to use version 6.x.

Because of the sub-optimal OpenMP performance, version 5.4.4 is offered without OpenMP support. On the other hand, versions 6.x, which have better OpenMP performance, especially on the AMD nodes, are offered with OpenMP support.

To use VASP 5.x:

module load intel-oneapi-compilers/2021.4.0 intel-oneapi-mpi/2021.1.1 vasp/5.4.4
mpirun -np $SLURM_NTASKS vasp_std
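
For reference, a complete Slurm batch script for the VASP 5.x run above might look like the sketch below. The partition, account, core count, and wall time are placeholder assumptions that need to be adapted to your allocation:

#!/bin/bash
#SBATCH --job-name=vasp5
#SBATCH --partition=notchpeak        # placeholder partition - use one from your allocation
#SBATCH --account=mygroup            # placeholder account name
#SBATCH --nodes=1
#SBATCH --ntasks=32                  # one MPI task per requested CPU core
#SBATCH --time=24:00:00

# run from the directory that contains INCAR, POSCAR, POTCAR and KPOINTS
cd $SLURM_SUBMIT_DIR

module load intel-oneapi-compilers/2021.4.0 intel-oneapi-mpi/2021.1.1 vasp/5.4.4
mpirun -np $SLURM_NTASKS vasp_std

The script is then submitted with the sbatch command.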

To use VASP 6.x on the Intel nodes, it is not necessary to use OpenMP threading; pure MPI parallelization with one MPI process per core works best:

module load gcc/11.2.0  openmpi/4.1.6 vasp/6.4.2
mpirun -x OMP_NUM_THREADS=1 -np $SLURM_NTASKS vasp_std

On the Notchpeak AMD nodes, it is advantageous to use multiple OpenMP threads per MPI process. In the performance evaluation detailed below, we have determined that on the 64-core AMD node the best performance is achieved with 8 MPI processes, each running 8 OpenMP threads:

module load gcc/11.2.0  openmpi/4.1.6 vasp/6.4.2
mpirun -x OMP_NUM_THREADS=8 -np 8 vasp_std
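
In a Slurm batch script, this hybrid layout can be expressed through --ntasks and --cpus-per-task, so the thread count does not have to be hard-coded. A minimal sketch, with placeholder partition and account names, could look like:

#!/bin/bash
#SBATCH --partition=notchpeak        # placeholder partition - adjust to your allocation
#SBATCH --account=mygroup            # placeholder account name
#SBATCH --nodes=1
#SBATCH --ntasks=8                   # 8 MPI tasks ...
#SBATCH --cpus-per-task=8            # ... times 8 OpenMP threads = 64 cores on the AMD Milan node
#SBATCH --time=24:00:00

cd $SLURM_SUBMIT_DIR

module load gcc/11.2.0 openmpi/4.1.6 vasp/6.4.2
mpirun -x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -np $SLURM_NTASKS vasp_std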

GPU binaries

VASP can run on Nvidia GPUs, using OpenACC for GPU offloading. See the VASP documentation for details and recommendations on how to run. As of this writing, we have versions 6.4.2 and 6.5.1 built with GPU support. They are built with the Nvidia HPC compilers, whose newer versions require newer CPU generations than those found in the Lonepeak cluster. For this reason, we need to add an extra module path before loading the NVHPC, OpenMPI, and VASP modules:

module use /uufs/chpc.utah.edu/sys/modulefiles/spack/linux-rocky8-x86_64/Core/linux-rocky8-sandybridge
module load nvhpc/24.3 openmpi/5.0.3-gpu vasp/6.5.1-gpu
mpirun -x OMP_NUM_THREADS=4 -np 2 vasp_std

The GPU build of VASP uses one GPU per MPI task; thus in the example above, we are using two MPI tasks and two GPUs. To better utilize the CPU parts of the code (some parts still run on CPUs), it is advantageous to use multiple OpenMP threads per MPI task; in this case, we are using four CPU cores (four OpenMP threads) per MPI task. Your ratio of GPUs to MPI tasks to OpenMP threads will depend on the CPU and GPU layout of the node you run on.
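
As an illustration, a Slurm batch script for the two-GPU, two-MPI-task layout above might look like the following sketch; the GPU partition and account names are placeholders, and the GPU request uses the generic --gres=gpu:N form:

#!/bin/bash
#SBATCH --partition=notchpeak-gpu    # placeholder GPU partition - adjust to your allocation
#SBATCH --account=mygroup-gpu        # placeholder account name
#SBATCH --nodes=1
#SBATCH --ntasks=2                   # one MPI task per GPU
#SBATCH --cpus-per-task=4            # OpenMP threads per MPI task for the CPU parts of the code
#SBATCH --gres=gpu:2                 # request two GPUs on the node
#SBATCH --time=24:00:00

cd $SLURM_SUBMIT_DIR

module use /uufs/chpc.utah.edu/sys/modulefiles/spack/linux-rocky8-x86_64/Core/linux-rocky8-sandybridge
module load nvhpc/24.3 openmpi/5.0.3-gpu vasp/6.5.1-gpu
mpirun -x OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -np $SLURM_NTASKS vasp_std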

Note that at present, CHPC does not have many GPU nodes of the same kind on Notchpeak connected by a high-speed InfiniBand network, and the GPU nodes on Granite are not connected optimally for high-speed GPU-to-GPU communication. We therefore don't recommend running the GPU build of VASP distributed over more than one node.

Special VASP builds

There are numerous plugins that allow VASP to produce additional output or provide additional functionality. We build these modified VASP binaries on request; the following are available as of this writing.

Wannier90

Wannier90 computes maximally-localised Wannier functions (MLWF). VASP has a special option to include Wannier90. We have VASP versions 5.4.4 and 6.5.1 built with Wannier90, available for CPU only as:

module load gcc/15.1.0  openmpi/5.0.8 vasp/6.5.1-wannier
mpirun -x OMP_NUM_THREADS=8 -np 8 vasp_std

Transition State Tools for VASP (VTST)

VTST allows finding saddle points and evaluating transition state theory (TST) rate constants with VASP. We have it built for CPU-only VASP, available as:

module load gcc/11.2.0  openmpi/4.1.4 vasp/6.2.1.VTST
mpirun -x OMP_NUM_THREADS=8 -np 8 vasp_std

Performance notes

For reference, below are some timings for VASP 6.3.2 on the AMD Milan 64-core and Intel Ice Lake 56-core Notchpeak nodes, running 20 SCF iterations of a 744-particle system. The runtimes are in seconds (lower is better), and the scaling is the single-core runtime divided by the runtime at the given core count.

First, for pure MPI parallelization with no OpenMP threads:

AMD Milan CPU                          Intel Ice Lake CPU
CPUs   Runtime (s)   Scaling           CPUs   Runtime (s)   Scaling
  64       505.36      14.82             56       244.31      22.49
  32       523.64      14.31             28       306.64      17.91
  16       623.27      12.02             14       512.80      10.71
   8      1014.69       7.38              8       808.78       6.79
   4      1881.89       3.98              4      1562.55       3.52
   2      3715.09       2.02              2      2835.92       1.94
   1      7491.70       1.00              1      5493.43       1.00

Comparing the fastest whole-node runtimes, the Intel node performs about 2x as fast as the AMD node. The parallel scaling is better on the Intel CPU as well.

Now let's look at combining fewer MPI tasks with more OpenMP threads, while still loading all the CPU cores on the node. Due to its design, the AMD CPU architecture should benefit from this more. In the table below, "p" stands for MPI tasks and "t" for OpenMP threads, so for example "8p8t" uses 8 MPI tasks, each running 8 OpenMP threads.

AMD Milan CPU                 Intel Ice Lake CPU
Layout   Runtime (s)          Layout   Runtime (s)
64p1t        505.36           56p1t        244.31
32p2t        472.61           28p2t        249.31
16p4t        390.01           14p4t        253.50
 8p8t        345.34            8p7t        239.17

Notice that the AMD CPU benefits significantly from the OpenMP threading, which is why we recommend using it there. On the Intel CPU, threading is slightly faster, but the difference is close to the noise level, which is why we think it is not necessary to use OpenMP threading on the Intel CPUs, at least up to the Ice Lake generation.

Last Updated: 12/23/25