HPC Basics - Hello World MPI - Center for High Performance Computing

Logging in to the clusters

CHPC has several clusters, visible on the Cluster Guides page. For the purposes of this tutorial, we will assume you are using notchpeak (but any cluster will do). Log in using an SSH client of your choice:

[user@wherever:~]$ ssh u0123456@notchpeak.chpc.utah.edu

Make sure to replace the username with your own UNID, and if you want a different cluster, replace it with the appropriate cluster name. When you set up your account with CHPC, you selected a default shell, either bash or tcsh. If you forgot which shell you selected, you can find out using the SHELL variable:

[u0123456@notchpeak1:~]$ echo $SHELL

This will give something like /bin/bash or /bin/tcsh. The syntax for scripting each of these shells is different, so make sure you know which one you are using! (You can modify your shell through the Edit Profile page.) There are also many good resources on the internet for learning shell scripting. Each shell has its own configuration file: .bashrc for bash and .tcshrc for tcsh. When a user account is created, CHPC provides versions of these files that are essential for setting up cluster-specific environments. In March 2015, CHPC started using modules for environment setup. User accounts set up before this time do not use modules. In order to proceed with this tutorial, set up your environment for modules as described in our modules help page.

Note: the rest of this tutorial assumes you are using bash for your shell.

Sourcing MPI

To get started, execute the following:

[u0123456@notchpeak1:~]$ module load intel mpich

This command loads the environment for the Intel compilers and the MPICH distribution of MPI (previously called MPICH2). Consult the CHPC MPI libraries help page for performance recommendations, or experiment with your own codes to see which MPI setups provide the best performance. Once the modules are loaded, you should be able to execute mpicc:

[u0123456@notchpeak1:~]$ mpicc -v
mpicc for MPICH version 3.1.2
icc version 18.0.1 (gcc version 4.8.5 compatibility)

Hello world

If you have your own source code to test, you may want to use that, but in case you don't, here is a simple hello world program:

http://chpc.utah.edu/docs/manuals/getting_started/code/hello_1.c

You can download it using wget, or you can copy and paste it into your favorite editor. Once you have the file, you can compile it using:

[u0123456@notchpeak1:~]$ mpicc hello_1.c -o hello_notchpeak
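
If you cannot download hello_1.c, a program of the same kind is easy to write by hand. The following is a minimal sketch and not the contents of hello_1.c: it assumes the program only needs to print one line per MPI process, and the file name hello_sketch.c is made up for this example.

```shell
# Write a minimal MPI hello world to hello_sketch.c (hypothetical file name;
# this is an illustrative stand-in, not the hello_1.c linked above).
cat > hello_sketch.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);      /* start the MPI runtime */
    printf("Hello world\n");     /* every process prints one line */
    MPI_Finalize();              /* shut MPI down cleanly */
    return 0;
}
EOF
```

It should compile the same way as the downloaded file, for example with mpicc hello_sketch.c -o hello_notchpeak.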

If you receive any warnings, you can ignore them. If you get an error, you probably copied the program incorrectly. Important note: It's good practice to compile programs on the interactive nodes of the cluster you'll be working with, and to distinguish the resulting executables with different names (e.g., hello_notchpeak, hello_kingspeak). Generic builds may perform worse than builds specific to a particular cluster, primarily because of differences in hardware configuration. Again, visit the cluster user guides for more information on best practices.
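
One way to keep per-cluster executables apart is to derive the name from the machine you are building on. This is only a sketch: the function name exe_name_for_host is hypothetical, and it assumes login nodes are named after the cluster followed by digits (as in notchpeak1), which will not hold for compute nodes such as np123.

```shell
# Derive a per-cluster executable name from a login-node hostname.
# Assumption: login nodes are named <cluster><digits>, e.g. notchpeak1.
exe_name_for_host() {
    local host="${1%%.*}"            # drop the domain: notchpeak1.chpc.utah.edu -> notchpeak1
    local cluster="${host%%[0-9]*}"  # drop everything from the first digit on: notchpeak1 -> notchpeak
    echo "hello_${cluster}"
}
```

On a login node you could then build with mpicc hello_1.c -o "$(exe_name_for_host "$(hostname)")".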

Interactive job submission

Now that you have your executable, you're ready to execute the job on the cluster. As of April 2015, CHPC is using the Slurm scheduler. For details on its use, see the Slurm help page. There are two ways to submit a job: through an interactive session and through a batch script. For initial program testing, it is more efficient to use an interactive session. Interactive sessions are also appropriate for doing analysis with programs with GUIs or long compile sessions on the cluster, where running on the standard interactive nodes would be inappropriate.

IMPORTANT WARNING: Never execute a large MPI job on the main interactive nodes (the ones you log in to initially). These nodes are shared by all CHPC users for basic work, and heavy load tasks will degrade performance for everyone. Tasks that exceed 15 minutes under heavy load will be arbitrarily terminated by CHPC systems.

Begin by submitting a request to launch a job on the cluster nodes. Depending on cluster loads, you may or may not have to wait for the job to start. It will be easier to start an interactive session on the least utilized cluster; check system status using the sinfo command and look for idle nodes. Use the command:

[u0123456@notchpeak1:~]$ salloc -t 0:05:00 -n 64 -N 2 -A chpc -p notchpeak

This will request an interactive session on notchpeak (-p notchpeak) under the chpc account (-A chpc), with 2 nodes (-N 2), 64 tasks (processors) total (-n 64), for 5 minutes (-t 0:05:00). Unless your MPI code is threaded, you should ask for as many tasks as there are physical cores available in order to use the resources efficiently. See the cluster user guides for details on how many CPU cores the nodes of each cluster have.
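
The task count follows from the node count and the cores per node. As a small sketch, the arithmetic can be left to the shell; the value cores_per_node=32 is an assumption implied by the example above (64 tasks on 2 nodes), so check the cluster user guide for the real figure on your nodes.

```shell
# Total MPI tasks = nodes x physical cores per node.
# cores_per_node=32 is an assumption implied by the example above
# (64 tasks on 2 nodes); check the cluster user guide for real values.
nodes=2
cores_per_node=32
ntasks=$((nodes * cores_per_node))

# Print the matching request instead of running it, so it can be reviewed first.
echo "salloc -t 0:05:00 -n ${ntasks} -N ${nodes} -A chpc -p notchpeak"
```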

Once the interactive session starts, running the job is quite simple. Navigate to the directory where your program is stored and execute the following command:

[u0123456@np123:~]$ module load intel mpich

You may need to put in your password once or twice to allow connection to the nodes and confirm some RSA keys. Once you get the command prompt back, execute this command:

[u0123456@np123:~]$ mpirun -np 64 $HOME/hello_notchpeak

Make sure to change the path for your hello world program if you put it somewhere besides your home directory (e.g., $HOME/test/hello_notchpeak). Also change the -np flag to reflect the number of tasks you requested. If everything goes well, you should see something like this:

Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
[u0123456@np123:~]

You should see one "Hello world" line per MPI task, so 64 for the command above (the sample output lists only eight). Finally, exit the interactive session by using the command exit.

Batch job submission

With your favorite editor, make a new file and call it testjob. Copy and paste the following simple script into the file:

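#!/bin/bash
# Job control: wall time, total tasks, nodes, account, partition, email address
#SBATCH -t 0:02:00
#SBATCH -n 64
#SBATCH -N 2
#SBATCH -A account-name
#SBATCH -p notchpeak
#SBATCH --mail-user=user@utah.edu

# Run from the home directory, where the executable is stored
cd $HOME
# Load the same compiler and MPI modules used to build the program
module load intel mpich
# Start one process per requested task and save the output to test.out
mpirun -np $SLURM_NTASKS $HOME/hello_notchpeak > test.out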

All of the #SBATCH comments are directives for job control, just like the options used in the interactive salloc command. Replace account-name with your own account and the email address with your own. If you're using a different cluster (such as kingspeak), also change the partition (-p) and make sure the task count (-n) corresponds to the total core count on the nodes requested; the -np flag picks this value up automatically through $SLURM_NTASKS. To execute the script on the cluster:

[u0123456@notchpeak1:~]$ sbatch testjob
112233.nprm.opib.privatearch.arches

The output upon successful submission will give the job number and an internal moniker for the job. In order to view the job in the queue, you can use the following commands:

squeue - shows all jobs in the queue and current metrics
squeue -u u0123456 - shows all jobs for UNID u0123456 (use your own!)
squeue --start --jobid=112233 - gives an estimate for when a job will start
scontrol show job 112233 - gives useful information about a job
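
If you would rather wait at the prompt until a job has left the queue, squeue can be polled in a loop. This is a sketch under a few assumptions: the function name wait_for_job is hypothetical, the 10-second default interval is arbitrary, and it relies only on squeue's -h (no header) and -j (job ID) options.

```shell
# Block until a job no longer appears in the queue.
# squeue -h suppresses the header, so empty output means the job is gone.
# Errors (e.g. an unknown job ID after the job has finished) are discarded.
wait_for_job() {
    local jobid="$1" interval="${2:-10}"
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

For the job above, wait_for_job 112233 would return once the job has finished; keep the interval reasonably long so the scheduler is not queried more than necessary.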

Note that many of these commands may not be useful for this job if it begins running right away. For longer jobs you may run in the future, these will become very useful. If your job ran without error, you should have two output files, test.out and a file written by Slurm:

[u0123456@np123:~]$ ls
hello_1.c  hello_notchpeak  slurm-112233.out  test.out

Because the script sets no -o or -e options, Slurm writes the job's standard output and standard error together to a file named slurm-<jobid>.out in the directory from which the job was submitted. If you use cat on test.out, you'll see output like we saw earlier in the interactive session. If the program ran with an error, or something gets written to output by the batch script, then those messages will appear in the Slurm output file. If you have problems with your program during a batch session, you should look there.

[u0123456@np123:~]$ cat test.out
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
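
With 64 tasks, counting lines by eye is error-prone. As a rough sketch, a small helper can compare the number of "Hello world" lines in a file with the task count you requested; the function name check_hello_count is hypothetical, and it assumes each process prints exactly the line "Hello world".

```shell
# Compare the number of "Hello world" lines in a file with the expected
# number of MPI tasks. Returns 0 on a match, 1 otherwise.
check_hello_count() {
    local file="$1" expected="$2" actual
    actual=$(grep -c '^Hello world$' "$file")
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual lines"
    else
        echo "Mismatch: expected $expected lines, found $actual"
        return 1
    fi
}
```

For the batch job above you would run check_hello_count test.out 64.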

This concludes the tutorial. If you have trouble with this tutorial or anything else on CHPC systems, contact helpdesk@chpc.utah.edu. You may also want to consider attending the presentations which are held each semester by CHPC staff members, spanning a variety of topics such as Linux Basics, Parallel Programming, and Systems Overviews.