Apptainer/Singularity
The advantage of Apptainer/Singularity over other container solutions is its focus on HPC environments, which includes support for parallel execution and GPUs. It is also more feature rich and user friendly.
CHPC provides containers for some applications. Users can also bring in their own containers, provided they include a few hooks for our systems, such as mount points for the home and scratch file systems. Finally, Apptainer/Singularity also allows importing Docker containers, most commonly from container repositories such as DockerHub.
Note: As of late 2022, Singularity is being replaced by Apptainer, provided by the apptainer module. Apptainer has the same code base as Singularity but is being developed independently, so the two code bases are expected to diverge over time. The singularity command is still defined in Apptainer, so users can use either the apptainer or the singularity command to obtain the same functionality.
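For example, once the apptainer module is loaded, both commands can be used interchangeably (a quick sanity check):
module load apptainer
apptainer --version
singularity --version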
Importing Docker Container
Singularity/Apptainer has direct support for Docker containers. The Singularity and Docker page provides a good overview of Singularity's Docker container support. Below we list some basics along with a few local caveats. It is assumed that the Singularity or Apptainer module is loaded, i.e. module load singularity or module load apptainer.
Running Docker container directly in Singularity/Apptainer
To start a shell in a Docker container using Singularity/Apptainer, simply point to the DockerHub container URL, e.g.
singularity shell docker://ubuntu:latest
Singularity scans the host file systems and mounts them into the container automatically, which allows CHPC's non-standard /uufs and /scratch file systems to be visible in the container as well. This obviates the necessity to create the mount points for these file systems manually in the container and makes the DockerHub containers very easy to deploy with Singularity.
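Once inside the container shell started above, a quick check (assuming the automatic mounts work as described) confirms that the CHPC file systems are visible:
ls /uufs /scratch
exit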
Similarly, we can run a program that's in a DockerHub container as
singularity exec docker://biocontainers/blast:2.2.31 blastp -help
Note that the Biocontainers repositories require the version number tag (following the colon) for Singularity to pull them correctly. The version can be found in the Tags list on the container's DockerHub page.
A good strategy in finding a container for a needed program is to go to hub.docker.com and search for the program name.
Converting a Docker image to Singularity
This approach is useful to speed up container startup, because Singularity otherwise builds a new Singularity container file at each pull or exec from DockerHub, which may take a while if the container is large. The drawback of this approach is that the Singularity container has to be rebuilt manually whenever the DockerHub image is updated.
The process is also described on the Singularity and Docker page. For example, we can build a local bioBakery container by running:
singularity build bioBakery.sif docker://biobakery/workflows
This newly created bioBakery.sif container can then be run as:
singularity exec bioBakery.sif humann2 --help
This command will execute much faster than executing from the image pulled directly from DockerHub:
singularity exec docker://biobakery/workflows humann2 --help
Modifying Docker container
Sometimes it is necessary to modify a container downloaded from a public registry (e.g. DockerHub). For example, the container may not include a program that is needed to run part of the workflow for which the container is designed. To do that, we can build a sandbox container, which is a flat file system representation of the container, shell into the container in writable mode, make the modification, and then build the container sif file from the sandbox. This is possible in user space with Apptainer 1.2.5 and newer.
module load apptainer
apptainer build --sandbox mycontainer docker://ubuntu:latest
mkdir mycontainer/uufs
apptainer shell -w mycontainer
... make the necessary modifications and exit
apptainer build mycontainer.sif mycontainer
Note that we also run mkdir mycontainer/uufs. This is necessary because running the container as a user requires mounting the /uufs space, where the user home directory resides, and the container must contain a mount point for it.
Checking if container already exists
The container download and build can be automated with a shell script that we wrote, update-container-from-dockerhub.sh. This script can be run before each container run to ensure that the latest container version is used, without unnecessary downloading if no newer version exists.
The approach described above can be wrapped into a SLURM script that checks whether the sif file exists, or whether there is an updated container on DockerHub. The SLURM script may then look like this:
#!/bin/bash
#SBATCH -N 1
#SBATCH -p ember
#SBATCH -t 1:00:00
module load singularity
# check if the container exists or is newer and pull if needed
/uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh biobakery/workflows bioBakery.sif
# run a program from the container
singularity exec bioBakery.sif humann2 --help
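Assuming the script above is saved as, for example, run_biobakery.slr (a name chosen here for illustration), it can be submitted as usual:
sbatch run_biobakery.slr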
Setting up a module file for a downloaded container
To make the commands that run from the container easier to use, we can build an Lmod module and wrap the commands to be run in the container in this module. This way the commands and programs located inside the container can be called by their original names, rather than through the singularity command. First we create user based modules and in them a directory named after our container, here my_new_container. Then we copy our template to this new user modules directory and name it after the container program version, here 1.0.0:
mkdir $HOME/MyModules/my_new_container
cd $HOME/MyModules/my_new_container
cp /uufs/chpc.utah.edu/sys/modulefiles/templates/container-template.lua 1.0.0.lua
Then edit the new module file, 1.0.0.lua, to modify the container name, the command(s) to call from the container, and the module file metadata:
-- required path to the container sif file
local CONTAINER="/uufs/chpc.utah.edu/common/home/u0123456/containers/my_new_container.sif"
-- required text array of commands to alias from the container
local COMMANDS = {"command"}
-- these optional lines provide more information about the program in this module file
whatis("Name : Program name")
whatis("Version : 1.0.0")
whatis("Category : Program's category")
whatis("URL : Program's URL")
whatis("Installed on : 10/05/2021")
whatis("Installed by : Your Name")
When we have the module file created, we can activate the user modules and load the module:
module use $HOME/MyModules
module load my_new_container/1.0.0
This way we can use just the command to run this program inside the container, without the need for the long singularity execution line.
It may be difficult to find which programs in the container need to be placed in the COMMANDS list. Our strategy is to look for the location of the program inside the container, find the directory where this program resides, and get a list of all programs in this directory (oftentimes this is some kind of bin directory). This directory usually contains the programs/commands supplied by the given package.
To get a list of these programs, first load the newly created container module and run containerShell to get a shell in the container. Then run which command, which finds the directory where the command program is located. cd to this directory and run the script /uufs/chpc.utah.edu/sys/modulefiles/templates/grab_path_progs.sh, which produces the list of files in this directory. Scrutinize this list, removing programs that look unneeded, and paste it into the COMMANDS list of the module file. Make sure that the quotes in all the COMMANDS list items are correctly placed.
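Put together, the sequence may look like the following sketch (the command name and its bin directory are illustrative):
module use $HOME/MyModules
module load my_new_container/1.0.0
containerShell
# inside the container:
which command               # reports, e.g., /usr/local/bin/command (illustrative path)
cd /usr/local/bin
/uufs/chpc.utah.edu/sys/modulefiles/templates/grab_path_progs.sh
exit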
NOTE: Some packages execute programs from scripts that in turn call a program defined in the COMMANDS list, from inside the container. For example, in the medaka/1.7.2 container, the binary medaka is listed in COMMANDS in the medaka/1.7.2 module file, but other commands such as medaka_consensus call medaka, which results in medaka being executed inside the container. This may produce the following error: "singularity: command not found". The reason is that the alias we define for the command in the module file uses Singularity to call the program in the container (e.g. in the medaka/1.7.2 module, this is singularity exec /uufs/chpc.utah.edu/sys/installdir/r8/medaka/1.7.2/medaka.1.7.2.sif medaka). Inside the container, the singularity command is not defined, yet the command is set up to call singularity, because Singularity by default inherits the parent shell environment (including the re-definition of the medaka command to the long singularity line listed above). To fix this problem, open the module file and locate the following section:
local run_function = 'singularity exec ' .. nvswitch .. CONTAINER .. "
and change it to:
local run_function = 'singularity exec -e' .. nvswitch .. CONTAINER .. "
This will not import the runtime environment into the container, so the singularity aliases will not be defined in the container. The drawback of this approach is that environment variables defined by the user before executing the command, for example to modify how the program should run, may not be available. The way around that is to either check whether the program's behavior can be modified by a runtime argument instead, or, alternatively, use the SINGULARITYENV_name_of_the_variable prefix to define the environment variable to be brought into the container (e.g. if we define an environment variable DATA=$HOME/mydata, then we also need to set the environment variable SINGULARITYENV_DATA=$DATA to make the DATA environment variable available to the program command that runs inside the container).
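For example, with the hypothetical DATA variable from above:
export DATA=$HOME/mydata
export SINGULARITYENV_DATA=$DATA
command                     # the module alias runs command in the container, with DATA defined inside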
Example: Finding and running a Docker container
Frequently we get requests to install complex programs that may not even run on the OS that CHPC Linux machines run. Before writing to CHPC support, consider following the example below with your application.
A user wants to install a program called guppy, whose installation and use are described in a blog post. They want to run it on a GPU since it is faster. From the blog post we know the program's name, have a hint about the provider of the program, and know how to install it on Ubuntu Linux. After some web searching we find out that the program is mainly available commercially, so it has no publicly available download section and likely no CentOS version. That leaves us with a need for an Ubuntu based container.
We could build the container ourselves based on the instructions in the blog post, but we would need to either build with Docker or Singularity on a local machine with root access, or use DockerHub's automated build through a GitHub repository. This can be time consuming and cumbersome, so we leave it as a last resort.
We do some more web searching to see if guppy has a container. First we search for guppy dockerhub and get lots of hits like this one, but none for the GPU (looking at the Dockerfile, there is no mention of GPU in the base image or in what is being installed). Next we try "guppy gpu" dockerhub and find this container. We don't know yet if it does indeed support the GPU, and since the Dockerfile is missing, we suspect that it is hosted on GitHub. So, we search "guppy-gpu" github and find this repository, which, based on the repository name and source, looks like a match to the DockerHub image. Examining the Dockerfile we see that the container is based on nvidia/cuda9.0, which means it is set up for a GPU. This is looking hopeful, so we get the container and try to run it.
$ ml singularity
$ singularity pull docker://aryeelab/guppy-gpu
$ singularity shell --nv guppy-gpu_latest.sif
$ nvidia-smi
...
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1
... to check if the GPU works
$ guppy_basecaller --help
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited.
Version 2.2.2
... to check that the program is there.
Above we have loaded the Singularity module and used Singularity to pull the Docker container. This has downloaded the Docker container image layers and created the Singularity container file guppy-gpu_latest.sif. Then we opened a shell in this container (using the --nv flag to bring the host GPU stack into the container), tested the GPU visibility with nvidia-smi, and ran the command guppy_basecaller to verify that it exists. With these positive outcomes, we can proceed to run the program with our data, which can be done directly with
$ singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"
As mentioned above, the singularity pull command creates a Singularity container based on a Docker container image. To guarantee that we will always get the latest version, we can use the shell script described above, e.g.
$ /uufs/chpc.utah.edu/sys/installdir/singularity3/update-container-from-dockerhub.sh aryeelab/guppy-gpu guppy-gpu_latest.sif
$ singularity exec --nv guppy-gpu_latest.sif guppy_basecaller -i <fast5_dir> -o <output_folder> -c dna_r9.4.1_450bps -x "cuda:0"
To make this even easier to use, we build an Lmod module and wrap up the commands to be run in the container in this module. First we create user based modules. Then copy our template to the user modules directory:
mkdir $HOME/MyModules/guppy
cd $HOME/MyModules/guppy
cp /uufs/chpc.utah.edu/sys/modulefiles/templates/container-template.lua 3.2.2.lua
and edit the new module file, 3.2.2.lua, to modify the container name, the command(s) to call from the container, and the module file metadata:
-- required path to the container sif file
local CONTAINER="/uufs/chpc.utah.edu/common/home/u0123456/containers/guppy-gpu_latest.sif"
-- required text array of commands to alias from the container
local COMMANDS = {"guppy_basecaller"}
-- these optional lines provide more information about the program in this module file
whatis("Name : Guppy")
whatis("Version : 3.2.2")
whatis("Category : genomics")
whatis("URL : https://nanoporetech.com/nanopore-sequencing-data-analysis")
whatis("Installed on : 10/05/2021")
whatis("Installed by : Your Name")
When we have the module file created, we can activate the user modules and load the guppy module:
module use $HOME/MyModules
module load guppy/3.2.2
This way we can use just the guppy_basecaller command to run this program inside the container.
Running CHPC provided containers
We provide containers for applications that are difficult to build natively on the operating system that our clusters run. Most of these applications are developed on Debian based Linux systems (Ubuntu and Debian) and rely on their software stack. Some containers are simply DockerHub images converted to the Singularity sif format, while others are built manually by CHPC staff.
Running a CHPC provided container is as simple as running the application command itself. We provide an environment module that sets up aliases for the application's commands that call the container behind the scenes. We also provide a command, containerShell, which opens a shell in the container, from which the user can call the commands needed to execute their processing pipeline.
In the containers, the user can access storage in their home directories or on the scratch file servers.
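For example, using the medaka/1.7.2 container module mentioned elsewhere on this page (a sketch; the commands available depend on the particular module):
module load medaka/1.7.2
medaka --help               # the module alias runs medaka inside the container
containerShell              # or open a shell inside the container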
A list of containerized modules can be found by running:
grep -R --include "*.lua" singularity /uufs/chpc.utah.edu/sys/modulefiles/CHPC-r8/Core | grep depends_on
Building your own Singularity container
As of Apptainer version 1.2.5, one can build a container completely in user space, which means that container builds can be done on CHPC Linux systems. Since large containers require more CPU and memory resources, it is recommended to do so in an interactive job.
salloc -N 1 -n 16 -A notchpeak-shared-short -p notchpeak-shared-short -t 2:00:00 --gres=gpu:1080ti:1
module load apptainer
unset APPTAINER_BINDPATH
apptainer build --nv mycontainer.sif Singularity
We first ask for an interactive job session and optionally request a GPU as well. Then we load the Apptainer module and unset the module pre-set APPTAINER_BINDPATH environment variable; having this set causes the build process to error out due to a non-existent bind path. Then we build the container based on the definition file called Singularity. The --nv flag, which initializes the GPU support during the build, is optional, but it is needed for GPU programs to be set up correctly.
Note that in our apptainer module, we define two environment variables:
- APPTAINER_SHELL=/bin/bash - this sets the container shell to bash (easier to use than default sh)
- APPTAINER_BINDPATH=/scratch,/uufs/chpc.utah.edu - this binds mount points to all the /scratch file servers and to /uufs file servers (sys branch, group spaces).
If you prefer to use a different shell, or not bind the file servers, set these variables differently or unset them.
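For example, to use a different shell inside containers and to skip the automatic bind mounts (a sketch; adjust to your needs):
module load apptainer
export APPTAINER_SHELL=/bin/sh
unset APPTAINER_BINDPATH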