
Python Software Installation: Containerized Python Installations

Users can install Python packages and other software in containers. This provides complete control over the environment, guarantees that the environment will not be accidentally changed or updated (containers are immutable, i.e. cannot be modified), and allows the container to be archived or shared with others for reproducibility and consistency.

On this page, we describe the process of setting up Micromamba in a container. The process is similar for other package managers, e.g. the installation procedure described in the self-installed mamba/conda page.

Please note: While the environment management software described in this page has permissive licenses and can be used for free, channels (sources of packages) may have other license terms and may require commercial licenses. It is your responsibility to use an appropriate channel.


Micromamba in a Container: Installation and Use

Installing the whole mamba or conda environment in a container has several benefits:

  1. The environment is packaged in a single file, so it's easy to share and archive the whole environment.
  2. The whole environment is static (fixed during the build of the container), so it won't get changed accidentally when trying to install or update a package, as updates require building a new container.
  3. While it is equally possible to create a container based on Miniconda or Miniforge, the Micromamba installation is smaller and uses the better-performing mamba package manager. It also provides the micromamba command as a convenient wrapper for all commands in the environment, which we use in the run command for the container.

Below, we outline the steps to build a Micromamba container with some bioinformatics tools and a Jupyter Notebook as an example, and then use this environment in CHPC's Open OnDemand Jupyter app.

Creating a Micromamba Container

We use Apptainer to create the container. First, we create a recipe file for building the container, an example of which is linked here, and name it Singularity:

Bootstrap: docker
From: mambaorg/micromamba

%post
    micromamba install --yes --name base -c bioconda -c conda-forge \
    python=3.9.1 notebook samtools bwa
    micromamba clean -aqy

%runscript
  micromamba run -p /opt/conda "$@"

In this recipe, we are pulling the Micromamba container from DockerHub, installing the needed tools in the post section, and setting the micromamba run ... command to execute whenever the container is executed.

Python environment packages can be also built into a container by specifying them within an environment.yml file, e.g. 

channels:
  - defaults
  - conda-forge
dependencies:
  - matplotlib
  - python=3.9
  - pip

For that, we can modify the micromamba install command in the %post section as:

micromamba create --yes --name base --file environment.yml
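Since the commands in the %post section run inside the container, the environment.yml file must also be copied into the container with a %files section. A minimal sketch of the full modified recipe under this assumption (the /opt/environment.yml destination path is our choice, not a requirement):

```
Bootstrap: docker
From: mambaorg/micromamba

%files
    environment.yml /opt/environment.yml

%post
    micromamba create --yes --name base --file /opt/environment.yml
    micromamba clean -aqy

%runscript
    micromamba run -p /opt/conda "$@"
```

Here environment.yml is assumed to sit next to the recipe file in the build directory.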

Note that we are installing these packages in the base environment. We recommend building a separate container for each environment you want to set up, as installing and using virtual environments inside the container would complicate its setup and use.

We build the container by running the following code (for example, in a bash shell):

module load apptainer
unset APPTAINER_BINDPATH
apptainer build mymamba.sif Singularity

Unsetting APPTAINER_BINDPATH is necessary to avoid a build error that complains about missing mount points in the container. This environment variable normally ensures that the /scratch and /uufs file systems get mounted automatically when the container is executed.
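If the same shell session will later be used to run containers, the bind-path setting can be saved before the build and restored afterward, so the /scratch and /uufs mounts work again at run time. A small sketch (the build line itself must run on the cluster, so it is commented out here; file names as above):

```shell
# Save the current bind-path setting, if any
saved_bindpath="${APPTAINER_BINDPATH:-}"
# Unset it so the build does not fail on missing mount points
unset APPTAINER_BINDPATH
# apptainer build mymamba.sif Singularity   # run this on the cluster
# Restore the setting so the file systems are mounted again when running
export APPTAINER_BINDPATH="$saved_bindpath"
echo "APPTAINER_BINDPATH restored to: $APPTAINER_BINDPATH"
```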

The container .sif file has executable permissions, so we can run the container directly along with the command we want to run within the container:

$ ./mymamba.sif bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
...

Micromamba GPU Container

For a container that interacts with GPUs, one has to use the --nv flag during the container build, which imports the GPU stack from the host into the container so that the mamba package manager picks up the GPU/CUDA dependencies and installs the GPU versions of programs like PyTorch. For an example of a container definition file with the PyTorch environment installed, see the Singularity.gpu recipe that we discuss below.
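As a rough illustration, a Singularity.gpu recipe could follow the same pattern as the recipe above, with a GPU-enabled PyTorch installed from the appropriate channels. The sketch below is an assumption on our part (channel names and the pytorch-cuda pin follow PyTorch's published conda install pattern; check the current PyTorch installation instructions for up-to-date versions):

```
Bootstrap: docker
From: mambaorg/micromamba

%post
    micromamba install --yes --name base -c pytorch -c nvidia -c conda-forge \
    python=3.10 pytorch pytorch-cuda=11.8
    micromamba clean -aqy

%runscript
    micromamba run -p /opt/conda "$@"
```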

We build the container by running the following code (for example, in a bash shell):

module load apptainer
unset APPTAINER_BINDPATH
apptainer build --nv mymamba_gpu.sif Singularity.gpu

To test that the correct GPU version of PyTorch is installed, we export the environment variable APPTAINER_NV as an alternative to the --nv flag. This allows us to run the container file directly with the correct GPU environment:

module load apptainer
export APPTAINER_NV=true
./mymamba_gpu.sif /opt/conda/bin/python -c "import torch; print(torch.cuda.is_available())"
True

The True return from the torch.cuda.is_available() function indicates that the GPU has been detected.

Using the Micromamba Container in Open OnDemand

Just as we ran the bwa command above, we can also run the jupyter notebook command to start Jupyter. However, running it from a terminal is not recommended, as accessing Jupyter in the client's web browser then requires creating an SSH tunnel to the machine where the terminal runs. The Open OnDemand Jupyter app simplifies this greatly by launching Jupyter directly in the client's browser.

To run our container, we choose the "Custom (Environment setup below)" option for the "Jupyter Python version", and in the "Environment Setup for Custom Python" text box, put:

shopt -s expand_aliases
module load apptainer
alias jupyter="$HOME/containers/mymamba.sif jupyter"

The first command is a bash option that enables aliases in shell scripts. We then load the Apptainer module and create an alias so that the jupyter command calls the container instead. The Open OnDemand Jupyter app then uses this alias to run the jupyter command inside the job and start the Jupyter server. Notice that we use the full path to the container, as the Open OnDemand app starts at the base of the user's $HOME directory.
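Before pasting this into the Open OnDemand form, the alias can be sanity-checked in an ordinary shell session to confirm that it expands to the container as intended (container path as above; the module load apptainer step is cluster-specific and omitted here):

```shell
# Enable alias expansion in a non-interactive shell
shopt -s expand_aliases
# Point the jupyter command at the container's runscript
alias jupyter="$HOME/containers/mymamba.sif jupyter"
# Show what the alias expands to before launching the app
type jupyter
```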

As noted above, if we need to use GPUs, we need to add the environment variable APPTAINER_NV=true to initialize the GPUs in the container:

shopt -s expand_aliases
module load apptainer
export APPTAINER_NV=true
alias jupyter="$HOME/containers/mymamba_gpu.sif jupyter"

 

Last Updated: 11/6/24