Container Builds and Nvidia Customization

cdalton-umaine edited this page Dec 19, 2024 · 2 revisions

Introduction - Apptainer

Containers encapsulate a whole, sometimes complex, environment in a single file. They also allow software that was built in one environment to run in another. Probably the most widely known container system is Docker. Docker poses security problems in HPC environments, so Apptainer (formerly Singularity) was created to address those issues. The ARCSIM clusters use Singularity/Apptainer and, thankfully, it was designed to run Docker containers, so a large set of pre-existing containers is available. However, sometimes a container does not have everything you need. For instance, a TensorFlow container might not have all of the Python packages you need installed. This page shows how to create a new container from an existing one and add packages to it.

The Definition File

A text file, called a Definition file, describes the steps needed to build the Apptainer container. Here is an example:

Bootstrap: docker                              # Start from a Docker container
From: rocker/geospatial                        # Specify the starting container from Docker Hub

%files
	# Copy files from the host system into the container.
	# In this case, the host's JAGS installation directory.
	/opt/ohpc/pub/JAGS/gnu8-4.3.0 /opt/ohpc/pub/JAGS/gnu8-4.3.0

%environment
	export PATH=/opt/ohpc/pub/JAGS/gnu8-4.3.0/bin:$PATH
	export LD_LIBRARY_PATH=/opt/ohpc/pub/JAGS/gnu8-4.3.0/lib
	export INCLUDE=/opt/ohpc/pub/JAGS/gnu8-4.3.0/include
	export MANPATH=/opt/ohpc/pub/JAGS/gnu8-4.3.0/man

%post
	export PATH=/opt/ohpc/pub/JAGS/gnu8-4.3.0/bin:$PATH
	export LD_LIBRARY_PATH=/opt/ohpc/pub/JAGS/gnu8-4.3.0/lib
	export INCLUDE=/opt/ohpc/pub/JAGS/gnu8-4.3.0/include
	export MANPATH=/opt/ohpc/pub/JAGS/gnu8-4.3.0/man

	R --vanilla -e 'install.packages("rjags", repos="http://cran.us.r-project.org")'
	R --vanilla -e 'install.packages("landscapemetrics", repos="http://cran.us.r-project.org")'
	R --vanilla -e 'install.packages("dismo", repos="http://cran.us.r-project.org")'
	R --vanilla -e 'install.packages("randomForest", repos="http://cran.us.r-project.org")'
	R --vanilla -e 'install.packages("kernlab", repos="http://cran.us.r-project.org")'
	R --vanilla -e 'install.packages("tidyverse", repos="http://cran.us.r-project.org")'
	R --vanilla -e 'install.packages("jagsUI", repos="http://cran.us.r-project.org")'

# %runscript
#	R $@

%help

	This container is the R rocker/geospatial container with the addition of
	JAGS and the rjags, landscapemetrics, dismo, randomForest, kernlab,
	tidyverse, and jagsUI R packages.
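
Because %files copies from the host at build time, the build has to run on a machine where /opt/ohpc/pub/JAGS/gnu8-4.3.0 actually exists. The following is a minimal sketch of a pre-build check, not something Apptainer provides. The helper names files_sources and check_files_sources and the file name geospatial_jags.def are made up for illustration, and the parsing assumes that source paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: list the host-side source paths named in the
# %files section of a definition file (first field of each entry).
files_sources() {
    awk '
        # A line starting with % opens a new section; remember if it is %files.
        /^%/ { in_files = ($1 == "%files"); next }
        # Inside %files, skip blank lines and comment lines.
        in_files && NF && $1 !~ /^#/ { print $1 }
    ' "$1"
}

# Hypothetical helper: report every source path that is missing on this
# host and return non-zero if at least one is missing.
check_files_sources() {
    missing=0
    for src in $(files_sources "$1"); do
        if [ ! -e "$src" ]; then
            echo "missing on host: $src" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use (file name is an assumption):
#   check_files_sources geospatial_jags.def && echo "all %files sources present"
```

Running the check before the build gives a clear message about a missing path up front.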

Introduction - Nvidia

Nvidia provides many containers that are optimized for use on its GPUs. These containers are very handy, but sometimes they don't include everything you need. For instance, there is a set of TensorFlow containers, but they don't include the matplotlib package. This section describes how to build a new container using the original one that Nvidia provides as a starting point. In the example below, we start with a TensorFlow 2 container provided by Nvidia and create a new Apptainer container that has everything from the original container plus the matplotlib package.

Nvidia GPU-optimized Container Catalog

To see the list of containers that Nvidia provides, go to https://catalog.ngc.nvidia.com/. From here, you can search for containers. For instance, searching for TensorFlow shows:

[Screenshot: NGC catalog search results for TensorFlow (customizing-nvidia-containers1)]

Clicking on the TensorFlow container shows details about the container, including how to use it:

[Screenshot: NGC page for the TensorFlow container (customizing-nvidia-containers2)]

The path to the container can be found by clicking the "Copy Image Path" button and choosing one of the versions of the container. This path is used to create the Definition file. As an example, the most recent TensorFlow 2 image at the time of writing is nvcr.io/nvidia/tensorflow:22.07-tf2-py3, and this goes into the new Definition file on the second line, after "From: ".
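
To try the unmodified image before customizing it, the copied path can be turned into a docker:// URI, which is how Apptainer addresses Docker registries. The sketch below only prints the resulting pull command. The variable names and the output file name are choices made for this illustration, and the splitting assumes the copied path ends in a tag.

```shell
#!/bin/sh
# Image path as copied from the NGC "Copy Image Path" button.
IMAGE="nvcr.io/nvidia/tensorflow:22.07-tf2-py3"

# Split the path with POSIX parameter expansion (assumes a tag is present).
TAG="${IMAGE##*:}"      # everything after the last colon
REPO="${IMAGE%:*}"      # everything before the last colon
NAME="${REPO##*/}"      # last path component of the repository

# Apptainer addresses Docker registries with a docker:// prefix.
DOCKER_URI="docker://${IMAGE}"
# Output file name chosen here for illustration.
SIF_NAME="${NAME}_${TAG}.sif"

# Print the pull command instead of running it, so the sketch is safe anywhere.
echo "apptainer pull ${SIF_NAME} ${DOCKER_URI}"
```

On a node with the apptainer module loaded, running the printed command would download the base image as a .sif file without any changes applied.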

Creating the Definition File with the Nvidia Container Path

Open a text editor and create a new file called "new_container.def" with the following contents, where the second line includes the path from the NGC site in the previous step:

Bootstrap: docker
From: nvcr.io/nvidia/tensorflow:22.07-tf2-py3

%post

    # Refresh the package index first; apt-get install can fail without it.
    apt-get update
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        coreutils

    pip install matplotlib
    echo "Done"

In the above example, matplotlib is what is being added to the container. You can add other things to the container using apt-get, additional pip commands, or other installation methods.
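
As a sketch of adding several packages at once, the script below writes a Definition file from two package lists. The extra packages (git and pandas), the function name write_def, and the file name custom_container.def are hypothetical and only there for illustration; the base image is the one used above.

```shell
#!/bin/sh
# Package lists; git and pandas are hypothetical additions.
APT_PKGS="git"
PIP_PKGS="matplotlib pandas"

# Hypothetical helper: write a Definition file to the path given as $1.
write_def() {
    cat > "$1" <<EOF
Bootstrap: docker
From: nvcr.io/nvidia/tensorflow:22.07-tf2-py3

%post
    # Refresh the package index before installing anything with apt-get.
    apt-get update
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends $APT_PKGS
    # Remove the downloaded package lists to keep the image smaller.
    rm -rf /var/lib/apt/lists/*

    # --no-cache-dir keeps the pip download cache out of the image.
    pip install --no-cache-dir $PIP_PKGS
EOF
}

write_def custom_container.def
echo "Wrote custom_container.def"
```

Keeping the package names in variables means the lists can be edited without touching the %post section.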

Save the file and then run the following commands to create the new container. The end result will be a file with a ".sif" extension that is placed in your home directory. The process starts by SSHing to a system that has Nvidia GPUs.

ssh node-g103
module load apptainer
export TMPDIR="$XDG_RUNTIME_DIR"
apptainer build --fakeroot $HOME/new_container.sif new_container.def

The third line sets the TMPDIR variable for the apptainer command to use. This is done for a couple of reasons, but the biggest benefit is that XDG_RUNTIME_DIR points to a tmpfs volume that is created when you ssh to node-g103. This tmpfs volume lives in RAM, so it is very fast, and pointing TMPDIR at it speeds up the container build tremendously.
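
If XDG_RUNTIME_DIR is unset or not writable in a given session, TMPDIR would end up empty or unusable. The sketch below is one possible guard: it prefers XDG_RUNTIME_DIR and falls back to /tmp. The function name pick_tmpdir and the /tmp fallback are assumptions for this illustration, not cluster policy. A tmpfs volume has a size limit, so the sketch also prints the free space, which can matter for large images.

```shell
#!/bin/sh
# Hypothetical helper: prefer the RAM-backed XDG_RUNTIME_DIR and fall
# back to /tmp when it is unset, missing, or not writable.
pick_tmpdir() {
    if [ -n "${XDG_RUNTIME_DIR:-}" ] && [ -d "$XDG_RUNTIME_DIR" ] && [ -w "$XDG_RUNTIME_DIR" ]; then
        printf '%s\n' "$XDG_RUNTIME_DIR"
    else
        printf '%s\n' "/tmp"
    fi
}

TMPDIR="$(pick_tmpdir)"
export TMPDIR

# Show the free space in the chosen directory so a too-small volume is noticed early.
df -h "$TMPDIR"
```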

The apptainer command runs the build subcommand to build the .sif file, and it uses the .def file to know how to build it. The --fakeroot parameter is needed so that regular, non-root accounts can build the container.
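
Once the build finishes, a quick check can confirm that matplotlib is importable and that TensorFlow sees a GPU. The following is a sketch under a few assumptions: the function name check_container is hypothetical, the image is at $HOME/new_container.sif, and the check is run on a GPU node.

```shell
#!/bin/sh
# Path of the image produced by the build step (override with SIF=...).
SIF="${SIF:-$HOME/new_container.sif}"

# Hypothetical helper: returns 2 when the prerequisites are missing,
# otherwise the exit status of the Python check inside the container.
check_container() {
    sif="$1"
    if ! command -v apptainer >/dev/null 2>&1; then
        echo "apptainer not found; run 'module load apptainer' first" >&2
        return 2
    fi
    if [ ! -f "$sif" ]; then
        echo "no such image: $sif" >&2
        return 2
    fi
    # Import both packages, print their versions, and list the visible GPUs.
    apptainer exec --nv "$sif" python -c 'import matplotlib, tensorflow as tf; print(matplotlib.__version__, tf.__version__); print(tf.config.list_physical_devices("GPU"))'
}

check_container "$SIF" || echo "check did not pass (see messages above)"
```

An empty GPU list in the output would suggest that the --nv option or the node's GPU drivers need a closer look.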

Once the container has been created, you can use the container in a Slurm job with the following in your job submission script, where "my_python_script.py" is the name of your Python script:

module load apptainer
apptainer run --nv $HOME/new_container.sif python my_python_script.py
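
For context, a complete job submission script might look like the one generated by the sketch below. The job name, GPU request, time limit, and the file name tf_job.sh are assumptions for illustration; partition names and GPU resource syntax vary by cluster, so adjust them to your site.

```shell
#!/bin/sh
# Hypothetical file name for the generated job script.
JOB_SCRIPT="${JOB_SCRIPT:-tf_job.sh}"

# The quoted 'EOF' keeps $HOME literal so it is expanded when the job runs.
cat > "$JOB_SCRIPT" <<'EOF'
#!/bin/bash
# Assumed job name.
#SBATCH --job-name=tf-matplotlib
# Assumed resource request: one GPU.
#SBATCH --gres=gpu:1
# Assumed wall-time limit of one hour.
#SBATCH --time=01:00:00
# Log file named after the job name (%x) and job ID (%j).
#SBATCH --output=%x-%j.out

module load apptainer
apptainer run --nv "$HOME/new_container.sif" python my_python_script.py
EOF

echo "Wrote $JOB_SCRIPT; submit it with: sbatch $JOB_SCRIPT"
```

The --nv option is what makes the host's Nvidia driver and GPUs visible inside the container, so it should stay on the apptainer line.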