# A short introduction to containerized software

Having used nf-core pipelines to answer bioinformatic questions, we now turn to the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /Users/al/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /Users/al/.docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /Users/al/.docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /Users/al/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /Users/al/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /Users/al/.docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /Users/al/.docker/cli-plugins/docker-extension
  init: Cre
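
The same check can also be scripted. The following is a minimal sketch that assumes only the Python standard library; the helper name `docker_available` is made up for this example. It looks for the `docker` executable on the `PATH` and, if it is found, calls `docker info`, which is expected to fail when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available(timeout: float = 20.0) -> bool:
    """Return True if the docker CLI exists and `docker info` succeeds."""
    # shutil.which returns None when no "docker" executable is on the PATH
    if shutil.which("docker") is None:
        return False
    try:
        # Assumption: `docker info` exits with a non-zero code if the daemon is unreachable
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker is ready" if docker_available() else "Docker is not available")
```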

### What is a container?
A container is an isolated runtime environment that bundles all tools together with their dependencies and libraries at fixed versions.

### Why do we use containers?
Containers mainly ensure reproducibility and standardisation. Running foreign workflows in them is not risk-free, however: a container shares the kernel of the host OS instead of running its own virtualised kernel, so a misbehaving container can still affect or even crash the host.

### What is a docker image?
It is a portable snapshot of a tool and all its dependencies, used to create a container.

### Let's run our first docker image:

### Login to docker

In [2]:
# Run `docker login` directly on the command line (it asks for credentials interactively)

### Run your first docker container

In [3]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [4]:
!docker ps -l -q

10443dd81a15


### Delete the container again and give proof that it is deleted

In [5]:
CONTAINER_INFO = !docker ps -l
print("\n".join(CONTAINER_INFO)) 

last_line = CONTAINER_INFO[-1]
columns = last_line.split()
container_id = columns[0]  # CONTAINER ID
image_name = columns[1]    # IMAGE
command = columns[2]       # COMMAND

print("Container ID:", container_id)
print("Image:", image_name)
print("Command:", command)

if image_name == "hello-world":
    print("Deleting container...")
    !docker rm {container_id}
    print("Deleted.")
else:
    print("Container is not hello-world, skipping deletion.")

CONTAINER ID   IMAGE         COMMAND    CREATED        STATUS                              PORTS     NAMES
10443dd81a15   hello-world   "/hello"   1 second ago   Exited (0) Less than a second ago             compassionate_haslett
Container ID: 10443dd81a15
Image: hello-world
Command: "/hello"
Deleting container...
10443dd81a15
Deleted.
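
Splitting the table on whitespace works here, but it can break when a column such as COMMAND contains spaces, and the cell above does not yet show that the container is really gone. One possible alternative, sketched below, is to let the CLI print one JSON object per container with `docker ps -a --format '{{json .}}'` and then search the listing for the ID. The helper names are invented for this example, and the sample listing only mimics the shape of the CLI output so the logic can be tried without a running daemon.

```python
import json
import subprocess


def parse_ps_json(text: str) -> list:
    """Parse the one-JSON-object-per-line output of `docker ps --format '{{json .}}'`."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def container_exists(containers: list, container_id: str) -> bool:
    """True if a listed container matches the given ID."""
    for container in containers:
        listed = container.get("ID", "")
        # IDs may be shortened on either side, so compare by common prefix
        if listed and container_id and (
            listed.startswith(container_id) or container_id.startswith(listed)
        ):
            return True
    return False


def list_all_containers() -> list:
    """Ask the Docker CLI for all containers, running or stopped."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_json(result.stdout)


# Sample listing (illustrative only), used instead of a live daemon
sample = '{"ID": "10443dd81a15", "Image": "hello-world", "Command": "\\"/hello\\""}\n'
containers = parse_ps_json(sample)
print(container_exists(containers, "10443dd81a15"))
print(container_exists(containers, "033f499413f5"))
```

After `docker rm`, calling `container_exists(list_all_containers(), container_id)` should return `False`, which would serve as the requested proof.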


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Clicked on the download section and selected the macOS dmg file.
2. To run it, I selected File and opened a FASTQ file from the nf-core/fetchngs pipeline we ran yesterday.

### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [6]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
#!docker container create community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
!docker ps -a

0.12.1--af7a5314d5015c29: Pulling from library/fastqc
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1
Status: Image is up to date for community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
What's next:
    View a summary of image vulnerabilities and recommendations → docker scout quickview community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED          STATUS                      PORTS     NAMES
033f499413f5   community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29              "/usr/local/bin/_ent…"   38 minutes ago   Exited (0) 31 minutes ago             fervent_hugle
ec2a90ca35c7   community.wave.seqera.io/library/cutadapt_trim-galore_pigz:a98edd405b34582d   "/usr/local/bin/_ent…"   18 hours ago     Exited (137) 18 ho

In [7]:
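# run the container and save the results to a new "fastqc_results" directory
# Note: the ID printed below belongs to an earlier, exited FastQC container and is shown for reference only;
# `docker run` always creates a fresh container from the image (removed again afterwards because of --rm)
container_id = !docker ps -a -q -f ancestor=community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
print("Container ID to start:", container_id[0])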

!docker run --rm \
  -v $(pwd)/fastq:/data \
  -v $(pwd)/fastqc_results:/out \
  community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
  fastqc -o /out /data/SRX19144486_SRR23195516_1.fastq.gz


Container ID to start: 033f499413f5
application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 8
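
The `$(pwd)` parts of the command above are expanded by the shell. As a rough sketch of the same call from plain Python, the bind mounts can be built with `pathlib` so that they are absolute paths and the output directory already exists when Docker mounts it. The function names `build_fastqc_command` and `run_fastqc` are hypothetical; the image name, mount points and file name are taken from the cell above.

```python
import subprocess
from pathlib import Path

IMAGE = "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"


def build_fastqc_command(fastq_dir: Path, out_dir: Path, fastq_name: str,
                         image: str = IMAGE) -> list:
    """Assemble a `docker run` call that mounts the input and output directories."""
    fastq_dir = fastq_dir.resolve()  # bind mounts need absolute host paths
    out_dir = out_dir.resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{fastq_dir}:/data",  # host input directory -> /data in the container
        "-v", f"{out_dir}:/out",     # host output directory -> /out in the container
        image,
        "fastqc", "-o", "/out", f"/data/{fastq_name}",
    ]


def run_fastqc(fastq_dir: Path, out_dir: Path, fastq_name: str) -> None:
    """Create the output directory, then run FastQC inside the container."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastqc_command(fastq_dir, out_dir, fastq_name), check=True)


cmd = build_fastqc_command(
    Path("fastq"), Path("fastqc_results"), "SRX19144486_SRR23195516_1.fastq.gz"
)
print(" ".join(cmd))
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.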

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?
The Docker approach is much easier once you are used to it, because you don't need to deal with tool-specific installation problems.

### What would you say, which approach is more reproducible?
The Docker approach as well, because the image is fixed (frozen) with pinned tool versions and dependencies.

### Compare the file to last week's FastQC results. Are they identical? Is the FastQC version identical?
There is not really a way to compare them without two MultiQC reports.
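
If the `*_fastqc.zip` archives from both runs are still available, a comparison may be possible without MultiQC. The sketch below assumes that each archive contains a `fastqc_data.txt` whose first line is a `##FastQC` header followed by a tab and the version, and that each module starts with a `>>name<TAB>status` line. All function names are invented for this example.

```python
import zipfile


def read_fastqc_data(zip_path: str) -> str:
    """Return the text of fastqc_data.txt stored inside a FastQC zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("fastqc_data.txt"):
                return archive.read(name).decode("utf-8")
    raise FileNotFoundError(f"no fastqc_data.txt in {zip_path}")


def fastqc_version(data_text: str) -> str:
    """Extract the version from the header line, e.g. '##FastQC<TAB>0.12.1'."""
    first_line = data_text.splitlines()[0]
    if not first_line.startswith("##FastQC"):
        raise ValueError("unexpected header: " + first_line)
    return first_line.split("\t")[1].strip()


def module_statuses(data_text: str) -> dict:
    """Map each module name to its pass/warn/fail status."""
    statuses = {}
    for line in data_text.splitlines():
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status
    return statuses


def compare_reports(zip_a: str, zip_b: str) -> dict:
    """Compare the FastQC version and per-module status of two reports."""
    a, b = read_fastqc_data(zip_a), read_fastqc_data(zip_b)
    return {
        "same_version": fastqc_version(a) == fastqc_version(b),
        "same_statuses": module_statuses(a) == module_statuses(b),
    }
```

Calling `compare_reports` with the path to last week's archive and the path to the archive in `fastqc_results` would then answer both questions; the HTML files themselves are less suitable for a byte-wise comparison.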

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux, so on an Ubuntu or Debian base image you need the apt-get command to install "cowsay"

In [20]:
# write the file "my_dockerfile" (you could also create it in a text editor)

dockerfile_path = "my_dockerfile"

dockerfile_content = """\
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y cowsay
ENV PATH="/usr/games:${PATH}"
ENTRYPOINT ["cowsay"]
"""

with open(dockerfile_path, "w") as f:
    f.write(dockerfile_content)

print(f"Dockerfile '{dockerfile_path}' created successfully!")


Dockerfile 'my_dockerfile' created successfully!


### Explain the RUN and ENV lines you added to the file

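1. ENV PATH="/usr/games:${PATH}" -> prepends /usr/games, the directory where apt-get installs the cowsay executable, to the PATH environment variable so that cowsay can be called by name
2. RUN commands
  - apt-get update -> updates the package lists
  - apt-get install -y cowsay -> installs the cowsay tool; -y answers "yes" to all prompts so the build does not wait for input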

In [21]:
# build the docker image
!docker build -t mycowsay -f my_dockerfile .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 158B                                       0.0s
[0m => [internal] load metadata for docker.io/library/ubuntu:22.04            0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 158B                                       0.0s
[0m => [internal] load metadata for docker.io/library/ubuntu:22.04            0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition

In [22]:
# make sure that the image has been built
!docker images

REPOSITORY                                                   TAG                        IMAGE ID       CREATED         SIZE
mycowsay                                                     latest                     a16129837f9c   8 seconds ago   182MB
<none>                                                       <none>                     e4df7d6f49aa   3 minutes ago   121MB
my_salmon_image                                              latest                     466420c7d697   6 minutes ago   295MB
my_cowsay_image                                              latest                     a093b575bc43   7 minutes ago   117MB
hello-world                                                  latest                     ca9905c726f0   7 weeks ago     5.2kB
community.wave.seqera.io/library/cutadapt_trim-galore_pigz   a98edd405b34582d           a26fa7f31e84   9 months ago    1.17GB
community.wave.seqera.io/library/fastqc                      0.12.1--af7a5314d5015c29   57ed62363d5f   11 months ago   922MB


In [24]:
# run a container from the image
!docker run mycowsay "Wo ist Frankie's Hund?"

 ________________________
< Wo ist Frankie's Hund? >
 ------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with Dockerfiles: create a new Dockerfile that holds the Salmon tool used in RNA-seq analysis

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [1]:
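# write the file "salmon_docker" in this directory; it is used to build a new Docker image
# Note: inside this non-raw Python string, a backslash at the end of a line swallows the newline,
# so each multi-line RUN instruction ends up in the file as one long line (Docker accepts both forms)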

dockerfile_path = "salmon_docker"

dockerfile_content = """\
# Start from a slim Debian base
FROM debian:bullseye-slim

# Install dependencies
RUN apt-get update && apt-get install -y \
    curl \
    tar \
    gzip \
    && rm -rf /var/lib/apt/lists/*

# Download and install Salmon
RUN curl -L https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
    | tar -xz -C /opt \
    && ln -s /opt/salmon-1.5.2_linux_x86_64/bin/salmon /usr/bin/salmon

# Set PATH (optional since we symlinked salmon)
ENV PATH="/usr/bin:${PATH}"

# Default command: show version
CMD ["salmon", "--version"]

"""

with open(dockerfile_path, "w") as f:
    f.write(dockerfile_content)

print(f"Dockerfile '{dockerfile_path}' created successfully!")

Dockerfile 'salmon_docker' created successfully!


In [7]:
# build the image
!docker buildx build --platform linux/amd64 -t mysalmon -f salmon_docker .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 597B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 597B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition
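
The Salmon release archive used here is an x86_64 Linux build, which is presumably why the build command passes `--platform linux/amd64`: the hello-world output earlier reported an arm64v8 image, so this host has to emulate amd64. A small sketch of how a script might decide whether the flag is needed, based on `platform.machine()`, could look like this. The function `platform_args` and the set of machine names are assumptions made for this example.

```python
import platform
from typing import Optional

# Machine names that Python commonly reports for 64-bit x86 hosts (assumed list)
X86_64_NAMES = {"x86_64", "amd64"}


def platform_args(machine: Optional[str] = None) -> list:
    """Return the extra docker arguments needed to use an amd64-only image."""
    machine = (machine or platform.machine()).lower()
    if machine in X86_64_NAMES:
        return []  # native architecture, no flag needed
    return ["--platform", "linux/amd64"]  # e.g. arm64/aarch64 hosts rely on emulation


build_cmd = [
    "docker", "buildx", "build", *platform_args(),
    "-t", "mysalmon", "-f", "salmon_docker", ".",
]
print(" ".join(build_cmd))
```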

In [8]:
# run the Docker image to print the Salmon version
!docker run --platform linux/amd64 mysalmon

salmon 1.5.2
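
The two Dockerfiles start their tool in different ways: the cowsay image uses `ENTRYPOINT`, so everything after the image name in `docker run` is appended as arguments, while the Salmon image uses `CMD`, a default that is replaced when arguments are given. The toy function below models that rule for the exec (JSON array) form only. It is an illustration, not part of any Docker API, and the name `effective_argv` is made up.

```python
def effective_argv(entrypoint: list, cmd: list, run_args: list) -> list:
    """Model the process started by `docker run IMAGE [run_args...]` (exec form only)."""
    # Arguments given after the image name replace CMD entirely
    tail = run_args if run_args else cmd
    # ENTRYPOINT, if present, always comes first
    return entrypoint + tail


# mycowsay: ENTRYPOINT ["cowsay"], no CMD
print(effective_argv(["cowsay"], [], ["Wo ist Frankie's Hund?"]))
# mysalmon: no ENTRYPOINT, CMD ["salmon", "--version"]
print(effective_argv([], ["salmon", "--version"], []))
# mysalmon with explicit arguments: CMD is replaced
print(effective_argv([], ["salmon", "--version"], ["salmon", "--help"]))
```

This is why `docker run mysalmon` prints the version without any arguments, and why the cowsay image needs no tool name in front of the text.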


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

Bioinformaticians do not need to create a Docker image every time they want to run a tool, because most software is already available as pre-built containers on platforms like BioContainers, which provides standardized and versioned images for bioinformatics applications that can simply be pulled. This makes it possible to reuse existing images instead of maintaining your own.

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

Seqera Containers is a service that builds and provisions containers on demand, typically from Conda or Bioconda package specifications. It integrates with Nextflow to provide the right container dynamically during workflow execution. This makes container management easier and more reproducible, and removes the need to write a Dockerfile by hand for every tool.