# A short introduction to containerized software

After using nf-core pipelines to answer bioinformatics questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     C:\Program Files\Docker\cli-plugins\docker-ai.exe
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     C:\Program Files\Docker\cli-plugins\docker-buildx.exe
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.27
    Path:     C:\Program Files\Docker\cli-plugins\docker-cloud.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.2-desktop.1
    Path:     C:\Program Files\Docker\cli-plugins\docker-compose.exe
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     C:\Program Files\Docker\cli-plugins\docker-debug.exe
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-desktop.exe
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:   
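
`docker info` only succeeds fully when the Docker CLI is installed and the Docker daemon is running. The following is a minimal sketch that tells these cases apart. The function name `docker_status` and its messages are illustrative and not part of Docker.

```shell
# docker_status is a hypothetical helper, not a Docker command.
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        # the docker executable is not on the PATH
        echo "docker CLI not found"
    elif ! docker info >/dev/null 2>&1; then
        # the CLI exists, but "docker info" could not reach the daemon
        echo "docker CLI found, but the daemon is not reachable"
    else
        echo "docker is installed and the daemon is running"
    fi
}

docker_status
```

If the second message appears, starting Docker Desktop (or the Docker service) is usually the next step.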

### What is a container?

- A container is an encapsulated environment in which applications can run
- It contains code as well as its dependencies

### Why do we use containers?

- containers include fixed software, dependencies, versions, etc.
- because of this, different users have exactly the same environment available and can produce the same results
- this way, the reproducibility of pipelines can be ensured

### What is a docker image?

- a docker image is a read-only template that contains all information necessary to create a docker container
- it contains the code, its dependencies and tools, and metadata such as the default command

### Let's run our first docker image:

### Log in to Docker

In [None]:
# You need to do this directly on the command line

### Run your first docker container

In [2]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [None]:
!docker container ls --all

CONTAINER ID   IMAGE         COMMAND    CREATED         STATUS                     PORTS     NAMES
8cf22baa8339   hello-world   "/hello"   2 minutes ago   Exited (0) 2 minutes ago             youthful_burnell
f3f167674866   hello-world   "/hello"   7 days ago      Exited (0) 7 days ago                stoic_bardeen


### Delete the container again and prove that it is deleted

In [5]:
!docker rm 8cf22baa8339   

8cf22baa8339


In [6]:
!docker container ls --all

CONTAINER ID   IMAGE         COMMAND    CREATED      STATUS                  PORTS     NAMES
f3f167674866   hello-world   "/hello"   7 days ago   Exited (0) 7 days ago             stoic_bardeen


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. install with: sudo apt install fastqc 
2. run: fastqc <file>

In [None]:
!fastqc day02/SRFetch_results/fastq/SRX19144486_SRR23195516_1.fastq.gz

# this produces a FastQC report as .zip and .html files in the folder of the input file

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [None]:
# creating a container using seqera gives you the following image name:
# community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc
10b8c00c10a5: Pulling fs layer
030a47592a0a: Pulling fs layer
dec6b097362e: Pulling fs layer
17dc7ea432cc: Pulling fs layer
0f93acc3b8ff: Pulling fs layer
92dc97a3ef36: Pulling fs layer
dafa2b0c44d2: Pulling fs layer
4f4fb700ef54: Pulling fs layer
f3c4c6865366: Pulling fs layer
403f74b0f85e: Pulling fs layer
bb36d6c3110d: Pulling fs layer
0ea1a16bbe82: Pulling fs layer
4f4fb700ef54: Pulling fs layer
f88da01cff0b: Pulling fs layer
92dc97a3ef36: Download complete
0ea1a16bbe82: Download complete
dec6b097362e: Download complete
17dc7ea432cc: Download complete
403f74b0f85e: Download complete
10b8c00c10a5: Download complete
4f4fb700ef54: Download complete
030a47592a0a: Download complete
0f93acc3b8ff: Download complete
bb36d6c3110d: Download complete
f88da01cff0b: Download complete
dafa2b0c44d2: Download complete
dec6b097362e: Pull complete
dafa2b0c44d2: Pull complete
f88da01cff0b: Pull complete
92dc97a3ef36: Pull complete
403f74b0f85e: Pu

In [9]:
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago     20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago   1.37GB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0         74b59572f1d0   14 months ago   20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago   1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago   1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago     110MB
quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   2 years ago     493MB
quay.io/biocon

In [4]:
# run the container and save the results to the "docker_out_fastqc" directory (FastQC does not create it, so it must already exist)

!docker run -v ../../notebooks:/data community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc /data/day_03_part2/SRX19144486_SRR23195516_1.fastq.gz -o /data/day_03_part2/docker_out_fastqc

application/gzip
Analysis complete for SRX19144486_SRR23195516_1.fastq.gz


Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195516_1.fastq.gz
Ap

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

- Using fastqc directly on Linux was fairly easy. Installing it on Windows with an .exe file, including making sure that Java is properly installed, would have required far more work.
- Working out how to run the docker container (and understanding the command options, mounting, etc.) was quite some work
- In the future, both ways may become easier simply because one has done them several times already
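
The `docker run` command used above has four parts: options for Docker (`-v host_dir:container_dir` makes a host directory visible inside the container), the image name, the program to run inside the container, and that program's arguments. As a minimal sketch, the hypothetical helper `fastqc_cmd` below assembles such a command and only prints it. The `--rm` option, which removes the container after it exits, is an addition that the original command did not use, and the file names in the example call are placeholders.

```shell
IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

# fastqc_cmd HOST_DIR FASTQ OUT_DIR   (hypothetical helper)
#   HOST_DIR: directory on the host, mounted as /data in the container
#   FASTQ:    path of the input file, relative to HOST_DIR
#   OUT_DIR:  existing output directory, relative to HOST_DIR
fastqc_cmd() {
    echo "docker run --rm -v $1:/data $IMAGE fastqc /data/$2 -o /data/$3"
}

# print the command for the current directory; paste it into a terminal to run it
fastqc_cmd "$(pwd)" reads.fastq.gz fastqc_out
```

Paths given to fastqc always start with `/data`, because that is where the host directory is mounted; files written there appear in the host directory. The sketch does not quote its arguments, so it assumes paths without spaces.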


### What would you say, which approach is more reproducible?

- The docker approach is more reproducible, because the docker image contains a specific version of fastqc. Re-using the same image should produce the same result again
- Downloading and installing fastqc again could much more easily result in a different version, and may therefore lead to different results
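
This only holds if the image reference itself is fixed. A reference without a tag defaults to `latest`, which can point to a different image over time; a version tag is more stable, and a `@sha256:` digest identifies one exact image. The sketch below classifies an image reference accordingly. The function name `classify_image_ref` and its three labels are made up for illustration.

```shell
# classify_image_ref REF   (hypothetical helper)
# prints "digest", "tag", or "floating"
classify_image_ref() {
    ref=$1
    case $ref in
        *@sha256:*) echo "digest"; return ;;
    esac
    # keep only the part after the last slash, so that a registry
    # port such as localhost:5000 is not mistaken for a tag
    last=${ref##*/}
    case $last in
        *:latest) echo "floating" ;;   # explicit "latest" tag
        *:*)      echo "tag" ;;        # explicit version tag
        *)        echo "floating" ;;   # no tag, Docker assumes "latest"
    esac
}

classify_image_ref "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"
classify_image_ref "hello-world"
```

The first call prints `tag` and the second prints `floating`.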

### Compare the file to last week's fastqc results. Are they identical?

- The fastqc results of both approaches used today appear to be identical.

### Is the fastqc version identical?

- yes, both the locally installed fastqc and the fastqc in the docker container are version 0.12.1

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker images are usually based on a Linux distribution (for example Debian or Ubuntu), so you need to know the apt-get command to install "cowsay"


In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

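- RUN: this executes a command while the image is built; here it installs cowsay via the Linux apt-get command

- ENV: this sets an environment variable in the image; here it sets the PATH variable so that it includes /usr/games, where cowsay is installed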
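
As an illustration, a Dockerfile with such RUN and ENV lines might look like the sketch below. It is not the actual content of "my_dockerfile": the file name `cowsay_sketch.dockerfile` is made up, and the exact lines are assumptions based on the base image named in the build output (debian:bullseye-slim) and on Debian installing cowsay into /usr/games.

```shell
# write a sketch Dockerfile; the quoted 'EOF' keeps ${PATH} literal
cat > cowsay_sketch.dockerfile <<'EOF'
# start from a small Debian base image
FROM debian:bullseye-slim

# install cowsay and remove the apt package lists to keep the image small
RUN apt-get update && apt-get install -y cowsay && rm -rf /var/lib/apt/lists/*

# Debian installs cowsay into /usr/games, which is not on the default PATH
ENV PATH="/usr/games:${PATH}"
EOF

cat cowsay_sketch.dockerfile
```

Prepending /usr/games to the existing PATH, rather than replacing it, keeps the standard tools of the base image available.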

In [7]:
# build the docker image
!docker build -f my_dockerfile -t cowsay-container .


#0 building with "desktop-linux" instance using docker driver

#1 [internal] load build definition from my_dockerfile
#1 transferring dockerfile: 829B 0.0s done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/debian:bullseye-slim
#2 ...

#3 [auth] library/debian:pull token for registry-1.docker.io
#3 DONE 0.0s

#2 [internal] load metadata for docker.io/library/debian:bullseye-slim
#2 DONE 1.4s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [1/2] FROM docker.io/library/debian:bullseye-slim@sha256:f807f4b16002c623115b0247dca6a55711c6b1ae821dc64fb8a2339e4ce2115d
#5 resolve docker.io/library/debian:bullseye-slim@sha256:f807f4b16002c623115b0247dca6a55711c6b1ae821dc64fb8a2339e4ce2115d 0.0s done
#5 DONE 0.1s

#5 [1/2] FROM docker.io/library/debian:bullseye-slim@sha256:f807f4b16002c623115b0247dca6a55711c6b1ae821dc64fb8a2339e4ce2115d
#5 sha256:4eb1dd59a73886acc6a3cc9d4c8f8e66d1fd6ba6d6195b05ce21c22b0658aab8 0B / 30.26MB 0.2s
#5 sha256:4eb1dd5

In [None]:
# make sure that the image has been built
!docker images

# yes, the cowsay-container appears in the list

REPOSITORY                                                 TAG                        IMAGE ID       CREATED              SIZE
cowsay-container                                           latest                     de8f4026f7a0   About a minute ago   197MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago          20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago        1.37GB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0         74b59572f1d0   14 months ago        20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago        1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago        1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2

In [10]:
# run a container from the image we just built
!docker run cowsay-container cowsay "Hello!"

 ________
< Hello! >
 --------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics and create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
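
A Dockerfile for this task could be structured as in the following sketch. It is not the actual "salmon_docker" file: the file name `salmon_sketch.dockerfile`, the install directory /opt/salmon, and the assumption that the archive contains a single top-level folder with a `bin/` directory inside are all illustrative choices.

```shell
# write a sketch Dockerfile; the quoted 'EOF' keeps ${PATH} and backslashes literal
cat > salmon_sketch.dockerfile <<'EOF'
FROM debian:bullseye-slim

# curl downloads the release archive, tar unpacks it
RUN apt-get update && apt-get install -y curl tar && rm -rf /var/lib/apt/lists/*

# download salmon, unpack it into /opt/salmon without its top-level folder,
# and delete the archive again in the same layer
RUN curl -L -o /tmp/salmon.tar.gz https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
    && mkdir -p /opt/salmon \
    && tar -xzf /tmp/salmon.tar.gz -C /opt/salmon --strip-components=1 \
    && rm /tmp/salmon.tar.gz

# make the salmon executable available on the PATH
ENV PATH="/opt/salmon/bin:${PATH}"
EOF

cat salmon_sketch.dockerfile
```

Using `--strip-components=1` avoids hard-coding the name of the folder inside the archive.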

In [34]:
# use the file "salmon_docker" in this directory to build a new docker image
# the file was completed accordingly

!docker build -t salmon-container -f salmon_docker .


#0 building with "desktop-linux" instance using docker driver

#1 [internal] load build definition from salmon_docker
#1 transferring dockerfile: 636B 0.0s done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/debian:bullseye-slim
#2 ...

#3 [auth] library/debian:pull token for registry-1.docker.io
#3 DONE 0.0s

#2 [internal] load metadata for docker.io/library/debian:bullseye-slim
#2 DONE 1.0s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [1/3] FROM docker.io/library/debian:bullseye-slim@sha256:f807f4b16002c623115b0247dca6a55711c6b1ae821dc64fb8a2339e4ce2115d
#5 resolve docker.io/library/debian:bullseye-slim@sha256:f807f4b16002c623115b0247dca6a55711c6b1ae821dc64fb8a2339e4ce2115d 0.0s done
#5 DONE 0.0s

#6 [2/3] RUN apt-get update && apt-get install -y curl tar && rm -rf /var/lib/apt/lists/*
#6 CACHED

#7 [3/3] RUN curl -L -o /tmp/salmon.tar.gz     https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_6

In [None]:
# build the image
# done above; checking it is really there:
!docker images

# yes, salmon-container appears in the list

REPOSITORY                                                 TAG                        IMAGE ID       CREATED              SIZE
salmon-container                                           latest                     53578819dc3f   About a minute ago   407MB
cowsay-container                                           latest                     de8f4026f7a0   12 minutes ago       197MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago          20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago        1.37GB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0         74b59572f1d0   14 months ago        20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago        1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18

In [5]:
# run a container from the image to print the version of salmon
!docker run salmon-container salmon -v


salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

- Docker images are created for a lot of tools already and can be found online

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

- BioContainers allows users to create, manage, and distribute bioinformatics software packages and containers using tools like Conda, Docker, and Singularity
- it provides ready-to-use containers for bioinformatics
- it offers infrastructure, specifications, and examples to help develop, build, and deploy new tools
- it promotes reproducible pipelines and workflows by offering guidelines on container usage
- it helps coordinate collaboration between developers and bioinformaticians to ensure best practices in documentation and software development

Source: https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html 

In [3]:
#salmon docker image found online: 

!docker pull combinelab/salmon:latest

latest: Pulling from combinelab/salmon
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Image is up to date for combinelab/salmon:latest
docker.io/combinelab/salmon:latest


In [4]:
!docker run combinelab/salmon:latest salmon -v


salmon 1.10.3


## Are there other ways to create Docker (or Apptainer) images?

- create it manually
- use Seqera Containers, BioContainers or similar platforms
- use images that are already available online
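
For Apptainer, an existing Docker image can usually be converted instead of being built from scratch: `apptainer pull` accepts a `docker://` URI and writes a `.sif` image file. The hypothetical helper below only prints such a command; the output file name `salmon.sif` is a placeholder.

```shell
# apptainer_pull_cmd SIF_FILE DOCKER_IMAGE   (hypothetical helper)
# prints the command that would convert a Docker image into a .sif file
apptainer_pull_cmd() {
    echo "apptainer pull $1 docker://$2"
}

apptainer_pull_cmd salmon.sif combinelab/salmon:latest
```

This prints `apptainer pull salmon.sif docker://combinelab/salmon:latest`, which can be run on a machine where Apptainer is installed.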

What is https://seqera.io/containers/ ?

- a large collection of ready-to-use, version-controlled containers 
- lets you select one or multiple packages and generates an image name that can be used to pull and run the specified container