# A short introduction to containerized software

After spending some time using nf-core pipelines to answer bioinformatic questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



1. Check if Docker is installed.

In [11]:
!docker info

Client:
 Version:    27.1.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.1-desktop.1
    Path:     /Users/weronikajaskowiak/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.1-desktop.1
    Path:     /Users/weronikajaskowiak/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /Users/weronikajaskowiak/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.14
    Path:     /Users/weronikajaskowiak/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/weronikajaskowiak/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /Users/weronikajaskowiak/.docker/cli-plugins/docker-extension
  feedba
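The same check can also be done from Python, which is convenient inside a notebook. The following is a minimal sketch, not part of the original exercise: the helper names `docker_cli_path` and `docker_daemon_reachable` are hypothetical, and it assumes that `docker info` exits with a non-zero status when the daemon cannot be reached, as current Docker versions do.

```python
import shutil
import subprocess


def docker_cli_path():
    """Return the full path of the docker executable, or None if it is not on PATH."""
    return shutil.which("docker")


def docker_daemon_reachable(timeout=10):
    """Return True if `docker info` exits with status 0 (assumed to mean the daemon answered)."""
    if docker_cli_path() is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The executable could not be started or did not answer in time.
        return False
    return result.returncode == 0


print("docker CLI:", docker_cli_path())
print("daemon reachable:", docker_daemon_reachable())
```

If the CLI is found but the daemon is not reachable, starting Docker Desktop (or the Docker service) is usually the next thing to try.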

### What is a container?

A container is a standardized software unit that bundles code along with all its dependencies, ensuring that the application runs efficiently and consistently across different computing environments.

### Why do we use containers?
Containers solve the problem of getting software to run reliably when it is moved from one computing environment to another. They are self-contained, isolated, independent, portable, and usually small (often in the megabyte range).

### What is a Docker image?
A Docker image (more generally, a container image) is a standardized bundle that contains all the necessary files, binaries, libraries, and configurations needed to run a container.

### Let's run our first Docker image:

### Log in to Docker

In [12]:
!docker login
# You need to run this directly on the command line

Authenticating with existing credentials...
Login Succeeded


### Run your first docker container

In [13]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

a9e3005d3e01530b4ad2bb2c810dd088a6cf1d2b1ad6667e763760b16e630323
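The notebook does not record how the ID was looked up. One possible way, sketched here as an assumption rather than the method actually used, is to list all containers with `docker ps -a --no-trunc --format "{{.ID}} {{.Image}}"` and pick the lines that belong to the image of interest. The helper `container_ids_for_image` is a hypothetical name, and the sample listing is built from the `docker ps -a` output shown further down in this notebook.

```python
def container_ids_for_image(ps_output, image):
    """Return the IDs of all containers created from `image`.

    `ps_output` is the text printed by
        docker ps -a --no-trunc --format "{{.ID}} {{.Image}}"
    that is, one container per line: the ID, a space, then the image name.
    """
    ids = []
    for line in ps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[1].strip() == image:
            ids.append(parts[0])
    return ids


# Sample listing with short IDs; with --no-trunc the IDs would be 64 characters long.
listing = """\
15243ee38c74 my_cowsay
3e4dfc95fbff community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42
f87af3ad5f12 community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42
"""

print(container_ids_for_image(listing, "my_cowsay"))    # ['15243ee38c74']
print(container_ids_for_image(listing, "hello-world"))  # []
```

An empty result for an image after `docker rm` is also a simple way to confirm that its container is gone.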

### Delete the container again and prove that it is deleted

In [14]:
!docker rm a9e3005d3e01530b4ad2bb2c810dd088a6cf1d2b1ad6667e763760b16e630323

a9e3005d3e01530b4ad2bb2c810dd088a6cf1d2b1ad6667e763760b16e630323


In [15]:
!docker ps -a
# the removed container is no longer listed among the container IDs

CONTAINER ID   IMAGE                                                                        COMMAND                  CREATED       STATUS                    PORTS     NAMES
15243ee38c74   my_cowsay                                                                    "cowsay SLAAAY"          4 hours ago   Exited (0) 4 hours ago              objective_proskuriakova
3e4dfc95fbff   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42             "/usr/local/bin/_ent…"   5 hours ago   Exited (0) 5 hours ago              cool_williamson
f87af3ad5f12   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42             "/usr/local/bin/_ent…"   5 hours ago   Exited (0) 5 hours ago              naughty_matsumoto
db38f7c3c36e   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42             "/usr/local/bin/_ent…"   5 hours ago   Exited (0) 5 hours ago              cranky_bhaskara
d733acea2418   community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1) First, I installed FastQC in the Conda environment using the command: conda install bioconda::fastqc.
2) I checked the installation by running: fastqc -help.
3) To run the tool, I used the following command: fastqc SRX19144488_SRR23195511_1.fastq.

### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example fastq file

In [16]:
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42

0.12.1--5cfd0f3cb6760c42: Pulling from library/fastqc
Digest: sha256:0c524d3abe2642c09c5852299bd79bf78ba0ee2ef040473324caab0826f64d44
Status: Image is up to date for community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42
community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42

What's next:
    View a summary of image vulnerabilities and recommendations → docker scout quickview community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42


In [32]:
!docker run -v /Users/weronikajaskowiak/Desktop/practical_course_2/day_2/files/fastq:/data -v /Users/weronikajaskowiak/Desktop/fastqc-results:/output community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc /data/SRX19144488_SRR23195511_1.fastq.gz --outdir /output


application/gzip
Started analysis of SRX19144488_SRR23195511_1.fastq.gz
Approx 5% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 10% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 15% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 20% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 25% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 30% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 35% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 40% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 45% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 50% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 55% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 60% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 65% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 70% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 75% complete for SRX19144488_SRR23195511_1.fastq.gz
Approx 80% complete for SRX
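The long `docker run` command above has three parts: two `-v host:container` volume mounts, the image name, and the FastQC command that runs inside the container on the mounted paths. The sketch below assembles the same command from two host paths. The function name `fastqc_docker_command` and the example paths are hypothetical, and `--rm` (remove the container when it exits) is an addition that the original command does not use.

```python
import shlex
from pathlib import Path


def fastqc_docker_command(image, fastq, outdir):
    """Build the `docker run` argument list for FastQC on one FASTQ file.

    The directory holding `fastq` is mounted at /data and `outdir` at /output,
    mirroring the command used above. Host paths are made absolute because a
    host part of -v that is not an absolute path can be taken as a volume name.
    """
    fastq = Path(fastq).absolute()
    outdir = Path(outdir).absolute()
    return [
        "docker", "run", "--rm",          # --rm is an addition, see the lead-in
        "-v", f"{fastq.parent}:/data",    # input directory on the host -> /data
        "-v", f"{outdir}:/output",        # result directory on the host -> /output
        image,
        "fastqc", f"/data/{fastq.name}", "--outdir", "/output",
    ]


# Hypothetical example paths; replace them with your own files.
cmd = fastqc_docker_command(
    "community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42",
    "fastq/example_1.fastq.gz",
    "fastqc-results",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run(cmd, check=True)`. The output directory should exist on the host before the command is run.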

### Now that you know how to use a Docker container: which approach was easier, running everything manually or using Docker, and which approach will be easier in the future?
Using a Docker container will probably be an easier approach in the future, as we can simply download the container and perform the analysis without needing to create an entire environment for it.

### What would you say, which approach is more reproducible?
I would say that using a container is the more reproducible approach, because the image pins a specific release of the tool together with all the software packages it depends on, so the same software stack runs on every machine.

### Compare the file to last week's FastQC results. Are they identical?
SNI_oxy_3_1.fastqc and SRX19144486_SRR23195516_1.fastq are identical.

### Is the FastQC version identical?
Yes, both runs used FastQC version 0.12.1.

## Dockerfiles

So far, we have used existing Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux (the base image used here is Debian), so you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file
1) RUN
   - apt-get update: updates the list of available packages and their versions.
   - apt-get install -y curl cowsay: installs the curl and cowsay packages. The -y option automatically confirms prompts.
   - apt-get clean: removes the local cache of downloaded package files, which helps reduce the image size.
2) ENV
   On Debian, cowsay is installed in /usr/games, the standard directory for game binaries and related executables on Unix-like systems. This directory is not on the default PATH inside the container, so the ENV line adds /usr/games to PATH; cowsay can then be run from the command line without specifying its full path. Other directories could be added in the same way :).
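The Dockerfile itself is not shown in this notebook. As a minimal sketch of what the described RUN and ENV lines could look like, the cell below writes a small Dockerfile from Python. This is a hypothetical reconstruction, not the original file: the base image is assumed from the build log further down (which loads debian:bullseye-slim), and the file name `my_cowsay_sketch` is chosen here so that nothing existing is overwritten.

```python
from pathlib import Path

# Hypothetical reconstruction of the described Dockerfile, not the original file.
DOCKERFILE = """\
# Base image: assumed from the build log, which loads debian:bullseye-slim
FROM debian:bullseye-slim

# Refresh the package index, install the tools, then drop the package cache.
# Chaining the commands with && keeps them in a single image layer.
RUN apt-get update && \\
    apt-get install -y curl cowsay && \\
    apt-get clean

# Debian installs cowsay to /usr/games, which is not on the default PATH
ENV PATH="/usr/games:${PATH}"
"""

path = Path("my_cowsay_sketch")  # hypothetical file name
path.write_text(DOCKERFILE)
print(path.read_text())
```

Such a file could then be built in the same way as in the next cell, passing the file name to `-f` and a tag of your choice to `-t`.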

In [21]:
# I renamed my_dockerfile to my_cowsay first
!docker build -t my_cowsay -f my_cowsay .

[+] Building 0.2s (1/2)                                    docker:desktop-linux
 => [internal] load build definition from my_cowsay                        0.0s
 => => transferring dockerfile: 1.46kB                                     0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[+] Building 0.4s (1/2)                                    docker:desktop-linux
 => [internal] load build definition from my_cowsay                        0.0s
 => => transferring dockerfile: 1.46kB                                     0.0s
 => [internal] load metadata for docker.io/li

In [22]:
# make sure that the image has been built
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
salmon_docker                                              latest                     360762d63eaa   3 hours ago     594MB
my_cowsay                                                  latest                     2d6f822a3be9   5 hours ago     151MB
community.wave.seqera.io/library/salmon                    1.10.3--482593b6cd04c9b7   1274f935b72f   4 months ago    375MB
community.wave.seqera.io/library/fastqc                    0.12.1--5cfd0f3cb6760c42   1df9a8700d59   4 months ago    908MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       3ae022b36dce   5 months ago    1.34GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          db9ec43ce403   5 months ago    1.25GB
combinelab/salmon                                          1.10.3                     3291c6f6c42d   6 months ago    101MB
hello-w

In [34]:
# run a container from the image that was built from the Dockerfile
!docker run my_cowsay cowsay "SLAAAAY"

 _________
< SLAAAAY >
 ---------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics and create a new Dockerfile that holds the Salmon tool used in RNA-seq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
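The `salmon_docker` file is not shown in this notebook, so the following is only a sketch of how such a Dockerfile might use curl, written from Python like a notebook cell. Several points are assumptions: the base image (taken from the build log, which loads debian:bullseye-slim), the install location /opt/salmon, the file name `salmon_docker_sketch`, and the archive layout (a single top-level directory containing `bin/`, which `--strip-components=1` removes). The release is an x86_64 binary, so the image is assumed to be built for linux/amd64.

```python
from pathlib import Path

# URL given in the exercise text
SALMON_URL = (
    "https://github.com/COMBINE-lab/salmon/releases/download/"
    "v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz"
)

# Hypothetical sketch, not the original salmon_docker file.
DOCKERFILE = f"""\
FROM debian:bullseye-slim

# curl needs CA certificates to verify the HTTPS download
RUN apt-get update && \\
    apt-get install -y curl ca-certificates && \\
    apt-get clean

# Download the release archive and unpack it into /opt/salmon.
# Assumption: the archive has one top-level directory that contains bin/.
RUN mkdir -p /opt/salmon && \\
    curl -fsSL {SALMON_URL} | tar -xz --strip-components=1 -C /opt/salmon

# Make the salmon executable callable by name
ENV PATH="/opt/salmon/bin:${{PATH}}"
"""

path = Path("salmon_docker_sketch")  # hypothetical file name
path.write_text(DOCKERFILE)
print(path.read_text())
```

The `-L` flag matters here because GitHub release downloads are served through a redirect, and `-f` makes curl fail on an HTTP error instead of saving an error page.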

In [24]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -t salmon_docker -f salmon_docker .

[+] Building 0.3s (1/2)                                    docker:desktop-linux
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 1.25kB                                     0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[+] Building 0.5s (1/2)                                    docker:

In [25]:
!docker images 

REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
salmon_docker                                              latest                     360762d63eaa   3 hours ago     594MB
my_cowsay                                                  latest                     2d6f822a3be9   5 hours ago     151MB
community.wave.seqera.io/library/salmon                    1.10.3--482593b6cd04c9b7   1274f935b72f   4 months ago    375MB
community.wave.seqera.io/library/fastqc                    0.12.1--5cfd0f3cb6760c42   1df9a8700d59   4 months ago    908MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       3ae022b36dce   5 months ago    1.34GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          db9ec43ce403   5 months ago    1.25GB
combinelab/salmon                                          1.10.3                     3291c6f6c42d   6 months ago    101MB
hello-w

In [35]:
# run the image to print the Salmon version
!docker run salmon_docker salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

No. Bioinformaticians can use existing images for commonly used tools, which also helps reproducibility, and only need to create custom images when their workflows require unique configurations or dependencies.

Find the salmon docker image online and run it on your computer.

https://hub.docker.com/r/combinelab/salmon/tags 


In [27]:
!docker pull combinelab/salmon:1.10.3

1.10.3: Pulling from combinelab/salmon
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Image is up to date for combinelab/salmon:1.10.3
docker.io/combinelab/salmon:1.10.3

What's next:
    View a summary of image vulnerabilities and recommendations → docker scout quickview combinelab/salmon:1.10.3


In [36]:
!docker run combinelab/salmon:1.10.3 salmon --version

salmon 1.10.3


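This was the latest version at the time of writing, and it is newer than the 1.5.2 release installed in the custom image above.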

What is https://biocontainers.pro/ ?

BioContainers is a community-led initiative that provides the framework and essential guidelines for building, managing, and distributing bioinformatics software as packages (for example Conda) and containers (for example Docker and Singularity).