# SoS Docker Guide

## What is Docker and why is it helpful

This is a big question to answer, but in essence you can think of Docker containers as virtual machines with applications but without the bulky OS part, or as applications with stripped-down OSes. Docker containers are much more lightweight than virtual machines because all containers share the kernel of the host OS, and related containers (e.g. different applications derived from the same CentOS or Ubuntu OS) share the same base image. Please refer to the [docker website](https://www.docker.com/) for details about Docker. I have found it helpful to watch a few YouTube videos on Docker.

Docker is very helpful for building (bioinformatics) workflows because:

1. Applications are encapsulated in Docker containers so that they do not interfere with the underlying OS or with other applications. For example, we can run a workflow with applications that are based on different versions of Python 2 and Python 3 without having to install them locally and call the correct version of Python, because each application uses the specific version of Python and the required libraries and tools inside its own container.

2. Workflows will be more stable and reproducible because, unlike a local installation of Python that can be affected by other software and by upgrades of Python, Docker containers are stable and will not change.

3. The same Docker containers can be executed on different operating systems (e.g. various versions of Linux, Mac OS X), so a workflow built on a Mac OS X workstation can be executed in a cluster environment.

There is of course some complexity in using Docker, but SoS makes it extremely easy to use Docker in your workflows.

## Installing and configuring docker

Docker is relatively new and is evolving very fast. It is crucial for you to install the latest version from the [docker website](https://www.docker.com/). The website provides very detailed step-by-step instructions, so you should have no problem installing Docker on your machine.

After installation, you should be able to start a Docker terminal and run the command

```bash
$ docker run hello-world
```

as suggested by the documentation. Depending on the version of Docker (e.g. Docker on Windows), Docker might run inside a virtual machine. It is very important to understand that **the configuration (e.g. RAM, CPU) of the Docker machine is different from that of the host machine**, so your Docker machine might be restricted to, for example, 1 CPU and 1 GB of RAM, which is insufficient for any serious work. You will most likely need to reconfigure your Docker virtual machine (e.g. in the VirtualBox app, locate a machine named `default`).
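One way to see what the Docker machine actually has available is to ask the daemon itself. The sketch below is not part of SoS. It assumes Python 3.7 or later and a `docker` client on the `PATH`, and the function names `parse_docker_resources` and `docker_resources` are made up for illustration. It reads the `NCPU` and `MemTotal` fields reported by `docker info`.

```python
import shutil
import subprocess


def parse_docker_resources(text):
    """Parse '<ncpu> <mem_bytes>' into (number of CPUs, memory in GiB)."""
    ncpu, mem_bytes = text.split()
    return int(ncpu), int(mem_bytes) / 1024 ** 3


def docker_resources():
    """Return (cpus, mem_gib) as seen by the Docker daemon, or None if unavailable."""
    # Hypothetical helper: returns None instead of raising when docker is
    # missing, the daemon is down, or the output is not in the expected form.
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{.NCPU}} {{.MemTotal}}"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    try:
        return parse_docker_resources(result.stdout)
    except ValueError:
        return None
```

Calling `docker_resources()` before a heavy workflow and comparing the result with the host's CPU count and RAM shows quickly whether the Docker machine needs to be reconfigured.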

## Running a workflow with docker

Running a Docker-based workflow is easy because SoS automatically downloads Docker images and executes scripts inside Docker containers. Before you start any workflow that uses Docker, it is a good idea to check that your Docker daemon is running:

```
!docker ps
```

```
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
```
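
The same check can be scripted, for example at the top of a driver script that launches workflows. This is a minimal sketch and not something SoS provides: the function name `docker_daemon_running` is hypothetical, and it simply treats a zero exit status from `docker ps` as "the daemon answered".

```python
import subprocess


def docker_daemon_running(timeout=30):
    """Return True if `docker ps` succeeds, i.e. the daemon answered."""
    try:
        result = subprocess.run(
            ["docker", "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The docker client is not installed, not executable, or did not respond.
        return False
    return result.returncode == 0
```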


Now suppose you do not have Ruby installed locally and would like to run a Ruby script. You can execute it inside a `ruby` container. Here we pass the option `-v2` to show the actual command executed by SoS.

```
%run -v2
ruby: docker_image='ruby'
    line1 = "Cats are smarter than dogs";
    line2 = "Dogs also like meat";

    if ( line1 =~ /Cats(.*)/ )
      puts "Line1 contains Cats"
    end
    if ( line2 =~ /Cats(.*)/ )
      puts "Line2 contains Cats"
    end
```

```
INFO: Running interactive_0: 
INFO: docker run --rm   -v /Users:/Users -v /tmp:/tmp -v /Users/bpeng1/SOS/doc/pending/tmp8k4kemrk/docker_run_19836.rb:/var/lib/sos/docker_run_19836.rb    -t -P -w=/Users/bpeng1/SOS/doc/pending     ruby ruby /var/lib/sos/docker_run_19836.rb
```


## Building a docker image

Building a Docker image is usually done outside of SoS if you maintain a collection of Docker images to be shared by your workflows, your group, or everyone. However, if you need to create a Docker image on the fly or would like to embed the Dockerfile inside a SoS script, you can use the `docker_build` action to build the image.

For example, you can build an image for MISO as follows:

```
[miso_build]
# building miso from a Dockerfile
docker_build: tag='mdabioinfo/miso:latest'

    ############################################################
    # Dockerfile to build MISO container images
    # Based on Anaconda python
    ############################################################

    # Set the base image to anaconda Python 2.7 (miso does not support python 3)
    FROM continuumio/anaconda

    # File Author / Maintainer
    MAINTAINER Bo Peng <bpeng@mdanderson.org>

    # Update the repository sources list
    RUN apt-get update

    # Install compilers, numerical libraries, Cython, samtools, bedtools and basic utilities
    RUN apt-get install --yes \
     build-essential \
     gcc-multilib \
     gfortran \
     apt-utils \
     libblas3 \
     liblapack3 \
     libc6 \
     cython \
     samtools \
     libbam-dev \
     bedtools \
     wget \
     zlib1g-dev \
     tar \
     gzip

    WORKDIR /usr/local
    RUN pip install misopy
```

The command

```
sos run script miso_build
```

builds a Docker image `mdabioinfo/miso:latest` that can be used by other SoS steps.

## Writing a workflow with docker support

Writing a workflow with Docker support is a bit more complicated because you need to understand a few Docker concepts, so reading through the [docker run manual](https://docs.docker.com/engine/reference/run/) should be helpful. The most important concept is **Volumes**, which is how host directories are mounted into a Docker container so that commands executed inside the container can access (and change) files on the host machine. SoS simplifies the use of Docker by

* automatically mounting `/tmp` to `/tmp`
* automatically mounting `/Users` to `/Users` on Mac OS X
* automatically mounting the user script inside the container and executing it as `/var/lib/sos/xxxxx`

so that step `input` and `output` are almost always identical inside and outside of docker. 
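
To make the mounting rules concrete, the following sketch assembles a `docker run` argument list shaped like the command SoS logged with `-v2` earlier in this guide. It is an illustration only and not SoS's actual implementation: the function `docker_run_command` and its parameters are hypothetical, and the paths are copied from that log.

```python
import posixpath
import shlex


def docker_run_command(image, interpreter, host_script, workdir, on_mac=False):
    """Build a `docker run` argument list with the mounts described above."""
    # The script is mounted under /var/lib/sos/ using its own file name.
    container_script = "/var/lib/sos/" + posixpath.basename(host_script)

    volumes = [("/tmp", "/tmp")]
    if on_mac:
        # On Mac OS X the home directories live under /Users.
        volumes.insert(0, ("/Users", "/Users"))
    volumes.append((host_script, container_script))

    cmd = ["docker", "run", "--rm"]
    for host_path, container_path in volumes:
        cmd += ["-v", f"{host_path}:{container_path}"]
    # Identical paths inside and outside the container keep the working
    # directory, and hence relative input and output paths, valid.
    cmd += ["-t", "-P", f"-w={workdir}", image, interpreter, container_script]
    return cmd


cmd = docker_run_command(
    image="ruby",
    interpreter="ruby",
    host_script="/Users/bpeng1/SOS/doc/pending/tmp8k4kemrk/docker_run_19836.rb",
    workdir="/Users/bpeng1/SOS/doc/pending",
    on_mac=True,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because `/Users` and `/tmp` are mounted at the same locations inside the container, a file path that is valid on the host is also valid for the command running in the container, which is why no path translation appears in the sketch.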

To use an existing public Docker image, specify its name and tag with the `docker_image` option. For example, to run `fastqc` from the `compbio/ngseasy-fastqc` image instead of installing `fastqc` locally, you can write

```
[MISO_1]
run:     docker_image='compbio/ngseasy-fastqc:1.0-r001'
    fastqc ${input} -o /tmp
```

(More to follow)

## Limitations

* VirtualBox virtual machines do not support symbolic links, so running `ln -s` inside a Docker machine on a Mac fails with a confusing `Read-only file system` error message.