# Docker and Jupyter Notebooks for Reproducible Research

<b>Goal</b>: To understand what Docker is and how it can be used with Jupyter notebooks for reproducible research.

Docker is a tool that creates high-performance, shareable, reproducible computational environments. Jupyter notebooks are tools for interactive analysis that interweave prose, code, and results. Together, Docker and Jupyter notebooks are best-of-breed methods to create research that is reproducible.

In [1]:
#Imports for running this presentation live

from ipywidgets import interact, interactive
from IPython.display import clear_output, display, HTML

import numpy as np
from scipy import integrate

from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import cnames
from matplotlib import animation

%matplotlib inline

!sudo docker load -i busybox.dockerarchive.tar

ImportError: No module named 'ipywidgets'

# The Problem

Even though computers are often considered deterministic, **computational software is a rapidly evolving and changing landscape**. Libraries are constantly adding new features and fixing issues. 

<img src="Data/PythonStoryline.svg" width=700>

Image source: http://www.michaelogawa.com/research/storylines/

Even libraries with the strictest backwards-compatibility policies can **change in significant ways**.
<img src="Data/BackwardsCompatibility.png" width="600px">

Image source: http://www.bonkersworld.net/backwards-compatibility/

A **reproducible computational environment** has a *sufficiently consistent state for the computational task at hand*.

For example, this can consist of

- a similar CPU instruction set
- libraries and executables available with a specific version and configuration options
- a specific version of a given compiler
- a specific version of a libc implementation
- a specific version of the C++ standard library
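
As a rough illustration of the list above, the following sketch records a few of these properties from inside Python using only the standard library. The function name `describe_environment` and the choice of fields are assumptions made for this example; a real project would likely record much more (package versions, compiler flags, and so on).

```python
import json
import platform


def describe_environment():
    """Collect a few facts that influence whether a computation is reproducible."""
    # libc_ver() may return empty strings on systems where it cannot detect libc.
    libc_name, libc_version = platform.libc_ver()
    return {
        "machine": platform.machine(),              # CPU architecture
        "system": platform.system(),                # operating system name
        "kernel_release": platform.release(),
        "python_version": platform.python_version(),
        "python_compiler": platform.python_compiler(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }


environment = describe_environment()
print(json.dumps(environment, indent=2))
```

Saving this output next to a result makes it easier to notice when two runs were produced in different environments.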

## Close But Not Good Enough

### Source code

Does not include:

- Compiler
- Hardware it was built on
- How it is configured
- Package dependencies
- Run-time environment
- How to run it

<img src="Data/ConfusedCat.jpg" width="400px">

Image source: https://www.youtube.com/watch?v=g1LgVfV5_ZQ


### Package managers and distributions

- There is not a consensus on *the* package manager
- Packages become unsupported over time
- What to do if a required library is not packaged?

### Virtual machines (VMs)

- Inefficient utilization of computational resources

<img src="Data/CarJam.jpg">

Image source: http://time-az.com/images/2014/02/20140203carjam.jpg

# Enter Linux Containers

![Docker logo](Data/DockerLogo.png)

[Linux container systems](http://www.infoworld.com/article/2938638/application-virtualization/docker-donates-its-container-specs-for-opc-open-standard.html), like Docker, are a new type of tool to easily build, ship, and run reproducible binary applications.

Docker is "good enough" for a reproducible computational environment.

In this talk, we will introduce Docker from the perspective of a scientific research software engineer. We will


- Generate an understanding of what Docker is by comparing it to existing technologies.

- Give an introduction to basic Docker concepts.

- Describe how Docker fits into the scientific analysis workflow with Jupyter notebooks.

# Understanding Docker

### Not just this cute whale thing

Docker is an open-source engine that automates the deployment of any application as a **lightweight**, **portable**, **self-sufficient container** that will run virtually anywhere.




In [1]:
!docker run --rm busybox sh -c 'echo "Hello Docker World!"'

Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:c1bc9b4bffe665bf014a305cc6cf3bca0e6effeb69d681d7a208ce741dad58e0
Status: Downloaded newer image for busybox:latest
Hello Docker World!


### Docker is a combination of a:

1. **Sandboxed chroot**
2. **Copy on write filesystem**
3. **Distributed VCS for binaries**

## Sandboxed chroot

Docker works with images that **consume minimal disk space** and are **versioned**, **archivable**, and **shareable**. Executing applications in these images does not require dedicated resources and is **high performance**.

It works with **containers** as opposed to **virtual machines** (VMs).

<img src="images/DockerVM.jpg" width="600">

In [2]:
%time !docker run --rm busybox sh -c 'echo "Hello Docker World!"'

Hello Docker World!
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 1.23 s


A Docker container is similar to running an application in a *chroot*, but it also sandboxes processes and the network stack using two Linux kernel features:

* **Namespaces**: isolate processes, networking, messaging, file systems, and hostnames
* **Cgroups** (control groups): limit and account for CPU, memory, and I/O resources

<img src="images/Chroot.png" width="600px">
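
To make namespaces a little more concrete, here is a minimal sketch that lists the namespaces the current process belongs to. It assumes a Linux `/proc` filesystem and returns an empty result elsewhere; the helper name `list_namespaces` is made up for this example.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, or {} if unavailable.

    Assumes a Linux /proc filesystem; other platforms get an empty dict.
    """
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        for name in sorted(os.listdir(ns_dir)):
            # Each entry is a symlink whose target has the form "type:[inode]".
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
    except OSError:
        # No /proc (macOS, Windows), unknown pid, or insufficient permissions.
        return {}
    return namespaces


for name, identifier in list_namespaces().items():
    print(f"{name:20s} {identifier}")
```

Two processes that share a namespace report the same identifier for it. Running the snippet on the host and again inside a container should show different identifiers for the namespaces the container isolates.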

## Copy on Write Filesystem

**Union file systems**, or UnionFS, are file systems that operate by **creating layers**, making them very **lightweight** and **fast** while **saving disk space**.

Docker can use several storage drivers to provide these layers, including:

- AUFS
- btrfs
- vfs
- DeviceMapper

<table border="0">
<tr>
<th><img src="images/LayerCake.jpg" width="300px"></th>
<th><img src="images/DockerFilesystems.svg" width="400px"></th>
</tr>
</table>
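
The layering idea can be sketched with Python's `collections.ChainMap`, which looks keys up through a stack of dictionaries and sends writes only to the first one. This is a toy model, not how any Docker storage driver is implemented: real drivers work on files or blocks and also have to handle deletions, permissions, and so on. The layer names and file contents below are invented for illustration.

```python
from collections import ChainMap

# Read-only image layers, modelled as {path: content} dictionaries.
base_layer = {"/bin/sh": "busybox shell", "/etc/hostname": "base"}
app_layer = {"/app/run.sh": "echo hello"}

# A container adds one empty writable layer on top of the image layers.
container_layer = {}
container_fs = ChainMap(container_layer, app_layer, base_layer)

# Reads fall through to the first layer that has the path.
print(container_fs["/bin/sh"])         # busybox shell

# Writes land only in the top layer; the image layers stay untouched.
container_fs["/etc/hostname"] = "my-container"
print(container_fs["/etc/hostname"])   # my-container
print(base_layer["/etc/hostname"])     # base

# A second container shares the same image layers without copying them.
other_fs = ChainMap({}, app_layer, base_layer)
print(other_fs["/etc/hostname"])       # base
```

Because only the thin top layer is private to each container, starting another container from the same image costs almost no extra disk space.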


## Distributed VCS for binaries

### Docker is like Git for binaries



In [None]:
!docker search itk

- Docker images are identified by a hex string ID or by tags
- Interface is `docker <subcommand>`
- `docker push`, `docker pull`, `docker tag`
- `docker export` creates an archivable tarball of a container's filesystem; `docker save` archives an image, and `docker load` restores it
- DockerHub is like GitHub

<img src="images/DockerHub.png" width="400px">
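
The "Git for binaries" analogy rests on content addressing: the identifier is derived from the bytes, and a tag is just a movable name for an identifier. The sketch below is a toy model of that idea, assuming a `sha256` digest like the one printed by `docker run` earlier. The in-memory `blobs` and `tags` dictionaries, the `push` helper, and the `example/tool` name are all hypothetical; Docker's real image format and registry protocol are considerably more involved.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a sha256 content identifier for a blob of bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Toy registry: tags are mutable names that point at immutable content IDs.
blobs = {}
tags = {}


def push(name: str, tag: str, data: bytes) -> str:
    """Store the data under its content ID and point name:tag at it."""
    digest = content_id(data)
    blobs[digest] = data
    tags[f"{name}:{tag}"] = digest
    return digest


first = push("example/tool", "latest", b"version 1 of the filesystem")
second = push("example/tool", "latest", b"version 2 of the filesystem")

print(first != second)                        # True: different content, different ID
print(tags["example/tool:latest"] == second)  # True: the tag moved to the new content
print(first in blobs)                         # True: the old content is still addressable
```

This is why pinning an analysis to an ID is more reproducible than pinning it to a tag such as `latest`, which can be reassigned at any time.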

### Installing

Here's what you need:

- Linux kernel with control groups and namespaces
- Support for a layered filesystem (like AUFS)
- Docker Daemon / Server (written in Go)

<img src="images/MasonJar.jpg" width="600px">

#### Linux

- Ubuntu 14.04 *or*
- See [Docker installation instructions](http://docs.docker.com/installation/) for distributions with kernel 3.8 or later *or*
- [Kernel configuration instructions](https://wiki.gentoo.org/wiki/LXC)

#### Windows and Mac

[boot2docker](http://boot2docker.io/)

* Easy install of
  - Git Bash
  - VirtualBox
  - Lightweight Linux distribution
  - Docker

* Works, but adds layer of complexity
* Mac native interface improving
* Comes with busybox shell -> Write your Docker build.sh and run.sh in Bourne shell

# Docker Concepts

## Image

### A read-only file system layer

<img src="images/DockerFilesystemsBusybox.png" width="600px">

In [3]:
!docker images

REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
busybox                buildroot-2014.02   8c2e06607696        11 weeks ago        2.433 MB
busybox                latest              8c2e06607696        11 weeks ago        2.433 MB
odise/busybox-python   latest              649988b8bf0e        4 months ago        20.26 MB


## Container

### A writable image with processes running in memory, or an exited container with a modified filesystem

<img src="images/DockerFilesystemsBusybox.png" width="600px">

In [4]:
!docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES


In [5]:
!docker run -d busybox sh -c 'sleep 3'

3a6bf9d61548ae36bdc0bdb5a87aec17a8056517709c61e2df989aa0a37b7f32


In [7]:
!docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES


In [8]:
!docker ps -a

CONTAINER ID        IMAGE                       COMMAND             CREATED             STATUS                      PORTS               NAMES
3a6bf9d61548        busybox:buildroot-2014.02   "sh -c 'sleep 3'"   14 seconds ago      Exited (0) 11 seconds ago                       goofy_almeida       
a3761241bf97        busybox:buildroot-2014.02   "sh -c 'sleep 3'"   53 minutes ago      Exited (0) 53 minutes ago                       reverent_stallman   


## Volume

### A mounted directory that is not tracked as a filesystem layer

* Data volumes are initialized when a container is created
* Volumes can be shared and reused between containers
* Changes to a data volume are made directly
* Changes to a data volume will not be included when you update an image
* Volumes persist until no containers use them
* Host directories can also be mounted as data volumes

In [10]:
!ls $PWD/images/

BackwardsCompatibility.png    DockerVM.jpg
BuildInstructions1.png	      Eww.jpg
BuildInstructions2.png	      FilesystemsGeneric.png
BuildInstructions3.png	      itkka.png
BuildInstructions4.png	      Jenkins.png
CarJam.jpg		      Jupyter.png
Chroot.png		      LayerCake.jpg
ConfusedCat.jpg		      Liar.png
Debian.png		      MakerwareScreenshot.png
DockerDeploy.jpg	      MakerwareVTK.png
DockerFilesystemsBusybox.png  MakerwareWebsite.png
DockerFilesystems.svg	      MasonJar.jpg
DockerHub.png		      ModulesModulesModules.png
DockerLogo.png		      PythonStoryline.svg


In [11]:
!docker run --rm --volume $PWD/images:/images busybox \
    sh -c 'ls /images'

BackwardsCompatibility.png
BuildInstructions1.png
BuildInstructions2.png
BuildInstructions3.png
BuildInstructions4.png
CarJam.jpg
Chroot.png
ConfusedCat.jpg
Debian.png
DockerDeploy.jpg
DockerFilesystems.svg
DockerFilesystemsBusybox.png
DockerHub.png
DockerLogo.png
DockerVM.jpg
Eww.jpg
FilesystemsGeneric.png
Jenkins.png
Jupyter.png
LayerCake.jpg
Liar.png
MakerwareScreenshot.png
MakerwareVTK.png
MakerwareWebsite.png
MasonJar.jpg
ModulesModulesModules.png
PythonStoryline.svg
itkka.png
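
The `$PWD` prefix in the command above matters because the host side of a bind mount is expected to be an absolute path. When launching containers from Python rather than from a shell, the same command can be assembled as an argument list. The sketch below only builds and prints the command and does not run Docker; the helper name `docker_run_with_volume` is an assumption for this example, while the `--rm` and `--volume` flags are the ones used above.

```python
import os
import shlex


def docker_run_with_volume(image, host_dir, container_dir, command):
    """Build (but do not run) a `docker run` command that bind-mounts host_dir."""
    host_dir = os.path.abspath(host_dir)  # relative paths become absolute
    return [
        "docker", "run", "--rm",
        "--volume", f"{host_dir}:{container_dir}",
        image,
        "sh", "-c", command,
    ]


argv = docker_run_with_volume("busybox", "images", "/images", "ls /images")
print(shlex.join(argv))  # shlex.join requires Python 3.8 or later
```

On a machine with Docker installed, the list could be handed to `subprocess.run(argv, check=True)` to execute it.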


## Dockerfile

### A sequence of instructions to generate a Docker image

In [8]:
%%writefile docker-ls-images/Dockerfile

# Best practice for Dockerfiles:
#   specify exact versions whenever possible.
FROM busybox:4986bf8c1536
MAINTAINER Matt McCormick <matt.mccormick@kitware.com>
RUN mkdir -p /images
VOLUME /images
CMD ["/bin/sh", "-c", "ls /images"]

Overwriting docker-ls-images/Dockerfile
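
The `%%writefile` cell above assumes the `docker-ls-images/` directory already exists. Outside a notebook, or on a fresh checkout, the same file can be written with plain Python. The following is a minimal sketch: the `write_dockerfile` helper is hypothetical, and the Dockerfile text is copied from the cell above.

```python
from pathlib import Path

DOCKERFILE = """\
# Best practice for Dockerfiles:
#   specify exact versions whenever possible.
FROM busybox:4986bf8c1536
MAINTAINER Matt McCormick <matt.mccormick@kitware.com>
RUN mkdir -p /images
VOLUME /images
CMD ["/bin/sh", "-c", "ls /images"]
"""


def write_dockerfile(build_dir="docker-ls-images"):
    """Create the build directory if needed and write the Dockerfile into it."""
    build_path = Path(build_dir)
    build_path.mkdir(parents=True, exist_ok=True)
    dockerfile = build_path / "Dockerfile"
    dockerfile.write_text(DOCKERFILE)
    return dockerfile
```

Calling `write_dockerfile()` with no arguments produces `docker-ls-images/Dockerfile`, which is the build context used by the `docker build` command that follows.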


In [None]:
!docker build -t ls-images ./docker-ls-images

In [None]:
!docker run --rm -v $PWD/images:/images ls-images

In [15]:
from IPython.display import YouTubeVideo
YouTubeVideo('QqfjiuqVrV4')

## Scientific Python with Docker

## Graphical Applications and Docker

A **portable Docker image** assumes only that standard CPU, memory, disk, and network resources are available. If *local USB devices* or **video card devices** are used, the image will **not run everywhere**.

* No OpenGL
* Use [IPython / Jupyter Notebooks](http://ipython.org/notebook.html)

<img src="images/Jupyter.png" width="500px">

## Choosing a base image

* [debian](https://registry.hub.docker.com/_/debian/) - [Most common](https://docs.docker.com/articles/dockerfile_best-practices/) lightweight image
* [ipython/notebook](https://registry.hub.docker.com/u/ipython/notebook/) - Launches an SSL- and password-enabled IPython notebook
* [jupyter/tmpnb](https://registry.hub.docker.com/u/jupyter/tmpnb/) - Launches "temporary" Jupyter notebook servers
* [continuumio/miniconda](https://registry.hub.docker.com/u/continuumio/miniconda/) - Miniconda installed
* [nixos/nix](https://registry.hub.docker.com/u/nixos/nix/) - Nix package manager installed
* ...
* Make your own

<img src="images/Debian.png" width="100px">

**Isn't there supposed to be no OpenGL?**

<img src="images/Liar.png" width="600px">

It is possible to run accelerated X11 OpenGL 3D applications, but the Docker images that are built will **only work** on host systems with the same video driver and compatible video card.

See the [docker-opengl-nvidia](https://github.com/thewtex/docker-opengl-nvidia) and the [docker-opengl-mesa](https://github.com/thewtex/docker-opengl-mesa) repositories.

# Recap and Next Steps

## Docker is


* Sandboxed chroot +

* Incremental, copy on write filesystem +

* Distributed VCS for binaries

## Concepts

* *Image*:  A read-only file system layer

* *Container*: A writable image with processes running in memory, or an exited container with a modified filesystem

* *Volume*: A mounted directory that is not tracked as a filesystem layer

* *Dockerfile*: A sequence of instructions to generate a Docker image

## Scientific Python and Docker

* Not for graphical applications, especially OpenGL 

* Reproducible computational environment for IPython notebook

* Use with Linux-based packaging system of your choice

## Learn more!

* [Interactive Browser-Based Docker Tutorial](https://www.docker.com/tryit/)
* [Docker Documentation](https://docs.docker.com/userguide/)
* [Reproducible Research: Walking the Walk Tutorial](https://reproducible-research.github.io/scipy-tutorial-2014/)
* [IPython DockerHub Repositories](https://registry.hub.docker.com/repos/ipython/)

## Docker vs. LXC

* [LXC](https://linuxcontainers.org/) is a set of tools and an API to interact with Linux kernel namespaces, cgroups, etc.
* LXC used to be the default execution environment for Docker
* Docker provides LXC functionality, plus:
  - Portable deployment across machines
  - Application-centric
  - Automatic builds
  - Versioning
  - Component re-use
  - Sharing
  - Tool ecosystem

## Docker vs. Rocket

- [Rocket](https://github.com/coreos/rocket) is a container system like Docker developed by CoreOS
- Rocket is not yet fully operational
- Rocket does not use a daemon/client system