# Docker and Python Packaging

- [How to build a Docker image](#build-a-docker-image)
- [Pushing an image to a registry](#upload-to-a-registry)
- [**Exercise**: Build a simple custom image of your own](#exercise-dockerizing-some-python)

## Motivation

In the previous section, we saw how to spin up a cluster using Dask. Starting the workers from a Docker image gives each of them the same environment, so distributed flows run reliably. But even beyond the use case with Dask, Docker is an integral part of data workflows. Data professionals often hear phrases like "it worked on my machine" or "nothing changed, but my workflow stopped running."

Docker allows us to build images and run them in other execution environments. An image contains all of the dependencies needed for our application, while a container is an instance of the image. By building a fixed image, we can pin the dependencies so that they stay fixed from run to run. 
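To make "pinning" concrete, here is a minimal sketch in Python, assuming the dependencies are listed in a pip-style `requirements.txt`. The helper name `find_unpinned` and the sample requirement lines are hypothetical. It flags every line that does not use an exact `==` version, since those are the ones that can resolve differently from one image build to the next.

```python
def find_unpinned(requirement_lines):
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in requirement_lines:
        # Drop comments and surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Simplifying assumption: only `name==version` counts as pinned
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical requirements, for illustration only
sample = [
    "prefect==1.2.0",
    "pandas>=1.3",  # a range: may resolve differently on each build
    "# a comment line",
    "requests",
]
print(find_unpinned(sample))
```

This is only a rough check: it ignores extras, URLs, and hashes, which a real requirements parser would need to handle.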

## When to use Docker?

Data scientists or data engineers might already be familiar with virtual environments. Virtual environment managers such as `poetry`, `pipenv`, and `conda` allow us to fix package versions. What are the use cases then that call for Docker?

1. Run multiple containers with different Python versions side by side
2. Pin non-Python dependencies, such as Java for Spark
3. Use the image as the unit for spinning up clusters and jobs (Kubernetes or Dask)
4. Host legacy applications that depend on older technology (some Prefect users run containers with specific scientific computing libraries)
5. Encourage reproducible work
6. Ease the transition from development to deployment (some services, such as AWS ECS and Google Vertex AI, need containers to run)

## Docker Architecture

The Aqua Security [documentation](https://www.aquasec.com/cloud-native-academy/docker-container/docker-architecture/) has a very good diagram of the Docker architecture. There are three main parts:

* Client - sends commands to the daemon; this can be the Docker CLI or a Python client
* Daemon - builds, runs, and manages containers and images
* Registry - stores images so that they can be downloaded elsewhere

![img](docker_architecture.png)
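The registry a client talks to is encoded in the image name itself. The sketch below is a simplified illustration of that naming convention, not Docker's own parser: the function name `parse_image_reference` and the example image names are hypothetical, it does not handle digests (`@sha256:...`), and it skips Docker's extra normalization of official images.

```python
def parse_image_reference(reference):
    """Split an image reference into (registry, repository, tag)."""
    registry = "docker.io"  # Docker Hub is the default registry
    remainder = reference

    # A first path component containing "." or ":" (or equal to
    # "localhost") is treated as a registry host, not a repository part.
    first, slash, rest = reference.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest

    # The tag, if present, follows the last ":" after the final "/"
    repository, tag = remainder, "latest"
    last_segment = remainder.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repository, tag = remainder.rsplit(":", 1)

    return registry, repository, tag


# Hypothetical image names, for illustration only
print(parse_image_reference("my-user/my-image:dev"))
print(parse_image_reference("localhost:5000/my-image"))
```

An image name without a registry host is pulled from and pushed to Docker Hub, and a name without a tag defaults to `latest`.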

## Build a Docker image



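`Dockerfile`
```Dockerfile
# start from a base image (placeholder name, e.g. an official Python image)
FROM base_image:andmyTag

# work from the directory that will hold our code
WORKDIR /root/or/somewhere_on_my_new_container

# copy in myModule2, plus its requirements file
# (assumes requirements.txt sits next to this Dockerfile)
COPY myModule2 /root/or/somewhere_on_my_new_container
COPY requirements.txt .

# and for myModule2's dependencies, we need to
RUN pip install -r requirements.txt

# and then maybe some commands to run when the container starts, like a
CMD echo 'ing some handy dandy message like "cats!" or "im cool!"'
```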

## Upload to a Registry

<!-- https://medium.com/the-prefect-blog/the-simple-guide-to-productionizing-data-workflows-with-docker-31a5aae67c0a -->

<img src="https://miro.medium.com/max/504/1*iBGlEPUruUqqT5NreeEF8g.png" width=200>

## **Exercise**: Dockerizing some Python



## What is Docker?
Docker is an open-source containerization technology that is especially useful in cloud-based ecosystems.

## How can we **build** a Docker image?
We tell Docker what to build by writing a `Dockerfile`: a list of instructions such as `FROM`, `COPY`, `RUN`, and `CMD`.

## **Push** an image to a registry

### Auth
<!-- thoughts on actually having people login/signup w dockerhub? -->
We need to authenticate our CLI session with our image registry (otherwise known as super convenient free storage) before we can push to it:

```console
docker login
```
Sign up for an account with the registry first if you do not already have one.

### Build and Tag
```console
docker build . --tag zzstoatzz/prefect-imgs:dev
```
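Once the image is built and tagged, `docker push` uploads it to the registry. As a rough sketch, the build and push steps can be scripted from Python. The helper names `build_and_push_commands` and `run_commands` are hypothetical, and the example defaults to a dry run that only prints the commands, since actually pushing requires a running Docker daemon and a logged-in account with access to the repository named in the tag.

```python
import shlex
import subprocess


def build_and_push_commands(image, context="."):
    """Return the docker CLI calls that build, then push, `image`."""
    return [
        ["docker", "build", context, "--tag", image],
        ["docker", "push", image],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops at the first failing step
            subprocess.run(command, check=True)


commands = build_and_push_commands("zzstoatzz/prefect-imgs:dev")
run_commands(commands)  # dry run: prints the commands without running them
```

To push for real, replace the image name with one under your own registry namespace and call `run_commands(commands, dry_run=False)`.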

## Other cool images (probably ML-related) and what they do

```python
from setuptools import setup, find_packages

# with open('requirements.txt') as f:
#     requirements = f.read().splitlines()
```
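The commented-out lines above hint at feeding `requirements.txt` into the package definition. A minimal sketch of that idea follows, assuming a pip-style requirements file. The helper names, the package name `my_module`, and the version are hypothetical, and the actual `setup(...)` call is left as a comment so the snippet can be run on its own.

```python
def parse_requirements(text):
    """Return the non-empty, non-comment lines of a requirements file."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def build_setup_kwargs(requirements):
    """Assemble keyword arguments for `setuptools.setup`."""
    return {
        "name": "my_module",  # hypothetical package name
        "version": "0.1.0",  # hypothetical version
        "install_requires": requirements,
    }


# Hypothetical file contents, for illustration only
example = "prefect==1.2.0\n\n# dev tools\nrequests\n"
print(build_setup_kwargs(parse_requirements(example)))

# In a real setup.py, this would end with something like:
#     from pathlib import Path
#     from setuptools import setup, find_packages
#     reqs = parse_requirements(Path("requirements.txt").read_text())
#     setup(packages=find_packages(), **build_setup_kwargs(reqs))
```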


Docker and Python Packaging (25 min) - Nate:

* Presentation:
* How to create a Python package
* Uploading an image to a registry
* Exercise: Building a simple image
* Q&A

Resource: https://medium.com/the-prefect-blog/the-simple-guide-to-productionizing-data-workflows-with-docker-31a5aae67c0a
