# Docker (Containerisation)

## Goals
- Understand Docker’s core principles: **images**, **containers**, and the **engine/daemon**.
- Know what Docker is used for and what problems it solves.
- Learn the most important day-to-day commands.
- Understand how Docker networking works (IPs, bridges, DNS, port publishing).
- Understand how volumes/mounts work and how Docker Compose orchestrates multi-container apps.


## Prerequisites
- Basic Linux concepts help: processes, filesystems, permissions.
- Basic networking: IPs, ports, DNS.

Notes:
- On Linux, containers run using the host kernel.
- On macOS/Windows, Docker Desktop typically runs containers inside a lightweight Linux VM, so paths/networking can differ.


## What is Docker?

**Docker** is a platform for building, distributing, and running **containerized applications**.

Key pieces:
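- **Docker Engine**: the client-server application (daemon, API, and CLI) that builds images and creates and manages containers.
- **Docker daemon (`dockerd`)**: the background service that exposes the Docker API and manages images, containers, networks, and volumes, delegating container execution to a lower-level runtime.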
- **Docker CLI (`docker`)**: the command-line client that calls the daemon API.
- **Images**: immutable templates (built from layers) used to create containers.
- **Registries**: places to store/pull images (Docker Hub, ECR, GHCR, etc.).

What it is not:
- Not a VM: containers share the host kernel (they are isolated processes with a packaged filesystem).
- Not a configuration-management tool by itself (OS config often uses cloud-init/Ansible/etc.).


## Core concepts (minimum you should know)
- **Image**: built artifact (layers + metadata) → used to start containers.
- **Container**: a running (or stopped) instance of an image.
- **Dockerfile**: build recipe for an image.
- **Registry**: image storage (pull/push).
- **Network**: virtual L2/L3 connectivity between containers and the host.
- **Volume / bind mount**: how you persist or share data outside the container filesystem.
- **Port publishing**: map a host port to a container port (e.g., `-p 8080:80`).
- **Compose**: define multi-container apps in a YAML file and run them as a unit.


## Most important commands (cheat sheet)

Images:
```bash
docker pull nginx:1.25
docker images
docker build -t myapp:dev .
docker tag myapp:dev myrepo/myapp:dev
docker push myrepo/myapp:dev
```

Containers:
```bash
docker ps            # running
docker ps -a         # all
docker run --rm -it ubuntu:24.04 bash
docker run -d --name web -p 8080:80 nginx:1.25
docker logs -f web
docker exec -it web sh
docker stop web
docker rm web
```

Inspect/debug:
```bash
docker inspect web
docker stats
docker events
```

Networking + storage:
```bash
docker network ls
docker network inspect <network>
docker volume ls
docker volume inspect <volume>
```

Cleanup (be careful):
```bash
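docker system df      # show disk usage by images, containers, volumes, and build cache
docker system prune   # remove stopped containers, unused networks, dangling images, unused build cache
                      # volumes are kept unless you add --volumes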
```


## How Docker works internally (high-level)

On Linux, Docker runs containers using kernel primitives:
- **Namespaces** isolate views of the system (PID, network, mounts, UTS hostname, IPC, user).
- **cgroups** enforce resource limits (CPU, memory, IO).
- **Union filesystem** (often OverlayFS) provides image layers + copy-on-write.
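
As a rough illustration of the namespace bullet above: on Linux, each process exposes its namespace memberships as symlinks under `/proc/<pid>/ns`. The sketch below reads them; the helper name `namespace_ids` is made up for this example. Two processes that report the same identifier for `net` share a network namespace, so running this on the host and again inside a container would be expected to show different identifiers for entries such as `pid`, `net`, and `mnt` (assuming default Docker settings).

```python
import os


def namespace_ids(pid="self"):
    """Map namespace type -> identifier, e.g. {'net': 'net:[4026531840]'}.

    Returns an empty dict where /proc/<pid>/ns is unavailable
    (non-Linux hosts, or a PID that does not exist).
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return ids
    for name in names:
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without extra privileges.
            continue
    return ids


# Print the namespaces of the current process, one per line.
for ns_type, ident in namespace_ids().items():
    print(f"{ns_type:20} {ident}")
```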

Image layers and copy-on-write:
- An image is a stack of read-only layers.
- A container adds a small read-write layer on top.
- Deleting the container deletes that writable layer; only data written to volumes or bind mounts survives, because it lives outside the layer.
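
The layering rules above can be modelled in a few lines. This is only an analogy, not how OverlayFS is implemented: Python's `collections.ChainMap` looks keys up through a stack of dicts and sends every write to the first one, which mirrors "read-only layers plus a writable top layer". The file paths and contents are invented for the example.

```python
from collections import ChainMap

# Read-only image layers, lowest first (illustrative contents).
base_layer = {"/etc/os-release": "debian", "/bin/sh": "shell-binary"}
app_layer = {"/app/main.py": "print('v1')"}


def start_container(*image_layers):
    """Return a container view: an empty writable layer over the image layers."""
    writable = {}
    # ChainMap searches left to right, so the newest layer wins on lookup,
    # and all writes land in the first mapping (the writable layer).
    return ChainMap(writable, *reversed(image_layers))


c1 = start_container(base_layer, app_layer)
c1["/app/main.py"] = "print('patched')"  # lands in c1's writable layer only

c2 = start_container(base_layer, app_layer)  # a second container from the same image

print(c1["/app/main.py"])         # print('patched')
print(c2["/app/main.py"])         # print('v1')
print(app_layer["/app/main.py"])  # print('v1')  (the image layer is untouched)
```

Discarding `c1` throws away its writable dict, which corresponds to the writable layer being removed together with the container.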

Docker is OCI-aligned:
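- Images built by Docker can generally be pushed to OCI-compliant registries and run by other OCI-compliant runtimes.
- Under the hood, `dockerd` typically delegates to `containerd`, which starts containers through an OCI runtime (`runc` by default).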


## Networking: how container IPs work

### The default mental model (Linux)
- Each container gets its own **network namespace**.
- Docker creates a **veth pair** (virtual ethernet cable):
  - one end becomes `eth0` inside the container namespace
  - the other end stays on the host and connects to a Linux bridge
- The bridge (often `docker0` or a user-defined bridge) acts like a virtual switch.

Outbound traffic (container → internet):
- Containers usually get a private IP from the bridge's subnet (e.g., the default bridge typically uses `172.17.0.0/16`).
- Docker configures NAT (typically via iptables) so container traffic is masqueraded behind the host's IP address.
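
To make the addressing concrete, here is a small sketch using Python's standard `ipaddress` module on the subnet quoted above. The specific gateway and container addresses are assumptions about a typical default setup, not values Docker guarantees.

```python
import ipaddress

# Default bridge subnet quoted in the text above.
bridge_subnet = ipaddress.ip_network("172.17.0.0/16")

hosts = iter(bridge_subnet.hosts())
gateway = next(hosts)          # typically assigned to the bridge itself
first_container = next(hosts)  # a plausible first container address

print(bridge_subnet.is_private)          # True (inside 172.16.0.0/12)
print(gateway, first_container)          # 172.17.0.1 172.17.0.2
print(first_container in bridge_subnet)  # True
# A public destination is outside the subnet, so traffic to it is routed
# through the host and masqueraded.
print(ipaddress.ip_address("8.8.8.8") in bridge_subnet)  # False
```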

Inbound traffic (internet → container):
- `-p 8080:80` publishes a port by adding a DNAT rule:
  - host `:8080` → containerIP `:80`
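- Publishing is explicit; without it, a service is reachable only from the host or from other containers on the same network.
- By default a published port binds on all host interfaces; use `-p 127.0.0.1:8080:80` to restrict it to localhost.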
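
The port-publishing rule can be pictured as a lookup table from host port to container address. The sketch below is a toy model of that idea, not Docker's implementation: `parse_publish` and `build_dnat_table` are hypothetical helpers, only the short `HOST_PORT:CONTAINER_PORT` form is handled, and the container IP is illustrative.

```python
def parse_publish(spec: str) -> tuple[int, int]:
    """Parse the short 'HOST_PORT:CONTAINER_PORT' form, e.g. '8080:80'."""
    host_port, container_port = spec.split(":")
    return int(host_port), int(container_port)


def build_dnat_table(container_ip: str, specs: list[str]) -> dict[int, tuple[str, int]]:
    """Model the host-port -> (container IP, container port) rewrite rules."""
    table = {}
    for spec in specs:
        host_port, container_port = parse_publish(spec)
        table[host_port] = (container_ip, container_port)
    return table


# 172.17.0.2 is an illustrative container address, not a guaranteed one.
table = build_dnat_table("172.17.0.2", ["8080:80"])

print(table.get(8080))  # ('172.17.0.2', 80)  published, so traffic is forwarded
print(table.get(80))    # None  host port 80 was never published
```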

### Networks you’ll actually use
- **User-defined bridge networks** (recommended for local multi-container apps):
  - built-in DNS lets containers resolve each other by name
  - better isolation vs the default `bridge`
- **host** network: container shares host network namespace (no port mapping; fewer isolation guarantees).
- **none**: no networking.
- **overlay**: multi-host networking (typically with Swarm; Kubernetes uses its own CNI).

DNS:
- On user-defined networks, Docker provides an embedded DNS server (commonly reachable at `127.0.0.11` inside containers).
- Container names (and Compose service names) resolve to container IPs on that network.
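
One way to check name resolution from inside a container is a short script like the one below. It is a minimal sketch built on the standard `socket.getaddrinfo` call; the helper name `resolve` is an assumption of this example.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the sorted, de-duplicated addresses a name resolves to ([] if none)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address string.
    return sorted({info[4][0] for info in infos})


# A literal IP needs no DNS lookup, so this works anywhere.
print(resolve("127.0.0.1"))  # ['127.0.0.1']
```

Inside a container attached to a user-defined network, `resolve("db")` would be expected to return the address of another container named `db` on that network (a hypothetical name here), while an unknown name returns an empty list.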


## Volumes and mounts (how persistence works)

Containers are disposable. Persist data using mounts:

### 1) Named volumes (managed by Docker)
- Created and lifecycle-managed by Docker.
- On Linux (rootful), data typically lives under `/var/lib/docker/volumes/...`.
- Can use volume drivers for remote storage.

Example:
```bash
docker volume create pgdata
docker run -d --name db \
  -e POSTGRES_PASSWORD=dev \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
```

### 2) Bind mounts (host path → container path)
- You mount a specific directory/file from the host.
- Great for local development (live-edit code on host).

Example:
```bash
docker run --rm -it \
  -v "$PWD":/app \
  -w /app \
  python:3.12-slim python -m http.server 8000
```

### 3) tmpfs mounts (memory-backed)
- Data is not persisted to disk; useful for secrets or scratch space.

Common pitfalls:
- Permissions: the container’s user/UID must be able to read/write the mounted path.
- Don’t store secrets in images; prefer runtime env/secret stores.
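
When a mount fails with a permissions error, it helps to compare the UID the process runs as with the owner of the mounted path. The following is a small diagnostic sketch meant to be run inside the container; `describe_mount` is a hypothetical helper, and `/app` is simply the bind-mount target used in the example above.

```python
import os


def describe_mount(path: str) -> dict:
    """Report who owns a path and whether the current process can use it."""
    st = os.stat(path)
    return {
        "path": path,
        "process_uid": os.getuid(),
        "owner_uid": st.st_uid,
        "mode": oct(st.st_mode & 0o777),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# Inspect /app when it exists; otherwise fall back to the current directory
# so the script still runs outside a container.
target = "/app" if os.path.exists("/app") else "."
print(describe_mount(target))
```

If `process_uid` and `owner_uid` differ and `writable` is false, the usual options are to run the container as a matching user or to adjust ownership of the host directory.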


## Docker Compose (what it is and how it works)

**Docker Compose** is a tool and spec for defining and running **multi-container applications** using a YAML file (`compose.yaml` is the preferred name; `compose.yml` and the older `docker-compose.yml` are also accepted).

Compose concepts:
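- **Project**: a named group of resources (containers, networks, volumes) created from a Compose file; the name defaults to the directory containing the file.
- **Services**: container definitions (image, ports, environment, and so on), each run as one or more containers (web, worker, db).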
- **Networks**: Compose creates a default user-defined bridge network (e.g., `<project>_default`).
- **Volumes**: Compose can create named volumes automatically.

What happens on `docker compose up` (typical):
1) Build images (if `build:` is set) and/or pull images.
2) Create networks and volumes.
3) Create containers with the declared config.
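4) Start containers in `depends_on` order; by default this waits only for the dependency container to start, not for the service inside it to be ready.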
5) Provide DNS-based service discovery on the project network (service name → container IP).
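
The dependency-ordering step can be thought of as a topological sort of the `depends_on` graph. The sketch below models that idea with Python's standard `graphlib`; it is not Compose's actual implementation, and the `worker` and `cache` services are invented for the example.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, mirroring depends_on declarations.
# Service names are illustrative.
depends_on = {
    "web": {"db", "cache"},
    "worker": {"db"},
    "db": set(),
    "cache": set(),
}

# static_order() yields every service after all of its dependencies.
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)
```

Any order that places `db` and `cache` before `web`, and `db` before `worker`, is valid; a cycle in the graph would raise `graphlib.CycleError`.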


## Example: Compose file

```yaml
# compose.yaml (assumes a Dockerfile in this directory and an app that listens on port 8080)

services:
  web:
    build: .
    ports:
      - "8080:8080"   # host:container
    environment:
      DATABASE_URL: postgresql://app:dev@db:5432/app
    depends_on:
      - db

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: dev
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

Commands:
```bash
docker compose up -d --build
docker compose logs -f
docker compose exec web sh
docker compose down
# docker compose down -v   # also remove named volumes
```

Networking detail:
- `web` can reach `db` via hostname `db` because both are on the same Compose network.
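
Reaching `db` by name does not mean the database is already accepting connections when `web` starts. A common workaround is for the application to retry until the port opens. The sketch below is one minimal way to do that with the standard library; `wait_for_port` is a hypothetical helper, and a successful TCP connect is only a rough readiness signal, not proof that the database can serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Name resolution and the TCP handshake both happen here.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers DNS failures, refused connections, and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Intended use inside the `web` container, before opening a database connection:
#     if not wait_for_port("db", 5432):
#         raise SystemExit("db:5432 did not become reachable")
```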


## Pitfalls & quick tips
- Prefer **user-defined networks** (Compose does this by default) for reliable service-name DNS.
- Avoid `latest` in production; pin versions/digests.
- Use `.dockerignore` to keep build context small.
- Use multi-stage builds to reduce image size.
- Think about trust boundaries: containers are isolated, not impenetrable.
- Don’t bake secrets into images; inject at runtime.

## Exercises
- Write a Dockerfile for a small web app and build/run it locally.
- Create a user-defined network and verify containers can resolve each other by name.
- Compare a bind mount vs a named volume for a database container.

## References
- Docker Docs: https://docs.docker.com/
- Docker Compose Spec: https://compose-spec.io/
