# Container architecture and fundamentals

Great video tutorials to start with:

[Quick podman install and load of Cave Adventure](https://www.youtube.com/watch?v=bJDI_QuXeCE)

[Detailed architectural overview of containers and podman](https://www.youtube.com/watch?v=lc2rR_0Ie5g)
* Go to 32:00

Not reviewed yet: [Docker Networking](https://www.youtube.com/watch?v=cfzrLKvF5X0). Control and configuration of ports and sockets are critical - particularly when the default is usually to lock everything down.





# Basic Principles: Why containers? 


* How do I handle multiple copies of an application that may interfere with each other?
    * Copies can interfere with each other (expect uniqueness)
    * Good database: each copy has different access ports and directories, so copies (test, QA, production) can exist on the same system. This was common when a Unix system cost a minimum of \$50,000, *and that is after the cost came down from \$100K+*. Informix was probably the best.
    * Bad database: all those parameters hard-coded.
    * Multi-process software: how can I tell which instance of a process belongs to which instance of the full package?
        

### Basic principles - Management (cont.)
* Same as above - but for separate applications that can interfere with each other?
* How can I massively expand the number of copies for performance?
    * With the interference mentioned above?
    * Without the interference mentioned above, but on multiple systems?
    * Without the hassle of re-installing the software and operating system?
* How do I handle application fail/restart?
    

### Basic principles - Security   
* I need these management tools on my VM. Does that increase the number of risk points?
* If the software I'm running is hacked, what else on the system can the hack get at?
    * Data?
    * Software with greater permissions/access?
    
### Basic principles - Management
* If I need a VM for each instance, isn't that a lot of overhead?
    * Disk space: system, swap
    * Dedicated RAM
    * Management applications
        
        

# Key requirements for containers

## Isolation essentials
1. How do I stop multiple apps (same app or different) from interfering with each other?
    * They may want the same network ports
        * Well-known port in /etc/services
        * Not well known ports and inter-process communication between processes of the application
    * They may want the same directory and files
    * Their process stack management may interfere with each other
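
The port clash described above is easy to reproduce. The following is a minimal Python sketch (not part of the original material; the function name `try_second_bind` is made up for illustration) in which two sockets on the same host, in the same network namespace, ask for the same address and port. On Linux the second request is expected to fail with `EADDRINUSE`.

```python
import errno
import socket


def try_second_bind() -> int:
    """Bind two TCP sockets to the same address and port.

    Returns the errno of the second bind attempt, or 0 if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]     # the port the first "app" now owns
        try:
            second.bind(("127.0.0.1", port))   # second "app" wants the same port
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = try_second_bind()
    print("second bind failed with:", errno.errorcode.get(code, code))
```

Two applications that each run in their own network namespace would not collide this way, because each namespace has its own port space.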
    

## Key requirements (cont.): Isolation wish list
1. How do I isolate an app from other apps?
    * Because other apps may open vulnerabilities
    * Because this app may open vulnerabilities to others
1. Can I *really* restrict an app:
    * To certain network interfaces and ports?
    * Even down to *_system calls_*?
    

## Key requirements (cont.)
### Management essentials
1. Can I spin up new copies quickly?
1. Can I control the software component versions?

### Management wish list
1. Could I spin up copies on different platforms? x86? ARM?

    
    

![Integers](integers.PNG)

# The Integers: Basic Operating System Enhancements for Containers

## Key principle: Linux namespaces

Inspired by Bell Labs' Plan 9 operating system.

[Wikipedia page](https://en.wikipedia.org/wiki/Linux_namespaces)

Normally shared by every process within an operating system (virtual machine or native); a namespace gives a group of processes its own private view of each:
* Mount/directories
* Process Ids
* Network interfaces and ports
* Interprocess communication
* Host and domain names (UTS)
* User IDs
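
On a Linux host, the namespaces a process belongs to can be inspected under `/proc/<pid>/ns`. The following is a small sketch, assuming a Linux system with `/proc` mounted (the helper name `list_namespaces` is illustrative only). The entries `mnt`, `pid`, `net`, `ipc`, `uts` and `user` correspond to the list above.

```python
import os


def list_namespaces(pid: str = "self") -> dict:
    """Return {namespace type: identifier} for a process.

    Returns an empty dict if /proc/<pid>/ns is not available
    (non-Linux system, unknown pid, or no permission).
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in sorted(names):
        try:
            # Each entry is a symlink whose target names the namespace instance.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable; skip it
    return result


if __name__ == "__main__":
    for ns_type, ident in list_namespaces().items():
        print(f"{ns_type:20s} {ident}")
```

Two processes that share a namespace show the same identifier for that entry; a process inside a container typically shows different identifiers from a process on the host.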
    

    
## Key development: cgroups ("control groups")
[Wikipedia page](https://en.wikipedia.org/wiki/Cgroups)

* The namespace concept came from Bell Labs' Plan 9 operating system - built by designers of Unix!!!
* Linux development and kernel modifications for cgroups by Paul Menage and Rohit Seth at Google started in 2006 (as "process containers"); merged into Linux kernel 2.6.24, released in January 2008
* RHEL 6.0 adopted it in 2010
* V2 redesign and rewrite by Tejun Heo first appeared in Linux kernel 4.5 in 2016
* Together with namespaces, achieved all the isolation objectives

### cgroups (cont)
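* Examples (namespaces and cgroups working together):
    * Each PID namespace starts with PID 1
        * Orphaned processes are re-parented to the namespace's PID 1, which has to reap them or they linger as zombies
        * Processes in the namespace only see processes in the namespace
    * Out of memory? The kernel can kill the entire cgroup, not just one process (cgroup v2 `memory.oom.group`). Integrity maintained.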
    

  
## ... and cgroups go farther
* Memory and file cache limits
* Share of CPU and disk I/O
* Usage accounting
* Freezing, checkpointing and restarting
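
With cgroup v2, limits such as these are exposed as small text files inside the cgroup's directory (typically under `/sys/fs/cgroup`), for example `memory.max` and `cpu.max`. The sketch below only parses the contents of those two files; the helper names are hypothetical and reading the files from a real cgroup directory is left to the caller.

```python
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse memory.max: a byte count, or the word 'max' meaning no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cpu.max: '<quota> <period>' in microseconds, or 'max <period>'.

    Returns the limit as a number of CPUs (quota / period),
    or None when there is no limit.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    print(parse_memory_max("536870912\n"))   # a 512 MiB limit, in bytes
    print(parse_cpu_max("50000 100000\n"))   # half of one CPU
```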

## Operational usage
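The OS manages its processes in cgroups and namespaces
* The kernel exposes cgroups as a virtual file system (typically mounted at `/sys/fs/cgroup`); the libcgroup command-line tools cgcreate, cgexec and cgclassify drive it (they are commands, not system calls)
* System call to start a process in new namespaces: clone() with `CLONE_NEW*` flags (vs fork() and exec() for a standard process)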
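
One way to see which cgroup the OS has placed a process in is to read `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:path`. The following is a minimal sketch, assuming a Linux system; the function names are illustrative and not part of any library.

```python
def parse_proc_cgroup(text: str) -> list:
    """Parse the contents of /proc/<pid>/cgroup.

    Returns a list of (hierarchy_id, controllers, path) tuples.
    On a pure cgroup v2 system there is a single line such as '0::/some/path'.
    """
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((hierarchy_id, names, path))
    return entries


def current_cgroups() -> list:
    """Return the cgroup membership of this process, or [] if unavailable."""
    try:
        with open("/proc/self/cgroup") as handle:
            return parse_proc_cgroup(handle.read())
    except OSError:
        return []


if __name__ == "__main__":
    for hierarchy_id, controllers, path in current_cgroups():
        print(hierarchy_id, controllers, path)
```

Run inside a container and on the host, the reported paths usually differ, which is a quick way to confirm that the container's processes sit in their own cgroup.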


# *And that's it for the Integers*

# The Work of Man: Managing the cgroups

* Software that initiates and manages process groups (i.e. running the system calls above)
* Docker, Podman, Pods of containers, Kubernetes (K8S), OpenShift, etc.



![Copied from RedHat video](Container_objectives.PNG)

### Guaranteed portability? *Really?*

The OS dependencies (a disturbingly vague concept) are:
* Libraries and other software for the application
* Operating system resources - network, memory, file system, etc.
* Objective can be defeated by lack of a library or other software component for the selected platform.
* _*Manageability*_ can be limited by the environment and OS structure choices (Some issues for KVM and K8s)

# Container concepts and terminology

Ideally, the SMALLEST COMPUTE UNIT...*_to accomplish the task at hand_*
* Not really very small if there are a ton of processes and libraries
* But:
    * You are eliminating all the operating system management and operational tools and processes _*that you would need if you had to set up a VM to run it*_
    * You are eliminating all the possible security holes that extra processes in the container would bring
    

    
## *But don't forget vulnerabilities from base OS packages*

## Container IMAGE

* The DEFINITION of the container
    * Software and libraries required
    * Those operating system resources we need (network, file system, etc)
    * Matching versions and dependencies
* Once the definition is created, the image is a tarball of all the components
* Can include management tools if required
* Available as a whole from repositories!

## Image Registry
* All the different container IMAGES and RELEASES
* All the different container image VERSIONS FOR THE PLATFORMS YOU WANT TO RUN ON (Guaranteed portability? Hah!)
* You can pull down from container registries on the web
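
The per-platform point can be made concrete. In the OCI image format, a multi-platform image is published as an index whose `manifests` entries each carry a `platform` with `os` and `architecture`. The sketch below picks the entry for a requested platform; the index content and digests are placeholders invented for illustration, and `select_manifest` is a hypothetical helper, not a registry client.

```python
from typing import Optional

# Hypothetical index; the digests are placeholders, not real images.
EXAMPLE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "digest": "sha256:placeholder-amd64",
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "digest": "sha256:placeholder-arm64",
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ],
}


def select_manifest(index: dict, os_name: str, architecture: str) -> Optional[str]:
    """Return the digest of the image built for the given platform, if any."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry["digest"]
    return None  # nobody published a build for this platform


if __name__ == "__main__":
    print(select_manifest(EXAMPLE_INDEX, "linux", "arm64"))
    print(select_manifest(EXAMPLE_INDEX, "linux", "riscv64"))
```

When the lookup returns nothing, the image simply cannot run natively on that platform, which is where the portability promise stops.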

## CONTAINER: A running copy of the image
* Yes, it has its own ephemeral copy of the software
    * Reloaded from the image every time
    * The anti-DK: you patch, you lose when the container is restarted
    * DATA is not part of the image. It's a RESOURCE pointed to by the IMAGE
        * Mount NAMESPACE control maps the actual data directory to the expected location
    * DATA DOES PERSIST when a container exits... but not when the container is DELETED.
* Runs within a cgroup

    

# INITIATORS (RJ's terminology) or Engine: Docker and Podman
* Docker laid out the design for the definition file for all the requirements. Podman copied.
* Remember those cgroups and the system calls to implement them?
    * Docker and Podman provide the "root" process(es) that make the system calls to start the cgroup based on the definition file
* Docker and Podman both use a runc initiator process
* Docker creates a "root" process for each container
* Podman does it with an enhanced runc that doesn't require a per-container process

### Initiators (cont)
* Every program running in the container is a pure Linux process - but within the namespace and restrictions of the cgroup
* Orchestration tools... K8s, OpenShift... are all regular processes.
* *_All the real control is from the operating system_*
    * The OS knows nothing about containers
    * The OS implements the namespaces and cgroups controls to segregate processes and resource access
* The container initiators use the OS calls and interfaces to create the management tools

![Container host](Container_host.PNG)

### *And they have a lot of added management stuff*
### The container configuration file (.yaml) can go on for pages

# Open Container Initiative

## The Docker standard

Two specifications from Docker (a company), Red Hat and others in June 2015.
1. Image format: how to create the IMAGE with all the info required
2. How to unpack and run the IMAGE


## Podman: daemon-less Docker

* Removed per-container daemon by adding required capabilities to runc
* Added remote management via Varlink (replaced by a REST API from Podman 2.0 onwards)
* Improved systemd integration and advanced namespace isolation


# Ephemeral or not?

![Ephemeral Definition](ephemeral_definition.PNG)

### *The operating system facilities (cgroups, namespaces) have nothing to do with this*
### *_It's all managed by the container initiators/managers_*




![Persistence](persistence.jfif)

# The container persistence puzzle

Which is true?
* "All the data is destroyed when you restart a container"
* "When you restart a container, you continue where you left off, with all the data"
* "If you make changes to the code in a running container, it's lost when you restart"
* "I made a change to the image, restarted the container, and still had the old code!"

Answer: _*It depends*_... ON THE MANAGEMENT! Container stop/start configuration, how the container was stopped...
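
A toy model can show how several of the statements above can be true at once. The sketch below is a deliberate simplification that assumes typical default Docker/Podman behaviour: a container keeps a writable layer across stop/start, is tied to the image build it was created from, and loses the writable layer when it is deleted and recreated, while a volume outlives it. The classes are invented for illustration and are not any tool's API.

```python
class Image:
    """Read-only files baked into one image build (toy model)."""

    def __init__(self, files):
        self.files = dict(files)


class Container:
    """A container created from one specific image build (toy model)."""

    def __init__(self, image, volume):
        self.image = image      # pinned at creation; later builds are not picked up
        self.layer = {}         # writable layer: survives stop/start, gone on delete
        self.volume = volume    # external storage: outlives the container
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def write(self, path, content):
        self.layer[path] = content

    def read(self, path):
        # The writable layer shadows the image content.
        if path in self.layer:
            return self.layer[path]
        return self.image.files.get(path)


volume = {}                                   # stands in for a named volume
old = Container(Image({"/app/code": "v1"}), volume)
old.start()
old.write("/app/code", "v1-patched")          # patch code inside the container
old.volume["/data/db"] = "rows"               # write data to the volume
old.stop()
old.start()                                   # restart: the patch is still there

rebuilt = Image({"/app/code": "v2"})          # image rebuilt while `old` exists
new = Container(rebuilt, volume)              # delete and recreate from the new build
```

In this model `old` still serves the patched v1 code after the image is rebuilt, while `new` serves v2 without the patch but with the volume data intact.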

# Key Persistence Concepts

* Registry/Repository location and management
* Container stop vs container delete
* Container restart
* IMAGE vs VOLUME: data *and* code

### Key Persistence Concepts (cont.)
* VOLUME vs WORKDIR
* IMAGE and container instance
* IMAGE and caching
* IMAGE and config file (yaml) caching instructions
* Clean up after yourself!

## Be careful of answers and descriptions you find online *because the author may have tailored or configured the system for a specific behaviour*

## Good reads:

* [Clearing Docker Cache to Save Disk Space](https://mohitgoyal.co/2017/07/03/clear-docker-cache-to-save-disk-space/)
* [Auto-restarting docker containers](https://stackoverflow.com/questions/29603504/how-to-restart-an-existing-docker-container-in-restart-always-mode)

# Can we go further on security?

* SELinux
* System calls to restrict system calls (seccomp)
    * IBM pioneered this one
    * Again, back to the Integers - requires base Linux feature
    * "Initiator" process can get a list of which system calls to allow
    * Default container settings have a very tight allowed set.
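
Docker and Podman accept the allowed list as a JSON seccomp profile (passed with `--security-opt seccomp=<file>`). The sketch below builds a small allow-list profile using the `defaultAction` / `syscalls` / `names` / `action` fields of that format and checks a call against it. The evaluator is a simplified stand-in for what the kernel filter does, the helper names are hypothetical, and the syscall list is an arbitrary example rather than a recommended policy.

```python
import json


def make_allowlist_profile(allowed):
    """Build a profile that denies everything except the listed system calls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",     # anything not listed fails with an error
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
        ],
    }


def is_allowed(profile, syscall):
    """Simplified check: first matching rule wins, else the default action."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"] == "SCMP_ACT_ALLOW"
    return profile["defaultAction"] == "SCMP_ACT_ALLOW"


# Arbitrary example list, for illustration only.
profile = make_allowlist_profile({"read", "write", "exit_group", "futex"})

if __name__ == "__main__":
    print(json.dumps(profile, indent=2))
    print("write allowed:", is_allowed(profile, "write"))
    print("mount allowed:", is_allowed(profile, "mount"))
```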