## Set up a DockerHub pull-through cache

This notebook sets up a KVM instance as a DockerHub pull-through cache for use in experiments where you may have to repeatedly pull the same images from DockerHub. For example:

* Installing Kubernetes in a large cluster (each node will independently pull the same images from DockerHub)
* Running a Docker-based assignment in a classroom setting (each student will independently execute the experiment, pulling the same images from DockerHub)

Under these circumstances, a pull-through cache helps by:

* Preventing you from exceeding DockerHub's rate limit for `docker pull` operations (note that all hosts on `sharednet1` count as one address toward this rate limit, unless they have their own public IPs)
* Speeding up `docker pull` operations by keeping a local cache of images

In [None]:
import chi, os, time, datetime
from chi import lease
from chi import server
from chi import context
from chi import hardware
from chi import network

context.version = "1.0" 
context.choose_project()
context.choose_site(default="KVM@TACC")


In [None]:
exp_name = "docker_cache"
server_name = f"{exp_name}"
lease_name = f"{exp_name}"


#### Reserve and launch a VM instance

Adjust the duration of your lease as needed:

In [None]:
l = lease.Lease(lease_name, duration=datetime.timedelta(weeks=18))
l.add_flavor_reservation(id=chi.server.get_flavor_id("m1.large"), amount=1)
l.submit(idempotent=True)


In [None]:
l.show()


In [None]:
image_name = "CC-Ubuntu24.04"
s = server.Server(
    name=server_name,
    image_name=image_name,
    flavor_name=l.get_reserved_flavors()[0].name
)
s.submit(idempotent=True)


#### Set up network connectivity for instance

In [None]:
s.associate_floating_ip()


In [None]:
security_groups = [
  {'name': "allow-ssh", 'port': 22, 'description': "Enable SSH traffic on TCP port 22"},
  {'name': "allow-5000", 'port': 5000, 'description': "Enable TCP port 5000 (used by Docker image registry)"}
]

In [None]:
for sg_def in security_groups:

    sg_list = network.list_security_groups(name_filter=sg_def["name"])
    if sg_list: # already exists, get ID of first matching entry
        sg = sg_list[0]
    else:       # create new security group
        sg = network.SecurityGroup({"name": sg_def["name"], "description": sg_def["description"]})
        sg.add_rule("ingress", "tcp", sg_def["port"])
        sg.submit()
    try:
        s.add_security_group(sg.id)
    except Exception:     # server may already have this security group - that's OK
        pass

In [None]:
s.refresh()
s.check_connectivity()


#### Set up Docker

In [None]:
s.execute("curl -sSL https://get.docker.com/ | sudo sh")
s.execute("sudo groupadd -f docker; sudo usermod -aG docker $USER")


#### Set up pull-through cache

In [None]:
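# clone only if the directory is not already there, so this cell can be re-run safely
s.execute("test -d ~/dockerhub-pull-through-cache || git clone https://github.com/teaching-on-testbeds/dockerhub-pull-through-cache ~/dockerhub-pull-through-cache")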


In [None]:
s.execute("docker compose -f ~/dockerhub-pull-through-cache/docker-compose-registry.yaml up -d")

#### Get address of pull-through cache

In [None]:
host = f"kvm-dyn-{s.get_floating_ip().replace('.', '-')}.tacc.chameleoncloud.org"
print(host)
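Before pointing other hosts at the cache, it can help to confirm that the registry answers on port 5000. The following is a minimal sketch, not part of the original notebook. It assumes the cache runs the standard Docker registry image, which exposes the Registry HTTP API V2 catalog endpoint at `/v2/_catalog`. The helper names `catalog_url` and `list_cached_repositories` and the example hostname are hypothetical.

```python
import json
import urllib.request

def catalog_url(host, port=5000):
    """Build the URL of the registry's catalog endpoint (Registry HTTP API V2)."""
    return f"http://{host}:{port}/v2/_catalog"

def list_cached_repositories(host, port=5000, timeout=10):
    """Query the registry and return the repository names it reports.

    Assumption: the registry is reachable over plain HTTP on the given port.
    """
    with urllib.request.urlopen(catalog_url(host, port), timeout=timeout) as resp:
        return json.load(resp).get("repositories", [])

# Hypothetical hostname for illustration; in the notebook, pass `host` instead.
example_host = "kvm-dyn-192-0-2-10.tacc.chameleoncloud.org"
print(catalog_url(example_host))
```

Calling `list_cached_repositories(host)` from the notebook should return a list of names, and that list is likely to be empty until some client has pulled an image through the cache.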

To make a Docker daemon use the pull-through cache, add this address to its `daemon.json` file. For example:

In [None]:
import json
config = {
    "registry-mirrors": [f"http://{host}:5000"],
    "insecure-registries": [f"{host}:5000"]
}
print(json.dumps(config, indent=1, separators=(",", ":")))
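On a Linux client, Docker reads these settings from `/etc/docker/daemon.json` by default, and the daemon must be restarted before changes take effect. If that file already contains other settings, overwriting it would discard them. The following is a minimal sketch, not part of the original notebook, of merging the mirror settings into an existing configuration instead. The helper name `merge_mirror_config`, the sample `log-driver` setting, and the example hostname are hypothetical.

```python
import json

def merge_mirror_config(existing, host, port=5000):
    """Return a copy of a daemon.json dict with the cache added as a mirror.

    Existing keys are preserved, and entries are not duplicated if the
    function is applied more than once.
    """
    merged = dict(existing)
    mirror = f"http://{host}:{port}"
    registry = f"{host}:{port}"

    mirrors = list(merged.get("registry-mirrors", []))
    if mirror not in mirrors:
        mirrors.append(mirror)

    insecure = list(merged.get("insecure-registries", []))
    if registry not in insecure:
        insecure.append(registry)

    merged["registry-mirrors"] = mirrors
    merged["insecure-registries"] = insecure
    return merged

# Hypothetical existing configuration and hostname, for illustration only.
existing = {"log-driver": "json-file"}
example_host = "kvm-dyn-192-0-2-10.tacc.chameleoncloud.org"
merged = merge_mirror_config(existing, example_host)
print(json.dumps(merged, indent=2))
```

The function returns a new dictionary rather than modifying its input, so the original configuration stays available if the merged result needs to be discarded.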