As you already know, we use Dagger for CI/CD. By default, this runs on
Fly.io (via Docker). In some cases, this can fail.
The last failure was when DNS resolution stopped working after the
Docker instance was auto-upgraded from apps v1 -> v2 (a.k.a. Fly.io
machines), e.g.
https://github.com/thechangelog/changelog.com/actions/runs/5673476702/attempts/1
As a temporary fix, we had to delete some secrets and re-run the job.
The job then ran on the free GHA runners and failed for genuine reasons
6 minutes later:
https://github.com/thechangelog/changelog.com/actions/runs/5673476702/job/15395264391
While running on the free GHA runners can be 3x-8x slower, it's a good
fallback. You have heard us say on multiple occasions: "always have
redundancies in place". Since we already have multiple CI runtimes in
place (Fly.io, K8s), let's make our GHA workflow resilient:
- Run on our preferred back-end by default (Dagger on Fly.io)
  - ✅ If it succeeds, we are done
  - ❌ If it fails, fall back to running on the free GitHub runners
- In forks, use free GitHub runners by default (we cannot share `secrets`)
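
The flow above could be sketched in a GHA workflow roughly as follows.
Only the job names come from this change. The triggers, runner labels,
placeholder steps and the fork check via `github.repository` are
assumptions for illustration, not the actual workflow:

```yaml
# Illustrative sketch, not the real workflow file.
# Job names are from this commit message; everything else is an assumption.
on: [push, pull_request]

jobs:
  dagger-on-fly-docker:
    # Assumed fork check: only the upstream repository has the secrets
    if: ${{ github.repository == 'thechangelog/changelog.com' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder: the real step would run Dagger against the engine on Fly.io
      - run: echo "run Dagger on Fly.io"

  dagger-on-github-fallback:
    needs: dagger-on-fly-docker
    # always() keeps this job eligible even though its dependency did not
    # succeed; it then runs only if the primary job failed or was skipped
    # (skipped is what happens in a fork with the check above)
    if: >-
      ${{ always() &&
          (needs.dagger-on-fly-docker.result == 'failure' ||
           needs.dagger-on-fly-docker.result == 'skipped') }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder: the real step would run Dagger on the free GitHub runner
      - run: echo "run Dagger on the free GitHub runner"
```

In a sketch like this, a failed `dagger-on-fly-docker` job still marks
the whole run as failed, even when the fallback job succeeds.
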
While this means that a workflow which fails for genuine reasons will
fail twice for us (1. Dagger on Fly.io, 2. Dagger on GitHub), it seems
like a better place to improve from.
This change goes one step further. We are using a third back-end: Dagger
on K8s. This uses a self-hosted GitHub runner on K8s which is already
integrated with Dagger. For now, we are using it just to see how the CI
part compares to our primary setup (Dagger on Fly.io). We are not using
Dagger on K8s to deploy the app. Let's see how this setup behaves over a
few weeks/months before we consider taking it further.
As part of this, we also improved how we check for Fly.io connectivity.
Things that could be improved in follow-ups:
- the workflow should succeed if the `dagger-on-github-fallback` job succeeds
  - currently it fails if `dagger-on-fly-docker` fails
- add `dagger-on-k8s` job as secondary fallback
  - GitHub Actions is currently missing actions/runner#1665
- maybe leverage a Dagger cache that works in forks too 😉
- Run Dagger Engine as a Fly Machine (no more Docker)
  - thechangelog#471
Signed-off-by: Gerhard Lazu <gerhard@changelog.com>