Make our ship_it.yml GHA workflow resilient #476

Merged

Commits on Jul 31, 2023

  1. Make our ship_it.yml GHA workflow resilient

    As you already know, we use Dagger for CI/CD. By default, this runs on
    Fly.io (via Docker). In some cases, this can fail.
    
    The most recent failure happened when DNS resolution stopped working
    after the Docker instance was auto-upgraded from apps v1 -> v2 (a.k.a.
    Fly.io Machines), see:
    https://github.com/thechangelog/changelog.com/actions/runs/5673476702/attempts/1
    
    As a temporary fix, we had to delete some secrets and re-run the job.
    The job then ran on the free GHA runners and failed for genuine reasons
    6 minutes later:
    https://github.com/thechangelog/changelog.com/actions/runs/5673476702/job/15395264391
    
    While running on the free GHA runners can be 3x-8x slower, it is a good
    fallback. You have heard us say on multiple occasions: "always have
    redundancies in place". Since we already have multiple CI runtimes in
    place (Fly.io, K8s), let's make our GHA workflow resilient by:
    - running on our preferred back-end by default (Dagger on Fly.io)
      - ✅ if it succeeds, we are done
      - ❌ if it fails, falling back to the free GitHub runners
    - using the free GitHub runners by default in forks (we cannot share `secrets` with forks)
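The selection rules in the list above can be modelled as a small decision function. This is a minimal Python sketch of the control flow only, not the actual ship_it.yml (which expresses this as GitHub Actions job conditions). The job names mirror the ones mentioned in this commit; `ship_it`, `is_fork` and `run_job` are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Tuple

# Job names as mentioned in this commit; everything else is hypothetical.
PRIMARY = "dagger-on-fly-docker"
FALLBACK = "dagger-on-github-fallback"


def ship_it(is_fork: bool, run_job: Callable[[str], bool]) -> List[Tuple[str, bool]]:
    """Model which jobs run; returns (job, succeeded) pairs in execution order."""
    results: List[Tuple[str, bool]] = []
    if is_fork:
        # Forks cannot use our `secrets`, so go straight to the free GitHub runners.
        results.append((FALLBACK, run_job(FALLBACK)))
        return results
    ok = run_job(PRIMARY)
    results.append((PRIMARY, ok))
    if not ok:
        # The preferred back-end failed: run the same pipeline on free GitHub runners.
        results.append((FALLBACK, run_job(FALLBACK)))
    return results


if __name__ == "__main__":
    # Simulate Fly.io failing (e.g. broken DNS) while the GitHub runners succeed.
    print(ship_it(is_fork=False, run_job=lambda job: job == FALLBACK))
    # [('dagger-on-fly-docker', False), ('dagger-on-github-fallback', True)]
```

When a failure is genuine, `run_job` returns False for both jobs and the pipeline fails twice, which is the trade-off this commit accepts.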
    
    While this means that a workflow which fails for genuine reasons will
    fail twice for us (1. Dagger on Fly.io, 2. Dagger on GitHub), it seems
    like a better starting point to improve from.
    
    This change goes one step further. We are using a third back-end: Dagger
    on K8s. This uses a self-hosted GitHub runner on K8s which is already
    integrated with Dagger. For now, we are using it just to see how the CI
    part compares to our primary setup (Dagger on Fly.io). We are not using
    Dagger on K8s to deploy the app. Let's see how this setup behaves over a
    few weeks/months before we consider taking it further.
    
    As part of this, we also improved how we check for Fly.io connectivity.
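The commit does not spell out what the improved connectivity check does. As a hedged illustration only, assuming "connectivity" means that the host resolves via DNS (the failure mode described earlier) and accepts a TCP connection, a pre-flight check could look like the sketch below. The `fly_reachable` name and its parameters are hypothetical, and the real host and port used by the workflow are not given here.

```python
import socket


def fly_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if `host` resolves and accepts a TCP connection within `timeout` seconds."""
    try:
        # Step 1: DNS resolution, the step that broke after the apps v1 -> v2 upgrade.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    for family, socktype, proto, _canonname, addr in infos:
        try:
            # Step 2: a TCP connect proves the resolved endpoint is actually reachable.
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            # Try the next resolved address (e.g. IPv4 after IPv6).
            continue
    return False
```

A check like this fails fast with a clear signal, so the workflow can hand over to the GitHub runners instead of waiting for the Dagger pipeline to time out.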
    
    Things that could be improved in follow-ups:
    - the workflow should succeed if the `dagger-on-github-fallback` job succeeds
      - currently it fails if `dagger-on-fly-docker` fails
    - add the `dagger-on-k8s` job as a secondary fallback
      - GitHub Actions is currently missing actions/runner#1665
    - maybe leverage a Dagger cache that works in forks too 😉
    - run the Dagger Engine as a Fly Machine (no more Docker)
      - thechangelog#471
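The first follow-up is about how the overall result is computed. A minimal Python model, using the job names from this commit and hypothetical function names: currently a failed `dagger-on-fly-docker` job fails the workflow even when the fallback succeeds; the proposed behaviour lets the last job that ran (the fallback, when it was needed) decide the outcome.

```python
from typing import List, Tuple

Result = Tuple[str, bool]  # (job name, succeeded)


def status_current(results: List[Result]) -> bool:
    # Current behaviour: any failed job fails the whole workflow,
    # so a Fly.io failure is reported even when the fallback succeeds.
    return all(ok for _, ok in results)


def status_proposed(results: List[Result]) -> bool:
    # Proposed follow-up: the last job to run decides the outcome,
    # so a successful fallback rescues the workflow.
    return bool(results) and results[-1][1]


runs = [("dagger-on-fly-docker", False), ("dagger-on-github-fallback", True)]
print(status_current(runs), status_proposed(runs))  # False True
```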
    
    Signed-off-by: Gerhard Lazu <gerhard@changelog.com>
    gerhard committed Jul 31, 2023
    2fcffd9