Alpine package upgrades #23

Open
TravisCardwell opened this issue Mar 28, 2022 · 12 comments

@TravisCardwell
Collaborator

During discussion in #21, I ran lsupg on utdemir/ghc-musl:v24-ghc922 and was surprised to see that many packages have updates for an image that was built so recently. I assume that ghc-musl containers are not public-facing, but statically linking old versions of libraries could have security implications for the static executables built using ghc-musl.

The image that I am testing was built within the past two days.

$ docker images utdemir/ghc-musl
REPOSITORY         TAG           IMAGE ID       CREATED             SIZE
utdemir/ghc-musl   v24-ghc884    762d6c408038   About an hour ago   3.38GB
utdemir/ghc-musl   v24-ghc922    e4b4c874e807   40 hours ago        3.71GB
utdemir/ghc-musl   v24-ghc8107   bf3cac041a70   41 hours ago        3.42GB
utdemir/ghc-musl   v24-ghc902    f7e76a7f3474   41 hours ago        3.32GB

There are a number of packages with updates available. In this case, ghc-musl users may be concerned with the old crypto, TLS, and SSL packages.

$ lsupg --docker utdemir/ghc-musl:v24-ghc922
apk  busybox                 1.34.1-r3    1.34.1-r4
apk  ca-certificates-bundle  20191127-r7  20211220-r0
apk  libcrypto1.1            1.1.1l-r8    1.1.1n-r0
apk  libssl1.1               1.1.1l-r8    1.1.1n-r0
apk  libretls                3.3.4-r2     3.3.4-r3
apk  ssl_client              1.34.1-r3    1.34.1-r4
apk  bash                    5.1.8-r0     5.1.16-r0
apk  expat                   2.4.4-r0     2.4.7-r0
apk  openssl-dev             1.1.1l-r8    1.1.1n-r0
apk  openssl-libs-static     1.1.1l-r8    1.1.1n-r0
apk  libxml2                 2.9.12-r2    2.9.13-r0
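To pull the security-sensitive entries out of a listing like the one above, a filter along these lines could be used. This is only a sketch: the column layout (type, package, installed, available) is assumed from the output shown, and both the function name `security_upgrades` and the name pattern `crypto|ssl|tls` are illustrative choices, not part of lsupg.

```shell
# Sketch: print only crypto/TLS/SSL-related rows from lsupg-style output.
# Assumes whitespace-separated columns: type, package, installed, available.
security_upgrades() {
  awk 'tolower($2) ~ /crypto|ssl|tls/ { print $2, $3, "->", $4 }'
}

# Sample rows copied from the listing above.
security_upgrades <<'EOF'
apk  busybox                 1.34.1-r3    1.34.1-r4
apk  libcrypto1.1            1.1.1l-r8    1.1.1n-r0
apk  libssl1.1               1.1.1l-r8    1.1.1n-r0
apk  libretls                3.3.4-r2     3.3.4-r3
apk  bash                    5.1.8-r0     5.1.16-r0
EOF
```

In practice the function would presumably be fed directly, as in `lsupg --docker utdemir/ghc-musl:v24-ghc922 | security_upgrades`.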

The problem is that apk update is run to update the package index, but apk upgrade is not run to upgrade the packages that are already installed. The parent image (alpine:3.15.0) is over four months old, so the packages really should be upgraded, IMHO.
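The difference between the two steps can be sketched as a small shell function. The function name and the `APK` override are hypothetical, added only so the sequence can be dry-run outside of Alpine; inside the image build, the two `apk` commands would presumably just be run directly.

```shell
# Sketch: refresh the package index, then upgrade already-installed packages.
# APK is a hypothetical override used only for dry runs (for example,
# APK=echo); by default the real apk command is used.
upgrade_installed_packages() {
  apk_cmd=${APK:-apk}
  # "update" only refreshes the index; "upgrade" installs newer versions
  # of packages that are already present in the base image.
  "$apk_cmd" update && "$apk_cmd" upgrade
}
```

Running `APK=echo upgrade_installed_packages` prints the two subcommands in order without touching the system.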

Using a specific version (3.15.0) of the alpine image makes it clear exactly which image was used to build a ghc-musl image. Adding apk upgrade, however, means that the packages used depend on the timing of the build. Perhaps this negates the point of using a specific version? Would it be worthwhile to use alpine:latest instead? Note that I recommend running apk upgrade even when using the latest image.

TravisCardwell added a commit to TravisCardwell/ghc-musl that referenced this issue Apr 12, 2022
@TravisCardwell
Collaborator Author

I am working on documentation and am adding a section on security. The way that this issue is resolved will have a big impact on that documentation.

I think that it is essential to provide a way to use the latest packages. Currently, there is no way to do so without editing the Earthfile.

I wrote that adding apk upgrade makes the packages used depend on the timing of the build, but I have since realized that apk update is the actual culprit. Current images already depend on the timing; they are not reproducible since apk update is used. I was thinking about proposing an APK_UPGRADE argument that would allow users to disable upgrades (in the same way that the TEST_STACK argument can be used to disable Stack tests), but I cannot think of a good reason to disable upgrades... I therefore propose that we add apk upgrade.

Since timing is already a factor, I also propose that we make the ALPINE_VERSION argument default to latest. People who want to build images based on a specific version can use --build-arg to specify the tag.

I pushed both of these changes to my upgrade branch, if you would like to try it out.

What do you think?

@utdemir
Owner

utdemir commented Apr 15, 2022

It mostly looks good to me, @TravisCardwell. But I think we shouldn't update ALPINE_VERSION as it might contain backwards-incompatible changes.

I guess the term reproducibility can mean something a bit different from bit-by-bit equality. I would say that in ghc-musl an image is equivalent if it contains the same set of libraries with the same ABI (i.e., the same minor version).

  1. We should have a tag that never changes. Users who want bit-by-bit equality would prefer that. It's similar to what we have right now.
  2. I think we should run apk upgrade periodically and upload the result using a new tag, along with a mutable tag pointing to the "latest" apk upgrade'd version. This would probably be the image I'd use. I bet Alpine publishes security fixes for previous versions for some time too, so this tag would provide the best of both worlds.
  3. I personally would not depend on a tag where the underlying ALPINE_VERSION changes, as this means that it can contain new major versions of the libraries, which can break my builds.

So, I think we only disagree on the third item. I am happy to provide that tag, as long as we also provide tags that are pinned to a specific Alpine version.

A major issue with both my approach and yours is that we do not really maintain our previous versions. Someone can depend on, say, v25-ghc902-latest and expect it to get updates, but as soon as we publish v26 we'll stop updating the v25 images. So this might give a false sense of security. To avoid that, I suggest we stop versioning our library and solely use dates in our changelog.

I am thinking of publishing all of the tags below:

  1. ghc-musl:ghc9.0.2-alpine3.15-20220416
  2. ghc-musl:ghc9.0.2-alpine3.15
  3. ghc-musl:ghc9.0.2

So, every week a CI task would rebuild the image, publish tag 1 as an immutable pointer, and update the mutable pointers 2 and 3.
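As a rough illustration of the scheme, the three tags for one build could be derived like this. The function name `ghc_musl_tags` is hypothetical, and the date is passed in as an argument rather than computed, so this is a sketch of the naming only, not of the CI task itself.

```shell
# Sketch: print the three proposed tags for one build.
# Arguments: GHC version, Alpine version, build date (YYYYMMDD).
ghc_musl_tags() {
  ghc=$1 alpine=$2 date=$3
  echo "ghc-musl:ghc${ghc}-alpine${alpine}-${date}"  # 1. immutable pointer
  echo "ghc-musl:ghc${ghc}-alpine${alpine}"          # 2. mutable pointer
  echo "ghc-musl:ghc${ghc}"                          # 3. mutable pointer
}

ghc_musl_tags 9.0.2 3.15 20220416
```

With the arguments shown, the output reproduces the three example tags listed above.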

Another improvement would be to keep building the project on the last 3 Alpine versions so that they keep getting security updates. But this might be too much of a maintenance burden for our scale.

What do you think, @TravisCardwell?

@TravisCardwell
Collaborator Author

Thank you very much, @utdemir, for the explanation!

The term reproducibility does indeed have various interpretations. The one that I had in mind when I wrote the above is the ability to produce an image with the same versions of packages given a static Earthfile. Currently, the versions of all packages except for those already installed in the base image depend on the state of the package index, since apk update is run. We therefore cannot provide reproducibility like this. End users can only have reproducibility of their builds by using a fixed version of the image. I did not mean that we should remove such tags, just that it would be nice to also provide general tags to ease maintenance when the risks are acceptable.

What you say about minor version upgrades is a very good point. Does Alpine make any guarantees about how package versions change between point releases of Alpine? I read that Alpine stable releases are "point-in-time snapshots of the package archives" that are tested to ensure inter-package compatibility, but I have been unable to find details about how major and minor versions are managed.

I think that it is unlikely that changes in Alpine will break lsupg builds, so I was thinking that it would be acceptable to use general tags so that I do not need to regularly bump the versions and simply worry about issues if they arise. Perhaps this is too optimistic, though, and using Alpine release branch tags would reduce the risks without increasing the maintenance burden much, since branches are only made twice per year.

I like your suggested tagging strategy. It sounds good to me!

When building images in CI, the Alpine version will be set by passing an ALPINE_VERSION argument via --build-arg. End users can do the same. What should the default ALPINE_VERSION in the Earthfile be? I still think that latest would be a good choice, as long as it is clearly documented in the README. Since we would always specify the version when building images for Docker Hub, we would never actually use the default. A significant benefit is that it would never need to be updated. Do you agree?

Providing updated images for the last three Alpine branches sounds good to me as well. I am still curious about how frequently relevant packages have upgrades available. Perhaps we can try it out and see if it is suitable or not.

TravisCardwell added a commit to TravisCardwell/ghc-musl that referenced this issue Apr 16, 2022
@TravisCardwell
Collaborator Author

I just pushed a commit to my upgrade branch that implements the tag syntax that you suggested above.

For example, the following command builds image utdemir/ghc-musl:ghc9.2.2-alpine3.15-20220416.

$ earthly --allow-privileged --build-arg ALPINE_VERSION=3.15 +ghc9.2.2

The image is saved with the most specific tag (1). Do you know of a way to also tag this image with the mutable pointers (2 and 3) from within the Earthfile, or will this need to be done externally?

The date is formatted using the UTC time zone.
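A date stamp of this form can be produced in shell as follows. This is a minimal sketch; the function name `build_date` is hypothetical, and the Earthfile may compute its DATE value differently.

```shell
# Sketch: print the current date as YYYYMMDD in UTC, so that the stamp
# does not depend on the time zone of the build machine.
build_date() {
  date -u +%Y%m%d
}

build_date
```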

Since I named the ghc targets to match the tags, I renamed the targets to match the new tags. For example, ghc9.2.2 is now used instead of ghc922.

When using ALPINE_VERSION=latest, the tag ends up looking like utdemir/ghc-musl:ghc9.2.2-alpinelatest-20220416. This does not bother me. How about you?

The name:tag arguments to update-readme.sh are passed without change in this branch, but I would like to update how this is done if we decide to go ahead with the new tag format.

I am thinking about how to add support for building images for three Alpine versions as well as conditional building of images (only if there are upgrades available). I am pretty confident that I can do this within the Earthfile without much trouble, though it requires some refactoring. I will likely give it a try if there is a way to add tags to images within the Earthfile. If tagging must be done externally, however, it may be preferable to do this externally as well and keep the Earthfile simple...

@TravisCardwell
Collaborator Author

By defining the DATE argument at the top level, the base-system layer is forced to update whenever that value changes. The cache is therefore only used for a maximum of one day. This is nice because we want to be sure to check for available upgrades. 😄

@TravisCardwell
Collaborator Author

By the way, I am fine with setting the default ALPINE_VERSION to 3.15 if you prefer it.

@utdemir
Owner

utdemir commented Apr 20, 2022

Quick update:

I'm going on a holiday for the next week, so I won't be able to look at this project (probably the next week too as I'll be busy catching up with other stuff).

But I am happy with your suggestions above.

> Do you know of a way to also tag this image with the mutable pointers

I'm unsure :(. Multiple SAVE IMAGE instructions might work, but it'd be wasteful if it does a bunch of work other than tagging.

> When using ALPINE_VERSION=latest, the tag ends up looking like utdemir/ghc-musl:ghc9.2.2-alpinelatest-20220416. This does not bother me. How about you?

Doesn't look super pretty, but I don't have any better alternative. Happy to leave it to :).

> By the way, I am fine with setting the default ALPINE_VERSION to 3.15 if you prefer it.

As long as we also provide pushed images that are tied to a specific Alpine version, I'm happy with any default.


Regarding other Earthfile refactors or introducing another shellscript, I trust your judgement :). Earthly was mostly an experiment for me, and it would be okay for me even if we replace it with Dockerfile's and shell scripts.

While I'm away, please feel free to make any improvements to the codebase, and feel free to merge them too (but of course I'd be happy to review them). As I said, I'll be away next week, but after that we can cut a new release with the changes if we have them merged by then.

@TravisCardwell
Collaborator Author

No problem, mate! Have a great holiday!

I started logging available upgrades to packages in our Alpine 3.15 images from the 18th. There are none so far. I will continue to log this, and we can use the results to decide how to do the CI.

> Multiple SAVE IMAGE instructions might work, but it'd be wasteful if it does a bunch of work other than tagging.

Good idea! I will test that.

If I have the time, I will see if I can implement everything within the Earthfile. I can write scripts if it does not work well. I will likely not merge such changes until you can take a look at it. I am in no rush, so please do not worry about it while you are on your holiday and catching up after your return.

@TravisCardwell
Collaborator Author

I logged available upgrades from April 16 until July 31, using Alpine 3.15 and GHC 9.2.2 throughout. Of the 107 days logged, 14 of them had upgrades. Here is a quick visualization:

[Image: ghc-musl-package-upgrades-20220731]
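For reference, the figures above can be checked with a little shell arithmetic. This is just a sanity check of the day count and the resulting share; the variable names are illustrative.

```shell
# Sketch: days logged from April 16 through July 31 (inclusive),
# and the share of those days on which upgrades were available.
days=$(( (30 - 16 + 1) + 31 + 30 + 31 ))   # Apr 16-30, May, June, July
upgrade_days=14
share=$(awk -v u="$upgrade_days" -v d="$days" 'BEGIN { printf "%.1f", u / d * 100 }')
echo "$upgrade_days of $days days (${share}%)"
```

In other words, upgrades were available on roughly one day in eight.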

@TravisCardwell
Collaborator Author

TravisCardwell commented Jul 30, 2022

> Multiple SAVE IMAGE instructions might work, but it'd be wasteful if it does a bunch of work other than tagging.

This works, but there is one strange artifact.

Here are the Earthfile commands:

SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}-${DATE}"
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}"
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}"

For some reason the Docker image IDs are not all the same:

REPOSITORY                       TAG                              IMAGE ID       CREATED          SIZE
utdemir/ghc-musl                 ghc9.2.4                         a1d488614526   41 seconds ago   3.68GB
utdemir/ghc-musl                 ghc9.2.4-alpine3.16.1            a1d488614526   41 seconds ago   3.68GB
utdemir/ghc-musl                 ghc9.2.4-alpine3.16.1-20220730   f9b223601ff2   2 minutes ago    3.68GB

Inspecting, I see that some empty layers are added:

$ docker history utdemir/ghc-musl:ghc9.2.4-alpine3.16.1-20220730
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
f9b223601ff2   3 minutes ago   mount / from exec /bin/sh -c ALPINE_VERSION=…   2.22GB    buildkit.exporter.image.v0
<missing>      4 minutes ago   mount / from exec /bin/sh -c ALPINE_VERSION=…   975MB     buildkit.exporter.image.v0
<missing>      4 minutes ago   mount / from exec /bin/sh -c ALPINE_VERSION=…   478MB     buildkit.exporter.image.v0
<missing>      2 hours ago     pulled from docker.io/library/alpine:3.16.1@…   5.52MB    buildkit.exporter.image.v0

$ docker history utdemir/ghc-musl:ghc9.2.4-alpine3.16.1
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
a1d488614526   2 minutes ago   fileop target                                   0B        buildkit.exporter.image.v0
<missing>      2 minutes ago   fileop target                                   0B        buildkit.exporter.image.v0
<missing>      3 minutes ago   mount / from exec /bin/sh -c ALPINE_VERSION=…   2.22GB    buildkit.exporter.image.v0
<missing>      4 minutes ago   mount / from exec /bin/sh -c ALPINE_VERSION=…   975MB     buildkit.exporter.image.v0
<missing>      4 minutes ago   mount / from exec /bin/sh -c ALPINE_VERSION=…   478MB     buildkit.exporter.image.v0
<missing>      2 hours ago     pulled from docker.io/library/alpine:3.16.1@…   5.52MB    buildkit.exporter.image.v0

This is unfortunate, but I do not think it causes any problems.

I have only been able to test saving to multiple tags locally, but I doubt that there will be any issues when also pushing to Docker Hub.

@TravisCardwell
Collaborator Author

I realized that there is another issue with this multi-tagging. If we want to provide images for a given version of GHC using multiple versions of Alpine, then building an image for an older version of Alpine would push the image with the GHC-only tag. This is not desired, as that tag should point to the image using the latest (supported) version of Alpine.

I attempted to fix this by adding new flag arguments to the image target:

ARG TAG_DATE=1
ARG TAG_ALPINE=0
ARG TAG_GHC=0

The idea was to allow use of command-line arguments to specify which tags are created/updated, as follows:

IF [ "$TAG_DATE" = "1" ]
  SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}-${DATE}"
END
IF [ "$TAG_ALPINE" = "1" ]
  SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}"
END
IF [ "$TAG_GHC" = "1" ]
  SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}"
END

Unfortunately, this does not work! Earthly gives the following error message:

Error: build target: build main: failed to solve:
Earthfile line 103:2 apply BUILD +image: earthfile2llb for +image:
Earthfile line 95:2 no non-push commands allowed after a --push
in      +image --GHC=9.2.4
in      +ghc9.2.4

Conditional tagging of multiple images is not supported. If you have any ideas about how this issue could be resolved, please let me know. I will remove the GHC-only tag for now, until we can figure out a solution.

@TravisCardwell
Collaborator Author

> Conditional tagging of multiple images is not supported. If you have any ideas about how this issue could be resolved, please let me know. I will remove the GHC-only tag for now, until we can figure out a solution.

I thought of a solution while documenting the issue in a comment. Multiple conditional tags are not supported due to the above limitation, but we can have a single conditional tag as long as it comes first!

I updated the image target to have a single new argument flag:

ARG TAG_GHC=0

The images are now saved as follows:

IF [ "$TAG_GHC" = "1" ]
  SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}"
END
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}"
SAVE IMAGE --push "${IMAGE_NAME}:ghc${GHC}-alpine${ALPINE_VERSION}-${DATE}"

This works!
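With this flag in place, a wrapper that builds for several Alpine versions only needs to set TAG_GHC=1 for the newest one. The following dry-run sketch prints the commands rather than running them. The function name `print_build_commands` and the Alpine versions in the example are illustrative, and passing both build arguments in a single invocation is an assumption based on the commands shown in this thread.

```shell
# Sketch: print (not run) one earthly command per Alpine version,
# setting TAG_GHC=1 only for the version named as latest.
# Arguments: GHC version, latest Alpine version, then all Alpine versions.
print_build_commands() {
  ghc=$1 latest=$2
  shift 2
  for alpine in "$@"; do
    # Only the latest (supported) Alpine version may move the GHC-only tag.
    if [ "$alpine" = "$latest" ]; then tag_ghc=1; else tag_ghc=0; fi
    echo "earthly --allow-privileged" \
      "--build-arg ALPINE_VERSION=${alpine}" \
      "--build-arg TAG_GHC=${tag_ghc}" \
      "+ghc${ghc}"
  done
}

# Hypothetical example: three Alpine versions, with 3.16 as the latest.
print_build_commands 9.2.4 3.16 3.14 3.15 3.16
```

Keeping the decision in one place avoids accidentally moving the GHC-only tag when rebuilding an image for an older Alpine version.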

TravisCardwell added a commit to TravisCardwell/ghc-musl that referenced this issue Jul 31, 2022
When building an image, the Alpine package index is updated in order to
install various dependencies.  The versions installed depend on the
state of the package index at that time.  One therefore does not
necessarily get the same packages when building `ghc-musl` images at
different times; images are not reproducible in this way.  This commit
adds a command to upgrade any already-installed packages after the
package index is updated.  This ensures that the latest packages are
used, and that the already-installed packages are consistent with the
versions of newly added packages.
TravisCardwell added a commit to TravisCardwell/ghc-musl that referenced this issue Jul 31, 2022
Previously, each image was tagged like the following:

* `ghc-musl:v24-ghc924`

With this commit, each image is now tagged like the following:

* `ghc-musl:ghc9.2.4-alpine3.16.1-20220730`
* `ghc-musl:ghc9.2.4-alpine3.16.1`
* `ghc-musl:ghc9.2.4`

This allows users to specify image names according to their needs.  An
image tagged with the GHC version, Alpine version, and (UTC) build date
never changes.  An image tagged with just the GHC version and Alpine
version may be updated to include minor/security package releases, but
the fixed Alpine version means that major upgrades should not break
builds.  An image tagged with just the GHC version may be updated to
include new package releases, both major and minor.

The Earthly targets are renamed to match the new tags.  For example,
previous target `ghc924` is now named `ghc9.2.4`.

Note that updating the GHC-only tag should be done with caution, so that
it always points to an image using the latest (supported) Alpine
version.  To facilitate this, the `image` target has a `TAG_GHC`
argument that defaults to `0` so that the GHC-only tag is not saved by
default.  This argument should be set to `1` only when updating the
image using the latest (supported) Alpine version.  The following
command is an example of using this argument when building locally:

```
$ earthly --allow-privileged --build-arg TAG_GHC=1 +ghc9.2.4
```

The `ALPINE_VERSION` variable is still set to the latest (supported)
Alpine version, *not* set to `latest`.  Users who would like to use the
`latest` image are still able to do so by specifying it in a build
argument.