
Auto-detecting latest tool versions #5

Closed
thomcc opened this issue Apr 4, 2022 · 5 comments · Fixed by #27
Labels
enhancement New feature or request

Comments

@thomcc (Contributor) commented Apr 4, 2022

It seems like the list of versions is manually maintained. That seems hard to scale, and it's already at the point where no one could blame you for failing to notice that a release happened.

I'm hoping to use install-action to replace some similar code (in an unreleased project). However, remembering to update to the latest version seemed impractical to me, so my code auto-detected it using the github API + jq (jq is pre-installed on GHA runners). The code was something like this:

# $repo is the "owner/name" of the tool's GitHub repository
api_url="https://api.github.com/repos/$repo/releases/latest"
query='.assets[] | select(.name | endswith("x86_64-unknown-linux-gnu.tar.gz")) | .browser_download_url'

tarball_url="$(curl --proto '=https' --tlsv1.2 --retry 5 --retry-delay 10 -sfSL "$api_url" | jq -r "$query")"
# the tarball was then fetched with:
#   curl <same flags> "$tarball_url" | tar zxf -

This hard-codes several things which would need to not be hard-coded for install-action usage, but it may be useful as a starting point. (Also, I'm not good at sh, so this may do something that is not ideal in some way). But you get the picture.

Although, maybe it's not how you'd like to do it. There's an argument that auto-detecting the latest version in this way is fragile, both because it makes too many API requests, and because it breaks if a release happens without binaries. It also doesn't easily support usage with a checksum as required by #1.

So perhaps it would be better to instead automate the updates of install-action's version list in some manner. This seems harder to me, though perhaps you have plans.

Anyway, sorry if my concern is misplaced and you do not think this will be a problem. I just felt a little bit bad adding to your workload by adding a new tool after I noticed that versions were hard-coded.

Thank you.

@taiki-e (Owner) commented Apr 4, 2022

Thanks for writing this! I strongly agree that auto-detection of the latest version is necessary.
(Currently, I track the latest releases by watching the releases of these repositories, but I acknowledge that it is an easy-to-forget and non-scalable way.)

There's an argument that auto-detecting the latest version in this way is fragile, both because it makes too many API requests, and because it breaks if a release happens without binaries.

Yes. In the past, this action used a method similar to the one you mentioned (56a0328), but I have encountered both of these problems.

So perhaps it would be better to instead automate the updates of install-action's version list in some manner. This seems harder to me, though perhaps you have plans.

The repository that manages homebrew tap for my project has a script that automates this (and CI to auto-create PR when the update is needed), so I'm considering porting it.

taiki-e added the enhancement (New feature or request) label on Apr 4, 2022
@thomcc (Contributor, Author) commented Apr 4, 2022

Currently, I track the latest releases by watching the releases of these repositories, but I acknowledge that it is an easy-to-forget and non-scalable way

Yeah, I figured as much (this is how I'd do it too).

Yes. In the past, this action used a method similar to the one you mentioned (56a0328), but I have encountered both of these problems.

That makes sense. I had considered the window between release creation and binary upload, but figured it would be too small for that to happen in practice. Given how frequently CI runs happen (especially as the number of projects using the action increases), I'm not sure why I thought that.

The repository that manages homebrew tap for my project has a script that automates this (and CI to auto-create PR when the update is needed), so I'm considering porting it.

Ah, this is a better solution I think. Then you'd just need to run the CI that does the version check/PR on a cron. Or something like that, anyway. For some reason, I thought making automated PRs was somewhat more involved, and couldn't just be done from an action.

@thomcc (Contributor, Author) commented Apr 4, 2022

Rambling about prebuilt binaries and version update automation

Uh, I apologize in advance for this rambling mess. It started out as a reply to your comment in #4, but it... got out of hand. Much of this is me working it out for myself, and has not been really refined, although I think it's coherent? Maybe?

The basic idea is that you can get the benefits of prebuilt binaries with less work by putting a little more smarts into the hypothetical "version update detector / PR maker" code. (Also, note that there's a summary at the end.)

It's hard to say what to do about prebuilt binaries. I think it would allow avoiding the version compat issues that mdbook has and would allow simplifying the main.sh quite a bit, but... it comes with a lot of pain. I suspect the lack of consistency for release formats, archive names, etc. is generally less difficult to address than inconsistencies between ecosystems, build systems, etc. (see the end for more here)

That is, IME there's really no limit to how annoying an arbitrary project can be to build (although it's not unthinkable to do this for stuff like Rust, Go, Zig, <insert other modern portable compiled language here...>). To put it another way, I think it basically means taking on a subset of the roles that packagers[1] are responsible for... which gets a lot harder when you start expanding focus to other build systems and ecosystems.

That said, it might be worth it sometimes, but probably only for packages without prebuilt binaries, and which have a sufficient ratio between usefulness and build complexity to justify the hassle.

Handling Unsupported Github Runner OS Versions

For stuff that has binaries like this, I think I'd want to handle it differently. Basically, I'd want to teach the install-action script about which github runner OS versions are supported by which packages.

This is a big pain to do manually, but if updates are automated, I think it's a lot easier -- the bot that does the update PR can also try on many different runners, see which ones are compatible, and automatically update information the script has about it.

Then, if a user tries to use the package on a runner that is not supported, the script can produce a nicer error[2]. For example, IMO an error like "No prebuilt binary available for mdbook that supports ubuntu-18.04. Please use ubuntu-20.04 or later" (wording can be bikeshedded ofc) would make it very clear how to address the problem as a user. Most of the time I think this would be actionable[3].

A benefit of this approach is that it is still useful even if it turns out you want to manage the binaries yourself. It's very likely that the binaries would still have some runners they don't work under, and this would help in that case.
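As a sketch of that idea (everything here is hypothetical: the function names, the support table, and its entries are invented for illustration, not how install-action actually works), the bot-generated support info plus the friendlier error could look something like:

```shell
# Hypothetical: minimum Ubuntu runner version the tool's prebuilt binary
# supports. In practice this table would be generated by the update bot.
min_supported_ubuntu() {
  case "$1" in
    mdbook) echo "20.04" ;;  # invented example entry
    *) echo "18.04" ;;
  esac
}

check_runner_support() {
  tool="$1"
  runner="$2" # e.g. "18.04"
  min="$(min_supported_ubuntu "$tool")"
  # version compare via sort -V: if the runner version sorts below the
  # minimum, the prebuilt binary is not expected to work there
  if [ "$(printf '%s\n%s\n' "$runner" "$min" | sort -V | head -n1)" != "$min" ]; then
    echo "error: no prebuilt binary available for $tool that supports ubuntu-$runner; please use ubuntu-$min or later" >&2
    return 1
  fi
}
```

With the invented table above, `check_runner_support mdbook 18.04` fails with the suggested message, while 20.04 and later pass.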

Handling inconsistent URL/Tag/release formats

So, let's set aside OS/version compat issues like the one with mdbook, and just consider "the lack of uniform formats for URLs, archive names, ..." and try to solve that without performing the build ourselves. I think this is possible.

The thought is: use the code that automatically performs PRs for new releases (which I'm going to refer to as "the pr maker"[4]), and have it also be responsible for normalizing these differences and re-publishing the normalized version somewhere, using a normalized/consistent[5] scheme for tagging, versions, compression, naming, etc.

Now, this would just be moving complexity from the install-action/main.sh script into the code for whatever makes the PRs on new releases, so is this better than just keeping that complexity in install-action/main.sh?

There are a couple reasons I see, but some of them may not be convincing for you, IDK.

  1. (debatable) The pr maker has looser constraints around OS/version compat, can be written in any language (even one which is slow, or which has long compile times). This means it could be in a language that has better abstractions than sh. Maybe this is just my lack of skill at writing sh, so this one might be unconvincing.

  2. Similarly, the PR maker can pretty much use whatever software it needs, even if that software would be impractically slow to build/use (either slower than just doing cargo install foobar, or close enough to not be worth using the action).

  3. Most importantly (I think), the pr maker does not need to handle schemes which were used for past releases, but will not be used by any future releases.

EDIT: Note that a big downside of this approach is that it makes #5 more important (maybe much more), since now the request is made to an endpoint which is controlled by you, and not the original authors.

A Hypothetical Case

To elaborate on the previous benefits, let's consider a hypothetical package, foosmtime, which has decided to switch its prebuilt release archives from .tar.xz to .tar.zstd in version 1.0.0, and evaluate the two[6] following approaches:

  1. The "smart main.sh" approach, which is a lot like what you have now, where the main.sh (or more broadly, the code which runs as part of the action) has all the smarts for handling each version of each package.

    In this world, the PR maker is largely irrelevant aside from being the thing that makes PRs.

  2. The "smart pr maker" approach, where all the complexity lives in the PR maker, which rebundles and rehosts every package to normalize tagging scheme/compression format/filenames/download url/etc.

    In this world, the main.sh code would be very very simple, and would handle basically every package identically.

So, the problem: foosmtime decided to switch from .tar.xz to .tar.zstd in version 1.0.0. How is this handled?

The "smart main.sh" approach:

  1. First off, it leaves users who are on "latest" (the default) with broken CI until the install-action code is taught how to handle this change.

  2. Then, it needs to keep both the new .tar.zstd-handling code and the old .tar.xz-handling code. It also needs logic to determine which of these to use, based on what version is requested.

    Alternatively, the old code can be removed, but this will break anybody who explicitly requests an older tool version, unless they have also pinned install-action.

  3. If zstd is not supported on the github action runner (it is), some other way of decoding it must be found. This may require pulling down another binary, which is unfortunate.

    That said, this case is unlikely, and as mentioned would not happen with zstd. I mention it mostly because the other approach handles it trivially.
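For concreteness, here is a sketch of the kind of per-version dispatch that would accumulate in main.sh under this approach (foosmtime and the 1.0.0 cutoff are the invented example from above, not a real package):

```shell
# Hypothetical dispatch for the invented foosmtime package: releases before
# 1.0.0 shipped .tar.xz archives, 1.0.0 and later ship .tar.zstd.
foosmtime_archive_ext() {
  case "$1" in
    0.*) echo "tar.xz" ;;   # old scheme; must be kept for pinned versions
    *) echo "tar.zstd" ;;   # new scheme
  esac
}

# The extraction step then also has to branch (reads the archive from stdin):
foosmtime_extract() {
  case "$(foosmtime_archive_ext "$1")" in
    tar.xz) tar xJf - ;;
    tar.zstd) tar --zstd -xf - ;;
  esac
}
```

Every format change in every tool adds another branch like this, and none of the old branches can ever be deleted safely.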

The "smart pr maker" approach:

  1. First off, most users will be unaware the update occurred. Critically: users who are on the "latest" version are not broken, and are instead left on the previous version until the PR maker is taught how to handle the new format, and the update to install-action occurs.

  2. Then, the patch that updates the pr-maker code can immediately remove handling of tar.xz at the same time as it adds code to handle tar.zstd.

    Critically, removing this will not break anyone, even if they're using an explicit version: all the releases which used tar.xz have already been normalized and reuploaded, and all the releases in the future will use tar.zstd.

  3. If zstd is not supported on the github action runner, we can just add an external package, even if it has an annoying and slow installation process, or any other constraint that would be undesirable in the actual install-action github action.

    (This is because the PR maker does not have strict performance constraints, beyond something like "run once per day")

That is to say, this approach seems much better.
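Under the "smart pr maker" approach, the fetch side of main.sh could collapse to one uniform code path. A sketch, using the hypothetical URL scheme from footnote 5 (the taiki-e/install-action-binaries repo name is invented there, not a real repo):

```shell
# One URL scheme for every package (hypothetical, per footnote 5).
build_url() {
  tool="$1" version="$2" target="$3"
  echo "https://github.com/taiki-e/install-action-binaries/releases/download/${tool}_${version}/${target}.tar.gz"
}

# main.sh would then need exactly one download path for all tools, e.g.:
#   curl --proto '=https' --tlsv1.2 --retry 5 --retry-delay 10 -sfSL \
#     "$(build_url "$tool" "$version" "$target")" | tar xzf -
```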


Additionally, the flexibility this approach has to install whatever software is needed (part 3) means that it's basically perfectly set up to perform the builds too, should that end up being necessary for some of the packages. But it is also able to get that same benefit without taking on that complexity in cases where the authors of the tool in question provide usable binaries themselves.

Which I suppose shouldn't be surprising, since this is just "do all the things you'd do if you did the build yourself, but reuse the prebuilt versions"...

Summary

Okay, that was honestly... quite rambling. I'm genuinely sorry. Most of it was talking through it for myself. Obviously this is your project, so you can take any approach you like.

Anyway, I guess the TLDR boils down to: IMO doing all the builds yourself sounds deceptively hard, but you can get a lot of the same benefits by sticking smarts into the tool that does version updates/PRs. Specifically:

  1. The release updater/PR maker could detect which github runner OSes the tool supports, and provide that info in a way the install-action action can understand. Then, users will get a much better error than they'd get from a glibc version mismatch. IMO this could be made into a clear enough error message that the problem becomes fairly unimportant.

  2. If you ignore the GHA runner OS support question (addressed by error messages, IMO), and only consider tools which provide prebuilt binaries (for now), then you can handle the inconsistent release file/URL/tag formats by just reuploading them yourself to a new location in a known consistent way, normalizing all that stuff in the process.

    Which is not trivial (but still strictly less work than performing the builds yourself), and has a surprising number of benefits that I hadn't considered.

Footnotes

  1. OTOH, I suppose it already is one, just with a very different focus than most.

  2. If this route is taken, it might be worth allowing the user to override this and try running it anyway, just in case install-action gets it wrong. Or not.

  3. Concretely, the action they'd take is to either change the runner OS version or install via some other method, whereas if you get an error from the runtime binary loader, I imagine many users have no idea what to do, since that error is 100% jargon.

  4. Yes, even though I'm giving it all these responsibilities in addition to making PRs about new releases.

  5. I'm imagining it could ensure that every package is at https://github.com/${some_repo}/releases/download/${tool}_${version}/${target}.tar.gz, or something like this. Where ${some_repo} could be something like taiki-e/install-action-binaries or whatever.

    (That is: the repo would be some new repo for this purpose -- since I suspect adding these release tags to the install-action repo would cause some confusion)

  6. I'm leaving out the third approach of "do the build yourself", since it's a natural extension of approach 2, and is otherwise irrelevant for this problem (since you wouldn't use the prebuilt binaries, although you may have similar issues if you did the build yourself when some package overhauls the build system between two versions).

@NobodyXu (Collaborator) commented
FYI, cargo-binstall actually supports automatically detecting the latest version.

@taiki-e (Owner) commented Dec 25, 2022

This has been implemented in v2.0.0.

The implementation (#27) is basically an extension of the one described in #5 (comment). The basic design/policy is something like "all the annoying stuff like gaps between tools is handled by the manifest generator" -- this allowed us to remove almost all package-specific code from main.sh.

As for the host OS version, mdbook has started distributing musl binaries, so it is not high on my priority list, but I think @thomcc's idea (better error messages) makes sense (thanks for the idea!), so I will try it.

Btw, the implementation I currently think of is as follows:

  • manifest generator: Extract binary from the archive, get the glibc version requirements (probably by objdump), and include it in the manifest.
  • main.sh: Check the host glibc version and the binary's glibc requirement specified in the manifest, and emit an error or warning if the binary's glibc requirement is high. If the host is Ubuntu LTS, I think we could also suggest a specific version that meets the requirements.

For the latter, bbbd8c8 and 069858b (created for another purpose) contain a prototype of the code that handles the glibc version.
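A minimal sketch of both halves of that idea (the objdump pipeline is one common way to read a binary's versioned symbol requirements; the function names are invented, and this is not the prototype referenced above):

```shell
# Manifest-generator side: the highest GLIBC_x.y symbol version a
# dynamically linked binary requires, read from its dynamic symbol table.
required_glibc() {
  objdump -T "$1" | grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -uV | tail -n1
}

# main.sh side: succeeds if host glibc version $1 satisfies requirement $2,
# comparing as version numbers via sort -V.
glibc_satisfies() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# e.g. (hypothetical usage):
#   glibc_satisfies "$(getconf GNU_LIBC_VERSION | awk '{print $2}')" \
#                   "$(required_glibc ./mdbook)" || echo "glibc too old" >&2
```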
