Enable always writing cache to support hermetic build systems #109
Comments
@wchargin this is an interesting topic, thanks for bringing it up. In this example, if we had a way to skip storing the cache unless the run was on master, you could use the git commit as part of your key and get the desired behavior without writing a new cache for each run of a pull request. Do you think that would work for you?
Summary: GitHub Actions is a new first-party CI service offered by GitHub. It requires no extra permissions. Its concurrency limits are appealing, at 20 workflows per repo (1 workflow ≈ 1 commit) and concurrent jobs ranging from 20 (free tier) to 180 (enterprise tier), with the option to run on your own servers if this isn’t enough.

This commit adds a workflow definition for our CI. It’s similar to our existing Travis workflow, except that it only runs on Python 3.6 for now due to a bug in the Python 2.7 runtime that has been fixed on GitHub’s end but not yet deployed (see note inline). I also added a run of our self-diagnosis script for good measure. (The diagnosis script always exits successfully, and runs in about 4 seconds.)

The high job concurrency limits let us save some time by running the lint steps in parallel and just once rather than sequentially and in every cell of the build matrix. The GitHub Actions VMs appear to have very little overhead: the entire elapsed time for the `lint-yaml` job is 12 seconds, of which 6 seconds is checking out the repo. Empirically, there is very little latency (order of seconds) between pushing a commit and seeing real work being done on the VMs.

GitHub Actions offers caching. From what I can glean, each cache directory (e.g., “the Bazel cache” or “the Node cache”) is tarred and gzipped; each such archive must not exceed 400 MB. This is enough space to cache our `node_modules` and our Bazel state.\* But I haven’t done so, pending (a) clarity on the recommended way to cache Node modules and (b) better support for Bazel-style unicaches (see notes inline). Even without any caching, the total workflow time is still about the same as the best-case Travis build time because of the improved concurrency.

\* Sometimes: in my tests, sometimes Bazel could be cached successfully, and other times it was well over the limit (595 MB out of 400 MB). I’m not quite sure what that’s about.

[nm]: actions/cache#67
[bzl]: actions/cache#109

Test Plan: Note that this commit triggers a GitHub Actions workflow that succeeds.

wchargin-branch: gh-actions
@chrispat: Yeah, that sounds reasonable! At a glance, I don’t see a way to do that with the current action, though. If I understand correctly, we’d still be proliferating caches with each commit to master.
For the record, cabal's Nix-style store/cache also falls into this category; see my comment at #38 (comment)
@wchargin given that the version of the sources is part of the Bazel caching algorithm, what key do you think should be used to prevent a huge number of updates? My assumption is that Travis is uploading new caches essentially every build if they are just looking at changes to the cache directory.
Yes, Travis uploads new caches every build. And you’re right that this amounts to a large number of uploads. We do want to update the cache on every build, but it should be cheap to do so incrementally rather than re-uploading the whole archive. I see that…
For something like Bazel, I wonder if having a truly remote cache is actually a better option: https://github.com/buchgr/bazel-remote. This is not something we are going to get around to implementing anytime soon, but it is something we can consider for the future. The model we have for caching enables the user to control the key and also requires that all caches are immutable by key. While that is not ideal for all scenarios, it generally works well for a large number of different technology stacks and scenarios. This immutable nature makes incremental updates untenable and likely not possible. Even if we could incrementally update the cache, the download on the next run would still have to be the entire cache, since we have to provision a fresh VM for each job.
I believe I have a similar use case to the issue described here, and ideally would like to see an option to update the cache. A project of mine consists largely of C files, and naturally a significant portion of my CI cycle time is spent in compilation. To speed things up, I've employed ccache, which opportunistically recycles previously built object files when it detects that the compilation would be the same for the current build. This has a dramatic impact on CI times. In order to do this, though, I need some persistence of storage between workflow runs in order to save and restore ccache's cache directory. Of course, as the code base evolves, the cache of object files will change too. I was pleased to discover actions/cache, as it fits my use case very nicely; but I was surprised to find that when a cache hit occurs, actions/cache will not attempt to update the cache at all, and there's no option to request such an update. To work around this, I do the following (sketched below):
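A minimal sketch of that workaround, assuming ccache's default cache directory `~/.ccache` (the paths, key prefix, and build command here are illustrative, not the original snippet):

```yaml
- name: Restore ccache
  uses: actions/cache@v3
  with:
    path: ~/.ccache
    # The primary key includes the commit SHA, so it never matches an existing
    # cache; a fresh cache is therefore saved at the end of every job.
    key: ccache-${{ runner.os }}-${{ github.sha }}
    # Fall back to the most recently saved cache with this prefix.
    restore-keys: |
      ccache-${{ runner.os }}-

- name: Build with ccache
  run: |
    export CCACHE_DIR="$HOME/.ccache"
    make CC="ccache gcc"
```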
It works like this: when the cache is loaded for a workflow, there will be an initial cache miss, because the cache key contains the current commit SHA. actions/cache will then fall back to the most recently added cache via `restore-keys`. This solution seems to work very well for me, and hopefully it will be useful to others with a similar use case. Ideally, though, I think actions/cache should just support updating the cache, perhaps to a new immutable revision, as I have done above.
Having the caches be immutable makes a lot of sense; immutable caches are much easier to reason about. A truly remote cache is an appealing option, but comes with a lot more operational overhead. Downloading the full latest cache on each run may not be perfect, but it would be good enough for us.
@wchargin I agree that immutability is acceptable on the condition that we can restore and create a new cache as I have described above (though it can be quite wasteful, as you mentioned). My guess is that this particular use case will be desirable for many projects. Perhaps the documentation could simply be updated to demonstrate this type of use case? To me, it wasn't immediately obvious. My suggestion would be to mention using a commit-specific key together with `restore-keys`, as described above.
Right; immutability is space-wasteful if the caches are stored as full snapshots rather than incrementally.
I was thinking we would run that server on behalf of the user, so the operational overhead should be essentially the same as it would be for the existing cache action. I am not 100% sure that is the best option, but it seems like it might be a really good one for build systems that support it.
Oh, that would be fantastic! Being able to just point Bazel to a remote cache and have it work would be great.
Similar use case:
Hi! I have the same issue, I think, with Composer. Composer saves every download it makes in a cache, and most of the time this cache is used globally. I've been using `github.sha` in my cache keys, since it allows me to re-save the cache and avoid the case where the cache hits but new versions of dependencies exist, so they keep being downloaded because they're not in the cache.
Specifically for Bazel: the cache protocol is pretty simple. I wonder if it would be feasible to write a service that simply proxies to GitHub's own cache storage.
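For reference, Bazel's side of such a setup is just a flag: it does plain HTTP GETs and PUTs against whatever endpoint it is given. A sketch of a workflow step pointing Bazel at a hypothetical proxy of that kind (the URL is a stand-in for the imagined service):

```yaml
- name: Build with a remote HTTP cache
  run: |
    # http://localhost:8080 stands in for the hypothetical proxy service.
    bazel build //... \
      --remote_cache=http://localhost:8080 \
      --remote_upload_local_results=true
```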
GitHub Actions caching will never update the cache if there was a cache hit, but for Bazel we want to do this, since Bazel guarantees hermetic builds and will update the cache if needed. See actions/cache#109 for more context.

As such, we adjust the cache key logic to work better with GitHub Actions, as per the documentation: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key

Now we will always have a cache miss, load the latest restore key, and then always upload to a new cache key. This adds a couple of minutes for saving the cache, but building from scratch is 12+ min, so it is worth it.
Since this task is still open: what's the current best practice for Bazel caching + GitHub Actions? Does someone have a snippet of their GitHub workflow they can share?

Update: just sharing the CI pipeline YAML with caching that we went with; hopefully it'll help the next person who lands on this task. (It's a slightly more permissive approach than @nanddalal's above.)
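A minimal sketch of such a pipeline, combining the always-miss key scheme described above with Bazel's `--disk_cache` flag (the directory, key prefix, and targets are illustrative):

```yaml
- name: Restore Bazel disk cache
  uses: actions/cache@v3
  with:
    path: ~/.cache/bazel-disk
    # The SHA makes the primary key unique per commit, so every run misses it
    # and saves a fresh cache when the job finishes.
    key: bazel-disk-${{ runner.os }}-${{ github.sha }}
    # Fall back to the most recent cache from any earlier commit.
    restore-keys: |
      bazel-disk-${{ runner.os }}-

- name: Build and test
  run: bazel test //... --disk_cache="$HOME/.cache/bazel-disk"
```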
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.
I still think this is useful to have, and it should not be closed.
I think this will also help reduce the overall cost of compute resources on GitHub Actions, as many open source projects could minimize the GitHub Actions minutes they use for every run.
So... If you're willing to use an a/b system, you could probably do something like:

```yaml
- id: restore
  uses: actions/cache/restore@v3
  with:
    path: .cache            # placeholder for whatever directory you cache
    key: preferred
    restore-keys: fallback

- run: do-work              # placeholder for the real build step

# "no-cache" in the original sketch: nothing was restored at all.
- if: steps.restore.outputs.cache-matched-key == ''
  uses: actions/cache/save@v3
  with:
    path: .cache
    key: fallback
- if: steps.restore.outputs.cache-matched-key == ''
  uses: actions/cache/save@v3
  with:
    path: .cache
    key: preferred

# "used-preferred-cache": the primary key matched exactly.
# Rotate the fallback slot: delete it, then save the new state under it.
- if: steps.restore.outputs.cache-hit == 'true'
  uses: ./delete-cache      # local action; see the note on the caches API below
  with:
    key: fallback
- if: steps.restore.outputs.cache-hit == 'true'
  uses: actions/cache/save@v3
  with:
    path: .cache
    key: fallback

# "used-fallback-cache": only the restore key matched, so "preferred" is free.
- if: steps.restore.outputs.cache-matched-key == 'fallback'
  uses: actions/cache/save@v3
  with:
    path: .cache
    key: preferred

# Rotate the preferred slot as well when it was used.
- if: steps.restore.outputs.cache-hit == 'true'
  uses: ./delete-cache
  with:
    key: preferred
- if: steps.restore.outputs.cache-hit == 'true'
  uses: actions/cache/save@v3
  with:
    path: .cache
    key: preferred
```

Notes:
./delete-cache can be implemented using the cache management APIs that were made available circa June 27, 2022:
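A sketch of what such a local delete-cache composite action could look like, using the repository caches REST endpoint via the preinstalled `gh` CLI (the file layout and error handling are illustrative; the job's token needs `actions: write` permission):

```yaml
# ./delete-cache/action.yml (location matching the `uses: ./delete-cache` reference above)
name: delete-cache
description: Delete all caches in this repository that match a key
inputs:
  key:
    description: Exact cache key to delete
    required: true
runs:
  using: composite
  steps:
    - shell: bash
      env:
        GH_TOKEN: ${{ github.token }}
      run: |
        # DELETE /repos/{owner}/{repo}/actions/caches?key=<key>
        gh api --method DELETE \
          "repos/${{ github.repository }}/actions/caches?key=${{ inputs.key }}" \
          || echo "No cache found for key '${{ inputs.key }}'"
```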
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

Bots suck.

I think this can be closed as it's now released in v4.

I don't see how v4 changes anything. Either it was already possible (and I think my suggestions and others show that there are ways to do something) or it might still not be possible. If it's now possible as of v4, it'd be nice if someone put together an actual example of how to do it.
My bad. v4 has a `save-always` option.
I mean, I'd probably just use an epoch time value with a fallback of none:

```yaml
key: cache-${{ steps.time.outputs.epoch }}
restore-keys: cache-
```

That'd result in it always writing one. Older caches will get wiped out as they become least recently used. Sure, you pay a bit to store a duplicate of the cache (or you could use the cache-delete API mentioned above to clean up the previous entry).
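For completeness, a sketch of the step that the `steps.time.outputs.epoch` reference above presumes (the step id and cached path are placeholders):

```yaml
- id: time
  run: echo "epoch=$(date +%s)" >> "$GITHUB_OUTPUT"
- uses: actions/cache@v4
  with:
    path: .cache          # illustrative
    key: cache-${{ steps.time.outputs.epoch }}
    restore-keys: cache-
```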
That excess space usage causes other caches to get dropped too.
Then use the cache-delete API to evict old entries. If you're being really aggressive, you might be able to partition the cache into lots of pieces and have steps to calculate and retrieve/save them. There will be a trade-off between how many steps you need to run and how big your cache pieces are.
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

This is still an issue worth resolving.

Why wasn't the stale label removed?
I’d like to use `actions/cache` to cache my Bazel build state, which includes dependencies that have been fetched, binaries and generated code that have been built, and results for tests that have run. Bazel is a hermetic build system, so the standard Bazel pattern is to always use a single cache. Bazel will take care of invalidation at a fine-grained level: if you only change one source file, it will only re-build and re-test targets that depend on that source file.

Thus, the pattern that makes sense to me for Bazel projects is to always fetch the cache and always store the cache. We can always fetch the cache by using a constant cache key, but then the cache will never be stored. Bazel doesn’t have a single `package-lock.json`-style file that can be used as a cache key; it’s the combination of all build and source files in the whole repository. We could use the Git tree (or commit) hash as a cache key, but this would lead to storing a mountain of caches, too, which seems wasteful.

Ideally, the fetched cache would be taken from `origin/master`, but really taking it from any recent commit should be fine, even if that commit was in a broken or failing state.

On my repository, it takes 33 seconds to save the Bazel cache after a successful job, but on a clean cache it takes 2 minutes to fetch remote dependencies and 26 minutes to build all targets. I would be more than happy to pay those 33 seconds every time if it would save half an hour in the rest of the build!

For comparison, on Travis we achieve this by simply pointing to the Bazel cache directory:
https://github.com/tensorflow/tensorboard/blob/1d1bd9a237fe23a3f2c31282ab44e7dfbcac717c/.travis.yml#L30-L32
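A rough sketch of the shape of that Travis configuration, with an illustrative cache path (the linked file has the authoritative values):

```yaml
cache:
  directories:
    - "$HOME/.cache/bazel"
```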