-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add update-env-variable to force/disable cache update. #498
base: main
Are you sure you want to change the base?
Conversation
90e11e8
to
bae9ed8
Compare
In short: Set an environment variable in your CI to force or disable the cache update. Here's a longer description: ProblemSay you want to cache a program called But, what if you have a lot of different programs with version numbers? You'd have to add them all to your cache key. So maybe your cache key will be: The problem is that you're doing a cache per version of each dependency. You could have multiple caches, like one for the g++ in windows32 and another for the g++ in windows64 etc. And then the same for the version of chrome, nodejs, etc. You would have N*M caches, for every combination of platform and software. What a hassle! What you really want is a cache per CI job. For example, one cache for "windows32", one for "windows64", one for "linux" and one for "macos". SolutionWhat this PR provides is simple: Programmatically decide, after fetching the cache, if it should be overridden or not. How it works:
If you don't set the env variable at all, the default behavior is maintained. And if you set it to Here's a sample CI yaml file with comments: - name: Cache .local
uses: eyal0/cache@update_env_variable
with:
path: ~/.local
key: my_cache_key
# Set the environment variable to watch to be named UPDATE_CACHE
update-env-variable: "UPDATE_CACHE"
- name: Install parallel
run: |
# Get the installed version number.
MY_VERSION=$(parallel --version | head -1 | cut -d' ' -f3)
# Download the webpage and get the latest version number.
LATEST_VERSION=$(curl -fsSL https://mirrors.tripadvisor.com/gnu/parallel/ | hxnormalize | pup 'a[href]' 'text{}' | grep -v latest | cut -d- -f2 | cut -d. -f1 | sort -r | head -1)
if [[ MY_VERSION != LATEST_VERSION ]]; then
# There was a cache hit but the version in the cache is old. Get the new one.
wget -q -T5 -t1 -O "parallel-latest.tar.bz2" "http://mirrors.tripadvisor.com/gnu/parallel/parallel-latest.tar.bz2"
mkdir parallel && pushd parallel && tar xjf "../parallel-latest.tar.bz2" && pushd parallel-* && ./configure --prefix=${LOCAL_INSTALL_PATH} && make && make install && popd && popd
# *****FORCE A CACHE UPDATE, DESPITE THE CACHE HIT FROM BEFORE.*****
echo "UPDATE_CACHE=true" >> $GITHUB_ENV That's it. Even though there was a cache hit, the cache has stale data and needs updating. By setting the environment variable, the final step that used to say:
will now instead say:
And if you don't use the environment variable, the default behavior is maintained. |
Maintainers, It's been three months without any feedback! This is a very requested feature. Please? |
Unfortunately this does not work for me (anymore?). It fails with the following error: |
@paresy Okay, I have a solution for it now. It's a bit of a hack. The idea is to take advantage of the "key" and "restore-keys". What I do is append a random number to the key and use a restore-key without the random number. The key will never match but the restore-key will match the most recent good key. Then I set the update variable to false and only set it to the true if the cache needs rewriting. This will cause the cache to only be updated if the one that was fetched, which is the latest one, is not new enough. You can see it demonstrated below. I create a random number, append it to the key that I'm looking for but also have a restore key without that random number.
You could use the output of You could do all this without my patch and it would work but then it would update every single time, which would unnecessarily thrash the cache and slow down your CI. It would increase the number of times that your CI needs to start over because of cache eviction. So this solution is still better than the default behavior. |
@eyal0 Thanks! Unfortunately my use case is ccache for C++ which needs to be written every time. And i have several actions for different platforms. Due to the 5 GB limitation the cache is getting "cleaned" up for the other platforms if i do not run the actions equally. Therefore i really need to update the exact cache name :-) |
@paresy That will be impossible, I think, and it's a limitation of the server (on Microsoft Azure) which GitHub uses to store the cached files. That's why I suggested that maybe the cache server has a command to delete entries. That would be great! We don't have details on the functioning of the server. Maybe the source code for the server is actually on GitHub somewhere and we could examine it? Some of the keywords that the cache server uses are: Here's the result of a search: https://github.com/search?q=commitCache+getCacheEntry&type=code Maybe something in one of those? I see clients but we need the server. |
This is exactly what I'm looking for. Would be great to have this merged. 🙏 |
I see plenty of commits to this repo but no attention to this issue so you probably should not hold your breath! :-( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this PR! As has been discussed above, caches are immutable and there's no API to delete caches (yet!). Thus, as already observed, a unique key must be provided each run in order to "update" the cache, either by appending a random number, timestamp, etc.
Given that a key must be crafted in a special way to force the cache update, I almost wonder if the changes proposed in #489 is sufficient to achieve the desired functionality here. That PR is adding an environment variable, CACHE_SKIP_SAVE
, to skip the save step.
So I think the functionality you're looking for can be recreated using something like:
- name: Get current time
run: echo "TIMESTAMP=$(date +%s)" >> $GITHUB_ENV
- name: Cache dependencies
uses: actions/cache@v2
with:
path: foo
key: ${{ runner.os }}-foo-${{ env.TIMESTAMP }}
restore-keys: |
${{ runner.os }}-foo-
- name: Install dependencies
run: |
MY_VERSION=...
LATEST_VERSION=...
if [[ MY_VERSION != LATEST_VERSION ]]; then
# Version changed! Since a unique cache key is used each run, this will create a new cache.
... install new version ...
else
# Version unchanged. Skip saving cache.
echo "SKIP_CACHE_SAVE=true" >> $GITHUB_ENV
fi
Would this work? I ask because the other PR is close to being merged and it would be great if it solved these requirements as well. If not, please let me know!
@@ -14,6 +14,9 @@ inputs: | |||
upload-chunk-size: | |||
description: 'The chunk size used to split up large files during upload, in bytes' | |||
required: false | |||
update-env-variable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changing the name of this input and updating the description. Given that caches are immutable, this could give the impression that setting this input is sufficient to update the cache, whereas a unique key must also be provided.
src/save.ts
Outdated
@@ -29,7 +29,35 @@ async function run(): Promise<void> { | |||
return; | |||
} | |||
|
|||
if (utils.isExactKeyMatch(primaryKey, state)) { | |||
const envVarSet = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, this is the current behavior:
Inputs.UpdateEnvVariable |
process.env[core.getInput(Inputs.UpdateEnvVariable)] |
Result |
---|---|---|
Unset | Unset | Default Cache Behavior |
Set | Unset or empty string | Default Cache Behavior |
Set | true or yes |
Update Cache |
Set | Any non-empty string that is not true or yes |
Skip Saving Cache |
In the second case, where the input is set but the env var hasn't been set, it falls back to the default cache behavior. Is that expected? Just want to confirm...as I could also see that falling into the "Skip Saving Cache" result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table that you have written is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
environment variables don't allow for multiple, independently controlled choice of update.
e.g., two actions/cache -- one for tools, one with results of using those tools ... environment variable would affect both ... which is NOT a great solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should that be true? You can use two different environment variables and they will function independently. Eg, TOOLS_UPDATE
and RESULTS_UPDATE
.
@dhadka I agree that, in light of being unable to overwrite existing cache entries, #489 is a better idea. But I have a concern with #489: It doesn't allow you to use two caches. Also, being unable to overwrite an existing cache means that neither this PR nor #489 address #498 (comment) . I think that the only solution to that would be to modify the server to allow overwriting caches, which I think would be a good feature anyway but I don't know if you are able to fix that. |
@eyal0 Good point. We would need different env vars for each cache step. My main goal is we have a few PRs all with similar requests:
So it would be 👍 if we can find some common denominator to avoid adding a bunch of similar or overlapping env vars / inputs for each special case. |
Hmm...so something else we can consider. The inputs are re-evaluated on both the restore and save steps. For example:
When the post step runs, the env var
Proof of concept - https://github.com/dhadka/test-env/runs/2403284954?check_suite_focus=true |
Yeah, that's interesting. I definitely won't have have expected that late execution. That makes it pretty easy to use different environment variables for each thing. I don't really follow how the proof of concept that you sent correlates to the problem at hand, though. I think that the most convincing would be to have an example that uses two caches and to show how each one can have a different result: One saving, one not. #489 is nice in that the code is really, really short but I think that it doesn't provide much output so it could be potentially difficult for users to see whether or not the cache saving was skipped and for what reason. Anyway, assuming that the concept that you are proving works as expected, how do we incorporate that into something like #489? |
I don't see how it'll work like you've suggested. Those input variables aren't available to the post option. https://github.com/eyal0/test-cache/runs/2405477347?check_suite_focus=true Nowhere to be found are |
The skip-save option allows users to force the cache to skip saving the cache. skip-save can be set to a constant or to an expression that will be evaluated at the end of the CI job, such as an environment variable. This is for actions#498
The skip-save option allows users to force the cache to skip saving the cache. skip-save can be set to a constant or to an expression that will be evaluated at the end of the CI job, such as an environment variable. This is for actions#498
In order to understand this logic look at: actions/cache#498 (comment)
Hi @dhadka! I read all PRs you mentioned here and I the see the next solution(s) to all these PRs 1st one is temporary and it means adding one more input with the name
this should cover most of the cases people want to solve in their workflows As for the 2nd one, I would call it more permanent, although it brings more flexibility and consequently breaking changes. And efforts of course. If you read carefully what the author of #474 ended with is separate actions for restoring cache and saving it. From what I recall there is another issue with exactly the same suggestion. this will allow
Edit: add one more value to 1st solution |
b1ef024
to
6171549
Compare
6ea62e7
to
5146ada
Compare
5146ada
to
3d41dc5
Compare
Any updates on this ? I think everyone expects a |
This fixes #342
It's a more flexible version of #353 because it allows the CI to force or disable a cache update programmatically, instead of just setting it to a constant.