Add update-env-variable to force/disable cache update. #498

eyal0 · 2021-01-04T05:59:25Z

This fixes #342

It's a more flexible version of #353 because it allows the CI to force or disable a cache update programmatically, instead of just setting it to a constant.

eyal0 · 2021-01-04T07:21:25Z

In short: Set an environment variable in your CI to force or disable the cache update.

Here's a longer description:

Problem

Say you want to cache a program called parallel so that you don't have to build it all the time. But you want to force a rebuild if the version that is online is different from the version that is in the cache. What you have to do is check what the latest version of the program is online, then put that number in the cache key, and then use the cache.

But, what if you have a lot of different programs with version numbers? You'd have to add them all to your cache key. So maybe your cache key will be: ubuntu-g++8-glib2-chrome46-nodejs15. And the order in which you put them in the cache key matters! For example, what if the latest version of chrome is now 47? You'll look for ubuntu-g++8-glib2-chrome47-nodejs15 and you won't get a cache hit. You could use the restore-key feature and match a prefix, like ubuntu-g++8-glib2-. But maybe that'll match some recent build that was on an older version of nodejs. Now you'll have to rebuild that, too.

The problem is that you're doing a cache per version of each dependency. You could have multiple caches, like one for the g++ in windows32 and another for the g++ in windows64 etc. And then the same for the version of chrome, nodejs, etc. You would have N*M caches, for every combination of platform and software.

What a hassle! What you really want is a cache per CI job. For example, one cache for "windows32", one for "windows64", one for "linux" and one for "macos".

Solution

What this PR provides is simple: Programmatically decide, after fetching the cache, if it should be overridden or not.

How it works:

Designate an environment variable name, such as UPDATE_CACHE.
Fetch your cache like usual.
During your CI steps, if you find that the cache needs to be overridden, export UPDATE_CACHE=true.
That's it!

If you don't set the env variable at all, the default behavior is maintained. And if you set it to false, then the cache update is disabled, even if it should have occured.

Here's a sample CI yaml file with comments:

    - name: Cache .local
      uses: eyal0/cache@update_env_variable
      with:
        path: ~/.local
        key: my_cache_key
        # Set the environment variable to watch to be named UPDATE_CACHE
        update-env-variable: "UPDATE_CACHE"
    - name: Install parallel
      run: |
        # Get the installed version number.
        MY_VERSION=$(parallel --version | head -1 | cut -d' ' -f3)
        # Download the webpage and get the latest version number.
        LATEST_VERSION=$(curl -fsSL https://mirrors.tripadvisor.com/gnu/parallel/ | hxnormalize | pup 'a[href]' 'text{}' | grep -v latest | cut -d- -f2 | cut -d. -f1 | sort -r | head -1)
        if [[ MY_VERSION != LATEST_VERSION ]]; then
          # There was a cache hit but the version in the cache is old.  Get the new one.
          wget -q -T5 -t1 -O "parallel-latest.tar.bz2" "http://mirrors.tripadvisor.com/gnu/parallel/parallel-latest.tar.bz2"
          mkdir parallel && pushd parallel && tar xjf "../parallel-latest.tar.bz2" && pushd parallel-* && ./configure --prefix=${LOCAL_INSTALL_PATH} && make && make install && popd && popd
          # *****FORCE A CACHE UPDATE, DESPITE THE CACHE HIT FROM BEFORE.*****
          echo "UPDATE_CACHE=true" >> $GITHUB_ENV

That's it. Even though there was a cache hit, the cache has stale data and needs updating. By setting the environment variable, the final step that used to say:

Cache hit occurred on the primary key my_cache_key, not saving cache.

will now instead say:

Cache saving was forced by setting UPDATE_CACHE to true.
/bin/tar --posix --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/user/repo --files-from manifest.txt
Cache saved successfully

And if you don't use the environment variable, the default behavior is maintained.

ollydev · 2021-03-10T15:29:59Z

Maintainers, It's been three months without any feedback! This is a very requested feature. Please?

paresy · 2021-03-15T09:43:55Z

Unfortunately this does not work for me (anymore?). It fails with the following error: Unable to reserve cache with key build-ubuntu-ccache, another job may be creating this cache. I think the problem is already documented here: actions/toolkit#505

eyal0 · 2021-03-25T19:06:57Z

@paresy Okay, I have a solution for it now. It's a bit of a hack. The idea is to take advantage of the "key" and "restore-keys". What I do is append a random number to the key and use a restore-key without the random number. The key will never match but the restore-key will match the most recent good key. Then I set the update variable to false and only set it to the true if the cache needs rewriting. This will cause the cache to only be updated if the one that was fetched, which is the latest one, is not new enough.

You can see it demonstrated below. I create a random number, append it to the key that I'm looking for but also have a restore key without that random number.

    - name: Get a random number
      run: echo "RANDOM_SUFFIX=${RANDOM}${RANDOM}" >> $GITHUB_ENV
    - name: Cache local install path
      uses: eyal0/cache@main
      with:
        path: ${{ steps.sanitize-key.outputs.path }}
        key: ${{ steps.sanitize-key.outputs.key }}-${{ env.RANDOM_SUFFIX }}
        restore-keys: |
          ${{ steps.sanitize-key.outputs.key }}-
        update-env-variable: "UPDATE_CACHE"
    - name: Default don't update cache
      run: echo "UPDATE_CACHE=false" >> $GITHUB_ENV
    - name: Install valgrind
      run: |
        if ! valgrind --version || ! grep -qx "$(git ls-remote https://github.com/eyal0/valgrind.git master)" ~/.local/bin/valgrind.version; then
          git ls-remote https://github.com/eyal0/valgrind.git master > ~/.local/bin/valgrind.version
          echo "UPDATE_CACHE=true" >> $GITHUB_ENV
          <DO YOUR BUILD OF VALGRIND HERE>
        fi

You could use the output of date instead of a random number if you want, it doesn't matter, so long as you never collide. I would recommend against the github run_id because those get re-used if you re-run the CI.

You could do all this without my patch and it would work but then it would update every single time, which would unnecessarily thrash the cache and slow down your CI. It would increase the number of times that your CI needs to start over because of cache eviction. So this solution is still better than the default behavior.

paresy · 2021-03-26T08:39:28Z

@eyal0 Thanks! Unfortunately my use case is ccache for C++ which needs to be written every time. And i have several actions for different platforms. Due to the 5 GB limitation the cache is getting "cleaned" up for the other platforms if i do not run the actions equally. Therefore i really need to update the exact cache name :-)

eyal0 · 2021-03-26T16:08:01Z

@paresy That will be impossible, I think, and it's a limitation of the server (on Microsoft Azure) which GitHub uses to store the cached files. That's why I suggested that maybe the cache server has a command to delete entries. That would be great!

We don't have details on the functioning of the server. Maybe the source code for the server is actually on GitHub somewhere and we could examine it? Some of the keywords that the cache server uses are: getCacheEntry and reserveCache and commitCache. So it's possible that there is a GitHub repo out there that includes those words that we could look at.

Here's the result of a search:

https://github.com/search?q=commitCache+getCacheEntry&type=code

Maybe something in one of those? I see clients but we need the server.

ruudk · 2021-04-19T17:02:59Z

This is exactly what I'm looking for. Would be great to have this merged. 🙏

eyal0 · 2021-04-19T19:02:46Z

This is exactly what I'm looking for. Would be great to have this merged.

I see plenty of commits to this repo but no attention to this issue so you probably should not hold your breath! :-(

dhadka

Thank you for submitting this PR! As has been discussed above, caches are immutable and there's no API to delete caches (yet!). Thus, as already observed, a unique key must be provided each run in order to "update" the cache, either by appending a random number, timestamp, etc.

Given that a key must be crafted in a special way to force the cache update, I almost wonder if the changes proposed in #489 is sufficient to achieve the desired functionality here. That PR is adding an environment variable, CACHE_SKIP_SAVE, to skip the save step.

So I think the functionality you're looking for can be recreated using something like:

      - name: Get current time
        run: echo "TIMESTAMP=$(date +%s)" >> $GITHUB_ENV
        
      - name: Cache dependencies
        uses: actions/cache@v2
        with:
          path: foo
          key: ${{ runner.os }}-foo-${{ env.TIMESTAMP }}
          restore-keys: |
            ${{ runner.os }}-foo-
            
      - name: Install dependencies
        run: |
          MY_VERSION=...
          LATEST_VERSION=...
          
          if [[ MY_VERSION != LATEST_VERSION ]]; then
            # Version changed! Since a unique cache key is used each run, this will create a new cache.
            ... install new version ...
          else
            # Version unchanged. Skip saving cache.
            echo "SKIP_CACHE_SAVE=true" >> $GITHUB_ENV
          fi

Would this work? I ask because the other PR is close to being merged and it would be great if it solved these requirements as well. If not, please let me know!

dhadka · 2021-04-20T01:51:18Z

action.yml

@@ -14,6 +14,9 @@ inputs:
  upload-chunk-size:
    description: 'The chunk size used to split up large files during upload, in bytes'
    required: false
+  update-env-variable:


Suggest changing the name of this input and updating the description. Given that caches are immutable, this could give the impression that setting this input is sufficient to update the cache, whereas a unique key must also be provided.

dhadka · 2021-04-20T02:01:26Z

src/save.ts

@@ -29,7 +29,35 @@ async function run(): Promise<void> {
            return;
        }

-        if (utils.isExactKeyMatch(primaryKey, state)) {
+        const envVarSet =


As I understand it, this is the current behavior:

Inputs.UpdateEnvVariable process.env[core.getInput(Inputs.UpdateEnvVariable)] Result

Unset Unset Default Cache Behavior

Set Unset or empty string Default Cache Behavior

Set true or yes Update Cache

Set Any non-empty string that is not true or yes Skip Saving Cache

In the second case, where the input is set but the env var hasn't been set, it falls back to the default cache behavior. Is that expected? Just want to confirm...as I could also see that falling into the "Skip Saving Cache" result.

The table that you have written is correct.

environment variables don't allow for multiple, independently controlled choice of update.

e.g., two actions/cache -- one for tools, one with results of using those tools ... environment variable would affect both ... which is NOT a great solution.

Why should that be true? You can use two different environment variables and they will function independently. Eg, TOOLS_UPDATE and RESULTS_UPDATE.

eyal0 · 2021-04-20T16:39:15Z

@dhadka I agree that, in light of being unable to overwrite existing cache entries, #489 is a better idea. But I have a concern with #489: It doesn't allow you to use two caches. CACHE_SKIP_SAVE is essentially a global variable. If you want to have multiple caches in a single job then you're in trouble. It would be possible to modify #489 to add a parameter for naming the environment variable, where the default is CACHE_SKIP_SAVE. So it would work just like the original for those that want the default environment variable name and for those that want a different name, for example in the case of two caches, they could have that, too. If I wrote that code, would you be interested in it?

Also, being unable to overwrite an existing cache means that neither this PR nor #489 address #498 (comment) . I think that the only solution to that would be to modify the server to allow overwriting caches, which I think would be a good feature anyway but I don't know if you are able to fix that.

dhadka · 2021-04-21T18:10:36Z

@eyal0 Good point. We would need different env vars for each cache step.

My main goal is we have a few PRs all with similar requests:

Set env var to skip saving the cache - Allow to skip the save post-step #489
Make cache read-only - Add read-only feature #474
Configurable save on failure - Configurable save cache on failure #92
And this PR to add an env var to to force an update or disable update

So it would be 👍 if we can find some common denominator to avoid adding a bunch of similar or overlapping env vars / inputs for each special case.

dhadka · 2021-04-21T18:36:31Z

Hmm...so something else we can consider. The inputs are re-evaluated on both the restore and save steps. For example:

      - uses: actions/cache@v2
        with:
          path: foo
          key: ${{ runner.os }}-foo
          skip-save: ${{ env.SKIP_SAVE }}
      - run: echo "SKIP_SAVE=true" >> $GITHUB_ENV

When the post step runs, the env var SKIP_SAVE is set to true and the input will get that value as well. Someone who wants a read-only cache can then just do:

      - uses: actions/cache@v2
        with:
          path: foo
          key: ${{ runner.os }}-foo
          skip-save: true

Proof of concept - https://github.com/dhadka/test-env/runs/2403284954?check_suite_focus=true

eyal0 · 2021-04-21T23:13:33Z

Yeah, that's interesting. I definitely won't have have expected that late execution. That makes it pretty easy to use different environment variables for each thing.

I don't really follow how the proof of concept that you sent correlates to the problem at hand, though. I think that the most convincing would be to have an example that uses two caches and to show how each one can have a different result: One saving, one not.

#489 is nice in that the code is really, really short but I think that it doesn't provide much output so it could be potentially difficult for users to see whether or not the cache saving was skipped and for what reason.

Anyway, assuming that the concept that you are proving works as expected, how do we incorporate that into something like #489?

eyal0 · 2021-04-22T00:05:41Z

I don't see how it'll work like you've suggested. Those input variables aren't available to the post option.

https://github.com/eyal0/test-cache/runs/2405477347?check_suite_focus=true

Nowhere to be found are skip-save, path, key, restore-keys. Perhaps they are only available to the code that is run by the post but not to the post-if. I looked in the entire github object and the env object.

The skip-save option allows users to force the cache to skip saving the cache. skip-save can be set to a constant or to an expression that will be evaluated at the end of the CI job, such as an environment variable. This is for actions#498

In order to understand this logic look at: actions/cache#498 (comment)

ZuBB · 2022-08-03T09:00:02Z

@eyal0 Good point. We would need different env vars for each cache step.

My main goal is we have a few PRs all with similar requests:

Set env var to skip saving the cache - Allow to skip the save post-step #489

Make cache read-only - Add read-only feature #474

Configurable save on failure - Configurable save cache on failure #92

And this PR to add an env var to to force an update or disable update

So it would be 👍 if we can find some common denominator to avoid adding a bunch of similar or overlapping env vars/inputs for each special case.

Hi @dhadka!

I read all PRs you mentioned here and I the see the next solution(s) to all these PRs

1st one is temporary and it means adding one more input with the name post-action (just as an example) with the next values

on_miss - create cache only when we had any of 'key miss' during restore. this is the default;
on_explicit_miss - create cache when we had a miss by key but not by restore-keys
always - always create cache
never - never create cache

this should cover most of the cases people want to solve in their workflows

As for the 2nd one, I would call it more permanent, although it brings more flexibility and consequently breaking changes. And efforts of course.

If you read carefully what the author of #474 ended with is separate actions for restoring cache and saving it. From what I recall there is another issue with exactly the same suggestion.

this will allow

save cache step depending on other things that happen in flow after cache restore
to have a separate strategy for key in save 'cache step'
anything else?

Edit: add one more value to 1st solution

This fixes actions#342

asvny · 2022-11-03T00:30:46Z

Any updates on this ? I think everyone expects a Cache policy kinda thing which will help to skip post save and gitlab has this feature

https://docs.gitlab.com/ee/ci/yaml/index.html#cachepolicy

eyal0 requested a review from a team as a code owner January 4, 2021 05:59

eyal0 force-pushed the update_env_variable branch 4 times, most recently from 90e11e8 to bae9ed8 Compare January 4, 2021 06:49

eyal0 mentioned this pull request Jan 4, 2021

Add a parameter to ignore the existing cache but still store the cache at the end #481

Closed

This was referenced Jan 4, 2021

Skip post cache #457

Closed

Overriding cache #432

Closed

Configurable save cache on failure #92

Closed

dylanbannon mentioned this pull request Feb 2, 2021

Replace codefresh with GitHub actions cloudposse/docs#510

Merged

3 tasks

youngar mentioned this pull request Feb 25, 2021

[CI] Stop building LLVM in a separate workflow job llvm/circt#659

Closed

This was referenced Mar 25, 2021

Allow updating cache with 'update' key #353

Closed

Enable caching even when the workflow failed #541

Closed

dhadka requested changes Apr 20, 2021

View reviewed changes

ollydev mentioned this pull request Apr 20, 2021

Allow to skip the save post-step #489

Closed

eyal0 mentioned this pull request Apr 22, 2021

Add skip-save feature #571

Closed

elviejo79 added a commit to elviejo79/experiment-cardano-node-gh-actions that referenced this pull request Mar 31, 2022

in order to create an update cache we need a unique key per cache.

7a1b78b

In order to understand this logic look at: actions/cache#498 (comment)

vsvipul requested a review from bishal-pdMSFT May 11, 2022 09:30

vsvipul assigned bishal-pdMSFT May 11, 2022

jcp19 mentioned this pull request Jun 14, 2022

The CI does not update the gobra-cache on cache-hits viperproject/VerifiedSCION#28

Closed

vsvipul assigned vsvipul and bishal-pdMSFT and unassigned bishal-pdMSFT and vsvipul Aug 9, 2022

eyal0 force-pushed the update_env_variable branch from b1ef024 to 6171549 Compare September 10, 2022 22:43

eyal0 requested a review from a team as a code owner September 10, 2022 22:43

eyal0 force-pushed the update_env_variable branch 3 times, most recently from 6ea62e7 to 5146ada Compare September 11, 2022 04:57

Add update-env-variable to force/disable cache update.

3d41dc5

This fixes actions#342

eyal0 force-pushed the update_env_variable branch from 5146ada to 3d41dc5 Compare September 11, 2022 05:50

rhaschke mentioned this pull request Oct 24, 2022

Configurable save cache via environment variable #964

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add update-env-variable to force/disable cache update. #498

Add update-env-variable to force/disable cache update. #498

eyal0 commented Jan 4, 2021

eyal0 commented Jan 4, 2021 •

edited

Loading

ollydev commented Mar 10, 2021

paresy commented Mar 15, 2021

eyal0 commented Mar 25, 2021

paresy commented Mar 26, 2021

eyal0 commented Mar 26, 2021 •

edited

Loading

ruudk commented Apr 19, 2021

eyal0 commented Apr 19, 2021

dhadka left a comment

dhadka Apr 20, 2021

dhadka Apr 20, 2021

eyal0 Apr 20, 2021

henrygab Oct 2, 2022

eyal0 Oct 2, 2022

eyal0 commented Apr 20, 2021

dhadka commented Apr 21, 2021

dhadka commented Apr 21, 2021 •

edited

Loading

eyal0 commented Apr 21, 2021

eyal0 commented Apr 22, 2021

ZuBB commented Aug 3, 2022 •

edited

Loading

asvny commented Nov 3, 2022

`Inputs.UpdateEnvVariable`	`process.env[core.getInput(Inputs.UpdateEnvVariable)]`	Result
Unset	Unset	Default Cache Behavior
Set	Unset or empty string	Default Cache Behavior
Set	`true` or `yes`	Update Cache
Set	Any non-empty string that is not `true` or `yes`	Skip Saving Cache

Add update-env-variable to force/disable cache update. #498

Are you sure you want to change the base?

Add update-env-variable to force/disable cache update. #498

Conversation

eyal0 commented Jan 4, 2021

eyal0 commented Jan 4, 2021 • edited Loading

Problem

Solution

ollydev commented Mar 10, 2021

paresy commented Mar 15, 2021

eyal0 commented Mar 25, 2021

paresy commented Mar 26, 2021

eyal0 commented Mar 26, 2021 • edited Loading

ruudk commented Apr 19, 2021

eyal0 commented Apr 19, 2021

dhadka left a comment

Choose a reason for hiding this comment

dhadka Apr 20, 2021

Choose a reason for hiding this comment

dhadka Apr 20, 2021

Choose a reason for hiding this comment

eyal0 Apr 20, 2021

Choose a reason for hiding this comment

henrygab Oct 2, 2022

Choose a reason for hiding this comment

eyal0 Oct 2, 2022

Choose a reason for hiding this comment

eyal0 commented Apr 20, 2021

dhadka commented Apr 21, 2021

dhadka commented Apr 21, 2021 • edited Loading

eyal0 commented Apr 21, 2021

eyal0 commented Apr 22, 2021

ZuBB commented Aug 3, 2022 • edited Loading

asvny commented Nov 3, 2022

eyal0 commented Jan 4, 2021 •

edited

Loading

eyal0 commented Mar 26, 2021 •

edited

Loading

dhadka commented Apr 21, 2021 •

edited

Loading

ZuBB commented Aug 3, 2022 •

edited

Loading