Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to re-evaluate cache key during post action #673

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

TimoRoth
Copy link

@TimoRoth TimoRoth commented Nov 8, 2021

We're having a situation where we don't know in advance which files will need cached.
As in, the list of files is only known once the files are actually there.

This change adds an optional option which explicitly re-evaluates the Cache-Key in the post action, so it can get the latest hash, or whatever else determines the key.

Update of actions/core was done to get access to getBooleanInput().

@aparna-ravindra
Copy link
Contributor

Hi @TimoRoth
Are you still keen on contributing this PR? If yes, I have a couple of questions for you

  1. If the cache key changes between cache restore step and the cache save step (in the post step), would it cause inconsistency?
  2. If the cache key is not available at the time of cache restore, there will never be a cache hit. In that case, we will create a new cache everytime the workflow runs. That is not desirable. Is this understanding correct?

@TimoRoth
Copy link
Author

TimoRoth commented Oct 7, 2022

Hi @TimoRoth Are you still keen on contributing this PR? If yes, I have a couple of questions for you

1. If the cache key changes between cache restore step and the cache save step (in the post step), would it cause inconsistency?

No, it would just store the cache under a different key than the one it downloaded initially.
But generally, a hit on the key itself is not normally expected when you need this option, because:

2. If the cache key is not available at the time of cache restore, there will never be a cache hit. In that case, we will create a new cache everytime the workflow runs. That is not desirable. Is this understanding correct?

This option only really makes sense in combination with a sensible restore-key.
You can see our use of it here:
https://github.com/OGGM/tutorials/blob/ba7c1fc64fb0d7f2edf94738229513c239f82ab0/.github/workflows/build-pages.yml#L23

As long as nothing changes, the list of files will be the same, and the same cache key will be generated, and nothing will be uploaded.
Problem is: The list of files that will be cached is determined at runtime: https://github.com/OGGM/tutorials/blob/ba7c1fc64fb0d7f2edf94738229513c239f82ab0/.github/workflows/build-pages.yml#L61

If you do not have a restore-key, you might never get a cache hit and the whole thing is pointless.

@TimoRoth
Copy link
Author

TimoRoth commented Oct 7, 2022

To elaborate a bit on why we need that:
The software we run downloads a whole lot of files from various sources on the internet.
There is no predetermined list of which files of the tens of thousands that are there this specific configuration of the software will download.
It will however be able to detect if those files are already there. I.e. have been downloaded on previous runs of the software (or were restored from GHA Cache).

So we need a way to update the cache whenever the set of files it downloaded changed, but we only know that after it has finished running.
Obviously that is too late to restore the cache, since at this point the software would have already downloaded the files again on its own.

Re-Evaluating the cache key at post is the easiest way I found to achieve that.

@aparna-ravindra
Copy link
Contributor

Thank you for the context. As you understand, this is not a very common use case. The action might behave in unexpected ways if the cache key changes within the same workflow. Today, the cache key used while restoring the cache is explicitly saved in the state to be used again in the post action step as you see in the comment here.

This makes me skeptical of re-evaluating the cache key. I will try to gather more context about the importance of keeping the cache key same throughout the workflow and get back to you if I learn anything more. Based on my understanding so far, I am not sure if we should go ahead with this change.

@TimoRoth
Copy link
Author

TimoRoth commented Oct 7, 2022 via email

@IanButterworth
Copy link

Being able to make a key based on the contents of the cache would be very valuable.
At least, the docs should be clearer on when the key is rendered, as it's currently very confusing because during the save stage, debug output shows that the key is rendered again, but the version that is used for save is the one from the earlier restore step.

Docs change: #1283

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants