Provide a mechanism to have a per file cache eviction/retention #788

Open
devminded opened this issue Apr 14, 2022 · 11 comments

@devminded

devminded commented Apr 14, 2022

This is related to issue actions/setup-java#269

The problem is that caches fill up over time as dependencies, runtimes, and tooling are upgraded. Old files are never evicted and the cache grows. The current solution is to recalculate the cache-key at every build (base it on the week number or such) and throw it all away, but that works against the purpose of a cache to begin with.

I suggest that when saving the caches we should be able to evict files older than a configurable number of days. That way old dependencies will be removed over time and we can have the best of both worlds.

PS. I'm not sure how the cache-hit logic works in this scenario.

Something like this:

- name: Configure Gradle JDK cache
  uses: actions/cache@v3
  with:
    path: ~/.gradle/jdks
    key: gradle-jdks-${{ runner.os }}
    # Proposed, not an existing input: evict files older than 30 days from the cache and repackage.
    eviction:
      include:    # required, we don't want to remove just any file and corrupt the cache
        - "**/*.zip"
        - "**/*.jar"
        - "**/*.tar"
      days: 30
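
Until something like this exists, the proposed behaviour can be approximated by a step that runs before the cache is saved. The following is only a minimal sketch: `evict_older_than` is a hypothetical helper name, not part of actions/cache, and it uses modification time because access time may not survive a cache restore. It assumes a `find` that supports `-delete` (GNU and BSD both do).

```shell
# Hypothetical helper: delete files under DIR that match any of the given
# glob patterns and whose age in whole days is greater than DAYS.
# Usage: evict_older_than DIR DAYS PATTERN [PATTERN...]
evict_older_than() {
  dir=$1
  days=$2
  shift 2
  [ "$#" -gt 0 ] || { echo "at least one pattern is required" >&2; return 2; }

  # Rewrite the positional parameters from "P1 P2 ..." into
  # "-name P1 -o -name P2 ..." so that the patterns are ORed together.
  # The word list of the for loop is expanded once, before the first "set --".
  first=1
  for pattern in "$@"; do
    if [ "$first" -eq 1 ]; then
      set -- -name "$pattern"
      first=0
    else
      set -- "$@" -o -name "$pattern"
    fi
  done

  # Print each evicted file so the build log shows what was removed.
  find "$dir" -type f \( "$@" \) -mtime +"$days" -print -delete
}
```

In a workflow step this might be called as `evict_older_than ~/.gradle/jdks 30 '*.zip' '*.jar' '*.tar'`. Requiring at least one pattern mirrors the "required" `include` list in the proposal, so that the helper never deletes arbitrary files.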
@bishal-pdMSFT
Contributor

@devminded can you please help me understand the use case better?

> The problem is that caches fill up over time as dependencies, runtimes, and tooling are upgraded.

If I am not wrong, the dependencies get updated on every build, which means the old files go away, so the cache should only contain the latest files.

Is this problem more with runtimes and tooling, where multiple versions may exist side by side? If so, the problem may have much less impact, as such version changes would not happen too frequently. Am I reading this wrong?

@devminded
Author

Not sure if I have misunderstood something in how the cache-mechanism works.

As far as I understand, a cache hit simply means that we found a cache with a matching key, which is then restored. How we calculate this key on each build determines whether we restore the cache or not.

The issue is that Maven and Gradle, for example, save all their dependencies, toolchains, wrappers, etc. in a directory. Gradle has a default 30-day eviction for some of these directories, but (AFAIK) it's based on "last accessed time", which seems to break when using GitHub caches, so over time each newly saved cache becomes larger.

Some things can be managed by being picky how we generate the cache key (like hashing the gradle-wrapper file), but that has two issues:

  1. Changing the cache key will cause a cache miss and the entire cache will be thrown away, even if the change is tiny. And the next build will have to download the internet again.
  2. Some things do not have a simple "file to hash" to generate a key from; they are part of a larger build file that changes often.

What I feel is missing is some kind of middle ground where we can evict content based on some rule (so it's excluded when packing the cache).

Perhaps I'm missing something obvious here.

@bishal-pdMSFT
Contributor

bishal-pdMSFT commented Jul 1, 2022

@devminded looks like your ask is to be able to update a cache. Something similar to #342 ?

Essentially you want to:

  • Use a more static key rather than a hash based e.g. gradle-jdks-${{ runner.os }}
  • Restore existing copy of cache
  • Run gradle to update dependencies/tools etc
  • Update the existing cache with updated content
  • Set custom eviction for certain directories to keep the cache from growing unboundedly

There are two parts to it which are not possible today:

  1. There is no provision to update a cache. A cache is immutable and you can only create a new one, hence the key is supposed to be more dynamic. This ask seems similar to "Feature request: option to update cache" (#342). Can't you use a more static restore-key to be able to reuse an older cache?
  2. Even if a cache could be updated, it is stored as a single tar archive, so it is not possible to "purge" some directories from it. Can you work around this by only caching the directories which don't grow unboundedly? You may want to use a separate cache for the unbounded ones, with a timestamp-based key, so that it gets purged periodically.

@devminded
Author

I understand that it goes into one large tar that gets packed at the end of the build. The problem is just that the source for the tar is a bunch of directories that our build tools fill with stuff but are unable to clean: their cleanup is based on timestamps, and the cache pack/unpack mechanism seems to do something with the timestamps.

I guess I will do what I wrote in my original post and base the cache key on the week-number or something.

With that said I would then like to propose the following:
The setup-java, setup-node, etc. actions have a cache property that sets up the cache and keys for Gradle, Maven, Node, etc. Can we add a new field "append-cache-key" to those actions, where we can add extra info (like a week number) that gets appended to the generated cache keys? That way we still have some additional options for the keys.
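
For workflows that call actions/cache directly, the week-number key mentioned above could be computed in a step along these lines. This is a sketch only: `weekly_cache_key` is a hypothetical helper name, and the ISO week format is one assumption among several reasonable choices.

```shell
# Hypothetical helper: print a cache key that changes once per ISO week,
# in the form "<prefix>-<year>-W<week>", e.g. "gradle-jdks-Linux-2022-W15".
weekly_cache_key() {
  prefix=$1
  # %G is the ISO week-based year and %V the ISO week number (01-53).
  printf '%s-%s\n' "$prefix" "$(date -u +%G-W%V)"
}

# In a workflow step, the key could be exposed as a step output
# (RUNNER_OS and GITHUB_OUTPUT are set by the Actions runner):
#   echo "key=$(weekly_cache_key "gradle-jdks-$RUNNER_OS")" >> "$GITHUB_OUTPUT"
```

Note the trade-off: without a `restore-keys` fallback, the first build of each week starts from an empty cache, while a fallback prefix would carry last week's files into the new cache and so would still need a cleanup step.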

@codylerum

codylerum commented Oct 3, 2022

This is a pretty common issue with Maven caches. If you have a dependency on foo-1.0.0.jar and then upgrade to foo-1.0.1.jar, the original foo-1.0.0.jar will stay in the cache forever.

I have a step at the end of my builds that removes those directories from .m2 if their last accessed time is older than that of a directory, /var/oss-test, that I created at the start of the build.

- name: Remove Unused Cache
  run: |
    # Delete the directory of every .pom whose access time is not newer than that of /var/oss-test
    sudo find ~/.m2 ! -neweraa /var/oss-test -iname '*.pom' | while IFS= read -r pom; do rm -rf "$(dirname "$pom")"; done

Something built in to delete the dir for the maven dep if not accessed in X days would be nice and would reduce the cache size for a lot of people significantly.
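
The step above depends on a marker created at the start of the build, which is not shown. Both halves might look roughly like the sketch below. `mark_build_start` and `prune_unaccessed` are hypothetical names, and a plain marker file is used here instead of the /var/oss-test directory. The approach assumes the filesystem actually updates access times when the build reads a file, which depends on mount options such as `relatime` or `noatime`.

```shell
# Hypothetical helper, run at the start of the job: create (or refresh) the
# marker whose access time separates "used in this build" from "not used".
mark_build_start() {
  touch "$1"
}

# Hypothetical helper, run at the end of the job: remove every directory
# under REPO that holds a .pom file whose access time is not newer than the
# marker's access time.
# Usage: prune_unaccessed REPO MARKER
prune_unaccessed() {
  repo=$1
  marker=$2
  find "$repo" -type f -iname '*.pom' ! -neweraa "$marker" -print |
    while IFS= read -r pom; do
      rm -rf -- "$(dirname "$pom")"
    done
}
```

A job would call `mark_build_start /tmp/build-start` in an early step and `prune_unaccessed ~/.m2 /tmp/build-start` in a late one; the marker path here is only an example.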

@github-actions

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

@github-actions github-actions bot added the stale label Apr 21, 2023
@williamdes

🏓

@github-actions github-actions bot removed the stale label Apr 22, 2023
@jcadavez

I'd like this feature too. There are some caches that I'm OK with letting expire after the default 7 days.

But there are larger caches that I'd only like to keep for about 1-2 days, and there's no GA input to specify that.


This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

@github-actions github-actions bot added the stale label Jan 17, 2024
@williamdes

You shall not close

@github-actions github-actions bot removed the stale label Jan 18, 2024
@aaronadamsCA

@bishal-pdMSFT, I think the ask here is simply to delete stale files during cache restore (or save), based on configurable name patterns and maximum age.

@devminded, you may be able to do a version of this yourself with an additional workflow step at the end of your job:

- name: Delete cached files not modified in the last 30 days
  run: find . -type f -mtime +29 \( -name "*.jar" -o -name "*.tar" -o -name "*.zip" \) -delete
  working-directory: ~/.gradle/jdks

Ideally this would use the last access time, but this cache action doesn't appear to preserve atime on restore. Evicting files based on last modified time is probably wrong for most use cases, but also probably fine, as long as you can heal the cache by re-generating or re-downloading missing files.

If this action could preserve atime, that would be great; if it could automatically enforce a file retention policy for me based on atime, that would be even better.

7 participants