
Bug: Failed to CreateCacheEntry #1541

Open

icanhazstring opened this issue Feb 7, 2025 · 39 comments

@icanhazstring

icanhazstring commented Feb 7, 2025

Greetings.

Since today, Feb 7th around 1 AM CET, we have had a problem with actions/cache.
We can't use actions/cache/save anymore and get the following error:

Warning: Failed to save: Failed to CreateCacheEntry: Received non-retryable error: Failed request: (404) Not Found: invalid request

This is a simple cache step as described in the docs:

      - name: Cache vendor
        uses: actions/cache/save@v4
        with:
          key: ${{ env.VENDOR_CACHE_KEY }}
          path: |
            vendor/
            vendor-bin/

Current workaround:
Pin the version to v4.1.2. This should solve the issue for now.
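
For reference, a minimal sketch of the pinned save step (the key and paths are simply the ones from the step above; pinning the tag is the only change):

      - name: Cache vendor
        uses: actions/cache/save@v4.1.2
        with:
          key: ${{ env.VENDOR_CACHE_KEY }}
          path: |
            vendor/
            vendor-bin/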

icanhazstring changed the title from "Bug: Can't create cache anymore" to "Bug: Failed to CreateCacheEntry" on Feb 7, 2025
@tanya-ok

tanya-ok commented Feb 7, 2025

Did you find any workaround?

@domman95

domman95 commented Feb 7, 2025

We've noticed the same issue. actions/cache/restore@v3 returns a warning

Warning: Failed to restore: Failed to GetCacheEntryDownloadURL: Received non-retryable error: Failed request: (404) Not Found: invalid request

Our cache step looks pretty much the same as the one above.

@icanhazstring
Author

Yes. The workaround is to pin the version to v4.1.2; then it works again.

@zsliu98

zsliu98 commented Feb 7, 2025

I have the same issue. From #1510, I would guess the problem is related to the new cache service (v2) APIs. Since v4.1.2 will be deprecated in one month, this is only a temporary solution. Could somebody fix it as soon as possible? (TBH I don't know where to report the issue other than here.)


Update: Yes, it works again. I am using v4.

@domman95

domman95 commented Feb 7, 2025

I think it works again - we use v3.

@deitch

deitch commented Feb 8, 2025

I am curious as to why the action didn't fail when the cache upload failed. I get that the upload might have failed, either due to a latent bug in the action or changes in the API (or some combination). Should the action itself not have failed when the upload didn't succeed?

@icanhazstring
Author

> I am curious as to why the action didn't fail when the cache upload failed. I get that the upload might have failed, either due to a latent bug in the action or changes in the API (or some combination). Should the action itself not have failed when the upload didn't succeed?

The upload only gives a warning.
Retrieving only gives an error if you use the fail-on-miss setting.
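
For context, a minimal sketch of a restore step with that setting enabled, assuming fail-on-cache-miss is the input being referred to (the key is reused from the report above):

      - name: Restore vendor
        uses: actions/cache/restore@v4
        with:
          key: ${{ env.VENDOR_CACHE_KEY }}
          path: |
            vendor/
            vendor-bin/
          fail-on-cache-miss: true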

I also created a support ticket, where they mentioned they rolled back the change.
When they updated their backend there was an issue, which the devs have already reverted, so it should work with the latest version again.

@deitch

deitch commented Feb 8, 2025

Yeah, following this issue, I restarted the various jobs I manage and they are working again.

Would it not make sense to have the action fail (error) on upload error, instead of warning?

Tiphereth-A added a commit to Tiphereth-A/CP-lib that referenced this issue Feb 11, 2025
@Link-
Member

Link- commented Feb 11, 2025

On February 7th we began the rollout of the new cache service backend. There was a bug in the implementation that caused a wider range of repositories to start calling the new backend even though they weren't intended to do so just yet. The service responded with 404s.

That problem was resolved on the same day. Please do not revert to previous versions.

Only 4 versions are compatible with the new service backend. Please use only one of the versions below:

Supported versions and tags
actions/cache@v4
actions/cache@v3
actions/cache@v4.2.0
actions/cache@v3.4.0

Every other version is deprecated: #1510
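
As an illustration, a minimal sketch of the combined cache step on one of the supported tags (the step name and key are taken from the original report; adjust to your workflow):

      - name: Cache vendor
        uses: actions/cache@v4
        with:
          key: ${{ env.VENDOR_CACHE_KEY }}
          path: |
            vendor/
            vendor-bin/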

@Tiphereth-A

In my repository, it reports

Failed to save: Failed to CreateCacheEntry: Received non-retryable error: Failed request: (409) Conflict: cache entry with the same key, version, and scope already exists

so maybe there is another issue with the new service backend.

@Link-
Member

Link- commented Feb 12, 2025

> In my repository, it reports
>
> Failed to save: Failed to CreateCacheEntry: Received non-retryable error: Failed request: (409) Conflict: cache entry with the same key, version, and scope already exists
>
> so maybe there is another issue with the new service backend.

@Tiphereth-A - This is not an error; cache entries are immutable. If you're trying to create a cache entry with the same key, you will get this warning. In a future release we will be removing this annotation to reduce confusion.

@alexanderb2911

I'm currently also facing this issue. The save action is not able to save a new cache although none exists at that moment. It always tells me that the cache already exists although it doesn't.

@yonitou

yonitou commented Feb 14, 2025

Same issue here.
We are using the v4.0.0 @actions/cache package, and the error Failed to save: Failed to CreateCacheEntry: Received non-retryable error: Failed request: (409) Conflict: cache entry with the same key, version, and scope already exists always pops up even if the cache is completely new and empty.
We upgraded to v4.0.0 two days ago and the error started happening two hours ago.

@Link-
Member

Link- commented Feb 18, 2025

@yonitou @alexanderb2911 - we need more information to troubleshoot. If you have public repos, share the workflow runs that failed here.

Otherwise, please create support tickets and make sure to include links to the workflows that are failing with the 409 Conflict errors.

Also, please share the link to this issue and ask the support engineer to escalate to the Actions team.

@cderv

cderv commented Feb 18, 2025

@Link- We are also facing the issue in this public workflow, both when trying to update to actions/cache/restore and actions/cache/save, and also while using
julia-actions/cache, which seems to use a pinned version of actions/cache: https://github.com/julia-actions/cache/blob/2b1bf4d8a138668ac719ea7ca149b53ed8d8401e/action.yml#L124-L131

Our current usage of caching involves:

  • Using a job matrix for several jobs in parallel
  • Each job having a cache step, so it checks cache-hit and needs to save the cache (we used save-always in the past).

However, it seems that even using cache-hit is problematic:

  • For a new cache, all workflow jobs will have cache-hit = false
  • The first job that ends will save the cache without a problem
  • All the others will try to save the cache since cache-hit = false, but the key already exists (because of the first job that ended) and a warning is thrown

I would be happy to learn I have misunderstood, though...

Anyhow, I'd be interested to really understand the new constraints, so I can correctly adapt the key and the restore/save steps in a situation with multiple jobs created through a matrix (see the sketch below). All jobs should share a cache and there should not be one per matrix element.

Thanks a lot.
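
A minimal sketch of the matrix pattern described above, assuming a save should only be attempted on a cache miss (the step id, key, and path are hypothetical):

      # Runs in every matrix job; all jobs share the same cache key
      - name: Restore shared cache
        id: cache
        uses: actions/cache/restore@v4
        with:
          key: deps-${{ hashFiles('**/lockfile') }}
          path: deps/
      # ... build steps that populate deps/ ...
      - name: Save shared cache
        # Only attempt a save when the restore step saw no exact hit
        if: steps.cache.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          key: deps-${{ hashFiles('**/lockfile') }}
          path: deps/

Even with this guard, parallel matrix jobs that all started before any cache existed can still race to save, which is exactly when the 409 warning appears; per the discussion above it is harmless.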

@Link-
Member

Link- commented Feb 18, 2025

@cderv - I looked at your setup: the cache is being saved. It is saved once, and any attempt to save it again after that will produce this warning.

The warning annotation is most likely what's causing the confusion, because the legacy service would not have set a warning annotation for a cache entry key collision. This is not a failure; your workflow jobs are finishing successfully.

Warning: Failed to save: Failed to CreateCacheEntry: Received non-retryable error: Failed request: (409) Conflict: cache entry with the same key, version, and scope already exists

The (409) Conflict is normal and expected behaviour. Cache entries are also immutable at the moment; there are no configurable parameters to change that behaviour.

I'll work with the team to reverse the warning annotation addition to prevent further confusion after I verify the behaviour with the former versions.

Am I misunderstanding the situation?


@cderv

cderv commented Feb 19, 2025

> Am I misunderstanding the situation?

I think you got it right.

> The warning annotation is most likely what's causing the confusion, because the legacy service would not have set a warning annotation for a cache entry key collision. This is not a failure; your workflow jobs are finishing successfully.

Yes, there is no failure. One of the jobs does the saving, and the other jobs try to save later on, but this warning is thrown from each of them as they indeed can't. This clutters the log with a lot of warnings, and it indeed created confusion even though everything seemed to work fine.

> The (409) Conflict is normal and expected behaviour.

That is good to know. This confirms my initial understanding. I will go on with my initial change.

> I'll work with the team to reverse the warning annotation addition to prevent further confusion after I verify the behaviour with the former versions.

It would really help not to have this warning thrown by default, especially if everything is going as expected. Maybe annotate in workflow debug mode only? Or do not create an annotation that goes up into the summary?

This indeed created confusion, especially as I did not see any way for each following job to detect that the cache has already been saved and to not try to save it.

Thank you.

@Link-
Member

Link- commented Feb 19, 2025

To conclude this report: the previous implementation of actions/cache also posted an annotation when it failed to upload a cache entry because the key already exists. The new version is just a bit more verbose.

I'll adjust the annotation text to match the former version and add some guidance.

(screenshot of the annotation from the former actions/cache version)

@cderv

cderv commented Feb 19, 2025

I did not remember this, and it was less prominent. It means it would still be good to find a way to avoid this warning, but that is another topic. Thanks for having looked into this!

@dime10

dime10 commented Feb 19, 2025

@Link- We have a case of the 409 error showing up even though there is no existing entry with that key. In fact, we cleared the entire cache and the error still shows up. This is the most recent run: https://github.com/PennyLaneAI/catalyst/actions/runs/13418034619/job/37483671023
Specifically, the Cache LLVM Build step, with the Linux-llvm-3a8316216807d64a586b971f51695e23883331f7-default-build-gcc key. There are no cache entries with that key currently present in the cache, nor have any been created since the entire cache was wiped.

(Note: while the Cache LLVM Source step shows the same warning, that one is legitimate, since a parallel job created that entry while this job was running.)

@Link-
Member

Link- commented Feb 19, 2025

@dime10 - let me review, thanks for sharing the run

@Link-
Member

Link- commented Feb 19, 2025

We identified the problem, thank you all for chiming in! The fix will be released in a day or two.

In short, when a cache entry is created, if for whatever reason it's not finalized, that slot is locked and will not be reused in future attempts to create a cache entry with the same key. Since the cache entry was never finalized, it will also not be served.

@yonitou

yonitou commented Feb 19, 2025

That seems to be exactly the issue :) After changing the name of the cache key I was using before (even though my cache was completely empty), the 409 errors disappeared.
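
For anyone stuck on such a locked slot, a minimal sketch of that workaround, assuming a key suffix bump is acceptable (the -v2 suffix and the key/path are purely illustrative):

      - name: Cache vendor
        uses: actions/cache/save@v4
        with:
          # Bumping the suffix moves the entry to a fresh key, away from the locked slot
          key: ${{ env.VENDOR_CACHE_KEY }}-v2
          path: vendor/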

dime10 added a commit to PennyLaneAI/catalyst that referenced this issue Feb 19, 2025
Circumvents upstream bug in GitHub caching infrastructure: actions/cache#1541 (comment)

@erlichmen

My case is a little different. I expect to get this warning because I have several parallel jobs, and the first one to complete will save the cache while the rest will receive the warning. I think this is a valid pattern.

In my opinion, this warning should be removed or at least moved to the debug level. I don't care that there is already a cache entry, and moreover, there is nothing I can do about it.

@MikeMcC399

@Link-

> We identified the problem, thank you all for chiming in! The fix will be released in a day or two.

Is this a backend change? Will you let us know when the fix has been released?

@Link-
Member

Link- commented Feb 24, 2025

> @Link-
>
> > We identified the problem, thank you all for chiming in! The fix will be released in a day or two.
>
> Is this a backend change? Will you let us know when the fix has been released?

This is a backend change and I'll update this issue once we deploy the fix.

@jozefizso

We have this issue on Windows Server and macOS runners in many different workflows.

@tking16

tking16 commented Feb 26, 2025

Similar issue here (macOS).

@yonitou

yonitou commented Feb 27, 2025

The issue just started to appear again... It's quite urgent to fix it, @Link-.

@yonitou

yonitou commented Feb 28, 2025

Sorry to post again, but we have been completely stuck since yesterday and I need to insist on this matter.
@Link-, I have a couple of questions:

  • When will this be fixed? We need something fairly precise, because it's breaking all our pipelines.
  • In our package.json, we are currently using "@actions/cache": "4.0.0" from https://github.com/actions/toolkit/tree/main.
    Can that be related? As I said, the problem was fixed for one week and came back yesterday around 13:00 UTC.

@MikeMcC399

@yonitou

You are commenting on an issue that is only about a warning, which does not prevent workflows from running. You may have a different issue.

The latest version of @actions/cache is 4.0.2 - see https://github.com/actions/toolkit/blob/main/packages/cache/RELEASES.md. Perhaps you should update to this version?

@yonitou

yonitou commented Feb 28, 2025

Hi @MikeMcC399, thanks for your quick answer.
If I understood the issue correctly, it's related to a bug behind the Failed to CreateCacheEntry log. Our error is exactly the same:

(screenshot of the same 409 Conflict warning)

The key mentioned in the error is not even in our cache. It's an error from the actions/cache backend resulting in a false positive and not saving the cache key.

Our workflows need the cache to perform some critical actions depending on the diff between two workflow runs.

If we update the package to 4.0.2, will that fix it?

@MikeMcC399

MikeMcC399 commented Feb 28, 2025

@yonitou

> If we update the package to 4.0.2, will that fix it?

According to #1541 (comment), a backend change is needed to fix the warning. The warning is not fixed by updating @actions/cache.

In your situation I would test updating to the latest version to see if it improves your workflow success.

@yonitou

yonitou commented Feb 28, 2025

Thanks @MikeMcC399
I just updated to 4.0.2 and the warning is still there; the cache key is not saved.
So, until the backend change is made, I have no solution? Do you have an ETA?

@MikeMcC399

@yonitou

> Do you have an ETA?

I am a community member like you, so I can't provide any ETA. I also doubt whether the backend fix will solve your problem. The fix is only to suppress the warning. That's all. You may need to open a new issue for your problem.

@cderv

cderv commented Feb 28, 2025

> the cache key is not saved.

Did you check your caches? You can use gh cache list.
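
For example, a minimal sketch of listing caches from a workflow step, assuming the GitHub CLI available on hosted runners (the same command can also be run locally in a checkout of the repository):

      - name: List existing caches
        run: gh cache list
        env:
          # gh needs a token when run inside a workflow
          GH_TOKEN: ${{ github.token }}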

The problem we have discussed so far in this thread is about the warning annotations produced when a workflow tries to save a cache with a key that has already been saved (by a concurrent workflow, for example).

See #1541 (comment) and #1541 (comment)

And a specific one related to the new cache backend, #1541 (comment) - maybe you are hitting that one.

> In short, when a cache entry is created, if for whatever reason it's not finalized, that slot is locked and will not be reused in future attempts to create a cache entry with the same key. Since the cache entry was never finalized, it will also not be served.

They say they will update this issue as soon as the backend is fixed (#1541 (comment)).

@yonitou

yonitou commented Feb 28, 2025

Thanks @MikeMcC399 for your help.
@cderv, I checked my cache multiple times. The cache key is not there. The slot is just locked, as you mentioned. That is exactly what I'm talking about.

@cderv

cderv commented Feb 28, 2025

> The slot is just locked, as you mentioned. That is exactly what I'm talking about.

OK. Sorry for the unnecessary repetition; I wasn't sure. Then we are waiting for the official update.

@Link-
Member

Link- commented Mar 6, 2025

@yonitou et al, the backend change is coming this week. We had to iterate on a couple of ideas and put the change in motion. In short, cache keys that are locked due to a failed upload will be cleared every 24 hours from the time they're created (after the backend fix finishes rolling out).

In the meantime, I recommend that you maintain some level of flexibility with your cache keys. As a best practice, cache availability should not be blocking for your workflows, even though we aim to provide the same level of stability and availability as workflow artifacts and packages.
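
One way to keep that flexibility is restore-keys, sketched here so a prefix match can still be used when the exact key is unavailable (the key names are hypothetical):

      - name: Restore dependencies
        uses: actions/cache/restore@v4
        with:
          # Exact key first, then fall back to the newest cache sharing the prefix
          key: deps-${{ hashFiles('**/lockfile') }}
          restore-keys: |
            deps-
          path: deps/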
