Skip to content

Don't eagerly invalidate remote metadata when its TTL expires #26318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Jun 17, 2025

Build and action rewinding can recover from metadata whose corresponding contents are no longer available remotely, but are usually more efficient since they only invalidate action cache entries if the contents really are gone, not simply because their TTL expired.

Fixes #26140

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 18, 2025

@tjgq This implements #26140 (comment), which breaks the existing TTL test cases in the BwoB integration test. I tried to modify them so that they verify the new behavior of the lease extension (deleting known stale action cache entries), but the lack of synchronization makes it difficult to assert on its effect. Can I just delete the tests?

@tjgq
Copy link
Contributor

tjgq commented Jun 18, 2025

cc @coeuvre

@coeuvre
Copy link
Member

coeuvre commented Jun 23, 2025

@tjgq This implements #26140 (comment), which breaks the existing TTL test cases in the BwoB integration test. I tried to modify them so that they verify the new behavior of the lease extension (deleting known stale action cache entries), but the lack of synchronization makes it difficult to assert on its effect. Can I just delete the tests?

How about enabling build rewinding for these tests and assert build succeeded?

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 23, 2025

How about enabling build rewinding for these tests and assert build succeeded?

That unfortunately doesn't work because the tests aren't full integration tests and don't go through BlazeCommandDispatcher. I could possibly catch the error and verify that it is one that would trigger build rewinding.

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 23, 2025

@coeuvre I adapted the tests by simulating the retry.

I think that the metadata stored in FileArtifactValue becomes irrelevant now: The only part that is ever checked is whether it is non-null. Should we keep it around or just assume that all remote files are subject to a TTL?

fmeum added 4 commits June 23, 2025 17:29
Build and action rewinding can recover from metadata whose corresponding contents are no longer available remotely, but are usually more efficient since they only invalidate action cache entries if the contents really are gone, not simply because their TTL expired.
@fmeum fmeum force-pushed the 26140-no-ttl-invalidation branch from cda58be to 77e4be6 Compare June 23, 2025 15:29
@fmeum fmeum requested a review from coeuvre June 23, 2025 15:32
@fmeum fmeum marked this pull request as ready for review June 23, 2025 16:14
@fmeum fmeum requested a review from a team as a code owner June 23, 2025 16:14
@github-actions github-actions bot added team-Performance Issues for Performance teams team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Jun 23, 2025
buildTarget("//a:bar");
waitDownloads();

// Assert: target was successfully built
assertValidOutputFile("a/bar.out", "foo\nupdated bar\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Keep this assert. (I will do the change during import)

@coeuvre
Copy link
Member

coeuvre commented Jun 24, 2025

I think that the metadata stored in FileArtifactValue becomes irrelevant now: The only part that is ever checked is whether it is non-null. Should we keep it around or just assume that all remote files are subject to a TTL?

Let's keep it around for now to make this change simple.

@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Jun 24, 2025
@fmeum fmeum deleted the 26140-no-ttl-invalidation branch June 24, 2025 20:24
@fmeum
Copy link
Collaborator Author

fmeum commented Jun 24, 2025

Let's keep it around for now to make this change simple.

I can send a follow-up to remove it if you think that's the right move. We can also wait for this commit to settle for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Performance Issues for Performance teams team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bazel invalidates and rebuilds all actions from remote cache when their TTL expires
3 participants