Bazel invalidates and rebuilds all actions from remote cache when their TTL expires #26140

Opened by @AlexanderGolovlev (Contributor)

Description of the bug:

During an incremental build with a remote cache, Bazel invalidates and rebuilds the actions taken from the cache once their TTL expires (after 3 hours with default settings). This is unexpected because the source code and environment have not changed since the previous build.

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I'm using the //src/main/cpp:client target from bazelbuild/bazel as an example.
First, perform a full rebuild to ensure that all actions are in the remote cache:

bazel clean
bazel build //src/main/cpp:client --remote_cache=http://127.0.0.1:12345/

INFO: Elapsed time: 241.181s, Critical Path: 29.06s
INFO: 1531 processes: 253 internal, 1278 local.
INFO: Build completed successfully, 1531 total actions

Rebuild the target with remote cache and TTL=5 min:

bazel clean
bazel build //src/main/cpp:client --remote_cache=http://127.0.0.1:12345/ --experimental_remote_cache_ttl=5m

INFO: Elapsed time: 6.717s, Critical Path: 2.87s
INFO: 1545 processes: 1276 remote cache hit, 267 internal, 2 local.
INFO: Build completed successfully, 1545 total actions

Wait 5 min.
Rebuild the target once again:

bazel build //src/main/cpp:client --remote_cache=http://127.0.0.1:12345/ --experimental_remote_cache_ttl=5m

Expected result:

Target //src/main/cpp:client up-to-date:
  bazel-bin/src/main/cpp/client.exe

Actual result:

INFO: Elapsed time: 2.980s, Critical Path: 0.36s
INFO: 1277 processes: 1276 remote cache hit, 1 internal.
INFO: Build completed successfully, 1277 total actions
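
To compare the runs above without reading the numbers by eye, the summary lines can be parsed mechanically. The following is a minimal sketch, not part of Bazel: the helper name `parse_process_summary` is hypothetical, and it assumes the summary line keeps the `INFO: N processes: ...` shape shown in the outputs above.

```python
import re

# Matches the summary line Bazel prints at the end of a build, e.g.
# "INFO: 1277 processes: 1276 remote cache hit, 1 internal."
_SUMMARY = re.compile(r"INFO: (\d+) process(?:es)?: (.+?)\.?$")


def parse_process_summary(line: str) -> dict:
    """Return {"total": N, "<kind>": count, ...} for one summary line."""
    match = _SUMMARY.search(line.strip())
    if match is None:
        raise ValueError(f"not a process summary line: {line!r}")
    counts = {"total": int(match.group(1))}
    for part in match.group(2).split(","):
        # Each part looks like "1276 remote cache hit": a count, then a kind.
        number, _, kind = part.strip().partition(" ")
        counts[kind] = int(number)
    return counts


if __name__ == "__main__":
    second = parse_process_summary(
        "INFO: 1545 processes: 1276 remote cache hit, 267 internal, 2 local.")
    third = parse_process_summary(
        "INFO: 1277 processes: 1276 remote cache hit, 1 internal.")
    # The third build was not preceded by 'bazel clean', yet it repeats
    # the same number of remote cache lookups as the second one.
    print(third["remote cache hit"] == second["remote cache hit"])
```

With the two summary lines from this report, the third build shows as many remote cache hits as the clean build before it, which is the re-execution being reported.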

Which operating system are you running Bazel on?

Windows

What is the output of bazel info release?

release 7.6.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?


If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

A similar issue was discussed in "Unusual Builds with Bytes" (BuildBuddy)
and fixed in "Don't download all artifacts with BwoB when their TTL expires" by fmeum (Pull Request #25398, bazelbuild/bazel), which was included in release 7.5.0.
However, that fix seems to address only the downloading itself, not the invalidation of the actions.

Any other information, logs, or outputs that you want to share?

Activity

fmeum (Collaborator) commented on May 23, 2025

In the issue description you mention unexpected "downloads", but the reproducer description doesn't mention any downloads. The "actual result" for the reproducer actually looks correct to me: Bazel has to rerun the actions to refresh the metadata.

If you really do see unexpected downloads of file contents as in the other issue I fixed, could you describe that in more detail for your reproducing example?

AlexanderGolovlev (Contributor, Author) commented on May 26, 2025

Hi @fmeum. Yes, I was in a hurry when writing the problem description; I meant invalidating and rebuilding the actions, not the downloading itself. I will fix the description.
Initially the problem was observed with Bazel 7.4.0, where we saw real downloads of files from the remote cache. When we found your fix, we tried the same scenario with Bazel 7.6.1, and we still saw the actions being rebuilt even though nothing had changed in the environment. This causes several issues.
Consider a software developer who works on code locally and builds it incrementally with Bazel. The whole solution may be large (say, 50 000 actions to build), but only a small part of the code is subject to change. Most actions are taken from the remote cache. In this case:

  • when the TTL expires, Bazel rechecks all those actions, although their source code has not changed. This increases the build time.
  • checking the AC (action cache) against the remote cache puts unnecessary load on the cache server.
  • if the AC entry is missing from the remote cache, the action is rebuilt locally although this might not actually be needed.
  • if a locally rebuilt action produces non-deterministic output (such as .pdb or .pch files), this invalidates the whole tree of actions that depend on it, which might significantly increase the build time.

The described behavior may be unexpected for users, because not everyone knows about the TTL feature, which is turned on by default with a 3h setting.
The workaround for the issues above is to disable the TTL functionality with the --experimental_remote_cache_ttl option.
We would expect the TTL feature to be implemented without invalidating actions just because they are taken from the remote cache. If that is not technically feasible, then we could consider disabling the feature by default or removing it altogether.

Title changed from "Bazel reloads all artifacts from remote cache when their TTL expires" to "Bazel invalidates and rebuilds all actions from remote cache when their TTL expires" on May 26, 2025

fmeum (Collaborator) commented on May 26, 2025

coeuvre (Member) commented on May 26, 2025

> The described behavior may look as unexpected for users, because not all people know about the TTL feature, which is turned on by default with 3h setting.

While the default TTL is 3h, that doesn't mean the cache entry expires after 3h. Bazel extends the TTL in the background each time a build is running. In theory, this means that if the next incremental build happens within 3h, the TTL of the cache entry is extended for another 3h.
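
The renewal behavior described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `RemoteOutputLease` and its `build` method are hypothetical names, time is measured in minutes since the first build, and none of this is Bazel's actual implementation. It models only the idea that each build renews the lease and that a build starting after the lease has run out treats the cached outputs as invalid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteOutputLease:
    """Toy model of a TTL lease on outputs taken from a remote cache."""

    ttl_minutes: float
    expires_at: Optional[float] = None  # None until the first build has run

    def build(self, now: float) -> str:
        """Simulate a build starting at minute `now` and report its outcome."""
        if self.expires_at is None:
            outcome = "fetched"        # first build: outputs come from the cache
        elif now < self.expires_at:
            outcome = "up-to-date"     # lease still valid: nothing to redo
        else:
            outcome = "invalidated"    # lease ran out: actions are rechecked
        # Assumption: every build, whatever its outcome, renews the lease.
        self.expires_at = now + self.ttl_minutes
        return outcome


if __name__ == "__main__":
    lease = RemoteOutputLease(ttl_minutes=5)
    for minute in (0, 3, 7, 13):
        print(minute, lease.build(minute))
```

In this model, builds at minutes 3 and 7 stay within a lease that keeps being renewed, while the build at minute 13 arrives after the lease renewed at minute 7 has run out. The reproducer in this issue corresponds to the last case: waiting the full 5 minutes with a 5m TTL.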

> We would expect that TTL feature was implemented without invalidating the actions just because they are taken from remote cache.

This is not true. REAPI doesn't have a way for the server to communicate the TTL to Bazel, so we have to implement it the current way.

> If it is not technically feasible, then we could consider disabling the feature by default, or removing the feature at all.

One way to improve this is to not invalidate the cache entry when the TTL expires. We can rely on build rewinding to recover from cache eviction because, technically, the cost of build rewinding is roughly the same as that of invalidating cache entries. We still want to keep the TTL because we can extend the lease in the background to help the remote server understand that these blobs are still in use. WDYT @fmeum?

fmeum (Collaborator) commented on May 26, 2025

In the short term, relying on build rewinding seems reasonable to me. It avoids eager invalidation but should be equally capable of resolving any issues that arise due to eviction.

In the mid to long term, action rewinding should avoid this wholesale invalidation altogether, and the default TTL value would become less relevant, which is good because many users won't be aware of it.


Labels

P2 (We'll consider working on this in future. Assignee optional), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: bug


Participants

@brentleyjones, @fmeum, @coeuvre, @joeleba, @AlexanderGolovlev
