[WIP] prototype for negative caching in StoreCache #4042

Open
espg wants to merge 2 commits into zarr-developers:main from espg:feat/cache-store-negative-caching

espg commented Jun 5, 2026

Adds negative caching to zarr.experimental.cache_store.CacheStore: when enabled, a full-key read that finds the key absent in the source store is remembered, so subsequent reads of that absent key return None immediately without a source round-trip. The remembered miss is evicted when the key is later written. Enabled by default (cache_missing=True); pass cache_missing=False to keep the previous behavior. Follows from discussion on #4028.

Motivation

CacheStore caches present values only. On a full-key miss it deletes any stale entry and stores nothing, so a key absent in the source is a permanent cache miss — every read re-pays a source round-trip. This is the dominant cost when reading sparse arrays through a CacheStore: most chunks are empty, and the positive cache structurally cannot help (there is no value to store, and "not in cache" is indistinguishable from "not cached yet"). Negative caching closes that gap.

It is intentionally narrow: it benefits the stock arr[:] path (which probes every chunk) read repeatedly through a CacheStore. Code using the #4028 discovery primitives never issues the empty-chunk reads in the first place and does not need this.

API

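from zarr.experimental.cache_store import CacheStore
from zarr.storage import MemoryStore

# Stand-ins for illustration; in practice source_store is usually a slow or remote store
source_store = MemoryStore()
cache_backend = MemoryStore()

cached = CacheStore(
    source_store,
    cache_store=cache_backend,
    # cache_missing=True is the default; pass False to disable
    max_age_seconds=300,  # recommended: bound staleness of remembered misses
)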
  • cache_missing: bool = True — remember full-key misses (opt-out).
  • cache_stats() gains negative_hits; cache_info() gains cache_missing and missing_keys.

No new bounding parameter is introduced: remembered misses are bounded by the existing max_age_seconds, mirroring how the positive cache is bounded by max_size.

Design

  • Store-level, key-based. CacheStore wraps a whole store and sees opaque keys (no chunk-grid knowledge), so negative knowledge is tracked per full key in a small dict[str, float] (key → insert time). Negative entries carry no bytes and are kept out of the max_size byte budget, so they never evict real cached data.
  • TTL'd. Remembered misses respect max_age_seconds, so a key written to the source out-of-band becomes visible again after expiry. Like the positive cache (unbounded when max_size is None), the negative cache is bounded only by max_age_seconds; with an infinite TTL a scan over a very large sparse key space accumulates one small entry per absent key, so set a finite TTL (or cache_missing=False) for such workloads. This is called out in the docstring.
  • Write-eviction. set and an overridden set_if_not_exists drop any remembered miss for the key. delete does not create one (a delete is a mutation, not a checked-absence read).
  • Scope. Full-key reads only — byte-range misses and exists() are unchanged. exists() deliberately does not consult the negative cache (the default set_if_not_exists calls exists then set; a stale "missing" there could overwrite present data).
  • Stats. A negative hit is reported separately as negative_hits and counts as neither a hit nor a miss, so the positive hit_rate is unaffected.
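
The design points above can be illustrated with a minimal, standard-library-only sketch. This is a toy model, not the CacheStore implementation: the class name NegativeCachingStore, the plain-dict source, the injectable clock, and the source_gets counter are all assumptions made for illustration, and the positive cache here ignores max_size and TTL to keep the example short.

```python
import time
from typing import Callable, Dict, Optional


class NegativeCachingStore:
    """Toy read-through cache that also remembers absent keys (hypothetical)."""

    def __init__(
        self,
        source: Dict[str, bytes],
        max_age_seconds: float = float("inf"),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.source = source
        self.cache: Dict[str, bytes] = {}      # positive cache: key -> value
        self.missing: Dict[str, float] = {}    # negative cache: key -> insert time
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.hits = 0
        self.misses = 0
        self.negative_hits = 0
        self.source_gets = 0                   # counts round-trips to the source

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        recorded = self.missing.get(key)
        if recorded is not None:
            if self.clock() - recorded < self.max_age_seconds:
                # Remembered miss: neither a hit nor a miss, and no source read.
                self.negative_hits += 1
                return None
            del self.missing[key]              # remembered miss has expired
        self.misses += 1
        self.source_gets += 1
        value = self.source.get(key)
        if value is None:
            self.missing[key] = self.clock()   # remember the absence
        else:
            self.cache[key] = value
        return value

    def set(self, key: str, value: bytes) -> None:
        self.source[key] = value
        self.cache[key] = value
        self.missing.pop(key, None)            # write-eviction of the remembered miss

    def delete(self, key: str) -> None:
        self.source.pop(key, None)
        self.cache.pop(key, None)
        # Deliberately does not record a miss: a delete is a mutation,
        # not a checked-absence read.


store = NegativeCachingStore(source={})
store.get("c/0/0")            # goes to the source, absence is remembered
store.get("c/0/0")            # answered from the negative cache
store.set("c/0/0", b"data")   # drops the remembered miss
print(store.source_gets, store.negative_hits, store.get("c/0/0"))
```

In this sketch the second read never reaches the source, and the remembered miss disappears as soon as the key is written, which is the behavior the bullets describe for full-key reads.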

Correctness notes

  • TTL staleness: with the default max_age_seconds="infinity" a remembered miss never expires, so a key written by another process stays invisible through the cache until eviction-on-write. Pair cache_missing=True with a finite max_age_seconds when the source may be written concurrently.
  • TOCTOU window: the source get runs outside the state lock, so a concurrent set can land between the source returning None and the miss being recorded. This is the same window the positive cache already has; it is TTL-bounded and self-heals. Documented as a known limitation rather than over-engineered away.
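
The TTL staleness note can be made concrete with a small sketch that uses explicit timestamps instead of a real clock. The helper read_through and its parameters are hypothetical names for illustration only; they model the remembered-miss bookkeeping, not the CacheStore code path.

```python
from typing import Dict, Optional, Tuple


def read_through(
    key: str,
    source: Dict[str, bytes],
    missing: Dict[str, float],
    now: float,
    max_age_seconds: float,
) -> Tuple[Optional[bytes], bool]:
    """Return (value, went_to_source) for one full-key read (hypothetical helper)."""
    recorded = missing.get(key)
    if recorded is not None and now - recorded < max_age_seconds:
        return None, False                 # remembered miss still fresh
    missing.pop(key, None)                 # expired or never recorded
    value = source.get(key)
    if value is None:
        missing[key] = now                 # remember the absence
    return value, True


source: Dict[str, bytes] = {}
missing: Dict[str, float] = {}

first = read_through("k", source, missing, now=0.0, max_age_seconds=300)
source["k"] = b"x"                         # out-of-band write by another process
stale = read_through("k", source, missing, now=10.0, max_age_seconds=300)
fresh = read_through("k", source, missing, now=300.0, max_age_seconds=300)
print(first, stale, fresh)
```

With a finite TTL the out-of-band write becomes visible once the remembered miss expires; with max_age_seconds set to infinity the stale branch would be taken on every read until the key is written through the cache.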

Testing

tests/test_experimental/test_cache_store.py — new TestCacheStoreNegativeCaching: enabled-by-default and cache_missing=False disable, basic negative hit (asserts the source is hit exactly once via monkeypatch), eviction on set and set_if_not_exists, TTL expiry with an out-of-band source write, byte-range reads unaffected, stats/info surfacing, and delete does not record. The existing test_cache_info key-set assertion is updated for the two new info keys. Full suite: 54 passed; ruff, mypy (strict), and numpydoc clean.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

codecov Bot commented Jun 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.59%. Comparing base (fe22910) to head (5ff0af0).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4042      +/-   ##
==========================================
+ Coverage   93.55%   93.59%   +0.04%     
==========================================
  Files          88       88              
  Lines       11896    11926      +30     
==========================================
+ Hits        11129    11162      +33     
+ Misses        767      764       -3     
Files with missing lines Coverage Δ
src/zarr/experimental/cache_store.py 92.07% <100.00%> (+3.86%) ⬆️

... and 1 file with indirect coverage changes
