v3.8.14
Added
- Write-time memory admission — dedup-merge + salience floor (gitlab #969/#970).
A capped knowledge store used to fill with paraphrases of facts it already held,
forcing eviction to drop a good fact to make room for a near-duplicate. The
agent-facing ctx_knowledge remember path now runs a server-side admission gate
(ProjectKnowledge::remember_admitted) before committing: a new value that is
≥ auto_merge_similarity (word-Jaccard, default 0.9) to an existing
same-category fact under a different key is merged into it (a confirmation bump, no
new row), and a value whose content salience falls below min_salience (default
0 = off, lossless) is rejected with a clear reason. Internal restorers (archive
rehydrate, cognition auto-promotion) keep using the ungated remember, so
admission only disciplines fresh agent writes. Same-key confirm/supersede
(contradictions) is untouched. Tunable via [memory.admission] /
LEAN_CTX_ADMISSION_{ENABLED,MERGE_SIMILARITY,MIN_SALIENCE}.
- Cluster compaction — collapse low-value fact piles into recoverable digests
(gitlab #969/#971). Decay + the cap kept a busy store churning at 100% but
never actually shrank it. A new cognition-loop step (8c, hourly, lean-ctx-driven)
collapses a same-category cluster of faded (< max_confidence), barely-confirmed
(<= max_confirmations), never-frequently/recently-retrieved facts — at least
min_cluster of them (default 4) — into a single content-addressed digest fact,
archiving the originals so they rehydrate on recall. Digests and synthesized
summaries are never re-compacted. The digest key/value are byte-stable functions
of the cluster (#498). Surfaced as compacted= on the cognition-loop report.
Tunable via [memory.compaction] / LEAN_CTX_COMPACTION_*; runs only in the
background loop, never on the remember hot path.
- Self-curating memory defaults + actionable capacity guidance (gitlab #969/#972).
prune_unretrieved_after_days now defaults to a conservative, recoverable
90 days (was off), so genuinely cold single-confirmation facts are archived
instead of accumulating. lean-ctx doctor capacity warnings are no longer a dead
end: a store at its cap now prints that this is healthy by design (eviction
holds it there) and which lever to pull, while an over-cap CRIT tells the
operator to run the cognition loop or raise the cap.
- Read-cache re-delivery telemetry (gitlab #953). Turns the subjective
"re-reads feel unreliable" signal into data: every event that drops a
fully-delivered cache entry — forcing the next read to re-send the whole file
instead of the cheap [unchanged] stub — increments a process-global counter
grouped by cause (compaction, idle, eviction, conversation), surfaced
as a "re-deliveries forced:" line in ctx_cache status. The counters live only
in that diagnostic, never in a cacheable tool-output body, so output
determinism (#498) is preserved. Pure measurement — no behavioral change.
- Persistent, conversation-scoped [unchanged] stub index — survives daemon
restarts and idle clears (gitlab #955). The in-memory read cache is wiped on
every daemon restart and emptied by the idle-TTL clear, so until now the first
unchanged re-read afterwards re-delivered the whole file — the single biggest
remaining source of the "re-reads aren't reliable" feeling. A new focused
module core::read_stub_index persists the minimal bookkeeping needed to emit
the ~13-token stub — {path, md5, mtime, line_count, file_ref, delivered_conversation}, never the content — to
{data_dir}/read_cache/stub_index.json (atomic tmp+rename, LRU-capped at 1024
records). It is write-through on every full delivery, flushed on the
batch/idle/shutdown save cadence, and rehydrated at startup, so a re-read of an
unchanged file in the same conversation now collapses to the stub even across
a restart. Correctness is gated harder than the warm path: a cold stub (no live
entry) is served only when the file's mtime and md5 still match disk and
the current conversation equals the delivering one
(conversation::conversation_allows_cold_stub — no "no-context → legacy"
escape, because across a process boundary an unknown conversation cannot prove
the content is in context; this keeps #954's cross-chat hazard closed). A host
compaction drops the whole index synchronously (the conversation's context was
summarised away), mirroring SessionCache::reset_delivery_flags. Content is
always re-read from disk — only delivery bookkeeping persists — so tool-output
determinism (#498) is untouched. Side benefit: because the index outlives the
idle clear, same-conversation re-reads after idle no longer re-deliver either.
Kill-switch LEAN_CTX_STUB_PERSIST=0.
- Deterministic JSON crusher core — core::json_crush (gitlab #934/#935,
Headroom "Smart Crusher" port). Real JSON payloads (API responses, kubectl get -o json, DB dumps, RAG chunks) are dominated by arrays of objects that
repeat the same keys and values on every row. The new single-source module
factors that redundancy out: crush_lossless hoists every key present in all
items of an array to its dominant value (a _defaults block) and keeps only
per-item deviations, so it is exactly reconstructible via reconstruct;
crush_lossy additionally records near-unique high-entropy columns
(timestamps/UUIDs) in _dropped for out-of-band CCR recovery. Output is a pure
function of the input Value — no timestamps, counters, randomness, or hash-map
order leakage (candidate keys walk a BTreeSet, value frequencies a BTreeMap)
— and it never inflates (a no-op returns None). This is the deterministic,
byte-stable answer to Headroom's statistical crusher (#498).
- Opt-in lossless JSON crushing for verbatim data commands (gitlab #936). A
new crush_verbatim_json config key (env LEAN_CTX_CRUSH_VERBATIM_JSON, default
off) lets the array-heavy JSON of otherwise byte-verbatim data commands
(gh api, jq, kubectl get -o json, curl JSON) flow through the lossless
crusher when it at least halves the payload. Off by default keeps those outputs
verbatim; on, they are reshaped into a compact, fully reconstructible form and
never lose a datum. The gate is a pure, unit-tested function and only ever
touches Verbatim data commands — Passthrough (auth flows, dev servers,
streaming) is never reshaped.
- Active prompt-cache breakpoint injection for Anthropic (gitlab #939,
Headroom "cache aligner" adjacent). A new opt-in cache_breakpoint proxy
config key (env LEAN_CTX_PROXY_CACHE_BREAKPOINT, default off) makes the
proxy add a single cache_control: {type:"ephemeral"} breakpoint to the
system field of Anthropic requests only when the client set none of its
own — so a raw API client's large, stable system prompt bills later turns at
the cached rate instead of full price every turn (the cache win it left on the
table). It is Anthropic-only by construction: OpenAI and Gemini cache prefixes
automatically and ignore the marker, so those paths stay byte-unchanged. The
injection is deterministic (a pure function of the body, so the prefix it
creates is itself byte-stable, #498), never adds a second breakpoint (it defers
to any client cache_control and to a client-cached message prefix), and is
skipped below Anthropic's minimum cacheable size so it never churns bytes for no
cache. It runs even on an otherwise meter-only/byte-passthrough proxy (the one
sanctioned mutation), and every injection is counted on a dedicated
breakpoints_injected gauge in /status cache_safety — a pure win signal,
never against the cache-safe ratio.
- Cache-aligner volatile-field telemetry (gitlab #940, Headroom "cache aligner"
stage 1, telemetry-first). A single volatile token in an otherwise-stable
system prompt — today's date, a fresh UUID, a git SHA — shifts the prefix bytes
and busts the provider cache on every turn. A new opt-in cache_aligner proxy
config key (env LEAN_CTX_PROXY_CACHE_ALIGNER, default off) makes the proxy
scan each unanchored Anthropic system prompt for those fields and report how
many it found on /status cache_safety (volatile_system_requests,
volatile_fields_detected), so a user can quantify how much prompt-cache their
prompt leaks. The scan is measurement only — the request body is never
mutated, so it stays strictly cache-safe — and deterministic (matches are
collected, sorted, and overlapping spans merged, so a full timestamp counts
once). This is the honest precursor to an opt-in tail-relocate, which is
deliberately deferred until the data shows it pays.
- Retrieve-coupled CCR learning (gitlab #941, Headroom CCR "learning" port).
When an agent keeps pulling back originals the inline compressed form dropped,
that is direct evidence the compression was too aggressive. LoopDetector now
tracks ctx_expand/ctx_retrieve re-fetches in a dedicated sliding-window
counter (retrieve_count, alongside the existing correction counter), exposed
as the ccr_retrieve_rate anomaly metric. The session auto-degrade now reacts
to the stronger of the two pressures (correction loops and CCR retrieves) and
recovers only when neither fires — so a session that over-retrieves dials
compression down to Lite (>=3) then Off (>=5) for itself. The level is
server state that feeds future CompressionLevel::effective() decisions, never
part of any tool output body, so output determinism (#498) is preserved.
- Model-free JSON-crush accuracy gate (gitlab #942). A new Condition::JsonCrush
arm in the deterministic A/B eval harness (core::eval_ab) routes JSON/JSONL
through json_crush instead of whitespace-only compaction, and a committed
JSON-QA fixture (a redundant operator roster with one outlier field) plus the
gate json_crush_condition_preserves_answer_and_beats_baseline prove — with no
live model — that the crush keeps every gold answer while packing it in strictly
fewer tokens than the raw baseline. This is the deterministic accuracy floor of
the "crushed >= raw" claim, guarding against a future over-aggressive change.
- Per-upstream proxy compression stats + ChatGPT Codex support (#582). The
proxy /status and lean-ctx proxy status now break compression down per
upstream — Anthropic, OpenAI, ChatGPT, Gemini — each with its own request /
byte / token-saved counters, so you can see exactly where the savings come
from. The split is purely additive: the existing top-level totals are
unchanged, and an unknown label is still counted in the totals but never
misattributed to a bucket. ChatGPT Codex traffic
(/backend-api/codex/responses) is recorded under its own ChatGPT label
while reusing the OpenAI Responses compression, usage, introspection and
holdout paths, and JSON-encoded tool-result envelopes inside Responses output
are now compressed/pruned without dropping items or breaking function_call /
function_call_output pairing (shrink-only, respects should_protect). The
research-prose squeeze cap is tunable via LEAN_CTX_RESEARCH_PROSE_CAP
(default 20000). Thanks to community contributor @ousatov-ua.
- Self-observability + self-curation tooling (gitlab #959–#964). A cluster of
measurement-first additions that let lean-ctx report on — and tune — its own
context footprint: a doctor injected-context linter plus a budget-gated
per-session overhead report (#960/#964); a health per-tool value signal that
recommends disabling tools that never earn their tokens (#961); knowledge-decay
pruning and an ACTIVE-SESSION token budget so the injected session block stays
bounded (#962); a shadow-minimal rules block that trims re-teaching (#963); and
a deterministic footprint delta-eval harness for injected context (#959). All
are diagnostic/state-only — no tool-output body changes — so output determinism
(#498) is preserved.
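
As an illustration of the write-time admission gate described in the first Added entry, the following Python sketch shows how a word-Jaccard merge check could work. It is not the project's Rust implementation: the tokenization (lowercased whitespace split), the shape of the `facts` dictionary, and the names `word_jaccard` and `admit` are assumptions made for this example. Only the 0.9 default threshold and the same-category, different-key rule come from the entry itself. The salience floor is left out because these notes do not say how salience is computed.

```python
def word_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased, whitespace-separated word sets.

    The tokenization is an assumption; the real gate may tokenize differently.
    """
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0  # two empty values are treated as identical
    return len(wa & wb) / len(wa | wb)


def admit(facts, category, key, value, merge_similarity=0.9):
    """Decide what a gated write should do (hypothetical helper).

    `facts` maps (category, key) -> value. Returns ("merge", existing_key)
    when a same-category fact under a different key is similar enough,
    otherwise ("insert", key).
    """
    # Sorted iteration keeps the decision independent of dict insertion order.
    for (cat, existing_key), existing_value in sorted(facts.items()):
        if cat != category or existing_key == key:
            continue  # other categories and same-key writes are not merge candidates
        if word_jaccard(value, existing_value) >= merge_similarity:
            return ("merge", existing_key)
    return ("insert", key)
```

Under these assumptions a paraphrase that reuses the same words under a new key resolves to a merge into the existing fact, while a same-key write falls through to the untouched confirm/supersede path.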
Changed
- json_schema::compress is now crush-backed (gitlab #936). The generic JSON
fallback (and the jq route) prefers the lossless json_crush form over the
value-dropping schema outline whenever the array is redundant enough to at least
halve the payload — keeping every datum reconstructible instead of collapsing it
to a structure-only sketch. Heterogeneous or low-redundancy arrays still fall
through to the compact schema outline (unchanged), so there is no regression for
those. curl's top-level array-of-objects path now defers to the same shared
core instead of its useless [object(NK); N] summary, converging the generic
JSON handling on one implementation (docker inspect and the aws
resource summarizers stay intentionally domain-specific). PATTERN_ENGINE_VERSION
is bumped (1→2) so determinism consumers detect the new output shape.
- ctx_read aggressive mode compacts JSON structurally (gitlab #936). Reading
a .json file in aggressive mode (the auto-resolved mode for large non-code
data files) now routes redundant array-of-object payloads through the lossless
json_crush core instead of generic text pruning, which mangles JSON structure.
It fires only when the crush at least halves the file and shrinks the token
count; the exact bytes stay recoverable with a full/raw re-read. map mode
stays a compact structural overview (unchanged). The "must at least halve"
gate is centralized in json_crush::{crush_value_if_beneficial, crush_text_if_beneficial} (one KEEP_DATA_DIVISOR), so the shell (json_schema,
curl) and read paths can never drift.
- Unified, surgical CCR retrieve path across the whole tee store (gitlab #938).
ctx_expand now resolves every content-addressed original through one resolver
with a fixed precedence: proxy prune/live stubs (proxy_<hash>), the JSON
crusher's lossy originals (json_<hash>), AND every compressed shell command's
already-teed verbatim output (<slug>_<8hex>.log) — before the reference
(ref_) and archive (hex) stores. So an agent can pull back just the slice it
needs (head/tail/search/json_path/range) from any of them instead of
re-reading the whole file; the high-compression shell footer now advertises the
ctx_expand slice form. The resolver trusts only the file name and always
rebuilds the path under {state}/tee/ (no traversal). Opt-in verbatim JSON
crushing (crush_verbatim_json) gains a lossy stage 2: when the lossless reshape
does not pay, it drops near-unique high-entropy columns (timestamps, UUIDs) and
persists the verbatim original under json_<hash>, embedding a content-addressed
ctx_expand handle so a dropped datum is never irrecoverable.
- ctx_search absorbs ctx_semantic_search and ctx_symbol (#509). Search
collapses to a single action-routed ctx_search: an action argument
(regex default, semantic, symbol, reindex, find_related) routes to
the same engines as before, and a missing action is inferred so existing
calls keep working. The two former tools become deprecated aliases — hidden
from tools/list but still callable for one release — which trims the
advertised surface (Standard 17→15 tools, Minimal 6→5) so a model picks the
right search on the first try. Underlying search behavior is unchanged; this is
the final step of the #509 read/search consolidation begun in 3.8.12/3.8.13.
- Parallel BM25 index build and incremental rebuild (gitlab #933, #581). The
full index build now tokenizes across a rayon pool and merges deterministically
(#933); the edit-loop incremental rebuild — changed/new/removed files on a warm
index — does the same (#581). Both paths are byte-for-byte identical to the
sequential result (covered by determinism tests and a CI build-time regression
gate), so first-index and reindex-after-edit are faster with no change to what
search returns. Credit to the #581 reference work by @ousatov-ua.
- Generated dependency lockfiles are excluded from the index (#585). npm/pnpm
lockfiles (package-lock.json, npm-shrinkwrap.json, pnpm-lock.yaml) carry
ingestible .json/.yaml extensions and used to slip into the index, where a
retrieval surface (ctx_compose, BM25 search) would inline a large
auto-generated dependency pin — a pure token sink. They are now dropped at the
ingestion front-door via a new non-ingestible IngestKind::Generated, joining
the *.lock/*.lockb files already excluded there (the scattered "lock"
extension check is removed so detection lives in one place). Detection is by
file name, so it is depth-independent — a monorepo's
frontend/package-lock.json is caught too, unlike a root-anchored ignore glob.
An explicit ctx_read/ctx_tree/ctx_glob of a lockfile is unaffected.
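
The name-based lockfile detection in the last Changed entry can be sketched as below. This is a Python illustration under stated assumptions, not the project's code: the constant names and the `is_generated_lockfile` helper are hypothetical. The file names and the `*.lock`/`*.lockb` suffixes are the ones listed in the entry.

```python
from pathlib import PurePosixPath

# File names taken from the changelog entry; the constant names are hypothetical.
GENERATED_NAMES = {"package-lock.json", "npm-shrinkwrap.json", "pnpm-lock.yaml"}
GENERATED_SUFFIXES = (".lock", ".lockb")


def is_generated_lockfile(path: str) -> bool:
    """Classify by file name only, so the directory depth does not matter."""
    name = PurePosixPath(path).name
    return name in GENERATED_NAMES or name.endswith(GENERATED_SUFFIXES)
```

Because only the final path component is inspected, a nested `frontend/package-lock.json` is classified the same way as one at the repository root, which is the property the entry contrasts with a root-anchored ignore glob.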
Fixed
- CI on main was red on all three Test jobs — a stale source-grep test
(gitlab #957). scenario_server_degrade_thresholds asserted the dispatch
source literally contains("correction_count >= 5") etc.; the #941
retrieve-coupled refactor renamed that to pressure = correction_count.max(retrieve_count), so the literals vanished and the assertion failed on
every platform (the rest of CI stayed green). Replaced the brittle grep with a
behavioral test backed by a new pure, total CompressionLevel::degrade_action
(Set/Clear/Leave) extracted from the dispatch — runtime behavior is
unchanged (5+ → Off, 3+ → Lite, 0 → clear, 1–2 → hold), but the threshold table
is now unit-tested and immune to internal renames.
- Subagents force-freshed every read, so re-reads were never cached inside a
Task (gitlab #956, closes the #952 series). is_subagent_context() set
effective_fresh = fresh || subagent, a blanket cold full read for the whole
subagent run — safe (a subagent must not be served a stub for content only the
parent received) but it threw away exactly the cheap [unchanged] re-read
that #946/#954/#955 reclaimed. Now that the stub is conversation-scoped, the
safety is enforced precisely instead of by bypass: a subagent runs under its
own task:{CURSOR_TASK_ID} scope (conversation::current_conversation_id), so
the stub gate withholds any stub the parent or a sibling delivered (distinct,
non-None scope → never matches), while the subagent's own re-reads of an
unchanged file collapse to the stub. The blanket force-fresh now applies only
when scoping is off (LEAN_CTX_CONVERSATION_SCOPE=0); an explicit
LEAN_CTX_FORCE_FRESH=1 still always forces fresh. Stubs stay double-gated
(mtime+md5 vs disk and conversation match), so a subagent is only ever
stubbed for a file it read itself, unchanged — never stale, never cross-agent.
- auto-mode re-reads bypassed the [unchanged] cache stub and re-delivered
the whole file (gitlab #946). The cheap ~13-token re-read stub
(Fref=path [unchanged NL]) only fired for an explicit mode=full re-read;
in the default auto mode a re-read of an unchanged, already fully-delivered
file re-sent the entire body — the "re-reads aren't cached / reliability is
worse than before" regression. Cause: ctx_read resolved auto with
cache: None, so the resolver's unit-tested unchanged + full_delivered → ("full","cache_hit") short-circuit was dead code on the real read path (a
silent divergence from ctx_smart_read, which threaded the cache correctly;
introduced by the #683 deterministic cascade). resolve_auto_mode is now
cache-aware, the warm path routes an auto→full cache-hit through the same
try_stub_hit_readonly stub as an explicit full re-read, and the registered
read-lock fast path accepts auto too (self-guarded by the stub).
Compressed-first files still serve their cached compressed output on re-read — no wrong
escalation to full. Regression test
auto_reread_of_fully_delivered_file_serves_unchanged_stub.
- The [unchanged] re-read stub was not conversation-scoped — a file
delivered in one chat could be stubbed for a re-read in another (gitlab #954).
The read SessionCache is shared across every chat served by one daemon, but
the stub asserts "you already have this in context" — true only within the
conversation that received the full content. A re-read from a different chat on
the same daemon could therefore receive Fref=path [unchanged NL] for content
it never saw (the idle-TTL clear only incidentally masked it). Each entry now
records the delivered_conversation (resolved from the live Cursor
conversation_id that hooks write to active_transcript.json), and
try_stub_hit_readonly serves the stub only when the current conversation
matches; a mismatch re-delivers in full and is counted by the new re-delivery
telemetry (#953). With no conversation context (hooks absent) it falls back to
the legacy process-scoped behavior, so single-chat hit rates are unchanged and
byte-stable (#498). The conversation gate is a pure, unit-tested function
(conversation::conversation_allows_stub) injected into the stub path for
deterministic, host-independent tests. Kill-switch
LEAN_CTX_CONVERSATION_SCOPE=0.
- ctx_impact missed Go and Kotlin same-package blast radius (#398 bug class).
The C#/Java fix in 3.8.13 closed one instance of a general gap: any language with
implicit same-package visibility references project types with no import, so
import edges alone leave the consumed type a false-negative leaf. For Go the
miss was total — same-package is same-directory and fully import-free, so changing
a struct used by a sibling file reported "no impact". core::type_ref_edges now
resolves Go usages directory-scoped and strict (a common name like
Config/Server declared in many packages still resolves to the one true
same-package definer, with no cross-package leak) and Kotlin usages by
declared package, both durable through the graph_index mirror and emitted by the
ctx_impact builder. The old coarse Go package heuristic — one arbitrary
same-directory edge per file, silently parsed as a top-weight imports edge in
the mirror — is removed: it both missed the real consumer and pulled
non-consumers (e.g. an unrelated logger.go) into the blast radius. Precise
type_ref edges replace it, and a genuinely unused file now falls to the standard
low-weight sibling rescue like every other language. Per-language scope is
centralized in one resolve_scope (previously the namespace logic was duplicated
across three call sites). GRAPH_ENGINE_VERSION is bumped (3→4) so stale graphs
self-heal. (gitlab #920–#924)
- Project-root resolution unified for search and the MCP path jail (#580,
#948). An index built at the git root but searched from a sub-directory
resolved to a different namespace hash and returned zero hits; separately, an
MCP server launched from an agent-config directory (.copilot / .cursor /
.windsurf / .gemini) adopted that directory as the project root and then
rejected in-tree reads with "path escapes project root". A single
git-promotion resolver is now the one source of truth for the root, an explicit
sub-directory becomes a result filter rather than its own namespace, and an
agent-config CWD auto-reroots to the real project. PathJail enforcement is
unchanged — only root derivation is corrected. Adopted from reference PR #581
by @ousatov-ua.
- lean-ctx call ctx_tools … panicked on the CLI call path (#583). Invoking
the ctx_tools meta-tool from the CLI crashed with "there is no reactor
running" because the runtime was resolved via Handle::current(), which only
exists on the MCP path (handlers there run inside block_in_place). It now
uses Handle::try_current(): the ambient handle is reused on the MCP path and
a one-shot runtime is built on the CLI path. Pure control-flow fix — MCP
behavior and output bytes are unchanged.
- ctx_shell could silently drop output when a child held the pipe open
(gitlab #945). A process that kept the write end of the pipe open past its
own exit truncated the captured output; the reader now drains to EOF so the
full output is compressed and returned.
- lean-ctx update failed with UnknownIssuer behind TLS-inspecting proxies
(#578). The updater now validates TLS against the OS trust store via ureq's
PlatformVerifier, so corporate roots installed in the system keychain/store
are honored.
- gain --deep reported "Daemon: offline" on Windows while the daemon was
running (#576). The footer's daemon-status probe used a Unix-only check; it
now reports the daemon state correctly on Windows too.
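
The threshold table from the #957 entry in this section (5+ → Off, 3+ → Lite, 0 → clear, 1–2 → hold, with pressure taken as the larger of the correction and retrieve counts) can be written as a small pure function. The Python sketch below is only an illustration of that table: the tuple return shape and the lowercase labels are assumptions, and the actual `CompressionLevel::degrade_action` is a Rust function whose signature is not shown in these notes.

```python
def degrade_action(correction_count: int, retrieve_count: int):
    """Map the two pressure counters to an action (illustrative shape only).

    Returns ("set", level), ("clear", None) or ("leave", None), mirroring the
    Set/Clear/Leave outcomes named in the changelog entry.
    """
    # The session reacts to the stronger of the two pressures.
    pressure = max(correction_count, retrieve_count)
    if pressure >= 5:
        return ("set", "off")
    if pressure >= 3:
        return ("set", "lite")
    if pressure == 0:
        return ("clear", None)
    return ("leave", None)  # 1 or 2: hold the current level
```

Keeping the table in one total function is what lets a behavioral test cover every threshold without grepping the dispatch source.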
Upgrade
lean-ctx update # recommended (auto-downloads + refreshes shell hooks)
cargo install lean-ctx # or
npm update -g lean-ctx-bin # or
brew upgrade lean-ctx
Note: After upgrading via cargo/npm/brew, run
lean-ctx setup to refresh shell aliases. lean-ctx update does this automatically.
Full Changelog: v3.8.14...v3.8.14