feat(stablehlo): add KV cache I/O rewriting to program emitter#74
Merged
Cache key is SHA256(stablehlo_mlir + platform_name). Serialized PJRT executables are stored to $ZERFOO_PJRT_CACHE or ~/.cache/zerfoo/pjrt/. LRU eviction kicks in when total size exceeds configurable max (default 2 GB). Atomic writes via tmp+rename. Thread-safe via sync.Mutex. Implements T64.1.1 and T64.1.2.
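A minimal Go sketch of the key derivation and atomic write described above. The function names (`cacheKey`, `cacheDir`, `writeAtomic`) and the on-disk file naming are illustrative assumptions, not the PR's actual code. LRU eviction and the `sync.Mutex` guard are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// cacheKey hashes the StableHLO MLIR text followed by the platform name,
// i.e. SHA256(stablehlo_mlir + platform_name), and returns it hex-encoded.
func cacheKey(stablehloMLIR, platformName string) string {
	h := sha256.New()
	h.Write([]byte(stablehloMLIR))
	h.Write([]byte(platformName))
	return hex.EncodeToString(h.Sum(nil))
}

// cacheDir returns $ZERFOO_PJRT_CACHE if set, else ~/.cache/zerfoo/pjrt.
func cacheDir() (string, error) {
	if dir := os.Getenv("ZERFOO_PJRT_CACHE"); dir != "" {
		return dir, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".cache", "zerfoo", "pjrt"), nil
}

// writeAtomic writes data to a temp file in dir, then renames it to its
// final name so readers never observe a partially written entry.
// Naming the entry after the key is an assumption made for this sketch.
func writeAtomic(dir, key string, data []byte) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, key+".tmp-*")
	if err != nil {
		return err
	}
	// Cleans up on failure; after a successful rename the temp name is gone.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, key))
}
```

The temp file is created in the same directory as the final entry, so the rename stays on one filesystem. That is what makes it atomic on POSIX systems.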
PJRT is pure-functional and cannot handle mutable state. The graph's StatefulInputNode KV cache feedback must be rewritten as explicit function I/O: KV cache tensors become both function arguments and return values.

Add KVCacheSlot type and EmitKVCacheProgram function that:
- Adds KV cache inputs as extra function arguments
- Adds KV cache outputs as extra return values (tuple return)
- For decode programs, emits stablehlo.concatenate to append new KV step along the sequence axis
- For prefill programs, passes KV outputs through directly

Implements T61.3.2 from plan-pjrt.md.
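The decode-side append can be sketched as below. The `kvSlot` type, its fields, and the `%<name>_in` / `%<name>_step` / `%<name>_out` value names are hypothetical stand-ins. The PR's actual `KVCacheSlot` and `EmitKVCacheProgram` definitions are not shown here and may differ. The sketch assumes static shapes and emits the op in MLIR's generic form.

```go
package main

import (
	"fmt"
	"strings"
)

// kvSlot is a hypothetical description of one KV cache tensor.
type kvSlot struct {
	Name     string // base name for SSA values, e.g. "k0"
	Dims     []int  // static shape of the incoming cache argument
	StepDims []int  // static shape of the newly computed KV step
	SeqAxis  int    // axis along which the step is appended
	Elem     string // element type, e.g. "f32"
}

// tensorType renders a static shape as an MLIR ranked tensor type.
func tensorType(dims []int, elem string) string {
	parts := make([]string, 0, len(dims)+1)
	for _, d := range dims {
		parts = append(parts, fmt.Sprint(d))
	}
	parts = append(parts, elem)
	return "tensor<" + strings.Join(parts, "x") + ">"
}

// concatLine emits one stablehlo.concatenate that appends %<name>_step to
// %<name>_in along SeqAxis. It returns the op text and the result type,
// which would be added to the function's return types.
func concatLine(s kvSlot) (line, outType string) {
	out := append([]int(nil), s.Dims...) // copy so the slot is not mutated
	out[s.SeqAxis] += s.StepDims[s.SeqAxis]
	outType = tensorType(out, s.Elem)
	line = fmt.Sprintf(
		`%%%s_out = "stablehlo.concatenate"(%%%s_in, %%%s_step) {dimension = %d : i64} : (%s, %s) -> %s`,
		s.Name, s.Name, s.Name, s.SeqAxis,
		tensorType(s.Dims, s.Elem), tensorType(s.StepDims, s.Elem), outType,
	)
	return line, outType
}
```

For a key cache of shape 1x8x16x64 and a single-token step of shape 1x8x1x64 with `SeqAxis: 2`, the result type grows to `tensor<1x8x17x64xf32>`. The caller feeds that tensor back in as the cache argument on the next decode step.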
Phase 3b PJRT pipeline: KV cache rewriting.