Conversation
There was a problem hiding this comment.
Code Review
Good targeted fix for a well-analyzed regression. The root-cause analysis in the PR description is excellent — identifying both the write-side (1 MB+ ProduceRequests exceeding the 1 MB pool limit) and the read-side (16-32 concurrent 64 KB segments overwhelming the 8-slot bucket) as the dual pressure sources is precisely the right diagnosis.
What's Good
- The XML doc comments now clearly explain why these specific values were chosen, which is the most important thing for these kinds of "magic number" tuning changes.
- The fix is minimal and targeted — exactly 7 changed lines of logic, with the rest being documentation.
- Updating
Rent_LargeBuffer_Succeedsto test above the new 4 MB threshold keeps the "falls through gracefully" guarantee verified.
Concerns
1. Memory retention increase per connection
The original motivation for PipeMemoryPool was to prevent unbounded WorkingSet growth by scoping retention to connection lifetime. The new defaults increase per-connection pool capacity significantly:
- 4 MB bucket × 32 slots = 128 MB potential retention in the largest bucket alone
- Additional smaller buckets (2 MB, 1 MB, 512 KB…) × 32 = further tens of MB
In a multi-broker topology (e.g. 6 brokers), that's potentially hundreds of MB of peak retained memory per connection, just from the 4 MB bucket. This partially undermines the original fix in b13330e.
Suggestion: Consider whether maxArraysPerBucket = 32 is actually needed for the 4 MB bucket, or whether the large value is only needed for the smaller 64 KB bucket. ArrayPool.Create() uses the same maxArraysPerBucket for every bucket size. A pool that retains 32× 4 MB arrays is very different from one that retains 32× 64 KB arrays. There's no built-in way to vary this per bucket with ArrayPool.Create(), but it's worth acknowledging the tradeoff explicitly in the comment — or accepting a custom pool implementation that limits retention only for large buckets.
2. Implicit coupling between pool size and batch size
The maxArrayLength = 4 MB is chosen to cover a 1 MB batch + framing overhead. But this relationship is not enforced in code — it's just a documentation choice. If ProducerOptions.BatchSize (or equivalent) is increased to e.g. 2 MB, the pool will silently revert to fresh allocations on every send.
Suggestion: Either:
- Pass the effective max request size to
PipeMemoryPoolat construction time (sized fromProducerOptions), so the pool scales with configuration, or - At minimum add a comment linking to the relevant option so a future maintainer knows to update both
3. The happy-path isn't directly tested
Rent_LargeBuffer_Succeeds only tests the fallthrough case (above the threshold). There's no test verifying that a 1 MB or 3 MB buffer is actually returned to and reused from the pool — which is the scenario this PR is fixing. This is admittedly hard to test without exposing ArrayPool internals, but it means the unit test suite doesn't directly validate the fix.
Suggestion: A benchmark test or integration test verifying allocation counts in the idempotent producer path would close this gap. The stress test in the test plan is the right vehicle for this — it should be run before merging.
4. Stress test not yet confirmed
The test plan notes [ ] Run stress test with idempotent producer (1 broker) to verify allocation regression is resolved is unchecked. Given the severity of the regression (165x allocation increase, 101x Gen2 collections), confirming the fix with actual profiling data before merging seems worth the wait.
Summary
The analysis and fix are correct. The main risk is that increasing maxArraysPerBucket from 8 to 32 uniformly across all bucket sizes significantly raises per-connection memory retention — especially in multi-broker scenarios — which may create a different kind of memory pressure than the original WorkingSet growth bug. Worth explicitly acknowledging this tradeoff, and ideally running the stress test to confirm both the regression is fixed and memory doesn't spike in multi-broker configs.
There was a problem hiding this comment.
Code Review
This PR correctly diagnoses and fixes the allocation regression introduced by commit b13330e. The root-cause analysis in the PR description is excellent, and the fix is well-scoped. A previous review by claude[bot] (2026-03-29) raised four concerns — I'll assess each and add any additional observations.
Previous Review Concerns — Status
1. Memory retention increase per connection — Still present, not yet addressed.
The math remains: 4 MB bucket × 32 slots = 128 MB potential retention in the largest bucket alone, per connection. In a 6-broker topology, this could reach ~768 MB+ from the large bucket alone. The PR description acknowledges this tradeoff in the XML doc comment, but it doesn't mitigate it.
The comment notes: In practice, only the small buckets (64 KB) fill to capacity; the large buckets (4 MB) rarely retain more than 1-2 arrays. That is a reasonable heuristic, but ConfigurableArrayPool (returned by ArrayPool<byte>.Create()) allocates its bucket array storage for all maxArraysPerBucket slots up front in a fixed-size array per bucket. The arrays themselves won't be pinned in memory unless actually rented and returned — so the idle cost is just the bucket's reference array overhead, not 128 MB. The actual memory impact is bounded by real concurrent usage, which supports the comment's claim. This should be explicitly noted in the doc comment to avoid future confusion — the current wording (peak per-connection retention is bounded by connection lifetime) is correct but may be misread as endorsing a 128 MB ceiling.
2. Implicit coupling between pool size and batch size — Partially mitigated by doc comment, but still not code-enforced.
The XML doc says If BatchSize is increased beyond ~3.5 MB, this value should be increased accordingly. This is an acceptable documentation-only mitigation for a low-change-frequency value. However, if the pool were constructed with a value derived from ProducerOptions.BatchSize (e.g., BatchSize * 4 + overhead), the coupling would be automatic. This is a design suggestion worth tracking, not a blocker.
3. Happy-path not directly tested — Still not addressed.
The test Rent_LargeBuffer_Succeeds now tests above the new 4 MB threshold (8 MB request), which correctly verifies the fallthrough path. But there is still no test confirming that a ~1 MB or ~3 MB buffer is actually returned to the pool and reused (i.e., the fix's core claim). That reuse behavior is internal to ArrayPool, so direct unit testing is not straightforward — but a benchmark or stress test observing allocation counts would validate it. The stress test in the test plan is the right vehicle.
4. Stress test not yet confirmed — Still unchecked.
Given that this PR fixes a 165x allocation regression that caused severe Gen2 collection pressure, merging without stress test confirmation carries real risk: the fix might correct one path while introducing a different regression elsewhere. The stress test should be run and its allocation numbers posted before merge.
Additional Observations
The fallthrough behavior is semantically surprising for writers
When PipeWriter.GetMemory() requests a buffer larger than maxArrayLength, ArrayPool.Rent() falls through to a fresh allocation (via GC.AllocateUninitializedArray or new byte[]) that is never returned to the pool — ArrayPool.Return() silently drops oversized arrays. The PooledMemoryOwner.Dispose() always calls _pool.Return(array), which is correct behavior since ArrayPool handles oversized returns gracefully, but it's worth confirming this is intentional and known. Increasing maxArrayLength to 4 MB means that ProduceRequests up to ~4 MB will now be pooled rather than dropped, which is precisely the fix.
The test name in the diff is now misleading
The updated test comment says Request larger than default maxArrayLength (4 MB) — falls through to new allocation, and the test requests 8 MB. This is correct post-change. However, the test method name Rent_LargeBuffer_Succeeds does not distinguish which behavior (pooled vs fallthrough) is being tested. Consider renaming to Rent_BufferExceedingMaxArrayLength_FallsThrough to make the intent clear and to survive further threshold changes.
Summary
The fix is technically correct and the PR description is exemplary. The two items that should be resolved before merging are:
- Run the stress test and post allocation numbers from the idempotent single-broker scenario — this directly validates the fix and guards against regressions in other paths.
- Consider renaming
Rent_LargeBuffer_Succeedsto reflect the fallthrough-specific behavior being tested.
The memory retention concern is real but acceptable given the ConfigurableArrayPool slot storage behavior — it just needs a clearer comment clarifying that 32 slots per bucket is an upper bound on cached arrays, not pre-allocated memory.
There was a problem hiding this comment.
Code Review (follow-up)
Two previous reviews (both 2026-03-29) raised four concerns. This review focuses on what has and hasn't been addressed after the two follow-up commits.
What has been addressed
Test rename — Rent_LargeBuffer_Succeeds is now Rent_BufferExceedingMaxArrayLength_FallsThrough. Good — the name now survives a future threshold change without becoming misleading.
Slot storage clarification — The XML doc now explicitly states that ConfigurableArrayPool pre-allocates a fixed-size reference array per bucket but that the actual byte[] arrays are only allocated on rent. This directly addresses the prior concern that the comment could be read as endorsing a 128 MB ceiling. The current wording is correct and precise.
What remains open
1. Stress test not yet confirmed (blocking)
The test plan still shows [ ] Run stress test with idempotent producer (1 broker). This is the only direct evidence that the regression (165x allocation increase, 101x Gen2 collections, GC sawtooth every 2 seconds) is actually resolved. The PR description's root-cause analysis is convincing, but the fix is a constants change with no automated verification of the core claim. Given the severity of the original regression, this should be confirmed before merge.
2. Implicit coupling between maxArrayLength and ProducerOptions.BatchSize (non-blocking)
The doc comment now says "If BatchSize is increased beyond ~3.5 MB, this value should be increased accordingly" — this is a documentation-only mitigation. The concern stands: a BatchSize change in ProducerOptions will silently re-introduce the regression with no compiler or runtime warning. A code-level link (e.g. deriving maxArrayLength from the options at construction time) would make this self-healing. This is a design suggestion, not a blocker for this PR.
One new observation: Rent_BufferExceedingMaxArrayLength_FallsThrough allocates 8 MB in a unit test
The test uses pool.Rent(8 * 1024 * 1024) to verify fallthrough. This is correct for testing the behavior, but it allocates an 8 MB array in a unit test — potentially problematic in constrained CI environments running many tests in parallel. Consider using a smaller multiplier (e.g. 5 * 1024 * 1024 — just above the 4 MB threshold) to keep the test minimal while still exercising the fallthrough path. Low priority, but worth noting.
Summary
The fix is correct and the documentation has been meaningfully improved through the follow-up commits. The two remaining items are:
- Blocking: Run the stress test and post allocation numbers confirming the regression is resolved.
- Non-blocking / future work: Consider deriving
maxArrayLengthfromProducerOptions.BatchSizeto eliminate the silent coupling.
The code itself is ready to merge once the stress test is confirmed.
There was a problem hiding this comment.
Code Review (follow-up on latest commit)
Three previous reviews from today raised several concerns. This review tracks the final status.
All previous concerns — resolved or carried over
Test rename ( → ) — ✅ Done (commit c645d6d3)
Slot storage clarification in XML doc — ✅ Done (commit c645d6d3). The doc now correctly distinguishes between the pre-allocated reference array and the lazily-allocated byte[] arrays.
Test allocation size reduced from 8 MB to 5 MB — ✅ Done (latest commit 8de16ccb). The test now uses exactly maxArrayLength + 1 MB which is the minimum size needed to confirm fallthrough, not an arbitrary large value.
Implicit coupling between maxArrayLength and ProducerOptions.BatchSize — Non-blocking, still open. The doc comment "If BatchSize is increased beyond ~3.5 MB, this value should be increased accordingly" is a documentation-only mitigation. This is acceptable for now, but the longer-term fix (deriving maxArrayLength at construction from the configured batch size) would make it self-healing. No action needed before merge.
One remaining blocking item
Stress test not yet confirmed — The test plan still shows:
[ ] Run stress test with idempotent producer (1 broker) to verify allocation regression is resolved
This is the only item that directly validates the core claim — that the 165x allocation regression is actually fixed. The root-cause analysis is compelling, but without profiling data before and after, there's no confirmation that the fix works as described or that it doesn't introduce a different regression (e.g., memory spike from pool saturation under sustained load).
Please run the stress test and post allocation numbers before merging.
Summary
All review feedback has been addressed cleanly and promptly. The code is correct, the documentation is clear, and the test suite is solid. One item remains: run the stress test and confirm the allocation regression is resolved before merging.
There was a problem hiding this comment.
Code Review (follow-up on latest commit)
Three previous reviews from today raised several concerns. This review tracks the final status.
All previous concerns — resolved or carried over
Test rename (Rent_LargeBuffer_Succeeds → Rent_BufferExceedingMaxArrayLength_FallsThrough) — ✅ Done (commit c645d6d3)
Slot storage clarification in XML doc — ✅ Done (commit c645d6d3). The doc now correctly distinguishes between the pre-allocated reference array and the lazily-allocated byte[] arrays.
Test allocation size reduced from 8 MB to 5 MB — ✅ Done (latest commit 8de16ccb). The test now uses exactly maxArrayLength + 1 MB — the minimum needed to confirm fallthrough, not an arbitrary large value.
Implicit coupling between maxArrayLength and ProducerOptions.BatchSize — Non-blocking, still open. The doc comment "If BatchSize is increased beyond ~3.5 MB, this value should be increased accordingly" is a documentation-only mitigation. This is acceptable for now, but the longer-term fix (deriving maxArrayLength at construction from the configured batch size) would make it self-healing. No action needed before merge.
One remaining blocking item
Stress test not yet confirmed — The test plan still shows:
[ ] Run stress test with idempotent producer (1 broker) to verify allocation regression is resolved
This is the only item that directly validates the core claim — that the 165x allocation regression is actually fixed. The root-cause analysis is compelling, but without profiling data before and after, there's no confirmation that the fix works as described or that it doesn't introduce a different regression (e.g., memory spike from pool saturation under sustained load).
Please run the stress test and post allocation numbers before merging.
Summary
All review feedback has been addressed cleanly and promptly. The code is correct, the documentation is clear, and the test suite is solid. One item remains: run the stress test and confirm the allocation regression is resolved before merging.
Main reverted to original defaults (1 MB / 4 buckets) after PR #696 merged. This PR's fix (4 MB / 32 buckets) is the correct resolution for the idempotent allocation regression.
There was a problem hiding this comment.
Code Review (follow-up — latest merge commit e77cb05)
Four previous reviews from today have tracked this PR carefully. This review assesses the final state after all five commits, including the merge commit that resolves the conflict with main.
Commit history summary
| Commit | What changed |
|---|---|
1244f212 |
Initial fix: 1 MB → 4 MB, 8 → 32 buckets |
61b7fb09 |
Docs: clarified BatchSize coupling and memory tradeoffs |
c645d6d3 |
Test rename + slot storage clarification in XML doc |
8de16ccb |
Test allocation reduced from 8 MB to 5 MB |
e77cb050 |
Merge conflict resolved (main reverted to originals after PR #696; this PR correctly re-applies the 4 MB / 32 bucket values) |
Status of all previously raised concerns
Test rename (Rent_LargeBuffer_Succeeds → Rent_BufferExceedingMaxArrayLength_FallsThrough) — resolved in c645d6d3.
XML doc slot storage clarification — resolved in c645d6d3. The comment now correctly states that ConfigurableArrayPool pre-allocates a fixed-size reference array per bucket, but the actual byte[] arrays are lazily allocated and only consume memory when rented and returned, so idle buckets hold null slots, not memory.
Test allocation size — resolved in 8de16ccb. The test now uses 5 MB (just above the 4 MB threshold), which is the minimum size to confirm the fallthrough path.
Merge conflict — correctly resolved in e77cb050. The commit message accurately explains why PR #696 reverted the defaults and why this PR's values are the right resolution.
Implicit coupling between maxArrayLength and ProducerOptions.BatchSize — still a documentation-only mitigation. Acceptable for this PR; worth tracking as future work to make it self-healing.
One blocking item remains
Stress test not yet confirmed.
The test plan checkbox [ ] Run stress test with idempotent producer (1 broker) is still unchecked. This is the only item that directly validates the core claim — that the 165x allocation regression and 101x Gen2 collection rate are actually resolved by this change.
The root-cause analysis is compelling and the fix is logically correct, but there is no automated test that catches regressions in pool-hit-rate at the integration level. The stress test fills that gap. Please run:
dotnet run --project tools/Dekaf.StressTests --configuration Release -- \
--duration 60 \
--message-size 1000 \
--scenario idempotent \
--client dekafand post the allocation numbers (Gen0/Gen1/Gen2 GC counts, total bytes allocated, GC frequency) before merging. Given that this PR fixes a critical performance regression, confirming it actually works is worth the one-time cost.
No new concerns with the merge commit
The merge commit (e77cb050) is clean — it correctly keeps the PR #703 values over the reverted main defaults. The commit message explains the conflict context clearly, which is helpful for future git blame readers.
Summary
The code is correct, documentation is thorough, and all review feedback from previous rounds has been addressed cleanly. The single remaining item is running the stress test and posting allocation numbers to confirm the regression is resolved. Once that's done, this is ready to merge.
…allocation regression The per-connection PipeMemoryPool introduced in b13330e used ArrayPool<byte>.Create(maxArrayLength: 1MB, maxArraysPerBucket: 8). This ConfigurableArrayPool was too constrained for high-throughput idempotent producer workloads: - maxArrayLength of 1 MB caused ProduceRequests slightly over 1 MB (after header/framing overhead) to fall through to new allocations on every send, since the pool cannot cache arrays above its limit. - maxArraysPerBucket of 8 was insufficient for the concurrent segment demand from pipelined ProduceResponses. The input pipe's read pump creates ~64 KB segments that stay alive until the reader advances past them. With multiple pipelined responses in flight, 16-32 segments can be active simultaneously, overwhelming the 8-slot pool and forcing fresh allocations for every overflow. The idempotent path (Acks.All) was hit 165x harder than non-idempotent (Acks.None) because it reads ProduceResponses through the input pipe, creating bidirectional pool pressure. The non-idempotent fire-and-forget path has no response pipeline, so only the write side was affected. Increase maxArrayLength to 4 MB and maxArraysPerBucket to 32 to match the concurrent demand while preserving the per-connection lifecycle scoping that prevents WorkingSet growth in multi-broker scenarios. Fixes #697
Address review feedback: - Document relationship between maxArrayLength and ProducerOptions.BatchSize - Explain that large buckets rarely fill to capacity in practice - Note that per-connection retention is bounded by connection lifetime
Address review feedback: - Rename Rent_LargeBuffer_Succeeds to Rent_BufferExceedingMaxArrayLength_FallsThrough to clearly indicate the fallthrough-specific behavior being tested - Clarify that ConfigurableArrayPool pre-allocates reference arrays per bucket but actual byte[] arrays are only allocated on rent — idle buckets hold null slots, not memory
Use 5 MB (just above the 4 MB pool threshold) instead of 8 MB to minimize unnecessary memory pressure in tests.
The ex.Elapsed and elapsedMs upper-bound checks are inherently flaky on CI runners where thread pool starvation can delay SemaphoreSlim timer callbacks by 40+ seconds. The meaningful assertions (exception type, TimeoutKind, Configured value) are preserved. The lower-bound check (>= 400ms) confirms the timeout fires near the configured 500ms.
492dfc5 to
a396ac3
Compare
There was a problem hiding this comment.
Code Review (follow-up — latest batch of commits)
Six previous reviews have tracked this PR end-to-end. This review covers the final rebased commit batch and provides a clean summary of the overall state.
Commit history — full summary
| Commit | Change |
|---|---|
e345346 |
Initial fix: 1 MB → 4 MB, 8 → 32 buckets |
d2102de |
Docs: BatchSize coupling and memory tradeoff explanation |
7483d65 |
Test rename + XML doc slot-storage clarification |
d31db8f |
Test allocation reduced from 8 MB to 5 MB (minimum above threshold) |
a396ac3 |
Remove upper-bound timing assertions in BufferMemoryTests (flaky on CI) |
Status of all previously raised concerns
Memory retention per connection — acknowledged in docs. The XML doc comment now correctly explains that ConfigurableArrayPool pre-allocates reference arrays per bucket but the actual byte[] arrays are lazily retained — idle buckets hold null slots, not memory. The concern about 128 MB theoretical ceiling has been correctly scoped: in practice the large buckets (4 MB) rarely hold more than 1-2 arrays concurrently. Acceptable.
Implicit coupling between maxArrayLength and ProducerOptions.BatchSize — documentation-only mitigation remains. The comment says "If BatchSize is increased beyond ~3.5 MB, this value should be increased accordingly." This is non-blocking for this PR. Future improvement: derive maxArrayLength from the configured batch size at construction time to make it self-healing.
Test rename — ✅ Rent_LargeBuffer_Succeeds → Rent_BufferExceedingMaxArrayLength_FallsThrough.
Test allocation size — ✅ Reduced to 5 MB (minimum needed to confirm fallthrough behavior).
Timing assertions removed from BufferMemoryTests — The elapsedMs.IsLessThan(10000) and ex.Elapsed.IsLessThanOrEqualTo(ex.Configured + 30s) upper bounds were removed. This is correct: those assertions were inherently flaky under CI thread-pool starvation. The remaining IsGreaterThanOrEqualTo(400) lower bound preserves the meaningful part of the test (the timeout actually fires at or near the configured 500ms).
One blocking item — still unresolved
Stress test not yet confirmed. The test plan still shows:
[ ] Run stress test with idempotent producer (1 broker) to verify allocation regression is resolved
This is the only direct validation that the 165x allocation regression and 101x Gen2 collection rate are actually resolved. The root-cause analysis is sound and the fix is logically correct, but there is no automated test that catches pool-hit-rate regressions at integration level. Please run:
dotnet run --project tools/Dekaf.StressTests --configuration Release -- \
--duration 60 \
--message-size 1000 \
--scenario idempotent \
--client dekafand post the Gen0/Gen1/Gen2 GC counts and allocation rate before merging. Given the severity of the original regression, this one-time confirmation is worth the cost.
Summary
The code is correct, the documentation is thorough, and all review feedback from previous rounds has been cleanly addressed. The only item remaining before merge is running the stress test and confirming the allocation regression is resolved.
There was a problem hiding this comment.
Code Review (follow-up — rebased PR, 7th review pass)
Six previous reviews have tracked this PR end-to-end. All raised concerns have been addressed across five commits. This review provides a final summary.
All previously raised concerns — final status
| Concern | Status |
|---|---|
| Memory retention per connection (128 MB theoretical ceiling) | Resolved — XML doc now correctly explains lazily-retained arrays; idle buckets hold null slots, not memory |
Implicit coupling between maxArrayLength and ProducerOptions.BatchSize |
Documentation-only mitigation. Non-blocking for this PR; tracked as future work |
Test rename (Rent_LargeBuffer_Succeeds → Rent_BufferExceedingMaxArrayLength_FallsThrough) |
Resolved in 7483d65c |
| Test allocation size reduced (8 MB → 5 MB) | Resolved in d31db8fd |
Flaky upper-bound timing assertions in BufferMemoryTests |
Resolved in a396ac3b — correct removal; the lower bound (≥ 400ms) preserves the meaningful assertion |
One blocking item remains: stress test unconfirmed
The test plan still shows:
[ ] Run stress test with idempotent producer (1 broker) to verify allocation regression is resolved
This is the only direct evidence that the 165x allocation regression and 101x Gen2 collection rate are actually resolved. The root-cause analysis is sound and the fix is logically correct, but there is no automated test that catches pool-hit-rate regressions. Please run:
dotnet run --project tools/Dekaf.StressTests --configuration Release -- \
--duration 60 \
--message-size 1000 \
--scenario idempotent \
--client dekafand post the Gen0/Gen1/Gen2 GC counts and allocation rate as a PR comment before merging. Given the severity of the original regression (described as causing GC sawtooth every 2 seconds), this one-time confirmation is worth the cost.
One minor note: maxArraysPerBucket default is still a uniform value
The XML doc correctly explains that ConfigurableArrayPool applies maxArraysPerBucket = 32 uniformly across all bucket sizes. The concern that was raised (32 × 4 MB slots vs 32 × 64 KB slots representing very different memory ceilings) is now adequately documented. For future work, a custom MemoryPool<byte> implementation that limits large-bucket retention to 2–4 slots while keeping 32 slots for small buckets would be more precise — but this is firmly out of scope for this bug-fix PR.
Summary
The code is correct. All review feedback from previous rounds has been addressed cleanly. The only item standing between this PR and merge is running the stress test and posting allocation numbers confirming the regression is resolved. Once that checkbox is checked with actual data, this is ready to merge.
Summary
PipeMemoryPool.maxArrayLengthfrom 1 MB to 4 MB so ProduceRequests with 1 MB batches (plus header/framing overhead) and coalesced multi-batch requests are cached instead of falling through to new allocations every sendPipeMemoryPool.maxArraysPerBucketfrom 8 to 32 to handle concurrent segment demand from pipelined ProduceResponses in the input pipe, where 16-32 active 64 KB segments can be alive simultaneously during high-throughput idempotent productionRoot cause
Commit b13330e replaced
MemoryPool<byte>.Sharedwith a per-connectionPipeMemoryPoolbacked byArrayPool<byte>.Create(maxArrayLength: 1MB, maxArraysPerBucket: 8). TheConfigurableArrayPoolcreated byArrayPool.Create()has far smaller caches thanArrayPool.Shared(TlsOverPerCoreLockedStacksArrayPool). Under high-throughput idempotent production:PipeWriter.GetMemory(length)requests ~1 MB+ buffers for serialized ProduceRequests. Buffers exceeding the 1 MBmaxArrayLengthcannot be cached, causing a freshbyte[]allocation on every send.GetMemory(65536)that stay alive untilAdvanceTo. With pipelinedAcks.Allresponses, multiple response frames accumulate in the pipe, keeping 16-32 segments alive simultaneously. The 8-slot pool overflows, forcing fresh allocations for every excess segment.The idempotent path sees 165x more allocations because it has bidirectional pool pressure (write + read), while non-idempotent
Acks.Nonehas no response pipeline (only 45% increase from the write side alone).Test plan
PipeMemoryPoolTestspass (updatedRent_LargeBuffer_Succeedsto test above new 4 MB threshold)Fixes #697