fix: avoid excessive mem increase during boundary #905
Conversation
📝 Walkthrough

Introduces a WorkBuffer state machine for deterministic block/boundary handling, refactors RollWorkUnit construction to take CardanoConfig and Cache and moves delta computation into load, converts the fjall EntityIterator to lazy streaming, and adds memory-usage tests plus dev-dependencies on stats_alloc and dolos-testing.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant WB as WorkBuffer
    participant Caller as CardanoLogic/Coordinator
    participant RWU as RollWorkUnit
    participant Delta as roll::compute_delta
    Client->>WB: receive_block(block, eras, stability_window)
    activate WB
    WB->>WB: detect boundaries / extend batch
    WB-->>Client: updated buffer
    deactivate WB
    Caller->>WB: pop_work(stop_epoch)
    activate WB
    alt work available
        WB-->>Caller: Some(InternalWorkUnit::Blocks or boundary)
    else
        WB-->>Caller: None
    end
    deactivate WB
    Caller->>RWU: RollWorkUnit::new(batch, genesis, live, config, cache)
    activate RWU
    RWU-->>Caller: constructed RWU (holds config & cache)
    deactivate RWU
    Caller->>RWU: load(domain)
    activate RWU
    RWU->>RWU: decode UTxOs from batch
    RWU->>Delta: compute_delta::<D>(..., config, cache)
    activate Delta
    Delta-->>RWU: deltas/result
    deactivate Delta
    RWU-->>Caller: ready work unit
    deactivate RWU
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/cardano/src/lib.rs (1)
8-12: ⚠️ Potential issue | 🟡 Minor

Fix unused import flagged by CI: `Block as _`.

The pipeline reports `clippy: unused import 'Block'` at line 9. Remove the unused trait import.

🔧 Proposed fix

```diff
 use dolos_core::{
-    config::CardanoConfig, Block as _, BlockSlot, ChainError, ChainPoint, Domain, DomainError,
+    config::CardanoConfig, BlockSlot, ChainError, ChainPoint, Domain, DomainError,
     EntityKey, EraCbor, Genesis, MempoolAwareUtxoStore, MempoolTx, MempoolUpdate, RawBlock,
     StateStore, TipEvent, WorkUnit,
 };
```

As per coding guidelines: "Run `cargo clippy --workspace --all-targets --all-features` and resolve all clippy warnings before committing changes."
🧹 Nitpick comments (6)
tests/memory.rs (1)
36-55: Subtle test ordering: the memory assertion only covers iterator creation, not consumption.

The `Region` tracks allocations between lines 36 and 42, which only covers the `iter_entities()` call — not the subsequent `iter.count()` on line 54 that actually drives the iterator. This correctly validates that the iterator is lazily initialized (the core fix), but it does not assert that streaming through all 50k entities stays within bounds.

If you also want to assert bounded memory during full iteration, move `reg.change()` after `iter.count()`:

♻️ Proposed: assert memory across full iteration

```diff
 let reg = Region::new(&GLOBAL);
 let iter = store
     .iter_entities(NS, EntityKey::full_range())
     .expect("iter_entities failed");
-let stats = reg.change();
-let heap_delta = stats.bytes_allocated as usize;
-
-let threshold = 10 * 1024 * 1024; // 10 MB
-assert!(
-    heap_delta < threshold,
-    "iter_entities should allocate O(1) memory (lazy). \
-     Allocated {} bytes but threshold is {} bytes.",
-    heap_delta,
-    threshold,
-);
-
 let count = iter.count();
 assert_eq!(count, ENTITY_COUNT as usize, "iterator should yield all entities");
+
+let stats = reg.change();
+let heap_delta = stats.bytes_allocated as usize;
+
+let threshold = 10 * 1024 * 1024; // 10 MB
+assert!(
+    heap_delta < threshold,
+    "iter_entities should allocate O(1) memory (lazy). \
+     Allocated {} bytes but threshold is {} bytes.",
+    heap_delta,
+    threshold,
+);
```
crates/fjall/src/state/entities.rs (1)
63-113: Good refactor to lazy streaming — consider logging malformed keys.

Both the fjall and redb3 backends now implement lazy streaming iteration: fjall wraps `fjall::Iter` and redb3 wraps `redb::Range`, each decoding items one at a time in `next()`. This maintains consistency between backends and keeps memory usage O(1), as verified by the memory tests.

The `guard.into_inner()` API usage is correct for fjall 3.0.0.

One concern: silently skipping malformed keys (lines 102–107) could mask data corruption. A `tracing::warn!` on the skip path would aid diagnosis without affecting performance in the normal case.

🔧 Suggested: log malformed keys

```diff
 if key_bytes.len() >= PREFIXED_KEY_SIZE {
     let entity_key = decode_entity_key(&key_bytes);
     let entity_value = value_bytes.to_vec();
     return Some(Ok((entity_key, entity_value)));
 }
-// Skip malformed keys (too short), continue to next
+// Skip malformed keys (too short), continue to next
+tracing::warn!(key_len = key_bytes.len(), "skipping malformed entity key (too short)");
```
crates/cardano/src/work.rs (4)
253-295: Significant duplication between `feed_and_drain` and `feed_drain_with_cursors`.

Both helpers share identical drain/feed loop structures. Consider refactoring into a single core function that always tracks cursors, with `feed_and_drain` as a thin wrapper that discards them.

Also applies to: 302-349
189-191: `ForcedStop` is an absorbing state that emits infinitely — the caller must break the loop.

`pop_work` on `ForcedStop` returns `(Some(ForcedStop), ForcedStop)` indefinitely. This is handled correctly in the test helpers (lines 285-288), but any future caller of `pop_work` in a loop must explicitly check for `ForcedStop` to avoid an infinite loop.

A brief doc comment on `pop_work` noting this contract would be helpful.
67-83: `Empty` arm in `extend_batch` is dead code via `receive_block`.

`receive_block` always intercepts `Empty` at line 119 and routes to `on_genesis_boundary`, so `extend_batch` is never called in the `Empty` state through the public API. Consider removing it, or adding a comment explaining if it's intentionally kept for future use.
13-34: Consider deriving `Debug` on `InternalWorkUnit` and `WorkBuffer`.

Neither enum derives `Debug`, which makes it harder to diagnose state machine issues at runtime (e.g., in logs or panic messages). The `unreachable!()` calls throughout would benefit from a debug representation of the current state.

Proposed change

```diff
+#[derive(Debug)]
 pub(crate) enum InternalWorkUnit {
     Genesis,
     Blocks(WorkBatch),
     EWrap(BlockSlot),
     EStart(BlockSlot),
     Rupd(BlockSlot),
     ForcedStop,
 }

+#[derive(Debug)]
 pub(crate) enum WorkBuffer {
```

This assumes `WorkBatch` and `OwnedMultiEraBlock` implement `Debug`. If not, a manual `Debug` impl that prints the variant name and key metadata (e.g., slot numbers) would still be valuable.

```bash
#!/bin/bash
# Check if WorkBatch and OwnedMultiEraBlock implement Debug
echo "=== WorkBatch ==="
rg -n "struct WorkBatch" --type rust -A 3
echo "=== OwnedMultiEraBlock ==="
rg -n "struct OwnedMultiEraBlock" --type rust -A 3
```
Additional inline comment in `@crates/cardano/src/work.rs`:

Lines 634-653: The replay filter currently excludes the boundary slot because `replay_slots` is built with `.filter(|&s| s > cursor.slot())`, but for a `ChainPoint::Slot` cursor (established by `update_cursor` at `EStart`) the semantics are inclusive. Change the filter to `.filter(|&s| s >= cursor.slot())` when constructing `replay_slots`, and keep using `feed_and_drain` to produce `restart_tags`. Additionally, tighten the assertion to ensure boundary block 110 is actually the start of a `Blocks` unit: assert that `restart_tags` contains a `WorkTag::Blocks { first, .. }` with `first == 110`.
```rust
WorkBuffer::EwrapBoundary(block, _) => block.point(),
WorkBuffer::EstartBoundary(block, _) => block.point(),
WorkBuffer::PreForcedStop(block) => block.point(),
WorkBuffer::ForcedStop => unreachable!(),
```
`unreachable!()` on `ForcedStop` is a latent panic for crate-internal callers.

`last_point_seen` is a pub method. While `receive_block` guards against this via the `can_receive_block` assert, any other crate-internal caller hitting `ForcedStop` gets an unhelpful unreachable panic. Consider returning the last known point or making this method fallible.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/cardano/src/lib.rs (1)
277-283: ⚠️ Potential issue | 🟠 Major

Silently swallowing `refresh_cache` errors may cause stale era data for subsequent work units.

If `refresh_cache` fails (e.g., a state store I/O error), the error is logged but `needs_cache_refresh` is still set to `false`, so the stale cache will be used for all subsequent `Blocks` / boundary detection without any retry. This could cause incorrect epoch boundary detection or a wrong stability window.

Consider propagating the error (changing `pop_work` to return `Result<Option<…>, …>`) or at least keeping `needs_cache_refresh = true` so the next call retries:

Proposed minimal fix (keep flag set on failure)

```diff
 if self.needs_cache_refresh {
     if let Err(e) = self.refresh_cache::<D>(domain.state()) {
         tracing::error!(error = %e, "failed to refresh cache after era-modifying work unit");
-    }
-    self.needs_cache_refresh = false;
+    } else {
+        self.needs_cache_refresh = false;
+    }
 }
```
🧹 Nitpick comments (4)
crates/cardano/src/work.rs (3)
143-193: Consider exhaustive matching in `pop_work` instead of `_ => unreachable!()`.

Lines 144-146 handle `Empty` and `Restart` with an early return, then line 191 uses `_ => unreachable!()` as a catch-all. If a new `WorkBuffer` variant is added later, this silently panics at runtime instead of producing a compile-time error.

Suggested: replace early return + catch-all with exhaustive arms

```diff
 pub fn pop_work(self, stop_epoch: Option<Epoch>) -> (Option<InternalWorkUnit>, Self) {
-    if matches!(self, WorkBuffer::Restart(..)) || matches!(self, WorkBuffer::Empty) {
-        return (None, self);
-    }
-
     match self {
+        WorkBuffer::Empty => (None, self),
+        WorkBuffer::Restart(..) => (None, self),
         WorkBuffer::Genesis(block) => (
             // ... existing arms unchanged ...
         ),
         // ... other arms ...
-        _ => unreachable!(),
     }
 }
```
108-141: `receive_block` only detects the first boundary between two consecutive blocks.

If a single block jump spans multiple epoch boundaries (e.g., prev_slot=50 → next_slot=350, crossing boundaries at 100, 200, 300), only the first epoch boundary is detected. Later boundaries go unnoticed, since the intermediate block (at the detected boundary) transitions to `EwrapBoundary`, and the subsequent `pop_work` → `EStart` → `OpenBatch` path uses the same block without re-checking for additional boundaries.

In practice this likely requires extremely sparse chains. If this is a known constraint, consider adding a brief comment documenting the assumption that blocks arrive with at most one boundary gap.
252-295: `feed_and_drain` helper: good, but the inner loop may spin on stuck states.

In the inner loop (lines 264–275), if `can_receive_block()` returns `false` and `pop_work` returns `None`, the loop breaks — correct. But if `pop_work` keeps returning `Some(...)` without ever transitioning to a receivable state (a hypothetical bug in the state machine), this loop would spin indefinitely.

Since this is test-only code, it's low risk. Just noting for awareness.
crates/cardano/src/lib.rs (1)
302-309: Consider wrapping `config` and `cache` in `Arc` to reduce cloning overhead per work unit.

Every `Blocks` work unit clones both `CardanoConfig` and `Cache` (which includes `ChainSummary`). While Cardano's era list is bounded (~14 entries), cloning still represents unnecessary allocations on each batch. Since both are shared references and lifetimes permit, wrapping them in `Arc` would make clones cheap reference-count bumps, aligning with the PR's goal of reducing memory pressure.
Summary by CodeRabbit

- New Features
- Refactor
- Tests
- Chores