-
Notifications
You must be signed in to change notification settings - Fork 9
Add detailed job lifecycle logging to EOA executor send flow #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add detailed job lifecycle logging to EOA executor send flow #90
Conversation
WalkthroughThis PR introduces widespread formatting adjustments across the codebase alongside targeted functional enhancements. Key additions include tracing instrumentation in the EOA worker send flow, new helper functions for transaction registry and metrics, a new response type for EOA diagnostics, and refined error handling in the Solana executor. Most changes are formatting-only or import reordering without functional impact. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Areas requiring extra attention:
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/src/http/routes/admin/eoa_diagnostics.rs (1)
396-421: Manual reset scheduling can partially succeed while the API reports an errorIn
schedule_manual_reset:
store.schedule_manual_reset()is executed before callingensure_eoa_executor_job.- If
find_signing_credentialfails insideensure_eoa_executor_job(no pending or borrowed transactions), the function returnsApiEngineError(ValidationError)but the manual reset flag has already been set in the store.- In this case:
- The client sees an error response.
- Manual reset is actually scheduled, but no EOA executor job is enqueued for this EOA/chain, so the reset may not be processed until some other path causes the worker to run.
This partial-success behavior is surprising and makes it hard for operators to know what actually happened.
Consider one of the following adjustments:
- Reorder operations so that you:
- First attempt to ensure or enqueue the EOA executor job.
- Only call
schedule_manual_reset()if that succeeds (or if you decide that a job is already active and sufficient).- Alternatively, treat the “no pending or borrowed transactions” case as a successful manual reset scheduling with
job_enqueued = false, instead of surfacing it as aValidationError, so the API response faithfully reports what was done (reset scheduled, no job needed/enqueued).Either approach would avoid “reset scheduled but API returned error” situations and make this endpoint’s semantics clearer.
Also applies to: 422-441, 466-499, 515-555
executors/src/metrics.rs (1)
121-135: Code inspection confirms the Prometheus re-registration bug; fix requiredThe search results confirm the issue in the review comment:
- Line 104–105:
DEFAULT_EXECUTOR_METRICSis currently a plainExecutorMetrics, not wrapped inArc.- Line 130: The fallback path in
get_metrics()callsExecutorMetrics::new(&DEFAULT_EXECUTOR_METRICS_REGISTRY)unconditionally, which will attempt to re-register the same named histograms every time the fallback is hit.This will panic with a Prometheus
AlreadyRegerror on the second call. The suggested fix in the review comment—wrappingDEFAULT_EXECUTOR_METRICSinArcand reusing it in the fallback—is the correct approach and should be applied.The tests couldn't execute due to sandbox toolchain configuration (unrelated to the code itself), but the bug is confirmed by direct code inspection.
🧹 Nitpick comments (13)
twmq/benches/throughput.rs (1)
75-80: Remove or uncomment the deadreset()method.The commented-out
reset()method at lines 75–80 serves no functional purpose and should be removed. If this functionality is planned for future use, document it with a FIXME comment or issue reference instead.Apply this diff to remove the dead code:
fn new() -> Self { Self { jobs_processed: Arc::new(AtomicU64::new(0)), jobs_nacked: Arc::new(AtomicU64::new(0)), jobs_succeeded: Arc::new(AtomicU64::new(0)), total_processing_time_ms: Arc::new(AtomicU64::new(0)), } } - // fn reset(&self) { - // self.jobs_processed.store(0, Ordering::SeqCst); - // self.jobs_nacked.store(0, Ordering::SeqCst); - // self.jobs_succeeded.store(0, Ordering::SeqCst); - // self.total_processing_time_ms.store(0, Ordering::SeqCst); - // } - fn processed_count(&self) -> u64 { self.jobs_processed.load(Ordering::SeqCst) }eip7702-core/src/delegated_account.rs (1)
31-34: ClarifyNonesemantics fordelegation_contractinis_minimal_accountThe behavior where
delegation_contract: Noneyieldstruefor any EIP‑7702-style delegation is intentional but a bit non-obvious given the “minimal account implementation” wording and comments.Consider a short doc comment on the parameter (or in the function body) clarifying that:
Some(addr)→ verify delegation is specifically toaddrNone→ treat “has any valid 7702 delegation” as “minimal” for callers that don’t care about the concrete implementation.Code itself looks correct as-is.
Also applies to: 69-73
twmq/tests/prune_race_condition.rs (1)
36-88: Race-condition test refinements look good; consider usingPROCESS_SLOWLYThe added
PROCESS_SLOWLYswitch and richer logging make this race-repro test much easier to reason about, and the queue options/comments clearly document the intended pruning stress scenario.Right now the flag is always set to
falsein this test, so the “slow path” is effectively unused. Either:
- flip it to
truefor part of the test (e.g., around the critical race window), or- drop the flag until a second scenario needs it,
to avoid carrying unused configuration.
Also applies to: 181-207, 249-298, 304-333
twmq/src/lib.rs (2)
1194-1289: Stronger observability around lease loss and WATCH retriesThe added
tracing::warn!on missing lease keys andtracing::debug!on WATCH failures in bothcomplete_jobandcomplete_job_queue_errorare well-placed:
- Early exit when
lease_exists == falseensures we don’t apply completion pipelines after a job’s lease expired or was cancelled.- The WATCH-failed debug logs will be very helpful to correlate high contention in Redis with job completion retries.
Given this is an unbounded retry loop, these logs will be the main visibility into “spin” scenarios, so they’re valuable without changing semantics.
Also applies to: 1292-1389
941-1015: Pruning scripts correctly guard against deleting active/pending/delayed jobsThe Lua in
post_success_completionandpost_fail_completionnow explicitly checks:
HEXISTS(active_hash, j_id)for active jobs,LPOS(pending_list, j_id)for pending jobs,ZSCORE(delayed_zset, j_id)for delayed jobs,before deleting job data/meta/errors and trimming the success/failed lists. That matches the intent of protecting jobs that are currently in-flight or about to run, especially when IDs are reused.
The additional debug branch (
"Pruning ran but deleted 0 jobs (all were protected or none eligible)") nicely explains no-op trims in logs. Behavior looks correct.Also applies to: 1122-1192
executors/src/external_bundler/deployment.rs (1)
153-175: Double‑checkset_nxreturn type and consider simplifying lock acquire logicThe new
map_errblocks for system time, JSON serialization, Redisset_nx,get, anddelgive much clearerEngineError::InternalErrormessages, which is a nice improvement for diagnosing lock issues.One thing to verify:
set_nxis typed here as returningOption<String>. In upstreamredis/AsyncCommands,set_nxis generally used as a boolean wrapper overSETNX(1 = acquired, 0 = already present). Iftwmq::redisis still delegating to standardset_nx, asking forOption<String>will cause a type‑mismatch at runtime rather than giving a cleantrue/falseresult.If that’s the case in this repo, consider switching to a bool:
- let result: Option<String> = - conn.set_nx(&key, &lock_data_str) - .await - .map_err(|e| EngineError::InternalError { - message: format!("Lock acquire failed: {e}"), - })?; - - match result { - Some(_) => Ok(AcquireLockResult::Acquired), - None => { + let acquired: bool = conn + .set_nx(&key, &lock_data_str) + .await + .map_err(|e| EngineError::InternalError { + message: format!("Lock acquire failed: {e}"), + })?; + + if acquired { + Ok(AcquireLockResult::Acquired) + } else { // Lock already exists, get the lock_id let existing_data: Option<String> = conn.get(&key) .await .map_err(|e| EngineError::InternalError { message: format!("Failed to read existing lock: {e}"), })?; @@ - Ok(AcquireLockResult::AlreadyLocked(existing_lock_id)) - } - } + Ok(AcquireLockResult::AlreadyLocked(existing_lock_id)) + }Separately, the comment mentions “SET NX EX” but there’s no TTL/EX configured; that means locks can become permanent if not explicitly released. If that’s not intended, you might want to follow up with a small TTL via
set_ex/pexpirein a future change.Also applies to: 180-186, 206-211
executors/src/transaction_registry.rs (1)
3-4: TransactionRegistry pipeline helpers look good; consider using delete count for “not found” semanticsThe refactor to use
ConnectionManagerdirectly plus the newadd_set_command/add_remove_commandhelpers makes it much easier to compose registry updates into existing Redis pipelines atomically; the asyncget/set/hdelpaths remain straightforward.Right now
remove_transactionignores thehdelreturn count and always returnsOk(())if Redis succeeds, soTransactionNotFoundis effectively unused. If you care about distinguishing “no such transaction” from “successful removal”, you could check the count and returnTransactionRegistryError::TransactionNotFoundwhen it’s zero; otherwise the current behavior is fine and keeps callers simple.Also applies to: 39-43, 54-57, 60-77
executors/src/solana_executor/worker.rs (4)
409-454: Priority-fee percentile handling is robust, with minor edge behaviors
get_percentile_compute_unit_pricecorrectly:
- Treats network/RPC failures as NACK-able with
PriorityFeeError.- Sorts by
prioritization_feeand derives an index from the requested percentile.- Falls back to the max fee (or 0) if the computed index is out of bounds (including empty history), logging useful context.
get_compute_unit_pricecleanly routes Auto/Manual/Percentile modes and reuses the percentile helper;Autohard-coded to 75th percentile is a reasonable default.The percentile index can land just past the end for high percentiles and small samples (triggering the "out of bounds" fallback), but this is safe and acceptable.
Also applies to: 456-495
638-666: Serialized transaction handling on blockhash expiry is careful and avoids invalidating existing signaturesThe logic for serialized inputs on expired blockhash is nuanced and looks correct:
- When input is
SolanaTransactionInput::Serialized(t):
- You base64‑decode
t.transactionand bincode‑decode into aVersionedTransaction.- You treat any non‑zero 64‑byte signature as "already signed".
- In that case, you log a clear error, delete the stored attempt, and surface a permanent
TransactionFailedwith an explanatory reason, avoiding a resubmission that would silently invalidate user signatures.- When decoding fails or all signatures are zero, you fall back to the normal resubmission path:
- Old attempt is deleted.
- A new transaction is built and sent with a fresh blockhash.
This is a sensible tradeoff between safety (not re‑signing opaque signed payloads) and resilience (resubmitting unsigned serialized inputs).
Also applies to: 691-706
910-959: Send error classification and retry policy are reasonable, with clear heuristics
- Successful sends return a
TransactionSent“error” that NACKs withRequeuePosition::Last, which integrates neatly with theon_nackspecial case to emit a send‑stage success webhook.- For send failures:
- All Solana RPC errors are converted once via
to_engine_solana_error, centralizing mapping.is_send_error_retryable:
- Retries IO/Reqwest errors and most JSON/parse and unknown errors (conservative).
- Treats preflight failures and structured transaction errors as permanent.
- Treats signing errors as permanent.
- Inspects RPC error messages for obviously permanent conditions (invalid signature, insufficient funds, account not found, transaction already processed, invalid account) and disables retries in those cases.
- For
Customerrors, uses a simple"invalid"substring heuristic to avoid retrying obviously bad requests.- Retryable send errors are NACKed with
SendFailedand a network retry delay; non‑retryable ones clean up the stored attempt andfail()permanently.The heuristics are clear and biased towards not hammering the RPC with bad transactions while still retrying transient failures.
Also applies to: 999-1051
962-977: Redis lock-loss handling correctly distinguishes transient from permanent failures
verify_lock:
- Returns a NACK with
LockLostif the lock is no longer held, allowing another worker to take over.- Treats
still_held()errors as permanentInternalError::fail(), which is appropriate for unexpected Redis issues.handle_redis_error:
- Uses a string check for
"lock lost"to convert known lock‑loss errors intoLockLostNACKs with the standard confirmation retry delay.- Treats all other Redis errors as permanent
InternalError::fail().Given the limitations of error typing from Redis, this is a reasonable compromise to recover gracefully from known lock‑loss scenarios without masking other datastore issues.
Also applies to: 981-996
server/src/http/routes/admin/eoa_diagnostics.rs (1)
501-513: Queue dedupe check is reasonable, but mostly advisory
job_exists_in_queuechecks the queue’s dedupe set withsismemberbefore enqueueing a new job:
- This is a fast, low-overhead way to avoid obvious duplicates and drive the
job_enqueuedflag.- There is a small race window if multiple callers hit this endpoint concurrently, but Twmq’s own idempotency/with_id semantics should prevent harmful duplicates even if the check is stale.
Given expected low contention on this admin route, this is an acceptable pattern.
twmq/tests/delay.rs (1)
63-73: Consider saturating subtraction to guard against clock skew in testsThe reformatted
processsignature is fine, butlet actual_delay = now - job.job.created_at;can panic if the system clock jumps backwards between enqueue and processing. In tests this is unlikely but cheap to harden:- let now = SystemTime::now() - .duration_since(UNIX_EPOCH) - .unwrap() - .as_secs(); - - let actual_delay = now - job.job.created_at; + let now = SystemTime::now() + .duration_since(UNIX_EPOCH) + .unwrap() + .as_secs(); + + let actual_delay = now.saturating_sub(job.job.created_at);This keeps semantics for normal runs while avoiding potential overflow panics under clock skew.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (44)
aa-types/src/lib.rs(1 hunks)core/src/rpc_clients/bundler.rs(1 hunks)core/src/signer.rs(1 hunks)eip7702-core/src/constants.rs(1 hunks)eip7702-core/src/delegated_account.rs(2 hunks)eip7702-core/src/lib.rs(1 hunks)eip7702-core/src/transaction.rs(1 hunks)eip7702-core/tests/integration_tests.rs(5 hunks)executors/src/eip7702_executor/mod.rs(1 hunks)executors/src/eoa/store/mod.rs(1 hunks)executors/src/eoa/worker/confirm.rs(1 hunks)executors/src/eoa/worker/send.rs(11 hunks)executors/src/eoa/worker/transaction.rs(2 hunks)executors/src/external_bundler/confirm.rs(3 hunks)executors/src/external_bundler/deployment.rs(2 hunks)executors/src/external_bundler/send.rs(3 hunks)executors/src/metrics.rs(9 hunks)executors/src/solana_executor/mod.rs(1 hunks)executors/src/solana_executor/worker.rs(26 hunks)executors/src/transaction_registry.rs(4 hunks)integration-tests/src/lib.rs(5 hunks)integration-tests/tests/setup.rs(1 hunks)integration-tests/tests/sign_solana_transaction.rs(1 hunks)server/src/execution_router/mod.rs(1 hunks)server/src/http/error.rs(2 hunks)server/src/http/extractors.rs(2 hunks)server/src/http/routes/admin/eoa_diagnostics.rs(4 hunks)server/src/http/routes/admin/metrics.rs(2 hunks)server/src/http/routes/admin/mod.rs(1 hunks)server/src/http/routes/sign_solana_transaction.rs(2 hunks)server/src/http/routes/solana_transaction.rs(1 hunks)server/src/http/server.rs(1 hunks)server/src/lib.rs(1 hunks)thirdweb-core/src/iaw/mod.rs(1 hunks)twmq/benches/throughput.rs(1 hunks)twmq/src/lib.rs(4 hunks)twmq/src/queue.rs(1 hunks)twmq/src/shutdown.rs(3 hunks)twmq/tests/basic_hook.rs(1 hunks)twmq/tests/delay.rs(1 hunks)twmq/tests/fixtures.rs(1 hunks)twmq/tests/multilane_batch_pop.rs(2 hunks)twmq/tests/prune_race_condition.rs(11 hunks)twmq/tests/prune_race_random_ids.rs(10 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: d4mr
Repo: thirdweb-dev/engine-core PR: 48
File: executors/src/eoa/store/submitted.rs:229-230
Timestamp: 2025-09-20T06:58:40.230Z
Learning: The diff in executors/src/eoa/store/submitted.rs shows correct brace structure - the first closing brace closes the remove_transaction_from_redis_submitted_zset method and the second closing brace closes the impl CleanSubmittedTransactions block. The change only adds whitespace formatting.
📚 Learning: 2025-09-20T06:58:40.230Z
Learnt from: d4mr
Repo: thirdweb-dev/engine-core PR: 48
File: executors/src/eoa/store/submitted.rs:229-230
Timestamp: 2025-09-20T06:58:40.230Z
Learning: The diff in executors/src/eoa/store/submitted.rs shows correct brace structure - the first closing brace closes the remove_transaction_from_redis_submitted_zset method and the second closing brace closes the impl CleanSubmittedTransactions block. The change only adds whitespace formatting.
Applied to files:
twmq/src/queue.rsserver/src/http/server.rstwmq/tests/multilane_batch_pop.rseip7702-core/src/transaction.rsexecutors/src/external_bundler/deployment.rseip7702-core/tests/integration_tests.rsintegration-tests/tests/sign_solana_transaction.rsexecutors/src/eoa/store/mod.rsserver/src/http/routes/sign_solana_transaction.rsexecutors/src/eoa/worker/confirm.rsserver/src/http/routes/solana_transaction.rscore/src/rpc_clients/bundler.rsexecutors/src/external_bundler/confirm.rsserver/src/execution_router/mod.rscore/src/signer.rstwmq/src/lib.rstwmq/tests/fixtures.rsserver/src/http/routes/admin/mod.rstwmq/tests/prune_race_condition.rsintegration-tests/src/lib.rstwmq/tests/prune_race_random_ids.rseip7702-core/src/constants.rsexecutors/src/external_bundler/send.rsexecutors/src/eoa/worker/send.rsexecutors/src/transaction_registry.rsexecutors/src/solana_executor/worker.rsexecutors/src/eoa/worker/transaction.rseip7702-core/src/lib.rsexecutors/src/metrics.rsexecutors/src/eip7702_executor/mod.rsexecutors/src/solana_executor/mod.rstwmq/src/shutdown.rs
📚 Learning: 2025-09-20T05:30:35.171Z
Learnt from: joaquim-verges
Repo: thirdweb-dev/engine-core PR: 48
File: executors/src/eoa/worker/send.rs:20-21
Timestamp: 2025-09-20T05:30:35.171Z
Learning: In executors/src/eoa/worker/send.rs, there is a critical bug where HEALTH_CHECK_INTERVAL is defined as 300 seconds but compared against millisecond timestamps, causing balance checks to occur every 300ms instead of every 5 minutes (1000x more frequent than intended).
Applied to files:
executors/src/eoa/worker/confirm.rstwmq/tests/prune_race_condition.rsexecutors/src/eoa/worker/send.rsexecutors/src/solana_executor/worker.rsexecutors/src/eoa/worker/transaction.rsexecutors/src/metrics.rs
📚 Learning: 2025-07-06T15:44:13.701Z
Learnt from: d4mr
Repo: thirdweb-dev/engine-core PR: 5
File: executors/src/eoa/worker.rs:173-176
Timestamp: 2025-07-06T15:44:13.701Z
Learning: In the EOA executor system, aggressive lock acquisition is safe because every Redis state mutation uses WATCH operations on the lock key. If the lock is lost during a transaction, the WATCH causes the transaction to fail and the worker exits gracefully. This provides coordination between workers even when using forceful lock takeover.
Applied to files:
twmq/tests/prune_race_condition.rs
🧬 Code graph analysis (18)
executors/src/external_bundler/deployment.rs (1)
executors/src/eoa/store/mod.rs (3)
now(834-836)conn(431-431)conn(796-796)
eip7702-core/tests/integration_tests.rs (2)
eip7702-core/src/delegated_account.rs (1)
is_minimal_account(31-84)executors/src/eoa/authorization_cache.rs (1)
is_minimal_account(25-41)
eip7702-core/src/delegated_account.rs (1)
executors/src/eoa/authorization_cache.rs (1)
is_minimal_account(25-41)
server/src/http/routes/admin/metrics.rs (2)
integration-tests/tests/setup.rs (1)
new(110-322)server/src/http/server.rs (1)
new(54-122)
server/src/http/routes/sign_solana_transaction.rs (1)
executors/src/solana_executor/worker.rs (1)
bincode(643-643)
executors/src/eoa/worker/confirm.rs (1)
executors/src/eoa/store/mod.rs (1)
now(834-836)
executors/src/external_bundler/confirm.rs (1)
executors/src/metrics.rs (3)
calculate_duration_seconds_from_twmq(278-284)current_timestamp_ms(287-289)record_transaction_queued_to_confirmed(160-170)
twmq/src/lib.rs (1)
twmq/src/job.rs (1)
id(221-223)
twmq/tests/fixtures.rs (1)
twmq/src/lib.rs (2)
process(68-71)job(157-162)
twmq/tests/prune_race_condition.rs (3)
twmq/src/lib.rs (1)
new(132-151)twmq/src/queue.rs (4)
default(20-22)default(45-55)default(85-87)handler(159-167)twmq/tests/basic_hook.rs (1)
cleanup_redis_keys(24-41)
twmq/tests/prune_race_random_ids.rs (2)
twmq/src/lib.rs (7)
redis(1243-1245)redis(1251-1251)redis(1349-1351)redis(1357-1357)redis(1393-1395)redis(1401-1403)new(132-151)twmq/tests/basic_hook.rs (2)
redis(34-36)cleanup_redis_keys(24-41)
executors/src/external_bundler/send.rs (1)
executors/src/metrics.rs (3)
calculate_duration_seconds_from_twmq(278-284)current_timestamp_ms(287-289)record_transaction_queued_to_sent(148-158)
executors/src/eoa/worker/send.rs (3)
executors/src/eoa/worker/error.rs (2)
is_retryable_preparation_error(250-281)should_update_balance_threshold(215-235)executors/src/metrics.rs (2)
calculate_duration_seconds(273-275)current_timestamp_ms(287-289)executors/src/eoa/store/mod.rs (1)
now(834-836)
executors/src/transaction_registry.rs (3)
twmq/src/lib.rs (6)
redis(1243-1245)redis(1251-1251)redis(1349-1351)redis(1357-1357)redis(1393-1395)redis(1401-1403)executors/src/eoa/store/mod.rs (2)
conn(431-431)conn(796-796)executors/src/external_bundler/deployment.rs (4)
conn(40-42)conn(54-54)conn(69-71)conn(207-207)
executors/src/solana_executor/worker.rs (2)
core/src/error.rs (2)
to_engine_solana_error(570-570)to_engine_solana_error(575-679)executors/src/metrics.rs (1)
current_timestamp_ms(287-289)
server/src/http/routes/admin/eoa_diagnostics.rs (4)
executors/src/eoa/store/atomic.rs (2)
eoa(84-86)chain_id(89-91)twmq/src/lib.rs (7)
redis(1243-1245)redis(1251-1251)redis(1349-1351)redis(1357-1357)redis(1393-1395)redis(1401-1403)new(132-151)executors/src/eoa/store/mod.rs (4)
new(112-118)new(296-310)conn(431-431)conn(796-796)server/src/queue/manager.rs (1)
new(52-318)
executors/src/eoa/worker/transaction.rs (1)
executors/src/eoa/worker/error.rs (2)
is_retryable_preparation_error(250-281)is_unsupported_eip1559_error(330-341)
executors/src/metrics.rs (2)
integration-tests/tests/setup.rs (1)
new(110-322)server/src/http/routes/admin/metrics.rs (1)
get_metrics(12-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: coverage
- GitHub Check: test
🔇 Additional comments (45)
thirdweb-core/src/iaw/mod.rs (1)
70-70: Derive macro consolidation improves readability.Combining the derive macros into a single attribute is a clean style improvement. The
Defaultderivation is correct given the#[default]attribute on theTextvariant.integration-tests/tests/sign_solana_transaction.rs (1)
219-219: LGTM! Formatting change improves consistency.This single-line format now matches the identical error handling pattern at line 131, making the codebase more consistent.
server/src/execution_router/mod.rs (1)
509-513: Import refactoring improves readability.The shift to local imports for
QueuedSolanaTransactionResponseandSolanaTransactionOptionsis appropriate since they're only used within this function. This eliminates verbose fully-qualified type names and improves clarity. The multi-line return type formatting is clean and idiomatic.twmq/benches/throughput.rs (2)
103-160: Correct implementation of job handler with proper metrics tracking.The async job processing logic, atomic metric updates, and error handling are all sound. Proper use of sequential ordering for consistency across threads.
291-299: Formatting-only change to print statement – no behavioral impact.The consolidation of the "Load Test Results" header into a single line improves readability while maintaining all output information. The subsequent println statements remain unchanged.
twmq/tests/multilane_batch_pop.rs (1)
330-330: Formatting change approved.These lines inline printf statements for consistency with the broader logging pattern in the test suite. No functional or semantic impact.
Also applies to: 356-356
integration-tests/src/lib.rs (1)
1-106: LGTM! Formatting improvements enhance readability.All changes in this file are formatting-only adjustments:
- Import reorganization for
VersionedMessageandv0- Comment alignment in function parameters
- Idiomatic let-chain pattern in
verify_signature(lines 69-71)- Whitespace improvements for better code structure
The logic in all helper functions (
create_spl_token_transfer_transaction,verify_signature,create_system_transfer) remains unchanged and correct.integration-tests/tests/setup.rs (1)
17-38: LGTM! Import consolidation reduces duplication.The changes consolidate imports from
thirdweb_engineand update type references inTestConfig:
- Imports various config and component types directly from
thirdweb_engine- Uses
RedisConfig as ServerRedisConfigalias for clarity- Updates struct field types to match the imported types
This aligns with the comment on line 24-25 about reusing types instead of duplicating them. No functional changes—just better code organization.
eip7702-core/src/constants.rs (1)
1-5: Trailing newline addition—no functional impact.Cosmetic formatting adjustment aligns with Rust conventions. No concerns.
aa-types/src/lib.rs (1)
1-3: No functional changes—re-export semantics preserved.Formatting adjustment to re-export statement has no impact on public API surface or behavior.
core/src/signer.rs (1)
622-626: Blank line insertion—cosmetic formatting only.Added blank line before signature assignment has no functional impact on Solana transaction signing flow.
twmq/src/queue.rs (1)
1-10: Import reordering—no functional impact.Serde trait imports reordered alphabetically. No effect on functionality, trait availability, or derive macro behavior.
executors/src/eoa/store/mod.rs (1)
585-605: For loop formatting—readability improvement with no logic changes.Multi-line reformatting of iterator chain improves readability while preserving iteration and deletion semantics.
eip7702-core/tests/integration_tests.rs (1)
660-662: Multi-line formatting of test method calls—readability improvement only.Long method chains and arguments reformatted across multiple lines. No changes to test logic, assertions, or coverage.
Also applies to: 672-674, 678-680, 705-709, 743-745, 753-757, 767-769, 776-778
eip7702-core/src/transaction.rs (1)
192-212: Conditional and assignment formatting—logic preserved.Multi-line formatting of the is_minimal_account condition and sign_authorization call improves readability without changing control flow or behavior.
eip7702-core/src/lib.rs (1)
1-3: Module visibility expanded—transaction module now public API.Making the transaction module public represents an intentional expansion of the eip7702-core crate's public API surface. Ensure this aligns with versioning strategy and that all exposed types in transaction.rs are properly documented and stable for public consumption.
server/src/http/routes/admin/mod.rs (1)
1-3: Exposeadmin::queuemodule for existing routesAdding
pub mod queue;correctly wires up thecrate::http::routes::admin::queue::*routes used inserver/src/http/server.rs. No issues here.server/src/lib.rs (1)
9-12: Reorderedconfigre-exports but kept public API stableThe
pub use config::{...}list is just reordered; exported types are unchanged, so this is a no-op for consumers.server/src/http/extractors.rs (1)
10-11: Header-based signing credential extraction remains consistentReordered
credentialsimports and the formatted call totry_extract_aws_kms_with_cachepreserve the existing priority:
- AWS KMS (with cached clients),
- IAW (wallet access token + client/service key),
- Vault access token,
with the same error semantics. No issues spotted.
Also applies to: 107-118
server/src/http/routes/admin/metrics.rs (1)
10-53: Metrics endpoint formatting cleanup; behavior unchanged
get_metricsstill:
- gathers from the shared
prometheus::Registry,- encodes via
TextEncoder,- converts to UTF‑8,
- returns
200with the Prometheus text content type, and- maps failures to a structured
MetricsErrorwith a 500 response.Changes are format-only and look good.
executors/src/eip7702_executor/mod.rs (1)
1-3: Module reordering only in EIP‑7702 executorThe executor now declares modules in the order
confirm,delegation_cache,send, with no visibility changes. This is a pure organizational/formatting tweak.twmq/tests/prune_race_random_ids.rs (1)
7-357: LGTM on additional tracing and Redis debug instrumentationAll changes here are import reordering, improved log formatting, and extra Redis debug reads to introspect pruning state; test flow, QueueOptions, and assertions remain unchanged. This makes the pruning race test more diagnosable without altering its behavior.
core/src/rpc_clients/bundler.rs (1)
180-186: tw_get_delegation_contract reformat is safeThis is a formatting-only change; the RPC method name, params, and return type are unchanged, so behavior and public API remain identical.
executors/src/solana_executor/mod.rs (1)
3-3: Module and re-export reordering is non-semanticMoving
workerbelowrpc_cache/storageand reordering the items in thepub usedoes not affect behavior; the public API remains identical. Looks good.Also applies to: 7-7
twmq/src/shutdown.rs (1)
66-68: Consistent runtime error mapping for worker panicsWrapping
JoinErrorinTwmqError::Runtime { message }in both single-worker and multi-worker shutdown paths standardizes how task panics are surfaced while detailed context is still logged. No issues spotted.Also applies to: 122-124
executors/src/solana_executor/worker.rs (5)
114-122: Error typing for blockhash expiry and helper accessors look solidThe
BlockhashExpired { submission_attempt_number }variant plusis_send_success/signature()helpers onSolanaExecutorErrorgive clear, typed signaling for both success-like and retryable states and make downstream handling (e.g., hooks/webhooks) straightforward. No issues here.Also applies to: 177-191
223-271: Process flow, locking, and RPC client retrieval are well-structured
processnow acquires a per-transaction lock, delegates toexecute_transaction, and always attemptslock.release()afterwards, which keeps lock lifecycle clear and prevents leaks.- Fetching the
RpcClientonce fromrpc_cacheper job using the chain_id in execution options avoids redundant client construction.- The tracing span with
transaction_idand stage fields should greatly improve observability of job lifecycles.This structure looks correct and aligned with the durable execution model.
Also applies to: 258-265
504-523: Blockhash validity, confirmation checks, and Redis interactions look correct
- Using
commitment_levelfrom execution options for both signature status checks andget_latest_blockhash_with_commitmentkeeps behavior aligned with caller expectations.- The
Ok(None)branch for signature status:
- Uses
current_timestamp_ms().saturating_sub(attempt.sent_at)and a millisecond constant (min_wait_before_resubmit_ms = 30_000), so units are consistent and avoid the seconds/ms pitfall seen previously. Based on learnings.- Only proceeds to blockhash validity checks after a reasonable wait and NACKs with
NotYetConfirmedotherwise.is_blockhash_validis called withCommitmentLevel::Finalized, which is stricter than the send/confirm commitment level and seems intentional for expiry decisions.- On blockhash expiry:
MaxRetriesExceededis emitted oncesubmission_attempt_number > max_retries, with attempts cleaned from storage first.- Errors deleting the old attempt are surfaced as permanent
InternalError::fail(), which is appropriate for a data-store failure.- Redis errors during blockhash check and attempt deletion are converted via
handle_redis_erroror explicitInternalErrormapping, ensuring we don't silently drop failure cases.Overall this section correctly separates transient RPC issues (NACK + retry) from permanent on-chain / storage failures.
Also applies to: 539-576, 605-621, 676-706, 714-719
739-767: Retry bounds for send and resubmission attempts are well-guarded
submission_attempt_numberis derived from the stored attempt count plus one, or 1 when there is no prior attempt.- Two guards prevent runaway retries:
MAX_SEND_ATTEMPTS_WITHOUT_TRANSACTIONcaps job attempts before any attempt record exists, protecting against misconfigured queues or persistent pre-send failures.submission_attempt_number > max_retries + 1caps the number of resubmissions per transaction (including the initial send).- In both cases, a
MaxRetriesExceedederror is emitted, and behavior is explicit.The off‑by‑one boundaries (initial send +
max_retriesresubmits) look correct.
790-832: Instruction-based transaction build, memo injection, and attempt storage look correct
- For instruction inputs:
- Priority fees are computed once per send via
get_compute_unit_price; errors NACK cleanly.- A memo instruction embedding
transaction_idis appended, giving each resubmission a unique signature and aiding traceability.- Memo data is encoded as base64 with
InstructionDataEncoding::Base64, consistent with the otherSolanaInstructionDataentries.- For serialized inputs:
- You intentionally ignore execution options to avoid mutating a preconstructed payload, which is important for signature validity.
SolanaTransaction::to_versioned_transactionerrors are logged with transaction_id and mapped toTransactionBuildFailed::fail().- Attempts are stored via
store_attempt_if_not_existsbefore sending, avoiding duplicate network sends; the race path is handled by emitting anInternalErrorNACK with a clear log about the race.No obvious correctness issues with this build + store flow.
Also applies to: 815-823, 869-907
server/src/http/routes/admin/eoa_diagnostics.rs (1)
84-88: Manual reset state exposure and response type are coherent
ManualResetResponse { job_enqueued }gives callers explicit feedback on whether a new EOA executor job was scheduled.EoaStateResponsenow includesmanual_reset_scheduled, wired viais_manual_reset_scheduled(), so operators can introspect reset state from diagnostics.These additions line up well and keep the admin API self-describing.
Also applies to: 164-181, 170-185
executors/src/eoa/worker/confirm.rs (1)
379-381: Time-since-queuing computation is correct and purely reformatted
time_since_queuingstill usesEoaExecutorStore::now()(ms) minuscreated_at(ms) and compares againstNONCE_STALL_LIMIT_MS, so units remain consistent; only formatting changed. Looks good.twmq/tests/basic_hook.rs (1)
259-259: Single-line webhook queue debug print looks goodThe compact
println!still surfaces both pending and success counts and is consistent with the rest of the test logging.server/src/http/error.rs (1)
2-4: Solana RPC error mapping remains correctReformatted imports and the
SolanaRpcErrorKind::Reqwestbranch preserve existing semantics: HTTP status is mapped viaStatusCode::from_u16, falling back toBAD_GATEWAYwhen missing/invalid.Also applies to: 91-93
server/src/http/server.rs (1)
4-8: Reflowed engine_core imports are fineMulti-line import grouping is clearer and keeps the same set of dependencies; nothing functionally changed.
server/src/http/routes/solana_transaction.rs (1)
63-63: 202 JSON response construction is unchangedUsing the tuple
(StatusCode::ACCEPTED, Json(SuccessResponse::new(response)))keeps the same HTTP status and body shape in a more compact form.executors/src/external_bundler/send.rs (1)
30-33: Queued→sent timing metrics wiring looks correctImporting
current_timestamp_msandcalculate_duration_seconds_from_twmqand using them withjob.job.created_at(seconds) and a millisecondsent_timestampfixes the unit mismatch and records a sane duration for"erc4337-external"by chain.Also applies to: 357-357, 508-516
server/src/http/routes/sign_solana_transaction.rs (1)
80-84: Consistent validation error mapping for Solana signingWrapping both
get_latest_blockhashand bincode serialization failures intoEngineError::ValidationErrorviaApiEngineErrorgives a consistent client-facing error story without changing the happy-path behavior.Also applies to: 113-118
executors/src/metrics.rs (1)
2-5: New metrics helpers, EOA degradation tracking, and tests look solid
- Histogram definitions and bucket choices for queued→sent, queued→confirmed, and EOA degradation/stuck durations look reasonable for typical executor latencies.
- The
record_transaction_queued_to_*andrecord_eoa_job_processing_timehelpers cleanly wrap observation logic and keep label usage consistent.EoaMetrics::record_transaction_sent/confirmed/record_stuck_eoacorrectly gate the low-cardinality degradation/stuck histograms on configurable thresholds while always recording the base metrics.calculate_duration_seconds_from_twmqandcurrent_timestamp_mscorrectly normalize TWMQ seconds timestamps with wall-clock millisecond timestamps, and the tests cover both normal and edge cases (including end < start) as well as metrics export for both default and custom registries.Aside from the
get_metricsfallback issue noted separately, this overall metrics surface is well-structured and well-tested.Also applies to: 22-29, 35-41, 57-62, 66-75, 77-83, 148-179, 203-270, 278-289, 295-321, 330-360, 366-368, 379-387
executors/src/eoa/worker/transaction.rs (2)
22-33: LGTM! Import restructuring aligns with module organization.The import path changes are purely organizational, moving items to their appropriate module locations without affecting functionality.
405-410: LGTM! Improved readability with multi-line formatting.The chained method call is now more readable with the multi-line format. Functionally identical to the previous single-line version.
executors/src/eoa/worker/send.rs (3)
4-17: LGTM! Metrics imports added to support timing instrumentation.The addition of
calculate_duration_secondsandcurrent_timestamp_msimports from the metrics module enables the detailed lifecycle logging throughout this file.
25-141: LGTM! Comprehensive lifecycle logging added to send_flow.The timing instrumentation provides excellent observability into each phase of the send flow:
- EOA health check
- Balance validation
- Recycled nonce processing
- Inflight budget retrieval
- New transaction processing
- Overall completion
All logs consistently include relevant context (eoa, chain_id, worker_id) and use the "JOB_LIFECYCLE" prefix for easy filtering.
376-605: LGTM! Detailed timing instrumentation for new transaction processing.The comprehensive logging in
process_new_transactionsprovides excellent visibility into:
- Pending transaction retrieval
- Nonce management
- Transaction building and signing
- Preparation result cleaning
- State transitions
- Blockchain submission
- Result processing
Each iteration and sub-step is timed, making it easy to identify bottlenecks. The consistent use of structured logging with contextual fields (eoa, chain_id, worker_id, iteration) enables effective debugging and monitoring.
twmq/tests/fixtures.rs (1)
60-63: LGTM! Formatting improvement for method signature.The multi-line format with trailing comma improves readability and aligns with the trait definition style. No functional changes.
| metrics::{ | ||
| calculate_duration_seconds_from_twmq, current_timestamp_ms, | ||
| record_transaction_queued_to_confirmed, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify timestamp units used for queued→confirmed metrics
original_queued_timestamp is documented as milliseconds, but calculate_duration_seconds_from_twmq expects a start timestamp in seconds and multiplies it by 1000 internally. If you are actually storing milliseconds in original_queued_timestamp, this will inflate queued_to_confirmed_duration by ~1000x. Either:
- store seconds here (e.g., straight from twmq’s job timestamp), or
- switch to a helper that expects milliseconds and update the docstring accordingly.
Also worth double‑checking that "erc4337-external" matches the executor label you’re using elsewhere for these metrics so dashboards aggregate correctly.
Also applies to: 221-231
🏁 Script executed:
#!/bin/bash
# Examine the confirm.rs file at the specified lines and context
echo "=== Lines 18-21 (imports) ==="
sed -n '15,25p' executors/src/external_bundler/confirm.rs
echo -e "\n=== Lines 221-231 (metrics recording) ==="
sed -n '218,235p' executors/src/external_bundler/confirm.rsLength of output: 1161
🏁 Script executed:
#!/bin/bash
# Find JobData struct definition and original_queued_timestamp field
echo "=== JobData struct definition ==="
rg -n "struct JobData" -A 20 executors/src/Length of output: 33
🏁 Script executed:
#!/bin/bash
# Find calculate_duration_seconds_from_twmq function implementation
echo "=== calculate_duration_seconds_from_twmq implementation ==="
rg -n "fn calculate_duration_seconds_from_twmq" -A 15 executors/src/Length of output: 956
🏁 Script executed:
#!/bin/bash
# Search for all executor label usage in metrics context
echo "=== Executor label usage ==="
rg -n "erc4337-external" executors/src/ -C 3Length of output: 1258
🏁 Script executed:
#!/bin/bash
# Find where original_queued_timestamp is set/assigned
echo "=== original_queued_timestamp assignments ==="
rg -n "original_queued_timestamp\s*=" executors/src/ -C 2Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check other executor labels used for metrics to verify consistency
echo "=== All executor labels in metrics ==="
rg -n 'record_transaction|"[^"]*external' executors/src/external_bundler/ -C 1 | head -50Length of output: 1315
🏁 Script executed:
#!/bin/bash
# Search for JobData more broadly - may be struct or type alias
echo "=== JobData definition (any context) ==="
rg -n "JobData" executors/src/external_bundler/confirm.rs -B 2 -A 8 | head -40
echo -e "\n=== Where original_queued_timestamp is used ==="
rg -n "original_queued_timestamp" executors/src/ -B 2 -A 2Length of output: 4478
🏁 Script executed:
#!/bin/bash
# Check the job queue struct to understand created_at unit
echo "=== Job/JobData queue structures ==="
rg -n "pub struct.*Job|created_at|original_queued_timestamp" executors/src/ -B 1 -A 1 | head -60Length of output: 3566
🏁 Script executed:
#!/bin/bash
# Find where original_queued_timestamp is populated
echo "=== Population of original_queued_timestamp ==="
rg -n "original_queued_timestamp:" executors/src/ -B 3 -A 1Length of output: 1981
🏁 Script executed:
#!/bin/bash
# Check if there are any time conversion utilities being used
echo "=== Time/duration conversion patterns ==="
rg -n "timestamp.*seconds|timestamp.*milliseconds|as u64|timestamp_millis|timestamp_secs" executors/src/external_bundler/ -C 1 | head -50Length of output: 337
Fix timestamp unit mismatch in queued-to-confirmed metrics
The bug flagged in the review is confirmed and critical. The field original_queued_timestamp is documented as milliseconds and populated from job.job.created_at (which is stored in milliseconds per eoa/store/mod.rs:96). However, it's passed directly to calculate_duration_seconds_from_twmq(start_timestamp_seconds, end_timestamp_ms) which treats the first parameter as seconds and multiplies by 1000. This causes the duration to inflate by ~1000x.
Fix: Convert original_timestamp to seconds before passing it:
let queued_to_confirmed_duration =
calculate_duration_seconds_from_twmq(original_timestamp / 1000, confirmed_timestamp);Or update the field to store seconds instead of milliseconds and update the docstring.
This same bug exists in eip7702_executor/confirm.rs:292-294 with identical code structure.
The executor label "erc4337-external" is correctly consistent across both files.
🤖 Prompt for AI Agents
In executors/src/external_bundler/confirm.rs around lines 18 to 21 (and
similarly in eip7702_executor/confirm.rs around lines 292 to 294), the code
passes original_queued_timestamp (documented/stored in milliseconds) into
calculate_duration_seconds_from_twmq which expects seconds, inflating durations
~1000x; change the call to pass the queued timestamp converted to seconds
(divide by 1000) before calling calculate_duration_seconds_from_twmq, or
alternatively change the stored/documented field to seconds consistently and
update usages and docstring to match.

Summary by CodeRabbit
Release Notes
Improvements
Chores
✏️ Tip: You can customize this high-level summary in your review settings.