Description
Describe the bug
The TFChain bridge is experiencing a critical operational issue where BurnTransactionReady events (internally handled as WithdrawReadyEvents by the bridge daemon) are processed with significant delays, often exceeding the 2-minute window during which their associated Stellar signatures are available on TFChain. This leads to a growing backlog of unhandled transactions, impacting the bridge's efficiency and reliability.
Additional context
Stellar Multi-signature and TFChain's Role
- Stellar Multi-signature for TFT Burns
  When TFT is to be unlocked on the Stellar network (e.g., as part of a withdrawal from TFChain), a multi-signature Stellar transaction is required.
  This transaction must be signed by multiple designated validators (bridge daemons) to ensure security and decentralization.
- Signatures on TFChain
  To facilitate this, individual validators submit their signatures for a given burn transaction to TFChain. These signatures are stored in TFChain's state, making them accessible to the bridge daemon.
- Stellar Sequence Numbers
  Every Stellar account has a sequence number, which is a monotonically increasing integer. Each new transaction originating from that account must have a sequence number exactly one greater than the account's current sequence number.
  This mechanism prevents transaction replay attacks and ensures transaction ordering. If the Stellar bridge account's sequence number advances before a transaction is submitted, any signatures based on the old sequence number become invalid.
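The sequence-number rule can be illustrated with a small model. This is only a sketch: `Account`, `SignedTx`, and `submit` are hypothetical names, not part of the Stellar SDK or the bridge code, and signature verification itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical stand-in for the Stellar bridge account."""
    sequence: int


@dataclass(frozen=True)
class SignedTx:
    """A transaction whose signatures cover one fixed sequence number."""
    sequence: int
    memo: str


def submit(account: Account, tx: SignedTx) -> bool:
    """Accept the transaction only if its sequence is the current one plus 1."""
    if tx.sequence != account.sequence + 1:
        return False  # stale or out of order: its signatures are unusable
    account.sequence = tx.sequence
    return True


account = Account(sequence=100)
burn_a = SignedTx(sequence=101, memo="burn A")
burn_b = SignedTx(sequence=101, memo="burn B")  # signed against the same account state

first = submit(account, burn_a)   # accepted, the account moves to 101
second = submit(account, burn_b)  # rejected, 101 is no longer current + 1
```

Once `burn_a` lands, `burn_b` can only succeed if the validators sign it again for sequence 102, which is why signatures collected for an older sequence number are of no further use.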
The 2-Minute Signature Expiry Mechanism
TFChain incorporates a defensive mechanism within its pallet-tft-bridge to manage these Stellar signatures. Signatures associated with BurnTransactionReady events are periodically cleared from TFChain state after approximately 2 minutes.
- Why the Expiry?
This timeout is a safety measure. Stellar's sequence numbers are critical for transaction validity. If the Stellar bridge account's sequence number advances (e.g., due to another transaction being submitted), any pending signatures for older transactions become stale and unusable.
Clearing these old signatures prevents the accumulation of invalid data on-chain and reduces the risk of being stuck attempting to submit transactions that are guaranteed to fail.
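As a rough model of this expiry, the sketch below keeps signatures in memory and drops entries older than the window. It assumes a purely time-based cleanup of exactly 120 seconds; how `pallet-tft-bridge` actually tracks and clears signatures is not shown in this issue, and `SignatureStore` is a hypothetical name.

```python
EXPIRY_SECONDS = 120  # assumption: "approximately 2 minutes" taken as exactly 120 s


class SignatureStore:
    """Hypothetical in-memory model of the signatures held in TFChain state."""

    def __init__(self) -> None:
        # burn_id -> (time the first signature arrived, collected signatures)
        self._entries: dict[int, tuple[float, list[str]]] = {}

    def add(self, burn_id: int, signature: str, now: float) -> None:
        _, sigs = self._entries.setdefault(burn_id, (now, []))
        sigs.append(signature)

    def prune(self, now: float) -> None:
        """Drop every entry that is older than the expiry window."""
        self._entries = {
            burn_id: (created, sigs)
            for burn_id, (created, sigs) in self._entries.items()
            if now - created < EXPIRY_SECONDS
        }

    def get(self, burn_id: int) -> list[str]:
        return list(self._entries.get(burn_id, (0.0, []))[1])


store = SignatureStore()
store.add(7, "sig-validator-1", now=0.0)
store.add(7, "sig-validator-2", now=5.0)

store.prune(now=60.0)
in_time = store.get(7)    # both signatures are still available

store.prune(now=121.0)
too_late = store.get(7)   # the entry has been cleared
```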
How the Expiry Causes the Issue
- Event Backlog
  The bridge daemon processes all incoming TFChain events sequentially. This includes:
  - WithdrawCreatedEvents
  - WithdrawExpiredEvents
  - RefundExpiredEvents
  - RefundReadyEvents
  - WithdrawReadyEvents (BurnTransactionReady events)
- Processing Delays
  - High Event Volume: During periods of high activity, or after bridge outages, a large volume of events can accumulate.
  - Legacy Unhandled Transactions: The bridge also faces a backlog of transactions that previous versions were unable to process.
  - These factors cause the bridge daemon to spend significant time processing other events before reaching WithdrawReadyEvents.
- Race Condition / Stale Signatures
  By the time the bridge daemon reaches a WithdrawReadyEvent emitted more than 2 minutes ago, the corresponding Stellar signatures have already been cleared from TFChain by the expiry mechanism.
- Failed Processing and Accumulation
  When the bridge attempts to process such an event, it finds no valid signatures. Consequently:
  - It cannot construct and submit the Stellar transaction.
  - The transaction remains in an unhandled state, contributing to the growing count of "stuck" transactions in TFChain storage.
  - This creates a continuous cycle where new BurnTransactionReady events are emitted, but processing is delayed until their signatures expire, leading to an ever-increasing backlog.
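The failure mode above can be reproduced with a toy simulation of a strictly sequential queue. The per-event cost of 6 seconds is an assumption chosen for illustration (roughly one extrinsic per block), and `process_sequentially` is a hypothetical helper, not the daemon's actual loop.

```python
EXPIRY_SECONDS = 120  # assumption: the 2-minute window taken as exactly 120 s


def process_sequentially(events, seconds_per_event):
    """Walk the queue in order and classify each WithdrawReady event.

    `events` is a list of (kind, event_id, emitted_at) tuples.
    Returns (handled, expired) lists of WithdrawReady event ids.
    """
    clock = 0.0
    handled, expired = [], []
    for kind, event_id, emitted_at in events:
        clock = max(clock, emitted_at)  # an event cannot be handled before it exists
        if kind == "WithdrawReady":
            if clock - emitted_at >= EXPIRY_SECONDS:
                expired.append(event_id)  # signatures already cleared
            else:
                handled.append(event_id)
        clock += seconds_per_event  # every event costs time, whatever its kind
    return handled, expired


backlog = [("WithdrawExpired", i, 0.0) for i in range(30)]
ready = ("WithdrawReady", 99, 0.0)

# 30 events * 6 s = 180 s of queueing before the ready event is reached.
long_queue = process_sequentially(backlog + [ready], seconds_per_event=6.0)

# 10 events * 6 s = 60 s, which is still inside the window.
short_queue = process_sequentially(backlog[:10] + [ready], seconds_per_event=6.0)
```

Under these assumptions, any backlog of 20 or more events ahead of a WithdrawReadyEvent is enough to push it past the expiry window.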
Proposed Solution: Batch Processing of TFChain Events
To alleviate the processing bottleneck and prevent BurnTransactionReady events from becoming unprocessable, I propose to optimize the handling of all TFChain events.
- Batching all TFChain calls
  - Instead of processing each event (WithdrawExpiredEvent, etc.) individually, the bridge daemon will collect these events and process them as a batch.
- Single Batched Transaction
  - A single Substrate extrinsic using Utility.force_all will be constructed and submitted to TFChain.
  - This batched extrinsic will contain multiple calls (TFTBridgeModule.propose_burn_transaction_or_add_sig, proposeOrVoteMintTransactionCall, etc.), each corresponding to a deposit, withdrawal, or refund operation.
- Benefits
  - Reduced Transaction Overhead: Submitting one batched transaction instead of many individual ones reduces overhead from signing, network propagation, and block inclusion.
  - Improved Throughput: Efficiently clears the backlog of WithdrawExpiredEvents, allowing the bridge daemon to reach and process critical WithdrawReadyEvents more quickly, ideally before their 2-minute signature expiry.
This approach should help the bridge catch up on the existing backlog of unhandled transactions and prevent future ones from accumulating due to signature expiry.
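A minimal sketch of the grouping step, assuming calls are represented as plain dictionaries. `chunked`, `build_batch_call`, the `transaction_id` argument, and the batch size of 10 are hypothetical and only for illustration. The batch call name is passed in as a parameter because the proposal above names Utility.force_all, while the upstream Substrate utility pallet exposes `batch`, `batch_all`, and `force_batch`; which one TFChain's runtime offers would need to be checked.

```python
from itertools import islice
from typing import Any, Iterable, Iterator

Call = dict[str, Any]


def chunked(calls: Iterable[Call], size: int) -> Iterator[list[Call]]:
    """Yield consecutive groups of at most `size` calls."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(calls)
    while batch := list(islice(it, size)):
        yield batch


def build_batch_call(calls: list[Call], batch_function: str) -> Call:
    """Wrap several calls in one utility-pallet call (hypothetical shape)."""
    return {
        "module": "Utility",
        "function": batch_function,
        "args": {"calls": calls},
    }


# 25 pending calls; the argument name is made up for this example.
pending = [
    {
        "module": "TFTBridgeModule",
        "function": "propose_burn_transaction_or_add_sig",
        "args": {"transaction_id": i},
    }
    for i in range(25)
]

extrinsics = [build_batch_call(group, "batch_all") for group in chunked(pending, 10)]
sizes = [len(x["args"]["calls"]) for x in extrinsics]
```

Here 25 individual submissions collapse into 3 extrinsics. Capping the batch size keeps each extrinsic within block weight limits, and the right cap depends on the runtime's configuration.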