bridge: Significant Delays in BurnTransactionReady/RefundTransactionReady Processing Leading to Accumulating Unhandled Transactions

## Describe the bug

The TFChain bridge is experiencing a critical operational issue where `BurnTransactionReady` events (internally handled as `WithdrawReadyEvents` by the bridge daemon) are processed with significant delays, often exceeding the **2-minute window** during which their associated Stellar signatures are available on TFChain. This leads to a growing backlog of unhandled transactions, impacting the bridge's efficiency and reliability.

## Additional context

### Stellar Multi-signature and TFChain's Role

1. **Stellar Multi-signature for TFT Burns**  
   When TFT is to be unlocked on the Stellar network (e.g., as part of a withdrawal from TFChain), a multi-signature Stellar transaction is required.  
   This transaction needs to be signed by multiple designated validators (bridge daemons) to ensure security and decentralization.

2. **Signatures on TFChain**  
   To facilitate this, individual validators submit their signatures for a given burn transaction to TFChain. These signatures are stored in TFChain's state, making them accessible to the bridge daemon.

3. **Stellar Sequence Numbers**  
   Every Stellar account has a sequence number, which is a monotonically increasing integer. Each new transaction originating from that account must have a sequence number exactly one greater than the account's current sequence number.  
   This mechanism prevents transaction replay attacks and ensures transaction ordering. If the Stellar bridge account's sequence number advances before a transaction is submitted, any signatures based on the old sequence number become invalid.
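
The sequence-number rule above can be illustrated with a small, self-contained model. This is a sketch under simplifying assumptions, not the bridge's code: `Account`, `SignedTx`, `Submit`, and the 3-signature threshold are hypothetical, and a real Stellar account checks signer weights against thresholds rather than a plain signature count. It shows why two burns signed against the same sequence number cannot both succeed.

```go
package main

import "fmt"

// Account is a hypothetical, simplified Stellar account: only the current
// sequence number matters for this illustration.
type Account struct {
	Sequence int64
}

// SignedTx is a hypothetical pre-signed transaction. Validators sign the whole
// transaction, including SeqNum, so their signatures only fit that number.
type SignedTx struct {
	SeqNum     int64
	Signatures int
}

// Submit applies two simplified Stellar checks: enough signatures, and a
// sequence number exactly one greater than the account's. On success the
// account's sequence number advances to the transaction's.
func Submit(acc *Account, tx SignedTx, threshold int) error {
	if tx.Signatures < threshold {
		return fmt.Errorf("not enough signatures: have %d, need %d", tx.Signatures, threshold)
	}
	if tx.SeqNum != acc.Sequence+1 {
		return fmt.Errorf("bad sequence: tx has %d, account expects %d", tx.SeqNum, acc.Sequence+1)
	}
	acc.Sequence = tx.SeqNum
	return nil
}

func main() {
	bridge := &Account{Sequence: 100}
	// Two burns were both signed against sequence number 101.
	burnA := SignedTx{SeqNum: 101, Signatures: 3}
	burnB := SignedTx{SeqNum: 101, Signatures: 3}

	fmt.Println(Submit(bridge, burnA, 3)) // <nil>: the account moves to 101
	fmt.Println(Submit(bridge, burnB, 3)) // bad sequence: burnB's signatures are stale
}
```

Once `burnA` lands, `burnB` would have to be rebuilt with sequence number 102 and signed again by the validators.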

---

### The 2-Minute Signature Expiry Mechanism

TFChain incorporates a defensive mechanism within its `pallet-tft-bridge` to manage these Stellar signatures. Signatures associated with `BurnTransactionReady` events are periodically cleared from TFChain state after approximately **2 minutes**.

- **Why the Expiry?**  
  This timeout is a safety measure. Stellar's sequence numbers are critical for transaction validity. If the Stellar bridge account's sequence number advances (e.g., due to another transaction being submitted), any pending signatures for older transactions become stale and unusable.  
  Clearing these old signatures prevents the accumulation of invalid data on-chain and reduces the risk of being stuck attempting to submit transactions that are guaranteed to fail.
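
A rough model of the expiry behaviour, assuming 6-second blocks so that the ~2-minute window corresponds to 20 blocks. This issue does not describe how `pallet-tft-bridge` stores or prunes signatures, so `SignatureStore`, `MarkReady`, `Prune`, and the block-based retention check are hypothetical names chosen for illustration; the real pallet's storage layout and timing may differ.

```go
package main

import "fmt"

// Hypothetical timing: 6-second blocks, so a ~2-minute window is 20 blocks.
const (
	blockTimeSeconds = 6
	retentionBlocks  = 120 / blockTimeSeconds
)

// pendingBurn is a hypothetical record of the signatures collected for a burn.
type pendingBurn struct {
	ReadyAtBlock uint64
	Signatures   []string
}

// SignatureStore models on-chain signature storage keyed by burn ID.
type SignatureStore struct {
	burns map[uint64]pendingBurn
}

func NewSignatureStore() *SignatureStore {
	return &SignatureStore{burns: make(map[uint64]pendingBurn)}
}

// MarkReady records the signatures for a burn at the given block.
func (s *SignatureStore) MarkReady(id, block uint64, sigs []string) {
	s.burns[id] = pendingBurn{ReadyAtBlock: block, Signatures: sigs}
}

// Prune removes every entry that is at least retentionBlocks old at block now,
// mimicking a periodic cleanup. Deleting map entries while ranging is safe in Go.
func (s *SignatureStore) Prune(now uint64) {
	for id, b := range s.burns {
		if now >= b.ReadyAtBlock && now-b.ReadyAtBlock >= retentionBlocks {
			delete(s.burns, id)
		}
	}
}

// Signatures returns the stored signatures, or false once they were pruned.
func (s *SignatureStore) Signatures(id uint64) ([]string, bool) {
	b, ok := s.burns[id]
	return b.Signatures, ok
}

func main() {
	store := NewSignatureStore()
	store.MarkReady(1, 1000, []string{"sigA", "sigB", "sigC"})

	store.Prune(1010) // ~1 minute later: still available
	_, ok := store.Signatures(1)
	fmt.Println("after 10 blocks:", ok) // true

	store.Prune(1020) // ~2 minutes later: cleared
	_, ok = store.Signatures(1)
	fmt.Println("after 20 blocks:", ok) // false
}
```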

---

### How the Expiry Causes the Issue

1. **Event Backlog**  
   The bridge daemon processes all incoming TFChain events sequentially. This includes:
   - `WithdrawCreatedEvents`
   - `WithdrawExpiredEvents`
   - `RefundExpiredEvents`
   - `RefundReadyEvents`
   - `WithdrawReadyEvents` (`BurnTransactionReady` events)

2. **Processing Delays**
   - **High Event Volume:** During periods of high activity, or after bridge outages, a large volume of events can accumulate.
   - **Legacy Unhandled Transactions:** The bridge also faces a backlog of transactions that previous versions were unable to process.
   - These factors cause the bridge daemon to spend significant time processing other events before reaching `WithdrawReadyEvents`.

3. **Race Condition / Stale Signatures**  
   By the time the bridge daemon reaches a `WithdrawReadyEvent` emitted more than 2 minutes ago, the corresponding Stellar signatures have already been cleared from TFChain by the expiry mechanism.

4. **Failed Processing and Accumulation**  
   When the bridge attempts to process such an event, it finds no valid signatures. Consequently:
   - It cannot construct and submit the Stellar transaction.
   - The transaction remains in an unhandled state, contributing to the growing count of "stuck" transactions in TFChain storage.
   - This creates a continuous cycle: new `BurnTransactionReady` events keep being emitted, but the daemon only reaches them after their signatures have expired, leading to an ever-increasing backlog.

---

### Proposed Solution: Batch Processing of TFChain Events

To alleviate the processing bottleneck and prevent `BurnTransactionReady` events from becoming unprocessable, I propose optimizing the handling of all TFChain events.

1. **Batching all TFChain calls**
   - Instead of processing each event (`WithdrawExpiredEvent`, etc.) individually, the bridge daemon will collect these events and handle them as a batch.

2. **Single Batched Transaction**
   - A single Substrate extrinsic using `Utility.force_batch` will be constructed and submitted to TFChain, so that one failing call does not abort the rest of the batch.
   - This batched extrinsic will contain multiple calls (`TFTBridgeModule.propose_burn_transaction_or_add_sig`, `proposeOrVoteMintTransactionCall`, etc.), each corresponding to a deposit, withdrawal, or refund operation.

3. **Benefits**
   - **Reduced Transaction Overhead:** Submitting one batched transaction instead of many individual ones reduces overhead from signing, network propagation, and block inclusion.
   - **Improved Throughput:** Clears the backlog of `WithdrawExpiredEvents` more efficiently, allowing the bridge daemon to reach and process critical `WithdrawReadyEvents` sooner, ideally before their 2-minute signature expiry.
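
The batching step could look roughly like the sketch below. `Call`, `Submitter`, `SubmitExtrinsic`, `BatchCalls`, and the `maxBatch` cap are hypothetical stand-ins, not the bridge's or any Substrate client's actual API; a real implementation would encode runtime calls from chain metadata and sign the resulting extrinsic. The cap is there on the assumption that a large backlog must be split so each extrinsic still fits within the block weight limit. `Utility.force_batch` is the pallet-utility batch variant that keeps dispatching the remaining calls when one of them fails.

```go
package main

import "fmt"

// Call is a hypothetical, simplified stand-in for an encoded runtime call such
// as TFTBridgeModule.propose_burn_transaction_or_add_sig.
type Call struct {
	Module string
	Method string
	TxID   uint64
}

// Submitter is a hypothetical abstraction over "sign one extrinsic, submit it,
// and wait for inclusion". A real one would wrap a Substrate client.
type Submitter interface {
	SubmitExtrinsic(name string, calls []Call) error
}

// BatchCalls splits calls into chunks of at most maxBatch and submits each
// chunk as one Utility.force_batch extrinsic. It returns how many extrinsics
// were submitted. On chain, force_batch keeps dispatching the remaining calls
// when one of them fails.
func BatchCalls(s Submitter, calls []Call, maxBatch int) (int, error) {
	if maxBatch <= 0 {
		return 0, fmt.Errorf("maxBatch must be positive, got %d", maxBatch)
	}
	submitted := 0
	for start := 0; start < len(calls); start += maxBatch {
		end := start + maxBatch
		if end > len(calls) {
			end = len(calls)
		}
		if err := s.SubmitExtrinsic("Utility.force_batch", calls[start:end]); err != nil {
			return submitted, err
		}
		submitted++
	}
	return submitted, nil
}

// countingSubmitter is a test double that only counts what it is given.
type countingSubmitter struct{ extrinsics, calls int }

func (c *countingSubmitter) SubmitExtrinsic(name string, calls []Call) error {
	c.extrinsics++
	c.calls += len(calls)
	return nil
}

func main() {
	var pending []Call
	for id := uint64(1); id <= 45; id++ {
		pending = append(pending, Call{Module: "TFTBridgeModule", Method: "propose_burn_transaction_or_add_sig", TxID: id})
	}
	sub := &countingSubmitter{}
	n, err := BatchCalls(sub, pending, 20)
	fmt.Println(n, sub.calls, err) // 3 45 <nil>
}
```

With a 20-call cap, the 45 pending calls in the example go out as 3 extrinsics instead of 45, which is the overhead reduction the proposal relies on.
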
---

This approach should help the bridge catch up on the existing backlog of unhandled transactions and prevent future ones from accumulating due to signature expiry.
