# Distributed Transactions: 2PC, 3PC & Saga Pattern

Distributed transactions ensure data consistency across multiple services or databases. This notebook covers the main approaches: **Two-Phase Commit (2PC)**, **Three-Phase Commit (3PC)**, and the **Saga Pattern**.

---

## Why Distributed Transactions?

In microservices and distributed systems, a single business operation often spans multiple services/databases. We need mechanisms to ensure:
- **Atomicity**: All operations succeed or all fail
- **Consistency**: Data remains valid across all nodes
- **Isolation**: Concurrent transactions don't interfere
- **Durability**: Committed changes persist
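
To see why these guarantees need a protocol, consider a minimal sketch of a transfer that updates two services with no coordination. `AccountService` and `naive_transfer` are hypothetical names invented for this illustration; the in-memory object only stands in for a remote service.

```python
class AccountService:
    """Hypothetical in-memory stand-in for a remote account service."""

    def __init__(self, balance: int, available: bool = True):
        self.balance = balance
        self.available = available

    def apply(self, delta: int) -> None:
        if not self.available:
            raise ConnectionError("service unreachable")
        self.balance += delta


def naive_transfer(source: AccountService, target: AccountService, amount: int) -> bool:
    """Debit then credit, with no coordination between the two writes."""
    source.apply(-amount)        # local commit on the first service
    try:
        target.apply(amount)     # may fail after the debit is already applied
    except ConnectionError:
        return False             # money has left `source` but never arrived
    return True


a = AccountService(balance=500)
b = AccountService(balance=200, available=False)
ok = naive_transfer(a, b, 100)
total_after = a.balance + b.balance  # 100 units are missing from the system
```

The transfer reports failure, yet the debit has already been applied. Atomicity is violated, and this is the gap the protocols below are meant to close.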

---

## Two-Phase Commit (2PC)

2PC is a **blocking atomic commitment protocol** that coordinates all participants to commit or abort a transaction.

### How It Works

```
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│ Coordinator │         │ Participant │         │ Participant │
│             │         │     A       │         │     B       │
└──────┬──────┘         └──────┬──────┘         └──────┬──────┘
       │                       │                       │
       │  PHASE 1: PREPARE     │                       │
       │──────────────────────>│                       │
       │──────────────────────────────────────────────>│
       │                       │                       │
       │<── VOTE_COMMIT ───────│                       │
       │<──────────────────────────── VOTE_COMMIT ─────│
       │                       │                       │
       │  PHASE 2: COMMIT      │                       │
       │──────────────────────>│                       │
       │──────────────────────────────────────────────>│
       │                       │                       │
       │<── ACK ───────────────│                       │
       │<──────────────────────────────── ACK ─────────│
       ▼                       ▼                       ▼
```

### Phase 1: Prepare (Voting Phase)
1. Coordinator sends `PREPARE` request to all participants
2. Each participant:
   - Executes the transaction up to the commit point
   - Writes changes to local log (for recovery)
   - Votes `COMMIT` if ready, `ABORT` if it cannot proceed

### Phase 2: Commit (Decision Phase)
1. If **all participants voted COMMIT**:
   - Coordinator sends `COMMIT` to all
   - Participants finalize and release locks
2. If **any participant voted ABORT** (or failed to respond before the coordinator's timeout):
   - Coordinator sends `ROLLBACK` to all
   - Participants undo changes
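
The decision rule at the heart of Phase 2 can be isolated into a few lines. This is a minimal sketch, assuming votes arrive as plain strings and that a missing or `None` vote models a participant that never answered; `Decision` and `decide` are illustrative names, not part of the implementation in this notebook.

```python
from enum import Enum
from typing import Dict, Iterable, Optional


class Decision(Enum):
    COMMIT = "COMMIT"
    ROLLBACK = "ROLLBACK"


def decide(expected: Iterable[str], votes: Dict[str, Optional[str]]) -> Decision:
    """Return COMMIT only if every expected participant voted "COMMIT".

    A missing entry or a None value models a participant that did not
    answer before the coordinator's timeout; it is treated as an ABORT vote.
    """
    for name in expected:
        if votes.get(name) != "COMMIT":
            return Decision.ROLLBACK
    return Decision.COMMIT


unanimous = decide(["A", "B"], {"A": "COMMIT", "B": "COMMIT"})
one_abort = decide(["A", "B"], {"A": "COMMIT", "B": "ABORT"})
one_silent = decide(["A", "B"], {"A": "COMMIT"})
```

Treating silence as an ABORT vote is the conservative choice: the coordinator never commits unless it has positive confirmation from everyone.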

In [None]:
import enum
import time
import random
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Callable
from abc import ABC, abstractmethod
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(message)s')

class TransactionState(enum.Enum):
    INIT = "INIT"
    PREPARING = "PREPARING"
    PREPARED = "PREPARED"
    COMMITTING = "COMMITTING"
    COMMITTED = "COMMITTED"
    ABORTING = "ABORTING"
    ABORTED = "ABORTED"

class Vote(enum.Enum):
    COMMIT = "COMMIT"
    ABORT = "ABORT"

In [None]:
@dataclass
class TransactionLog:
    """Write-ahead log for durability and recovery."""
    entries: List[Dict] = field(default_factory=list)
    
    def write(self, tx_id: str, state: TransactionState, data: Optional[Dict] = None):
        entry = {
            "tx_id": tx_id,
            "state": state.value,
            "timestamp": time.time(),
            "data": data or {}
        }
        self.entries.append(entry)
        # In production: fsync to disk
        print(f"  📝 LOG: tx={tx_id}, state={state.value}")
    
    def get_last_state(self, tx_id: str) -> Optional[TransactionState]:
        for entry in reversed(self.entries):
            if entry["tx_id"] == tx_id:
                return TransactionState(entry["state"])
        return None

In [None]:
class Participant:
    """2PC Participant that can prepare, commit, or abort transactions."""
    
    def __init__(self, name: str, failure_probability: float = 0.0):
        self.name = name
        self.failure_probability = failure_probability
        self.log = TransactionLog()
        self.pending_data: Dict[str, Dict] = {}  # tx_id -> data
        self.committed_data: Dict[str, Dict] = {}
        self.logger = logging.getLogger(f"Participant-{name}")
    
    def prepare(self, tx_id: str, data: Dict) -> Vote:
        """Phase 1: Prepare to commit - validate and lock resources."""
        self.logger.info(f"Preparing transaction {tx_id}")
        
        # Simulate random failure
        if random.random() < self.failure_probability:
            self.logger.warning(f"Simulated failure during prepare!")
            self.log.write(tx_id, TransactionState.ABORTED)
            return Vote.ABORT
        
        # Validate transaction (e.g., check constraints)
        if not self._validate(data):
            self.log.write(tx_id, TransactionState.ABORTED)
            return Vote.ABORT
        
        # Write to log and hold data in pending state
        self.pending_data[tx_id] = data
        self.log.write(tx_id, TransactionState.PREPARED, data)
        
        self.logger.info(f"Voted COMMIT for {tx_id}")
        return Vote.COMMIT
    
    def commit(self, tx_id: str) -> bool:
        """Phase 2: Commit the prepared transaction."""
        self.logger.info(f"Committing transaction {tx_id}")
        
        if tx_id not in self.pending_data:
            self.logger.error(f"No pending data for {tx_id}")
            return False
        
        # Move from pending to committed
        self.committed_data[tx_id] = self.pending_data.pop(tx_id)
        self.log.write(tx_id, TransactionState.COMMITTED)
        
        self.logger.info(f"Successfully committed {tx_id}")
        return True
    
    def abort(self, tx_id: str) -> bool:
        """Abort and rollback the transaction."""
        self.logger.info(f"Aborting transaction {tx_id}")
        
        # Remove from pending if exists
        self.pending_data.pop(tx_id, None)
        self.log.write(tx_id, TransactionState.ABORTED)
        
        return True
    
    def _validate(self, data: Dict) -> bool:
        """Validate transaction data (e.g., check balance >= 0)."""
        amount = data.get("amount", 0)
        return amount >= 0  # Simple validation

In [None]:
class TwoPhaseCommitCoordinator:
    """Coordinator for Two-Phase Commit protocol."""
    
    def __init__(self, participants: List[Participant], timeout: float = 5.0):
        self.participants = participants
        self.timeout = timeout
        self.log = TransactionLog()
        self.logger = logging.getLogger("Coordinator")
    
    def execute(self, tx_id: str, participant_data: Dict[str, Dict]) -> bool:
        """
        Execute a distributed transaction across all participants.
        
        Args:
            tx_id: Unique transaction identifier
            participant_data: Dict mapping participant name to transaction data
        
        Returns:
            bool: True if committed, False if aborted
        """
        self.logger.info(f"Starting 2PC for transaction {tx_id}")
        self.log.write(tx_id, TransactionState.INIT)
        
        # Phase 1: Prepare
        votes = self._phase1_prepare(tx_id, participant_data)
        
        # Decision
        all_committed = all(v == Vote.COMMIT for v in votes.values())
        
        # Phase 2: Commit or Abort
        if all_committed:
            self.logger.info(f"All participants voted COMMIT - committing")
            return self._phase2_commit(tx_id)
        else:
            self.logger.warning(f"Some participants voted ABORT - aborting")
            return self._phase2_abort(tx_id)
    
    def _phase1_prepare(self, tx_id: str, participant_data: Dict[str, Dict]) -> Dict[str, Vote]:
        """Send PREPARE to all participants and collect votes."""
        self.log.write(tx_id, TransactionState.PREPARING)
        votes: Dict[str, Vote] = {}
        
        print(f"\n{'='*50}")
        print(f"PHASE 1: PREPARE")
        print(f"{'='*50}")
        
        for participant in self.participants:
            data = participant_data.get(participant.name, {})
            try:
                vote = participant.prepare(tx_id, data)
                votes[participant.name] = vote
            except Exception as e:
                self.logger.error(f"Participant {participant.name} failed: {e}")
                votes[participant.name] = Vote.ABORT
        
        self.log.write(tx_id, TransactionState.PREPARED, {"votes": {k: v.value for k, v in votes.items()}})
        return votes
    
    def _phase2_commit(self, tx_id: str) -> bool:
        """Send COMMIT to all participants."""
        self.log.write(tx_id, TransactionState.COMMITTING)
        
        print(f"\n{'='*50}")
        print(f"PHASE 2: COMMIT")
        print(f"{'='*50}")
        
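        # The decision is final at this point. A real coordinator would keep
        # retrying COMMIT until every participant acknowledges (omitted here).
        success = True
        for participant in self.participants:
            if not participant.commit(tx_id):
                success = False
        
        # Log COMMITTED only once every participant has acknowledged.
        if success:
            self.log.write(tx_id, TransactionState.COMMITTED)
        return success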
    
    def _phase2_abort(self, tx_id: str) -> bool:
        """Send ABORT to all participants."""
        self.log.write(tx_id, TransactionState.ABORTING)
        
        print(f"\n{'='*50}")
        print(f"PHASE 2: ABORT")
        print(f"{'='*50}")
        
        for participant in self.participants:
            participant.abort(tx_id)
        
        self.log.write(tx_id, TransactionState.ABORTED)
        return False

In [None]:
# Example: Successful 2PC Transaction
print("\n" + "="*60)
print("EXAMPLE 1: Successful Distributed Transaction")
print("="*60)

# Create participants (bank accounts)
account_a = Participant("AccountA")
account_b = Participant("AccountB")

# Create coordinator
coordinator = TwoPhaseCommitCoordinator([account_a, account_b])

# Execute transfer: debit A, credit B
result = coordinator.execute(
    tx_id="TX-001",
    participant_data={
        "AccountA": {"operation": "debit", "amount": 100},
        "AccountB": {"operation": "credit", "amount": 100}
    }
)

print(f"\n✅ Transaction result: {'COMMITTED' if result else 'ABORTED'}")

In [None]:
# Example: Failed 2PC Transaction (one participant fails)
print("\n" + "="*60)
print("EXAMPLE 2: Failed Transaction (Participant Failure)")
print("="*60)

# Create participants with one having high failure rate
account_a = Participant("AccountA")
account_b = Participant("AccountB", failure_probability=1.0)  # Always fails

coordinator = TwoPhaseCommitCoordinator([account_a, account_b])

result = coordinator.execute(
    tx_id="TX-002",
    participant_data={
        "AccountA": {"operation": "debit", "amount": 100},
        "AccountB": {"operation": "credit", "amount": 100}
    }
)

print(f"\n❌ Transaction result: {'COMMITTED' if result else 'ABORTED'}")

### 2PC Limitations

| Issue | Description |
|-------|-------------|
| **Blocking** | Participants hold locks while waiting for coordinator decision |
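| **Coordinator SPOF** | If the coordinator fails after prepare, prepared participants cannot decide on their own and stay blocked until it recovers |
| **Network partitions** | A prepared participant cut off from the coordinator must block; deciding unilaterally on a timeout can leave participants in inconsistent states |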
| **Latency** | Two round-trips required (prepare + commit) |
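
The blocking problem becomes concrete when a participant restarts and consults its own log. The sketch below is an illustration under simple assumptions (string states matching the `TransactionState` names used earlier, and a `coordinator_decision` of `None` when the coordinator is unreachable); `participant_recovery_action` is a hypothetical helper, not part of the classes above.

```python
from typing import Optional


def participant_recovery_action(last_state: str, coordinator_decision: Optional[str]) -> str:
    """Decide what a restarted 2PC participant may do.

    last_state: the participant's last logged state ("INIT", "PREPARED",
        "COMMITTED" or "ABORTED").
    coordinator_decision: "COMMIT", "ABORT", or None if the coordinator
        cannot be reached.
    """
    if last_state in ("COMMITTED", "ABORTED"):
        return "NOTHING_TO_DO"          # outcome already applied locally
    if last_state == "INIT":
        return "ABORT"                  # never voted, so aborting is safe
    if last_state == "PREPARED":
        if coordinator_decision is None:
            return "BLOCK"              # voted COMMIT; must wait while locks stay held
        return coordinator_decision     # follow the global decision
    raise ValueError(f"unknown state: {last_state}")


stuck = participant_recovery_action("PREPARED", None)
resolved = participant_recovery_action("PREPARED", "COMMIT")
```

Only the `PREPARED` state with an unreachable coordinator leads to `BLOCK`, which is the scenario the Blocking and Coordinator SPOF rows describe.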

---

## Three-Phase Commit (3PC)

3PC adds a **pre-commit phase** to reduce blocking scenarios.

### Phases

```
┌─────────────┐         ┌─────────────┐
│ Coordinator │         │ Participant │
└──────┬──────┘         └──────┬──────┘
       │                       │
       │  1. CAN_COMMIT?       │
       │──────────────────────>│
       │<── YES/NO ────────────│
       │                       │
       │  2. PRE_COMMIT        │  ← New phase!
       │──────────────────────>│
       │<── ACK ───────────────│
       │                       │
       │  3. DO_COMMIT         │
       │──────────────────────>│
       │<── ACK ───────────────│
       ▼                       ▼
```

### Phase 1: CanCommit (Voting)
- Coordinator asks "Can you commit?"
- Participants respond YES or NO
- No locks acquired yet

### Phase 2: PreCommit
- If all voted YES: send PRE_COMMIT
- Participants prepare and acknowledge
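- If the coordinator fails here, a participant that has already acknowledged PRE_COMMIT can time out and commit; one that has not yet received PRE_COMMIT times out and aborts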

### Phase 3: DoCommit
- Coordinator sends final COMMIT
- Participants finalize transaction
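
The timeout behaviour that distinguishes 3PC from 2PC can be written as a small state function. This is a sketch under the usual textbook assumptions (fail-stop nodes, no network partitions); the state names in `State3PC` are chosen for this example. With a partition, a pre-committed participant could commit while a ready one aborts, which is why 3PC is still limited in that case.

```python
from enum import Enum


class State3PC(Enum):
    INIT = "INIT"                    # has not voted yet
    READY = "READY"                  # voted YES, waiting for PRE_COMMIT
    PRE_COMMITTED = "PRE_COMMITTED"  # acknowledged PRE_COMMIT
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


def on_timeout(state: State3PC) -> State3PC:
    """State a 3PC participant moves to when the coordinator goes silent."""
    if state in (State3PC.INIT, State3PC.READY):
        # No PRE_COMMIT seen: the coordinator cannot have sent DO_COMMIT, so abort.
        return State3PC.ABORTED
    if state is State3PC.PRE_COMMITTED:
        # PRE_COMMIT is sent only after every participant voted YES, so commit.
        return State3PC.COMMITTED
    return state  # terminal states do not change


after_ready_timeout = on_timeout(State3PC.READY)
after_precommit_timeout = on_timeout(State3PC.PRE_COMMITTED)
```

In 2PC the equivalent of `READY` has no safe timeout action at all, which is the blocking case 3PC tries to remove.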

### 3PC vs 2PC

| Aspect | 2PC | 3PC |
|--------|-----|-----|
| Rounds | 2 | 3 |
| Blocking | High (locks during prepare) | Lower (timeout recovery) |
| Coordinator failure recovery | Stuck until recovery | Can proceed with timeout |
| Network partition tolerance | Poor | Still limited |
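
As a rough illustration of what the extra round costs, the sketch below counts messages under the simplifying assumption that each phase is one request and one reply per participant, with no retries or failures. The function name is made up for this example.

```python
def messages_per_transaction(participants: int, phases: int) -> int:
    """Each phase is one request and one reply per participant."""
    return 2 * participants * phases


two_pc_messages = messages_per_transaction(participants=3, phases=2)    # 12
three_pc_messages = messages_per_transaction(participants=3, phases=3)  # 18
```

Under this assumption 3PC sends 50% more messages than 2PC for the same transaction, regardless of the number of participants.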

---

## Saga Pattern

The **Saga Pattern** is an alternative that avoids distributed locking. Instead of one atomic transaction, we use a sequence of **local transactions** with **compensating actions** for rollback.

### How It Works

```
┌──────────────────────────────────────────────────────────────┐
│                             SAGA                             │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  T1 ──────> T2 ──────> T3 ──────> T4 ──────> SUCCESS ✓       │
│   │          │          │          │                         │
│   │          │          │          │                         │
│   ▼          ▼          ▼          ▼                         │
│  C1 <────── C2 <────── C3 <────── C4         FAILURE ✗       │
│  (compensating transactions - executed on failure)           │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

### Saga Types

#### 1. Choreography (Event-Driven)
- Services emit events when completing their step
- Next service listens and reacts
- Decentralized, no single coordinator

#### 2. Orchestration (Central Coordinator)
- Central orchestrator tells each service what to do
- Easier to understand and debug
- Single point of control
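
The choreography style can be sketched with a tiny in-memory event bus. Everything here is hypothetical and simplified: `EventBus` is synchronous and stands in for a real message broker, and the event names are invented for this example. The point is only that each service subscribes to events and emits its own, with no component holding the full workflow.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Hypothetical synchronous in-memory bus; real systems use a message broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


def wire_services(bus: EventBus, payment_ok: bool) -> None:
    """Each service reacts to one event and emits the next."""

    def reserve_inventory(order: Dict) -> None:
        bus.publish("InventoryReserved", order)

    def charge_payment(order: Dict) -> None:
        bus.publish("PaymentCompleted" if payment_ok else "PaymentFailed", order)

    def release_inventory(order: Dict) -> None:   # compensation for reserve_inventory
        bus.publish("InventoryReleased", order)

    def cancel_order(order: Dict) -> None:        # compensation for order creation
        bus.publish("OrderCancelled", order)

    bus.subscribe("OrderCreated", reserve_inventory)
    bus.subscribe("InventoryReserved", charge_payment)
    bus.subscribe("PaymentFailed", release_inventory)
    bus.subscribe("InventoryReleased", cancel_order)


failing_bus = EventBus()
wire_services(failing_bus, payment_ok=False)
failing_bus.publish("OrderCreated", {"order_id": "ORD-1"})
print(failing_bus.history)
```

The compensation chain is itself driven by events (`PaymentFailed` triggers the release, which triggers the cancellation). This is why choreographed sagas are harder to trace than orchestrated ones: the workflow exists only as the sum of the subscriptions.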

In [None]:
from dataclasses import dataclass
from typing import Callable, Optional, Any

@dataclass
class SagaStep:
    """A single step in a saga with its compensating action."""
    name: str
    action: Callable[..., Any]          # Forward action
    compensation: Callable[..., Any]    # Rollback action
    

class SagaExecutionError(Exception):
    """Raised when a saga step fails."""
    def __init__(self, step_name: str, original_error: Exception):
        self.step_name = step_name
        self.original_error = original_error
        super().__init__(f"Saga failed at step '{step_name}': {original_error}")

In [None]:
class SagaOrchestrator:
    """Orchestrator for executing sagas with automatic compensation."""
    
    def __init__(self, name: str):
        self.name = name
        self.steps: List[SagaStep] = []
        self.completed_steps: List[SagaStep] = []
        self.context: Dict[str, Any] = {}  # Shared context between steps
        self.logger = logging.getLogger(f"Saga-{name}")
    
    def add_step(self, step: SagaStep) -> 'SagaOrchestrator':
        """Add a step to the saga (builder pattern)."""
        self.steps.append(step)
        return self
    
    def execute(self, initial_context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """
        Execute all saga steps.
        On failure, automatically executes compensating transactions.
        """
        # Copy so that step results do not mutate the caller's dictionary
        self.context = dict(initial_context or {})
        self.completed_steps = []
        
        self.logger.info(f"Starting saga '{self.name}'")
        print(f"\n{'='*60}")
        print(f"SAGA: {self.name}")
        print(f"{'='*60}")
        
        try:
            for step in self.steps:
                self._execute_step(step)
            
            print(f"\n✅ Saga '{self.name}' completed successfully!")
            return self.context
            
        except SagaExecutionError as e:
            self.logger.error(f"Saga failed: {e}")
            self._compensate()
            raise
    
    def _execute_step(self, step: SagaStep):
        """Execute a single step and track it for potential compensation."""
        print(f"\n▶ Executing step: {step.name}")
        
        try:
            result = step.action(self.context)
            if result:
                self.context.update(result)
            self.completed_steps.append(step)
            print(f"  ✓ Step '{step.name}' completed")
            
        except Exception as e:
            print(f"  ✗ Step '{step.name}' failed: {e}")
            # Chain the original exception so the traceback keeps the root cause
            raise SagaExecutionError(step.name, e) from e
    
    def _compensate(self):
        """Execute compensating transactions in reverse order."""
        print(f"\n{'='*60}")
        print(f"COMPENSATING (Rolling back {len(self.completed_steps)} steps)")
        print(f"{'='*60}")
        
        for step in reversed(self.completed_steps):
            try:
                print(f"\n◀ Compensating step: {step.name}")
                step.compensation(self.context)
                print(f"  ✓ Compensation for '{step.name}' completed")
            except Exception as e:
                # Log but continue with other compensations
                self.logger.error(f"Compensation failed for '{step.name}': {e}")
                print(f"  ⚠ Compensation for '{step.name}' failed: {e}")

In [None]:
# Simulated services for e-commerce order saga

class OrderService:
    orders = {}
    
    @staticmethod
    def create_order(ctx: Dict) -> Dict:
        order_id = f"ORD-{random.randint(1000, 9999)}"
        OrderService.orders[order_id] = {
            "customer_id": ctx["customer_id"],
            "items": ctx["items"],
            "status": "PENDING"
        }
        print(f"    Created order {order_id}")
        return {"order_id": order_id}
    
    @staticmethod
    def cancel_order(ctx: Dict):
        order_id = ctx.get("order_id")
        if order_id and order_id in OrderService.orders:
            OrderService.orders[order_id]["status"] = "CANCELLED"
            print(f"    Cancelled order {order_id}")


class InventoryService:
    inventory = {"ITEM-001": 10, "ITEM-002": 5}
    reserved = {}
    
    @staticmethod
    def reserve_items(ctx: Dict) -> Dict:
        order_id = ctx["order_id"]
        items = ctx["items"]
        
        for item_id, qty in items.items():
            if InventoryService.inventory.get(item_id, 0) < qty:
                raise ValueError(f"Insufficient stock for {item_id}")
        
        # Reserve items
        InventoryService.reserved[order_id] = items
        for item_id, qty in items.items():
            InventoryService.inventory[item_id] -= qty
        
        print(f"    Reserved items for order {order_id}")
        return {"inventory_reserved": True}
    
    @staticmethod
    def release_items(ctx: Dict):
        order_id = ctx.get("order_id")
        if order_id and order_id in InventoryService.reserved:
            items = InventoryService.reserved.pop(order_id)
            for item_id, qty in items.items():
                InventoryService.inventory[item_id] += qty
            print(f"    Released items for order {order_id}")


class PaymentService:
    payments = {}
    should_fail = False  # Toggle for testing failure scenarios
    
    @staticmethod
    def process_payment(ctx: Dict) -> Dict:
        if PaymentService.should_fail:
            raise ValueError("Payment declined!")
        
        order_id = ctx["order_id"]
        payment_id = f"PAY-{random.randint(1000, 9999)}"
        PaymentService.payments[payment_id] = {
            "order_id": order_id,
            "amount": ctx.get("amount", 100),
            "status": "COMPLETED"
        }
        print(f"    Processed payment {payment_id}")
        return {"payment_id": payment_id}
    
    @staticmethod
    def refund_payment(ctx: Dict):
        payment_id = ctx.get("payment_id")
        if payment_id and payment_id in PaymentService.payments:
            PaymentService.payments[payment_id]["status"] = "REFUNDED"
            print(f"    Refunded payment {payment_id}")


class ShippingService:
    shipments = {}
    
    @staticmethod
    def create_shipment(ctx: Dict) -> Dict:
        order_id = ctx["order_id"]
        shipment_id = f"SHIP-{random.randint(1000, 9999)}"
        ShippingService.shipments[shipment_id] = {
            "order_id": order_id,
            "status": "SCHEDULED"
        }
        print(f"    Created shipment {shipment_id}")
        return {"shipment_id": shipment_id}
    
    @staticmethod
    def cancel_shipment(ctx: Dict):
        shipment_id = ctx.get("shipment_id")
        if shipment_id and shipment_id in ShippingService.shipments:
            ShippingService.shipments[shipment_id]["status"] = "CANCELLED"
            print(f"    Cancelled shipment {shipment_id}")

In [None]:
def create_order_saga() -> SagaOrchestrator:
    """Create an e-commerce order saga with all steps."""
    saga = SagaOrchestrator("CreateOrder")
    
    saga.add_step(SagaStep(
        name="Create Order",
        action=OrderService.create_order,
        compensation=OrderService.cancel_order
    ))
    
    saga.add_step(SagaStep(
        name="Reserve Inventory",
        action=InventoryService.reserve_items,
        compensation=InventoryService.release_items
    ))
    
    saga.add_step(SagaStep(
        name="Process Payment",
        action=PaymentService.process_payment,
        compensation=PaymentService.refund_payment
    ))
    
    saga.add_step(SagaStep(
        name="Create Shipment",
        action=ShippingService.create_shipment,
        compensation=ShippingService.cancel_shipment
    ))
    
    return saga

In [None]:
# Example 1: Successful Saga
print("\n" + "#"*60)
print("EXAMPLE 1: Successful Order Saga")
print("#"*60)

PaymentService.should_fail = False

saga = create_order_saga()
try:
    result = saga.execute({
        "customer_id": "CUST-123",
        "items": {"ITEM-001": 2},
        "amount": 99.99
    })
    print(f"\nFinal context: {result}")
except SagaExecutionError:
    print("\n❌ Saga was rolled back!")

In [None]:
# Example 2: Failed Saga with Compensation
print("\n" + "#"*60)
print("EXAMPLE 2: Failed Order Saga (Payment Declined)")
print("#"*60)

PaymentService.should_fail = True  # Simulate payment failure

saga = create_order_saga()
try:
    result = saga.execute({
        "customer_id": "CUST-456",
        "items": {"ITEM-001": 1},
        "amount": 149.99
    })
except SagaExecutionError as e:
    print(f"\n❌ Saga failed at step '{e.step_name}' and was compensated")

---

## Comparison of Distributed Transaction Approaches

| Aspect | 2PC | 3PC | Saga (Orchestration) | Saga (Choreography) |
|--------|-----|-----|---------------------|---------------------|
| **Consistency** | Strong (ACID) | Strong (ACID) | Eventual | Eventual |
| **Isolation** | Full (locks held) | Full (locks held) | None (compensations) | None (compensations) |
| **Blocking** | High | Medium | Low | Low |
| **Latency** | High (2 phases) | Higher (3 phases) | Low per step | Low per step |
| **Failure Recovery** | Coordinator WAL | Timeout recovery | Compensating txns | Compensating txns |
| **Coordinator SPOF** | Yes | Reduced | Yes | No |
| **Complexity** | Medium | High | Medium | High (debugging) |
| **Use Cases** | Databases, XA | Rare (theoretical) | Microservices | Event-driven systems |
| **Scalability** | Limited | Limited | High | High |

### When to Use Each

| Pattern | Best For |
|---------|----------|
| **2PC** | Traditional databases, short transactions, strong consistency requirements |
| **3PC** | Rarely used in practice; theoretical improvement over 2PC |
| **Saga Orchestration** | Microservices with clear business workflows, easier debugging |
| **Saga Choreography** | Highly decoupled services, event-driven architectures |

---

## Saga Design Considerations

### 1. Compensating Transaction Design
```python
# Good: Idempotent compensation
def cancel_order(order_id):
    order = get_order(order_id)
    if order.status != "CANCELLED":  # Check before cancelling
        order.status = "CANCELLED"
        save(order)

# Bad: Non-idempotent
def cancel_order(order_id):
    order = get_order(order_id)
    order.status = "CANCELLED"  # No check: a retry repeats the write and any side effects tied to it
    save(order)
```
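
A status check works when the compensation only flips a flag. For compensations with additive effects, such as refunds, one common approach is to deduplicate by an idempotency key. The sketch below is a minimal in-memory illustration; `RefundLedger` and the key format are assumptions made for this example, not an API from any payment library.

```python
from typing import Dict, Set


class RefundLedger:
    """Hypothetical payment ledger that deduplicates refunds by saga step key."""

    def __init__(self) -> None:
        self.balances: Dict[str, float] = {}
        self.applied_keys: Set[str] = set()

    def refund(self, customer_id: str, amount: float, idempotency_key: str) -> bool:
        """Credit the customer once per key; return False on a duplicate delivery."""
        if idempotency_key in self.applied_keys:
            return False
        self.balances[customer_id] = self.balances.get(customer_id, 0.0) + amount
        self.applied_keys.add(idempotency_key)
        return True


ledger = RefundLedger()
first = ledger.refund("CUST-1", 50.0, idempotency_key="saga-42:refund")
retry = ledger.refund("CUST-1", 50.0, idempotency_key="saga-42:refund")
```

The retried call is recognised and ignored, so the customer is credited once even if the orchestrator delivers the compensation twice. In a real service the key set and the balance update would need to be persisted in the same local transaction.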

### 2. Semantic Locks
Use application-level flags to prevent concurrent modifications:
```python
order.status = "PENDING"      # Semantic lock
order.status = "CONFIRMED"    # Release lock
```
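
A slightly fuller sketch of the same idea: a saga takes the semantic lock by moving the record into `PENDING`, and a second saga that finds it there backs off. `order_table`, `begin_update`, and `finish_update` are hypothetical names for illustration. In a real service the check-and-set would have to be atomic, for example a conditional update in the local database.

```python
from typing import Dict

order_table: Dict[str, Dict] = {
    "ORD-1": {"status": "CONFIRMED", "address": "Old St 1"},
}


class OrderLockedError(Exception):
    """Raised when another saga is already modifying the order."""


def begin_update(order_id: str) -> None:
    """Take the semantic lock by moving the order to PENDING."""
    order = order_table[order_id]
    if order["status"] == "PENDING":
        raise OrderLockedError(f"{order_id} is being modified by another saga")
    order["status"] = "PENDING"


def finish_update(order_id: str, new_address: str) -> None:
    """Apply the change and release the lock by returning to CONFIRMED."""
    order = order_table[order_id]
    order["address"] = new_address
    order["status"] = "CONFIRMED"


begin_update("ORD-1")                 # first saga takes the lock
try:
    begin_update("ORD-1")             # second saga arrives while it is held
    second_saga_blocked = False
except OrderLockedError:
    second_saga_blocked = True
finish_update("ORD-1", "New Ave 2")   # first saga completes and releases
```

Unlike a database lock, the `PENDING` flag is visible to other services and survives across requests. This gives sagas a form of isolation without distributed locking.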

### 3. Handling Pivot Transactions
A **pivot transaction** is the point of no return:
- Before pivot: Can compensate
- After pivot: Must complete forward
- Example: Once shipment is dispatched, we can't "unship" - we must complete delivery
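
One way to encode the pivot rule is to treat failures differently depending on which side of the pivot they occur. The sketch below is an assumption-laden illustration, separate from the `SagaOrchestrator` shown earlier: `run_with_pivot` and its `pivot_index` parameter are invented for this example. Steps up to and including the pivot get one attempt and trigger compensation on failure; steps after it are retried forward.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, action, compensation)


def run_with_pivot(steps: List[Step], pivot_index: int, max_retries: int = 3) -> List[str]:
    """Run steps in order; compensate up to the pivot, retry forward after it."""
    trace: List[str] = []
    done: List[Step] = []
    for index, (name, action, compensation) in enumerate(steps):
        attempts = 1 if index <= pivot_index else max_retries
        for _ in range(attempts):
            try:
                action()
                trace.append(f"ok:{name}")
                done.append((name, action, compensation))
                break
            except Exception:
                trace.append(f"fail:{name}")
        else:
            if index <= pivot_index:
                # Pivot not yet committed: undo completed steps in reverse order.
                for done_name, _action, undo in reversed(done):
                    undo()
                    trace.append(f"undo:{done_name}")
                return trace
            # Past the pivot there is nothing to roll back to.
            raise RuntimeError(f"step '{name}' needs manual intervention")
    return trace


def noop() -> None:
    pass


calls = {"notify": 0}


def flaky_notify() -> None:
    calls["notify"] += 1
    if calls["notify"] < 2:
        raise ConnectionError("temporary outage")


trace = run_with_pivot(
    [("reserve", noop, noop), ("dispatch", noop, noop), ("notify", flaky_notify, noop)],
    pivot_index=1,
)
```

Here `dispatch` is the pivot, so the failing `notify` step is retried rather than compensated. Steps after the pivot therefore need to be retriable and idempotent.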

---

## 🎯 Key Takeaways

### Two-Phase Commit (2PC)
- ✅ Provides strong consistency (ACID)
- ❌ Blocking protocol - participants hold locks
- ❌ Coordinator is a single point of failure
- 📌 Best for: Traditional databases with XA support

### Three-Phase Commit (3PC)
- ✅ Reduces blocking with pre-commit phase
- ✅ Better timeout-based recovery
- ❌ Still doesn't handle network partitions well
- 📌 Rarely used in practice

### Saga Pattern
- ✅ No distributed locks - higher availability
- ✅ Works well with microservices
- ❌ Only eventual consistency
- ❌ Compensations must be carefully designed (idempotent, commutative)
- 📌 Best for: Long-running transactions, microservices

### Design Principles
1. **Make compensations idempotent** - Safe to retry
2. **Use semantic locks** - Prevent concurrent saga conflicts
3. **Identify pivot transactions** - Point of no return
4. **Log everything** - Enable debugging and recovery
5. **Consider timeouts carefully** - Balance consistency vs availability

### The CAP Trade-off
- **2PC/3PC**: Choose **Consistency** over Availability
- **Saga**: Choose **Availability** over Consistency (eventual consistency)