# Consistency Models: Strong, Eventual, Causal

In distributed systems, **consistency models** define the contract between the data store and applications about how updates propagate across replicas. Understanding these models is crucial for designing systems that balance correctness, availability, and performance.

---

## Overview of Consistency Levels

| Model | Guarantee | Latency | Use Case |
|-------|-----------|---------|----------|
| **Strong** | All reads see the latest write | High | Banking, inventory |
| **Eventual** | Reads eventually converge | Low | Social media, DNS |
| **Causal** | Preserves cause-effect ordering | Medium | Collaborative editing |

## Strong Consistency

**Strong consistency** (also called **linearizability**) guarantees that:
- All operations appear to execute atomically in a single total order that respects real-time order
- Every read returns the value of the most recent completed write
- The system behaves as if there's only one copy of the data

### How It Works
```
Client A writes X=5  ──────────────►  All replicas synchronized
                                              │
Client B reads X     ◄────────────────────────┘  Returns 5 (guaranteed)
```

### Trade-offs
- ✅ **Simplicity**: Applications don't need to handle stale reads
- ✅ **Correctness**: Critical for financial transactions
- ❌ **Latency**: Must wait for consensus/synchronization
- ❌ **Availability**: Network partitions may block operations (CAP theorem)
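
The availability trade-off can be made concrete with a minimal sketch. It assumes the simplest possible scheme, in which a write is acknowledged only after every replica has applied it; production systems typically rely on consensus or quorums instead. The class `SyncReplicatedRegister` and its `partition`/`heal` helpers are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable, Optional, Set


class SyncReplicatedRegister:
    """A single value replicated synchronously to every replica (illustrative)."""

    def __init__(self, replica_ids: Iterable[str]):
        self.values: Dict[str, Optional[int]] = {r: None for r in replica_ids}
        self.unreachable: Set[str] = set()

    def partition(self, replica_id: str) -> None:
        """Simulate losing contact with one replica."""
        self.unreachable.add(replica_id)

    def heal(self, replica_id: str) -> None:
        """Restore contact with a replica."""
        self.unreachable.discard(replica_id)

    def write(self, value: int) -> None:
        # Refuse the write unless every replica can apply it, so no replica
        # ever holds a value older than an acknowledged write.
        if self.unreachable:
            raise RuntimeError(f"write blocked, unreachable: {sorted(self.unreachable)}")
        for replica_id in self.values:
            self.values[replica_id] = value

    def read(self, replica_id: str) -> Optional[int]:
        return self.values[replica_id]


register = SyncReplicatedRegister(["node_0", "node_1", "node_2"])
register.write(5)
reads = [register.read(r) for r in ("node_0", "node_1", "node_2")]

register.partition("node_2")
try:
    register.write(6)
    blocked = False
except RuntimeError:
    blocked = True
```

In this sketch every read returns 5, and the second write is refused while `node_2` is unreachable: the register gives up availability rather than risk a stale read.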

## Eventual Consistency

**Eventual consistency** guarantees that:
- If no new updates are made, all replicas will *eventually* converge
- Reads may return stale data temporarily
- No ordering guarantees between operations

### How It Works
```
Client A writes X=5  ──► Replica 1 (X=5)
                              │ (async replication)
                              ▼
                         Replica 2 (X=5) ... eventually
                              │
Client B reads X     ◄────────┘  May return old value!
```

### Trade-offs
- ✅ **Low latency**: Writes return immediately
- ✅ **High availability**: Works during network partitions
- ❌ **Complexity**: Applications must handle stale/conflicting data
- ❌ **Conflict resolution**: Need strategies like LWW, CRDTs
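
As an illustration of one such strategy, here is a minimal last-write-wins (LWW) register. It is a sketch under stated assumptions: each write carries a timestamp supplied by the caller, ties are broken by node ID, and the class name `LWWRegister` is hypothetical. LWW silently discards the losing write, so it only suits data where that loss is acceptable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LWWRegister:
    """Last-write-wins register: the write with the highest stamp survives."""
    value: Optional[int] = None
    # (timestamp, node_id); node_id breaks ties between equal timestamps.
    stamp: Tuple[float, str] = (0.0, "")

    def set(self, value: int, timestamp: float, node_id: str) -> None:
        candidate = (timestamp, node_id)
        if candidate > self.stamp:
            self.value = value
            self.stamp = candidate

    def merge(self, other: "LWWRegister") -> None:
        """Fold another replica's state into this one."""
        if other.stamp > self.stamp:
            self.value = other.value
            self.stamp = other.stamp


# Two replicas accept conflicting writes while disconnected.
replica_1 = LWWRegister()
replica_2 = LWWRegister()
replica_1.set(5, timestamp=10.0, node_id="node_1")
replica_2.set(7, timestamp=12.0, node_id="node_2")

# Exchanging state in either direction yields the same winner.
replica_1.merge(replica_2)
replica_2.merge(replica_1)
```

Both replicas end up holding the value written with the later timestamp, regardless of the order in which they exchange state.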

## Causal Consistency

**Causal consistency** is a middle ground that preserves **cause-and-effect** relationships:
- If operation A *happened before* operation B, all nodes see A before B
- Concurrent operations (no causal relationship) may be seen in any order
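
One common way to enforce this ordering is to buffer a replicated write until everything it depends on has been applied locally. The sketch below assumes each write carries a map of per-node counters (a vector clock, introduced in the Vector Clocks section that follows) in which only writes advance a counter and delivery does not. The helper names `can_deliver` and `deliver_in_causal_order` are hypothetical.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]
Message = Tuple[str, str, Clock]  # (sender, payload, sender's clock at the write)


def can_deliver(local: Clock, sender: str, msg_clock: Clock) -> bool:
    """True if this is the sender's next write and every write it depends on
    from other nodes has already been applied locally."""
    if msg_clock.get(sender, 0) != local.get(sender, 0) + 1:
        return False
    return all(count <= local.get(node, 0)
               for node, count in msg_clock.items() if node != sender)


def deliver_in_causal_order(local: Clock, inbox: List[Message]) -> List[str]:
    """Apply buffered messages as soon as their dependencies are satisfied."""
    delivered: List[str] = []
    pending = list(inbox)
    progress = True
    while pending and progress:
        progress = False
        for msg in list(pending):
            sender, payload, msg_clock = msg
            if can_deliver(local, sender, msg_clock):
                # Assumption: delivery only records the sender's write count.
                local[sender] = msg_clock[sender]
                delivered.append(payload)
                pending.remove(msg)
                progress = True
    return delivered


# B's reply depends on A's post, but reaches node C first.
inbox = [
    ("B", "reply", {"A": 1, "B": 1}),
    ("A", "post", {"A": 1}),
]
order = deliver_in_causal_order({}, inbox)
print(order)  # ['post', 'reply']
```

The reply is held back until the post it depends on has been applied, so no node ever shows an effect before its cause.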

### Vector Clocks

Vector clocks track causality across distributed nodes:

```
Node A: [A:1, B:0, C:0]  ──write──►  [A:2, B:0, C:0]
             │
             ▼ (message)
Node B: [A:0, B:0, C:0]  ──receive──►  [A:2, B:1, C:0]
```

**Rules:**
1. Before each local event, increment own counter
2. When sending, include current vector clock
3. When receiving, merge clocks (element-wise max), then increment own counter

**Comparing clocks:**
- V1 < V2 if all V1[i] ≤ V2[i] and at least one V1[i] < V2[i]
- Concurrent if neither V1 < V2 nor V2 < V1
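
The rules and comparisons above can be sketched as three small functions that reproduce the diagram. The function names `tick`, `receive`, and `compare` are illustrative assumptions, and clocks are plain dictionaries in which a missing node counts as 0.

```python
from typing import Dict

Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Rule 1: return a copy with the node's own counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def receive(local: Clock, incoming: Clock, node: str) -> Clock:
    """Rule 3: element-wise max of both clocks, then increment own counter."""
    merged = {n: max(local.get(n, 0), incoming.get(n, 0))
              for n in set(local) | set(incoming)}
    return tick(merged, node)


def compare(v1: Clock, v2: Clock) -> str:
    """Classify v1 relative to v2: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(v1) | set(v2)
    v1_le_v2 = all(v1.get(n, 0) <= v2.get(n, 0) for n in nodes)
    v2_le_v1 = all(v2.get(n, 0) <= v1.get(n, 0) for n in nodes)
    if v1_le_v2 and v2_le_v1:
        return "equal"
    if v1_le_v2:
        return "before"
    if v2_le_v1:
        return "after"
    return "concurrent"


# Reproduce the diagram: A writes, then B receives A's message.
a_after_write = tick({"A": 1, "B": 0, "C": 0}, "A")
b_after_receive = receive({"A": 0, "B": 0, "C": 0}, a_after_write, "B")
c_independent = tick({"A": 0, "B": 0, "C": 0}, "C")

print(a_after_write)                            # {'A': 2, 'B': 0, 'C': 0}
print(compare(a_after_write, b_after_receive))  # before
print(compare(c_independent, b_after_receive))  # concurrent
```

Returning new dictionaries instead of mutating in place keeps each event's clock as an immutable snapshot, which makes later comparisons safe.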

In [None]:
import random
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
from collections import defaultdict
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Python Simulation: Consistency Models

Let's build a simulation to demonstrate how different consistency models behave and where violations can occur.

In [None]:
@dataclass
class Operation:
    """Represents a read or write operation."""
    op_type: str  # 'read' or 'write'
    key: str
    value: Optional[int] = None
    timestamp: float = 0.0
    node_id: str = ""
    result: Optional[int] = None


@dataclass
class Replica:
    """Simulates a database replica."""
    node_id: str
    data: Dict[str, int] = field(default_factory=dict)
    vector_clock: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    pending_writes: List[Tuple[str, int, Dict[str, int]]] = field(default_factory=list)
    
    def local_write(self, key: str, value: int) -> Dict[str, int]:
        """Perform a local write and update vector clock."""
        self.vector_clock[self.node_id] += 1
        self.data[key] = value
        return dict(self.vector_clock)
    
    def local_read(self, key: str) -> Optional[int]:
        """Read from local data store."""
        return self.data.get(key)
    
    def receive_write(self, key: str, value: int, sender_clock: Dict[str, int]):
        """Receive a replicated write from another node."""
        # Merge vector clocks (element-wise max)
        for node, ts in sender_clock.items():
            self.vector_clock[node] = max(self.vector_clock[node], ts)
        self.vector_clock[self.node_id] += 1
        self.data[key] = value

In [None]:
class DistributedSystem:
    """Simulates a distributed database with configurable consistency."""
    
    def __init__(self, num_replicas: int = 3, consistency: str = 'eventual'):
        self.replicas = {f"node_{i}": Replica(f"node_{i}") for i in range(num_replicas)}
        self.consistency = consistency  # 'strong', 'eventual', 'causal'
        self.operation_log: List[Operation] = []
        self.replication_delay = 0.1  # Simulated network delay
        
    def write(self, key: str, value: int, node_id: Optional[str] = None) -> Operation:
        """Write to the distributed system."""
        if node_id is None:
            node_id = random.choice(list(self.replicas.keys()))
        
        op = Operation('write', key, value, time.time(), node_id)
        replica = self.replicas[node_id]
        clock = replica.local_write(key, value)
        
        if self.consistency == 'strong':
            # Synchronous replication to all nodes
            for other_id, other_replica in self.replicas.items():
                if other_id != node_id:
                    other_replica.receive_write(key, value, clock)
        elif self.consistency == 'eventual':
            # Async replication (delayed)
            for other_id, other_replica in self.replicas.items():
                if other_id != node_id:
                    other_replica.pending_writes.append((key, value, clock))
        elif self.consistency == 'causal':
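            # Simplification: writes are queued exactly as in the eventual case.
            # Enforcing causal order would require checking each write's vector
            # clock against the receiver's clock before applying it.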
            for other_id, other_replica in self.replicas.items():
                if other_id != node_id:
                    other_replica.pending_writes.append((key, value, clock))
        
        self.operation_log.append(op)
        return op
    
    def read(self, key: str, node_id: Optional[str] = None) -> Operation:
        """Read from the distributed system."""
        if node_id is None:
            node_id = random.choice(list(self.replicas.keys()))
        
        replica = self.replicas[node_id]
        result = replica.local_read(key)
        
        op = Operation('read', key, timestamp=time.time(), node_id=node_id, result=result)
        self.operation_log.append(op)
        return op
    
    def apply_pending_writes(self, probability: float = 0.5):
        """Simulate async replication with random delays."""
        for replica in self.replicas.values():
            remaining = []
            for key, value, clock in replica.pending_writes:
                if random.random() < probability:
                    replica.receive_write(key, value, clock)
                else:
                    remaining.append((key, value, clock))
            replica.pending_writes = remaining
    
    def get_all_values(self, key: str) -> Dict[str, Optional[int]]:
        """Get the value from all replicas."""
        return {node_id: replica.local_read(key) 
                for node_id, replica in self.replicas.items()}

### Demonstrating Consistency Violations

Let's simulate scenarios where eventual consistency leads to stale reads.

In [None]:
def simulate_consistency_violation():
    """Demonstrate how eventual consistency can lead to stale reads."""
    
    print("=" * 60)
    print("EVENTUAL CONSISTENCY SIMULATION")
    print("=" * 60)
    
    system = DistributedSystem(num_replicas=3, consistency='eventual')
    
    # Step 1: Write to node_0
    print("\n1. Client writes X=100 to node_0")
    system.write('X', 100, 'node_0')
    print(f"   Values across replicas: {system.get_all_values('X')}")
    
    # Step 2: Read from different nodes (before replication)
    print("\n2. Client reads X from different nodes (before replication):")
    for node_id in ['node_0', 'node_1', 'node_2']:
        op = system.read('X', node_id)
        status = "✓ Fresh" if op.result == 100 else "✗ STALE!"
        print(f"   {node_id}: X = {op.result} {status}")
    
    # Step 3: Apply some replication
    print("\n3. Partial replication occurs...")
    system.apply_pending_writes(probability=0.5)
    print(f"   Values across replicas: {system.get_all_values('X')}")
    
    # Step 4: Full convergence
    print("\n4. Full replication completes...")
    system.apply_pending_writes(probability=1.0)
    print(f"   Values across replicas: {system.get_all_values('X')}")
    print("   All replicas converged! ✓")
    
    return system

eventual_system = simulate_consistency_violation()

In [None]:
def simulate_strong_consistency():
    """Demonstrate strong consistency behavior."""
    
    print("=" * 60)
    print("STRONG CONSISTENCY SIMULATION")
    print("=" * 60)
    
    system = DistributedSystem(num_replicas=3, consistency='strong')
    
    # Step 1: Write to node_0
    print("\n1. Client writes X=100 to node_0")
    system.write('X', 100, 'node_0')
    print(f"   Values across replicas: {system.get_all_values('X')}")
    
    # Step 2: Read from different nodes (immediately consistent)
    print("\n2. Client reads X from different nodes (immediately):")
    for node_id in ['node_0', 'node_1', 'node_2']:
        op = system.read('X', node_id)
        print(f"   {node_id}: X = {op.result} ✓ Fresh")
    
    print("\n   All reads return the latest value immediately!")
    
    return system

strong_system = simulate_strong_consistency()

### Vector Clock Implementation for Causal Consistency

In [None]:
class VectorClock:
    """Implementation of vector clocks for causal ordering."""
    
    def __init__(self, node_id: str, nodes: List[str]):
        self.node_id = node_id
        self.clock = {n: 0 for n in nodes}
    
    def increment(self):
        """Increment own logical time."""
        self.clock[self.node_id] += 1
        return self.copy()
    
    def merge(self, other_clock: Dict[str, int]):
        """Merge with another vector clock (element-wise max)."""
        for node, ts in other_clock.items():
            self.clock[node] = max(self.clock.get(node, 0), ts)
        self.increment()
    
    def copy(self) -> Dict[str, int]:
        return dict(self.clock)
    
    @staticmethod
    def happens_before(vc1: Dict[str, int], vc2: Dict[str, int]) -> bool:
        """Check if vc1 happened before vc2."""
        all_nodes = set(vc1.keys()) | set(vc2.keys())
        less_or_equal = all(vc1.get(n, 0) <= vc2.get(n, 0) for n in all_nodes)
        strictly_less = any(vc1.get(n, 0) < vc2.get(n, 0) for n in all_nodes)
        return less_or_equal and strictly_less
    
    @staticmethod
    def concurrent(vc1: Dict[str, int], vc2: Dict[str, int]) -> bool:
        """Check if two events are concurrent."""
        return not VectorClock.happens_before(vc1, vc2) and \
               not VectorClock.happens_before(vc2, vc1)
    
    def __repr__(self):
        return str(self.clock)

In [None]:
def demonstrate_vector_clocks():
    """Show how vector clocks track causality."""
    
    print("=" * 60)
    print("VECTOR CLOCKS DEMONSTRATION")
    print("=" * 60)
    
    nodes = ['A', 'B', 'C']
    clocks = {n: VectorClock(n, nodes) for n in nodes}
    events = []
    
    # Event 1: A writes
    print("\n1. Node A performs a local write")
    e1 = clocks['A'].increment()
    events.append(('A', 'write X=1', e1.copy()))
    print(f"   A's clock: {clocks['A']}")
    
    # Event 2: A sends to B
    print("\n2. A sends message to B")
    clocks['B'].merge(e1)
    e2 = clocks['B'].copy()
    events.append(('B', 'receive from A', e2.copy()))
    print(f"   B's clock after receiving: {clocks['B']}")
    
    # Event 3: C writes (concurrent with A→B)
    print("\n3. Node C performs an independent write (concurrent)")
    e3 = clocks['C'].increment()
    events.append(('C', 'write Y=2', e3.copy()))
    print(f"   C's clock: {clocks['C']}")
    
    # Event 4: B writes
    print("\n4. Node B performs another write")
    e4 = clocks['B'].increment()
    events.append(('B', 'write X=5', e4.copy()))
    print(f"   B's clock: {clocks['B']}")
    
    # Analyze causality
    print("\n" + "=" * 60)
    print("CAUSALITY ANALYSIS")
    print("=" * 60)
    
    print(f"\nEvent 1 (A:write) → Event 2 (B:receive): "
          f"{VectorClock.happens_before(e1, e2)}")
    print(f"Event 1 (A:write) → Event 4 (B:write):   "
          f"{VectorClock.happens_before(e1, e4)}")
    print(f"Event 3 (C:write) concurrent with Event 2: "
          f"{VectorClock.concurrent(e3, e2)}")
    
    return events

vector_events = demonstrate_vector_clocks()

## Plotly Visualization: Consistency Timeline

Let's visualize how data propagates across replicas under different consistency models.

In [None]:
def create_consistency_timeline():
    """Create a visualization showing consistency model timelines."""
    
    fig = make_subplots(
        rows=2, cols=1,
        subplot_titles=(
            'Strong Consistency: All Replicas Updated Synchronously',
            'Eventual Consistency: Async Propagation with Stale Reads'
        ),
        vertical_spacing=0.15
    )
    
    # Timeline data
    nodes = ['Node A', 'Node B', 'Node C']
    
    # Strong Consistency Timeline
    # Write happens at t=1, all nodes updated immediately
    strong_times = [0, 1, 1, 1, 2, 3, 4, 5]
    strong_values_a = [0, 100, 100, 100, 100, 100, 100, 100]
    strong_values_b = [0, 100, 100, 100, 100, 100, 100, 100]
    strong_values_c = [0, 100, 100, 100, 100, 100, 100, 100]
    
    for i, (node, values, color) in enumerate([
        ('Node A (Primary)', strong_values_a, '#2ecc71'),
        ('Node B', strong_values_b, '#3498db'),
        ('Node C', strong_values_c, '#9b59b6')
    ]):
        fig.add_trace(
            go.Scatter(
                x=strong_times, y=values,
                mode='lines+markers',
                name=node,
                line=dict(color=color, width=3),
                marker=dict(size=10),
                legendgroup='strong'
            ),
            row=1, col=1
        )
    
    # Add write marker for strong consistency
    fig.add_annotation(
        x=1, y=100, text="Write X=100",
        showarrow=True, arrowhead=2,
        row=1, col=1,
        font=dict(color='red', size=12)
    )
    
    # Eventual Consistency Timeline
    # Write at t=1 to Node A, propagates gradually
    eventual_times = [0, 1, 2, 3, 4, 5]
    eventual_values_a = [0, 100, 100, 100, 100, 100]  # Immediate
    eventual_values_b = [0, 0, 0, 100, 100, 100]      # Delayed
    eventual_values_c = [0, 0, 0, 0, 100, 100]        # More delayed
    
    for i, (node, values, color) in enumerate([
        ('Node A (Primary)', eventual_values_a, '#2ecc71'),
        ('Node B (Stale)', eventual_values_b, '#3498db'),
        ('Node C (Stale)', eventual_values_c, '#9b59b6')
    ]):
        fig.add_trace(
            go.Scatter(
                x=eventual_times, y=values,
                mode='lines+markers',
                name=node,
                line=dict(color=color, width=3),
                marker=dict(size=10),
                legendgroup='eventual',
                showlegend=False
            ),
            row=2, col=1
        )
    
    # Add write marker for eventual consistency
    fig.add_annotation(
        x=1, y=100, text="Write X=100",
        showarrow=True, arrowhead=2,
        row=2, col=1,
        font=dict(color='red', size=12)
    )
    
    # Add stale read zone
    fig.add_vrect(
        x0=1, x1=4,  # Node C stays stale until t=4
        fillcolor="rgba(255, 0, 0, 0.1)",
        layer="below",
        line_width=0,
        row=2, col=1,
        annotation_text="Stale Read Window",
        annotation_position="top left"
    )
    
    fig.update_layout(
        height=600,
        title=dict(
            text='Consistency Models: Data Propagation Timeline',
            font=dict(size=20)
        ),
        showlegend=True,
        legend=dict(orientation='h', yanchor='bottom', y=1.02)
    )
    
    fig.update_xaxes(title_text='Time (units)', row=1, col=1)
    fig.update_xaxes(title_text='Time (units)', row=2, col=1)
    fig.update_yaxes(title_text='Value of X', row=1, col=1)
    fig.update_yaxes(title_text='Value of X', row=2, col=1)
    
    return fig

fig_timeline = create_consistency_timeline()
fig_timeline.show()

In [None]:
def create_causal_consistency_diagram():
    """Visualize causal consistency with vector clocks."""
    
    fig = go.Figure()
    
    # Node timelines
    nodes = {'A': 3, 'B': 2, 'C': 1}
    
    # Draw horizontal lines for each node
    for node, y in nodes.items():
        fig.add_trace(go.Scatter(
            x=[0, 10], y=[y, y],
            mode='lines',
            line=dict(color='gray', width=2, dash='dash'),
            showlegend=False,
            hoverinfo='skip'
        ))
        fig.add_annotation(
            x=-0.5, y=y, text=f"Node {node}",
            showarrow=False, font=dict(size=14, color='black')
        )
    
    # Events
    events = [
        (1, 'A', 'Write X=1', '[A:1,B:0,C:0]', '#e74c3c'),
        (3, 'B', 'Recv from A', '[A:1,B:1,C:0]', '#3498db'),
        (2, 'C', 'Write Y=2', '[A:0,B:0,C:1]', '#9b59b6'),
        (5, 'B', 'Write X=5', '[A:1,B:2,C:0]', '#2ecc71'),
        (7, 'C', 'Recv from B', '[A:1,B:2,C:2]', '#f39c12'),
    ]
    
    for x, node, label, clock, color in events:
        y = nodes[node]
        fig.add_trace(go.Scatter(
            x=[x], y=[y],
            mode='markers+text',
            marker=dict(size=20, color=color, symbol='circle'),
            text=[label],
            textposition='top center',
            showlegend=False,
            hovertemplate=f'{label}<br>Vector Clock: {clock}<extra></extra>'
        ))
        fig.add_annotation(
            x=x, y=y-0.15, text=clock,
            showarrow=False, font=dict(size=9, color='gray')
        )
    
    # Causal arrows (messages)
    arrows = [
        (1, 3, 3, 2, 'A→B'),  # A sends to B
        (5, 2, 7, 1, 'B→C'),  # B sends to C
    ]
    
    for x1, y1, x2, y2, label in arrows:
        fig.add_annotation(
            x=x2, y=y2, ax=x1, ay=y1,
            xref='x', yref='y', axref='x', ayref='y',
            showarrow=True, arrowhead=2, arrowsize=1.5,
            arrowwidth=2, arrowcolor='#34495e'
        )
    
    fig.update_layout(
        title=dict(
            text='Causal Consistency: Vector Clock Propagation',
            font=dict(size=18)
        ),
        xaxis=dict(title='Logical Time', range=[-1, 10], showgrid=True),
        yaxis=dict(range=[0.5, 3.5], showticklabels=False, showgrid=False),
        height=400,
        plot_bgcolor='white'
    )
    
    return fig

fig_causal = create_causal_consistency_diagram()
fig_causal.show()

In [None]:
def create_consistency_comparison_chart():
    """Create a radar chart comparing consistency model characteristics."""
    
    categories = [
        'Data Freshness',
        'Availability',
        'Latency (inverse)',
        'Partition Tolerance',
        'Simplicity'
    ]
    
    fig = go.Figure()
    
    # Strong Consistency
    fig.add_trace(go.Scatterpolar(
        r=[10, 4, 3, 3, 9],
        theta=categories,
        fill='toself',
        name='Strong Consistency',
        line=dict(color='#e74c3c')
    ))
    
    # Eventual Consistency
    fig.add_trace(go.Scatterpolar(
        r=[4, 10, 10, 10, 5],
        theta=categories,
        fill='toself',
        name='Eventual Consistency',
        line=dict(color='#3498db')
    ))
    
    # Causal Consistency
    fig.add_trace(go.Scatterpolar(
        r=[7, 8, 7, 8, 6],
        theta=categories,
        fill='toself',
        name='Causal Consistency',
        line=dict(color='#2ecc71')
    ))
    
    fig.update_layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[0, 10]
            )
        ),
        title=dict(
            text='Consistency Model Trade-offs',
            font=dict(size=18)
        ),
        showlegend=True,
        height=500
    )
    
    return fig

fig_comparison = create_consistency_comparison_chart()
fig_comparison.show()

## Real-World Examples

| System | Consistency Model | Reason |
|--------|-------------------|--------|
| **PostgreSQL** (single node) | Strong | ACID transactions |
| **Google Spanner** | Strong (external) | TrueTime + Paxos + 2PC |
| **Amazon DynamoDB** | Eventual (default) | High availability |
| **Apache Cassandra** | Tunable | Configurable per query |
| **MongoDB** | Causal (sessions) | Read-your-writes |
| **CockroachDB** | Serializable | Distributed SQL |

## 📌 Key Takeaways

### Summary

1. **Strong Consistency**
   - All reads see the latest write (linearizability)
   - Higher latency due to synchronization
   - Essential for financial systems, inventory

2. **Eventual Consistency**
   - Replicas converge *eventually* if updates stop
   - Low latency, high availability
   - Requires conflict resolution strategies

3. **Causal Consistency**
   - Preserves cause-effect ordering
   - Vector clocks track happens-before relationships
   - Good balance for collaborative applications

### Decision Framework

```
┌─────────────────────────────────────────────────────────┐
│  Need ACID guarantees?                                  │
│    YES → Strong Consistency                             │
│    NO  ↓                                                │
│  Need cause-effect ordering?                            │
│    YES → Causal Consistency                             │
│    NO  ↓                                                │
│  Can tolerate stale reads?                              │
│    YES → Eventual Consistency                           │
│    NO  → Consider tunable consistency (Cassandra-style) │
└─────────────────────────────────────────────────────────┘
```
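
The same decision sequence can be written as a small function, which is handy for documenting the choice in a design review. This is only a literal encoding of the questions above; the function name `choose_consistency` and its boolean parameters are hypothetical.

```python
def choose_consistency(needs_acid: bool,
                       needs_causal_order: bool,
                       tolerates_stale_reads: bool) -> str:
    """Walk the decision framework top to bottom and return the first match."""
    if needs_acid:
        return "strong"
    if needs_causal_order:
        return "causal"
    if tolerates_stale_reads:
        return "eventual"
    # No stale reads allowed, but no ACID or causal requirement either.
    return "tunable"


print(choose_consistency(True, False, False))   # strong
print(choose_consistency(False, True, True))    # causal
print(choose_consistency(False, False, True))   # eventual
print(choose_consistency(False, False, False))  # tunable
```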

### Best Practices

- **Start with stronger consistency** and relax only when needed
- **Use sessions** for read-your-writes consistency
- **Implement idempotency** for eventual consistency systems
- **Monitor replication lag** to understand convergence times
- **Choose based on business requirements**, not just performance
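
To illustrate the idempotency practice, the sketch below deduplicates replicated operations by ID so that a redelivered message does not apply twice. It assumes every operation carries a unique identifier chosen by the writer; the class `IdempotentCounterStore` and the `op_id` format are hypothetical.

```python
from typing import Dict, Set


class IdempotentCounterStore:
    """Applies each increment at most once, keyed by operation ID."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}
        self.applied_ops: Set[str] = set()

    def apply_increment(self, op_id: str, key: str, amount: int) -> bool:
        """Return True if the operation was applied, False if it was a duplicate."""
        if op_id in self.applied_ops:
            return False
        self.counters[key] = self.counters.get(key, 0) + amount
        self.applied_ops.add(op_id)
        return True


store = IdempotentCounterStore()
first = store.apply_increment("op-1", "likes", 1)
retry = store.apply_increment("op-1", "likes", 1)  # redelivered duplicate
print(first, retry, store.counters["likes"])  # True False 1
```

In a long-running system the set of applied IDs would need pruning, for example once all replicas are known to have received an operation; that bookkeeping is omitted here.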