# System Design: Scalability Patterns

Essential patterns and strategies for building scalable distributed systems.

## Topics Covered
1. Horizontal vs Vertical Scaling
2. Load Balancing Algorithms
3. Caching Strategies
4. Database Scaling
5. Message Queues & Event-Driven Architecture
6. CQRS & Event Sourcing
7. Rate Limiting Algorithms

---
## 1. Horizontal vs Vertical Scaling

```
VERTICAL SCALING (Scale Up)          HORIZONTAL SCALING (Scale Out)
┌─────────────────────┐              ┌─────────┐ ┌─────────┐ ┌─────────┐
│                     │              │ Server  │ │ Server  │ │ Server  │
│   BIGGER SERVER     │              │   1     │ │   2     │ │   3     │
│   ↑ CPU, RAM, SSD   │              └────┬────┘ └────┬────┘ └────┬────┘
│                     │                   │          │          │
└─────────────────────┘                   └──────────┼──────────┘
                                                     │
                                              ┌──────┴──────┐
                                              │Load Balancer│
                                              └─────────────┘
```

| Aspect | Vertical Scaling | Horizontal Scaling |
|--------|------------------|-------------------|
| **Approach** | Add resources to single server | Add more servers |
| **Limit** | Hardware ceiling of one machine | No single-machine ceiling; bounded by coordination overhead |
| **Downtime** | Usually required | Zero downtime possible |
| **Cost** | Expensive at scale | Cost-effective |
| **Complexity** | Simple | Complex (distributed) |
| **Fault Tolerance** | Single point of failure | High availability |
| **Data Consistency** | Easy | Challenging |
| **Use Case** | Databases, legacy apps | Stateless web servers |

---
## 2. Load Balancing Algorithms

```
                    ┌─────────────────────────────────────────┐
                    │           LOAD BALANCER                 │
                    │                                         │
  Clients ──────────►  Algorithms:                           │
                    │  • Round Robin                          │
                    │  • Weighted Round Robin                 │
                    │  • Least Connections                    │
                    │  • IP Hash (Consistent Hashing)         │
                    │  • Random                               │
                    └────────────┬───────────┬───────────┬────┘
                                 │           │           │
                                 ▼           ▼           ▼
                            ┌────────┐ ┌────────┐ ┌────────┐
                            │Server 1│ │Server 2│ │Server 3│
                            └────────┘ └────────┘ └────────┘
```

| Algorithm | How It Works | Best For |
|-----------|--------------|----------|
| **Round Robin** | Sequential distribution | Homogeneous servers |
| **Weighted RR** | Higher weight = more requests | Mixed capacity servers |
| **Least Connections** | Route to server with fewest active connections | Long-lived connections |
| **IP Hash** | Hash of the client IP picks the server, so a client keeps reaching the same one | Session affinity |
| **Random** | Random server selection | Simple, stateless |
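The selection rules in the table can be sketched in a few lines each. This is a minimal illustration, not a production balancer: the class names, the `pick`/`acquire`/`release` methods, and the use of CRC32 for IP hashing are assumptions made for the example, and the weighted variant uses the naive "repeat each server `weight` times" approach rather than a smooth weighted schedule.

```python
import itertools
import zlib


class RoundRobin:
    """Cycle through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighting: each server appears `weight` times per cycle."""

    def __init__(self, weights: dict):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Route to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        # Ties go to the server listed first
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


def ip_hash(client_ip: str, servers: list) -> str:
    """Stable hash of the client IP, modulo the server count."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


servers = ["server-1", "server-2", "server-3"]

rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])

wrr = WeightedRoundRobin({"server-1": 2, "server-2": 1})
print([wrr.pick() for _ in range(6)])

lc = LeastConnections(servers)
first = lc.acquire()
lc.acquire()
lc.release(first)
print(lc.acquire())  # server-1: it is back to zero active connections

# The same client IP always maps to the same server
print(ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers))
```

Plain modulo hashing like `ip_hash` reshuffles most clients whenever the server list changes, which is the problem consistent hashing is meant to reduce.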

```python
import hashlib
from bisect import bisect_right
from collections import defaultdict
from typing import Optional

class ConsistentHashing:
    """Consistent hashing for distributed load balancing."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas  # Virtual nodes per server
        self.ring = []            # Sorted hash positions
        self.nodes = {}           # Hash -> server mapping

    def _hash(self, key: str) -> int:
        # MD5 is used only to spread keys evenly, not for security
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        """Add a server with virtual nodes."""
        for i in range(self.replicas):
            hash_val = self._hash(f"{node}:{i}")
            if hash_val in self.nodes:
                continue  # Already on the ring (repeated add or hash collision)
            self.ring.append(hash_val)
            self.nodes[hash_val] = node
        self.ring.sort()

    def remove_node(self, node: str):
        """Remove a server and its virtual nodes."""
        # Drop only the positions this server owns; unknown servers are a no-op
        self.nodes = {h: n for h, n in self.nodes.items() if n != node}
        self.ring = sorted(self.nodes)

    def get_node(self, key: str) -> Optional[str]:
        """Get the server responsible for a key (None if the ring is empty)."""
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # First virtual node clockwise from the key, wrapping at the end of the ring
        idx = bisect_right(self.ring, hash_val) % len(self.ring)
        return self.nodes[self.ring[idx]]

# Demo
ch = ConsistentHashing(replicas=50)
for server in ["server-1", "server-2", "server-3"]:
    ch.add_node(server)

# Distribution test
total_keys = 1000
distribution = defaultdict(int)
for i in range(total_keys):
    key = f"user:{i}"
    distribution[ch.get_node(key)] += 1

print("Key Distribution:")
for server, count in sorted(distribution.items()):
    print(f"  {server}: {count} keys ({count / total_keys:.1%})")
```

---
## 3. Caching Strategies

### Cache-Aside (Lazy Loading)
```
┌──────┐    1. Get    ┌───────┐   2. Miss   ┌──────────┐
│Client│ ──────────► │ Cache │ ──────────► │ Database │
└──────┘             └───┬───┘             └────┬─────┘
    ▲                    │                      │
    │     4. Return      │    3. Return Data   │
    └────────────────────┴──────────────────────┘
           (+ Update Cache)
```

### Write-Through
```
┌──────┐   1. Write   ┌───────┐   2. Write   ┌──────────┐
│Client│ ──────────► │ Cache │ ──────────► │ Database │
└──────┘             └───────┘   (sync)     └──────────┘
```

### Write-Behind (Write-Back)
```
┌──────┐   1. Write   ┌───────┐  2. Async   ┌──────────┐
│Client│ ──────────► │ Cache │ ─ ─ ─ ─ ─► │ Database │
└──────┘  (returns)  └───────┘  (batched)  └──────────┘
```

| Strategy | Consistency | Latency | Data Loss Risk | Use Case |
|----------|-------------|---------|----------------|----------|
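| **Cache-Aside** | Eventual (stale until invalidated or expired) | Read: low on hit, extra round trip on miss | Low | Read-heavy workloads |
| **Write-Through** | Strong (cache and database updated together) | Write: higher (two synchronous writes) | Minimal | Critical data |
| **Write-Behind** | Eventual | Write: lowest | Possible (unflushed writes are lost if the cache fails) | High write throughput |
| **Read-Through** | Eventual | Read: low on hit | Low | Simplified app logic (the cache loads on a miss) |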
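The three diagrams above can be sketched with plain dictionaries standing in for the cache and the database. This is an illustrative sketch only: the class names, the `pending` buffer, and the explicit `flush()` call are assumptions for the example, and a real write-behind cache would flush on a timer or batch-size threshold.

```python
class CacheAside:
    """The application checks the cache first and fills it on a miss."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def get(self, key):
        if key in self.cache:          # Hit: serve from the cache
            return self.cache[key]
        value = self.db.get(key)       # Miss: read the database
        if value is not None:
            self.cache[key] = value    # Populate the cache for next time
        return value


class WriteThrough:
    """Every write reaches the database and the cache before returning."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value           # Synchronous database write
        self.cache[key] = value


class WriteBehind:
    """Writes land in the cache; the database is updated later in batches."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.pending = {}              # Writes not yet persisted

    def put(self, key, value):
        self.cache[key] = value
        self.pending[key] = value      # Returns without touching the database

    def flush(self) -> int:
        """Persist all pending writes in one batch; return how many were written."""
        count = len(self.pending)
        self.db.update(self.pending)
        self.pending.clear()
        return count


db = {"user:1": "Ada"}

aside = CacheAside(db)
aside.get("user:1")                    # Miss, loads from the database
db["user:1"] = "Grace"                 # Database changes behind the cache
print(aside.get("user:1"))             # Ada (stale until invalidated)

wb = WriteBehind(db)
wb.put("user:2", "Linus")
print("user:2" in db)                  # False (not persisted yet)
wb.flush()
print(db["user:2"])                    # Linus
```

The stale read and the unpersisted write in the demo correspond to the "Eventual" and "Possible" cells in the table.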

### Cache Invalidation Strategies
- **TTL (Time-to-Live)**: Expires after fixed duration
- **Event-based**: Invalidate on data change
- **Version-based**: Cache key includes version number

---
## 4. Database Scaling

### Replication
```
                    ┌──────────────┐
     Writes ──────► │   PRIMARY    │ ◄────── Reads
                    └──────┬───────┘
                           │ Replication
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ REPLICA  │ │ REPLICA  │ │ REPLICA  │ ◄── Reads
        └──────────┘ └──────────┘ └──────────┘
```

### Sharding (Horizontal Partitioning)
```
                    ┌─────────────────┐
      Requests ───► │ Shard Router    │
                    └────────┬────────┘
                             │ Shard Key Routing
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │ Shard 1  │   │ Shard 2  │   │ Shard 3  │
        │ Users A-H│   │ Users I-P│   │ Users Q-Z│
        └──────────┘   └──────────┘   └──────────┘
```

| Approach | Pros | Cons | Best For |
|----------|------|------|----------|
| **Read Replicas** | Simple, read scaling | Write bottleneck, replication lag | Read-heavy apps |
| **Sharding** | Write scaling, data locality | Complex queries, resharding pain | Large datasets |
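Replication lag from the table is easy to see in a toy model. In this sketch, assumed for illustration only, dictionaries stand in for database nodes, `ReplicatedStore` is a hypothetical name, and `replicate()` is called by hand where a real system would stream changes continuously.

```python
import itertools


class ReplicatedStore:
    """Writes go to the primary; reads are spread across replicas."""

    def __init__(self, replica_count: int = 3):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))
        self._log = []                 # Writes not yet shipped to replicas

    def write(self, key, value):
        self.primary[key] = value      # All writes funnel through one node
        self._log.append((key, value))

    def read(self, key, from_primary: bool = False):
        if from_primary:
            return self.primary.get(key)
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def replicate(self):
        """Apply the pending write log to every replica."""
        for key, value in self._log:
            for replica in self.replicas:
                replica[key] = value
        self._log.clear()


store = ReplicatedStore()
store.write("order:1", "paid")
print(store.read("order:1"))                     # None (replica lags behind)
print(store.read("order:1", from_primary=True))  # paid
store.replicate()
print(store.read("order:1"))                     # paid
```

Reading from the primary right after a write is one common way to get read-your-writes behavior, at the cost of sending that read to the write bottleneck.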

### Sharding Strategies
- **Range-based**: Shard by value ranges (e.g., date, ID range)
- **Hash-based**: Hash of the shard key picks the shard (plain modulo, or consistent hashing to limit data movement when shards change)
- **Directory-based**: Lookup table maps keys to shards
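Each strategy reduces to a small routing function. The sketch below makes several assumptions for illustration: shard names, the A-H / I-P / Q-Z boundaries taken from the diagram, the `directory` contents, and MD5 as a stable hash. It also measures the "resharding pain" of plain modulo hashing by counting how many keys change shard when going from 3 to 4 shards.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shard-1", "shard-2", "shard-3"]


def range_shard(username: str) -> str:
    """Range-based: route by first letter (A-H, I-P, Q-Z)."""
    lower_bounds = ["I", "Q"]          # Start letters of shard-2 and shard-3
    return SHARDS[bisect_right(lower_bounds, username[0].upper())]


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process for strings, so use MD5
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: plain modulo over the number of shards."""
    return stable_hash(key) % num_shards


# Directory-based: explicit lookup table (hypothetical entries)
directory = {"tenant-42": "shard-2", "tenant-7": "shard-3"}


def directory_shard(key: str, default: str = "shard-1") -> str:
    return directory.get(key, default)


print(range_shard("alice"), range_shard("ian"), range_shard("quinn"))
print(directory_shard("tenant-42"), directory_shard("tenant-99"))

# Resharding cost with plain modulo hashing
keys = [f"user:{i}" for i in range(1000)]
moved = sum(hash_shard(k, 3) != hash_shard(k, 4) for k in keys)
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys change shard going from 3 to 4 shards")
```

With modulo hashing a key stays put only when its hash gives the same remainder for 3 and 4, so roughly three quarters of keys are expected to move. Consistent hashing aims to move only about the share of keys the new shard takes over.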

---
## 5. Message Queues & Event-Driven Architecture

### Message Queue Pattern
```
┌──────────┐     ┌─────────────────────────┐     ┌──────────┐
│ Producer │────►│     MESSAGE QUEUE       │────►│ Consumer │
└──────────┘     │  ┌───┬───┬───┬───┬───┐  │     └──────────┘
                 │  │msg│msg│msg│msg│msg│  │     ┌──────────┐
┌──────────┐     │  └───┴───┴───┴───┴───┘  │────►│ Consumer │
│ Producer │────►│                         │     └──────────┘
└──────────┘     └─────────────────────────┘
```

### Pub/Sub Pattern
```
┌───────────┐              ┌───────────────┐
│ Publisher │──── Topic ──►│   BROKER      │
└───────────┘    "orders"  │               │
                           │  ┌─────────┐  │     ┌────────────┐
                           │  │Topic:   │──┼────►│Subscriber A│
                           │  │"orders" │  │     └────────────┘
                           │  └─────────┘  │     ┌────────────┐
                           │               │────►│Subscriber B│
                           └───────────────┘     └────────────┘
```

| Feature | Message Queue | Pub/Sub |
|---------|---------------|---------|
| **Delivery** | Each message goes to one consumer | Every subscriber receives each message |
| **Use Case** | Task distribution | Event broadcasting |
| **Examples** | RabbitMQ, SQS | Kafka, Redis Pub/Sub |
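The delivery difference in the table can be shown with two in-memory toys. These are sketches under stated assumptions, not models of any real broker: `WorkQueue` and `Broker` are hypothetical classes, delivery is synchronous, and there is no persistence, acknowledgement, or retry.

```python
from collections import defaultdict, deque


class WorkQueue:
    """Message queue: each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        """Return the next message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


class Broker:
    """Pub/sub: every subscriber of a topic gets every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message):
        for handler in self._subscribers.get(topic, ()):
            handler(message)


# Queue: two workers compete for tasks
queue = WorkQueue()
for task in ["resize-1", "resize-2", "resize-3"]:
    queue.send(task)
worker_a = [queue.receive()]
worker_b = [queue.receive()]
worker_a.append(queue.receive())
print(worker_a, worker_b)              # ['resize-1', 'resize-3'] ['resize-2']

# Pub/sub: both subscribers see the same event
broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("orders", inbox_a.append)
broker.subscribe("orders", inbox_b.append)
broker.publish("orders", {"order_id": 1})
print(inbox_a == inbox_b == [{"order_id": 1}])  # True
```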

### Benefits
- **Decoupling**: Services communicate without direct dependencies
- **Buffering**: Handle traffic spikes gracefully
- **Reliability**: Message persistence and retry mechanisms
- **Scalability**: Add consumers independently

---
## 6. CQRS & Event Sourcing

### CQRS (Command Query Responsibility Segregation)
```
                         ┌──────────────────────────────────────┐
                         │              APPLICATION             │
                         └──────────────────────────────────────┘
                                    │              │
              Commands (Write)      │              │       Queries (Read)
                    ┌───────────────┘              └───────────────┐
                    ▼                                              ▼
           ┌─────────────────┐                          ┌─────────────────┐
           │  Command Model  │                          │   Query Model   │
           │  (Write Store)  │ ─────── Sync ──────────► │  (Read Store)   │
           │  Normalized     │                          │  Denormalized   │
           └─────────────────┘                          └─────────────────┘
```

### Event Sourcing
```
┌─────────────────────────────────────────────────────────────────┐
│                           EVENT STORE                           │
│ ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐ │
│ │Created  │─►│Updated  │─►│Shipped  │─►│Delivered│─►│...      │ │
│ │Order    │  │Address  │  │         │  │         │  │         │ │
│ │t=1      │  │t=2      │  │t=3      │  │t=4      │  │         │ │
│ └─────────┘  └─────────┘  └─────────┘  └─────────┘  └─────────┘ │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼ Replay Events
                    ┌─────────────────────┐
                    │   Current State     │
                    │   (Materialized)    │
                    └─────────────────────┘
```

| Pattern | Key Idea | Benefits | Challenges |
|---------|----------|----------|------------|
| **CQRS** | Separate read/write models | Optimized reads, scale independently | Eventual consistency |
| **Event Sourcing** | Store events, not state | Full audit trail, temporal queries | Event schema evolution |
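Both patterns fit in one small sketch that follows the order example from the diagram. Everything named here is an assumption for illustration: the `Event` fields, the event kinds, `OrderEventStore`, and the `orders_by_status` read model. The projection is updated synchronously for brevity, whereas a real CQRS read store is often updated asynchronously.

```python
from dataclasses import dataclass, field
from typing import Optional

# Which event kinds change the order status (hypothetical event names)
STATUS_BY_KIND = {
    "OrderCreated": "created",
    "Shipped": "shipped",
    "Delivered": "delivered",
}


@dataclass
class Event:
    order_id: str
    kind: str
    data: dict = field(default_factory=dict)


class OrderEventStore:
    """Append-only event log (write side) plus a denormalized read model."""

    def __init__(self):
        self.events = []
        self.orders_by_status = {}     # Read model: order_id -> latest status

    def append(self, event: Event):
        self.events.append(event)      # Events are never updated or deleted
        self._project(event)

    def _project(self, event: Event):
        """Keep the query-side view in step with the event log."""
        status = STATUS_BY_KIND.get(event.kind)
        if status is not None:
            self.orders_by_status[event.order_id] = status

    def replay(self, order_id: str, upto: Optional[int] = None) -> dict:
        """Rebuild state from the first `upto` events (all events if None)."""
        state = {}
        for event in self.events[:upto]:
            if event.order_id != order_id:
                continue
            if event.kind in STATUS_BY_KIND:
                state["status"] = STATUS_BY_KIND[event.kind]
            state.update(event.data)
        return state


store = OrderEventStore()
store.append(Event("o-1", "OrderCreated", {"address": "1 Main St"}))
store.append(Event("o-1", "AddressUpdated", {"address": "9 Elm St"}))
store.append(Event("o-1", "Shipped"))

print(store.replay("o-1"))             # {'status': 'shipped', 'address': '9 Elm St'}
print(store.replay("o-1", upto=1))     # {'status': 'created', 'address': '1 Main St'}
print(store.orders_by_status["o-1"])   # shipped
```

Replaying a prefix of the log is what makes temporal queries possible: the state as of any earlier point can be reconstructed without having stored it.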

---
## 7. Rate Limiting Algorithms

```
┌──────────────────────────────────────────────────────────────────┐
│                    RATE LIMITING ALGORITHMS                      │
├─────────────────┬───────────────────┬────────────────────────────┤
│  Token Bucket   │  Leaky Bucket     │  Sliding Window Log        │
│                 │                   │                            │
│  ┌───────────┐  │  ┌───────────┐    │  [t1][t2][t3]...[tn]       │
│  │● ● ● ●    │  │  │~~~~~~~~~~~│▼   │  └────── window ────────┘  │
│  │  tokens   │  │  │   queue   │    │                            │
│  └───────────┘  │  └───────────┘    │  Count requests in window  │
│                 │                   │                            │
│  Bursty traffic │  Smooth output    │  Precise, memory heavy     │
└─────────────────┴───────────────────┴────────────────────────────┘
```

```python
import time
from collections import deque
from dataclasses import dataclass, field

# time.monotonic() is used throughout: unlike time.time(), it never jumps
# backwards when the system clock is adjusted.

@dataclass
class TokenBucket:
    """Token Bucket: Allows burst traffic up to bucket size."""
    capacity: int          # Max tokens
    refill_rate: float     # Tokens per second
    tokens: float = field(init=False)
    last_refill: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity  # Start full so an initial burst is allowed
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow_request(self, tokens: int = 1) -> bool:
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

@dataclass
class LeakyBucket:
    """Leaky Bucket: Smooths output rate regardless of input bursts."""
    capacity: int          # Queue size
    leak_rate: float       # Requests processed per second
    queue: deque = field(default_factory=deque)
    last_leak: float = field(init=False)

    def __post_init__(self):
        self.last_leak = time.monotonic()

    def _leak(self):
        now = time.monotonic()
        leak_count = int((now - self.last_leak) * self.leak_rate)
        if leak_count > 0:
            for _ in range(min(leak_count, len(self.queue))):
                self.queue.popleft()
            # Advance by whole leak intervals only, so fractional time is not lost
            self.last_leak += leak_count / self.leak_rate

    def allow_request(self) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(time.monotonic())
            return True
        return False

@dataclass
class SlidingWindowLog:
    """Sliding Window Log: Precise rate limiting with request timestamps."""
    window_size: float     # Window in seconds
    max_requests: int      # Max requests per window
    timestamps: deque = field(default_factory=deque)

    def allow_request(self) -> bool:
        now = time.monotonic()
        window_start = now - self.window_size

        # Remove expired timestamps
        while self.timestamps and self.timestamps[0] < window_start:
            self.timestamps.popleft()

        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False

@dataclass
class SlidingWindowCounter:
    """Sliding Window Counter: Memory-efficient approximation."""
    window_size: float
    max_requests: int
    current_count: int = 0
    previous_count: int = 0
    current_window_start: float = field(init=False)

    def __post_init__(self):
        self.current_window_start = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_window_start

        # Rotate windows if needed
        if elapsed >= self.window_size:
            windows_passed = int(elapsed // self.window_size)
            # Only the immediately preceding window counts; older traffic has aged out
            self.previous_count = self.current_count if windows_passed == 1 else 0
            self.current_count = 0
            # Keep window boundaries aligned instead of restarting at `now`
            self.current_window_start += windows_passed * self.window_size
            elapsed = max(0.0, now - self.current_window_start)

        # Weighted count from previous window
        weight = 1 - (elapsed / self.window_size)
        estimated = self.previous_count * weight + self.current_count

        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False
```

```python
# Demo: Compare rate limiters
def test_rate_limiter(limiter, name: str, requests: int = 15):
    """Test rate limiter with burst of requests."""
    results = [limiter.allow_request() for _ in range(requests)]
    allowed = sum(results)
    print(f"{name}: {allowed}/{requests} allowed")
    print(f"  Pattern: {''.join(['✓' if r else '✗' for r in results])}")

print("=== Rate Limiter Comparison (10 req/sec limit, burst of 15) ===\n")

test_rate_limiter(TokenBucket(capacity=10, refill_rate=10), "Token Bucket")
test_rate_limiter(LeakyBucket(capacity=10, leak_rate=10), "Leaky Bucket")
test_rate_limiter(SlidingWindowLog(window_size=1.0, max_requests=10), "Sliding Window Log")
test_rate_limiter(SlidingWindowCounter(window_size=1.0, max_requests=10), "Sliding Window Counter")
```

### Rate Limiting Comparison

| Algorithm | Memory | Precision | Burst Handling | Use Case |
|-----------|--------|-----------|----------------|----------|
| **Token Bucket** | O(1) | Good | Allows bursts | API rate limiting |
| **Leaky Bucket** | O(n) in queue size | Good | Smooths bursts | Traffic shaping |
| **Sliding Window Log** | O(n) in requests per window | Exact | Bursts capped at the window limit | Strict limits |
| **Sliding Window Counter** | O(1) | Approximate | Partial bursts | Scalable APIs |

---
## Quick Reference: Scalability Checklist

```
┌──────────────────────────────────────────────────────────────────┐
│                 SCALABILITY DECISION TREE                        │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─ Read Heavy? ──► Add Read Replicas + Caching                 │
│  │                                                               │
│  ├─ Write Heavy? ──► Sharding + Write-Behind Cache              │
│  │                                                               │
│  ├─ Spiky Traffic? ──► Auto-scaling + Message Queues            │
│  │                                                               │
│  ├─ Global Users? ──► CDN + Regional Deployments                │
│  │                                                               │
│  └─ Complex Queries? ──► CQRS + Materialized Views              │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```

### Key Takeaways
1. **Start simple** - Scale vertically until cost or the hardware ceiling makes it impractical
2. **Stateless services** - Enable horizontal scaling
3. **Cache aggressively** - Reduce database load
4. **Async processing** - Decouple with message queues
5. **Measure first** - Profile before optimizing