# Caching & Message Queues

Essential patterns for building scalable, high-performance backend systems.

**Topics Covered:**
- Caching patterns (Cache-Aside, Write-Through, Write-Behind)
- Redis vs Memcached comparison
- Cache invalidation strategies
- Message queue patterns
- Kafka vs RabbitMQ comparison

## Caching Patterns

### Cache-Aside (Lazy Loading)
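The application manages the cache explicitly: it checks the cache first and, on a miss, loads the value from the database and stores it in the cache. This is the most common pattern.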

```
┌─────────────┐      1. Check cache       ┌─────────────┐
│             │ ─────────────────────────▶│             │
│ Application │      2. Cache miss        │    Cache    │
│             │ ◀─────────────────────────│   (Redis)   │
└─────────────┘                           └─────────────┘
       │                                         ▲
       │ 3. Query DB                             │
       ▼                                         │
┌─────────────┐                                  │
│  Database   │ ──────────────────────────────────
└─────────────┘    4. Populate cache with result
```

**Pros:** Only requested data cached, resilient to cache failures  
**Cons:** Cache miss penalty, potential stale data

---

### Write-Through
Every write goes to the cache, which synchronously writes it to the database before confirming to the application.

```
┌─────────────┐      1. Write data        ┌─────────────┐
│             │ ─────────────────────────▶│             │
│ Application │                           │    Cache    │
│             │ ◀─────────────────────────│   (Redis)   │
└─────────────┘      4. Confirm           └─────────────┘
                                                 │
                                   2. Sync write │
                                                 ▼
                                          ┌─────────────┐
                                          │  Database   │
                                          └─────────────┘
                                            3. Confirm
```

**Pros:** Cache stays consistent with the database as long as every write goes through it, so reads are not stale  
**Cons:** Write latency, unused data may be cached
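
The write path in the diagram above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name `WriteThroughCache` is hypothetical, and plain dictionaries stand in for both Redis and the database.

```python
class WriteThroughCache:
    """Write-through sketch: a write returns only after both stores are updated."""

    def __init__(self, store: dict):
        self.store = store  # stand-in for the database (assumption)
        self.cache = {}     # stand-in for Redis (assumption)

    def write(self, key, value):
        # Persist synchronously first, so the cache never holds a value
        # that the backing store failed to accept.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        # Reads are served from the cache; fall back to the store on a miss.
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value


db = {}
wt = WriteThroughCache(db)
wt.write("user:1", {"name": "Ada"})
print(db["user:1"], wt.read("user:1"))
```

Writing to the store before the cache is one possible ordering; the important property is that the caller is not acknowledged until both writes have completed, which is where the extra write latency comes from.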

---

### Write-Behind (Write-Back)
Writes go to the cache immediately and are flushed to the database asynchronously, often in batches.

**Pros:** Low write latency, batch DB writes  
**Cons:** Risk of data loss if cache fails before DB sync
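
A rough sketch of the write-behind idea follows. The class name `WriteBehindCache` and the explicit `flush()` call are assumptions made for illustration; a real system would typically flush from a background worker or timer, and dictionaries stand in for the cache and the database.

```python
class WriteBehindCache:
    """Write-behind sketch: writes land in the cache and are flushed later in a batch."""

    def __init__(self, store: dict):
        self.store = store  # stand-in for the database (assumption)
        self.cache = {}     # stand-in for the cache
        self.dirty = {}     # key -> latest value not yet persisted

    def write(self, key, value):
        # Fast path: only in-memory structures are touched.
        self.cache[key] = value
        self.dirty[key] = value  # repeated writes to one key coalesce

    def flush(self) -> int:
        """Persist all pending writes in one batch and return how many keys were written."""
        batch = list(self.dirty.items())
        self.dirty.clear()
        self.store.update(batch)
        return len(batch)


db = {}
wb = WriteBehindCache(db)
wb.write("a", 1)
wb.write("a", 2)
wb.write("b", 3)
print(db)          # nothing persisted yet
print(wb.flush())  # number of keys written in the batch
print(db)
```

Anything still in `dirty` when the process dies is lost, which is the data-loss risk listed above.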

## Redis vs Memcached

| Feature | Redis | Memcached |
|---------|-------|----------|
| **Data Structures** | Strings, Lists, Sets, Hashes, Sorted Sets, Streams | Strings only |
| **Persistence** | RDB snapshots, AOF logs | None (pure cache) |
| **Replication** | Master-replica, Redis Cluster | None built-in |
| **Pub/Sub** | ✅ Native support | ❌ |
| **Lua Scripting** | ✅ | ❌ |
| **Memory Efficiency** | Higher overhead | More efficient for simple k/v |
| **Multi-threading** | Single-threaded (I/O threads in 6.0+) | Multi-threaded |
| **Max Key Size** | 512 MB | 250 bytes |
| **Use Case** | Feature-rich caching, sessions, queues | Simple high-throughput caching |

**Rule of Thumb:** Use **Redis** for complex data needs; **Memcached** for simple, high-volume caching.

## Cache Invalidation Strategies

> "There are only two hard things in CS: cache invalidation and naming things." — Phil Karlton

| Strategy | Description | When to Use |
|----------|-------------|-------------|
| **TTL (Time-To-Live)** | Auto-expire after N seconds | General purpose, acceptable staleness |
| **Event-Driven** | Invalidate on write/update events | Real-time consistency needed |
| **Version-Based** | Key includes version; new version = new key | Immutable data, CDN caching |
| **Write-Through** | Update cache on every write | Strong consistency required |
| **Cache Tags** | Group keys by tag, invalidate by tag | Related data (e.g., all of a user's data) |
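
Three of the strategies in the table (TTL, cache tags, and version-based keys) can be illustrated with a small in-memory sketch. The names `TaggedTTLCache` and `versioned_key` are hypothetical, and the injectable `clock` parameter is an assumption added so that expiry can be demonstrated without sleeping.

```python
import time


class TaggedTTLCache:
    """In-memory sketch combining TTL expiry with tag-based invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)
        self._tags = {}  # tag -> set of keys carrying that tag

    def set(self, key, value, ttl, tags=()):
        self._data[key] = (value, self._clock() + ttl)
        for tag in tags:
            self._tags.setdefault(tag, set()).add(key)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazy expiry: drop the entry when it is read after its deadline.
            del self._data[key]
            return None
        return value

    def invalidate_tag(self, tag):
        # Remove every key that was stored with this tag.
        for key in self._tags.pop(tag, set()):
            self._data.pop(key, None)


def versioned_key(name: str, version: int) -> str:
    """Version-based invalidation: bumping the version yields a fresh key."""
    return f"{name}:v{version}"


now = [0.0]  # fake clock so the example runs instantly
cache = TaggedTTLCache(clock=lambda: now[0])
cache.set("user:1:profile", "profile", ttl=10, tags=["user:1"])
cache.set("user:1:orders", "orders", ttl=100, tags=["user:1"])
now[0] = 10.0
print(cache.get("user:1:profile"))  # expired by TTL
cache.invalidate_tag("user:1")
print(cache.get("user:1:orders"))   # removed by tag
print(versioned_key("logo.png", 3))
```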

```
Event-Driven Invalidation Flow:

┌──────────┐   update    ┌──────────┐   publish    ┌──────────┐
│  Writer  │ ──────────▶ │ Database │ ────────────▶│  Event   │
└──────────┘             └──────────┘              │   Bus    │
                                                   └────┬─────┘
                                                        │
                         ┌──────────────────────────────┼───────────────┐
                         ▼                              ▼               ▼
                   ┌──────────┐                  ┌──────────┐    ┌──────────┐
                   │ Cache 1  │                  │ Cache 2  │    │ Cache N  │
                   │(invalidate)                 │(invalidate)   │(invalidate)
                   └──────────┘                  └──────────┘    └──────────┘
```
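
The flow in the diagram can be approximated in-process as follows. This is only a sketch: `EventBus`, `LocalCache`, and `write` are hypothetical names, the bus delivers events synchronously, and a real deployment would use a network broker (for example Redis Pub/Sub or Kafka) between the writer and the caches.

```python
class EventBus:
    """Tiny synchronous event bus (stand-in for a real broker)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class LocalCache:
    """A cache node that drops entries when it hears about an update."""

    def __init__(self, bus: EventBus):
        self.data = {}
        bus.subscribe(self.on_event)

    def on_event(self, event):
        if event["type"] == "updated":
            self.data.pop(event["key"], None)


def write(db: dict, bus: EventBus, key, value):
    # Update the source of truth first, then announce the change.
    db[key] = value
    bus.publish({"type": "updated", "key": key})


bus = EventBus()
db = {"user:1": "old"}
caches = [LocalCache(bus) for _ in range(3)]
for cache in caches:
    cache.data["user:1"] = "old"

write(db, bus, "user:1", "new")
print([cache.data for cache in caches])  # every cache dropped the stale entry
```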

## Message Queue Patterns

### Point-to-Point (Work Queue)
Each message consumed by exactly one consumer.

```
┌──────────┐     ┌─────────────────┐     ┌────────────┐
│ Producer │────▶│     Queue       │────▶│ Consumer 1 │
└──────────┘     │  [M1][M2][M3]   │     └────────────┘
                 └─────────────────┘            │
                         │                      │ (round-robin)
                         └─────────────────────▶│
                                          ┌────────────┐
                                          │ Consumer 2 │
                                          └────────────┘
```
**Use Case:** Task distribution, background jobs
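
As a rough in-process analogy, the standard library's `queue.Queue` with worker threads behaves like a work queue: each message is taken by exactly one worker. The function name `run_work_queue` is hypothetical. Unlike the round-robin dispatch shown in the diagram, this sketch hands each message to whichever worker is free.

```python
import queue
import threading


def run_work_queue(messages, num_workers=2):
    """Distribute messages across workers; return {worker_id: [messages handled]}."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            msg = q.get()
            if msg is None:  # sentinel: no more work for this worker
                break
            with lock:
                results.setdefault(worker_id, []).append(msg)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for m in messages:
        q.put(m)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


results = run_work_queue(["M1", "M2", "M3", "M4"])
handled = sorted(m for msgs in results.values() for m in msgs)
print(handled)  # each message appears exactly once across all workers
```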

---

### Publish-Subscribe (Fan-Out)
Each message delivered to all subscribers.

```
                                          ┌────────────┐
                              ┌──────────▶│Subscriber 1│
┌──────────┐     ┌─────────┐  │           └────────────┘
│Publisher │────▶│  Topic  │──┤
└──────────┘     └─────────┘  │           ┌────────────┐
                              └──────────▶│Subscriber 2│
                                          └────────────┘
```
**Use Case:** Event notifications, real-time updates
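
Fan-out can be sketched with a topic object that forwards each message to every registered callback. `Topic` is a hypothetical in-process stand-in for a broker topic or exchange; delivery here is synchronous and nothing is stored for subscribers that join later.

```python
class Topic:
    """Fan-out sketch: every subscriber receives its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message) -> int:
        """Deliver to all current subscribers; return how many received it."""
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)


topic = Topic()
inbox_1, inbox_2 = [], []
topic.subscribe(inbox_1.append)
topic.subscribe(inbox_2.append)
topic.publish("price-changed")
print(inbox_1, inbox_2)  # both subscribers received the message
```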

---

### Consumer Groups (Scalable Pub/Sub)
Messages partitioned across consumers in a group.

```
                              Group A                    Group B
┌──────────┐     ┌─────────┐  ┌────────────┐           ┌────────────┐
│ Producer │────▶│  Topic  │─▶│ Consumer A1│    ──────▶│ Consumer B1│
└──────────┘     │ P0 P1 P2│  └────────────┘           └────────────┘
                 └─────────┘  ┌────────────┐           ┌────────────┐
                         └───▶│ Consumer A2│    ──────▶│ Consumer B2│
                              └────────────┘           └────────────┘
```
**Use Case:** High-throughput event streaming
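
The two mechanics behind consumer groups, mapping a message key to a partition and dividing partitions among the consumers of one group, can be sketched as below. Both function names are hypothetical. The hash is CRC32, chosen here only because it is stable across runs; Kafka's default partitioner uses a different hash (murmur2), and real brokers offer several assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(consumers, num_partitions):
    """Round-robin assignment of partitions to the consumers of a single group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Each group is assigned all partitions independently, so both groups see every message,
# while within a group each partition has exactly one owner.
print(assign_partitions(["A1", "A2"], 3))
print(assign_partitions(["B1", "B2"], 3))
print(partition_for("order-42", 3) == partition_for("order-42", 3))
```

Because a key always maps to the same partition and each partition has a single owner per group, messages for one key are processed in order within a group.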

## Kafka vs RabbitMQ

| Feature | Apache Kafka | RabbitMQ |
|---------|--------------|----------|
| **Model** | Distributed log (pull-based) | Message broker (push-based) |
| **Message Retention** | Configurable (days/weeks) | Until acknowledged |
| **Ordering** | Per-partition ordering | Per-queue ordering |
| **Throughput** | Very high (millions/sec) | High (tens of thousands/sec) |
| **Replay** | ✅ Consumers can replay | ❌ Classic queues: gone once acknowledged (Streams, 3.9+, support replay) |
| **Routing** | Topic + partitions | Flexible exchanges (direct, fanout, topic, headers) |
| **Protocol** | Custom binary protocol | AMQP, MQTT, STOMP |
| **Consumer Groups** | Native, built-in | Competing consumers on a shared queue; Kafka-style groups need plugins or manual setup |
| **Latency** | Higher (batching) | Lower (immediate delivery) |
| **Complexity** | Higher (ZooKeeper/KRaft) | Lower (standalone) |

### When to Use

| Use Case | Recommendation |
|----------|---------------|
| Event streaming, analytics | **Kafka** |
| Log aggregation | **Kafka** |
| Task queues, background jobs | **RabbitMQ** |
| Complex routing logic | **RabbitMQ** |
| Need message replay | **Kafka** |
| Low latency required | **RabbitMQ** |
| Microservices events | Either (Kafka for scale) |

```python
# Redis Cache-Aside Pattern Example
import json
from typing import Optional

# pip install redis
# import redis

class CacheAsideExample:
    """Demonstrates cache-aside pattern with Redis."""

    def __init__(self, redis_client, db_client):
        self.cache = redis_client
        self.db = db_client
        self.default_ttl = 3600  # 1 hour

    def get_user(self, user_id: str) -> Optional[dict]:
        """Get user with cache-aside pattern."""
        cache_key = f"user:{user_id}"

        # 1. Try cache first (redis-py returns None on a miss)
        cached = self.cache.get(cache_key)
        if cached is not None:
            return json.loads(cached)  # Cache hit

        # 2. Cache miss - query database
        user = self.db.find_user(user_id)
        if user is None:
            return None

        # 3. Populate cache with TTL
        self.cache.setex(
            cache_key,
            self.default_ttl,
            json.dumps(user)
        )
        return user

    def update_user(self, user_id: str, data: dict) -> dict:
        """Update user and invalidate cache."""
        # 1. Update database first
        updated = self.db.update_user(user_id, data)

        # 2. Invalidate cache (write-invalidate strategy)
        self.cache.delete(f"user:{user_id}")

        return updated


# Usage example (pseudo-code)
# redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True)
# cache = CacheAsideExample(redis_client, db_client)
# user = cache.get_user("123")  # First call hits DB, subsequent calls hit cache

print("Cache-Aside Pattern Flow:")
print("  READ:  Check cache → Miss → Query DB → Populate cache → Return")
print("  WRITE: Update DB → Invalidate cache (next read repopulates)")
```

## Key Takeaways

### Caching
- **Cache-Aside**: Most flexible, app controls cache logic
- **Write-Through**: Strong consistency, higher write latency
- **Redis**: Rich features (data structures, persistence, pub/sub)
- **Memcached**: Simple, fast, multi-threaded
- Always set a **TTL** as a safety net in case an invalidation is missed

### Message Queues
- **Kafka**: Event streaming, high throughput, replay capability
- **RabbitMQ**: Traditional queuing, flexible routing, lower latency
- Use **consumer groups** for horizontal scaling
- Make consumers **idempotent**, because at-least-once delivery can deliver the same message more than once
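
One common way to make a consumer idempotent is to remember the IDs of messages already processed and skip repeats. The sketch below assumes each message carries a unique `id` field; the class name and the in-memory `set` are illustrative only, and a real consumer would keep the processed IDs in durable storage, ideally updated in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed."""

    def __init__(self):
        self.processed_ids = set()  # would be durable storage in practice (assumption)
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Apply the message once; return False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False  # redelivery: the effect was already applied
        self.balance += message["amount"]
        self.processed_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
consumer.handle({"id": "m1", "amount": 50})
consumer.handle({"id": "m1", "amount": 50})  # redelivered duplicate, ignored
consumer.handle({"id": "m2", "amount": 25})
print(consumer.balance)
```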

### Common Pitfalls
| Problem | Solution |
|---------|----------|
| Cache stampede | Use locks or probabilistic early expiration |
| Unbounded queues | Set limits, implement backpressure |
| Stale cache | Combine TTL with event-driven invalidation |
| Message loss | Enable persistence, use acknowledgments |
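
The lock-based remedy for cache stampede in the table above can be sketched with one lock per key, so that concurrent misses for the same key trigger a single load. `SingleFlightCache` and its `loader` parameter are hypothetical names. The sketch works within a single process; across several application servers a distributed lock would be needed instead.

```python
import threading


class SingleFlightCache:
    """On a miss, only one thread per key calls the loader; the others wait for its result."""

    def __init__(self, loader):
        self._loader = loader  # function that fetches the value from the slow source
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get(self, key):
        if key in self._data:
            return self._data[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the cache while we waited.
            if key in self._data:
                return self._data[key]
            value = self._loader(key)
            self._data[key] = value
            return value


calls = []

def load_from_db(key):
    calls.append(key)  # record how often the slow source is hit
    return key.upper()

cache = SingleFlightCache(load_from_db)
threads = [threading.Thread(target=cache.get, args=("k",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # the loader ran once despite eight concurrent readers
```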