# Module 4: Reliability & Security Deep Dive

## 🎯 Interactive Lab: Connection Patterns & Security

**Duration:** 45 minutes  
**Level:** Intermediate  

In this lab, you'll:
- 🔐 Implement secure connection patterns
- 🔄 Build retry logic with exponential backoff
- 🛡️ Handle connection failures gracefully
- 📊 Monitor connection health
- ✅ Apply production best practices

---


## Part 1: Setup


In [None]:
!pip install -q redis tenacity

import redis
import time
import random
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

print('✅ Packages installed!')


---

## Part 2: Connection Patterns

### Basic Connection

❌ **Don't do this in production:**
```python
r = redis.Redis(host='localhost', port=6379)
r.set('key', 'value')  # No error handling!
```

✅ **Do this instead:**
- Connection pooling
- Timeout configuration
- Error handling
- Retry logic


### Connection Pool Pattern


In [None]:
class RedisConnectionManager:
    """Production-ready Redis connection manager"""
    
    def __init__(self, host='localhost', port=6379, max_connections=50):
        # Create connection pool
        self.pool = redis.ConnectionPool(
            host=host,
            port=port,
            max_connections=max_connections,
            socket_connect_timeout=5,
            socket_timeout=5,
            decode_responses=True,
            health_check_interval=30
        )
        
        self.redis_client = redis.Redis(connection_pool=self.pool)
        
        # Stats
        self.success_count = 0
        self.error_count = 0
    
    def get_client(self):
        """Get Redis client from pool"""
        return self.redis_client
    
    def health_check(self):
        """Check if Redis is healthy"""
        try:
            self.redis_client.ping()
            return True
        except Exception as e:
            print(f'❌ Health check failed: {e}')
            return False
    
    def get_stats(self):
        """Get connection pool stats"""
        return {
            'success': self.success_count,
            'errors': self.error_count,
            'success_rate': f'{(self.success_count / (self.success_count + self.error_count) * 100):.1f}%' if self.success_count + self.error_count > 0 else 'N/A'
        }

# Create connection manager
manager = RedisConnectionManager()

# Test connection
if manager.health_check():
    print('✅ Redis connection healthy')
    print(f'   Connection pool created with max {manager.pool.max_connections} connections')
else:
    print('❌ Redis connection failed')


---

## Part 3: Retry Logic with Exponential Backoff

### Why Retry?

Many network failures are transient. Retrying with backoff:
- ✅ Handles transient failures
- ✅ Prevents overwhelming the server
- ✅ Improves reliability

### Exponential Backoff Pattern

```
Attempt 1: Immediate
Attempt 2: Wait 1 second
Attempt 3: Wait 2 seconds
Attempt 4: Wait 4 seconds
Attempt 5: Wait 8 seconds
```
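The schedule above can be reproduced in a few lines of plain Python. This is a minimal sketch rather than tenacity's own implementation: `backoff_delays` and `with_jitter` are hypothetical helper names, and the 10-second cap is an assumption chosen to match the `max=10` setting used in this part.

```python
import random


def backoff_delays(max_attempts=5, base=1.0, cap=10.0):
    """Return the wait (in seconds) before each retry.

    The first attempt runs immediately, so there are max_attempts - 1 waits.
    Each wait doubles the previous one and is clamped to `cap`.
    """
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def with_jitter(delay, rng=random):
    """Pick a random wait in [0, delay] so many clients do not retry in lockstep."""
    return rng.uniform(0, delay)


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0]
```

Adding jitter is optional, but it helps when many clients lose their connection at the same moment and would otherwise all retry on the same schedule.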


In [None]:
class ResilientRedisClient:
    """Redis client with automatic retry logic"""
    
    def __init__(self, redis_client):
        self.client = redis_client
        self.retry_count = 0
    
    @retry(
        stop=stop_after_attempt(5),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        retry=retry_if_exception_type((redis.ConnectionError, redis.TimeoutError))
    )
    def get_with_retry(self, key):
        """GET with automatic retry"""
        self.retry_count += 1  # Counts every attempt, including the first
        return self.client.get(key)
    
    @retry(
        stop=stop_after_attempt(5),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        retry=retry_if_exception_type((redis.ConnectionError, redis.TimeoutError))
    )
    def set_with_retry(self, key, value, ex=None):
        """SET with automatic retry"""
        self.retry_count += 1  # Counts every attempt, including the first
        return self.client.set(key, value, ex=ex)

# Create resilient client
r = manager.get_client()
resilient = ResilientRedisClient(r)

# Test retry logic
try:
    resilient.set_with_retry('test:key', 'test:value', ex=60)
    value = resilient.get_with_retry('test:key')
    print('✅ Retry logic working')
    print(f'   Value: {value}')
    print(f'   Total attempts (first tries included): {resilient.retry_count}')
except Exception as e:
    print(f'❌ Failed after retries: {e}')


---

## Part 4: Circuit Breaker Pattern

### What Is a Circuit Breaker?

A circuit breaker prevents cascading failures by moving between three states:
1. **Closed**: Normal operation
2. **Open**: Too many failures, stop trying
3. **Half-Open**: Test if service recovered

```
┌─────────┐
│ CLOSED  │ ──[Too many failures]──> ┌──────┐
└─────────┘                           │ OPEN │
     ↑                                └──────┘
     │                                    │
     │                            [Timeout expires]
     │                                    │
     │                                    ↓
[Success] <─────────────────── ┌──────────────┐
                                │  HALF-OPEN   │
                                └──────────────┘
```
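The diagram can also be read as a small transition table. The sketch below is one way to write it down; the event names and the `next_state` helper are hypothetical, and the `HALF_OPEN` to `OPEN` edge on failure is not drawn in the diagram but is the usual convention when the trial call fails.

```python
# Allowed transitions, written as (current_state, event) -> next_state.
TRANSITIONS = {
    ('CLOSED', 'failure_threshold_reached'): 'OPEN',
    ('OPEN', 'timeout_expired'): 'HALF_OPEN',
    ('HALF_OPEN', 'success'): 'CLOSED',
    ('HALF_OPEN', 'failure'): 'OPEN',  # assumption: a failed trial call re-opens the circuit
}


def next_state(state, event):
    """Return the state after `event`; unknown pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Walk once around the diagram: trip the breaker, wait, then recover.
state = 'CLOSED'
for event in ['failure_threshold_reached', 'timeout_expired', 'success']:
    state = next_state(state, event)
    print(f'{event} -> {state}')
```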


In [None]:
class CircuitBreaker:
    """Simple circuit breaker implementation"""
    
    def __init__(self, failure_threshold=3, timeout=30):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN
    
    def call(self, func, *args, **kwargs):
        """Execute function with circuit breaker"""
        if self.state == 'OPEN':
            # Check if timeout expired
            if time.time() - self.last_failure_time > self.timeout:
                self.state = 'HALF_OPEN'
                print('🔄 Circuit breaker: HALF_OPEN (testing)')
            else:
                raise Exception('Circuit breaker is OPEN - service unavailable')
        
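        try:
            result = func(*args, **kwargs)
            # Success: close the circuit if we were testing recovery
            if self.state == 'HALF_OPEN':
                self.state = 'CLOSED'
                print('✅ Circuit breaker: CLOSED (recovered)')
            # Reset on every success so only consecutive failures trip the breaker
            self.failure_count = 0
            return result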
            
        except Exception as e:
            self.failure_count += 1
            self.last_failure_time = time.time()
            
            if self.failure_count >= self.failure_threshold:
                self.state = 'OPEN'
                print(f'🔴 Circuit breaker: OPEN (failures: {self.failure_count})')
            
            raise  # Re-raise with the original traceback
    
    def get_state(self):
        return {
            'state': self.state,
            'failures': self.failure_count,
            'threshold': self.failure_threshold
        }

# Create circuit breaker
breaker = CircuitBreaker(failure_threshold=3, timeout=10)

# Test circuit breaker
def test_operation():
    return r.ping()

try:
    result = breaker.call(test_operation)
    print(f'✅ Operation successful')
    print(f'   Circuit breaker state: {breaker.get_state()}')
except Exception as e:
    print(f'❌ Operation failed: {e}')


---

## Part 5: Connection Monitoring

Let's run a batch of operations and record latency and error counts:


In [None]:
import statistics

def benchmark_with_monitoring(operations=100):
    """Benchmark with health monitoring"""
    latencies = []
    errors = 0
    
    print(f'📊 Running {operations} operations with monitoring...')
    
    for i in range(operations):
        try:
            start = time.perf_counter()
            r.set(f'monitor:key:{i}', f'value_{i}', ex=60)
            r.get(f'monitor:key:{i}')
            elapsed = (time.perf_counter() - start) * 1000
            latencies.append(elapsed)
            manager.success_count += 1
        except Exception as e:
            errors += 1
            manager.error_count += 1
    
    if latencies:
        print(f'\n✅ Results:')
        print(f'   Total operations: {operations}')
        print(f'   Successful: {len(latencies)}')
        print(f'   Errors: {errors}')
        print(f'   Success rate: {(len(latencies) / operations * 100):.1f}%')
        print(f'\n⚡ Performance:')
        print(f'   Average latency: {statistics.mean(latencies):.2f} ms')
        print(f'   Median latency: {statistics.median(latencies):.2f} ms')
        print(f'   P95 latency: {sorted(latencies)[int(len(latencies) * 0.95)]:.2f} ms')
    else:
        print(f'❌ All operations failed')

# Run benchmark
benchmark_with_monitoring(100)

# Show connection stats
print(f'\n📊 Connection Pool Stats:')
stats = manager.get_stats()
print(f'   Total success: {stats["success"]}')
print(f'   Total errors: {stats["errors"]}')
print(f'   Success rate: {stats["success_rate"]}')
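
The latency figures printed by the benchmark can be factored into a reusable summary. The following is a sketch under stated assumptions: `percentile` and `summarize` are hypothetical helpers, the data is synthetic, and the nearest-rank method used here can differ by one position from the index arithmetic in the benchmark cell.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError('percentile() needs at least one sample')
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Collect the headline latency numbers in one dict."""
    return {
        'mean': statistics.mean(latencies_ms),
        'median': statistics.median(latencies_ms),
        'p95': percentile(latencies_ms, 95),
        'p99': percentile(latencies_ms, 99),
        'max': max(latencies_ms),
    }


sample = [float(ms) for ms in range(1, 101)]  # synthetic data: 1.0 .. 100.0 ms
print(summarize(sample))
```

Tail percentiles such as P95 and P99 are usually more telling than the mean, because a few slow calls can hide behind a healthy average.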


---

## Part 6: Security Best Practices

### Connection Security Checklist

✅ **Always use TLS/SSL in production**
```python
r = redis.Redis(
    host='your-redis.azure.com',
    port=6380,  # SSL port
    ssl=True,
    ssl_cert_reqs='required'
)
```

✅ **Use Entra ID instead of access keys**
```python
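from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token('https://redis.azure.com/.default')

r = redis.Redis(
    host='your-redis.azure.com',
    port=6380,                       # SSL port
    username='<your-object-id>',     # Placeholder: object (principal) ID of the Entra identity
    password=token.token,            # Access token goes in the password field
    ssl=True
)
# Tokens expire, so long-running clients must fetch a fresh one and re-authenticate.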
```
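Entra ID access tokens are short-lived, so a long-running client has to refresh them. The sketch below shows only the caching side of that, using the standard library: `CachedToken` and `fake_fetch` are hypothetical names, and the 300-second refresh margin is an assumption. A real `fetch` could wrap `credential.get_token(...)` and return its `token` and `expires_on` fields.

```python
import time


class CachedToken:
    """Cache a token and fetch a new one shortly before it expires.

    `fetch` is any zero-argument callable returning (token, expires_on),
    where expires_on is a Unix timestamp in seconds.
    """

    def __init__(self, fetch, refresh_margin=300, clock=time.time):
        self._fetch = fetch
        self._margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_on = 0

    def get(self):
        """Return the cached token, refreshing it first if it is missing or near expiry."""
        if self._token is None or self._clock() >= self._expires_on - self._margin:
            self._token, self._expires_on = self._fetch()
        return self._token


def fake_fetch():
    """Stand-in for a real credential call; returns a fake token valid for one hour."""
    fake_fetch.calls += 1
    return f'token-{fake_fetch.calls}', time.time() + 3600


fake_fetch.calls = 0

cache = CachedToken(fake_fetch)
print(cache.get())  # fetched on first use
print(cache.get())  # served from the cache
```

Passing the clock in as a parameter keeps the refresh logic testable without waiting for a real token to expire.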

✅ **Set appropriate timeouts**
```python
r = redis.Redis(
    socket_connect_timeout=5,
    socket_timeout=5
)
```

✅ **Use connection pooling**
```python
pool = redis.ConnectionPool(
    max_connections=50,
    health_check_interval=30
)
r = redis.Redis(connection_pool=pool)
```

✅ **Implement retry logic**
```python
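@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),  # back off between attempts
    retry=retry_if_exception_type((redis.ConnectionError, redis.TimeoutError))
)
def get_value(key):
    return r.get(key)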
```


## Cleanup


In [None]:
# Clean up test keys (SCAN iterates incrementally; KEYS can block a busy server)
keys = list(r.scan_iter('monitor:*')) + list(r.scan_iter('test:*'))
if keys:
    deleted = r.delete(*keys)
    print(f'✅ Cleanup complete: {deleted} keys deleted')
else:
    print('✅ No keys to clean up')


---

## 🎯 Key Takeaways

### ✅ Connection Patterns

1. **Use Connection Pooling**
   - Reuse connections
   - Configure max connections
   - Enable health checks

2. **Implement Retry Logic**
   - Exponential backoff
   - Max retry attempts
   - Handle transient failures

3. **Circuit Breaker**
   - Prevent cascading failures
   - Fail fast when the service is down
   - Auto-recovery testing

4. **Security**
   - Always use TLS/SSL
   - Prefer Entra ID over keys
   - Set appropriate timeouts
   - Never log credentials
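
The last point is easy to violate by accident when a connection URL ends up in a log line. One possible safeguard, sketched with the standard library only, is to mask the password before logging; `redact_url` is a hypothetical helper and the URL is a made-up example.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url):
    """Return `url` with any password replaced by '***' (host name is lower-cased)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    netloc = f"{parts.username or ''}:***@{parts.hostname or ''}"
    if parts.port is not None:
        netloc += f':{parts.port}'
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(redact_url('rediss://user:s3cret@example.invalid:6380/0'))
# rediss://user:***@example.invalid:6380/0
```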

### 🔧 Production Checklist

- ✅ Connection pooling configured
- ✅ Retry logic implemented
- ✅ Circuit breaker for failures
- ✅ TLS/SSL enabled
- ✅ Timeouts configured
- ✅ Health checks enabled
- ✅ Monitoring and logging

---

## 🎉 Excellent Work!

You now know how to build resilient, secure Redis connections!
