# Backend Engineering — Overview

## Purpose
Backend engineering encompasses the server-side logic, data management, and infrastructure that power modern applications. This section covers:

- **API Design** — Building robust, versioned, and well-documented interfaces
- **Data Management** — Storage, modeling, and querying strategies
- **Security** — Authentication, authorization, and defense in depth
- **Scalability** — Handling growth in users, data, and traffic
- **Reliability** — Ensuring uptime through resilience patterns

## Key Questions This Section Answers
1. How do we design APIs that are easy to use and maintain?
2. What are the trade-offs between different database types?
3. How do we secure our services against common attacks?
4. How do we scale horizontally and handle traffic spikes?
5. How do we build observable, debuggable systems?

---

## 1. API Design Principles

### REST Best Practices

| Principle | Description |
|-----------|-------------|
| **Resources** | Use nouns, not verbs (`/users` not `/getUsers`) |
| **HTTP Methods** | GET (read), POST (create), PUT (replace), PATCH (partial update), DELETE (remove) |
| **Status Codes** | 2xx success, 3xx redirection, 4xx client error, 5xx server error |
| **Versioning** | `/v1/users` or `Accept: application/vnd.api+json;version=1` |
| **Pagination** | `?page=2&limit=20` or cursor-based for large datasets |
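
The cursor-based option in the Pagination row can be sketched as follows. This is a minimal in-memory illustration rather than any framework's API: `USERS`, `paginate`, `encode_cursor`, `decode_cursor`, and the base64-encoded JSON cursor format are all hypothetical names and choices made for this example.

```python
import base64
import json
from typing import Optional

# Hypothetical in-memory "table", sorted by id as an indexed column would be.
USERS = [{"id": i, "name": f"user{i}"} for i in range(1, 8)]


def encode_cursor(last_id: int) -> str:
    """Pack the last-seen id into an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the last-seen id from a cursor produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def paginate(limit: int = 3, cursor: Optional[str] = None) -> dict:
    """Return one page of users plus the cursor for the next page (None at the end)."""
    after = decode_cursor(cursor) if cursor else 0
    # Roughly: SELECT ... WHERE id > :after ORDER BY id LIMIT :limit + 1
    rows = [u for u in USERS if u["id"] > after][: limit + 1]
    page, has_more = rows[:limit], len(rows) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor}


first = paginate(limit=3)
second = paginate(limit=3, cursor=first["next_cursor"])
```

Fetching one extra row is a common way to learn whether another page exists without a separate count query. Unlike `?page=2`, the cursor stays stable when rows are inserted or deleted between requests.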

### GraphQL vs REST

| Aspect | REST | GraphQL |
|--------|------|---------|
| Data fetching | Multiple endpoints | Single endpoint |
| Over-fetching | Common | Avoided (client specifies fields) |
| Versioning | URL or header | Schema evolution |
| Caching | Standard HTTP caching (GET, ETag, CDN) | Harder over a single POST endpoint; usually client-side or persisted-query caching |
| Learning curve | Lower | Higher |

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np

# HTTP Status Codes Distribution (typical API)
status_codes = ['2xx Success', '3xx Redirect', '4xx Client Error', '5xx Server Error']
typical_dist = [85, 3, 10, 2]  # Healthy API
unhealthy_dist = [60, 2, 25, 13]  # Problematic API

fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'pie'}, {'type': 'pie'}]],
                    subplot_titles=('Healthy API', 'Problematic API'))

colors = ['#2ecc71', '#3498db', '#f39c12', '#e74c3c']

fig.add_trace(go.Pie(labels=status_codes, values=typical_dist, marker_colors=colors,
                     hole=0.4, name='Healthy'), row=1, col=1)
fig.add_trace(go.Pie(labels=status_codes, values=unhealthy_dist, marker_colors=colors,
                     hole=0.4, name='Unhealthy'), row=1, col=2)

fig.update_layout(title='API Health: HTTP Status Code Distribution', template='plotly_white')
fig

## 2. Communication Patterns

### Synchronous vs Asynchronous

```
Synchronous (Request-Response):
Client ──request──> Server
       <──response──
       
Asynchronous (Message Queue):
Producer ──msg──> Queue ──msg──> Consumer
         <──ack──       <──ack──
```
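
The two flows in the diagram can be approximated with the standard library alone. This is a sketch under simplifying assumptions: an in-process `queue.Queue` stands in for a real message broker, `handle` is a hypothetical stand-in for server-side work, and `task_done()` plays the role of the acknowledgement.

```python
import queue
import threading


def handle(order_id):
    """Hypothetical stand-in for the server-side work."""
    return f"order {order_id} processed"


# Synchronous: the caller blocks until the result comes back.
sync_result = handle(1)

# Asynchronous: the producer enqueues and moves on; a consumer works in the background.
tasks = queue.Queue()
results = []


def consumer():
    while True:
        order_id = tasks.get()
        if order_id is None:       # sentinel value: shut the consumer down
            tasks.task_done()
            break
        results.append(handle(order_id))
        tasks.task_done()          # acknowledge the message


worker = threading.Thread(target=consumer)
worker.start()
for order_id in (2, 3, 4):
    tasks.put(order_id)            # returns immediately, no waiting for a result
tasks.put(None)
tasks.join()                       # block until every message has been acknowledged
worker.join()
```

The producer never sees a return value from `handle`; it only learns that messages were acknowledged. That decoupling is what lets the consumer be slow, restarted, or scaled out independently.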

### When to Use What

| Pattern | Use Case | Examples |
|---------|----------|----------|
| **HTTP REST** | CRUD operations, simple queries | User management, product catalog |
| **WebSockets** | Real-time bidirectional | Chat, gaming, live dashboards |
| **Server-Sent Events** | Server push, one-way | Notifications, live feeds |
| **Message Queues** | Decoupled processing | Order processing, email sending |
| **gRPC** | High-performance, microservices | Internal service communication |

In [None]:
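# Latency comparison across communication patterns.
# Figures are illustrative orders of magnitude; real values depend on payload size, network, and implementation.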
patterns = ['HTTP REST', 'gRPC', 'WebSocket', 'Message Queue\n(async)']
latency_p50 = [50, 15, 5, 100]  # milliseconds
latency_p99 = [200, 50, 20, 500]
throughput = [1000, 5000, 10000, 2000]  # requests per second

fig = make_subplots(rows=1, cols=2, subplot_titles=(
    'Latency by Communication Pattern', 'Throughput Comparison'
))

fig.add_trace(go.Bar(name='P50 Latency', x=patterns, y=latency_p50, marker_color='#3498db'), row=1, col=1)
fig.add_trace(go.Bar(name='P99 Latency', x=patterns, y=latency_p99, marker_color='#e74c3c'), row=1, col=1)
fig.add_trace(go.Bar(name='Throughput', x=patterns, y=throughput, marker_color='#2ecc71'), row=1, col=2)

fig.update_layout(title='Communication Pattern Performance Characteristics', template='plotly_white', barmode='group')
fig.update_yaxes(title_text='Latency (ms)', row=1, col=1)
fig.update_yaxes(title_text='Requests/Second', row=1, col=2)
fig

## 3. Data Storage & Modeling

### Database Selection Guide

| Type | Examples | Best For | Trade-offs |
|------|----------|----------|------------|
| **Relational (SQL)** | PostgreSQL, MySQL | ACID transactions, complex queries | Scaling writes is hard |
| **Document** | MongoDB, DynamoDB | Flexible schema, JSON data | Limited joins |
| **Key-Value** | Redis, Memcached | Caching, sessions | Limited query capability |
| **Column-Family** | Cassandra, HBase | Time-series, high write volume | Complex data modeling |
| **Graph** | Neo4j, Neptune | Relationships, social networks | Scaling challenges |

### CAP Theorem
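When a network partition occurs, a distributed system must give up either consistency or availability, so in practice only two of these three properties can be guaranteed at once: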
- **C**onsistency — Every read receives the most recent write
- **A**vailability — Every request receives a response
- **P**artition tolerance — System continues despite network failures

Most systems choose **AP** (available, partition-tolerant) or **CP** (consistent, partition-tolerant).
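
The AP versus CP choice can be illustrated with a toy model. This is a deliberately simplified sketch, not how any real database is implemented: the `Replica` class, its `mode` flag, and the `partitioned` switch are hypothetical.

```python
class Replica:
    """Toy replica that may be cut off from the primary (hypothetical model)."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.value = "v1"           # last value replicated from the primary
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest write, so refuse to answer.
            raise RuntimeError("unavailable: cannot reach primary")
        # AP (or no partition): answer with the local copy, which may be stale.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.partitioned = True      # assume the primary has moved on to "v2" meanwhile

try:
    cp_answer = cp.read()
except RuntimeError as exc:
    cp_answer = str(exc)            # CP: an error instead of possibly stale data
ap_answer = ap.read()               # AP: a response, but it may be the old "v1"
```

During the partition the CP replica sacrifices availability and the AP replica sacrifices consistency. Neither can offer both until the partition heals.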

In [None]:
# CAP Theorem visualization
import plotly.graph_objects as go

fig = go.Figure()

# Triangle vertices
fig.add_trace(go.Scatter(
    x=[0, 1, 0.5, 0], y=[0, 0, 0.866, 0],
    mode='lines', line=dict(color='#34495e', width=3),
    name='CAP Triangle'
))

# Labels
fig.add_annotation(x=0, y=-0.1, text="<b>Consistency</b>", showarrow=False, font=dict(size=14))
fig.add_annotation(x=1, y=-0.1, text="<b>Availability</b>", showarrow=False, font=dict(size=14))
fig.add_annotation(x=0.5, y=0.95, text="<b>Partition Tolerance</b>", showarrow=False, font=dict(size=14))

# Database positions (illustrative placements; real behaviour depends on configuration and deployment)
databases = {
    'PostgreSQL': (0.15, 0.15),
    'MongoDB': (0.75, 0.35),
    'Cassandra': (0.65, 0.55),
    'Redis': (0.85, 0.1),
    'DynamoDB': (0.55, 0.45),
}

for db, (x, y) in databases.items():
    fig.add_trace(go.Scatter(
        x=[x], y=[y], mode='markers+text',
        marker=dict(size=12), text=[db], textposition='top center',
        name=db
    ))

fig.update_layout(
    title='CAP Theorem: Database Trade-offs',
    xaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=[-0.2, 1.2]),
    yaxis=dict(showgrid=False, zeroline=False, showticklabels=False, scaleanchor='x', range=[-0.3, 1.1]),
    template='plotly_white',
    showlegend=False
)
fig

## 4. Authentication & Security

### Authentication Methods

| Method | How It Works | Use Case |
|--------|--------------|----------|
| **Session-based** | Server stores session, client sends cookie | Traditional web apps |
| **JWT (Token-based)** | Signed token with claims | Stateless APIs, SPAs |
| **OAuth 2.0** | Delegated authorization | Third-party access |
| **API Keys** | Static key per client | Machine-to-machine |
| **mTLS** | Mutual certificate verification | Zero-trust, service mesh |
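
To make the "signed token with claims" row concrete, here is a minimal HS256 sketch using only the standard library. It is meant to show the mechanics; a maintained JWT library is the safer choice in practice. The helper names (`sign_token`, `verify_token`), the demo secret, and the claim values are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims, secret):
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token, secret):
    """Return the claims if signature, algorithm, and expiry check out; else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return claims


SECRET = b"demo-secret"  # hypothetical; load real secrets from a secret manager
token = sign_token({"sub": "user-42", "exp": int(time.time()) + 60}, SECRET)
claims = verify_token(token, SECRET)
```

The server keeps no session state: anyone holding the secret can verify the token. That is why this approach suits stateless APIs, and also why revoking a token before it expires requires extra machinery.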

### Security Layers (Defense in Depth)

```
┌─────────────────────────────────────────┐
│  WAF / DDoS Protection                  │
├─────────────────────────────────────────┤
│  Load Balancer / TLS Termination        │
├─────────────────────────────────────────┤
│  API Gateway (Rate Limiting, Auth)      │
├─────────────────────────────────────────┤
│  Application (Input Validation)         │
├─────────────────────────────────────────┤
│  Database (Encryption at Rest)          │
└─────────────────────────────────────────┘
```
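
The rate limiting mentioned at the API gateway layer is often implemented as a token bucket. The sketch below is one possible single-process version; the `TokenBucket` class, its parameters, and the injected fake clock are assumptions for the example, and a real gateway would usually keep this state in shared storage.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # a caller would typically respond with HTTP 429


# A fake clock makes the behaviour deterministic for the demo.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]       # three pass, the fourth is rejected
fake_now[0] += 2.0                               # two seconds later: two tokens refilled
after_wait = [bucket.allow() for _ in range(3)]  # two pass, the third is rejected
```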

In [None]:
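# Security threats visualization. Categories follow the OWASP Top 10 (2017), with CSRF in place of
# 'Using Components with Known Vulnerabilities'; the 0-10 scores are illustrative, not official OWASP ratings.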
threats = ['SQL Injection', 'XSS', 'CSRF', 'Broken Auth', 'Sensitive Data Exposure', 
           'XXE', 'Broken Access Control', 'Security Misconfig', 'Insecure Deserialization', 'Insufficient Logging']
severity = [9.5, 8.0, 7.5, 9.0, 8.5, 7.0, 9.2, 6.5, 8.8, 6.0]
prevalence = [7, 8, 6, 9, 8, 4, 9, 10, 5, 7]

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=prevalence, y=severity, mode='markers+text',
    marker=dict(size=[s*4 for s in severity], color=severity, colorscale='Reds', showscale=True,
                colorbar=dict(title='Severity')),
    text=threats, textposition='top center', textfont=dict(size=10),
    hovertemplate='<b>%{text}</b><br>Prevalence: %{x}<br>Severity: %{y}<extra></extra>'
))

fig.update_layout(
    title='OWASP Top 10 Categories: Severity vs Prevalence (illustrative scores)',
    xaxis_title='Prevalence (how common)',
    yaxis_title='Severity (impact)',
    template='plotly_white',
    xaxis=dict(range=[3, 11]),
    yaxis=dict(range=[5, 10])
)
fig

## 5. Caching & Performance

### Caching Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| **Cache-Aside** | App checks cache, loads from DB on miss | General purpose |
| **Read-Through** | Cache loads from DB automatically | Simplified logic |
| **Write-Through** | Write to cache and DB together | Strong consistency |
| **Write-Behind** | Write to cache, async DB update | High write throughput |
| **Refresh-Ahead** | Proactively refresh before expiry | Predictable access patterns |
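
As an illustration of the cache-aside row, the following sketch uses a plain dictionary with a time-to-live. The `CacheAside` class, the `DATABASE` dict, and `load_from_db` are hypothetical stand-ins; in practice the cache would typically be Redis or similar and the loader a real query.

```python
import time

DATABASE = {"user:1": {"name": "Ada"}}   # hypothetical backing store
db_reads = 0


def load_from_db(key):
    """Stand-in for a database query; counts reads so the effect is visible."""
    global db_reads
    db_reads += 1
    return DATABASE.get(key)


class CacheAside:
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}   # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # cache hit
        value = load_from_db(key)                # cache miss: go to the database
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the database so readers do not see stale data."""
        self._store.pop(key, None)


cache = CacheAside()
first = cache.get("user:1")    # miss: reads the database
second = cache.get("user:1")   # hit: served from memory
DATABASE["user:1"] = {"name": "Grace"}
cache.invalidate("user:1")
third = cache.get("user:1")    # miss again after invalidation
```

The application owns both the lookup and the invalidation here. That explicit control is the defining trait of cache-aside, and forgetting the `invalidate` call is its most common failure mode.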

### Cache Hierarchy

```
Client Cache (Browser)     ~1ms
       ↓
CDN Edge Cache            ~10ms  
       ↓
Application Cache (Redis)  ~1-5ms
       ↓
Database                   ~10-100ms
```

In [None]:
# Cache hit rate impact on latency
cache_hit_rates = np.linspace(0, 1, 101)  # 0%, 1%, ..., 100%, so index i is exactly an i% hit rate
cache_latency = 2  # ms
db_latency = 50  # ms

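# Expected latency: hits are served at cache speed, misses fall through to the database
avg_latency = cache_hit_rates * cache_latency + (1 - cache_hit_rates) * db_latency
# Simplifying assumption: throughput scales inversely with average latency
throughput_multiplier = db_latency / avg_latency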

fig = make_subplots(rows=1, cols=2, subplot_titles=(
    'Average Latency vs Cache Hit Rate', 'Throughput Improvement'
))

fig.add_trace(go.Scatter(x=cache_hit_rates*100, y=avg_latency, mode='lines',
                         line=dict(color='#3498db', width=2), name='Avg Latency'), row=1, col=1)
fig.add_trace(go.Scatter(x=cache_hit_rates*100, y=throughput_multiplier, mode='lines',
                         line=dict(color='#2ecc71', width=2), name='Throughput Multiplier'), row=1, col=2)

fig.add_hline(y=db_latency, line_dash='dash', line_color='red', row=1, col=1,
              annotation_text='DB Only')
fig.add_hline(y=cache_latency, line_dash='dash', line_color='green', row=1, col=1,
              annotation_text='Cache Only')

fig.update_layout(title='The Power of Caching', template='plotly_white')
fig.update_xaxes(title_text='Cache Hit Rate (%)', row=1, col=1)
fig.update_xaxes(title_text='Cache Hit Rate (%)', row=1, col=2)
fig.update_yaxes(title_text='Latency (ms)', row=1, col=1)
fig.update_yaxes(title_text='Throughput Multiplier', row=1, col=2)
fig.show()

print(f"At 90% cache hit rate: {avg_latency[90]:.1f}ms avg latency, {throughput_multiplier[90]:.1f}x throughput")
print(f"At 99% cache hit rate: {avg_latency[99]:.1f}ms avg latency, {throughput_multiplier[99]:.1f}x throughput")

## 6. Resilience Patterns

### Circuit Breaker
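Prevents cascading failures by "breaking" (failing fast instead of calling the dependency) when a service fails repeatedly. After a timeout, a half-open breaker lets a trial request through; success closes it again and failure reopens it: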

```
CLOSED ──(failures > threshold)──> OPEN
   ↑                                  │
   └──(success)── HALF-OPEN ←─(timeout)┘
```
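
The state machine above might be implemented along these lines. This is a single-threaded sketch under stated assumptions: the `CircuitBreaker` class, its parameter names, and the injectable clock are hypothetical, and it trips once the failure count reaches `threshold`.

```python
import time


class CircuitBreaker:
    def __init__(self, threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "HALF-OPEN"             # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF-OPEN" or self.failures >= self.threshold:
                self.state, self.opened_at = "OPEN", self.clock()
            raise
        self.failures, self.state = 0, "CLOSED"  # any success closes the circuit
        return result


def flaky():
    raise ConnectionError("downstream unavailable")


fake_now = [0.0]                                 # fake clock for a deterministic demo
breaker = CircuitBreaker(threshold=2, reset_timeout=10.0, clock=lambda: fake_now[0])
observed = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except Exception as exc:
        observed.append((breaker.state, type(exc).__name__))
fake_now[0] += 10.0                              # wait out the reset timeout
recovered = breaker.call(lambda: "ok")           # trial call succeeds, breaker closes
```

The third call never reaches `flaky` at all: the breaker rejects it immediately, which is what protects an already struggling dependency from additional load.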

### Key Resilience Patterns

| Pattern | Purpose | Implementation |
|---------|---------|----------------|
| **Retry** | Handle transient failures | Exponential backoff with jitter |
| **Circuit Breaker** | Prevent cascade failures | Trip after N failures |
| **Bulkhead** | Isolate failures | Separate thread pools |
| **Timeout** | Bound waiting time | Set reasonable limits |
| **Fallback** | Graceful degradation | Return cached/default data |
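
The retry row can be sketched as a small helper. The function name `retry`, its parameters, and the injectable `sleep` and `rng` hooks are assumptions made so the example is testable; the jitter draws a random 50-100% of the exponential delay, which is one of several common jitter schemes.

```python
import random
import time


def retry(func, max_retries=5, base_delay=1.0, max_delay=30.0,
          sleep=time.sleep, rng=random.random):
    """Call func, retrying on any exception with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise                                   # out of retries: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * (0.5 + 0.5 * rng()))            # wait 50-100% of the capped delay


calls = {"n": 0}


def sometimes_fails():
    """Hypothetical operation that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "done"


waits = []
# Record delays instead of sleeping, and pin the random draw for a deterministic demo.
result = retry(sometimes_fails, sleep=waits.append, rng=lambda: 0.0)
```

Jitter matters because many clients failing at the same moment would otherwise retry in lockstep and hit the recovering service in synchronized waves. Retries are also only safe for operations that are idempotent.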

In [None]:
# Retry with exponential backoff visualization
max_retries = 5
base_delay = 1  # second

attempts = list(range(1, max_retries + 1))  # retry 1 .. max_retries
delays_no_backoff = [base_delay] * max_retries
delays_exponential = [base_delay * (2 ** (i - 1)) for i in attempts]  # 1, 2, 4, 8, 16 s
# Jitter scales each delay to a random 50-100% of its exponential value
delays_exp_jitter = [d * (0.5 + np.random.random() * 0.5) for d in delays_exponential]

cumulative_no_backoff = np.cumsum(delays_no_backoff)
cumulative_exponential = np.cumsum(delays_exponential)

fig = make_subplots(rows=1, cols=2, subplot_titles=(
    'Retry Delay by Attempt', 'Cumulative Wait Time'
))

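fig.add_trace(go.Bar(name='No Backoff', x=attempts, y=delays_no_backoff, marker_color='#e74c3c'), row=1, col=1)
fig.add_trace(go.Bar(name='Exponential', x=attempts, y=delays_exponential, marker_color='#3498db'), row=1, col=1)
# Plot the jittered delays as well; they were computed above but never shown
fig.add_trace(go.Bar(name='Exponential + Jitter', x=attempts, y=delays_exp_jitter, marker_color='#9b59b6'), row=1, col=1)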

fig.add_trace(go.Scatter(name='No Backoff', x=attempts, y=cumulative_no_backoff, 
                         mode='lines+markers', line=dict(color='#e74c3c')), row=1, col=2)
fig.add_trace(go.Scatter(name='Exponential', x=attempts, y=cumulative_exponential,
                         mode='lines+markers', line=dict(color='#3498db')), row=1, col=2)

fig.update_layout(title='Retry Strategies: Exponential Backoff', template='plotly_white')
fig.update_xaxes(title_text='Retry Attempt', row=1, col=1)
fig.update_xaxes(title_text='Retry Attempt', row=1, col=2)
fig.update_yaxes(title_text='Delay (seconds)', row=1, col=1)
fig.update_yaxes(title_text='Total Wait (seconds)', row=1, col=2)
fig

## 7. Observability

### The Three Pillars

| Pillar | What | Tools |
|--------|------|-------|
| **Logs** | Discrete events | ELK Stack, Splunk, CloudWatch |
| **Metrics** | Aggregated measurements | Prometheus, Datadog, CloudWatch |
| **Traces** | Request flow across services | Jaeger, Zipkin, X-Ray |

### Key Metrics (RED & USE)

**RED Method** (for services):
- **R**ate — requests per second
- **E**rrors — failed requests per second
- **D**uration — response time distribution

**USE Method** (for resources):
- **U**tilization — % resource busy
- **S**aturation — queue depth, work waiting
- **E**rrors — error count
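
As a rough illustration of the RED method, the three numbers can be derived from a window of request records. The log below is made-up sample data, `red_metrics` is a hypothetical helper, and counting only 5xx responses as errors is an assumption (some teams also count selected 4xx codes).

```python
from statistics import quantiles

# Hypothetical request log for a 10-second window: (status_code, duration_ms)
WINDOW_SECONDS = 10
requests_log = [(200, 12), (200, 15), (500, 220), (200, 18), (404, 9),
                (200, 14), (200, 16), (503, 310), (200, 13), (200, 17)]


def red_metrics(log, window_seconds):
    """Compute Rate, Errors, and Duration percentiles for one time window."""
    durations = sorted(d for _, d in log)
    errors = sum(1 for status, _ in log if status >= 500)    # server errors only
    cuts = quantiles(durations, n=100, method="inclusive")   # 99 percentile cut points
    return {
        "rate_rps": len(log) / window_seconds,
        "errors_rps": errors / window_seconds,
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
    }


metrics = red_metrics(requests_log, WINDOW_SECONDS)
```

Reporting duration as percentiles rather than a mean keeps the two slow failing requests visible: they barely move the median but dominate the p99.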

---

## Core Topics in This Section

| Topic | Description |
|-------|-------------|
| **API Design** | REST, GraphQL, versioning, documentation |
| **Communication** | HTTP, WebSockets, gRPC, message queues |
| **Auth & Security** | JWT, OAuth, encryption, OWASP |
| **Data Storage** | SQL, NoSQL, CAP theorem, modeling |
| **Caching** | Redis, CDN, strategies, invalidation |
| **Resilience** | Circuit breaker, retry, bulkhead |
| **Observability** | Logs, metrics, traces, alerting |
| **Testing** | Unit, integration, load, chaos |

## Key Takeaways

1. **Design APIs for consumers** — Good APIs are intuitive and well-documented
2. **Choose the right database** — Match the data model to the access patterns
3. **Security is not optional** — Apply defense in depth at every layer
4. **Cache aggressively** — Even a 90% cache hit rate cuts average latency sharply (from 50 ms to 6.8 ms with a 2 ms cache in front of a 50 ms database)
5. **Plan for failure** — Use resilience patterns to handle inevitable failures
6. **Observe everything** — You can't fix what you can't measure
