# HPA (Horizontal Pod Autoscaler)

The **Horizontal Pod Autoscaler** automatically changes the number of pod replicas in a Deployment/ReplicaSet/StatefulSet based on observed metrics (CPU, memory, or custom metrics).


## Why it is used
- Increase replicas when load increases (better latency/availability).
- Reduce replicas when load decreases (cost efficiency).
- Keep the system near a target (e.g., 70% CPU utilization).


## Requirements (common)
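- **metrics-server** installed (it serves the `metrics.k8s.io` API that CPU/memory targets read from). Custom metrics additionally need an adapter that serves `custom.metrics.k8s.io`.
- Containers must set **resource requests** for any resource you target by utilization: the HPA computes utilization as a percentage of the request.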
- Your app should scale horizontally (stateless or carefully designed state).
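A minimal sketch of the `checkout` Deployment that the HPA template below targets, showing where the resource requests live. The image name and the request values are hypothetical placeholders. `spec.replicas` is left out on purpose so that re-applying the manifest does not fight the replica count the HPA has chosen.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  # No spec.replicas: the HPA owns the replica count.
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
        - name: checkout
          image: example.com/checkout:1.0   # hypothetical image
          resources:
            requests:
              cpu: 250m      # with a 70% target, the HPA aims for ~175m average per pod
              memory: 256Mi  # illustrative value
```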


## YAML template (autoscaling/v2)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
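The template above targets CPU only. Here is a sketch of the same `spec.metrics` list with a memory target added. The `500Mi` figure is an illustrative assumption. When several metrics are listed, the HPA computes a desired replica count for each one and uses the largest.

```yaml
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # % of the pods' CPU requests
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 500Mi      # absolute per-pod average, not a % of requests
```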


## Pseudocode
```text
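every 15 seconds (default controller sync period):
  current = average cpu_utilization across pods (% of requests)
  ratio = current / target
  if abs(ratio - 1.0) <= 0.1:          # default tolerance: no scaling
    skip this cycle
  desired_replicas = ceil(current_replicas * ratio)
  desired_replicas = clamp(desired_replicas, minReplicas, maxReplicas)
  scale_workload_to(desired_replicas)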
```
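A worked example of the same formula, using the 70% target from the template and the default 0.1 tolerance. The pod counts and utilization figures are made up for illustration.

$$
\begin{aligned}
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil \\[4pt]
\text{3 pods at } 90\%:\quad \frac{90}{70} &\approx 1.29 > 1.1 \;\Rightarrow\; \left\lceil 3 \times \tfrac{90}{70} \right\rceil = \lceil 3.86 \rceil = 4 \\[4pt]
\text{4 pods at } 75\%:\quad \frac{75}{70} &\approx 1.07 \le 1.1 \;\Rightarrow\; \text{no change (stays at 4)}
\end{aligned}
$$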


## Pitfalls
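- No resource requests: the HPA cannot compute utilization for that resource and takes no action for that metric.
- Replica counts can thrash when load oscillates or pods take long to warm up; tune the stabilization windows under `spec.behavior` (scale-down defaults to 300 seconds, scale-up to 0).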
- Scaling stateful systems requires extra care (sometimes better solved with vertical scaling or sharding).
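A sketch of the `behavior` stanza for the stabilization-window pitfall, added under `spec` of the HPA template. The policy numbers are illustrative assumptions, not recommendations; tune them against how long your pods actually take to warm up.

```yaml
spec:
  behavior:
    scaleDown:
      # Act on the highest recommendation seen in the last 5 minutes
      # before removing pods (300s is also the scale-down default).
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent      # remove at most 50% of current pods...
          value: 50
          periodSeconds: 60  # ...per 60-second period
    scaleUp:
      stabilizationWindowSeconds: 0   # react to spikes immediately (the default)
      policies:
        - type: Pods         # add at most 4 pods...
          value: 4
          periodSeconds: 60  # ...per 60-second period
```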

## References
- HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
