# Chapter 8: Data Distribution Shifts and Monitoring

## Causes of ML System Failures

- You need to worry about the usual software nonsense (latency, uptime, etc.)
- You ALSO need to worry about your model outputs (accuracy, human understandability, etc.)

### Software Failure

- Dependencies can fail
- Deployment can fail
- Hardware can fail
- Your endpoint can go down

### ML Failure

- Edge cases 
    - Something new happens that has never been seen before (e.g. someone borrowing 1 million dollars), leading to a failure in the retrained model
- Degenerate feedback loop 
    - Predictions influence feedback, which influences the next model run (e.g. selection bias in lending)  
    - How do we detect such cases? 
        - Measure diversity of system outputs (e.g. aggregate diversity / average coverage of long-tail items)
    - How do we correct?
        - Randomisation (occasionally serve random outputs instead of the model's predictions, so you can measure performance on items the model would not have shown)
        - Contextual bandits (choose intelligently which outputs to explore, then make a causal, unbiased estimate of their value)

- Data Distribution Shifts: There are 3 main types of shifts to consider. 
    - Definitions
        - For definition's sake, let's call our covariates $X$, and our labels $Y$
        - Our training data is simply a sample from the joint distribution $P(X,Y)$
        - $P(X,Y) = P(Y | X)P(X) = P(X|Y)P(Y)$
    - Covariate Shift
        - This happens when $P(X)$ changes, but $P(Y|X)$ remains the same
    - Concept Drift
        - This happens when $P(Y|X)$ changes, but $P(X)$ remains the same
    - Label Shift
        - This happens when $P(Y)$ changes, but $P(X|Y)$ remains the same

- Definition change: Sometimes you have new categories to predict, needing a whole new model
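
Going back to the degenerate feedback loop detection above, here is a minimal sketch of the two diversity measures for a recommender. The definitions used are one common reading (aggregate diversity = number of distinct items recommended; long-tail coverage = share of unpopular items recommended at least once). The function names, the toy data and the `head_fraction` cut-off are all assumptions made for illustration.

```python
def aggregate_diversity(recommendations):
    """Number of distinct items that appear in any user's recommendation list."""
    return len({item for recs in recommendations.values() for item in recs})


def long_tail_coverage(recommendations, popularity, head_fraction=0.2):
    """Share of long-tail items that were recommended at least once.

    Items are ranked by historical popularity; the top `head_fraction`
    form the head and the rest the long tail (hypothetical cut-off).
    """
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    head_size = max(1, int(len(ranked) * head_fraction))
    tail = set(ranked[head_size:])
    if not tail:
        return 0.0
    recommended = {item for recs in recommendations.values() for item in recs}
    return len(recommended & tail) / len(tail)


# Hypothetical toy data: historical interaction counts and current recommendations.
popularity = {"a": 100, "b": 80, "c": 5, "d": 3, "e": 1}
recommendations = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b"]}

diversity = aggregate_diversity(recommendations)
coverage = long_tail_coverage(recommendations, popularity)
print(diversity, coverage)
```

If both numbers keep falling from one model run to the next, the system may be collapsing onto a few popular items, which is the symptom to look for.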

## Detecting data shifts

- As per the definitions above, to detect shifts, you want to monitor $P(X), P(Y), P(X|Y), P(Y|X)$
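
To make the four quantities concrete, here is a small sketch that estimates them from discrete $(X, Y)$ pairs by counting. The data is a made-up toy example, constructed so that $P(X)$ changes while $P(Y|X)$ stays fixed (i.e. covariate shift); `marginal` and `conditional` are hypothetical helper names.

```python
from collections import Counter


def marginal(values):
    """Empirical distribution of a sequence of discrete values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def conditional(pairs):
    """Empirical P(second | first) from a list of (first, second) pairs."""
    grouped = {}
    for first, second in pairs:
        grouped.setdefault(first, []).append(second)
    return {first: marginal(seconds) for first, seconds in grouped.items()}


# Hypothetical (X, Y) samples: X is a coarse feature, Y a binary label.
train = [("low", 0)] * 40 + [("low", 1)] * 10 + [("high", 0)] * 10 + [("high", 1)] * 40
prod = [("low", 0)] * 16 + [("low", 1)] * 4 + [("high", 0)] * 16 + [("high", 1)] * 64

p_x_train = marginal(x for x, _ in train)
p_x_prod = marginal(x for x, _ in prod)
p_y_train = marginal(y for _, y in train)
p_y_prod = marginal(y for _, y in prod)
p_y_given_x_train = conditional(train)
p_y_given_x_prod = conditional(prod)

print("P(X):   ", p_x_train, "->", p_x_prod)
print("P(Y):   ", p_y_train, "->", p_y_prod)
print("P(Y|X) unchanged:", p_y_given_x_train == p_y_given_x_prod)
```

In this toy data $P(Y)$ moves as well, purely as a side effect of the change in $P(X)$. That is why a marginal alone is not enough to tell the shift types apart.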

### Statistical Methods

- Compare population moments (e.g. mean, variance)
- Two-sample hypothesis tests (test whether two samples plausibly come from the same distribution)

![tests](./artifacts/8_image.png)

- When working over time, think about measuring statistics over sliding windows
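
The ideas above (moments, a two-sample test, sliding windows) can be sketched together for a single numeric feature. This is only an illustration using the standard library. The Kolmogorov-Smirnov test is picked here as one example of a two-sample test, the threshold is the usual large-sample approximation, and the synthetic stream, window size and helper names are assumptions.

```python
import bisect
import math
import random
import statistics


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for value in a + b:
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_threshold(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic."""
    return math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))


def sliding_windows(stream, size, step):
    """Yield (start index, window) pairs over a list."""
    for start in range(0, len(stream) - size + 1, step):
        yield start, stream[start:start + size]


# Synthetic data: the feature's mean jumps from 0 to 1.5 halfway through the stream.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(500)]
stream = ([random.gauss(0.0, 1.0) for _ in range(1000)]
          + [random.gauss(1.5, 1.0) for _ in range(1000)])

results = []
for start, window in sliding_windows(stream, size=500, step=500):
    stat = ks_statistic(reference, window)
    shifted = stat > ks_threshold(len(reference), len(window))
    mean_gap = statistics.fmean(window) - statistics.fmean(reference)
    results.append((start, shifted))
    print(f"window@{start}: mean gap={mean_gap:+.2f}, KS={stat:.3f}, shifted={shifted}")
```

Window size is a trade-off: shorter windows react faster but tend to raise more false alarms, and longer windows do the opposite.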

## Dealing with data shifts

- Train with HUGE data, and assume the model has "seen it all"
- Adapt existing model without retraining 
- Retrain model

## Monitoring/Observability

- Which metrics do you watch to catch deterioration early?
    - ML metrics (accuracy, calibration, etc.)
    - Predictions (check distribution of predictions)
    - Features (distribution)
    - Raw inputs (distribution)
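
As one example of an ML metric from the list above, calibration can be checked by binning predicted probabilities and comparing each bin's mean prediction with the observed positive rate. The sketch below assumes a binary classifier. The function names, bin count and toy data are illustrative, and expected calibration error is just one common way to summarise the table.

```python
def calibration_table(probs, labels, n_bins=5):
    """Per-bin (bin index, count, mean predicted probability, observed positive rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    table = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((index, len(members), mean_pred, observed))
    return table


def expected_calibration_error(probs, labels, n_bins=5):
    """Count-weighted average gap between predicted and observed rates."""
    total = len(probs)
    return sum(
        count / total * abs(mean_pred - observed)
        for _, count, mean_pred, observed in calibration_table(probs, labels, n_bins)
    )


# Hypothetical predictions and ground-truth labels.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
ece = expected_calibration_error(probs, labels)
print(calibration_table(probs, labels))
print(round(ece, 3))
```

Note that this needs ground-truth labels, which often arrive late. The prediction, feature and raw-input distributions can be monitored before labels are available.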

## Toolbox for Monitoring

- Logs
- Dashboards
- Alerts
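
A minimal sketch of how logs and alerts could fit together, using Python's standard `logging` module. The logger name, metric names and thresholds are hypothetical. In a real setup the warning would be routed to a paging or chat system, and the logged values would feed a dashboard.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("model_monitor")  # hypothetical logger name


def check_metric(name, value, lower=None, upper=None):
    """Log a metric value and return True if it falls outside its bounds."""
    breached = (lower is not None and value < lower) or (
        upper is not None and value > upper
    )
    if breached:
        # An alert here is just a WARNING log record; routing it is out of scope.
        logger.warning(
            "ALERT metric=%s value=%.3f lower=%s upper=%s", name, value, lower, upper
        )
    else:
        logger.info("metric=%s value=%.3f", name, value)
    return breached


# Hypothetical hourly accuracy readings with an alert threshold of 0.90.
hourly_accuracy = [0.95, 0.94, 0.91, 0.86]
alerts = [check_metric("accuracy", value, lower=0.90) for value in hourly_accuracy]
```

Logging every reading, and not only the breaches, keeps the history that a dashboard needs to show a trend.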