## Chapter 6: Conformal Prediction for Classification

### Why calibration is important

- A model must be well calibrated, or its probability/regression outputs may not be meaningful
    - For example, if 20% of the people predicted to have a 1% chance of cancer actually have cancer, the 1% figure is misleading
    - Without proper calibration, choosing decision cutoffs is also difficult: a 50% threshold means little if a predicted 50% does not correspond to an observed 50%

### How to evaluate calibration

- There are a few ways to evaluate the calibration of your model:
    - Calibration Plot: Plot the observed frequency of positives against the mean predicted probability in each bin
    - Calibration Error: Reduce calibration evaluation to a single metric
        - Mean absolute distance between estimated probabilities and observed probabilities
    - Calibration metrics: Some of these include Expected Calibration Error, Log Loss, Brier Score (discussed in Chapter 3/4)
    - Cross-validation: compute the metrics above on held-out folds rather than on the training data
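
The calibration plot and calibration error described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the equal-width binning, and the toy data are all assumptions made here.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width bins and summarise each non-empty bin.

    Returns a list of (mean_predicted, observed_frequency, count) tuples,
    i.e. the points you would draw on a calibration plot.
    """
    stats = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probs, positives, count]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        stats[idx][0] += p
        stats[idx][1] += y
        stats[idx][2] += 1
    return [(s / n, pos / n, n) for s, pos, n in stats if n > 0]


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Count-weighted mean of |observed frequency - mean predicted probability|."""
    total = len(y_true)
    return sum(n / total * abs(obs - pred)
               for pred, obs, n in calibration_bins(y_true, y_prob, n_bins))


# Toy labels and predicted probabilities (made up for illustration)
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
y_prob = [0.1, 0.1, 0.2, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.9]
ece = expected_calibration_error(y_true, y_prob, n_bins=5)
```

A perfectly calibrated model gives points on the diagonal of the plot and an error of 0. The value depends on the number of bins, so it is worth reporting `n_bins` alongside it.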

### Approaches to Classifier Calibration

#### Histogram Binning

- Not really an "algorithm" per se

- Procedure
    - Divide the predicted probabilities into bins/intervals
    - For each bin, compute the observed proportion of positives
    - Compute the ratio between the mean predicted probability and the observed proportion in each bin
    - Adjust all probabilities within a bin according to that bin's ratio (in the most common formulation, every prediction in a bin is simply replaced by the bin's observed proportion)
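
The procedure above can be sketched as follows. This version uses the common formulation in which each prediction is replaced by the observed positive rate of its bin. The function names, the equal-width bins, and the midpoint fallback for empty bins are assumptions made for this sketch.

```python
def fit_histogram_binning(y_true, y_prob, n_bins=10):
    """Learn one calibrated value per equal-width bin from a calibration set."""
    positives = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        positives[idx] += y
        counts[idx] += 1
    # Observed positive rate per bin. Empty bins fall back to the bin midpoint,
    # an assumption made here so that every bin has a value.
    return [positives[i] / counts[i] if counts[i] else (i + 0.5) / n_bins
            for i in range(n_bins)]


def apply_histogram_binning(bin_values, y_prob):
    """Replace each predicted probability with the value learned for its bin."""
    n_bins = len(bin_values)
    return [bin_values[min(int(p * n_bins), n_bins - 1)] for p in y_prob]


# Toy calibration set (made up for illustration)
cal_labels = [0, 0, 1, 1, 1, 1]
cal_probs = [0.1, 0.2, 0.15, 0.8, 0.9, 0.7]
bin_values = fit_histogram_binning(cal_labels, cal_probs, n_bins=2)
adjusted = apply_histogram_binning(bin_values, [0.3, 0.6])
```

Because the output is a lookup table, every score inside a bin maps to the same value, which is where the discontinuity and information-loss drawbacks come from.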

- Some drawbacks
    - Fixed bins are inflexible: if the miscalibration varies non-linearly within a bin, a single per-bin adjustment cannot capture it
    - Binning causes information loss, because the bin boundaries are hard-coded and calibration uses only bin averages
    - Sensitive to the number and placement of the bin boundaries
    - Adjustments are discontinuous, so two nearly identical scores on either side of a boundary can receive quite different calibrated values
    - If your data distribution shifts, the bins may not generalise well to new data

#### Platt Scaling

- Procedure
    - Collect a labeled validation set or a holdout set that is distinct from the training data used to train the classifier
    - Use the classifier to generate the raw output scores or logits for the instances in the validation set
    - Fit a logistic regression model on the validation set, treating the raw scores as the independent variable and the true class labels as the dependent variable
    - Once the logistic regression model has been trained, it can be used as a calibration function.
        - i.e. Given a new instance, the raw score produced by the classifier is input into the logistic regression model, which transforms it into a calibrated probability estimate.

- The idea: fit a second, simple model that maps the classifier's scores on a holdout set (for which you know the true labels) to probabilities
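
As a rough sketch of the calibration function, the code below fits `p = sigmoid(a * score + b)` on holdout scores by plain gradient descent on the log loss. The function names, learning rate, iteration count, and toy data are assumptions made here. Platt's original method also smooths the target labels and uses a second-order optimizer, both of which this sketch omits.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, y_true, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, y_true):
            err = _sigmoid(a * s + b) - y  # derivative of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(a, b, scores):
    """Map raw classifier scores to calibrated probabilities."""
    return [_sigmoid(a * s + b) for s in scores]


# Hypothetical holdout scores (e.g. SVM margins) with their true labels
holdout_scores = [-2.5, -1.8, -1.0, -0.4, 0.2, 0.3, 0.9, 1.5, 2.2, 3.0]
holdout_labels = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
a, b = fit_platt(holdout_scores, holdout_labels)
calibrated = platt_predict(a, b, [-2.0, 0.0, 2.0])
```

Only two parameters are learned, which keeps the method stable on small holdout sets but also ties it to a sigmoid-shaped relationship between score and probability.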

- Some drawbacks
    - You need a holdout validation set that the first model has not seen
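    - Assumes the relationship between raw scores and true probabilities is sigmoid-shaped (the logistic regression assumption); calibration is poor when it is not
    - If your holdout set contains some extreme values by chance, you may unknowingly fit a poor calibration function
    - Designed for binary classification, so it is hard to map to the multiclass setting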

#### Isotonic Regression

- Procedure
    - Collect a labeled validation set or a holdout set that is separate from the training data
    - Use the classifier to generate the raw output scores or probabilities for the instances in the validation set
    - Sort the instances in the validation set based on the raw scores
    - Initialize the fitted probability of each instance to its own true label (0 or 1)
    - Wherever a fitted value is larger than the one that follows it, pool the two neighbouring blocks and replace their values with the pooled mean. This minimizes the squared differences between the fitted probabilities and the labels, subject to the constraint that the fitted probabilities are non-decreasing in the score.
    - Repeat the pooling until no violations remain (this is the pool adjacent violators algorithm)
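
A minimal sketch of this fit using the pool adjacent violators algorithm is shown below. The function names and toy data are assumptions made here. The sketch also assumes distinct scores and predicts with a simple step-function lookup, whereas library implementations typically handle ties and interpolate between points.

```python
import bisect


def fit_isotonic(scores, y_true):
    """Least-squares fit of the labels that is non-decreasing in the score.

    Returns (sorted_scores, fitted_values), one fitted value per sorted score.
    """
    pairs = sorted(zip(scores, y_true))
    blocks = []  # each block is [sum of labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the newest one
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


def isotonic_predict(sorted_scores, fitted, new_scores):
    """Step lookup: use the fitted value of the largest holdout score <= s."""
    out = []
    for s in new_scores:
        i = bisect.bisect_right(sorted_scores, s) - 1
        out.append(fitted[max(i, 0)])  # scores below the range use the first value
    return out


# Toy holdout scores and labels (made up for illustration)
xs, fitted = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1])
preds = isotonic_predict(xs, fitted, [0.05, 0.25, 0.9])
```

In the toy data the labels at scores 0.2 and 0.3 are out of order (1 then 0), so they are pooled into a single block with value 0.5.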

- Downsides
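    - Can overfit, especially when the holdout set is small
    - Sensitive to outliers
    - Limited flexibility: the fitted mapping is constrained to be monotonic and piecewise constant
    - No native multiclass support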
    - Limited probabilistic interpretation