## Chapter 4: Validity and Efficiency of Conformal Prediction

- What does it mean for a predictor to be valid?
    - A conformal predictor is valid if, for any chosen confidence level $1-\alpha$, its prediction interval contains the true target value with probability at least $1-\alpha$. Across many instances, the proportion of intervals that cover the true value is therefore at least $1-\alpha$ on average.
    - That is, if I set my confidence to 95%, then at least 95% of my interval predictions should contain the true value in the long run

$$\begin{aligned}
    P(Y \in \text{Interval}(x, \alpha)) \ge 1 - \alpha
\end{aligned}$$

- This condition holds as long as your data are exchangeable, which IID data always are
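
The validity guarantee can be checked empirically. The sketch below is a minimal split conformal regression simulation, not a definitive implementation: the data-generating process, the fixed model `predict`, and the helper names `conformal_quantile` and `simulate_coverage` are all assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def predict(x):
    """Hypothetical fitted model (assumed here to be y = 2x)."""
    return 2.0 * x


def simulate_coverage(alpha=0.1, n_cal=200, n_test=2000, seed=0):
    rng = random.Random(seed)

    def draw():
        # Assumed IID data: y = 2x + standard Gaussian noise
        x = rng.uniform(0, 1)
        return x, 2.0 * x + rng.gauss(0, 1)

    # Nonconformity score: absolute residual on a held-out calibration set
    scores = [abs(y - predict(x)) for x, y in (draw() for _ in range(n_cal))]
    q = conformal_quantile(scores, alpha)

    # The interval is [predict(x) - q, predict(x) + q]; count how often it covers y
    hits = 0
    for _ in range(n_test):
        x, y = draw()
        hits += abs(y - predict(x)) <= q
    return hits / n_test


coverage = simulate_coverage()
print(f"empirical coverage at alpha=0.1: {coverage:.3f}")
```

With IID draws, the printed coverage should land near or above 90%. The guarantee holds on average over calibration sets, so a single run can fall slightly below the target.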

### Classifier Calibration

- A model is well calibrated if its predicted probabilities match long-run frequencies: among all cases where the model predicts a 20% chance of class 1, class 1 should occur about 20% of the time

- We will go through 2 measures of calibration
    - Brier Score
    - Log Loss


- Suppose we have the following forecasts and outcomes over 10 days

| Day | Forecasted Probability of Rain | Actual Outcome (Rain) |
| --- | --- | --- | 
| 1 | 80% | Yes |
| 2 | 60% | No |
| 3 | 90% | Yes |
| 4 | 30% | No |
| 5 | 70% | Yes |
| 6 | 50% | No |
| 7 | 80% | Yes |
| 8 | 20% | No |
| 9 | 40% | Yes |
| 10 | 60% | Yes |

- Tabulating each forecast probability against the observed frequency of rain

| Forecast Probability of Rain | Number of Forecasts | Number of Rainy Days | Observed Frequency of Rain |
| --- | --- | --- | --- |
| 20% | 1 | 0 | 0% |
| 30% | 1 | 0 | 0% |
| 40% | 1 | 1 | 100% |
| 50% | 1 | 0 | 0% |
| 60% | 2 | 1 | 50% |
| 70% | 1 | 1 | 100% |
| 80% | 2 | 2 | 100% |
| 90% | 1 | 1 | 100% |
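
The tabulation can be reproduced in a few lines. This is a minimal sketch using the ten forecasts from the outcome table; the function name `reliability_table` is an assumption chosen for illustration.

```python
from collections import defaultdict

# (forecast probability, rained?) pairs copied from the 10-day outcome table
forecasts = [(0.8, 1), (0.6, 0), (0.9, 1), (0.3, 0), (0.7, 1),
             (0.5, 0), (0.8, 1), (0.2, 0), (0.4, 1), (0.6, 1)]


def reliability_table(pairs):
    """Map each forecast probability to (forecast count, rainy days, observed frequency)."""
    counts = defaultdict(lambda: [0, 0])
    for prob, rained in pairs:
        counts[prob][0] += 1
        counts[prob][1] += rained
    return {prob: (n, k, k / n) for prob, (n, k) in sorted(counts.items())}


table = reliability_table(forecasts)
for prob, (n, k, freq) in table.items():
    print(f"{prob:.0%}  forecasts={n}  rainy={k}  observed={freq:.0%}")
```

With only one or two forecasts per probability level, the observed frequencies are very noisy. In practice, forecasts are usually grouped into wider bins before comparing them with outcomes.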


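- Brier Score (mean squared difference between the outcome $y_i \in \{0, 1\}$ and the forecast probability $p_i$; lower is better):
$$\begin{aligned}
    \text{Brier Score} = \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2 &= \frac{(1 - 0.8)^2 + (0 - 0.6)^2 + (1 - 0.9)^2 + \dots}{10} \\
    &\approx 0.144
\end{aligned}$$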
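
As a check on the arithmetic, here is a minimal sketch that computes the Brier score for the ten forecasts in the outcome table. The function name `brier_score` is an assumption for illustration, not a library call.

```python
# Forecast probabilities and outcomes (1 = rain) from the 10-day table
probs = [0.8, 0.6, 0.9, 0.3, 0.7, 0.5, 0.8, 0.2, 0.4, 0.6]
outcomes = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and binary outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


score = brier_score(probs, outcomes)
baseline = brier_score([0.5] * len(outcomes), outcomes)
print(round(score, 3))     # 0.144
print(round(baseline, 3))  # 0.25
```

The baseline of 0.25 is what an uninformative forecaster scores by always predicting 50%, so 0.144 indicates some skill on this small sample.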

- Log Loss (using natural logarithms; lower is better):
$$\begin{aligned}
    \text{Avg Log Loss} &= -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log{(p_i)} + (1-y_i) \log{(1-p_i)} \right) \\
    &= -\frac{(1 \log 0.8 + 0 \log 0.2) + (0 \log 0.6 + 1 \log 0.4) + \dots}{10} \\
    &\approx 0.452
\end{aligned}$$
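
The log loss can be evaluated the same way. This is a minimal sketch assuming natural logarithms; the function name `log_loss` and the clipping constant `eps` are assumptions for illustration. With natural logarithms, the ten forecasts in the outcome table give an average of about 0.452. A different log base only rescales the value.

```python
import math

# Forecast probabilities and outcomes (1 = rain) from the 10-day table
probs = [0.8, 0.6, 0.9, 0.3, 0.7, 0.5, 0.8, 0.2, 0.4, 0.6]
outcomes = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]


def log_loss(probs, outcomes, eps=1e-15):
    """Average negative log-likelihood of binary outcomes (natural log)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


loss = log_loss(probs, outcomes)
print(round(loss, 3))  # 0.452
```

Unlike the Brier score, log loss is unbounded: a confident forecast that turns out wrong (for example 99% rain on a dry day) is penalized very heavily.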

### Classifier Efficiency

- There are a few common ways to measure the efficiency of a conformal predictor, that is, how tight and informative its predictions are

- Prediction interval length
    - In regression problems, we can study the average width of the returned intervals; narrower intervals are more efficient

- Prediction set size
    - In classification problems, we can study how many labels the prediction set contains on average; smaller sets are more efficient

- Coverage probability
    - How often the true value falls within your prediction set/interval. Strictly, this checks validity, so efficiency should be compared at equal coverage

- P-value histograms
    - Study the distribution of the p-values produced
    - For better models, the p-values are typically very concentrated (very close to 0 or 1)
    - This implies that the model is very confident about whether or not a candidate value conforms to the dataset
    - More confidence yields narrower prediction bands
    - P-values that are spread more uniformly imply wider prediction bands
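
The efficiency measures above can be computed together for a conformal classifier. The sketch below is illustrative only: the 3-class model in `toy_example`, its `sharpness` parameter, the nonconformity score (1 minus the predicted probability of a label), and the function names are all assumptions, not part of any library.

```python
import random


def p_value(cal_scores, score):
    """Conformal p-value: share of scores (test score included) at least as large as `score`."""
    ge = sum(s >= score for s in cal_scores)
    return (ge + 1) / (len(cal_scores) + 1)


def toy_example(rng, sharpness):
    """Hypothetical 3-class model: returns (true label, predicted probabilities)."""
    true = rng.randrange(3)
    raw = [rng.random() for _ in range(3)]
    raw[true] += sharpness  # larger sharpness = more accurate model
    total = sum(raw)
    return true, [r / total for r in raw]


def evaluate(alpha=0.1, n_cal=500, n_test=1000, sharpness=2.0, seed=0):
    rng = random.Random(seed)
    # Calibration scores use the true label of each calibration example
    cal_scores = []
    for _ in range(n_cal):
        true, probs = toy_example(rng, sharpness)
        cal_scores.append(1 - probs[true])

    sizes, hits, pvals = [], 0, []
    for _ in range(n_test):
        true, probs = toy_example(rng, sharpness)
        ps = [p_value(cal_scores, 1 - p) for p in probs]  # one p-value per candidate label
        pred_set = {label for label, p in enumerate(ps) if p > alpha}
        sizes.append(len(pred_set))
        hits += true in pred_set
        pvals.extend(ps)

    # 10-bin histogram of all candidate-label p-values
    hist = [0] * 10
    for p in pvals:
        hist[min(int(p * 10), 9)] += 1
    return sum(sizes) / n_test, hits / n_test, hist


avg_size, coverage, hist = evaluate()
print(f"average set size: {avg_size:.2f}, coverage: {coverage:.3f}")
print("p-value histogram:", hist)
```

Rerunning `evaluate` with a larger `sharpness` should give a smaller average set size at roughly the same coverage, which is what a more efficient predictor looks like under these assumptions.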