**Before you start:** Click **File → Save a copy in Drive** so you have your own version of this notebook. If you skip this step, your work will not be saved.

# Load libraries and data

In [None]:
# Data handling and plotting
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Modeling and evaluation
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

# Notebook display settings: show the output of every expression in a cell,
# widen the notebook container, and show all DataFrame columns
from IPython.core.interactiveshell import InteractiveShell
from IPython.display import display, HTML
InteractiveShell.ast_node_interactivity = "all"
display(HTML("<style>.container { width:80% !important; }</style>"))
pd.options.display.max_columns = None

In [None]:
def calibration_plot(y_true, y_prob, n_bins=5):
    """
    Create a calibration plot with a 45-degree dashed reference line.

    Parameters:
        y_true (array-like): True binary labels (0 or 1).
        y_prob (array-like): Predicted probabilities for the positive class.
        n_bins (int): Number of equal-width bins used to group the predicted probabilities.

    Returns:
        None
    """
    # calibration_curve returns, per non-empty bin, the observed fraction of
    # positives first and the mean predicted probability second
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)

    # Apply Seaborn's whitegrid style, then draw with Matplotlib
    sns.set(style="whitegrid")

    # 45-degree reference line: a perfectly calibrated model falls on it
    max_p = max(y_prob)
    plt.plot([0, max_p], [0, max_p], "k--", label="Perfectly calibrated")
    plt.plot(prob_pred, prob_true, marker='o', label="Model")

    plt.xlabel("Mean Predicted Probability")
    plt.ylabel("Fraction of Positives")
    plt.title("Calibration Plot")
    plt.legend(loc="best")
    plt.show()
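The return order of scikit-learn's `calibration_curve` is easy to mix up: it returns the observed fraction of positives first and the mean predicted probability second. The sketch below checks this on four made-up predictions (the arrays are hypothetical, not lab data) split into two equal-width bins.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Hypothetical toy labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.6, 0.9])

# First output: fraction of positives per bin; second: mean predicted probability per bin
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)

print(prob_true)
print(prob_pred)
```

With `n_bins=2`, the first bin holds the two negatives (mean prediction 0.25, fraction of positives 0) and the second holds the two positives (mean prediction 0.75, fraction of positives 1), so the x-axis of a calibration plot should use the second output.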

In [None]:
pred_universe = pd.read_csv('https://www.dropbox.com/scl/fi/mjb2zlm0q0bvvqdyl7d44/universe_lab10.csv?rlkey=q9oasdhx55nbbf1lutx18eseg&dl=1')
pred_universe.shape
pred_universe.head()

# Columns

Defendant info:
 - arrest_id - Unique ID for an arrest
 - person_id - Unique ID for a person
 - age_at_arrest - Age at arrest
 - group - Defendant's group (a stand-in for a category, such as race, ethnicity, sex, or gender, across which we want to measure various definitions of fairness/balance)
 
Outcomes:
 - y_felony - Did the defendant get rearrested for a felony (serious) crime in the next year?
 - y_low_level - Did the defendant get rearrested for a low-level crime (less serious) in the next year?

Info about the current case:
 - current_charge__felony - Was the current charge a felony?
 - current_charge__nonfelony - Was the current charge a nonfelony?
 - current_charge__violent - Was the current charge for a violent crime?

Features we'll use:
 - arrest_any__last_4y - Number of arrests in the last 4 years
 - arrest_any__last_1y - Number of arrests in the last year
 - arrest_felony__last_4y - Number of arrests for a felony in the last 4 years
 - arrest_felony__last_1y - Number of arrests for a felony in the last year
 - arrest_violent__last_4y - Number of arrests for a violent crime in the last 4 years
 - arrest_violent__last_1y - Number of arrests for a violent crime in the last year
  

# Base rate for felony rearrest
We use `sns.barplot` to show the base rate (the share of defendants with the outcome) by group for `y_felony`.

Group B's base rate is almost twice group A's.

In [None]:
sns.barplot(
    data=pred_universe,
    x='group',
    y='y_felony',
    estimator='mean'
)

We also look at the base rates for "low-level" rearrest, which is a rearrest for a crime that was neither a felony nor violent.

In [None]:
sns.barplot(
    data=pred_universe,
    x='group',
    y='y_low_level',
    estimator='mean'
)

This time the base rate is higher for group A.
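To read the exact numbers behind the two bar charts, a `groupby` mean gives the same per-group base rates, since the mean of a 0/1 outcome is the share of 1s. This is a minimal sketch on a small hypothetical DataFrame `toy` with made-up values; on the lab data you would use `pred_universe` instead.

```python
import pandas as pd

# Hypothetical toy data standing in for pred_universe (made-up values)
toy = pd.DataFrame({
    "group":       ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_felony":    [0, 0, 0, 1, 0, 1, 1, 0],
    "y_low_level": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Mean of each 0/1 outcome within each group = base rate by group,
# the same quantity sns.barplot(estimator='mean') draws as bar heights
base_rates = toy.groupby("group")[["y_felony", "y_low_level"]].mean()
print(base_rates)
```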

# Lab Task
1. Create a column called `group__A` that is True if `group == 'A'` and False otherwise.

2. Use `train_test_split` to split `pred_universe` into a `train` set and a `holdout` set. The holdout set should contain 50% of the data.

3. Fit a random forest classifier on `train` with `max_depth=5` and `n_estimators=100`. The outcome to predict is `y_felony`, and the features are:
- arrest_any__last_4y
- arrest_any__last_1y
- arrest_felony__last_4y
- arrest_felony__last_1y
- arrest_violent__last_4y 
- arrest_violent__last_1y 
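Steps 1 to 3 can be sketched as follows. This is one possible approach, not an official solution: it runs on a synthetic stand-in DataFrame `df` that only borrows the lab's column names, and `random_state=42` is an added assumption so the split and the forest are reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with the lab's column names (hypothetical, not the real dataset)
rng = np.random.default_rng(0)
n = 200
features = [
    "arrest_any__last_4y", "arrest_any__last_1y",
    "arrest_felony__last_4y", "arrest_felony__last_1y",
    "arrest_violent__last_4y", "arrest_violent__last_1y",
]
df = pd.DataFrame(rng.poisson(1.0, size=(n, len(features))), columns=features)
df["group"] = rng.choice(["A", "B"], size=n)
df["y_felony"] = rng.integers(0, 2, size=n)

# Step 1: boolean indicator for membership in group A
df["group__A"] = df["group"] == "A"

# Step 2: 50/50 split into train and holdout
train, holdout = train_test_split(df, test_size=0.5, random_state=42)

# Step 3: random forest with max_depth=5 and 100 trees, predicting y_felony
model_f_all = rfc(max_depth=5, n_estimators=100, random_state=42)
model_f_all.fit(train[features], train["y_felony"])
```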

4. Create a column in `holdout` called `pred_f_all`. This holds the felony predictions (predicted probabilities) from the model that uses all of the arrest features.

5. Create a column called `yhat_f_all` that equals True if `pred_f_all` is greater than the base rate of `y_felony` in the `train` data frame.

6. Now fit the same random forest model, but use `y_low_level` as the outcome to predict.

7. Create a column in `holdout` called `pred_ll_all`. This holds the low-level predictions from the model that uses all of the arrest features.

8. Create a column called `yhat_ll_all` that equals True if `pred_ll_all` is greater than the base rate of `y_low_level` in the `train` data frame.

9. Now repeat steps 3 to 8 with the following predictors, and rename the columns you create by adding `_group` to the end (for example, `yhat_ll_all` becomes `yhat_ll_all_group`):
- arrest_any__last_4y
- arrest_any__last_1y
- arrest_felony__last_4y
- arrest_felony__last_1y
- arrest_violent__last_4y 
- arrest_violent__last_1y 
- group__A
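One way to sketch steps 4 to 8 and the `_group` repeat is with a small helper that fits a forest, stores the positive-class probability, and flags scores above the training base rate. The helper name `add_predictions` is hypothetical, and the data is again a synthetic stand-in; in the lab, `train` and `holdout` come from your own split. `predict_proba` returns one column per class, so `[:, 1]` picks the probability that the outcome equals 1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with the lab's column names (hypothetical, not the real dataset)
rng = np.random.default_rng(1)
n = 400
arrest_features = [
    "arrest_any__last_4y", "arrest_any__last_1y",
    "arrest_felony__last_4y", "arrest_felony__last_1y",
    "arrest_violent__last_4y", "arrest_violent__last_1y",
]
df = pd.DataFrame(rng.poisson(1.0, size=(n, len(arrest_features))), columns=arrest_features)
df["group"] = rng.choice(["A", "B"], size=n)
df["group__A"] = df["group"] == "A"
df["y_felony"] = rng.integers(0, 2, size=n)
df["y_low_level"] = rng.integers(0, 2, size=n)

train, holdout = train_test_split(df, test_size=0.5, random_state=42)
holdout = holdout.copy()  # independent copy so new columns can be added safely


def add_predictions(features, outcome, pred_col, yhat_col):
    """Hypothetical helper: fit a forest on train, add score and flag columns to holdout."""
    model = rfc(max_depth=5, n_estimators=100, random_state=42)
    model.fit(train[features], train[outcome])
    # Column 1 of predict_proba is the probability of class 1 (outcome happened)
    holdout[pred_col] = model.predict_proba(holdout[features])[:, 1]
    # Flag as high risk when the score exceeds the outcome's base rate in train
    holdout[yhat_col] = holdout[pred_col] > train[outcome].mean()


# Models that use only the arrest features
add_predictions(arrest_features, "y_felony", "pred_f_all", "yhat_f_all")
add_predictions(arrest_features, "y_low_level", "pred_ll_all", "yhat_ll_all")

# Same models with group__A added; the new columns get a _group suffix
with_group = arrest_features + ["group__A"]
add_predictions(with_group, "y_felony", "pred_f_all_group", "yhat_f_all_group")
add_predictions(with_group, "y_low_level", "pred_ll_all_group", "yhat_ll_all_group")
```

Routing all four models through one function keeps them identical except for their features and outcome, which is what the later fairness comparisons assume.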

### Flagging Rates
10. First, look at the share of each group that is flagged as high risk by each of our four models.

This share is also a fairness metric, called "demographic parity": the idea is that the same share of defendants in each group should be flagged as high risk.

Use `groupby` on `group` to find these shares.
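Because the mean of a boolean column is the share of True values, a grouped mean of the flag columns gives the flagging rate per group. A minimal sketch on a hypothetical toy `holdout` with made-up flags; it shows two of the four `yhat_*` columns for brevity.

```python
import pandas as pd

# Hypothetical toy holdout with made-up boolean flags
holdout = pd.DataFrame({
    "group":            ["A", "A", "A", "A", "B", "B", "B", "B"],
    "yhat_f_all":       [True, False, False, False, True, True, False, True],
    "yhat_f_all_group": [True, True, False, False, True, False, False, True],
})

# Share of each group flagged as high risk, per model (demographic parity check)
flag_rates = holdout.groupby("group")[["yhat_f_all", "yhat_f_all_group"]].mean()
print(flag_rates)
```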

### Fairness Metrics
11. Use `groupby` to compute PPV (precision) by group for the models that predict felony rearrest and use:
- all arrest features
- all arrest features + the group feature
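PPV (precision) is the share of flagged defendants who actually had the outcome, so one way to compute it is to keep only flagged rows and average the 0/1 outcome within each group. A sketch on a hypothetical toy `holdout` with made-up values; the same pattern applies to `yhat_f_all_group`, and to the low-level models with `y_low_level` and the `yhat_ll_*` columns.

```python
import pandas as pd

# Hypothetical toy holdout with made-up outcomes and flags
holdout = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_felony":   [1, 0, 1, 0, 1, 1, 0, 0],
    "yhat_f_all": [True, True, False, False, True, True, True, False],
})

# Restrict to flagged rows, then average the outcome within each group
ppv = holdout[holdout["yhat_f_all"]].groupby("group")["y_felony"].mean()
print(ppv)
```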

12. Is there a big difference between the two models?

13. Use `groupby` to compute PPV by group for the models that predict low-level rearrest and use:
- all arrest features
- all arrest features + the group feature

14. Is there a big difference between the two models?


15. Compute the false negative rate by `group` for the models that predict felony rearrest and use:
- all arrest features
- all arrest features + the group feature
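The false negative rate is the share of defendants who were actually rearrested (`y_felony == 1`) but were not flagged. A sketch on a hypothetical toy `holdout` with made-up values: keep the positives, negate the flag, and average within each group.

```python
import pandas as pd

# Hypothetical toy holdout with made-up outcomes and flags
holdout = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_felony":   [1, 1, 1, 0, 1, 1, 0, 0],
    "yhat_f_all": [True, False, False, True, True, True, False, True],
})

# Among actual positives, the share NOT flagged is the false negative rate
positives = holdout[holdout["y_felony"] == 1]
fnr = (~positives["yhat_f_all"]).groupby(positives["group"]).mean()
print(fnr)
```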

16. Is there a big difference between the two models? 

17. Let's try to figure out why: use `groupby` on `group` and the outcome (`y_felony`) to compute the average predicted probabilities produced by the two felony models.
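A sketch of this table on a hypothetical toy `holdout` whose scores are made up for illustration. Grouping by both `group` and the outcome lets you compare the two models within the same (group, outcome) cell, which is one way to see whether adding `group__A` moves one group's scores toward or away from the fixed threshold.

```python
import pandas as pd

# Hypothetical toy holdout with made-up predicted probabilities
holdout = pd.DataFrame({
    "group":            ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_felony":         [0, 0, 1, 1, 0, 0, 1, 1],
    "pred_f_all":       [0.10, 0.30, 0.20, 0.40, 0.30, 0.50, 0.40, 0.60],
    "pred_f_all_group": [0.05, 0.15, 0.10, 0.30, 0.40, 0.60, 0.50, 0.70],
})

# Mean predicted probability for each (group, outcome) cell, for both models
avg_pred = holdout.groupby(["group", "y_felony"])[["pred_f_all", "pred_f_all_group"]].mean()
print(avg_pred)
```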

18. Compute the false negative rate by `group` for the models that predict low-level rearrest and use:
- all arrest features
- all arrest features + the group feature


19. Is there a big difference between the two models?


20. Compute the false positive rate by `group` for the models that predict felony rearrest and use:
- all arrest features
- all arrest features + the group feature
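The false positive rate is the share of defendants who were not rearrested (`y_felony == 0`) but were flagged anyway. A sketch on a hypothetical toy `holdout` with made-up values: keep the negatives and average the flag within each group. Swapping in `y_low_level` and the `yhat_ll_*` columns gives the low-level version.

```python
import pandas as pd

# Hypothetical toy holdout with made-up outcomes and flags
holdout = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_felony":   [0, 0, 0, 1, 0, 0, 1, 1],
    "yhat_f_all": [False, False, True, True, True, True, False, True],
})

# Among actual negatives, the share flagged is the false positive rate
negatives = holdout[holdout["y_felony"] == 0]
fpr = negatives.groupby("group")["yhat_f_all"].mean()
print(fpr)
```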

21. Is there a big difference between the two models? Use the table created in question 17 to reason about what's going on.


22. Compute the false positive rate by `group` for the models that predict low-level rearrest and use:
- all arrest features
- all arrest features + the group feature

23. Is there a big difference by group?

In [1]:
#No

24. Which one of the four models do you think is the most fair? 
