# Uncertainty analysis of the ORES model

In this notebook, we analyze how the uncertainty of the ORES model's predictions varies across groups of users. The model is a gradient boosting classifier, replicated here with scikit-learn's `GradientBoostingClassifier`. We explore whether the model's uncertainty, prediction scores, and errors correlate with user group, and whether such correlations could serve as a cue that helps reviewers decide when to trust the model, thereby reducing bias.

### Definition of "Uncertainty"

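- In the original ORES model, the decision threshold is set arbitrarily to 0.5.
- Margin: $y_i f(x_i)$, where $y_i \in \{-1, +1\}$ is the true label and $f(x_i) \in [-1, 1]$ is the rescaled prediction score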
- Prediction interval
- Mean and std
- Entropy
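
As a rough illustration of two of the measures listed above, the sketch below computes the margin and the binary entropy from predicted probabilities of the positive class. The function names `signed_margin` and `binary_entropy` and the toy values are hypothetical and serve only as an example; the margin follows the same rescaling used later in this notebook (labels mapped to -1/+1, scores mapped to [-1, 1]).

```python
import numpy as np

def signed_margin(y_true, p_pos):
    """Margin y_i * f(x_i): labels mapped to {-1, +1}, scores rescaled to [-1, 1]."""
    y = np.where(np.asarray(y_true, dtype=bool), 1.0, -1.0)
    f = 2.0 * np.asarray(p_pos, dtype=float) - 1.0
    return y * f

def binary_entropy(p_pos, eps=1e-12):
    """Entropy (in bits) of a Bernoulli prediction; largest at p = 0.5."""
    p = np.clip(np.asarray(p_pos, dtype=float), eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

# toy values, not taken from the ORES data
y_true = np.array([True, False, True])
p_pos = np.array([0.9, 0.2, 0.4])

margins = signed_margin(y_true, p_pos)
entropies = binary_entropy(p_pos)
```

A negative margin means the prediction falls on the wrong side of the 0.5 threshold, while a margin near zero or an entropy near one bit signals a prediction the model is unsure about.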


## Build ORES

In [1]:
import numpy as np
import pandas as pd
from collections import OrderedDict
from sklearn import metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import cross_validate, train_test_split

In [5]:
# read in data
df = pd.read_csv('data/enwiki.labeled_revisions.20k_2015.csv')
df = df.dropna()

In [6]:
# Combine is_anon and seconds_since_registration into a single 3-category feature:
# 0 = anonymous, 1 = newcomer, 2 = experienced
newcomer_seconds = 3.637819e+06

conditions = [
    (df['feature.revision.user.is_anon'] == True),
    (df['feature.revision.user.is_anon'] == False) & (df['feature.temporal.revision.user.seconds_since_registration'] < newcomer_seconds),
    (df['feature.revision.user.is_anon'] == False) & (df['feature.temporal.revision.user.seconds_since_registration'] >= newcomer_seconds)]
choices = [0,1,2]
df['user.type'] = np.select(conditions, choices)
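
One thing worth checking with `np.select` is that every row matches exactly one condition, because unmatched rows silently receive the default value (0, which here would read as "anonymous"). The sketch below shows one possible sanity check on toy data; the short column names and the `default=-1` sentinel are assumptions made for illustration and do not appear in the original dataset.

```python
import numpy as np
import pandas as pd

newcomer_seconds = 3.637819e+06  # threshold used in this notebook

# toy rows with hypothetical short column names
toy = pd.DataFrame({
    "is_anon": [True, False, False],
    "seconds_since_registration": [0.0, 1.0e6, 5.0e6],
})

conditions = [
    toy["is_anon"],
    ~toy["is_anon"] & (toy["seconds_since_registration"] < newcomer_seconds),
    ~toy["is_anon"] & (toy["seconds_since_registration"] >= newcomer_seconds),
]

# default=-1 makes rows that match no condition visible instead of mapping them to 0
toy["user.type"] = np.select(conditions, [0, 1, 2], default=-1)
unmatched = int((toy["user.type"] == -1).sum())
counts = toy["user.type"].value_counts().sort_index()
```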

In [7]:
# add sample weights: damaging edits count 10x, all others 1x
df['sample_weight'] = np.where(df['damaging']==True, 10, 1)

In [8]:
# delete the two sensitive features
df = df.drop(['feature.revision.user.is_anon', 'feature.temporal.revision.user.seconds_since_registration'], axis=1)

In [9]:
# convert user.type to categorical
df['user.type'] = pd.Categorical(df['user.type'])

In [11]:
# divide into X, X_weights and y
y = df["damaging"]
X_with_weights = df.iloc[:,4:].copy()

In [12]:
# split into train and test set
X_with_weights_train, X_with_weights_test, y_train, y_test = train_test_split(X_with_weights, y, test_size=0.3, random_state=42)

In [13]:
# separate the sample-weight column (the last one) from the feature columns
X_train = X_with_weights_train.iloc[:,:-1].copy()
X_train_weights = X_with_weights_train.iloc[:,-1].copy()
X_test = X_with_weights_test.iloc[:,:-1].copy()

In [14]:
# parameters from 
#https://github.com/wikimedia/editquality/blob/master/model_info/enwiki.damaging.md
params= {'min_impurity_decrease': 0.0, 
         'loss': 'deviance', 
         'n_estimators': 700, 
         'min_impurity_split': None, 
         'verbose': 0, 
         'criterion': 'friedman_mse', 
         'subsample': 1.0, 
         #'center': True, 
         #'scale': True, 
         'presort': 'auto', 
         'init': None, 
         #'multilabel': False, 
         'max_depth': 7, 
         'random_state': None, 
         'learning_rate': 0.01, 
         'validation_fraction': 0.1, 
         'warm_start': False, 
         'min_samples_split': 2, 
         'min_samples_leaf': 1, 
         'min_weight_fraction_leaf': 0.0, 
         'n_iter_no_change': None, 
         'max_leaf_nodes': None, 
         'tol': 0.0001, 
         'max_features': 'log2'}
         #'labels': [True, False], 
         #'label_weights': OrderedDict([(True, 10)])

In [15]:
# Training
gb_clf_replicate = GradientBoostingClassifier(**params)
gb_clf_replicate.fit(X_train, y_train, sample_weight=X_train_weights)



GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.01, loss='deviance', max_depth=7,
                           max_features='log2', max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=700,
                           n_iter_no_change=None, presort='auto',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)

## Margin

In [68]:
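# map true labels from {False, True} to {-1, +1}
true_labels = (y_test.astype(int)-0.5)*2
# rescale the predicted probability of "damaging" from [0, 1] to [-1, 1]
pred_scores = (gb_clf_replicate.predict_proba(X_test)[:,1]-0.5)*2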

In [69]:
# margin = true label (+/-1) times rescaled score; negative values are misclassifications
margins = true_labels.values * pred_scores

In [70]:
margins

array([0.846868  , 0.9850645 , 0.94506745, ..., 0.64613277, 0.99547307,
       0.91356891])

7829     False
12969    False
13978    False
9035     False
11590    False
         ...  
18022    False
12936    False
12576    False
14882    False
1568     False
Name: damaging, Length: 5796, dtype: bool

In [102]:
df_test = pd.DataFrame(columns = ['label', 'pred_label', 'pred_score', 'pred_type', 'user.type', 'margin'])
df_test['label'] = y_test
df_test['pred_label'] = gb_clf_replicate.predict(X_test)
df_test['pred_score'] = gb_clf_replicate.predict_proba(X_test)[:,1]
df_test['user.type'] = X_test.iloc[:,-1].copy().astype(str)
df_test['margin'] = margins

In [103]:
# label each test row as TP / TN / FP / FN
for i, row in df_test.iterrows():
    if row['label'] == row['pred_label']:
        df_test.at[i, 'pred_type'] = "TP" if row['label'] else "TN"
    else:
        df_test.at[i, 'pred_type'] = "FP" if row['pred_label'] else "FN"

In [None]:
df_test
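
With a table of this shape, one way to start comparing uncertainty across user groups is a grouped summary of the margin. The sketch below uses a small made-up frame with the same column names as `df_test`; the values are illustrative only and are not results from the model.

```python
import pandas as pd

# toy stand-in for df_test; values are invented for illustration
toy = pd.DataFrame({
    "user.type": ["0", "0", "1", "2", "2"],
    "pred_type": ["TN", "FP", "TN", "TP", "TN"],
    "margin": [0.9, -0.4, 0.7, 0.2, 0.95],
})

# mean, spread, and count of the margin per user group
summary = toy.groupby("user.type")["margin"].agg(["mean", "std", "count"])

# number of each prediction type per user group
type_counts = pd.crosstab(toy["user.type"], toy["pred_type"])
```

Applied to the real `df_test`, a lower mean margin or a larger spread for one group would suggest that the model is less certain about that group's edits.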