# <span style="font-family: Arial, sans-serif; color:#97f788">xbooster</span>
<span style="font-family: Arial, sans-serif; color:navyblue">Repo: <a href="https://github.com/xRiskLab/xBooster" title="GitHub link">https://github.com/xRiskLab/xBooster</a></span>
## <span style="font-family: Arial, sans-serif; color:navyblue">Scorecard by Intervals</span>

<span style="font-family: Arial, sans-serif; color:navyblue">Author: <a href="https://github.com/jmonteroers" title="GitHub link">https://github.com/jmonteroers</a></span>

This short notebook shows how to transform the standard xBooster scorecard into a practical scorecard by intervals of the features. In essence, it combines the decision rules of the weak learners in the XGBoost model into intervals, following the industry standard (Siddiqi, 2017). Finally, we show how to add points to this interval scorecard using the Points at Even Odds/Points to Double the Odds (PEO/PDO) method.

Please note that for the method to work, the fitted XGBoost model must have a maximum depth of one (that is, the weak learners must be tree stumps). Currently, points must also have been calculated beforehand using `XGBScorecardConstructor.create_points()` (see below).
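
To make the idea of combining decision rules concrete, here is a minimal sketch of how several stumps on the same feature can be merged into intervals. It is not xBooster's implementation: the function `stumps_to_intervals`, the tuple layout and the example values are hypothetical, and missing values are ignored for simplicity.

```python
import math

def stumps_to_intervals(stumps):
    """Merge stumps on one feature into left-closed intervals.

    `stumps` is a list of (threshold, value_if_less, value_if_greater_or_equal)
    tuples. This layout is an assumption made for the illustration.
    """
    # The sorted unique thresholds become the interval edges.
    edges = [-math.inf] + sorted({t for t, _, _ in stumps}) + [math.inf]
    intervals = []
    for left, right in zip(edges[:-1], edges[1:]):
        # Every x in [left, right) falls on the "<" side of a stump exactly
        # when right <= threshold, so each stump contributes a single value.
        total = sum(lo if right <= t else hi for t, lo, hi in stumps)
        intervals.append((left, right, total))
    return intervals

# Two hypothetical stumps that split the same feature at 50 and at 70.
example = stumps_to_intervals([(50.0, 1.0, -1.0), (70.0, 0.5, -0.5)])
for left, right, total in example:
    print(f"[{left}, {right}): {total}")
```

Two stumps on one feature yield three intervals. This additive merge only works because each tree depends on a single feature, which is why the maximum depth must be one.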

### Set-up

The set-up replicates that of the 'Getting started' notebook and produces a standard xBooster scorecard.

The first step is to load the data.

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Fetch blended credit data
url = "https://github.com/xRiskLab/xBooster/raw/main/examples/data/credit_data.parquet"
dataset = pd.read_parquet(url)

features = [
    "external_risk_estimate",
    "revolving_utilization_of_unsecured_lines",
    "account_never_delinq_percent",
    "net_fraction_revolving_burden",
    "num_total_cc_accounts",
    "average_months_in_file",
]

target = "is_bad"

X, y = dataset[features], dataset[target]

ix_train, ix_test = train_test_split(X.index, stratify=y, test_size=0.3, random_state=62)

Next, we train the XGBoost model using the train dataset.

In [5]:
import xgboost as xgb

best_params = dict(
    n_estimators=100,
    learning_rate=0.55,
    # NOTE: max_depth=1 is required!
    max_depth=1,
    min_child_weight=10,
    grow_policy="lossguide",
    early_stopping_rounds=5,
)

# Create an XGBoost model
xgb_model = xgb.XGBClassifier(**best_params, random_state=62)
evalset = [
    (X.loc[ix_train], y.loc[ix_train]),
    (X.loc[ix_test], y.loc[ix_test]),
]

# Fit the XGBoost model
xgb_model.fit(
    X.loc[ix_train],
    y.loc[ix_train],
    eval_set=evalset,
    verbose=False,
)
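
Since the interval method requires tree stumps, it can be worth checking the fitted trees before going further. The sketch below assumes a table in the layout returned by XGBoost's `Booster.trees_to_dataframe()`, where leaf nodes carry the value `"Leaf"` in the `Feature` column. The helper name `no_tree_has_more_than_one_split` is hypothetical, and the toy frames stand in for a real model dump.

```python
import pandas as pd

def no_tree_has_more_than_one_split(tree_df: pd.DataFrame) -> bool:
    """Return True if every tree contains at most one split node."""
    is_split = tree_df["Feature"] != "Leaf"
    # Count the split nodes per tree; a stump has at most one.
    splits_per_tree = is_split.groupby(tree_df["Tree"]).sum()
    return bool((splits_per_tree <= 1).all())

# Toy example: two stumps, each with one split node and two leaves.
stumps_df = pd.DataFrame(
    {
        "Tree": [0, 0, 0, 1, 1, 1],
        "Node": [0, 1, 2, 0, 1, 2],
        "Feature": ["f0", "Leaf", "Leaf", "f1", "Leaf", "Leaf"],
    }
)

# Toy example: one deeper tree with two split nodes.
deep_df = pd.DataFrame(
    {
        "Tree": [0, 0, 0, 0, 0],
        "Node": [0, 1, 2, 3, 4],
        "Feature": ["f0", "f1", "Leaf", "Leaf", "Leaf"],
    }
)

print(no_tree_has_more_than_one_split(stumps_df))
print(no_tree_has_more_than_one_split(deep_df))
```

With the model fitted above, the same check could be run on `xgb_model.get_booster().trees_to_dataframe()`.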

Now, thanks to `xBooster`, constructing a scorecard by decision rules is as simple as:

In [7]:
from xbooster.constructor import XGBScorecardConstructor

# Set up the scorecard constructor
scorecard_constructor = XGBScorecardConstructor(xgb_model, X.loc[ix_train], y.loc[ix_train])

scorecard_by_dr = scorecard_constructor.construct_scorecard()
# Add points
xgb_scorecard_with_points = scorecard_constructor.create_points(
    pdo=50, target_points=600, target_odds=50
)

xgb_scorecard_with_points.head()

| | Tree | Node | Feature | Sign | Split | Count | CountPct | NonEvents | Events | EventRate | WOE | IV | XAddEvidence | DetailedSplit | Points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | account_never_delinq_percent | < | 98.0 | 2810.0 | 0.401429 | 2136.0 | 674.0 | 0.239858 | 1.043764 | 0.65111 | 0.282129 | account_never_delinq_percent < 98 | -1 |
| 1 | 0 | 2 | account_never_delinq_percent | >= | 98.0 | 4190.0 | 0.598571 | 4164.0 | 26.0 | 0.006205 | -2.87891 | 1.795892 | -0.635539 | account_never_delinq_percent >= 98 or missing | 65 |
| 2 | 1 | 1 | revolving_utilization_of_unsecured_lines | < | 0.609306 | 5224.0 | 0.746286 | 4986.0 | 238.0 | 0.045559 | -0.844894 | 0.381409 | -0.438478 | revolving_utilization_of_unsecured_lines < 0.6... | 63 |
| 3 | 1 | 2 | revolving_utilization_of_unsecured_lines | >= | 0.609306 | 1776.0 | 0.253714 | 1314.0 | 462.0 | 0.260135 | 1.151958 | 0.520027 | 0.449212 | revolving_utilization_of_unsecured_lines >= 0.... | -1 |
| 4 | 2 | 1 | external_risk_estimate | < | 68.0 | 1664.0 | 0.237714 | 1257.0 | 407.0 | 0.244591 | 1.069555 | 0.408468 | 0.469775 | external_risk_estimate < 68 | -1 |
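
For readers unfamiliar with the `pdo`, `target_points` and `target_odds` arguments, the following sketch shows the textbook scorecard scaling in which a score is a linear function of the log-odds. It assumes odds are expressed as good:bad. It illustrates the general convention only and does not show how `create_points()` distributes points across trees. The helper names are hypothetical.

```python
import math

def scaling_constants(pdo, target_points, target_odds):
    """Return (factor, offset) such that score = offset + factor * ln(odds)."""
    # Doubling the odds must add exactly `pdo` points.
    factor = pdo / math.log(2)
    # At `target_odds` the score must equal `target_points`.
    offset = target_points - factor * math.log(target_odds)
    return factor, offset

def odds_to_points(odds, pdo=50, target_points=600, target_odds=50):
    """Map good:bad odds to a score under the scaling above."""
    factor, offset = scaling_constants(pdo, target_points, target_odds)
    return offset + factor * math.log(odds)

for odds in (25, 50, 100):
    print(f"odds {odds}:1 -> {odds_to_points(odds):.1f} points")
```

With the defaults above, odds of 50:1 map to 600 points, and each doubling or halving of the odds moves the score by 50 points.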


Next, we combine decision rules into intervals, leading to a more compact version of a scorecard.

### Scorecard by Intervals

Using the same constructor, we are just one step away from building a compact scorecard grouped by intervals:

In [8]:
xgb_scorecard_by_intervals = scorecard_constructor.construct_scorecard_by_intervals()
xgb_scorecard_by_intervals.head()

| | Feature | Bin | Left | Right | Points | XAddEvidence | Count | Events | NonEvents | CountPct | WOE | IV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | account_never_delinq_percent | (-inf, 70.8000031) | -inf | 70.800003 | -13 | 1.545084 | 237.0 | 102.0 | 135.0 | 0.033857 | 1.916923 | 0.039708 |
| 1 | account_never_delinq_percent | [70.8000031, 80.5) | 70.800003 | 80.5 | 3 | 1.320483 | 305.0 | 100.0 | 205.0 | 0.043571 | 1.479385 | 0.0272 |
| 2 | account_never_delinq_percent | [80.5, 87.9000015) | 80.5 | 87.900002 | 28 | 0.966175 | 524.0 | 139.0 | 385.0 | 0.074857 | 1.178455 | 0.026998 |
| 3 | account_never_delinq_percent | [87.9000015, 93.8000031) | 87.900002 | 93.800003 | 45 | 0.723387 | 817.0 | 180.0 | 637.0 | 0.116714 | 0.933412 | 0.024274 |
| 4 | account_never_delinq_percent | [93.8000031, 97.0) | 93.800003 | 97.0 | 83 | 0.187393 | 725.0 | 123.0 | 602.0 | 0.103571 | 0.609151 | 0.008138 |
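
An interval scorecard is used by looking up, for each feature, the bin that contains the applicant's value. The sketch below hard-codes the five rows shown above and treats bins as left-closed, right-open, as the `Bin` column suggests. The helper `lookup_points` is hypothetical and not part of xBooster. It ignores missing values, and it returns `None` for values outside the rows displayed here.

```python
import math
import pandas as pd

# The five rows printed above (a single feature, head of the scorecard only).
scorecard = pd.DataFrame(
    {
        "Feature": ["account_never_delinq_percent"] * 5,
        "Left": [-math.inf, 70.800003, 80.5, 87.900002, 93.800003],
        "Right": [70.800003, 80.5, 87.900002, 93.800003, 97.0],
        "Points": [-13, 3, 28, 45, 83],
    }
)

def lookup_points(scorecard, feature, value):
    """Return the points of the bin [Left, Right) that contains `value`."""
    rows = scorecard[scorecard["Feature"] == feature]
    hit = rows[(rows["Left"] <= value) & (value < rows["Right"])]
    if hit.empty:
        return None  # value not covered by the rows available here
    return int(hit["Points"].iloc[0])

print(lookup_points(scorecard, "account_never_delinq_percent", 85.0))
```

A total score would then be the sum of the looked-up points over all features in the full scorecard.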


Let us compare the length of each scorecard:

In [9]:
print(f"Length of the scorecard by decision rules: {len(xgb_scorecard_with_points)}")
print(f"Length of the scorecard by intervals: {len(xgb_scorecard_by_intervals)}")

Length of the scorecard by decision rules: 170
Length of the scorecard by intervals: 54


The interval scorecard has fewer than a third as many rows as the original (54 versus 170), which makes it much easier to use and interpret.

Finally, let us deal with the issue of negative points by directly using the PEO/PDO method on this scorecard:

In [10]:
scorecard_constructor.create_points_peo_pdo(peo=600, pdo=50).head()

| | Feature | Bin | Left | Right | Points | XAddEvidence | Count | Events | NonEvents | CountPct | WOE | IV | Points_PEO_PDO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | account_never_delinq_percent | (-inf, 70.8000031) | -inf | 70.800003 | -13 | 1.545084 | 237.0 | 102.0 | 135.0 | 0.033857 | 1.916923 | 0.039708 | 8.0 |
| 1 | account_never_delinq_percent | [70.8000031, 80.5) | 70.800003 | 80.5 | 3 | 1.320483 | 305.0 | 100.0 | 205.0 | 0.043571 | 1.479385 | 0.0272 | 24.0 |
| 2 | account_never_delinq_percent | [80.5, 87.9000015) | 80.5 | 87.900002 | 28 | 0.966175 | 524.0 | 139.0 | 385.0 | 0.074857 | 1.178455 | 0.026998 | 50.0 |
| 3 | account_never_delinq_percent | [87.9000015, 93.8000031) | 87.900002 | 93.800003 | 45 | 0.723387 | 817.0 | 180.0 | 637.0 | 0.116714 | 0.933412 | 0.024274 | 67.0 |
| 4 | account_never_delinq_percent | [93.8000031, 97.0) | 93.800003 | 97.0 | 83 | 0.187393 | 725.0 | 123.0 | 602.0 | 0.103571 | 0.609151 | 0.008138 | 106.0 |
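
As a quick sanity check on the rows above, we can verify that the gaps between adjacent `Points_PEO_PDO` values behave as the PDO convention suggests. The check assumes that `XAddEvidence` is on the log-odds scale, with larger values meaning higher risk. Under that assumption, a drop of Δ in evidence should add roughly `pdo / ln(2) * Δ` points. It is a consistency check on the displayed numbers only; it neither reproduces the per-feature offset nor describes the internals of `create_points_peo_pdo()`.

```python
import math

pdo = 50
factor = pdo / math.log(2)  # points per unit of log-odds

# XAddEvidence and Points_PEO_PDO copied from the five rows shown above.
evidence = [1.545084, 1.320483, 0.966175, 0.723387, 0.187393]
points = [8.0, 24.0, 50.0, 67.0, 106.0]

# Compare each actual point gap with the gap implied by the evidence.
gaps = []
for i in range(len(points) - 1):
    expected = factor * (evidence[i] - evidence[i + 1])
    actual = points[i + 1] - points[i]
    gaps.append((expected, actual))
    print(f"bin {i} -> bin {i + 1}: expected {expected:.1f}, actual {actual:.0f}")
```

Each expected gap differs from the actual gap by less than one point, which is what rounding to whole points would allow.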


In summary, we now have a compact scorecard with points on a practical scale, using only two additional methods of `XGBScorecardConstructor`.