![Fraud detection image](cover_image.jpg)

🏦 Banks are battling fraud with machine learning models, but shifting data patterns can weaken these defenses. London's Poundbank needs your help to figure out why its fraud detection models have become less accurate.

Poundbank uses the `nannyml` library to monitor its machine learning models and recommends it for this task.

## The data

They have provided a reference set (test data) and an analysis set (production data). A summary and preview are provided below.

## reference.csv and analysis.csv

| Column     | Description              |
|------------|--------------------------|
| `'timestamp'` | Date of the transaction. |
| `'time_since_login_min'` | Time, in minutes, since the user logged in to the app. |
| `'transaction_amount'` | The amount, in pounds (£), that the user sent to another account. |
| `'transaction_type'` | Transaction type: <ul><li>`CASH-OUT` - Withdrawing money from an account.</li><li>`PAYMENT` - Transaction where a payment is made to a third party.</li><li>`CASH-IN` - The opposite of a cash-out: depositing money into an account.</li><li>`TRANSFER` - Transaction which involves moving funds from one account to another.</li></ul> |
| `'is_first_transaction'` | A binary indicator denoting if the transaction is the user's first (1 for the first transaction, 0 otherwise). |
| `'user_tenure_months'` | The duration in months since the user's account was created or since they became a member. |
| `'is_fraud'` | A binary label indicating whether the transaction is fraudulent (1 for fraud, 0 otherwise). |
| `'predicted_fraud_proba'` | The probability, assigned by the detection model, that the transaction is fraudulent. |
| `'predicted_fraud'` | The predicted classification label, derived by the detection model from the predicted fraud probability (1 for predicted fraud, 0 otherwise). |

Questions:
1. Identify the months in which the estimated (expected) and realized (actual) accuracy of the model trigger alerts.
2. Determine which feature drifts most between the reference and analysis sets and therefore has the largest impact on the drop in realized accuracy.
3. Figure out why the accuracy dropped and propose a possible explanation.

In [2]:
# Re-run this cell to install nannyml
!pip install nannyml

Defaulting to user installation because normal site-packages is not writeable
Collecting nannyml
  Downloading nannyml-0.12.1-py3-none-any.whl.metadata (23 kB)
Collecting APScheduler<4.0.0,>=3.9.1 (from nannyml)
  Downloading APScheduler-3.10.4-py3-none-any.whl.metadata (5.7 kB)
Collecting FLAML<2.0.0,>=1.0.11 (from nannyml)
  Downloading FLAML-1.2.4-py3-none-any.whl.metadata (12 kB)
Collecting Jinja2<3.1 (from nannyml)
  Downloading Jinja2-3.0.3-py3-none-any.whl.metadata (3.5 kB)
Collecting gcsfs>=2022.5.0 (from nannyml)
  Downloading gcsfs-2024.9.0.post1-py2.py3-none-any.whl.metadata (1.6 kB)
Collecting matplotlib<4.0,>=3.7 (from nannyml)
  Downloading matplotlib-3.7.5-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.7 kB)
Collecting numpy<1.25,>=1.24 (from nannyml)
  Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Collecting pyarrow<15.0.0,>=14.0.0 (from nannyml)
  Downloading pyarrow-14.0.2-cp38-cp38-manylinux_

In [43]:
import pandas as pd
import nannyml as nml
nml.disable_usage_logging()

reference = pd.read_csv("reference.csv")
analysis = pd.read_csv("analysis.csv")
reference.head()

| | timestamp | time_since_login_min | transaction_amount | transaction_type | is_first_transaction | user_tenure_months | is_fraud | predicted_fraud_proba | predicted_fraud |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 00:00:00.000 | 1.56175 | 3981.1 | PAYMENT | False | 0.31898 | 1.0 | 0.99 | 1 |
| 1 | 2018-01-01 00:08:43.152 | 1.658074 | 1267.9 | PAYMENT | False | 7.391323 | 0.0 | 0.07 | 0 |
| 2 | 2018-01-01 00:17:26.304 | 2.454287 | 1984.7 | CASH-IN | False | 0.781225 | 1.0 | 1.0 | 1 |
| 3 | 2018-01-01 00:26:09.456 | 2.392085 | 2265.2 | CASH-OUT | False | 0.680473 | 1.0 | 0.98 | 1 |
| 4 | 2018-01-01 00:34:52.608 | 2.189806 | 2126.8 | CASH-IN | False | 8.542895 | 1.0 | 0.99 | 1 |


In [4]:
analysis.head()

| | timestamp | time_since_login_min | transaction_amount | transaction_type | is_first_transaction | user_tenure_months | predicted_fraud_proba | predicted_fraud | is_fraud |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-11-01 00:04:52.464 | 2.174243 | 2832.3 | CASH-OUT | False | 1.013445 | 0.97 | 1 | 1 |
| 1 | 2018-11-01 00:13:35.616 | 2.493543 | 1426.9 | CASH-OUT | False | 6.700041 | 0.09 | 0 | 0 |
| 2 | 2018-11-01 00:22:18.768 | 1.807432 | 1302.0 | PAYMENT | False | 6.291723 | 0.01 | 0 | 0 |
| 3 | 2018-11-01 00:31:01.920 | 2.133415 | 1432.1 | PAYMENT | True | 8.165503 | 0.0 | 0 | 0 |
| 4 | 2018-11-01 00:39:45.072 | 1.987827 | 1870.3 | CASH-OUT | False | 8.205203 | 0.03 | 0 | 0 |


In [44]:
analysis.columns

Index(['timestamp', 'time_since_login_min', 'transaction_amount',
       'transaction_type', 'is_first_transaction', 'user_tenure_months',
       'predicted_fraud_proba', 'predicted_fraud', 'is_fraud'],
      dtype='object')

In [6]:
#analysis = analysis.set_index('timestamp')
#reference = reference.set_index('timestamp')

In [45]:
import nannyml as nml

# CBPE (Confidence-Based Performance Estimation) estimates accuracy from the
# model's predicted probabilities, so it also works when labels are missing.
cbpe = nml.CBPE(
    y_pred_proba='predicted_fraud_proba',
    y_pred='predicted_fraud',
    y_true='is_fraud',
    problem_type='classification_binary',
    metrics=['accuracy'],
    chunk_period='m',                    # one chunk per calendar month
    timestamp_column_name='timestamp',
)

# Fit on the reference period, then estimate on the analysis period
cbpe.fit(reference)
results = cbpe.estimate(analysis)

# Analysis-period rows only
performance_metrics = results.filter(period='analysis').to_df()

# All periods (reference and analysis) as a DataFrame, including alert information
performance_alerts_df = results.to_df()

print(performance_metrics)
print(performance_alerts_df.columns)
performance_alerts_df

     chunk                          ...        accuracy                       
       key chunk_index start_index  ... upper_threshold lower_threshold  alert
0  2018-11           0           0  ...        0.950808        0.936367  False
1  2018-12           1        4955  ...        0.950808        0.936367  False
2  2019-01           2       10074  ...        0.950808        0.936367  False
3  2019-02           3       15194  ...        0.950808        0.936367  False
4  2019-03           4       19818  ...        0.950808        0.936367  False
5  2019-04           5       24938  ...        0.950808        0.936367   True
6  2019-05           6       29893  ...        0.950808        0.936367   True
7  2019-06           7       35012  ...        0.950808        0.936367   True

[8 rows x 15 columns]
MultiIndex([(   'chunk',                       'key'),
            (   'chunk',               'chunk_index'),
            (   'chunk',               'start_index'),
            (   'chunk

The first seven columns belong to the `chunk` group and the remaining eight to the `accuracy` group.

| | key | chunk_index | start_index | end_index | start_date | end_date | period | value | sampling_error | realized | upper_confidence_boundary | lower_confidence_boundary | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-01 | 0 | 0 | 5119 | 2018-01-01 | 2018-01-31 23:59:59.999999999 | reference | 0.944593 | 0.003224 | 0.94707 | 0.954264 | 0.934922 | 0.950808 | 0.936367 | False |
| 1 | 2018-02 | 1 | 5120 | 9744 | 2018-02-01 | 2018-02-28 23:59:59.999999999 | reference | 0.943064 | 0.003392 | 0.939892 | 0.953239 | 0.932888 | 0.950808 | 0.936367 | False |
| 2 | 2018-03 | 2 | 9745 | 14863 | 2018-03-01 | 2018-03-31 23:59:59.999999999 | reference | 0.942376 | 0.003224 | 0.94413 | 0.952048 | 0.932704 | 0.950808 | 0.936367 | False |
| 3 | 2018-04 | 3 | 14864 | 19818 | 2018-04-01 | 2018-04-30 23:59:59.999999999 | reference | 0.944343 | 0.003277 | 0.944299 | 0.954174 | 0.934512 | 0.950808 | 0.936367 | False |
| 4 | 2018-05 | 4 | 19819 | 24938 | 2018-05-01 | 2018-05-31 23:59:59.999999999 | reference | 0.943746 | 0.003224 | 0.938672 | 0.953417 | 0.934075 | 0.950808 | 0.936367 | False |
| 5 | 2018-06 | 5 | 24939 | 29892 | 2018-06-01 | 2018-06-30 23:59:59.999999999 | reference | 0.943668 | 0.003277 | 0.944489 | 0.9535 | 0.933836 | 0.950808 | 0.936367 | False |
| 6 | 2018-07 | 6 | 29893 | 35012 | 2018-07-01 | 2018-07-31 23:59:59.999999999 | reference | 0.941511 | 0.003224 | 0.944727 | 0.951182 | 0.93184 | 0.950808 | 0.936367 | False |
| 7 | 2018-08 | 7 | 35013 | 40132 | 2018-08-01 | 2018-08-31 23:59:59.999999999 | reference | 0.945088 | 0.003224 | 0.944922 | 0.954759 | 0.935417 | 0.950808 | 0.936367 | False |
| 8 | 2018-09 | 8 | 40133 | 45086 | 2018-09-01 | 2018-09-30 23:59:59.999999999 | reference | 0.94486 | 0.003277 | 0.945095 | 0.954691 | 0.935028 | 0.950808 | 0.936367 | False |
| 9 | 2018-10 | 9 | 45087 | 50206 | 2018-10-01 | 2018-10-31 23:59:59.999999999 | reference | 0.945417 | 0.003224 | 0.942578 | 0.955088 | 0.935746 | 0.950808 | 0.936367 | False |
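CBPE's core idea can be sketched in a few lines. Assuming the predicted probabilities are calibrated, a row predicted as fraud (1) is correct with probability `p`, and a row predicted as non-fraud (0) is correct with probability `1 - p`; averaging these per-row chances over a chunk gives its expected accuracy. This is a simplified sketch: it skips the calibration step and the confusion-matrix estimation that nannyml performs, and `expected_accuracy` is a hypothetical helper, not a nannyml function. The five probabilities are the `predicted_fraud_proba` values from the `analysis.head()` preview.

```python
from statistics import mean


def expected_accuracy(y_pred_proba, y_pred):
    """Expected accuracy of a chunk, assuming calibrated fraud probabilities.

    Hypothetical helper, not part of nannyml. Each prediction is correct with
    probability p if it predicted fraud (1), and with probability 1 - p otherwise.
    """
    per_row_correct = [p if pred == 1 else 1 - p for p, pred in zip(y_pred_proba, y_pred)]
    return mean(per_row_correct)


# predicted_fraud_proba and predicted_fraud from the analysis.head() preview
proba = [0.97, 0.09, 0.01, 0.0, 0.03]
pred = [1, 0, 0, 0, 0]
print(f"Expected accuracy: {expected_accuracy(proba, pred):.3f}")
```

On these five rows the estimate is about 0.968. Conceptually, CBPE repeats this kind of averaging for every monthly chunk after calibrating the probabilities on the reference set.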


In [48]:
# convert 'start_date' column to a datetime format
performance_alerts_df[('chunk', 'start_date')] = pd.to_datetime(performance_alerts_df[('chunk', 'start_date')])

# filter rows where alerts were triggered for drop in accuracy
alert_rows = performance_alerts_df[performance_alerts_df[('accuracy', 'alert')] == True]

# extract the months from chunk start dates where performance alerts were triggered
alert_timestamps = alert_rows[('chunk', 'start_date')].dt.to_period('M').unique()
months_with_performance_alerts = [f"{month.strftime('%B').lower()}_{month.year}" for month in alert_timestamps]

print("Months with performance alerts:", months_with_performance_alerts)

Months with performance alerts: ['april_2019', 'may_2019', 'june_2019']


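The estimated accuracy triggers alerts in April, May and June 2019 (`['april_2019', 'may_2019', 'june_2019']`). The `alert` column reflects the CBPE estimate; realized accuracy, computed from the `is_fraud` labels, is in the `realized` column.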
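The alert thresholds in the table (0.936367 and 0.950808) can be reproduced from the reference period. Assuming they are the mean realized reference accuracy plus or minus three population standard deviations (reimplemented by hand here, not called from nannyml), the ten `realized` reference values give the same bounds to six decimals. Because the analysis set also contains `is_fraud`, the realized half of question 1 can be checked month by month against the same bounds; `std_thresholds`, `monthly_accuracy` and `alert_months` are hypothetical helpers, and the labelled rows are made up. Within nannyml itself, `nml.PerformanceCalculator` computes realized metrics.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Realized accuracy of the ten reference chunks (2018-01 to 2018-10),
# copied from the `realized` column of the table above.
reference_realized = [0.94707, 0.939892, 0.94413, 0.944299, 0.938672,
                      0.944489, 0.944727, 0.944922, 0.945095, 0.942578]


def std_thresholds(values, multiplier=3.0):
    """Return (lower, upper) = mean -/+ multiplier * population std of `values`."""
    center, spread = mean(values), pstdev(values)
    return center - multiplier * spread, center + multiplier * spread


def monthly_accuracy(rows):
    """Realized accuracy per 'YYYY-MM' month from (timestamp, y_true, y_pred) rows."""
    hits, totals = defaultdict(int), defaultdict(int)
    for timestamp, y_true, y_pred in rows:
        month = timestamp[:7]  # 'YYYY-MM' prefix of a 'YYYY-MM-DD ...' string
        totals[month] += 1
        hits[month] += int(y_true == y_pred)
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def alert_months(accuracy_by_month, lower, upper):
    """Months whose accuracy falls outside [lower, upper], in either direction."""
    return [m for m, acc in accuracy_by_month.items() if not lower <= acc <= upper]


lower, upper = std_thresholds(reference_realized)
print(f"lower={lower:.6f}, upper={upper:.6f}")  # lower=0.936367, upper=0.950808

# Hypothetical labelled rows, for illustration only
rows = [("2019-03-01 00:04", 1, 1), ("2019-03-02 10:00", 0, 0),
        ("2019-04-01 00:04", 1, 0), ("2019-04-02 10:00", 0, 0)]
print(alert_months(monthly_accuracy(rows), lower, upper))
```

With these toy rows both months alert: 2019-03 sits above the upper bound and 2019-04 below the lower one.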


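To answer question 2, I identify the feature whose distribution drifts most between the reference and analysis sets and the months in which it drifts. Lastly, I use this analysis to explain why performance dropped.

I intended to use the Kolmogorov-Smirnov test for the continuous features and the chi-squared test for the categorical ones. The drift results below, however, report the Jensen-Shannon distance, nannyml's default univariate drift method.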
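Both tests named above compare a reference sample with an analysis sample. The Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical CDFs of a continuous feature; the chi-squared statistic compares observed category counts with the counts expected if both periods shared one distribution. The pure-Python sketch below computes the statistics only (no p-values); the function names and sample values are hypothetical. In practice, `scipy.stats.ks_2samp` and `scipy.stats.chi2_contingency` provide both tests with p-values.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the supremum is reached at one of the observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def chi2_statistic(categories_a, categories_b):
    """Pearson chi-squared statistic for the 2 x k table of category counts."""
    count_a, count_b = Counter(categories_a), Counter(categories_b)
    n_a, n_b = len(categories_a), len(categories_b)
    total = n_a + n_b
    stat = 0.0
    for category in set(count_a) | set(count_b):
        column_total = count_a[category] + count_b[category]
        for observed, row_total in ((count_a[category], n_a), (count_b[category], n_b)):
            expected = row_total * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical samples, for illustration only
reference_login = [1.6, 1.7, 2.2, 2.4, 2.5]
analysis_login = [2.1, 3.0, 3.4, 3.9, 4.2]
print("KS statistic:", ks_statistic(reference_login, analysis_login))

reference_types = ["PAYMENT", "CASH-OUT", "CASH-IN", "PAYMENT"]
analysis_types = ["PAYMENT", "CASH-OUT", "TRANSFER", "TRANSFER"]
print("Chi-squared statistic:", round(chi2_statistic(reference_types, analysis_types), 3))
```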


In [25]:
analysis.head()

| timestamp | time_since_login_min | transaction_amount | transaction_type | is_first_transaction | user_tenure_months | predicted_fraud_proba | predicted_fraud | is_fraud |
|---|---|---|---|---|---|---|---|---|
| 2018-11-01 00:04:52.464 | 2.174243 | 2832.3 | CASH-OUT | False | 1.013445 | 0.97 | 1 | 1 |
| 2018-11-01 00:13:35.616 | 2.493543 | 1426.9 | CASH-OUT | False | 6.700041 | 0.09 | 0 | 0 |
| 2018-11-01 00:22:18.768 | 1.807432 | 1302.0 | PAYMENT | False | 6.291723 | 0.01 | 0 | 0 |
| 2018-11-01 00:31:01.920 | 2.133415 | 1432.1 | PAYMENT | True | 8.165503 | 0.0 | 0 | 0 |
| 2018-11-01 00:39:45.072 | 1.987827 | 1870.3 | CASH-OUT | False | 8.205203 | 0.03 | 0 | 0 |


In [30]:
reference.head()

| timestamp | time_since_login_min | transaction_amount | transaction_type | is_first_transaction | user_tenure_months | is_fraud | predicted_fraud_proba | predicted_fraud |
|---|---|---|---|---|---|---|---|---|
| 2018-01-01 00:00:00.000 | 1.56175 | 3981.1 | PAYMENT | False | 0.31898 | 1.0 | 0.99 | 1 |
| 2018-01-01 00:08:43.152 | 1.658074 | 1267.9 | PAYMENT | False | 7.391323 | 0.0 | 0.07 | 0 |
| 2018-01-01 00:17:26.304 | 2.454287 | 1984.7 | CASH-IN | False | 0.781225 | 1.0 | 1.0 | 1 |
| 2018-01-01 00:26:09.456 | 2.392085 | 2265.2 | CASH-OUT | False | 0.680473 | 1.0 | 0.98 | 1 |
| 2018-01-01 00:34:52.608 | 2.189806 | 2126.8 | CASH-IN | False | 8.542895 | 1.0 | 0.99 | 1 |


In [57]:
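features = ['time_since_login_min', 'transaction_amount', 'user_tenure_months',
            'transaction_type', 'is_first_transaction']

# Drift methods must be passed to the constructor: assigning
# `drift_calculator.continuous_methods` after construction has no effect,
# which is why the results below report nannyml's default, Jensen-Shannon.
# For KS and chi-squared, pass continuous_methods=['kolmogorov_smirnov'] and
# categorical_methods=['chi2'] instead (later cells index the results by
# method name, so they would need updating too).
drift_calculator = nml.UnivariateDriftCalculator(
    column_names=features,
    timestamp_column_name='timestamp',
    chunk_period='m',                        # one chunk per calendar month
    continuous_methods=['jensen_shannon'],
    categorical_methods=['jensen_shannon'],
)

drift_calculator.fit(reference)
drift_results = drift_calculator.calculate(analysis)

# Keep only the analysis-period chunks
drift_metrics = drift_results.filter(period='analysis').to_df()
drift_metrics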

The first seven columns belong to the `chunk` group; every feature column uses the `jensen_shannon` method.

| | key | chunk_index | start_index | end_index | start_date | end_date | period | time_since_login_min.value | time_since_login_min.upper_threshold | time_since_login_min.lower_threshold | time_since_login_min.alert | transaction_amount.value | transaction_amount.upper_threshold | transaction_amount.lower_threshold | transaction_amount.alert | user_tenure_months.value | user_tenure_months.upper_threshold | user_tenure_months.lower_threshold | user_tenure_months.alert | is_first_transaction.value | is_first_transaction.upper_threshold | is_first_transaction.lower_threshold | is_first_transaction.alert | transaction_type.value | transaction_type.upper_threshold | transaction_type.lower_threshold | transaction_type.alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-11 | 0 | 0 | 4954 | 2018-11-01 | 2018-11-30 23:59:59.999999999 | analysis | 0.031508 | 0.041265 | | False | 0.028494 | 0.035442 | | False | 0.028919 | 0.03962 | | False | 0.007916 | 0.013616 | | False | 0.008498 | 0.022142 | | False |
| 1 | 2018-12 | 1 | 4955 | 10073 | 2018-12-01 | 2018-12-31 23:59:59.999999999 | analysis | 0.031089 | 0.041265 | | False | 0.021842 | 0.035442 | | False | 0.037796 | 0.03962 | | False | 0.00483 | 0.013616 | | False | 0.007979 | 0.022142 | | False |
| 2 | 2019-01 | 2 | 10074 | 15193 | 2019-01-01 | 2019-01-31 23:59:59.999999999 | analysis | 0.029432 | 0.041265 | | False | 0.020695 | 0.035442 | | False | 0.034346 | 0.03962 | | False | 0.013963 | 0.013616 | | True | 0.013707 | 0.022142 | | False |
| 3 | 2019-02 | 3 | 15194 | 19817 | 2019-02-01 | 2019-02-28 23:59:59.999999999 | analysis | 0.027021 | 0.041265 | | False | 0.034514 | 0.035442 | | False | 0.034835 | 0.03962 | | False | 0.003431 | 0.013616 | | False | 0.007627 | 0.022142 | | False |
| 4 | 2019-03 | 4 | 19818 | 24937 | 2019-03-01 | 2019-03-31 23:59:59.999999999 | analysis | 0.03464 | 0.041265 | | False | 0.030221 | 0.035442 | | False | 0.026014 | 0.03962 | | False | 0.011805 | 0.013616 | | False | 0.009726 | 0.022142 | | False |
| 5 | 2019-04 | 5 | 24938 | 29892 | 2019-04-01 | 2019-04-30 23:59:59.999999999 | analysis | 0.154866 | 0.041265 | | True | 0.028772 | 0.035442 | | False | 0.029773 | 0.03962 | | False | 0.003129 | 0.013616 | | False | 0.006702 | 0.022142 | | False |
| 6 | 2019-05 | 6 | 29893 | 35011 | 2019-05-01 | 2019-05-31 23:59:59.999999999 | analysis | 0.154515 | 0.041265 | | True | 0.028701 | 0.035442 | | False | 0.025729 | 0.03962 | | False | 9.9e-05 | 0.013616 | | False | 0.007134 | 0.022142 | | False |
| 7 | 2019-06 | 7 | 35012 | 39966 | 2019-06-01 | 2019-06-30 23:59:59.999999999 | analysis | 0.155312 | 0.041265 | | True | 0.049169 | 0.035442 | | True | 0.026995 | 0.03962 | | False | 0.010136 | 0.013616 | | False | 0.01066 | 0.022142 | | False |
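The drift values in the table are Jensen-Shannon distances. The distance is the square root of the Jensen-Shannon divergence, which averages the KL divergence of each distribution from their midpoint distribution. A sketch for a categorical feature follows; it assumes base-2 logarithms, which bound the distance between 0 and 1, and nannyml's own implementation (including how it bins continuous features such as `time_since_login_min`) may differ in details. The helper names and the transaction types are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions (base-2 logs)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # max() guards against a tiny negative value caused by floating-point rounding
    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def proportions(values, categories):
    """Share of each category in `values`, in the order given by `categories`."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


categories = ["CASH-OUT", "PAYMENT", "CASH-IN", "TRANSFER"]
# Hypothetical transaction types, for illustration only
reference_types = ["PAYMENT", "PAYMENT", "CASH-IN", "CASH-OUT"]
analysis_types = ["PAYMENT", "CASH-OUT", "CASH-OUT", "TRANSFER"]
p = proportions(reference_types, categories)
q = proportions(analysis_types, categories)
print(f"JS distance: {js_distance(p, q):.3f}")
```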


In [58]:
drift_metrics.columns

MultiIndex([(               'chunk',          'chunk',             'key'),
            (               'chunk',          'chunk',     'chunk_index'),
            (               'chunk',          'chunk',     'start_index'),
            (               'chunk',          'chunk',       'end_index'),
            (               'chunk',          'chunk',      'start_date'),
            (               'chunk',          'chunk',        'end_date'),
            (               'chunk',          'chunk',          'period'),
            ('time_since_login_min', 'jensen_shannon',           'value'),
            ('time_since_login_min', 'jensen_shannon', 'upper_threshold'),
            ('time_since_login_min', 'jensen_shannon', 'lower_threshold'),
            ('time_since_login_min', 'jensen_shannon',           'alert'),
            (  'transaction_amount', 'jensen_shannon',           'value'),
            (  'transaction_amount', 'jensen_shannon', 'upper_threshold'),
            (  'transacti

In [66]:
# extract features that have triggered an alert from the drift metric table above
alerted_features = []
for feature in ['time_since_login_min', 'transaction_amount', 'user_tenure_months', 
                'is_first_transaction', 'transaction_type']:
    if drift_metrics[(feature, 'jensen_shannon', 'alert')].any():
        alerted_features.append(feature)

max_drift_feature = None
max_drift_value = -float('inf')

for feature in alerted_features:
    drift_value = drift_metrics[(feature, 'jensen_shannon', 'value')].max()
    if drift_value > max_drift_value:
        max_drift_value = drift_value
        max_drift_feature = feature

print("Feature with the most drift:", max_drift_feature)


Feature with the most drift: time_since_login_min
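The loop above compares raw Jensen-Shannon values across features, but each feature has its own alert threshold. One alternative, a design choice of mine rather than a nannyml feature, is to rank features by how far their peak drift exceeds their threshold. Using the peak values and upper thresholds read from the drift table, `time_since_login_min` still comes out on top, at roughly 3.8 times its threshold.

```python
# Peak Jensen-Shannon value and upper threshold per feature,
# read off the analysis-period drift table above.
peak_drift = {
    "time_since_login_min": (0.155312, 0.041265),
    "transaction_amount": (0.049169, 0.035442),
    "user_tenure_months": (0.037796, 0.03962),
    "is_first_transaction": (0.013963, 0.013616),
    "transaction_type": (0.013707, 0.022142),
}


def rank_by_threshold_ratio(drift):
    """Sort features by peak drift divided by their alert threshold, largest first."""
    ratios = {name: value / threshold for name, (value, threshold) in drift.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


for name, ratio in rank_by_threshold_ratio(peak_drift):
    print(f"{name:22s} {ratio:.2f}x threshold")
```

Features with a ratio above 1 are exactly the alerted ones (`time_since_login_min`, `transaction_amount`, `is_first_transaction`).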


In [65]:
alerted_features

['time_since_login_min', 'transaction_amount', 'is_first_transaction']

In [71]:
# Plot each feature's monthly drift metric against its alert threshold
# (nannyml returns an interactive Plotly figure)
fig = drift_results.plot(kind='drift', period='analysis')
fig

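The plot shows each feature's monthly drift against its alert threshold. `time_since_login_min` exceeds its threshold in April, May and June 2019, `transaction_amount` in June 2019, and `is_first_transaction` in January 2019.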
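A quick, heuristic way to check whether the drift timing lines up with the accuracy alerts is to compare each alerted feature's drift-alert months with the performance-alert months. The month sets below are read from the drift table and the CBPE output; the Jaccard overlap is my choice of score, not something nannyml reports. Only `time_since_login_min` matches the April to June 2019 accuracy alerts exactly.

```python
# Months with a drift alert per feature, read from the drift table above
drift_alert_months = {
    "time_since_login_min": {"2019-04", "2019-05", "2019-06"},
    "transaction_amount": {"2019-06"},
    "is_first_transaction": {"2019-01"},
}
# Months with an estimated-accuracy alert, from the CBPE results
performance_alert_months = {"2019-04", "2019-05", "2019-06"}


def jaccard(a, b):
    """Share of months common to both sets, out of all months in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


for feature, months in drift_alert_months.items():
    print(f"{feature:22s} overlap = {jaccard(months, performance_alert_months):.2f}")
```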

Analyzing the univariate drift metrics suggests that the model's performance dropped in the analysis period because of an increase in the time between users logging in to the app and making a transaction (`time_since_login_min`). This feature drifts far more than any other (a Jensen-Shannon distance of about 0.155 against a threshold of 0.041), and its drift alerts fire in exactly the months in which the estimated accuracy alerts fire: April, May and June 2019. The Jensen-Shannon distance measures only how much the distribution changed; the direction of the change has to be read from the distributions themselves. A shift of this size could affect how the model distinguishes normal from fraudulent activity, especially since the model was trained on data with shorter login intervals.

If `time_since_login_min` contributes significantly to the model's predictions, the change in its distribution could lead to a higher rate of misclassification.
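The drift metric alone says neither in which direction `time_since_login_min` moved nor whether the model's mistakes concentrate where it moved. Two quick checks can address both: compare the medians of the two periods, and compare realized accuracy across bins of the feature. The helpers, bin edges and numbers below are hypothetical; in the notebook the inputs would be the `time_since_login_min`, `is_fraud` and `predicted_fraud` columns of `reference` and `analysis` (for example via `.tolist()`).

```python
from collections import defaultdict
from statistics import median


def median_shift(reference_values, analysis_values):
    """Median of analysis minus median of reference; positive means values grew."""
    return median(analysis_values) - median(reference_values)


def accuracy_by_bucket(values, y_true, y_pred, edges):
    """Accuracy within each bin [edges[i], edges[i+1]); values outside all bins are skipped."""
    hits, totals = defaultdict(int), defaultdict(int)
    for value, truth, pred in zip(values, y_true, y_pred):
        for low, high in zip(edges, edges[1:]):
            if low <= value < high:
                totals[(low, high)] += 1
                hits[(low, high)] += int(truth == pred)
                break
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Hypothetical login times (minutes) and labels, for illustration only
reference_minutes = [1.6, 1.7, 2.2, 2.4, 2.5]
analysis_minutes = [2.1, 3.0, 3.4, 3.9, 4.2]
print("Median shift (min):", round(median_shift(reference_minutes, analysis_minutes), 2))

y_true = [0, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0]
print(accuracy_by_bucket(analysis_minutes, y_true, y_pred, edges=[0, 3, 10]))
```

A positive median shift together with lower accuracy in the upper bins would support the explanation above; similar accuracy across bins would point elsewhere.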