In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlite3
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

# Key Banking KPIs in Python

* This lesson analyzes key banking KPIs using Python.
* KPIs (Key Performance Indicators) are the metrics banks use to measure business health.
* You will compute and interpret essential KPIs from three synthetic datasets: transactions, customers, and accounts.
* By the end, you will be able to calculate and explain banking KPIs with code and data.
* No prior finance experience is needed, but familiarity with basic Python and pandas DataFrames will help.

# Understanding Banking Data Sources for KPIs

- Banking KPIs are often derived from real transaction, customer, and account data.
- Transaction tables track every financial move, including debits, credits, and channels used.
- Customers are grouped by segments and regions, which helps discover performance across groups.
- Account tables connect customers to specific account products and opening periods.
- Beginners often overlook the need to join or merge these datasets to get complete KPI insight.
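
The join step mentioned above can be sketched on three tiny tables. The rows are hypothetical stand-ins; only the column names (`customer_id`, `segment`, `account_type`) mirror the datasets built in the cells that follow.

```python
import pandas as pd

# Hypothetical mini tables; real tables would have many more rows and columns.
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "customer_id": ["CUST_0001", "CUST_0002", "CUST_0001"],
    "amount": [120.0, 80.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": ["CUST_0001", "CUST_0002"],
    "segment": ["Retail", "Business"],
})
accounts = pd.DataFrame({
    "account_id": ["ACC_00001", "ACC_00002"],
    "customer_id": ["CUST_0001", "CUST_0002"],
    "account_type": ["Savings", "Credit"],
})

# Left joins keep every transaction even when a lookup row is missing.
enriched = (
    transactions
    .merge(customers, on="customer_id", how="left")
    .merge(accounts, on="customer_id", how="left")
)
print(enriched[["transaction_id", "segment", "account_type", "amount"]])
```

Once the tables are joined, any KPI can be grouped by segment or account type without further lookups.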

In [2]:
# Example 1: Create synthetic banking transactions dataset
np.random.seed(42)
n_transactions = 1000
n_customers = 200
df = pd.DataFrame({
    'transaction_id': range(1, n_transactions + 1),
    'customer_id': np.random.choice([f'CUST_{i:04d}' for i in range(1, n_customers + 1)], n_transactions),
    'amount': np.round(np.random.normal(150, 60, n_transactions), 2),
    'transaction_type': np.random.choice(['Debit', 'Credit'], n_transactions),
    'channel': np.random.choice(['ATM', 'Online', 'Branch', 'POS'], n_transactions),
    'date': pd.date_range(start='2024-01-01', periods=n_transactions, freq='h')
})
print(df.shape)
print(df.head(3))

(1000, 6)
   transaction_id customer_id  amount transaction_type channel  \
0               1   CUST_0103  238.77           Credit     ATM   
1               2   CUST_0180  269.17           Credit     POS   
2               3   CUST_0093   58.62           Credit  Online   

                 date  
0 2024-01-01 00:00:00  
1 2024-01-01 01:00:00  
2 2024-01-01 02:00:00  


In [3]:
# Example 2: Create a simple customers table
customer_ids = [f'CUST_{i:04d}' for i in range(1, 201)]
customers = pd.DataFrame({
    'customer_id': customer_ids,
    'segment': ['Retail'] * 150 + ['Business'] * 50,
    'region': ['Metro'] * 100 + ['Regional'] * 100
})
print(customers.shape)
print(customers.head(3))

(200, 3)
  customer_id segment region
0   CUST_0001  Retail  Metro
1   CUST_0002  Retail  Metro
2   CUST_0003  Retail  Metro


In [11]:
# Example 3: Create a synthetic accounts table with 200 bank accounts.
# Each account is linked to a customer_id and has an account_type and an open_date.

np.random.seed(42)
accounts = pd.DataFrame({
    'account_id': [f'ACC_{i:05d}' for i in range(1, 201)],
    'customer_id': customer_ids,
    'account_type': np.random.choice(['Savings', 'Cheque', 'Credit'], size=200),
    'open_date': pd.date_range(start='2015-01-01', periods=200, freq='30D')
})
print(accounts.shape)
print(accounts.head(3))

(200, 4)
  account_id customer_id account_type  open_date
0  ACC_00001   CUST_0001       Credit 2015-01-01
1  ACC_00002   CUST_0002      Savings 2015-01-31
2  ACC_00003   CUST_0003       Credit 2015-03-02


- The customers and accounts tables are structured so they can be linked with transactions through `customer_id`, giving a complete view for KPI analysis.

# Beginner Example 1: Number of Transactions KPI

- One classic banking KPI is the total number of transactions in a period.
- This measures customer engagement and fee-generating activity.
- It is a basic foundation for more advanced metrics.

In [4]:
num_txns = df.shape[0]
print('Total number of transactions:', num_txns)


Total number of transactions: 1000
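
`df.shape[0]` counts every row in the table. To restrict the KPI to a reporting period, one option is to filter on the date column first. A minimal sketch, assuming a hypothetical `txns` table and hypothetical window bounds `start` and `end`:

```python
import pandas as pd

# Hypothetical mini dataset; only the column names match the lesson's df.
txns = pd.DataFrame({
    "transaction_id": range(1, 7),
    "date": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 17:30", "2024-01-02 08:15",
        "2024-01-03 12:00", "2024-01-04 00:00", "2024-01-05 10:45",
    ]),
})

# Half-open window [start, end): includes start, excludes end.
start, end = pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-04")
in_period = txns[(txns["date"] >= start) & (txns["date"] < end)]
print("Transactions in period:", len(in_period))
```

A half-open window avoids double counting a transaction that falls exactly on the boundary between two consecutive periods.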


# Beginner Example 2: Total Transaction Volume (Amount)

- The sum of all money moved is a key bank volume metric.
- This helps banks understand financial throughput and liquidity.

In [5]:
total_volume = df['amount'].sum()
print('Total transaction volume:', round(total_volume, 2))

Total transaction volume: 151702.08
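
The total above adds debits and credits together. Splitting by `transaction_type` separates the two directions of flow. This is a minimal sketch on hypothetical rows; treating credits as inflows and debits as outflows is an assumption about how the bank labels its transactions.

```python
import pandas as pd

# Hypothetical rows using the same column names as the lesson's df.
txns = pd.DataFrame({
    "amount": [100.0, 40.0, 60.0, 25.0],
    "transaction_type": ["Credit", "Debit", "Credit", "Debit"],
})

# Gross volume per direction.
volume_by_type = txns.groupby("transaction_type")["amount"].sum()

# Net flow: credits minus debits (assumed sign convention).
net_flow = volume_by_type.get("Credit", 0.0) - volume_by_type.get("Debit", 0.0)

print(volume_by_type)
print("Net flow:", net_flow)
```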


# Beginner Example 3: Transactions By Channel

- Looking at KPIs by channel shows digital vs in-person trends.
- Channel analysis is important for understanding customer behavior.


In [6]:
txn_by_channel = df['channel'].value_counts()
print('Transactions by channel:')
print(txn_by_channel)

Transactions by channel:
channel
ATM       266
POS       250
Branch    248
Online    236
Name: count, dtype: int64
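
Raw counts are easier to compare across periods when expressed as shares. A minimal sketch using `value_counts(normalize=True)` on hypothetical channel labels; counting only Online as digital is an assumption, since how ATM and POS are classified is a business decision.

```python
import pandas as pd

# Hypothetical channel labels for eight transactions.
channels = pd.Series(
    ["ATM", "Online", "Online", "Branch", "POS", "Online", "ATM", "POS"],
    name="channel",
)

# normalize=True returns fractions; convert to percentages.
share_pct = channels.value_counts(normalize=True).mul(100).round(1)
print(share_pct)

# Assumption: only Online counts as the digital channel.
digital_share = share_pct.get("Online", 0.0)
print("Digital share (%):", digital_share)
```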


# Intermediate Example 1: Unique Customers With Transactions

- Not every customer is active. Active customer count is a vital KPI.
- This helps banks track engagement and churn.

In [7]:
active_customers = df['customer_id'].nunique()
print('Number of active customers:', active_customers)

Number of active customers: 198


In [12]:
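# Inner join keeps only customers that hold at least one account
customers_with_accounts = customers.merge(accounts, on='customer_id', how='inner')
no_txn_customers = customers_with_accounts[~customers_with_accounts['customer_id'].isin(df['customer_id'])]
# Count distinct customers, since a customer may hold several accounts
print('Customers with accounts but no transactions:', no_txn_customers['customer_id'].nunique())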

Customers with accounts but no transactions: 2
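
The two counts above combine naturally into an active customer rate. A minimal sketch on hypothetical IDs, flagging each customer as active if they appear in the transactions table:

```python
import pandas as pd

# Hypothetical customer base and transaction log.
customers = pd.DataFrame({"customer_id": ["C1", "C2", "C3", "C4"]})
txns = pd.DataFrame({"customer_id": ["C1", "C1", "C3"]})

# A customer is active if they have at least one transaction.
customers["is_active"] = customers["customer_id"].isin(txns["customer_id"])

# The mean of a boolean column is the share of True values.
active_rate = customers["is_active"].mean()
inactive_ids = customers.loc[~customers["is_active"], "customer_id"].tolist()

print("Active customer rate:", active_rate)
print("Inactive customers:", inactive_ids)
```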


# Intermediate Example 2: Average Transaction Size

- The average size of transactions provides a sense of account holder behavior.
- This is important for risk, fraud, and marketing analysis.

In [13]:
avg_amount = df['amount'].mean()
print('Average transaction amount:', round(avg_amount, 2))

Average transaction amount: 151.7


In [14]:
avg_by_segment = df.merge(customers, on='customer_id').groupby('segment')['amount'].mean()
print('Average transaction by segment:')
print(avg_by_segment)

Average transaction by segment:
segment
Business    147.577040
Retail      153.077093
Name: amount, dtype: float64


# Intermediate Example 3: Transactions Over Time

- Monitoring transaction patterns over days, weeks or months reveals trends.
- This is useful for planning, compliance, and identifying unusual behavior.

In [15]:
df['date_only'] = df['date'].dt.date
txns_per_day = df.groupby('date_only')['transaction_id'].count()
print(txns_per_day.head())

date_only
2024-01-01    24
2024-01-02    24
2024-01-03    24
2024-01-04    24
2024-01-05    24
Name: transaction_id, dtype: int64
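
Daily counts can be rolled up to weeks with `resample`. A minimal sketch on hypothetical dates; the `"W"` rule groups into weeks ending on Sunday, which may not match a bank's reporting calendar.

```python
import pandas as pd

# Hypothetical transactions spread over three calendar weeks.
txns = pd.DataFrame({
    "transaction_id": range(1, 11),
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-07",
        "2024-01-08", "2024-01-09", "2024-01-14",
        "2024-01-15", "2024-01-16", "2024-01-17",
    ]),
})

# resample needs a datetime index; each bin is labeled by its week-ending Sunday.
weekly = txns.set_index("date").resample("W")["transaction_id"].count()
print(weekly)
```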


# Advanced Example 1: KPI - Customer Lifetime Value (CLV)

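- CLV estimates the revenue a customer brings in over their whole relationship with the bank.
- Knowing CLV helps banks prioritize retention and marketing.
- The code below uses each customer's total transaction amount as a simple proxy; it is not a revenue figure.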

In [16]:
clv = df.groupby('customer_id')['amount'].sum().sort_values(ascending=False)
print('Top 5 Customer Lifetime Values:')
print(clv.head(5))

Top 5 Customer Lifetime Values:
customer_id
CUST_0099    1999.10
CUST_0147    1879.19
CUST_0190    1826.46
CUST_0145    1780.30
CUST_0090    1741.44
Name: amount, dtype: float64
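
The ranking above sums raw transaction amounts per customer. One simple way to move closer to a revenue-based estimate is average monthly revenue multiplied by an expected relationship length. The sketch below is illustrative only: `FEE_RATE` and `EXPECTED_MONTHS` are hypothetical assumptions, not values from this lesson, and real CLV models also account for costs, churn, and discounting.

```python
import pandas as pd

FEE_RATE = 0.01        # assumption: the bank earns 1% of each transaction amount
EXPECTED_MONTHS = 36   # assumption: average length of a customer relationship

# Hypothetical transactions for two customers over two months.
txns = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C1", "C2"],
    "amount": [100.0, 200.0, 50.0, 300.0, 150.0],
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-01-25", "2024-02-10", "2024-02-15",
    ]),
})

txns["revenue"] = txns["amount"] * FEE_RATE
txns["month"] = txns["date"].dt.to_period("M")

# Revenue per customer per month, then the monthly average per customer.
monthly_revenue = txns.groupby(["customer_id", "month"])["revenue"].sum()
avg_monthly_revenue = monthly_revenue.groupby(level="customer_id").mean()

# Project the monthly average over the assumed relationship length.
clv_estimate = (avg_monthly_revenue * EXPECTED_MONTHS).round(2)
print(clv_estimate)
```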


# Advanced Example 2: KPI - Product Holding Rate

- Product holding rate shows the average number of accounts per customer.
- It highlights cross-sell opportunities and customer relationship depth.
- In this synthetic data every customer holds exactly one account, so the rate is 1.0.

In [17]:
accounts_per_customer = accounts.groupby('customer_id')['account_id'].count()
avg_accounts_per_customer = accounts_per_customer.mean()
print('Average number of accounts per customer:', round(avg_accounts_per_customer, 2))


Average number of accounts per customer: 1.0


In [18]:
high_product_customers = accounts_per_customer[accounts_per_customer > 2]
print('Customers with more than 2 accounts:', len(high_product_customers))

Customers with more than 2 accounts: 0
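
Grouping the accounts table only sees customers who hold at least one account. If the customer base can include people with no accounts, reindexing against the customers table keeps them in the average. A minimal sketch on hypothetical IDs:

```python
import pandas as pd

# Hypothetical data: C1 holds three accounts, C2 holds one, C3 holds none.
customers = pd.DataFrame({"customer_id": ["C1", "C2", "C3"]})
accounts = pd.DataFrame({
    "account_id": ["A1", "A2", "A3", "A4"],
    "customer_id": ["C1", "C1", "C1", "C2"],
})

# Count accounts per customer, then add zero-account customers back in.
per_customer = (
    accounts.groupby("customer_id")["account_id"]
    .nunique()
    .reindex(customers["customer_id"], fill_value=0)
)

holding_rate = per_customer.mean()
multi_product = int((per_customer > 2).sum())

print(per_customer)
print("Product holding rate:", round(holding_rate, 2))
print("Customers with more than 2 accounts:", multi_product)
```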


# Advanced Example 3: KPI - Active Product Penetration

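- Active product penetration looks at product usage rather than product holding alone.
- It shows how many products per customer are actively used, meaning they have transactions.
- The transactions table here has no `account_id`, so the code below joins on `customer_id` and treats every account of a transacting customer as active.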


In [19]:
active_accounts = df.merge(accounts, on='customer_id')[['customer_id','account_id']].drop_duplicates()
active_prods_per_customer = active_accounts.groupby('customer_id')['account_id'].count()
print('Average actively used products per customer:', round(active_prods_per_customer.mean(),2))


Average actively used products per customer: 1.0
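
When transactions record which account they belong to, activity can be measured per account instead of per customer. A minimal sketch, assuming a hypothetical `account_id` column on the transactions table (the lesson's `df` does not have one):

```python
import pandas as pd

# Hypothetical data: C1 holds two accounts but only uses A1.
accounts = pd.DataFrame({
    "account_id": ["A1", "A2", "A3"],
    "customer_id": ["C1", "C1", "C2"],
})
txns = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "account_id": ["A1", "A1", "A3"],  # assumption: transactions carry account_id
})

# An account is active if at least one transaction references it.
accounts["is_active"] = accounts["account_id"].isin(txns["account_id"])

active_per_customer = accounts.groupby("customer_id")["is_active"].sum()
penetration = accounts["is_active"].mean()  # share of held accounts in use

print(active_per_customer)
print("Active product penetration:", round(penetration, 2))
```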



# Error Handling: Dealing With Missing Data

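- In real bank datasets, missing values can silently distort KPI calculations.
- Let us simulate missing transaction amounts and compare two ways of handling them.
- pandas `mean()` skips NaN by default, while filling with 0 pulls the average down because the 20 zeros are counted as real transactions.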

In [20]:
df_missing = df.copy()
df_missing.loc[np.random.choice(df.index, 20, replace=False), 'amount'] = np.nan
print(df_missing['amount'].isnull().sum(), 'missing values introduced.')
print('Mean with missing:', df_missing['amount'].mean())
print('Mean after fill:', df_missing['amount'].fillna(0).mean())

20 missing values introduced.
Mean with missing: 151.56597959183674
Mean after fill: 148.53466
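
The gap between the two means comes from the fill strategy. A minimal sketch on a hypothetical five-value series compares skipping, zero filling, and median filling:

```python
import numpy as np
import pandas as pd

# Hypothetical amounts with two missing values.
amounts = pd.Series([100.0, 200.0, np.nan, 300.0, np.nan])

n_missing = int(amounts.isna().sum())
mean_skip = amounts.mean()                              # NaN skipped by default
mean_zero = amounts.fillna(0).mean()                    # zeros drag the mean down
mean_median = amounts.fillna(amounts.median()).mean()   # median keeps the center

print("Missing:", n_missing)
print("Mean, NaN skipped:", mean_skip)
print("Mean, filled with 0:", mean_zero)
print("Mean, filled with median:", mean_median)
```

Which strategy is appropriate depends on why the values are missing; reporting the missing count alongside the KPI makes the choice visible.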


# Debugging Example: Tracking Outlier KPIs

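- Unrealistic amounts or sudden spikes in KPI measures may indicate data issues or fraud.
- A simple first check flags transaction amounts more than three standard deviations above the mean.
- No rows are flagged here, which is unsurprising because the synthetic amounts come from a normal distribution with no injected anomalies.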

In [21]:
outliers = df[df['amount'] > df['amount'].mean() + 3*df['amount'].std()]
print('Unusual transactions found:', outliers.shape[0])
print(outliers[['transaction_id', 'amount']].head())

Unusual transactions found: 0
Empty DataFrame
Columns: [transaction_id, amount]
Index: []
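
The three-sigma rule can miss an extreme value in a small sample, because that value inflates the standard deviation itself. An alternative sketch uses the interquartile range (IQR); the 1.5 multiplier is the conventional Tukey fence, and the amounts are hypothetical.

```python
import pandas as pd

# Hypothetical amounts with one obviously unusual value.
amounts = pd.Series([120.0, 135.0, 150.0, 160.0, 145.0, 155.0, 140.0, 5000.0])

# Three-sigma check, as used in the lesson.
sigma_outliers = amounts[amounts > amounts.mean() + 3 * amounts.std()]

# IQR check: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = amounts[(amounts < lower) | (amounts > upper)]

print("Three-sigma outliers:", sigma_outliers.tolist())
print("IQR outliers:", iqr_outliers.tolist())
```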


# Best Practice: Modular KPI Calculation Functions

- Turning KPI calculations into functions keeps your code clean and reusable.
- This is important for team projects and updating metric definitions in the future.

In [22]:
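def kpi_total_transactions(df, txn_type=None):
    """Count transactions, optionally restricted to one transaction_type."""
    if txn_type is not None:
        # Keep only rows of the requested type, e.g. 'Debit' or 'Credit'
        return df[df['transaction_type'] == txn_type].shape[0]
    return df.shape[0]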

print('KPIs:')
print('Total:', kpi_total_transactions(df))
print('Debit:', kpi_total_transactions(df, 'Debit'))

KPIs:
Total: 1000
Debit: 497
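
With several KPI functions, a small registry lets one call produce a full report. This is one possible layout, not a standard pattern from any library; the names `KPI_REGISTRY` and `compute_kpis` are hypothetical.

```python
import pandas as pd

def kpi_total_volume(df: pd.DataFrame) -> float:
    """Sum of all transaction amounts."""
    return float(df["amount"].sum())

def kpi_active_customers(df: pd.DataFrame) -> int:
    """Number of distinct customers with at least one transaction."""
    return int(df["customer_id"].nunique())

def kpi_avg_amount(df: pd.DataFrame) -> float:
    """Mean transaction amount."""
    return float(df["amount"].mean())

# Hypothetical registry: maps a report label to the function that computes it.
KPI_REGISTRY = {
    "total_volume": kpi_total_volume,
    "active_customers": kpi_active_customers,
    "avg_amount": kpi_avg_amount,
}

def compute_kpis(df: pd.DataFrame) -> dict:
    """Run every registered KPI function against the same dataframe."""
    return {name: fn(df) for name, fn in KPI_REGISTRY.items()}

# Hypothetical sample rows.
sample = pd.DataFrame({
    "customer_id": ["C1", "C2", "C1"],
    "amount": [100.0, 50.0, 150.0],
})
report = compute_kpis(sample)
print(report)
```

Adding a KPI then means writing one function and one registry entry, and a changed metric definition is updated in a single place.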


# Common Pattern: KPI Aggregation By Group

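- Most advanced KPIs use groupby for segmentation (e.g. by region, by product).
- This lets banks benchmark regions, branches, or segments against each other.
- In this synthetic data all Metro customers are Retail, so no Metro/Business row appears in the output.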

In [23]:
seg_kpi = df.merge(customers, on='customer_id').groupby(['region','segment'])['amount'].sum()
print('Transaction Volume by Region & Segment:')
print(seg_kpi)

Transaction Volume by Region & Segment:
region    segment 
Metro     Retail      73550.95
Regional  Business    36894.26
          Retail      41256.87
Name: amount, dtype: float64
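
A grouped table can carry several KPIs at once through named aggregation. A minimal sketch on hypothetical rows that already combine transaction and customer columns:

```python
import pandas as pd

# Hypothetical merged rows (transactions joined with customer attributes).
merged = pd.DataFrame({
    "region": ["Metro", "Metro", "Regional", "Regional", "Regional"],
    "segment": ["Retail", "Retail", "Retail", "Business", "Business"],
    "customer_id": ["C1", "C2", "C3", "C4", "C4"],
    "amount": [100.0, 50.0, 80.0, 200.0, 100.0],
})

# Each keyword becomes an output column: (source column, aggregation).
summary = merged.groupby(["region", "segment"]).agg(
    total_volume=("amount", "sum"),
    txn_count=("amount", "count"),
    avg_amount=("amount", "mean"),
    active_customers=("customer_id", "nunique"),
)
print(summary)
```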


# End-to-End KPI Example: New Product Launch Performance

- Suppose the bank has just launched a new product and wants to track its uptake and use.
- Let us add a 'New_Savings' account type, assign it to 25 customers, and track its KPIs end to end.

In [24]:
np.random.seed(42)
launch_customers = np.random.choice(customer_ids, 25, replace=False)
new_accounts = pd.DataFrame({
    'account_id': [f"ACC_99{i:02d}" for i in range(1, 26)],
    'customer_id': launch_customers,
    'account_type': ["New_Savings"]*25,
    'open_date': pd.to_datetime('2024-02-01')
})

accounts_full = pd.concat([accounts, new_accounts], ignore_index=True)
new_txns = pd.DataFrame({
    'transaction_id': np.arange(2001,2026),
    'customer_id': launch_customers,
    'amount': np.random.normal(200, 30, 25),
    'transaction_type': ["Credit"]*25,
    'channel': ["Online"]*25,
    'date': pd.date_range(start='2024-02-01', periods=25, freq='D')
})

df_full = pd.concat([df, new_txns], ignore_index=True)

uptake = accounts_full[accounts_full['account_type'] == 'New_Savings'].shape[0]
active = new_txns.shape[0]
avg_txn_new = new_txns['amount'].mean()
print('New product uptake:', uptake)
print('Number of new product transactions:', active)
print('Average transaction (new product):', round(avg_txn_new,2))

New product uptake: 25
Number of new product transactions: 25
Average transaction (new product): 204.62
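
The counts above can also be expressed as rates. A minimal sketch on hypothetical rows, assuming transactions carry an `account_id` so that use of the new product can be told apart from other activity (the lesson's tables do not have this column):

```python
import pandas as pd

# Hypothetical accounts: two of three customers opened the new product.
accounts = pd.DataFrame({
    "account_id": ["A1", "A2", "A3", "A4", "A5"],
    "customer_id": ["C1", "C2", "C3", "C1", "C2"],
    "account_type": ["Savings", "Cheque", "Credit", "New_Savings", "New_Savings"],
})
# Assumption: each transaction records the account it was made on.
txns = pd.DataFrame({
    "account_id": ["A4", "A4", "A1"],
    "amount": [200.0, 180.0, 50.0],
})

total_customers = accounts["customer_id"].nunique()
new_accounts = accounts[accounts["account_type"] == "New_Savings"]

# Uptake rate: share of customers who opened the new product.
uptake_rate = new_accounts["customer_id"].nunique() / total_customers

# Activation rate: share of new accounts with at least one transaction.
activation_rate = new_accounts["account_id"].isin(txns["account_id"]).mean()

print("Uptake rate:", round(uptake_rate, 2))
print("Activation rate:", activation_rate)
```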
