In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlite3
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

# Regulatory-Style Reports with Python

* In this lesson, we will learn how to prepare regulatory banking reports using Python.
* Regulatory reports are required by law and help banks demonstrate compliance, manage risk, and prevent fraud.
* You will learn to build robust, auditable reports from synthetic banking data.
* We will cover data preparation, aggregation, exporting, and automating simple reporting logic.
* By the end, you will be able to design and generate basic regulatory-style outputs as used in real banking and finance.

# Understanding Regulatory Reporting Data

- Regulatory reports in banking summarize transaction activity, risk, and compliance data.
- Typical data sources include transaction records, customer tables, account summaries, and risk flags.
- Data is structured into rows (data records) and columns (fields/variables).
- Beginners often forget to check for missing, duplicated, or incorrectly formatted data.
- Before building reports, understand how the tables relate, for example which key links each transaction to a customer.
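The checks listed above (missing, duplicated, and incorrectly formatted data) can be bundled into one small summary. This is a minimal sketch: the helper `data_quality_summary` and the tiny sample table are hypothetical and only illustrate the kind of counts worth printing before any aggregation.

```python
import pandas as pd

def data_quality_summary(frame: pd.DataFrame, key: str) -> dict:
    """Hypothetical helper: count missing values, duplicate keys and list dtypes."""
    return {
        "rows": len(frame),
        "missing_per_column": frame.isna().sum().to_dict(),
        "duplicate_keys": int(frame[key].duplicated().sum()),
        "dtypes": frame.dtypes.astype(str).to_dict(),
    }

# Illustrative data: one missing amount, one duplicated transaction_id,
# and amounts stored as text (a common formatting problem).
sample = pd.DataFrame({
    "transaction_id": [1, 2, 2, 3],
    "amount": ["100.50", None, "20.00", "abc"],
})
summary = data_quality_summary(sample, key="transaction_id")
print(summary)
```

A text-typed `amount` column shows up in the `dtypes` entry, which is usually the first hint that values need converting before they can be summed.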

In [2]:
# Example 1: Create synthetic banking transactions dataset
np.random.seed(42)
n_transactions = 1000
n_customers = 200
df = pd.DataFrame({
    'transaction_id': range(1, n_transactions + 1),
    'customer_id': np.random.choice([f'CUST_{i:04d}' for i in range(1, n_customers + 1)], n_transactions),
    'amount': np.round(np.random.normal(150, 60, n_transactions), 2),
    'transaction_type': np.random.choice(['Debit', 'Credit'], n_transactions),
    'channel': np.random.choice(['ATM', 'Online', 'Branch', 'POS'], n_transactions),
    'date': pd.date_range(start='2024-01-01', periods=n_transactions, freq='h')
})
print(df.shape)
print(df.head(3))

(1000, 6)
   transaction_id customer_id  amount transaction_type channel  \
0               1   CUST_0103  238.77           Credit     ATM   
1               2   CUST_0180  269.17           Credit     POS   
2               3   CUST_0093   58.62           Credit  Online   

                 date  
0 2024-01-01 00:00:00  
1 2024-01-01 01:00:00  
2 2024-01-01 02:00:00  


In [3]:
# Example 2: Create a simple customers table
customer_ids = [f'CUST_{i:04d}' for i in range(1, 201)]
customers = pd.DataFrame({
    'customer_id': customer_ids,
    'segment': ['Retail'] * 150 + ['Business'] * 50,
    'region': ['Metro'] * 100 + ['Regional'] * 100
})
print(customers.shape)
print(customers.head(3))

(200, 3)
  customer_id segment region
0   CUST_0001  Retail  Metro
1   CUST_0002  Retail  Metro
2   CUST_0003  Retail  Metro


In [4]:
# Example 3: Create a synthetic accounts table: 200 bank accounts, one per customer, with the
# columns account_id, customer_id, account_type and open_date.

np.random.seed(42)
accounts = pd.DataFrame({
    'account_id': [f'ACC_{i:05d}' for i in range(1, 201)],
    'customer_id': customer_ids,
    'account_type': np.random.choice(['Savings', 'Cheque', 'Credit'], size=200),
    'open_date': pd.date_range(start='2015-01-01', periods=200, freq='30D')
})
print(accounts.shape)
print(accounts.head(3))

(200, 4)
  account_id customer_id account_type  open_date
0  ACC_00001   CUST_0001       Credit 2015-01-01
1  ACC_00002   CUST_0002      Savings 2015-01-31
2  ACC_00003   CUST_0003       Credit 2015-03-02


In [5]:
# Merge transactions with customer details
df_full = pd.merge(df, customers, how='left', on='customer_id')

print(df_full.shape)
print(df_full[['transaction_id', 'customer_id', 'segment', 'region']].head(3))

(1000, 8)
   transaction_id customer_id   segment    region
0               1   CUST_0103    Retail  Regional
1               2   CUST_0180  Business  Regional
2               3   CUST_0093    Retail     Metro
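A left merge keeps every transaction, but it does not say which rows failed to find a customer. As a sketch, `pd.merge` can report this itself: `indicator=True` adds a `_merge` column, and `validate="many_to_one"` raises an error if the customers table unexpectedly contains duplicate IDs. The small tables below, including the unmatched `CUST_9999`, are illustrative assumptions.

```python
import pandas as pd

transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "customer_id": ["CUST_0001", "CUST_0002", "CUST_9999"],  # CUST_9999 has no customer record
    "amount": [100.0, 250.0, 75.0],
})
customers_demo = pd.DataFrame({
    "customer_id": ["CUST_0001", "CUST_0002"],
    "segment": ["Retail", "Business"],
})

# validate="many_to_one" raises MergeError if customer_id is duplicated in
# customers_demo (which would silently multiply transaction rows).
# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = pd.merge(transactions, customers_demo, how="left", on="customer_id",
                  validate="many_to_one", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "rows after merge;", len(unmatched), "without a customer record")
print(unmatched[["transaction_id", "customer_id"]])
```

The row count staying equal to the number of transactions, together with an empty `unmatched` frame, is a simple check that the merge neither lost nor duplicated records.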


In [7]:
# Check the columns of the merged dataframe
df_full.columns

Index(['transaction_id', 'customer_id', 'amount', 'transaction_type',
       'channel', 'date', 'segment', 'region'],
      dtype='object')

In [8]:
# Check for missing key fields
print(df_full['segment'].isnull().sum(), 'transactions have unknown segment')
print(df_full['region'].isnull().sum(), 'transactions have unknown region')

0 transactions have unknown segment
0 transactions have unknown region


# Beginner Example 1: Calculate total debit amounts
- Regulatory reports often require total amounts for specific transaction types.

- To start, let us calculate the sum of all debit transactions in the dataset.

In [9]:
total_debit = df_full[df_full['transaction_type'] == 'Debit']['amount'].sum()
print('Total debit transaction:', total_debit)

Total debit transaction: 75981.44
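Both totals can also be produced in one pass with `groupby`, and rounding to two decimals before reporting avoids floating-point artefacts in currency figures. A minimal sketch on a made-up four-row table:

```python
import pandas as pd

tx = pd.DataFrame({
    "transaction_type": ["Debit", "Credit", "Debit", "Credit"],
    "amount": [0.1, 5.0, 0.2, 2.5],
})
# In plain Python, 0.1 + 0.2 gives 0.30000000000000004, so round the
# aggregated figures to cents before they leave the pipeline.
totals = tx.groupby("transaction_type")["amount"].sum().round(2)
print(totals)
```

For strict cent-level accuracy some teams store amounts as integer cents or `decimal.Decimal`; rounding the final aggregate, as here, is the lighter-weight option.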


# Beginner Example 2: Count unique customers
- Many regulatory outputs require the number of affected customers or accounts.

- Let us count how many unique customers appear in our combined dataset.

In [10]:
unique_customers = df_full['customer_id'].nunique()
print('Unique customers in the report:', unique_customers)

Unique customers in the report: 198
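The count is 198 rather than 200 because `np.random.choice` draws customer IDs with replacement, so a few IDs from the customers table are never picked. A sketch of listing such customers explicitly, using small illustrative series in place of the real tables:

```python
import pandas as pd

all_customers = pd.Series(["CUST_0001", "CUST_0002", "CUST_0003", "CUST_0004"])
tx_customers = pd.Series(["CUST_0001", "CUST_0003", "CUST_0003"])

active = tx_customers.nunique()
# Customers on the master table who never appear in the transactions.
inactive = sorted(set(all_customers) - set(tx_customers))
print("Active customers:", active)
print("Customers with no transactions:", inactive)
```

Reporting both figures lets a reviewer reconcile the customer count in the report against the size of the customer master table.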


# Beginner Example 3: List transaction types and counts
- Regulatory reports often require a breakdown of transaction activity by type.

- We will tabulate the number of transactions of each type in our data.

In [11]:
type_counts = df_full['transaction_type'].value_counts()
print('Transaction types and counts:')
print(type_counts)

Transaction types and counts:
transaction_type
Credit    503
Debit     497
Name: count, dtype: int64
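Breakdowns are often reported as shares as well as counts. `value_counts(normalize=True)` returns proportions, which can be combined with the raw counts in one table. A sketch on an illustrative series:

```python
import pandas as pd

types = pd.Series(["Credit", "Debit", "Credit", "Credit"])
counts = types.value_counts()
# normalize=True gives proportions; convert to percentages with one decimal.
shares = types.value_counts(normalize=True).mul(100).round(1)
breakdown = pd.DataFrame({"count": counts, "share_pct": shares})
print(breakdown)
```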


# Intermediate Example 1: Summarize by Region
- Many regulatory templates require figures grouped by regional office.

- Let us compute the total debit amount for each region.

In [12]:
region_debit = df_full[df_full['transaction_type'] == 'Debit'].groupby('region')['amount'].sum()
print('Total debit transactions by region:')
print(region_debit)

Total debit transactions by region:
region
Metro       36985.58
Regional    38995.86
Name: amount, dtype: float64
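When a template needs several transaction types side by side, a pivot table avoids running one filtered `groupby` per type. A sketch with made-up amounts; `margins=True` appends an `All` row and column holding the totals, which double as a quick cross-check.

```python
import pandas as pd

tx = pd.DataFrame({
    "region": ["Metro", "Metro", "Regional", "Regional", "Regional"],
    "transaction_type": ["Debit", "Credit", "Debit", "Debit", "Credit"],
    "amount": [100.0, 40.0, 60.0, 30.0, 10.0],
})
# Rows = region, columns = transaction type, values = summed amount.
by_region = pd.pivot_table(tx, index="region", columns="transaction_type",
                           values="amount", aggfunc="sum", fill_value=0,
                           margins=True)
print(by_region)
```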


# Intermediate Example 2: Monthly activity report
- Regulatory filings often show activity for each month or reporting period.

- Let us count the transactions in each calendar month. The 1,000 hourly timestamps end on 11 February 2024, so February is only a partial month.

In [13]:
df_full['month'] = df_full['date'].dt.to_period('M')
monthly_tx = df_full.groupby('month')['transaction_id'].count()
print('Transaction counts per month:')
print(monthly_tx)

Transaction counts per month:
month
2024-01    744
2024-02    256
Freq: M, Name: transaction_id, dtype: int64
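Two refinements are common in periodic filings: reporting count and value together, and showing months with no activity as zero instead of omitting them. A sketch, assuming a reporting period of January to March 2024 (an illustrative choice, not taken from the data above):

```python
import pandas as pd

tx = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "transaction_id": [1, 2, 3],
    "amount": [100.0, 50.0, 25.0],
})
# Named aggregation: one output column per (source column, function) pair.
monthly = (tx.groupby(tx["date"].dt.to_period("M"))
             .agg(tx_count=("transaction_id", "count"),
                  total_amount=("amount", "sum")))

# Assumed reporting period; months with no transactions are filled with 0.
full_range = pd.period_range("2024-01", "2024-03", freq="M")
monthly = monthly.reindex(full_range, fill_value=0)
print(monthly)
```

Without the `reindex`, March would simply be missing, which a reader of the report could mistake for a data gap.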


# Intermediate Example 3: Top 5 customers by total debit
- For compliance or suspicious activity reporting, regulators often request lists of the top parties by transaction value.

- Let us list the five customers with the largest total debit amounts.

In [14]:
top_customers = (
    df_full[df_full['transaction_type'] == 'Debit']
    .groupby('customer_id')['amount']
    .sum()
    .nlargest(5)
)
print('Top 5 customers by debit amount:')
print(top_customers)

Top 5 customers by debit amount:
customer_id
CUST_0145    1412.07
CUST_0147    1312.39
CUST_0111    1162.79
CUST_0090    1074.37
CUST_0008    1032.78
Name: amount, dtype: float64
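Top-N lists are sometimes required per region or business unit rather than bank-wide. One way to sketch this: aggregate per customer, sort by amount, then keep the first rows of each group with `groupby(...).head(n)`. The table and N = 2 are illustrative.

```python
import pandas as pd

debits = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4", "C5"],
    "region": ["Metro", "Metro", "Metro", "Regional", "Regional"],
    "amount": [500.0, 300.0, 800.0, 200.0, 900.0],
})
per_customer = debits.groupby(["region", "customer_id"], as_index=False)["amount"].sum()
top2 = (per_customer.sort_values("amount", ascending=False)
                    .groupby("region")
                    .head(2)  # first two rows per region, i.e. the two largest
                    .sort_values(["region", "amount"], ascending=[True, False]))
print(top2)
```

If ties at the cut-off matter, `Series.nlargest` accepts `keep="all"` to return every tied customer instead of silently dropping some.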


# Advanced Example 1: Prepare a regulatory output table

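- Now let us build a DataFrame with the columns a typical reporting template asks for: region, segment, month, total debit amount, and number of distinct customers.
- Such a table is a realistic deliverable for many external and internal banking reports.
- Only combinations with at least one debit appear. Because every Business customer in the synthetic customers table is Regional, there are no Metro/Business rows.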

In [15]:
debits = df_full[df_full['transaction_type'] == 'Debit']
output_table = debits.groupby(['region', 'segment', 'month']).agg(
    total_debit_amount=pd.NamedAgg(column='amount', aggfunc='sum'),
    customer_count=pd.NamedAgg(column='customer_id', aggfunc='nunique')
).reset_index()
print(output_table.head())

     region   segment    month  total_debit_amount  customer_count
0     Metro    Retail  2024-01            26313.53              84
1     Metro    Retail  2024-02            10672.05              54
2  Regional  Business  2024-01            15220.63              44
3  Regional  Business  2024-02             4019.28              23
4  Regional    Retail  2024-01            14389.69              41
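Before a breakdown table is submitted, it is worth checking that it reconciles to a control total computed independently from the filtered data. A minimal sketch with made-up rows (the Credit row shows that the filter is applied before aggregating):

```python
import numpy as np
import pandas as pd

tx = pd.DataFrame({
    "region": ["Metro", "Metro", "Regional", "Regional"],
    "segment": ["Retail", "Retail", "Business", "Business"],
    "transaction_type": ["Debit", "Debit", "Debit", "Credit"],
    "customer_id": ["C1", "C1", "C2", "C3"],
    "amount": [10.0, 15.5, 20.25, 99.0],
})
debits = tx[tx["transaction_type"] == "Debit"]
report = (debits.groupby(["region", "segment"])
                .agg(total_debit_amount=("amount", "sum"),
                     customer_count=("customer_id", "nunique"))
                .reset_index())

# Reconciliation: the breakdown must add back to the control total.
control_total = debits["amount"].sum()
assert np.isclose(report["total_debit_amount"].sum(), control_total), "Report does not reconcile"
print(report)
print("Reconciled control total:", control_total)
```

Amounts reconcile by summation, but `customer_count` does not: a customer active in both months is counted once per month, so summing that column across months overstates the number of distinct customers.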


# Advanced Example 2: Report validation - Outliers & Negative Amounts

- Before submission, regulators may require banks to flag transactions with negative or excessive amounts.
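- Let us filter the dataset for such cases, using 1,000 as an illustrative upper threshold. Because the synthetic amounts are drawn from a normal distribution, a few of them fall below zero.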

In [16]:
neg_amounts = df_full[df_full['amount'] < 0]
high_amounts = df_full[df_full['amount'] > 1000]
print('Transactions with negative amounts:', len(neg_amounts))
print('Transactions with high amounts:', len(high_amounts))

Transactions with negative amounts: 7
Transactions with high amounts: 0
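Rather than splitting the data into separate frames, the flags can be stored as columns so that every record carries its own exception reason, which is easier to audit. A sketch with made-up amounts; the threshold constant is an assumed business rule.

```python
import pandas as pd

tx = pd.DataFrame({
    "transaction_id": range(1, 7),
    "amount": [120.0, -15.0, 150.0, 2500.0, 160.0, 130.0],
})
HIGH_AMOUNT_THRESHOLD = 1000.0  # illustrative rule; document the real threshold for auditors

# Keep the flags on the full dataset so each record carries its own reason.
tx["flag_negative"] = tx["amount"] < 0
tx["flag_high"] = tx["amount"] > HIGH_AMOUNT_THRESHOLD
tx["exception_reason"] = ""
tx.loc[tx["flag_negative"], "exception_reason"] = "negative amount"
tx.loc[tx["flag_high"], "exception_reason"] = "above threshold"

exceptions = tx[tx["flag_negative"] | tx["flag_high"]]
print(exceptions[["transaction_id", "amount", "exception_reason"]])
```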


# Error Handling Example 1: Handling missing data during aggregation

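- Customer fields can be missing after a left merge when a transaction's customer_id has no match in the customers table. We need to make sure such rows are detected rather than silently dropped from the report.
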
In [17]:
# Count rows with missing region or segment
missing = df_full[df_full['region'].isnull() | df_full['segment'].isnull()]
print('Missing customer info rows:', missing.shape[0])
Missing customer info rows: 0
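Counting missing rows is only half the job: by default `groupby` drops rows whose key is missing, so their amounts vanish from a regional breakdown. A sketch on a made-up table showing the effect, and one fix, which is labelling the gap explicitly:

```python
import numpy as np
import pandas as pd

tx = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "region": ["Metro", np.nan, "Regional"],
    "amount": [100.0, 50.0, 25.0],
})
# Default behaviour: the row with a missing region is excluded.
dropped = tx.groupby("region")["amount"].sum()
# Label the gap so the breakdown still adds up to the full total.
kept = tx.fillna({"region": "Unknown"}).groupby("region")["amount"].sum()
print("Total with NaN keys dropped:", dropped.sum())
print("Total with 'Unknown' label:", kept.sum())
print(kept)
```

Passing `dropna=False` to `groupby` keeps the missing key as its own group too; an explicit `"Unknown"` label tends to read more clearly in an exported report.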


# Error Handling Example 2: Data type issues in reporting columns

- Regulatory reports must use the correct types for currency, counts, and dates. Let us check column types.


In [18]:

print(df_full.dtypes)

transaction_id               int64
customer_id                 object
amount                     float64
transaction_type            object
channel                     object
date                datetime64[ns]
segment                     object
region                      object
month                    period[M]
dtype: object
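Here the types are already correct because the data were generated in Python. Data loaded from files often arrives as text instead. A sketch of enforcing types, assuming amounts may contain thousands separators and dates use the `YYYY-MM-DD` format; unparseable values become `NaN`/`NaT` and are routed to review instead of raising mid-report.

```python
import pandas as pd

raw = pd.DataFrame({
    "amount": ["100.50", "1,250.00", "n/a"],
    "date": ["2024-01-05", "2024-01-06", "not a date"],
    "transaction_id": ["1", "2", "3"],
})
clean = raw.copy()
# Strip thousands separators, then convert; bad values become NaN.
clean["amount"] = pd.to_numeric(clean["amount"].str.replace(",", "", regex=False),
                                errors="coerce")
# Parse with an explicit format; bad values become NaT.
clean["date"] = pd.to_datetime(clean["date"], format="%Y-%m-%d", errors="coerce")
clean["transaction_id"] = clean["transaction_id"].astype("int64")

bad_rows = clean[clean["amount"].isna() | clean["date"].isna()]
print(clean.dtypes)
print("Rows needing review:", len(bad_rows))
```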


# Best Practices for Regulatory Reporting with Python

- Always check for missing data before or after merges.
- Use reproducible scripts with clear steps: load, merge, filter, aggregate, export.
- Validate output shapes and types every time.
- Save outputs in a format that compliance and regulatory teams can use (.csv, .xlsx).
- Provide clear comments and section headings for auditors.
- Document each logic step, especially filters or threshold choices.
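Several of these practices (validate shape and types, then export) can be combined into a final step. A minimal sketch using the regional debit totals from earlier; it writes to an in-memory buffer so it runs anywhere, while in practice the target would be a file path such as the hypothetical `"debit_by_region.csv"`.

```python
import io
import pandas as pd

report = pd.DataFrame({
    "region": ["Metro", "Regional"],
    "total_debit_amount": [36985.58, 38995.86],
})

# Fail fast if the table does not have the expected columns and types.
expected_columns = ["region", "total_debit_amount"]
assert list(report.columns) == expected_columns, "Unexpected columns"
assert pd.api.types.is_float_dtype(report["total_debit_amount"]), "Amounts must be numeric"

buffer = io.StringIO()
# Fixed two-decimal formatting keeps currency values consistent in the file.
report.to_csv(buffer, index=False, float_format="%.2f")
csv_text = buffer.getvalue()
print(csv_text)
```

An `.xlsx` file can be written the same way with `DataFrame.to_excel`, which needs an Excel engine such as `openpyxl` installed.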