# Understanding the Core Data for Banking Business Rules

- Transaction, customer, and account tables record real-world operations.
- Each row in a transaction table is a transfer of value with metadata like time, type, and amount.
- Beginners sometimes mix up debit and credit, or forget to handle negative values.
- Columns like customer_id link records between datasets.
- Financial rules must respect the data structure to avoid errors.
- Always check your data for missing or illogical values before building any rules.
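
A quick pre-check along the lines of the last bullet might look like the sketch below. The `sample` frame and the `quality_report` helper are hypothetical and only illustrate the idea; the checks you need will depend on your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with deliberate problems: a missing customer_id,
# a missing amount, a negative amount, and an unexpected transaction type.
sample = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "customer_id": ["CUST_0001", "CUST_0002", None, "CUST_0004"],
    "amount": [120.50, np.nan, -35.00, 80.00],
    "transaction_type": ["Debit", "Credit", "Debit", "Refund"],
})

def quality_report(frame: pd.DataFrame) -> dict:
    """Count common data problems before any rule is applied."""
    allowed_types = {"Debit", "Credit"}
    return {
        "missing_customer_id": int(frame["customer_id"].isna().sum()),
        "missing_amount": int(frame["amount"].isna().sum()),
        # NaN compares as False here, so missing amounts are not double counted
        "non_positive_amount": int((frame["amount"] <= 0).sum()),
        "unknown_type": int((~frame["transaction_type"].isin(allowed_types)).sum()),
        "duplicate_transaction_id": int(frame["transaction_id"].duplicated().sum()),
    }

report = quality_report(sample)
print(report)
```

Any non-zero count is a prompt to investigate before building rules on top of the data.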

In [8]:
# Creating synthetic transaction data

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")


### Building synthetic transaction data

np.random.seed(42)
n_transactions = 1000
n_customers = 200
df = pd.DataFrame({
'transaction_id': range(1, n_transactions + 1),
'customer_id': np.random.choice([f'CUST_{i:04d}' for i in range(1, n_customers + 1)], n_transactions),
'amount': np.round(np.random.normal(150, 60, n_transactions), 2),
'transaction_type': np.random.choice(['Debit', 'Credit'], n_transactions),
'channel': np.random.choice(['ATM', 'Online', 'Branch', 'POS'], n_transactions),
'date': pd.date_range(start='2026-01-01', periods=n_transactions, freq='h')
})
print(df.shape)
print(df.head())

(1000, 6)
   transaction_id customer_id  amount transaction_type channel  \
0               1   CUST_0103  238.77           Credit     ATM   
1               2   CUST_0180  269.17           Credit     POS   
2               3   CUST_0093   58.62           Credit  Online   
3               4   CUST_0015   81.85            Debit  Branch   
4               5   CUST_0107  163.56           Credit  Online   

                 date  
0 2026-01-01 00:00:00  
1 2026-01-01 01:00:00  
2 2026-01-01 02:00:00  
3 2026-01-01 03:00:00  
4 2026-01-01 04:00:00  


In [9]:
#  Building a customer data table
customer_ids = [f'CUST_{i:04d}' for i in range(1, 201)]
customers = pd.DataFrame({
    'customer_id': customer_ids,
    'segment': ['Retail'] * 150 + ['Business'] * 50,
    'region': ['Metro'] * 100 + ['Regional'] * 100
})
print(customers.shape)
print(customers.head(3))

(200, 3)
  customer_id segment region
0   CUST_0001  Retail  Metro
1   CUST_0002  Retail  Metro
2   CUST_0003  Retail  Metro


In [10]:
# Creating an accounts table
accounts = pd.DataFrame({
    'account_id': [f'ACC_{i:05d}' for i in range(1, 201)],
    'customer_id': customer_ids,
    'account_type': np.random.choice(['Savings', 'Cheque', 'Credit'], size=200),
    'open_date': pd.date_range(start='2015-01-01', periods=200, freq='30D')
})
print(accounts.shape)
print(accounts.head(3))

(200, 4)
  account_id customer_id account_type  open_date
0  ACC_00001   CUST_0001       Credit 2015-01-01
1  ACC_00002   CUST_0002      Savings 2015-01-31
2  ACC_00003   CUST_0003      Savings 2015-03-02
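
Because `customer_id` links these tables, one way to check the link is a left merge that also reports unmatched keys. The sketch below uses small hypothetical frames (`transactions`, `customers_small`) rather than the tables above, so it runs on its own.

```python
import pandas as pd

# Hypothetical miniature tables; CUST_0999 has no customer record.
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "customer_id": ["CUST_0001", "CUST_0002", "CUST_0999"],
    "amount": [120.0, 45.0, 300.0],
})
customers_small = pd.DataFrame({
    "customer_id": ["CUST_0001", "CUST_0002"],
    "segment": ["Retail", "Business"],
})

enriched = transactions.merge(
    customers_small,
    on="customer_id",
    how="left",              # keep every transaction, matched or not
    validate="many_to_one",  # raise if customer_id is duplicated in customers_small
    indicator=True,          # add a _merge column showing where each row came from
)

# Transactions whose customer_id has no match in the customer table
orphans = enriched.loc[enriched["_merge"] == "left_only", "customer_id"].tolist()
print(orphans)
```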


# Beginner Example 1: A Simple Minimum Amount Rule

- Many banks block or flag transactions below a set minimum (e.g. for AML or fee reasons).
- Let us write a reusable function for this instead of repeating the same logic everywhere.

In [11]:
def is_minimum_amount(amount, threshold=50.0):
    return amount >= threshold

# Apply the rule to transaction amounts
df['meets_minimum'] = df['amount'].apply(is_minimum_amount)
print(df[['amount', 'meets_minimum']].head(6))

   amount  meets_minimum
0  238.77           True
1  269.17           True
2   58.62           True
3   81.85           True
4  163.56           True
5  200.38           True
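
For a simple threshold like this, a column-wise comparison gives the same result as `apply` without calling a Python function per row. A minimal sketch on a hypothetical `amounts` Series:

```python
import pandas as pd

# Hypothetical amounts, including one just under the threshold and one negative
amounts = pd.Series([238.77, 49.99, 50.00, -12.40])

# Column-wise comparison: one operation over the whole Series
meets_minimum = amounts >= 50.0

# Same rule applied element by element, for comparison
meets_minimum_apply = amounts.apply(lambda a: a >= 50.0)

print(meets_minimum.tolist())
```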


# Beginner Example 2: Customer-Based Rule with Function

- Some banks set special rules for business customers.
- Let us make a function that checks if a customer is in the Business segment.

In [12]:
def is_business_customer(customer_id):
    segment = customers.loc[customers['customer_id'] == customer_id, 'segment'].values
    return segment[0] == 'Business' if len(segment) > 0 else False

# Test on first few customers
for cid in df['customer_id'].unique()[:6]:
    print(cid, ':', is_business_customer(cid))

CUST_0103 : False
CUST_0180 : True
CUST_0093 : False
CUST_0015 : False
CUST_0107 : False
CUST_0072 : False
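
Filtering the customer table once per lookup is fine for a few IDs. For a whole column, one alternative is to build the lookup once and map it, as in this sketch (the frames and the `is_business` column name are hypothetical):

```python
import pandas as pd

customers_small = pd.DataFrame({
    "customer_id": ["CUST_0001", "CUST_0002", "CUST_0003"],
    "segment": ["Retail", "Business", "Retail"],
})
txns = pd.DataFrame({
    "customer_id": ["CUST_0002", "CUST_0001", "CUST_0999", "CUST_0002"],
})

# Build the lookup once: customer_id -> segment
segment_by_customer = customers_small.set_index("customer_id")["segment"]

# Unknown IDs map to NaN, and NaN never equals 'Business', so they come out False
txns["is_business"] = txns["customer_id"].map(segment_by_customer).eq("Business")
print(txns)
```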


# Beginner Example 3: Channel Restriction with a Function

- Sometimes a bank wants to block high-value transactions via POS terminals.
- Use a function to flag such cases for review.

In [13]:
def is_high_pos(amount, channel, limit=500):
    return channel == 'POS' and amount > limit

# Flag in dataframe
df['blocked_pos'] = df.apply(lambda x: is_high_pos(x['amount'], x['channel']), axis=1)
print(df[['amount', 'channel', 'blocked_pos']].head(8))

   amount channel  blocked_pos
0  238.77     ATM        False
1  269.17     POS        False
2   58.62  Online        False
3   81.85  Branch        False
4  163.56  Online        False
5  200.38  Online        False
6  149.33  Branch        False
7   51.72     POS        False


# Intermediate Example 1: Function for Transaction Daily Limits

- Most banking systems set daily transaction count or value limits.
- Let us write a function to count transactions for each customer per day.

In [14]:

df['date_only'] = df['date'].dt.date
def num_daily_txns(customer_id, date_only):
    return len(df[(df['customer_id'] == customer_id) & (df['date_only'] == date_only)])
count = num_daily_txns(df['customer_id'].iloc[0], df['date_only'].iloc[0])
print('Customer', df['customer_id'].iloc[0], 'had', count, 'transactions on', df['date_only'].iloc[0])

Customer CUST_0103 had 2 transactions on 2026-01-01
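
To get the count for every customer and day at once, rather than one pair at a time, a `groupby` is a common approach. The sketch below uses a hypothetical frame and a hypothetical limit `DAILY_TXN_LIMIT`:

```python
import pandas as pd

DAILY_TXN_LIMIT = 1  # hypothetical limit, chosen only to make the example flag something

txns = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B"],
    "date": pd.to_datetime([
        "2026-01-01 09:00", "2026-01-01 17:30",
        "2026-01-02 08:15", "2026-01-01 12:00",
    ]),
})

# Truncate timestamps to midnight so each row carries its calendar day
txns["day"] = txns["date"].dt.normalize()

# One row per (customer, day) with the number of transactions
daily_counts = (
    txns.groupby(["customer_id", "day"])
        .size()
        .reset_index(name="txn_count")
)
daily_counts["over_limit"] = daily_counts["txn_count"] > DAILY_TXN_LIMIT
print(daily_counts)
```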


# Intermediate Example 2: Function to Enforce Weekly Debit Limit

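- Weekly debit caps are a core anti-fraud control.
- We will create a function that totals a customer's debits over a trailing window: the six days up to and including a given timestamp. Note that this is a rolling window, not a calendar week.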

In [16]:

def weekly_debit_total(customer_id, this_date):
    end_date = pd.to_datetime(this_date)
    start_date = end_date - pd.Timedelta(days=6)
    mask = (df['customer_id'] == customer_id) & (df['transaction_type'] == 'Debit') & (df['date'] >= start_date) & (df['date'] <= end_date)
    return df.loc[mask, 'amount'].sum()
example_cust = df['customer_id'].iloc[5]
example_date = df['date'].iloc[5]
print('Total debit for customer', example_cust, 'from', example_date - pd.Timedelta(days=6), 'to', example_date, ':', weekly_debit_total(example_cust, example_date))

Total debit for customer CUST_0072 from 2025-12-26 05:00:00 to 2026-01-01 05:00:00 : 200.38


In [17]:
def weekly_debit_total(customer_id, this_date):
    end_date = pd.to_datetime(this_date)
    start_date = end_date - pd.Timedelta(days=6)
    mask = (df['customer_id'] == customer_id) & (df['transaction_type'] == 'Debit') & (df['date'] >= start_date) & (df['date'] <= end_date)
    return df.loc[mask, 'amount'].sum()
example_cust = df['customer_id'].iloc[10]
example_date = df['date'].iloc[10]
print('Total debit for customer', example_cust, 'from', example_date - pd.Timedelta(days=6), 'to', example_date, ':', weekly_debit_total(example_cust, example_date))

Total debit for customer CUST_0075 from 2025-12-26 10:00:00 to 2026-01-01 10:00:00 : 137.63
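
If the cap is meant to cover seven whole calendar days ending on a given date, the window boundaries need to be aligned to midnight. The following is a sketch under that assumption; the function name `trailing_7day_debits` and the sample frame are hypothetical.

```python
import pandas as pd

txns = pd.DataFrame({
    "customer_id": ["A", "A", "A", "A"],
    "transaction_type": ["Debit", "Debit", "Credit", "Debit"],
    "amount": [100.0, 50.0, 500.0, 25.0],
    "date": pd.to_datetime([
        "2026-01-01 10:00", "2026-01-07 23:30",
        "2026-01-05 12:00", "2025-12-31 23:59",
    ]),
})

def trailing_7day_debits(frame: pd.DataFrame, customer_id: str, as_of) -> float:
    """Sum debits over the seven calendar days ending on as_of (inclusive)."""
    # Exclusive upper bound: midnight at the start of the day after as_of
    end = pd.Timestamp(as_of).normalize() + pd.Timedelta(days=1)
    start = end - pd.Timedelta(days=7)
    mask = (
        (frame["customer_id"] == customer_id)
        & (frame["transaction_type"] == "Debit")
        & (frame["date"] >= start)
        & (frame["date"] < end)
    )
    return float(frame.loc[mask, "amount"].sum())

total = trailing_7day_debits(txns, "A", "2026-01-07")
print(total)
```

Here the window runs from 2026-01-01 00:00 up to, but not including, 2026-01-08 00:00, so the late-evening debit on 7 January counts and the one on 31 December does not.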


# Advanced Example 1: Flag Multiple Violations with Composite Function

- In real banking, you need to check many rules at once and report which ones failed.
- We will build a function returning a dictionary of violations for a single transaction.

In [19]:
# First, define the allowed_channel function
def allowed_channel(customer_id, channel):
    """
    Check if a channel is allowed for a given customer.
    Example rule: Business customers can't use POS terminals
    """
    # Check if customer is business (from customers dataframe)
    is_business = is_business_customer(customer_id)
    
    # Business customers cannot use POS terminals
    if is_business and channel == 'POS':
        return False
    
    # All other combinations are allowed
    return True

# Now the original function will work
def check_transaction_rules(row):
    rules = {}
    rules['min_amt'] = is_minimum_amount(row['amount'])
    rules['high_pos'] = not is_high_pos(row['amount'], row['channel'])
    rules['allowed_channel'] = allowed_channel(row['customer_id'], row['channel'])
    return rules

# See violations for a sample row
sample_row = df.iloc[0]
print(check_transaction_rules(sample_row))

{'min_amt': np.True_, 'high_pos': True, 'allowed_channel': True}


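# Advanced Example 2: Vectorized Application and Scoring for All Transactions

- Real systems need to apply rules to millions of records quickly.
- Here we apply the composite function to every row with `df.apply(axis=1)` and add the results as columns. This is convenient, but it still runs row by row in Python, so it is not truly vectorized; on large tables, column-wise operations are much faster.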

In [20]:
rule_results = df.apply(check_transaction_rules, axis=1, result_type='expand')
df = pd.concat([df, rule_results], axis=1)
print(df[['amount', 'channel', 'min_amt', 'high_pos', 'allowed_channel']].head(5))

   amount channel  min_amt  high_pos  allowed_channel
0  238.77     ATM     True      True             True
1  269.17     POS     True      True            False
2   58.62  Online     True      True             True
3   81.85  Branch     True      True             True
4  163.56  Online     True      True             True
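
The same three rules can also be written as column-wise boolean expressions, which avoids a Python call per row. This is a sketch on a hypothetical frame; `business_ids` stands in for a lookup against the customer table, and `n_violations` is an illustrative score, not part of the original pipeline.

```python
import pandas as pd

txns = pd.DataFrame({
    "customer_id": ["C1", "C2", "C2", "C3"],
    "amount": [238.77, 269.17, 20.00, 750.00],
    "channel": ["ATM", "POS", "Online", "POS"],
})
business_ids = {"C2"}  # hypothetical set of Business-segment customers

is_business = txns["customer_id"].isin(business_ids)
is_pos = txns["channel"].eq("POS")

# True means the transaction PASSES the rule, matching the columns above
rules = pd.DataFrame({
    "min_amt": txns["amount"].ge(50.0),
    "high_pos": ~(is_pos & txns["amount"].gt(500)),
    "allowed_channel": ~(is_business & is_pos),
})

# Simple score: number of failed rules per transaction
n_violations = (~rules).sum(axis=1)
print(pd.concat([txns, rules, n_violations.rename("n_violations")], axis=1))
```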


# Advanced Example 3: Exporting Violations for Audit

- Banks require robust audit trails of rule triggers.
- We will export all violations to a CSV file for review.

In [21]:
violations = df[~(df['min_amt'] & df['high_pos'] & df['allowed_channel'])]

violations_file = 'violations_report.csv'

violations.to_csv(violations_file, index=False)

print(f'{violations.shape[0]} violations written to {violations_file}')

115 violations written to violations_report.csv
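
An audit file is easier to review if each row states which rules failed. One possible way to add that, sketched on a hypothetical results frame with a hypothetical `failed_rules` column:

```python
import pandas as pd

# Hypothetical rule results; True means the rule passed
results = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "min_amt": [True, False, True],
    "high_pos": [True, True, True],
    "allowed_channel": [True, False, True],
})
rule_cols = ["min_amt", "high_pos", "allowed_channel"]

def failed_rules(row: pd.Series) -> str:
    """Join the names of all rules that did not pass for this row."""
    return ";".join(col for col in rule_cols if not row[col])

results["failed_rules"] = results[rule_cols].apply(failed_rules, axis=1)
violations = results[results["failed_rules"] != ""]

# to_csv without a path returns the CSV text, handy for a quick look
csv_text = violations.to_csv(index=False)
print(csv_text)
```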


# Error Handling Example: Defensive Functions for Data Quality Issues

- What if customer_id does not exist or data contains missing values?
- Let us improve our previous function to handle these cases and log warnings.

In [22]:
def strict_is_business_customer(customer_id):
    if customer_id is None or customer_id not in set(customers['customer_id']):
        print(f"Warning: Unknown or missing customer_id: {customer_id}")
        return False
    segment = customers.loc[customers['customer_id'] == customer_id, 'segment'].values
    return segment[0] == 'Business'

# Try with an unknown ID
print(strict_is_business_customer('CUST_0999'))

False


# Error Handling Example: Handling Non-Numeric Amounts and Missing Data

- Bad data can crash rule engines. Ensure your amount fields are always numeric.
- We will build a safe function to handle inappropriate types or NaN values.

In [23]:
def is_valid_amount(amount):
    try:
        a = float(amount)
        return not np.isnan(a)
    except (ValueError, TypeError):
        return False

# Examples
print(is_valid_amount(10))
print(is_valid_amount('foo'))
print(is_valid_amount(np.nan))

True
False
False
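
For a whole column, `pd.to_numeric` with `errors="coerce"` performs a similar check in one step by turning unparseable values into NaN. A minimal sketch on a hypothetical `raw` Series:

```python
import pandas as pd

# Hypothetical raw column with a number, a numeric string, junk text, and a missing value
raw = pd.Series([10, "25.5", "foo", None, "40"])

# Unparseable or missing entries become NaN instead of raising an error
clean = pd.to_numeric(raw, errors="coerce")
valid = clean.notna()

print(clean.tolist())
print(valid.tolist())
```

Rows where `valid` is False can then be reported or excluded before any rule runs.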


# Best Practices: Naming, Documentation, and Testing

- Use clear names describing what your function does.
- Add docstrings and type hints.
- Test on simple and edge cases, every time you add or change a rule.

In [24]:
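def is_minimum_amount(amount: float, threshold: float = 50.0) -> bool:
    """Return True if amount is a valid number greater than or equal to threshold."""
    # Convert after validating, so a numeric string such as '100' does not raise a TypeError
    return is_valid_amount(amount) and float(amount) >= threshold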

# Simple unit test
print(is_minimum_amount(50), 'Expected: True')
print(is_minimum_amount(20), 'Expected: False')
print(is_minimum_amount('bad'), 'Expected: False')

True Expected: True
False Expected: False
False Expected: False
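
Printed comparisons have to be read by eye. A table of cases checked in a loop is one way to make the test fail loudly. The sketch below redefines simplified versions of the two functions so that it runs on its own; the case list is an assumption about which edge cases matter.

```python
import math

def is_valid_amount(amount) -> bool:
    """True if amount can be converted to a float that is not NaN."""
    try:
        return not math.isnan(float(amount))
    except (TypeError, ValueError):
        return False

def is_minimum_amount(amount, threshold: float = 50.0) -> bool:
    return is_valid_amount(amount) and float(amount) >= threshold

# (input, expected result) pairs, covering the boundary and bad inputs
cases = [
    (50, True),             # exactly at the threshold
    (49.99, False),         # just below
    (-10, False),           # negative amount
    ("75", True),           # numeric string
    ("bad", False),         # non-numeric string
    (None, False),          # missing value
    (float("nan"), False),  # NaN
]

failures = [(value, expected) for value, expected in cases
            if is_minimum_amount(value) != expected]
assert not failures, f"Unexpected results for: {failures}"
print("All", len(cases), "cases passed")
```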


# Best Practices: Logging and Traceability in Rule Functions

- Banks must trace why a transaction was allowed or blocked.
- Add simple print statements or use logging for real-world code.

In [25]:
def is_high_pos(amount, channel, limit=500):
    if channel == 'POS' and amount > limit:
        print(f'Flag: POS transaction above limit (${amount})')
        return True
    return False

# See a flagged transaction
print(is_high_pos(700, 'POS'))
print(is_high_pos(200, 'ATM'))

Flag: POS transaction above limit ($700)
True
False
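
With the standard `logging` module, the same flag can be recorded with a severity level and routed to a file or monitoring system later. A sketch, with a hypothetical function name `is_high_pos_logged` and logger name `rules.pos`:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("rules.pos")  # hypothetical logger name

def is_high_pos_logged(amount: float, channel: str, limit: float = 500.0) -> bool:
    """Return True and log a warning when a POS transaction exceeds the limit."""
    flagged = channel == "POS" and amount > limit
    if flagged:
        # Lazy %-style formatting: the message is only built if it will be emitted
        logger.warning("POS transaction above limit: amount=%.2f limit=%.2f",
                       amount, limit)
    return flagged

print(is_high_pos_logged(700, "POS"))
print(is_high_pos_logged(200, "ATM"))
```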


# End-to-End Challenge: Detect Debit Rule Breaches and Warn Customers

- Let us combine everything: for each customer, total the debits per day and flag any day on which that total exceeds a $500 limit.
- Output a warning per customer who violates this rule at least once.

In [26]:
# Create a table with total daily debits per customer
daily_debits = df[df['transaction_type']=='Debit'].groupby(['customer_id', 'date_only'])['amount'].sum().reset_index()
daily_debits['exceeds_limit'] = daily_debits['amount'] > 500
warned_customers = set(daily_debits.loc[daily_debits['exceeds_limit'], 'customer_id'])
for cid in sorted(warned_customers):  # sort so the warnings print in a stable order
    print(f'Warning: Customer {cid} has exceeded the daily debit limit.')