# Daily and Monthly Transaction Aggregation in Banking

- In this lesson, we will learn how to summarize banking transactions by both day and month.
- Financial aggregations are essential for tracking revenue, detecting issues, and making business decisions in banks.
- You will learn to calculate totals, averages, and trends from daily and monthly data.
- We will use synthetic banking transaction data to practice real analysis tasks similar to those performed by analysts at financial institutions.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")


In [2]:
# Example 1: Create synthetic banking transactions dataset
np.random.seed(42)
n_transactions = 1000
n_customers = 200
df = pd.DataFrame({
    'transaction_id': range(1, n_transactions + 1),
    'customer_id': np.random.choice([f'CUST_{i:04d}' for i in range(1, n_customers + 1)], n_transactions),
    'amount': np.round(np.random.normal(150, 60, n_transactions), 2),
    'transaction_type': np.random.choice(['Debit', 'Credit'], n_transactions),
    'channel': np.random.choice(['ATM', 'Online', 'Branch', 'POS'], n_transactions),
    'date': pd.date_range(start='2024-01-01', periods=n_transactions, freq='h')
})
print(df.shape)
print(df.head(3))

(1000, 6)
   transaction_id customer_id  amount transaction_type channel  \
0               1   CUST_0103  238.77           Credit     ATM   
1               2   CUST_0180  269.17           Credit     POS   
2               3   CUST_0093   58.62           Credit  Online   

                 date  
0 2024-01-01 00:00:00  
1 2024-01-01 01:00:00  
2 2024-01-01 02:00:00  


In [3]:
# Example 2: Create a simple customers table
customer_ids = [f'CUST_{i:04d}' for i in range(1, 201)]
customers = pd.DataFrame({
    'customer_id': customer_ids,
    'segment': ['Retail'] * 150 + ['Business'] * 50,
    'region': ['Metro'] * 100 + ['Regional'] * 100
})
print(customers.shape)
print(customers.head(3))

(200, 3)
  customer_id segment region
0   CUST_0001  Retail  Metro
1   CUST_0002  Retail  Metro
2   CUST_0003  Retail  Metro


In [4]:
# Example 3: Create a synthetic account types table
np.random.seed(42)
accounts = pd.DataFrame({
    'account_id': [f'ACC_{i:05d}' for i in range(1, 201)],
    'customer_id': customer_ids,
    'account_type': np.random.choice(['Savings', 'Cheque', 'Credit'], size=200),
    'open_date': pd.date_range(start='2015-01-01', periods=200, freq='15D')  # 15-day spacing keeps every open date before 2024
})
print(accounts.shape)
print(accounts.head(3))

(200, 4)
  account_id customer_id account_type  open_date
0  ACC_00001   CUST_0001       Credit 2015-01-01
1  ACC_00002   CUST_0002      Savings 2015-01-16
2  ACC_00003   CUST_0003       Credit 2015-01-31


# What are daily and monthly financial aggregations?

- Daily aggregations summarize all transactions on each day.
- Monthly aggregations summarize transactions for each month.
- These calculations reveal trends and help banks detect unusual activity or growth.
- Common mistakes: not converting date columns to datetime, grouping by the wrong column, and overlooking gaps or partial periods in the date range.
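A minimal sketch of daily versus monthly grouping on a few hypothetical transactions whose dates arrive as text, as they often do from CSV exports. It also illustrates grouping by the wrong key: grouping on the raw timestamp produces one group per distinct timestamp, not one per day.

```python
import pandas as pd

# Hypothetical transactions; the dates are strings until converted
tx = pd.DataFrame({
    'date': ['2024-01-30 09:15', '2024-01-30 14:40', '2024-01-31 11:00', '2024-02-01 08:05'],
    'amount': [200.0, 50.0, 75.0, 120.0],
})
# Convert first: the .dt accessor only works on datetime columns
tx['date'] = pd.to_datetime(tx['date'], format='%Y-%m-%d %H:%M')

daily = tx.groupby(tx['date'].dt.date)['amount'].sum()              # one row per calendar day
monthly = tx.groupby(tx['date'].dt.to_period('M'))['amount'].sum()  # one row per month
per_timestamp = tx.groupby('date')['amount'].sum()                  # pitfall: one row per timestamp

print(daily)
print(monthly)
print(len(per_timestamp), 'groups when grouping on the raw timestamp')
```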

In [5]:
# Convert the date column to datetime, if it is not already
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dtype)

datetime64[ns]


In [6]:
# Beginner Example 1: Count transactions per day
daily_counts = df.groupby(df['date'].dt.date).size()
print(daily_counts.head())

date
2024-01-01    24
2024-01-02    24
2024-01-03    24
2024-01-04    24
2024-01-05    24
dtype: int64


In [7]:
# Beginner Example 2: Calculate total transaction amounts for each day
daily_totals = df.groupby(df['date'].dt.date)['amount'].sum()
print(daily_totals.head())

date
2024-01-01    3771.76
2024-01-02    3993.85
2024-01-03    3777.92
2024-01-04    3946.69
2024-01-05    3489.53
Name: amount, dtype: float64


In [8]:
# Beginner Example 3: Calculate average transaction amount per day
daily_avg = df.groupby(df['date'].dt.date)['amount'].mean()
print(daily_avg.head())

date
2024-01-01    157.156667
2024-01-02    166.410417
2024-01-03    157.413333
2024-01-04    164.445417
2024-01-05    145.397083
Name: amount, dtype: float64
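Averaging the daily averages is not, in general, the same as the average transaction amount for the whole period: it gives every day equal weight regardless of how many transactions it had. In this lesson's data every day has 24 transactions except the last, partial day, so the gap is small, but on real data with quiet weekends it can be large. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical data: three small transactions on one day, one large one the next day
tx = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01 09:00', '2024-01-01 10:00',
                            '2024-01-01 11:00', '2024-01-02 09:00']),
    'amount': [10.0, 20.0, 30.0, 100.0],
})

daily_avg = tx.groupby(tx['date'].dt.date)['amount'].mean()
mean_of_daily_avgs = daily_avg.mean()  # (20 + 100) / 2: each day weighted equally
overall_avg = tx['amount'].mean()      # 160 / 4: each transaction weighted equally

# Rebuild the overall average from daily sums and counts (weights each day by its volume)
daily_stats = tx.groupby(tx['date'].dt.date)['amount'].agg(['sum', 'count'])
weighted_avg = daily_stats['sum'].sum() / daily_stats['count'].sum()

print(mean_of_daily_avgs, overall_avg, weighted_avg)
```

Keeping daily sums and counts, rather than only daily means, lets you roll daily figures up to monthly averages correctly.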


In [9]:
# Intermediate Example 1: Monthly totals using pandas monthly periods (dt.to_period('M'))
monthly_totals = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()
print(monthly_totals)

date
2024-01    114493.50
2024-02     37208.58
Freq: M, Name: amount, dtype: float64
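The February total is much smaller than January's because the synthetic data stops part-way through February: 1,000 hourly timestamps starting at 2024-01-01 00:00 end at 2024-02-11 15:00. Before comparing months, it helps to report how much of each month the data actually covers. A minimal sketch that rebuilds the same hourly timestamps with hypothetical random amounts (not the lesson's values):

```python
import numpy as np
import pandas as pd

# Same timestamps as the lesson's dataset; the amounts here are hypothetical
frame = pd.DataFrame({'date': pd.date_range(start='2024-01-01', periods=1000, freq='h')})
rng = np.random.default_rng(0)
frame['amount'] = rng.normal(150, 60, len(frame)).round(2)

month = frame['date'].dt.to_period('M').rename('month')
coverage = frame.groupby(month).agg(
    monthly_total=('amount', 'sum'),
    n_transactions=('amount', 'count'),
    days_with_data=('date', lambda s: s.dt.normalize().nunique()),
)
# Normalise by the number of days observed so partial months are comparable
coverage['avg_daily_total'] = coverage['monthly_total'] / coverage['days_with_data']
print(coverage)
```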


In [10]:
# Intermediate Example 2: Find the largest single transaction in each month (its timestamp and amount)
peak_days = df.groupby(df['date'].dt.to_period('M')).apply(lambda g: g.loc[g['amount'].idxmax()][['date', 'amount']])
print(peak_days)

                       date  amount
date                               
2024-01 2024-01-04 20:00:00  340.47
2024-02 2024-02-03 19:00:00  307.75


In [11]:
# Intermediate Example 3: Daily totals as a time series DataFrame
df_daily = df.set_index('date').resample('D')['amount'].sum().to_frame('daily_total')
print(df_daily.head())

            daily_total
date                   
2024-01-01      3771.76
2024-01-02      3993.85
2024-01-03      3777.92
2024-01-04      3946.69
2024-01-05      3489.53


In [12]:
# Advanced Example 1: Total amount per customer per month (NaN means no transactions that month)
customer_monthly = df.groupby([df['customer_id'], df['date'].dt.to_period('M')])['amount'].sum().unstack()
print(customer_monthly.head())

date         2024-01  2024-02
customer_id                  
CUST_0001     706.28   376.88
CUST_0002     919.98      NaN
CUST_0003     376.15   795.98
CUST_0004     504.85   152.94
CUST_0005     378.90   251.92


In [13]:
# Advanced Example 2: 3-month rolling average of monthly totals
# 'ME' = month-end frequency (pandas >= 2.2; older versions use 'M')
df_monthly = df.set_index('date').resample('ME')['amount'].sum().to_frame('monthly_total')
df_monthly['rolling_3_month_avg'] = df_monthly['monthly_total'].rolling(window=3, min_periods=1).mean()
print(df_monthly)


            monthly_total  rolling_3_month_avg
date                                          
2024-01-31      114493.50            114493.50
2024-02-29       37208.58             75851.04
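With only two months of data the 3-month window never fills, and `min_periods=1` makes each early row an average of however many months exist so far (January's value is just January's total). A sketch on four hypothetical monthly totals shows what `min_periods` changes:

```python
import pandas as pd

# Hypothetical monthly totals
monthly = pd.Series(
    [100.0, 200.0, 300.0, 400.0],
    index=pd.period_range('2024-01', periods=4, freq='M'),
    name='monthly_total',
)

strict = monthly.rolling(window=3).mean()                  # NaN until three months exist
lenient = monthly.rolling(window=3, min_periods=1).mean()  # averages whatever is available

print(pd.DataFrame({'monthly_total': monthly, 'strict_3m': strict, 'lenient_3m': lenient}))
```

The strict version makes it obvious when a "3-month average" is really based on fewer months.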


In [14]:
# Advanced Example 3: Aggregating different transaction types (debit vs credit) per month
monthly_types = df.groupby([df['date'].dt.to_period('M'), 'transaction_type'])['amount'].sum().unstack(fill_value=0)
print(monthly_types)

transaction_type    Credit     Debit
date                                
2024-01           58569.65  55923.85
2024-02           17150.99  20057.59


In [15]:
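# Error Handling Example: Handling missing or corrupted date values
df_bad = df.copy()
df_bad['date'] = df_bad['date'].astype(object)  # allow a non-date value in the column without a dtype warning
df_bad.loc[0, 'date'] = 'not_a_date'
try:
    df_bad['date'] = pd.to_datetime(df_bad['date'], errors='raise')
except ValueError as e:  # pandas date-parsing errors are subclasses of ValueError
    print('Error:', e)
    # Fall back to coercion: unparseable values become NaT
    df_bad['date'] = pd.to_datetime(df_bad['date'], errors='coerce')
    print('Missing dates:', df_bad['date'].isna().sum())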

Error: Unknown datetime string format, unable to parse: not_a_date, at position 0
Missing dates: 1
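After `errors='coerce'`, the bad value becomes `NaT`, and `groupby` drops rows whose key is missing by default, so that transaction's amount silently disappears from every daily and monthly total. A sketch of making that loss visible before aggregating, on hypothetical data; the explicit `format` is an assumption about how the source dates are written:

```python
import pandas as pd

# Hypothetical raw export with one corrupted and one missing date
raw = pd.DataFrame({
    'date': ['2024-01-01 09:00', 'not_a_date', '2024-01-02 10:00', None],
    'amount': [100.0, 50.0, 75.0, 20.0],
})
raw['date'] = pd.to_datetime(raw['date'], format='%Y-%m-%d %H:%M', errors='coerce')

# Report what will be excluded instead of letting groupby drop it silently
bad_rows = raw[raw['date'].isna()]
clean = raw.dropna(subset=['date'])
print(f'Excluded {len(bad_rows)} rows totalling {bad_rows["amount"].sum():.2f}')

daily = clean.groupby(clean['date'].dt.date)['amount'].sum()
print(daily)
```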


In [16]:
# Debugging Example: Sanity check for negative transaction amounts (the normal distribution in Example 1 can produce values below zero)
neg_amounts = df[df['amount'] < 0]
print(f'Negative transactions: {neg_amounts.shape[0]}')
print(neg_amounts.head())

Negative transactions: 7
     transaction_id customer_id  amount transaction_type channel  \
94               95   CUST_0024   -4.38           Credit  Online   
165             166   CUST_0170  -64.09           Credit  Online   
231             232   CUST_0068   -4.69           Credit     ATM   
297             298   CUST_0161  -12.75           Credit     ATM   
304             305   CUST_0166   -1.76            Debit  Branch   

                   date  
94  2024-01-04 22:00:00  
165 2024-01-07 21:00:00  
231 2024-01-10 15:00:00  
297 2024-01-13 09:00:00  
304 2024-01-13 16:00:00  
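Here the negative values are a side effect of drawing `amount` from a normal distribution; in real data they might be reversals or errors and would need a business rule. Either way, it can help to collect such checks in one place and run them before any aggregation. A minimal sketch; `validate_transactions` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def validate_transactions(tx: pd.DataFrame) -> dict:
    """Hypothetical helper: count common data-quality problems before aggregating."""
    return {
        'missing_dates': int(tx['date'].isna().sum()),
        'missing_amounts': int(tx['amount'].isna().sum()),
        'negative_amounts': int((tx['amount'] < 0).sum()),  # NaN < 0 is False, so NaN is not counted here
        'duplicate_ids': int(tx['transaction_id'].duplicated().sum()),
    }


# Hypothetical sample with one problem of each kind
sample = pd.DataFrame({
    'transaction_id': [1, 2, 2, 3],
    'date': pd.to_datetime(['2024-01-01', None, '2024-01-02', '2024-01-02']),
    'amount': [100.0, -5.0, 40.0, None],
})
report = validate_transactions(sample)
print(report)
```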


# Best Practices for Financial Aggregations

- Always check and convert date columns to datetime type.
- Use groupby and resample for accurate period summaries.
- Handle missing or bad data early.
- Document every aggregation step for transparency and reproducibility.
- Sanity-check your outputs for expected totals and patterns.
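One concrete sanity check is reconciliation: daily and monthly totals should both add back up to the total of the raw transactions. A sketch using the same hourly timestamps as the lesson with hypothetical random amounts; `np.isclose` is used because floating-point sums can differ in the last digits depending on summation order:

```python
import numpy as np
import pandas as pd

# Same hourly timestamps as the lesson; the amounts are hypothetical
rng = np.random.default_rng(42)
tx = pd.DataFrame({
    'date': pd.date_range(start='2024-01-01', periods=1000, freq='h'),
    'amount': rng.normal(150, 60, 1000).round(2),
})

grand_total = tx['amount'].sum()
daily_total = tx.groupby(tx['date'].dt.date)['amount'].sum()
monthly_total = tx.groupby(tx['date'].dt.to_period('M'))['amount'].sum()

checks = {
    'daily reconciles': bool(np.isclose(daily_total.sum(), grand_total)),
    'monthly reconciles': bool(np.isclose(monthly_total.sum(), grand_total)),
    'expected number of days': len(daily_total) == 42,  # Jan 1 to Feb 11 inclusive
}
print(checks)
```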

In [17]:
# Common pattern: chain grouping and summary operations to get daily summary statistics
daily_summary = (df.groupby(df['date'].dt.date)
    .agg(total_amount=('amount', 'sum'),
         avg_amount=('amount', 'mean'),
         num_transactions=('transaction_id', 'count')))
print(daily_summary.head())

            total_amount  avg_amount  num_transactions
date                                                  
2024-01-01       3771.76  157.156667                24
2024-01-02       3993.85  166.410417                24
2024-01-03       3777.92  157.413333                24
2024-01-04       3946.69  164.445417                24
2024-01-05       3489.53  145.397083                24
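The same named-aggregation pattern works at monthly granularity and can mix different statistics per column, for example counting distinct active customers with `nunique`. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical transactions for three customers across two months
tx = pd.DataFrame({
    'transaction_id': [1, 2, 3, 4, 5],
    'customer_id': ['CUST_0001', 'CUST_0002', 'CUST_0001', 'CUST_0003', 'CUST_0003'],
    'date': pd.to_datetime(['2024-01-02', '2024-01-15', '2024-01-15', '2024-02-01', '2024-02-20']),
    'amount': [100.0, 250.0, 50.0, 80.0, 120.0],
})

monthly_summary = (tx.groupby(tx['date'].dt.to_period('M').rename('month'))
    .agg(total_amount=('amount', 'sum'),
         avg_amount=('amount', 'mean'),
         num_transactions=('transaction_id', 'count'),
         active_customers=('customer_id', 'nunique')))
print(monthly_summary)
```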


In [18]:
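# End-to-end example: find the customer with the highest total debit amount, then plot their monthly trend
debits = df[df['transaction_type'] == 'Debit']
monthly_debits = debits.groupby(['customer_id', debits['date'].dt.to_period('M')])['amount'].sum().unstack()

top_cust = monthly_debits.sum(axis=1).idxmax()
top_cust_trend = monthly_debits.loc[top_cust]

print(f"Customer with highest total debit: {top_cust}")
print(top_cust_trend)

# Bar chart of the top customer's monthly debit totals
top_cust_trend.plot(kind='bar', title=f'Monthly debit total for {top_cust}')
plt.ylabel('Amount')
plt.tight_layout()
plt.show()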

Customer with highest total debit: CUST_0145
date
2024-01    856.84
2024-02    555.23
Freq: M, Name: CUST_0145, dtype: float64
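The `customers` table from Example 2 is not used in the aggregations above. One way to use it is to attach each transaction's segment with a merge and then aggregate by month and segment. A sketch with small hypothetical tables; `validate='many_to_one'` makes pandas raise if a customer ID appears more than once in the lookup table, and `how='left'` keeps transactions without a matching customer (their segment becomes NaN, which `groupby` would then drop, so they are counted first):

```python
import pandas as pd

# Hypothetical lookup table and transactions
customers = pd.DataFrame({
    'customer_id': ['CUST_0001', 'CUST_0002', 'CUST_0003'],
    'segment': ['Retail', 'Retail', 'Business'],
})
tx = pd.DataFrame({
    'customer_id': ['CUST_0001', 'CUST_0003', 'CUST_0002', 'CUST_0003', 'CUST_0001'],
    'date': pd.to_datetime(['2024-01-04', '2024-01-09', '2024-01-30', '2024-02-02', '2024-02-10']),
    'amount': [100.0, 400.0, 60.0, 250.0, 40.0],
})

# validate='many_to_one' raises if a customer_id is duplicated in the lookup table
merged = tx.merge(customers, on='customer_id', how='left', validate='many_to_one')
unmatched = merged['segment'].isna().sum()  # rows groupby would silently drop below
print(f'Transactions without a segment: {unmatched}')

segment_monthly = (merged.groupby([merged['date'].dt.to_period('M'), 'segment'])['amount']
                   .sum()
                   .unstack(fill_value=0))
print(segment_monthly)
```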


# Conclusion

- You learned how to aggregate banking data daily and monthly.
- This helps banks track financial trends, report to management, and catch anomalies.
- Practice chaining groupby and resample in your own banking analyses.
- Keep learning: try these aggregations on a real banking dataset next.