# 2023: Week 5 DSB Ranking
February 01, 2023
Created by: Carl Allchin

The intermediate month begins by building on the aggregation technique covered in week 1. This week's challenge looks at two analytical calculations that can make the use of the data source much easier for end users.

If you use Tableau Desktop, you have likely had to create ranks and use Level of Detail calculations (calculations performed at a different level of granularity from the data set or visual you are creating). These calculations aren't easy for new users to understand, so adding them to your data set before sharing it makes the end user's life easier.

Here are some links to learn more about both techniques:

Ranking
Level of Detail calculations
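
For readers working in pandas rather than Tableau, a fixed Level of Detail calculation corresponds roughly to a grouped `transform`: the value is computed at a coarser grain and then repeated on every row. The sketch below uses a made-up `toy` frame (its name and values are illustrative, not from the challenge input).

```python
import pandas as pd

# Illustrative data only; not taken from the challenge input.
toy = pd.DataFrame({
    "Bank": ["DS", "DS", "DSB", "DSB"],
    "Value": [100, 300, 50, 150],
})

# Roughly what { FIXED [Bank] : AVG([Value]) } does in Tableau:
# compute the mean per bank, then repeat it on every row of that bank.
toy["Avg Value per Bank"] = toy.groupby("Bank")["Value"].transform("mean")
print(toy)
```

Because `transform` returns a result aligned to the original rows, no other fields are lost, which is the behaviour the requirements below ask for.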

Input
- This week's input is the same as the first week's, one .csv file, but you can download it here
![image.png](attachment:image.png)

- Create the bank code by splitting off the letters from the Transaction Code; call this field 'Bank'
- Change Transaction Date to just be the month of the transaction
- Total up the transaction values so you have one row for each bank and month combination
- Rank each bank for their value of transactions each month against the other banks. 1st is the highest value of transactions, 3rd the lowest.
- Without losing all of the other data fields, find:
  - The average rank a bank has across all of the months; call this field 'Avg Rank per Bank'
  - The average transaction value per rank; call this field 'Avg Transaction Value per Rank'
- Output the data

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
import pandas as pd
import numpy as np

In [4]:
# file = "PD 2023 Wk 1 Input.csv"
file = '/content/drive/MyDrive/Colab Notebooks/Prepping Data/Week 3/PD 2023 Wk 1 Input.csv'

In [5]:
# Read in file
df = pd.read_csv(file)

In [6]:
# Extract bank name from Transaction Code
df['Bank'] = df['Transaction Code'].str.split('-').str[0]

# Convert 'Transaction Date' to datetime format.
# Be explicit about the day-first format of the source strings;
# otherwise dates may be read as month-first and any aggregation by date part will be off.
df['Transaction Date'] = pd.to_datetime(df['Transaction Date'], format="%d/%m/%Y %H:%M:%S")

In [7]:
df.head()

Unnamed: 0,Transaction Code,Value,Customer Code,Online or In-Person,Transaction Date,Bank
0,DTB-716-679-576,1448,100001,2,2023-03-20,DTB
1,DS-795-814-303,7839,100001,2,2023-11-15,DS
2,DSB-807-592-406,5520,100005,1,2023-07-14,DSB
3,DS-367-545-264,7957,100007,2,2023-08-18,DS
4,DSB-474-374-857,5375,100000,2,2023-08-26,DSB
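
The two steps above can be tried in isolation on a tiny frame. This is a minimal sketch: the transaction codes are the first two from the output above, while the date strings are illustrative and chosen so that the first one is ambiguous between day-first and month-first.

```python
import pandas as pd

# Codes are from the output above; the date strings are illustrative.
sample = pd.DataFrame({
    "Transaction Code": ["DTB-716-679-576", "DS-795-814-303"],
    "Transaction Date": ["01/02/2023 00:00:00", "15/11/2023 00:00:00"],
})

# Everything before the first hyphen is the bank code.
sample["Bank"] = sample["Transaction Code"].str.split("-", n=1).str[0]

# With an explicit day-first format, 01/02/2023 is 1 February, not 2 January.
sample["Transaction Date"] = pd.to_datetime(
    sample["Transaction Date"], format="%d/%m/%Y %H:%M:%S"
)
print(sample[["Bank", "Transaction Date"]])
```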


In [8]:
# Derive the month name from the transaction date (stored in a new 'Month' column)
df['Month'] = df['Transaction Date'].dt.month_name()

In [10]:
# Total up the transaction values so you have one row for each bank and month combination
monthly_transactions = df.groupby(['Bank', 'Month'])['Value'].sum().reset_index()

In [12]:
monthly_transactions.head(6)

Unnamed: 0,Bank,Month,Value
0,DS,April,40785
1,DS,August,102237
2,DS,December,33952
3,DS,February,31204
4,DS,January,50207
5,DS,July,55002
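
Note that the months in the output above sort alphabetically (April, August, December, ...) because `Month` holds plain strings. This does not affect the ranking, but if calendar order is wanted, one option is an ordered categorical. The sketch below assumes English month names (the default for `dt.month_name()` and for `calendar.month_name` in an English or C locale) and uses a small `totals` frame built from three of the DS rows shown above.

```python
import calendar
import pandas as pd

# Three of the DS rows from the output above.
totals = pd.DataFrame({
    "Bank": ["DS", "DS", "DS"],
    "Month": ["April", "August", "January"],
    "Value": [40785, 102237, 50207],
})

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]
totals["Month"] = pd.Categorical(totals["Month"], categories=month_order, ordered=True)

# Sorting now follows the calendar rather than the alphabet.
totals = totals.sort_values(["Bank", "Month"], ignore_index=True)
print(totals)
```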



- Rank each bank for their value of transactions each month against the other banks.
- 1st is the highest value of transactions, 3rd the lowest.
- Without losing all of the other data fields, find:
  - The average rank a bank has across all of the months; call this field 'Avg Rank per Bank'
  - The average transaction value per rank; call this field 'Avg Transaction Value per Rank'

In [13]:
# Sort by month, then by value (highest first), so each month's banks appear in rank order
monthly_transactions = monthly_transactions.sort_values(by=['Month', 'Value'], ascending=[True, False], ignore_index=True)

In [14]:
monthly_transactions.head(5)

Unnamed: 0,Bank,Month,Value
0,DTB,April,42360
1,DS,April,40785
2,DSB,April,30317
3,DS,August,102237
4,DTB,August,66063


In [15]:
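# Rows are already sorted by value within each month, so a running count
# of 1s per month gives the rank (tied values would not share a rank)
monthly_transactions['Monthly Rank'] = 1
monthly_transactions['Monthly Rank'] = monthly_transactions.groupby(['Month'])['Monthly Rank'].cumsum()
monthly_transactions.head(6)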

Unnamed: 0,Bank,Month,Value,Monthly Rank
0,DTB,April,42360,1
1,DS,April,40785,2
2,DSB,April,30317,3
3,DS,August,102237,1
4,DTB,August,66063,2
5,DSB,August,38167,3
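
The sort-then-cumulative-count approach above works for this data. As an alternative (not the method used in this notebook), pandas also offers a grouped `rank`, which needs no prior sorting and can give tied values the same rank. A minimal sketch using the six rows from the output above, with `monthly` as a stand-in name for the aggregated frame:

```python
import pandas as pd

# The six rows shown in the output above.
monthly = pd.DataFrame({
    "Bank": ["DTB", "DS", "DSB", "DS", "DTB", "DSB"],
    "Month": ["April"] * 3 + ["August"] * 3,
    "Value": [42360, 40785, 30317, 102237, 66063, 38167],
})

# Rank within each month, highest value first. method="min" gives tied
# banks the same rank; the frame does not need to be sorted beforehand.
monthly["Monthly Rank"] = (
    monthly.groupby("Month")["Value"]
    .rank(method="min", ascending=False)
    .astype(int)
)
print(monthly)
```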


In [18]:
# Calculate the average rank a bank has across all of the months, call this field 'Avg Rank per Bank'
monthly_transactions['Avg Rank per Bank'] = monthly_transactions.groupby(['Bank'])['Monthly Rank'].transform('mean')

In [22]:
# Calculate the average transaction value per rank, call this field 'Avg Transaction Value per Rank'
monthly_transactions['Avg Transaction Value per Rank'] = monthly_transactions.groupby(['Monthly Rank'])['Value'].transform('mean')

In [23]:
monthly_transactions.head()

Unnamed: 0,Bank,Month,Value,Monthly Rank,Avg Rank per Bank,Avg Transaction Value per Rank
0,DTB,April,42360,1,1.75,66967.75
1,DS,April,40785,2,1.916667,48633.666667
2,DSB,April,30317,3,2.333333,34620.833333
3,DS,August,102237,1,1.916667,66967.75
4,DTB,August,66063,2,1.75,48633.666667
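
The last requirement, outputting the data, is not shown in the notebook. A minimal sketch of that step follows. The file name `PD 2023 Wk 5 Output.csv` is hypothetical, `result` is a stand-in for the finished `monthly_transactions` frame (only the first three rows and four columns are reproduced), and the file is written to the system temp directory so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the finished monthly_transactions frame (first three rows above).
result = pd.DataFrame({
    "Bank": ["DTB", "DS", "DSB"],
    "Month": ["April", "April", "April"],
    "Value": [42360, 40785, 30317],
    "Monthly Rank": [1, 2, 3],
})

# Hypothetical file name; the temp directory is used only to keep the sketch portable.
out_path = os.path.join(tempfile.gettempdir(), "PD 2023 Wk 5 Output.csv")

# index=False keeps the row index out of the file.
result.to_csv(out_path, index=False)

# Read the file back to confirm the columns survived the round trip.
check = pd.read_csv(out_path)
print(check.columns.tolist())
```

In the Colab setup used here, `out_path` would more likely point at a folder on the mounted Drive.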
