# 2023: Week 6 - DSB Customer Ratings
February 08, 2023
 Challenge by: Jenny Martin

For the second intermediate challenge, Data Source Bank are interested in surveying their customers. They're trying to work out whether users prefer to use the Online Interface for their banking requirements, or whether they prefer the Mobile App. Customers can be quite fickle and so it's probably best to take some of their ratings with a pinch of salt! We'll use an aggregated view to hopefully cut through the noise.

## Input
The results of a survey asking customers to rate, on a scale of 1-5, different areas for the Mobile App and the Online Interface.

![image](https://blogger.googleusercontent.com/img/a/AVvXsEhUCGKJmzq_o9fwKipILRSYPXdH7dZ1LSb2W-HzhzAMk1mguk7hFqIE6md38FPt9ZXSBDEHODFnKBX4Pi3gr7Z1lFpmduuSusVeTkreJ6SVsKY9qgZXegfu_cui9kASFy2ATBqhw13ykVkJX2jRBwCsGNnO_eP_ht3QlpyEw6ljTf-1cCGLxp87jcAehg=w640-h82)


## Requirements
1. Input the data
2. Reshape the data so we have 5 rows for each customer, with responses for the Mobile App and Online Interface being in separate fields on the same row
- Clean the question categories so they don't have the platform in from of them
e.g. Mobile App - Ease of Use should be simply Ease of Use
- Exclude the Overall Ratings, these were incorrectly calculated by the system
3. Calculate the Average Ratings for each platform for each customer
4. Calculate the difference in Average Rating between Mobile App and Online Interface for each customer
5. Catergorise customers as being:
Mobile App Superfans if the difference is greater than or equal to 2 in the Mobile App's favour
Mobile App Fans if difference >= 1
Online Interface Fan
Online Interface Superfan
Neutral if difference is between 0 and 1
6. Calculate the Percent of Total customers in each category, rounded to 1 decimal place
- Output the data

![image](https://blogger.googleusercontent.com/img/a/AVvXsEhKlFzIrOztF4tTgHpTRO3me0v6bLEtqcInn77F3oNHzx0zJVq-ab4cfX6ajOyBRzwYg4hdJOp1dZj_5J1izDOB7KK0Yx63LgBSfPTDaj2MikY898Luz_Q4PXFiIsTQHf-oSNqpt4f3WKlhSAyrMtgCLxRR2M7ZqNmz-62RaLfiAM7dYDR2VAQrUepWZg)

In [46]:
# from google.colab import drive
# drive.mount('/content/drive')

In [47]:
import pandas as pd
import numpy as np

In [48]:
file = '/content/drive/MyDrive/Colab Notebooks/Prepping Data/Week 6/DSB Customer Survery.csv'

In [126]:
# Read in file
df = pd.read_csv(file)

In [127]:
df.head()

Unnamed: 0,Customer ID,Mobile App - Ease of Use,Mobile App - Ease of Access,Mobile App - Navigation,Mobile App - Likelihood to Recommend,Mobile App - Overall Rating,Online Interface - Ease of Use,Online Interface - Ease of Access,Online Interface - Navigation,Online Interface - Likelihood to Recommend,Online Interface - Overall Rating
0,535084,2,1,5,4,1,4,4,5,2,3
1,250892,3,5,4,4,2,5,5,2,4,3
2,544191,5,3,4,4,1,3,3,2,3,1
3,949343,2,5,4,3,1,1,4,3,5,1
4,915305,3,1,2,1,1,4,2,4,3,2


## Reshape the data so we have 5 rows for each customer, with responses for the Mobile App and Online Interface being in separate fields on the same row

In [128]:
# Unpivot the response columns from wide to tall
df = pd.melt(df, id_vars = 'Customer ID', value_vars=list(df.columns[1:]))

In [129]:
df

Unnamed: 0,Customer ID,variable,value
0,535084,Mobile App - Ease of Use,2
1,250892,Mobile App - Ease of Use,3
2,544191,Mobile App - Ease of Use,5
3,949343,Mobile App - Ease of Use,2
4,915305,Mobile App - Ease of Use,3
...,...,...,...
7675,374015,Online Interface - Overall Rating,1
7676,144922,Online Interface - Overall Rating,3
7677,421323,Online Interface - Overall Rating,2
7678,707580,Online Interface - Overall Rating,1


In [130]:
# Split the response columns by delimiter
df[['System', 'Question']] = df['variable'].str.split('-', 1, expand=True)

  df[['System', 'Question']] = df['variable'].str.split('-', 1, expand=True)


In [131]:
# Remove whitespace in the split columns
df = df.apply(lambda x: x.str.strip() if x.dtype.name == 'object' else x, axis=0)

In [132]:
# Drop the unsplit column
df.drop(['variable'], axis =1, inplace=True)

In [133]:
df.head()

Unnamed: 0,Customer ID,value,System,Question
0,535084,2,Mobile App,Ease of Use
1,250892,3,Mobile App,Ease of Use
2,544191,5,Mobile App,Ease of Use
3,949343,2,Mobile App,Ease of Use
4,915305,3,Mobile App,Ease of Use


In [134]:
# Next,  pivot to get it into the shape, with one row per customer ID
df = df.pivot(index=['Customer ID', 'Question'], columns=['System'], values='value').reset_index()

In [135]:
df.head(10)

System,Customer ID,Question,Mobile App,Online Interface
0,101646,Ease of Access,5,4
1,101646,Ease of Use,3,2
2,101646,Likelihood to Recommend,4,4
3,101646,Navigation,2,3
4,101646,Overall Rating,5,2
5,101650,Ease of Access,4,5
6,101650,Ease of Use,1,4
7,101650,Likelihood to Recommend,2,2
8,101650,Navigation,2,1
9,101650,Overall Rating,2,5


In [136]:
# Filter out Overall Ratings rows, these were incorrectly calculated by the system
df = df.loc[df['Question'] != "Overall Rating"]

In [137]:
df.head(10)

System,Customer ID,Question,Mobile App,Online Interface
0,101646,Ease of Access,5,4
1,101646,Ease of Use,3,2
2,101646,Likelihood to Recommend,4,4
3,101646,Navigation,2,3
5,101650,Ease of Access,4,5
6,101650,Ease of Use,1,4
7,101650,Likelihood to Recommend,2,2
8,101650,Navigation,2,1
10,105088,Ease of Access,1,5
11,105088,Ease of Use,5,5


In [138]:
# 3. Calculate the Average Ratings for each platform for each customer
avg_ratings = df.groupby(['Customer ID']).agg( {'Mobile App': 'mean', 'Online Interface': 'mean'} ).reset_index()
avg_ratings.head()

System,Customer ID,Mobile App,Online Interface
0,101646,3.5,3.25
1,101650,2.25,3.0
2,105088,3.5,4.25
3,109306,2.0,2.0
4,110719,3.0,3.5


In [139]:
# 4. Calculate the difference in Average Rating between Mobile App and Online Interface for each customer
avg_ratings['Difference in Avg Rating'] = avg_ratings['Mobile App']- avg_ratings['Online Interface']

In [140]:
avg_ratings

System,Customer ID,Mobile App,Online Interface,Difference in Avg Rating
0,101646,3.50,3.25,0.25
1,101650,2.25,3.00,-0.75
2,105088,3.50,4.25,-0.75
3,109306,2.00,2.00,0.00
4,110719,3.00,3.50,-0.50
...,...,...,...,...
763,994742,3.00,3.50,-0.50
764,996508,2.50,3.00,-0.50
765,997785,3.75,3.00,0.75
766,997926,3.50,3.75,-0.25


In [141]:
# 5. Catergorise customers as being:
# Mobile App Superfans if the difference is greater than or equal to 2 in the Mobile App's favour
# Mobile App Fans if difference >= 1
# Online Interface Fan
# Online Interface Superfan
# Neutral if difference is between 0 and 1
# 6. Calculate the Percent of Total customers in each category, rounded to 1 decimal place
# - Output the data

In [142]:
# Define the function to recode the difference values
def recode_col(row):
    if row >= 2:
        return "Mobile App Superfan"
    elif  1<= row < 2:
        return "Mobile App Fan"
    elif row <=-2:
        return "Online Interface Fan"
    elif -2< row <= -1:
        return "Online Interface Fan"
    else:
        return "Neutral"

In [143]:
avg_ratings['Preference'] = avg_ratings['Difference in Avg Rating'].apply(recode_col)

In [144]:
avg_ratings.sample(10)

System,Customer ID,Mobile App,Online Interface,Difference in Avg Rating,Preference
7,115507,2.0,3.5,-1.5,Online Interface Fan
95,192322,3.0,4.0,-1.0,Online Interface Fan
501,660102,2.75,1.75,1.0,Mobile App Fan
528,690440,4.0,4.0,0.0,Neutral
27,131634,3.25,3.75,-0.5,Neutral
647,840208,3.75,2.5,1.25,Mobile App Fan
573,741586,1.75,3.25,-1.5,Online Interface Fan
305,434467,4.0,2.75,1.25,Mobile App Fan
520,679462,3.5,4.25,-0.75,Neutral
101,201048,3.75,2.75,1.0,Mobile App Fan


In [157]:
#6 Calculate the Percent of Total customers in each category, rounded to 1 decimal place

pct = avg_ratings['Preference'].value_counts(normalize=True).mul(100).round(1).astype(str) + '%'

pct


Neutral                 63.7%
Online Interface Fan    17.3%
Mobile App Fan          16.4%
Mobile App Superfan      2.6%
Name: Preference, dtype: object

In [159]:
# Present value counts as dataframe
pct_df = pd.DataFrame(pct)
pct_df

Unnamed: 0,Preference
Neutral,63.7%
Online Interface Fan,17.3%
Mobile App Fan,16.4%
Mobile App Superfan,2.6%


In [160]:
pct_v2 = avg_ratings['Preference'].value_counts(normalize=True).mul(100).round(1).astype(str) + '%'
pct_v2 = pct_v2.reset_index()


Unnamed: 0,index,Preference
0,Neutral,63.7%
1,Online Interface Fan,17.3%
2,Mobile App Fan,16.4%
3,Mobile App Superfan,2.6%


In [164]:
# Change column name
pct_v2.columns =['Preference', '% of Total']

In [166]:
# Nicely formatted
pct_v2

Unnamed: 0,Preference,% of Total
0,Neutral,63.7%
1,Online Interface Fan,17.3%
2,Mobile App Fan,16.4%
3,Mobile App Superfan,2.6%
