# Challenge 3

In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

## Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [70]:
# import required libraries
import numpy as np
import pandas as pd
import matplotlib as plt

Next, extract and import `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to overview the data:

In [71]:
# your code here
import patoolib
patoolib.extract_archive('Orders.zip')

patool: Extracting Orders.zip ...
patool: running "C:\Program Files\7-Zip\7z.EXE" x -o.\Unpack_u1vlu3fr -- Orders.zip
patool: ... Orders.zip extracted to `Orders' (local file exists).


'Orders'

In [72]:
orders = pd.read_csv('Orders.csv')
orders.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


---

"Identify VIP and Preferred Customers" is the non-technical goal of your boss. You need to translate that goal into technical languages that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the  `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps.

In [73]:
# your code here
customers_spent = orders.groupby(['CustomerID']).agg({'amount_spent':'sum'}).reset_index()
customers_spent

Unnamed: 0,CustomerID,amount_spent
0,12346,77183.60
1,12347,4310.00
2,12348,1797.24
3,12349,1757.55
4,12350,334.40
...,...,...
4334,18280,180.60
4335,18281,80.82
4336,18282,178.05
4337,18283,2094.88


In [74]:
np.quantile(customers_spent,0.75)
VIP_customers = customers_spent[customers_spent['amount_spent']>np.quantile(customers_spent,0.95)]
Preferred_customers = customers_spent[(customers_spent['amount_spent']>np.quantile(customers_spent,0.75)) & (customers_spent['amount_spent']<np.quantile(customers_spent,0.95))]

VIP_customers_list = VIP_customers['CustomerID'].to_list()
Preferred_customers_list = Preferred_customers['CustomerID'].to_list()

In [78]:
orders['customer_label'] = 1

def label(code):
    if code in VIP_customers_list:
        return "VIP"
    if code in Preferred_customers_list:
        return "Pref"
    else:
        return "Normal"
    
orders['customer_label'] = orders['CustomerID'].apply(label)
orders['customer_label'].value_counts()
orders.sample(10)

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,customer_label
385177,522606,580403,23554,2011,12,7,11,landmark frame oxford street,1,2011-12-04 11:56:00,12.5,17858,United Kingdom,12.5,Normal
362743,491809,578074,21327,2011,11,2,16,skulls writing set,1,2011-11-22 16:06:00,1.65,17590,United Kingdom,1.65,Normal
168209,239240,558035,22720,2011,6,5,12,set of 3 cake tins pantry design,3,2011-06-24 12:30:00,4.95,15258,United Kingdom,14.85,Normal
380382,516105,579867,84029G,2011,11,3,16,knitted union flag hot water bottle,4,2011-11-30 16:41:00,4.25,16265,United Kingdom,17.0,Normal
278012,384678,570168,47590A,2011,10,5,14,blue happy birthday bunting,3,2011-10-07 14:05:00,5.45,17596,United Kingdom,16.35,Normal
191493,275072,560931,47599A,2011,7,5,10,pink party bags,26,2011-07-22 10:25:00,2.1,16365,United Kingdom,54.6,Normal
218029,308153,563940,23344,2011,8,1,9,jumbo bag 50's christmas,10,2011-08-22 09:30:00,2.08,17416,United Kingdom,20.8,Normal
367641,498357,578519,22355,2011,11,4,13,charlotte bag suki design,1,2011-11-24 13:56:00,0.85,14591,United Kingdom,0.85,Normal
307052,420362,572889,22621,2011,10,3,13,traditional knitting nancy,1,2011-10-26 13:56:00,1.65,12748,United Kingdom,1.65,VIP
141124,202919,554510,22760,2011,5,2,15,"tray, breakfast in bed",12,2011-05-24 15:51:00,12.75,17389,United Kingdom,153.0,VIP


In [80]:
orders[orders['customer_label']=='VIP']

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,customer_label
106,106,536381,22139,2010,12,3,9,retrospot tea set ceramic 11 pc,23,2010-12-01 09:41:00,4.25,15311,United Kingdom,97.75,VIP
107,107,536381,84854,2010,12,3,9,girly pink tool set,5,2010-12-01 09:41:00,4.95,15311,United Kingdom,24.75,VIP
108,108,536381,22411,2010,12,3,9,jumbo shopper vintage red paisley,10,2010-12-01 09:41:00,1.95,15311,United Kingdom,19.50,VIP
109,109,536381,82567,2010,12,3,9,"airline lounge,metal sign",2,2010-12-01 09:41:00,2.10,15311,United Kingdom,4.20,VIP
110,110,536381,21672,2010,12,3,9,white spot red ceramic drawer knob,6,2010-12-01 09:41:00,1.25,15311,United Kingdom,7.50,VIP
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397874,541859,581580,37500,2011,12,5,12,tea time teapot in gift box,1,2011-12-09 12:20:00,4.95,12748,United Kingdom,4.95,VIP
397880,541865,581583,20725,2011,12,5,12,lunch bag red retrospot,40,2011-12-09 12:23:00,1.45,13777,United Kingdom,58.00,VIP
397881,541866,581583,85038,2011,12,5,12,6 chocolate love heart t-lights,36,2011-12-09 12:23:00,1.85,13777,United Kingdom,66.60,VIP
397882,541867,581584,20832,2011,12,5,12,red flock love heart photo frame,72,2011-12-09 12:25:00,0.72,13777,United Kingdom,51.84,VIP


Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:

## Q2: How to identify which country has the most VIP Customers?

In [125]:
# your code here
country_classification = orders.groupby(['Country', 'customer_label']).agg({'amount_spent':'count'}).reset_index()
country_classification = country_classification.rename(columns={'amount_spent':'number_customers'})

country_classification

Unnamed: 0,Country,customer_label,number_customers
0,Australia,Normal,469
1,Australia,VIP,716
2,Austria,Normal,398
3,Bahrain,Normal,17
4,Belgium,Normal,2031
5,Brazil,Normal,32
6,Canada,Normal,151
7,Channel Islands,Normal,748
8,Cyprus,Normal,614
9,Czech Republic,Normal,25


In [126]:
country_classification[country_classification['customer_label']=='VIP'].max()

Country             United Kingdom
customer_label                 VIP
number_customers             34515
dtype: object

## Q3: How to identify which country has the most VIP+Preferred Customers combined?

In [129]:
# your code here
combination = country_classification[(country_classification['customer_label']!='Normal')]
combination

Unnamed: 0,Country,customer_label,number_customers
1,Australia,VIP,716
12,EIRE,VIP,7077
16,France,Pref,165
17,France,VIP,274
19,Germany,VIP,460
25,Japan,VIP,197
30,Netherlands,VIP,2080
36,Singapore,VIP,222
39,Sweden,VIP,198
44,United Kingdom,Pref,3683


In [131]:
combination.groupby(['Country']).agg({'number_customers':'sum'}).reset_index().max()

Country             United Kingdom
number_customers             38198
dtype: object