# Challenge 3

In this challenge we will work on the `Orders.csv` data set from the previous [Subsetting and Descriptive Stats lab](../../lab-subsetting-and-descriptive-stats/your-code/main.ipynb). You will apply the thinking process and workflow shown in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Today your boss asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers**, whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (i.e. the 0.95 quantile). The second group is **Preferred Customers**, whose **aggregated expenses** are **between the 75th and 95th percentiles**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

# Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [1]:
# import required libraries

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import matplotlib
%matplotlib inline


Next, import `Orders.csv` from the "subsetting" lab folder into a dataframe variable called `orders`. Print the head and tail of `orders` to get an overview of the data:

In [3]:
# enter your code here
orders = pd.read_csv("Orders.csv")
orders.head()
orders.tail()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
397919,541904,581587,22613,2011,12,5,12,pack of 20 spaceboy napkins,12,2011-12-09 12:50:00,0.85,12680,France,10.2
397920,541905,581587,22899,2011,12,5,12,children's apron dolly girl,6,2011-12-09 12:50:00,2.1,12680,France,12.6
397921,541906,581587,23254,2011,12,5,12,childrens cutlery dolly girl,4,2011-12-09 12:50:00,4.15,12680,France,16.6
397922,541907,581587,23255,2011,12,5,12,childrens cutlery circus parade,4,2011-12-09 12:50:00,4.15,12680,France,16.6
397923,541908,581587,22138,2011,12,5,12,baking set 9 piece retrospot,3,2011-12-09 12:50:00,4.95,12680,France,14.85


---

"Identify VIP and Preferred Customers" is your boss's non-technical goal. You need to translate that goal into the technical language that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the  `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now, in the workspace below, tackle each of the sub problems using the iterative problem-solving workflow. Insert cells as necessary to write your code and explain your steps.

### Sub Problem 1: Aggregate `amount_spent`

In [11]:
# total amount spent per customer, largest first
agr_customers = orders.groupby('CustomerID')['amount_spent'].sum()
agr_customers = agr_customers.sort_values(ascending=False)
print("Number of Customers:", agr_customers.count())
agr_customers.head()

Number of Customers: 4339


CustomerID
14646    280206.02
18102    259657.30
17450    194550.79
16446    168472.50
14911    143825.06
Name: amount_spent, dtype: float64

### Sub Problem 2: Select the upper quantile

In [12]:
# customers whose aggregated spend is at or above the 75th percentile
best = agr_customers[agr_customers >= agr_customers.quantile(0.75)]
print("Number of Best:", best.count())
best

Number of Best: 1085


CustomerID
14646    280206.020
18102    259657.300
17450    194550.790
16446    168472.500
14911    143825.060
12415    124914.530
14156    117379.630
17511     91062.380
16029     81024.840
12346     77183.600
16684     66653.560
14096     65164.790
13694     65039.620
15311     60767.900
13089     58825.830
17949     58510.480
15769     56252.720
15061     54534.140
14298     51527.300
14088     50491.810
15749     44534.300
12931     42055.960
17841     40991.570
15098     39916.500
13798     37153.850
16013     37130.600
16422     34684.400
12748     33719.730
15838     33643.080
17404     31906.820
17389     31833.680
13098     28882.440
14680     28754.110
13081     28337.380
13408     28117.040
17857     26879.040
16333     26626.800
13777     25977.160
12753     21429.390
12744     21279.290
16210     21086.300
17675     20374.280
17381     20275.610
15039     19914.440
12471     19824.050
12731     18895.910
15159     18641.010
12901     17654.540
12678     17628.460
14031    
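The cell above keeps every customer at or above the 0.75 quantile, so VIP and Preferred customers are still mixed together. A minimal sketch of separating the two groups follows, run on invented totals rather than on `Orders.csv`. The helper name `select_quantile_range`, the toy `totals` values, and the choice to treat the lower bound as exclusive and the upper bound as inclusive are all assumptions made for illustration.

```python
import pandas as pd


def select_quantile_range(totals, lower, upper=1.0):
    """Hypothetical helper: keep entries above the `lower` quantile
    and at or below the `upper` quantile of `totals`."""
    lo = totals.quantile(lower)
    hi = totals.quantile(upper)
    return totals[(totals > lo) & (totals <= hi)]


# Invented aggregated spend for 20 made-up customers (10, 20, ..., 190, 1000)
totals = pd.Series(
    list(range(10, 200, 10)) + [1000],
    index=pd.RangeIndex(1, 21, name="CustomerID"),
    name="amount_spent",
)

# VIP: above the 0.95 quantile
vip = select_quantile_range(totals, 0.95)

# Preferred: above the 0.75 quantile, up to and including the 0.95 quantile
preferred = select_quantile_range(totals, 0.75, 0.95)

print("VIP customers:", len(vip))
print("Preferred customers:", len(preferred))
```

Because the two ranges share the 0.95 cut-off but only one of them includes it, no customer can land in both groups.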

### Sub Problem 3: Tag customers

In [28]:
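# labels for the four quartile bins, lowest to highest
tags = ['low', 'medium', 'prefered', 'VIP']

# rank(method='first') breaks ties so that qcut can form four equal-sized bins;
# note that this bins individual order lines by amount_spent, not customers by aggregated spend
orders['BinCustomer'] = pd.qcut(orders['amount_spent'].rank(method='first').values, 4, labels=tags)

orders.head()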


Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,BinCustomer
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3,prefered
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0,VIP
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
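The `pd.qcut` call above splits individual order lines into quartiles of `amount_spent`, so a single customer can receive several labels: CustomerID 17850 shows up as both `prefered` and `VIP` in the output. One way to give each customer a single label based on aggregated spend and the 0.75 / 0.95 cut-offs from the task might look like the sketch below. The toy `orders_demo` frame, the `Regular` label, and the `CustomerTier` column name are illustrative assumptions, not part of the lab.

```python
import numpy as np
import pandas as pd

# Invented order lines: customer 1 spends heavily across several rows
orders_demo = pd.DataFrame({
    "CustomerID":   [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8],
    "amount_spent": [500, 400, 300, 90, 60, 100, 50, 40, 30, 20, 10],
})

# Sub Problem 1: total spend per customer
totals = orders_demo.groupby("CustomerID")["amount_spent"].sum()

# Sub Problem 2: cut-offs computed on customer totals, not on order lines
q75, q95 = totals.quantile([0.75, 0.95])

# Sub Problem 3: exactly one label per customer
tiers = pd.cut(
    totals,
    bins=[-np.inf, q75, q95, np.inf],
    labels=["Regular", "Preferred", "VIP"],
)

# Copy each customer's label onto all of their order lines
orders_demo["CustomerTier"] = orders_demo["CustomerID"].map(tiers)

print(tiers)
```

With this layout every row belonging to a customer carries the same tier, which makes the later per-country counts easier to reason about.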


Now we'll leave it to you to solve Q2 & Q3, which can build on your solution for Q1:

# Q2: How to identify which country has the most VIP Customers?

# Q3: How to identify which country has the most VIP+Preferred Customers combined?

Provide your solution for Q2 below:

# Q2

In [42]:
# order lines tagged as VIP
vips = orders[orders['BinCustomer'] == 'VIP']

# number of VIP order lines per country, largest first
vips_country = vips.groupby(['BinCustomer', 'Country'])['CustomerID'].agg('count')
vips_country.sort_values(ascending=False)

BinCustomer  Country             
VIP          United Kingdom          80891
             Germany                  3386
             France                   3119
             EIRE                     3016
             Netherlands              1940
             Switzerland               865
             Australia                 855
             Belgium                   704
             Spain                     673
             Norway                    546
             Portugal                  492
             Channel Islands           331
             Finland                   322
             Italy                     308
             Sweden                    282
             Japan                     241
             Denmark                   231
             Cyprus                    212
             Singapore                 155
             Austria                   144
             Poland                    132
             Israel                    121
             Iceland
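Note that `agg('count')` counts order lines, so the 80891 reported for the United Kingdom is a number of rows rather than a number of customers. If Q2 is read as "how many distinct VIP customers does each country have", a sketch using `nunique` could look like the following. The `labeled` frame, its values, and the `CustomerTier` column are invented for illustration and assume each customer already carries a single tier.

```python
import pandas as pd

# Invented, already-labeled order lines (one tier per customer)
labeled = pd.DataFrame({
    "CustomerID":   [1, 1, 1, 1, 2, 3, 4, 5],
    "Country":      ["United Kingdom"] * 4 + ["Germany", "Germany", "France", "France"],
    "CustomerTier": ["VIP", "VIP", "VIP", "VIP", "VIP", "VIP", "Preferred", "Regular"],
})

vip_rows = labeled[labeled["CustomerTier"] == "VIP"]

# Rows per country: one frequent buyer inflates this number
rows_per_country = vip_rows.groupby("Country")["CustomerID"].count()

# Distinct customers per country: each customer is counted once
customers_per_country = vip_rows.groupby("Country")["CustomerID"].nunique()

top_country = customers_per_country.idxmax()

print(rows_per_country)
print(customers_per_country)
print("Country with the most VIP customers:", top_country)
```

In this toy data the row count favours the United Kingdom while the distinct-customer count favours Germany, which is why the choice between `count` and `nunique` matters for the answer.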

# Q3

In [45]:
# order lines tagged as either VIP or prefered
vips_pref = orders[orders['BinCustomer'].isin(['VIP', 'prefered'])]
vips_pref.head()
vips_pref.tail()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,BinCustomer
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3,prefered
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0,VIP
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP


Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,BinCustomer
397918,541903,581587,23256,2011,12,5,12,childrens cutlery spaceboy,4,2011-12-09 12:50:00,4.15,12680,France,16.6,prefered
397920,541905,581587,22899,2011,12,5,12,children's apron dolly girl,6,2011-12-09 12:50:00,2.1,12680,France,12.6,prefered
397921,541906,581587,23254,2011,12,5,12,childrens cutlery dolly girl,4,2011-12-09 12:50:00,4.15,12680,France,16.6,prefered
397922,541907,581587,23255,2011,12,5,12,childrens cutlery circus parade,4,2011-12-09 12:50:00,4.15,12680,France,16.6,prefered
397923,541908,581587,22138,2011,12,5,12,baking set 9 piece retrospot,3,2011-12-09 12:50:00,4.95,12680,France,14.85,prefered


In [46]:
# number of VIP + prefered order lines per country, largest first
vips_p_country = vips_pref.groupby(['Country'])['CustomerID'].agg('count')
vips_p_country.sort_values(ascending=False)

Country
United Kingdom          164491
Germany                   7061
France                    6289
EIRE                      5938
Netherlands               2105
Spain                     1648
Belgium                   1615
Switzerland               1497
Australia                 1072
Portugal                  1010
Norway                     938
Channel Islands            667
Italy                      659
Finland                    567
Cyprus                     431
Sweden                     392
Austria                    319
Denmark                    312
Poland                     284
Japan                      257
Singapore                  204
Israel                     176
Iceland                    160
USA                        138
Greece                     118
Canada                     111
Malta                       94
Unspecified                 90
United Arab Emirates        56
European Community          56
RSA                         49
Lebanon                     44
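The totals above are again counts of order lines. For the combined VIP+Preferred group, one possible way to count each customer once is to drop repeated (customer, country) pairs before tallying, as in this sketch. The `labeled` frame and the `CustomerTier` column are invented for illustration.

```python
import pandas as pd

# Invented, already-labeled order lines (one tier per customer)
labeled = pd.DataFrame({
    "CustomerID":   [1, 1, 2, 3, 3, 4, 5, 6],
    "Country":      ["United Kingdom", "United Kingdom", "United Kingdom",
                     "Germany", "Germany", "Germany", "France", "France"],
    "CustomerTier": ["VIP", "VIP", "Preferred", "Preferred",
                     "Preferred", "Regular", "VIP", "Regular"],
})

# Keep order lines that belong to either tier
top_tiers = labeled[labeled["CustomerTier"].isin(["VIP", "Preferred"])]

# One row per (customer, country) pair so repeat orders are not double-counted
unique_customers = top_tiers.drop_duplicates(subset=["CustomerID", "Country"])

# Customers per country, and the country with the largest combined group
combined_per_country = unique_customers["Country"].value_counts()
top_country = combined_per_country.idxmax()

print(combined_per_country)
print("Country with the most VIP+Preferred customers:", top_country)
```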
