# Challenge 3

In this challenge we will work on the `Orders.csv` data set from the previous [Subsetting and Descriptive Stats lab](../../lab-subsetting-and-descriptive-stats/your-code/main.ipynb). You will apply the thinking process and workflow we showed you in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Today your boss asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers**, whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (a.k.a. the 0.95 quantile). The second group is **Preferred Customers**, whose **aggregated expenses** are **between the 75th and 95th percentiles**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

# Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [1]:
# import required libraries
import numpy as np
import pandas as pd

Next, import `Orders.csv` from the "subsetting" lab folder into a dataframe variable called `orders`. Print the head of `orders` to overview the data:

In [2]:
# enter your code here
orders = pd.read_csv("../../lab-subsetting-and-descriptive-stats/your-code/Orders.csv")
orders.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


---

"Identify VIP and Preferred Customers" is your boss's non-technical goal. You need to translate that goal into the technical language that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now, in the workspace below, tackle each of the sub problems using the iterative problem-solving workflow. Insert cells as necessary to write your code and explain your steps.

In [3]:
#Subproblem1 How to aggregate the amount_spent for unique customers?
agg_exp = pd.DataFrame(orders.groupby('CustomerID')['amount_spent'].sum().sort_values(ascending=False))
#print(len(agg_exp))
#agg_exp.head()
agg_exp

CustomerID,amount_spent
14646,280206.02
18102,259657.30
17450,194550.79
16446,168472.50
14911,143825.06
12415,124914.53
14156,117379.63
17511,91062.38
16029,81024.84
12346,77183.60


In [4]:
#Subproblem2 How to select customers whose aggregated amount_spent is in a given quantile range: 80th percentile.
#Note: the brief asks for the 95th percentile; this cell uses the 80th.
agg_exp_80 = agg_exp.quantile([.80])
agg_exp_80
#print(len(agg_exp_50a75))
print("Customers who spent 2057.914 or more are at or above the 80th percentile")

VIP = agg_exp[agg_exp["amount_spent"] >= 2057.914]
VIP = pd.DataFrame(VIP)
VIP

Customers who spent 2057.914 or more are at or above the 80th percentile


CustomerID,amount_spent
14646,280206.02
18102,259657.30
17450,194550.79
16446,168472.50
14911,143825.06
12415,124914.53
14156,117379.63
17511,91062.38
16029,81024.84
12346,77183.60
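
The cell above types the cutoff in by hand. As a minimal sketch of an alternative, the thresholds can be read from the data with `Series.quantile` and applied directly, here using the 75th and 95th percentiles named in the brief. The `toy_orders` frame and its values are made up for illustration, and treating "between" as inclusive on both ends is an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for Orders.csv (values are made up).
toy_orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 4, 5],
    "amount_spent": [10.0, 20.0, 5.0, 100.0, 50.0, 40.0, 8.0],
})

# Total spend per customer.
spend = toy_orders.groupby("CustomerID")["amount_spent"].sum()

# Compute the cutoffs from the data instead of hardcoding them.
q75 = spend.quantile(0.75)
q95 = spend.quantile(0.95)

# VIP: strictly above the 95th percentile.
vip = spend[spend > q95]
# Preferred: between the 75th and 95th percentiles (inclusive bounds assumed).
preferred = spend[(spend >= q75) & (spend <= q95)]

print(vip)
print(preferred)
```

Because the cutoffs are variables, rerunning the cell on updated data keeps the groups consistent with the percentile definition.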


In [5]:
#Preferred: customers who spent at least the 50th percentile (and less than the 80th)
agg_exp_50 = agg_exp.quantile([.50])
print("Preferred customers are those who spent at least 674.45")

Preferred = agg_exp[(agg_exp["amount_spent"] >= 674.45) & (agg_exp["amount_spent"] < 2057.914)]
Preferred = pd.DataFrame(Preferred)
Preferred


Preferred customers are those who spent at least 674.45


CustomerID,amount_spent
12518,2056.89
17667,2055.51
14188,2054.36
16150,2053.02
17086,2050.08
17612,2048.45
14232,2048.07
17625,2047.00
13495,2044.87
17597,2044.37


In [71]:
#Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?
#Answer: add a column to the original data frame that labels customers at or above the
#80th percentile as VIP

VIP.columns #"CustomerID" is being used as the index, so it does not appear among the columns

#We need to extract the index values
Index_VIP = list(VIP.index)
Index_Preferred = list(Preferred.index)

In [65]:
#Create a column in orders based on whether the CustomerID is in the VIP or Preferred index list

#ATTEMPT 1
#orders['Status']='Preferred' #create a column whose default value is "Preferred"
#orders.loc[orders.CustomerID.isin(Index_VIP), 'Status']='VIP'

#ATTEMPT 2
#This works: the first line labels VIP (everyone else NORMAL), the second relabels Preferred customers.
orders["Tipo_cliente"] = np.where(orders["CustomerID"].isin(Index_VIP), "VIP", "NORMAL") 
orders["Tipo_cliente"] = np.where(orders["CustomerID"].isin(Index_Preferred), "PREFERIDO", orders["Tipo_cliente"])

In [66]:
orders["Tipo_cliente"].value_counts()

VIP          231251
PREFERIDO    107869
NORMAL        58804
Name: Tipo_cliente, dtype: int64
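
As a hedged alternative to chaining two `np.where` calls, the labels can be assigned once per customer with `np.select` and then mapped back onto the order rows. This is only a sketch: the toy data, the `customer_type` column name and the English labels are assumptions, not part of the notebook above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Orders.csv (values are made up).
toy_orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 4, 5],
    "amount_spent": [10.0, 20.0, 5.0, 100.0, 50.0, 40.0, 8.0],
})

# Total spend per customer and the two percentile cutoffs.
spend = toy_orders.groupby("CustomerID")["amount_spent"].sum()
q75, q95 = spend.quantile(0.75), spend.quantile(0.95)

# np.select returns the choice for the first condition that is True,
# so the VIP test must come before the broader Preferred test.
labels = pd.Series(
    np.select([spend > q95, spend >= q75], ["VIP", "Preferred"], default="Normal"),
    index=spend.index,
)

# Map each order row to its customer's label.
toy_orders["customer_type"] = toy_orders["CustomerID"].map(labels)
print(toy_orders)
```

Labeling at the customer level first keeps a single source of truth for each customer's group, whichever table it is later joined to.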

In [72]:
orders.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,Tipo_cliente
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3,VIP
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0,VIP
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,VIP


In [69]:
#RAMIRO'S NOTES. IGNORE :)

#Next, the "VIP_Preferido" variable still needs to be set to normal for those who spent less than 674.45
#orders["Tipo_cliente"] = np.where(orders["CustomerID"].isin(Index_Preferred), "PREFERIDO", "NORMAL")

#for x in orders["Tipo_cliente"] == "PREFERIDO":
#    if x in Index_Preferred:
#        orders["Tipo_cliente"] == "PREFERIDO"
#    else:
#        orders["Tipo_cliente"] == "NORMAL"

#np.where(consumption_energy > 400, 'high', 
#         (np.where(consumption_energy < 200, 'low', 'medium')))

#orders["Tipo_cliente"] = np.where(orders["Tipo_cliente"]=="PREFERIDO", np.where(orders["CustomerID"].isin(Index_Preferred), "PREFERIDO", "NORMAL"))
#dists[(np.where((dists >= r) & (dists <= r + dr)))]

#orders["Tipo_cliente"] = np.where((orders["Tipo_cliente"]=="PREFERIDO") & (orders["CustomerID"].isin(Index_Preferred), "PREFERIDO", "NORMAL"))

#x['Continent'] = np.where(x['Country'].isin(europe), 'Europe', 'Not Europe')

#EXAMPLE
#cols = ['Name','Country','Income']
#europe = ['UK','France']
#x['New Column']='Not Europe'
#x.loc[x.Country.isin(europe),'New Column']='Europe'

Now we'll leave it to you to solve Q2 & Q3, for which you can build on your solution to Q1:

# Q2: How to identify which country has the most VIP Customers?

# Q3: How to identify which country has the most VIP+Preferred Customers combined?

Provide your solution for Q2 below:

In [67]:
#Q2: How to identify which country has the most VIP Customers?
print("The UK has the most VIP customers")
orders.groupby(["Tipo_cliente", "Country"]).size().sort_values(ascending=False)

The UK has the most VIP customers


Tipo_cliente  Country             
VIP           United Kingdom          198270
PREFERIDO     United Kingdom          100775
NORMAL        United Kingdom           55300
VIP           EIRE                      7238
              Germany                   6806
              France                    6112
              Netherlands               2080
              Spain                     1569
PREFERIDO     Germany                   1469
VIP           Switzerland               1307
PREFERIDO     France                    1301
VIP           Belgium                   1256
              Portugal                  1024
              Australia                  998
              Norway                     941
NORMAL        France                     929
              Germany                    767
PREFERIDO     Belgium                    665
              Spain                      659
VIP           Channel Islands            492
              Cyprus                     451
PREFERIDO     Switze
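
Note that `groupby(...).size()` counts order lines, not customers, so one heavy buyer can dominate a country's total. A minimal sketch of counting distinct customers instead, using `nunique` on `CustomerID`; the `toy` frame is made-up data that reuses the notebook's `Tipo_cliente` and `Country` column names.

```python
import pandas as pd

# Hypothetical toy data: customer 1 placed three order lines in the UK.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 4],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "France", "France", "France"],
    "Tipo_cliente": ["VIP", "VIP", "VIP", "VIP", "VIP", "NORMAL"],
})

vip_rows = toy[toy["Tipo_cliente"] == "VIP"]

# Order lines per country: every row counts.
rows_per_country = vip_rows.groupby("Country").size()

# Distinct VIP customers per country: each CustomerID counts once.
customers_per_country = vip_rows.groupby("Country")["CustomerID"].nunique()

print(rows_per_country.idxmax())       # country with the most VIP order lines
print(customers_per_country.idxmax())  # country with the most VIP customers
```

In this toy example the two measures disagree, which is why it is worth deciding up front whether the question is about orders or about customers.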

In [73]:
#Q3: How to identify which country has the most VIP+Preferred Customers combined?

print("The country with the most VIP and Preferred customers is the UK")
Paises_VIP_PREF = orders[orders["Tipo_cliente"] != "NORMAL"]
Paises_VIP_PREF["Country"].value_counts()

The country with the most VIP and Preferred customers is the UK


United Kingdom          299045
Germany                   8275
France                    7413
EIRE                      7238
Netherlands               2291
Spain                     2228
Belgium                   1921
Switzerland               1751
Portugal                  1229
Australia                 1163
Norway                    1028
Channel Islands            731
Italy                      650
Finland                    568
Cyprus                     451
Austria                    362
Sweden                     359
Denmark                    336
Japan                      284
Poland                     284
Singapore                  222
Israel                     214
Iceland                    182
USA                        146
Canada                     135
Greece                     123
Malta                      112
United Arab Emirates        68
European Community          60
RSA                         58
Unspecified                 56
Lebanon                     45
Lithuani
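
The counts above are also order lines. As a sketch of a customer-level version for Q3, one option is to keep the VIP and Preferred rows with `isin`, drop repeated customer/country pairs, and only then call `value_counts`. The `toy` frame is made-up data; only the column names and labels come from the notebook.

```python
import pandas as pd

# Hypothetical toy data reusing the notebook's column names and labels.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 4, 5],
    "Country": ["United Kingdom", "United Kingdom", "Germany",
                "Germany", "Germany", "France"],
    "Tipo_cliente": ["VIP", "VIP", "PREFERIDO", "VIP", "NORMAL", "PREFERIDO"],
})

# Keep only VIP and Preferred rows.
selected = toy[toy["Tipo_cliente"].isin(["VIP", "PREFERIDO"])]

# One row per (customer, country) pair, then count customers per country.
combined = (
    selected.drop_duplicates(subset=["CustomerID", "Country"])["Country"]
    .value_counts()
)

print(combined)
```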