# Retail Lab (Advanced Data Filtering)

**Learning Objectives:**
  * Apply advanced data filtering techniques
  * Gain exposure to retail-related datasets

## Context of the datasets

### 1. There are three datasets: `articles.csv.zip`, `customers.csv.zip` and `transactions2020.csv.zip`

#### 2. The Articles dataset contains information about the products available.
#### 3. The Customers dataset contains information about registered customers.
#### 4. The Transactions dataset contains the purchases of articles made by customers.



## 1. Library Import

In [1]:
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
warnings.simplefilter('ignore')

## 2. Data loading and DataFrame creation

In [3]:
Articles=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/articles.csv.zip")

In [4]:
Articles.head(3)

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.


In [5]:
Articles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105542 entries, 0 to 105541
Data columns (total 25 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   article_id                    105542 non-null  int64 
 1   product_code                  105542 non-null  int64 
 2   prod_name                     105542 non-null  object
 3   product_type_no               105542 non-null  int64 
 4   product_type_name             105542 non-null  object
 5   product_group_name            105542 non-null  object
 6   graphical_appearance_no       105542 non-null  int64 
 7   graphical_appearance_name     105542 non-null  object
 8   colour_group_code             105542 non-null  int64 
 9   colour_group_name             105542 non-null  object
 10  perceived_colour_value_id     105542 non-null  int64 
 11  perceived_colour_value_name   105542 non-null  object
 12  perceived_colour_master_id    105542 non-null  int64 
 13 

In [6]:
Customers=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/customers.csv.zip")

In [7]:
Customers.sample(3)

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
582651,6cb6dc8307e1fd628576e350d7e8893bb65e85c2aedac1...,,,ACTIVE,NONE,24.0,9e155b7f32a3669881860d90edd7d56cf2c8f29bd587c0...
1081068,c9b9210dfb750ce323c747360b00c5a2d6cd1adeeb6b51...,1.0,1.0,ACTIVE,Regularly,18.0,9200292167ef7b206d121b1fc2df5da2fa192ef94ebb4e...
476941,5913fcd3fc585514680a4dbd937829cf611d845008353e...,,,ACTIVE,NONE,27.0,ad2004a0d2d52191396c29fe328eadbe9df079164a4212...


In [8]:
Customers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1371980 entries, 0 to 1371979
Data columns (total 7 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  --------------    -----  
 0   customer_id             1371980 non-null  object 
 1   FN                      476930 non-null   float64
 2   Active                  464404 non-null   float64
 3   club_member_status      1365918 non-null  object 
 4   fashion_news_frequency  1355969 non-null  object 
 5   age                     1356119 non-null  float64
 6   postal_code             1371980 non-null  object 
dtypes: float64(3), object(4)
memory usage: 73.3+ MB


In [9]:
Transactions=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/transactions2020.csv.zip",parse_dates=['t_dat'])

In [10]:
Transactions.sample(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
4378112,2020-09-01,a77a87aabb87d181f22e09ff9c6005f548fdcdaa36c271...,850446001,0.015237,1
1959370,2020-07-04,9b739a9bd2d3672f88188cdd2a237c64b89060cb94f81b...,658298007,0.022017,1
2991681,2020-07-29,4943f64e939f5a2f2b8f942d2f60bc97e1b5e2f8d918cc...,855080011,0.025407,2


In [11]:
Transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5151470 entries, 0 to 5151469
Data columns (total 5 columns):
 #   Column            Dtype         
---  ------            -----         
 0   t_dat             datetime64[ns]
 1   customer_id       object        
 2   article_id        int64         
 3   price             float64       
 4   sales_channel_id  int64         
dtypes: datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 196.5+ MB


## 3. Merging DataFrames

#### 3.1. Transactions-Articles


In [12]:
Transactions.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2


In [13]:
Articles.head(3)

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.


In [14]:
## we merge both DataFrames using the common key: article_id. We store the result in a new DataFrame
TransactionsAndArticles=pd.merge(Transactions, Articles, how='left',on='article_id')
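
A left merge keeps every transaction, but it can silently duplicate rows if the key is not unique on the right-hand side, and it fills article columns with `NaN` when no match exists. The sketch below shows one way to check both conditions using the `validate` and `indicator` parameters of `pd.merge`. The toy DataFrames and their values are hypothetical stand-ins, not the lab's data.

```python
import pandas as pd

# Hypothetical miniature versions of Transactions and Articles
transactions = pd.DataFrame({
    "article_id": [101, 102, 101, 999],
    "price": [0.02, 0.03, 0.02, 0.05],
})
articles = pd.DataFrame({
    "article_id": [101, 102],
    "prod_name": ["Strap top", "Cisco skirt"],
})

# validate="many_to_one" raises a MergeError if article_id is duplicated in `articles`;
# indicator=True adds a `_merge` column recording where each row found its match
merged = pd.merge(
    transactions, articles,
    how="left", on="article_id",
    validate="many_to_one", indicator=True,
)

# With a unique right-hand key, a left merge must preserve the number of transactions
assert len(merged) == len(transactions)

# Rows tagged "left_only" are transactions whose article_id is absent from `articles`
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

The same two parameters can be passed to the merges in this lab; whether any unmatched rows actually exist in the real data is not something this sketch asserts.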

#### 3.2. Transactions-Articles-Customers

In [15]:
TransactionsAndArticles.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,Nightwear,B,Lingeries/Tights,1,Ladieswear,62,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Trousers & Skirt,A,Ladieswear,1,Ladieswear,18,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned..."
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Expressive Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined..."


In [16]:
Customers.head(3)

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,,,ACTIVE,NONE,25.0,2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,,,ACTIVE,NONE,24.0,64f17e6a330a85798e4998f62d0930d14db8db1c054af6...


In [17]:
## we merge both DataFrames using the common key: customer_id. We store the result in a new DataFrame
TransactionsAndArticlesAndCustomers=pd.merge(TransactionsAndArticles, Customers, how='left',on='customer_id')

In [18]:
TransactionsAndArticlesAndCustomers.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...,,,ACTIVE,NONE,40.0,0c0e15f8fa88a1d4aa6ca8a0b4a8289ca1affbaebdea22...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned...",1.0,1.0,ACTIVE,Regularly,59.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined...",,,ACTIVE,NONE,23.0,8d4ceb946237cf52ce5c2a1a71d1221fde77627a52d661...


In [19]:
TransactionsAndArticlesAndCustomers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5151470 entries, 0 to 5151469
Data columns (total 35 columns):
 #   Column                        Dtype         
---  ------                        -----         
 0   t_dat                         datetime64[ns]
 1   customer_id                   object        
 2   article_id                    int64         
 3   price                         float64       
 4   sales_channel_id              int64         
 5   product_code                  int64         
 6   prod_name                     object        
 7   product_type_no               int64         
 8   product_type_name             object        
 9   product_group_name            object        
 10  graphical_appearance_no       int64         
 11  graphical_appearance_name     object        
 12  colour_group_code             int64         
 13  colour_group_name             object        
 14  perceived_colour_value_id     int64         
 15  perceived_colour_value_name   ob

In [20]:
# t_dat was already parsed as datetime64 by parse_dates in read_csv; this cast is a harmless safeguard

TransactionsAndArticlesAndCustomers['t_dat'] = pd.to_datetime(TransactionsAndArticlesAndCustomers['t_dat'])

In [21]:
TransactionsAndArticlesAndCustomers

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...,,,ACTIVE,NONE,40.0,0c0e15f8fa88a1d4aa6ca8a0b4a8289ca1affbaebdea22...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned...",1.0,1.0,ACTIVE,Regularly,59.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined...",,,ACTIVE,NONE,23.0,8d4ceb946237cf52ce5c2a1a71d1221fde77627a52d661...
3,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,869811005,0.016932,2,869811,OLEANDER LINEN STRAP TOP,253,Vest top,Garment Upper body,...,Womens Casual,1005,Jersey Fancy,Top in slub jersey made from a viscose and lin...,,,ACTIVE,NONE,23.0,8d4ceb946237cf52ce5c2a1a71d1221fde77627a52d661...
4,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,823118004,0.025407,2,823118,Ginger Top,298,Bikini top,Swimwear,...,"Womens Swimwear, beachwear",1018,Swimwear,"Fully lined, non-wired bikini top with adjusta...",,,ACTIVE,NONE,23.0,8d4ceb946237cf52ce5c2a1a71d1221fde77627a52d661...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5151465,2020-09-22,fff2282977442e327b45d8c89afde25617d00124d0f999...,929511001,0.059305,2,929511,POPPY PU SHIRT DRESS,265,Dress,Garment Full body,...,Divided Collection,1013,Dresses Ladies,Short shirt dress in soft imitation leather wi...,1.0,1.0,ACTIVE,Regularly,32.0,2695d7727a61ed8011f93de47dc9445017596302bd0592...
5151466,2020-09-22,fff2282977442e327b45d8c89afde25617d00124d0f999...,891322004,0.042356,2,891322,FENNEL SHIRT DRESS,-1,Unknown,Unknown,...,Divided Collection,1013,Dresses Ladies,Short shirt dress in a cotton weave with a col...,1.0,1.0,ACTIVE,Regularly,32.0,2695d7727a61ed8011f93de47dc9445017596302bd0592...
5151467,2020-09-22,fff380805474b287b05cb2a7507b9a013482f7dd0bce0e...,918325001,0.043203,1,918325,Winter shopper,66,Bag,Accessories,...,Womens Big accessories,1019,Accessories,"Lightly padded, quilted shopper in a recycled ...",,,ACTIVE,NONE,67.0,a9c9c4db44316f6e62ea17ba5e8b84c1ec3ebeddb3f299...
5151468,2020-09-22,fff4d3a8b1f3b60af93e78c30a7cb4cf75edaf2590d3e5...,833459002,0.006763,1,833459,Class Aligator Ring Pack,79,Ring,Accessories,...,Womens Small accessories,1019,Accessories,Thin metal rings in various designs.,1.0,1.0,ACTIVE,Regularly,21.0,3737324e2574c3bde9ef00336bc767781dbed7e828d51a...


## Let's select the transactions made through sales channel 2 by customers younger than 25

In [29]:
TransactionsAndArticlesAndCustomers[(TransactionsAndArticlesAndCustomers['sales_channel_id']==2)&(TransactionsAndArticlesAndCustomers['age']<25)]

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
88,2020-06-01,009fadd7d8f7728e20f79989ff9417472600aa1d78ccb7...,877014008,0.027102,1,877014,Kiwi,265,Dress,Garment Full body,...,Womens Everyday Collection,1013,Dresses Ladies,"Calf-length, A-line dress in an airy weave. V-...",1.0,1.0,ACTIVE,Regularly,22.0,57668778b1b828d18279ab2b92b053f7ed9fd3e7f2509a...
138,2020-06-01,00ff5a441b177aa71f151d322c39c33a3893d10ee7b258...,814230002,0.025407,1,814230,Farsta cardigan,245,Cardigan,Garment Upper body,...,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear","Cardigan in a soft, fine rib knit with a V-nec...",,,ACTIVE,NONE,18.0,dcf777331baa3651e9ae09211f035fb8920cfe0d2a031b...
139,2020-06-01,00ff5a441b177aa71f151d322c39c33a3893d10ee7b258...,857271001,0.050831,1,857271,SF Salma dress,265,Dress,Garment Full body,...,Womens Everyday Collection,1023,Special Offers,"Short, fitted dress in an organic cotton weave...",,,ACTIVE,NONE,18.0,dcf777331baa3651e9ae09211f035fb8920cfe0d2a031b...
160,2020-06-01,01261383a410c34c2d58465149c2f512284073af09180f...,876053002,0.015237,1,876053,Ruby,258,Blouse,Garment Upper body,...,Womens Everyday Collection,1010,Blouses,Blouse in a crêpe weave with a sweetheart neck...,1.0,1.0,ACTIVE,Regularly,20.0,da86a3203f54b43378bbd797e5eff6be6d449ee69e8ab2...
165,2020-06-01,013b0a66e02db1b92989bb57df6ddbbdb6286ce480e190...,664074059,0.016932,1,664074,Charlie Top,254,Top,Garment Upper body,...,Womens Tailoring,1005,Jersey Fancy,Straight-cut top in airy jersey crêpe with a b...,,,ACTIVE,NONE,21.0,4e0f8d9dea2abbf49af9667b9c7cc04bf9692cf535f98f...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5151424,2020-09-22,ff6bc0a652d313d04a8c5852c7e740b9e351b3f9420e2e...,866714016,0.016932,1,866714,Illiana top,254,Top,Garment Upper body,...,Womens Everyday Collection,1005,Jersey Fancy,Round-necked top in soft cotton jersey. Short ...,1.0,1.0,ACTIVE,Regularly,22.0,70322bfc8f931e43a4d2e600dffafcc9d470b927e0926c...
5151425,2020-09-22,ff6bc0a652d313d04a8c5852c7e740b9e351b3f9420e2e...,922625005,0.025407,1,922625,SPEED SISSY SMOCK BLOUSE,258,Blouse,Garment Upper body,...,Divided Collection,1010,Blouses,Short blouse in woven fabric with a small fril...,1.0,1.0,ACTIVE,Regularly,22.0,70322bfc8f931e43a4d2e600dffafcc9d470b927e0926c...
5151435,2020-09-22,ff813df6887c2a6d7065aed247bf1db3d6f629eee23798...,907702003,0.016932,1,907702,Tanga LS,254,Top,Garment Upper body,...,Divided Collection,1005,Jersey Fancy,Long-sleeved top in ribbed jersey with a round...,1.0,1.0,ACTIVE,Regularly,23.0,5e7aad7a13b389ef920bea1d7d8bbc13329abb96db22b4...
5151436,2020-09-22,ff813df6887c2a6d7065aed247bf1db3d6f629eee23798...,806388002,0.013542,1,806388,Therese tee,255,T-shirt,Garment Upper body,...,Divided Basics,1002,Jersey Basic,Wide T-shirt in soft cotton jersey with a ribb...,1.0,1.0,ACTIVE,Regularly,23.0,5e7aad7a13b389ef920bea1d7d8bbc13329abb96db22b4...
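
Combining conditions with `&` requires parentheses around each comparison, and long one-line filters are hard to read. As a minimal sketch on hypothetical toy data, each condition can be stored in a named boolean mask first. The example also shows that rows with a missing `age` are excluded, because a comparison against `NaN` evaluates to `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the last row has a missing age
df = pd.DataFrame({
    "sales_channel_id": [1, 2, 2, 2],
    "age": [22.0, 18.0, 40.0, np.nan],
})

# One named mask per condition
is_channel_2 = df["sales_channel_id"] == 2
is_under_25 = df["age"] < 25  # NaN < 25 is False, so missing ages drop out

# Element-wise AND of the two masks selects the matching rows
young_channel_2 = df[is_channel_2 & is_under_25]

# The same filter written with DataFrame.query
same_rows = df.query("sales_channel_id == 2 and age < 25")

print(young_channel_2)
```

Only the row at index 1 satisfies both conditions in this toy example.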


## Let's compute total sales for the section 'Contemporary Smart'

In [36]:
TransactionsAndArticlesAndCustomers[TransactionsAndArticlesAndCustomers['section_name']=='Contemporary Smart']['price'].sum()

1240.6871186440653

## Let's compute total sales for the section 'Contemporary Smart' in channel 1

In [38]:
TransactionsAndArticlesAndCustomers[(TransactionsAndArticlesAndCustomers['section_name']=='Contemporary Smart')&(TransactionsAndArticlesAndCustomers['sales_channel_id']==1)]['price'].sum()

644.1384237288123
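
The two computations above filter first and then sum. When totals are needed for several section and channel combinations, a single `groupby` on both columns produces all of them at once. The following sketch uses hypothetical toy values and also shows `.loc[mask, column]`, which selects rows and column in one step instead of chaining two `[]` lookups.

```python
import pandas as pd

# Hypothetical toy sales data
sales = pd.DataFrame({
    "section_name": ["Contemporary Smart", "Contemporary Smart",
                     "Womens Trend", "Contemporary Smart"],
    "sales_channel_id": [1, 2, 1, 1],
    "price": [0.25, 0.5, 0.125, 0.75],
})

# Filter-then-sum, written with .loc[row_mask, column]
mask = (sales["section_name"] == "Contemporary Smart") & (sales["sales_channel_id"] == 1)
filtered_total = sales.loc[mask, "price"].sum()

# All section/channel totals in one pass; the result is a Series with a two-level index
totals = sales.groupby(["section_name", "sales_channel_id"])["price"].sum()

# Look up one combination, or sum over all channels of a section
channel_1_total = totals.loc[("Contemporary Smart", 1)]
section_total = totals.loc["Contemporary Smart"].sum()

print(totals)
```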

## Let's find customers who have spent more than 5 dollars in total, are younger than 22 and have an active club member status

In [39]:
TransactionsAndArticlesAndCustomers.head(3)

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...,,,ACTIVE,NONE,40.0,0c0e15f8fa88a1d4aa6ca8a0b4a8289ca1affbaebdea22...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned...",1.0,1.0,ACTIVE,Regularly,59.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined...",,,ACTIVE,NONE,23.0,8d4ceb946237cf52ce5c2a1a71d1221fde77627a52d661...


In [42]:
TransactionsAndArticlesAndCustomers.groupby('customer_id')['price'].sum().sort_values(ascending=False)

customer_id,price
863f0e03da282ae32a76775ce55d8a4605a85c84a26066e1ad0e9469e8c40e68,18.907119
b637a3e7d8b0caa947aaefd609b8d84a9ee962cf0a52a51bac507ffc2bf1b741,14.198542
77db96923d20d40532eba0020b55cd91eb51358885c2d698a2805e79481f64a1,12.672356
a3ab708684132c6bbd3dad7aa41f9b9c7d1c95d7d5cb1a3a052905191e858566,11.934508
60c8dfc36653461f03d6001b77e7cf6182cf2d71f914c9295f7287126ae8da32,11.912068
...,...
eb2a909b3bcbf655dd65768b1f1f376fb0b7440b9ea2679c4c7c6e4f3ae69db7,0.001000
8e3747991dc8271595c759c8f9ce23c91ebd6622e96d9366187d7b157956c00b,0.000847
b6345f8bf3fdb4a9833a812e11aaebfd6e2f256e8834d4f7c5f0621c1c83cf26,0.000847
48a3a2afda539d63f27c44c4fca6076a8582489299a05895b64950654a7f52b4,0.000780


In [46]:
CummulativeSpending=TransactionsAndArticlesAndCustomers.groupby('customer_id')['price'].sum()
CummulativeSpending

customer_id,price
00000dbacae5abe5e23885899a1fa44253a17956c6d1c3d25f88aa139fdfc657,0.050831
0000423b00ade91418cceaf3b26c6af3dd342b51fd051eec9c12fb36984420fa,0.027102
000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318,0.061000
00006413d8573cd20ed7128e53b7b13819fe5cfc2d801fe7fc0f26dd8d65a85a,0.255814
0000757967448a6cb83efb3ea7a3fb9d418ac7adf2379d8cd0c725276a467a2a,0.076237
...,...
ffff8f9ecdce722b5bab97fff68a6d1866492209bfe5242c50d2a10a652fb5ef,0.135542
ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e4747568cac33e8c541831,0.413271
ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab53481233731b5c4f8b7,0.104949
ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1778d0116cffd259264,0.142203


In [52]:

# Select customers whose total spending is larger than 5
customers_totalspending_5 = CummulativeSpending[CummulativeSpending > 5]

customers_totalspending_5

customer_id,price
03d0011487606c37c1b1ed147fc72f285a50c05f00b9712e0fc3da400c864296,7.540746
03fdb0bf2d9ff8ba23e1b4aef53709119aad5bc83691d89293a01a52b93d7370,8.358831
0a2de6c095c0771aa5bc2153c37938a07e7daa2bdcd159c8b534c779cb58a276,10.098441
0bb6244a4efaed0303a7cfe18ffd69ffee81e70247693f900b8d9ec674e40473,7.053576
0d8e6c1ea7890ce90968f017361e439f4ea9091de795e9eeac8c5fcb3b92ddc2,5.635034
...,...
f3f3b83a093df7d7f3c15797fd429efc11eaa6e3c75c6c34ee27c881a073afba,7.161508
f69cf6fca69045a8259f9554e318e00fbf5e8e758e88b1f78775239603f7ea61,8.363898
fb142114110364dee048fc92bf615ae06f65306ae7ffeacc78f659bdb628909d,5.601475
fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b77d03e0e1bd85feda0,5.238237
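
The pattern used here has two steps: aggregate per customer, then filter the aggregated Series with a boolean condition. A minimal sketch on hypothetical toy data makes the intermediate values easy to verify by hand.

```python
import pandas as pd

# Hypothetical toy transactions
tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "price": [3.0, 2.5, 1.0, 2.0, 2.0, 2.0],
})

# Step 1: total spending per customer (a Series indexed by customer_id)
spending = tx.groupby("customer_id")["price"].sum()

# Step 2: keep only the customers whose total exceeds the threshold
big_spenders = spending[spending > 5]

# The index of the filtered Series holds the matching customer ids
print(big_spenders.index.tolist())
```

Customers `a` (5.5) and `c` (6.0) pass the threshold, while `b` (1.0) does not.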


In [53]:
customers_totalspending_5.index

Index(['03d0011487606c37c1b1ed147fc72f285a50c05f00b9712e0fc3da400c864296',
       '03fdb0bf2d9ff8ba23e1b4aef53709119aad5bc83691d89293a01a52b93d7370',
       '0a2de6c095c0771aa5bc2153c37938a07e7daa2bdcd159c8b534c779cb58a276',
       '0bb6244a4efaed0303a7cfe18ffd69ffee81e70247693f900b8d9ec674e40473',
       '0d8e6c1ea7890ce90968f017361e439f4ea9091de795e9eeac8c5fcb3b92ddc2',
       '12600404ef642873ccc3902e2f26a000d087972775e5d1f720594b0e2a175b54',
       '1320d4b3dd6481cde05bb80fb7ca37397f70470b9afb96aeca5d41175acaf836',
       '1585f1b9e407e6e267f58b2c6041a93373e6ecb0963416bdc804738b588bdae8',
       '191071b0e1f2e94a557f1a0b4cea3de55faf1581b1f46466ffe90664f73ec96e',
       '1de6f4aff9a8ac7086f047831bdc76c4cf3e2f42b1b25fcbf66bb119046ee996',
       ...
       'f113a8afa4021872bca78c9eca49c02e4cf176c5965b22b14044aa2da37020ef',
       'f137c16fd175271922dad4006565503952f24750a57388fe24970a218c62de6a',
       'f16593e3ca0700fad7541b11eff11a330c1b1d3e03c60142c3e07747ecfe27b8',
       'f24845

In [60]:
TransactionsAndArticlesAndCustomers[(TransactionsAndArticlesAndCustomers['customer_id'].isin(customers_totalspending_5.index))&(TransactionsAndArticlesAndCustomers['age']<22)&(TransactionsAndArticlesAndCustomers['club_member_status']=='ACTIVE')]

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
2527775,2020-07-17,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,892626001,0.006763,2,892626,Class Carlie Hoop pk,70,Earring,Accessories,...,Womens Small accessories,1019,Accessories,Thin metal hoop earrings in various sizes and ...,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
2527776,2020-07-17,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,904199001,0.022017,2,904199,Cool Sully earcuff pk,70,Earring,Accessories,...,Womens Small accessories,1019,Accessories,Set with earrings and ear cuffs in metal decor...,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
2527777,2020-07-17,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,903876001,0.011847,2,903876,Flirty Lynn hoop,70,Earring,Accessories,...,Womens Small accessories,1019,Accessories,Metal hoop earrings with pearly plastic beads.,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
2565547,2020-07-18,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,915444001,0.025407,2,915444,Cool Paddington earcuff pk,70,Earring,Accessories,...,Womens Small accessories,1019,Accessories,Earrings and ear cuffs in metal decorated with...,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
2565548,2020-07-18,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,837941001,0.033881,2,837941,Tio NW straight denim,272,Trousers,Garment Lower body,...,Womens Trend,1010,Blouses,5-pocket jeans in washed denim with a regular ...,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5151301,2020-09-22,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,936217002,0.059305,2,936217,Embrace Skirt,275,Skirt,Garment Lower body,...,Ladies Denim,1016,Trousers Denim,"Short, 5-pocket skirt with a high waist and zi...",,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
5151302,2020-09-22,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,936217002,0.059305,2,936217,Embrace Skirt,275,Skirt,Garment Lower body,...,Ladies Denim,1016,Trousers Denim,"Short, 5-pocket skirt with a high waist and zi...",,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
5151303,2020-09-22,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,906352001,0.059305,2,906352,Lava,252,Sweater,Garment Upper body,...,Womens Everyday Collection,1003,Knitwear,Relaxed-fit jumper in a soft cable knit contai...,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
5151304,2020-09-22,fe75707094c6332e3caa45ecda2a3d08c7a7caef739d0b...,914805002,0.050831,2,914805,Fife dress,254,Top,Garment Upper body,...,Womens Casual,1003,Knitwear,Calf-length dress in a soft knit containing so...,,,ACTIVE,NONE,21.0,f76af8517c381b16eaf7a9073ed97429fefe5d822d075c...
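
The result above contains one row per transaction, so a customer who qualifies appears many times. If the goal is a list of customers, one possible approach (a sketch on hypothetical toy data, not the lab's prescribed solution) is to broadcast each customer's total back onto the rows with `groupby(...).transform("sum")`, apply all three conditions in one mask, and then drop duplicate rows.

```python
import pandas as pd

# Hypothetical toy data: one row per transaction, customer attributes repeated
tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "price": [3.0, 2.5, 1.0, 2.0, 2.0, 2.0],
    "age": [21.0, 21.0, 19.0, 30.0, 30.0, 30.0],
    "club_member_status": ["ACTIVE"] * 6,
})

# transform("sum") returns a Series aligned with tx: each row gets its customer's total
total_spent = tx.groupby("customer_id")["price"].transform("sum")

# All three conditions combined in a single row-level mask
mask = (
    (total_spent > 5)
    & (tx["age"] < 22)
    & (tx["club_member_status"] == "ACTIVE")
)

# Keep only customer-level columns, then collapse to one row per customer
matching_customers = (
    tx.loc[mask, ["customer_id", "age", "club_member_status"]]
    .drop_duplicates()
)

print(matching_customers)
```

In this toy example only customer `a` satisfies all three conditions: `b` spends too little and `c` is too old.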
