# Retail Lab (Time Series)

**Learning Objectives:**
  * Apply text processing techniques
  * Gain exposure to retail-related datasets

## Context of the datasets

1. There are three datasets: `articles.csv.zip`, `customers.csv.zip` and `transactions2020.csv.zip`.
2. The Articles dataset contains information about the available products.
3. The Customers dataset contains information about registered customers.
4. The Transactions dataset contains the purchases of articles made by customers.



## 1. Library Import

In [1]:
import pandas as pd
import warnings
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
warnings.simplefilter('ignore')
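`warnings.simplefilter('ignore')` silences every warning for the rest of the session, including pandas warnings that can point to real problems. A narrower alternative is to suppress warnings only inside a `with warnings.catch_warnings():` block. A minimal sketch, where `noisy_step` is a hypothetical stand-in for any call that emits a warning:

```python
import warnings

def noisy_step():
    # Hypothetical function standing in for any call that emits a warning
    warnings.warn("example warning", UserWarning)
    return 42

filters_before = list(warnings.filters)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # only active inside this block
    result = noisy_step()            # the warning is silenced here

# catch_warnings restores the previous filter list when the block exits
filters_after = list(warnings.filters)
```

Outside the block, warnings are handled by whatever filters were in place before it.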

## 2. Data loading and DataFrame creation

In [3]:
Articles=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/articles.csv.zip")
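`pd.read_csv` accepts the URL of a zipped CSV directly: with the default `compression='infer'`, pandas picks the decompression method from the `.zip` suffix, and the archive must contain a single file. A minimal offline sketch of the same mechanism, using a hypothetical toy archive in a temporary directory instead of the GitHub files:

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a small zip archive holding one CSV file (toy data, not the real articles file)
csv_text = "article_id,prod_name\n108775015,Strap top\n108775044,Strap top\n"
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "toy_articles.csv.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("toy_articles.csv", csv_text)

# compression="infer" (the default) detects zip from the ".zip" suffix ...
inferred = pd.read_csv(zip_path)
# ... which is equivalent to naming the compression explicitly
explicit = pd.read_csv(zip_path, compression="zip")
```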

In [4]:
Articles.head(3)

,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.


In [5]:
Articles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105542 entries, 0 to 105541
Data columns (total 25 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   article_id                    105542 non-null  int64 
 1   product_code                  105542 non-null  int64 
 2   prod_name                     105542 non-null  object
 3   product_type_no               105542 non-null  int64 
 4   product_type_name             105542 non-null  object
 5   product_group_name            105542 non-null  object
 6   graphical_appearance_no       105542 non-null  int64 
 7   graphical_appearance_name     105542 non-null  object
 8   colour_group_code             105542 non-null  int64 
 9   colour_group_name             105542 non-null  object
 10  perceived_colour_value_id     105542 non-null  int64 
 11  perceived_colour_value_name   105542 non-null  object
 12  perceived_colour_master_id    105542 non-null  int64 
 ...

In [6]:
Customers=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/customers.csv.zip")

In [7]:
Customers.sample(3)

,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
389573,48d1f8dbb09254515fc28c65f9d9e638890a27dd91a6b2...,,,ACTIVE,NONE,20.0,77877b59d3ea31f6b92e0265470f7c14cc4b9500d452ee...
445147,53372e5a495238af8b823172cf0b0496f7e0f1324bf2a9...,,,ACTIVE,NONE,46.0,da0bbaaed411a796e0f3fe9d83c799c4e7a081dbca5f26...
1303907,f35058bb737bbcec400c6f6f92000eb676ca80738f8fa3...,,,ACTIVE,NONE,29.0,8e58f188f877bf922956381974378c1c9084d0a851e54b...


In [8]:
Customers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1371980 entries, 0 to 1371979
Data columns (total 7 columns):
 #   Column                  Non-Null Count    Dtype  
---  ------                  --------------    -----  
 0   customer_id             1371980 non-null  object 
 1   FN                      476930 non-null   float64
 2   Active                  464404 non-null   float64
 3   club_member_status      1365918 non-null  object 
 4   fashion_news_frequency  1355969 non-null  object 
 5   age                     1356119 non-null  float64
 6   postal_code             1371980 non-null  object 
dtypes: float64(3), object(4)
memory usage: 73.3+ MB
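The `Non-Null Count` column shows that `FN` and `Active` are missing for roughly two-thirds of customers (476,930 and 464,404 non-null values out of 1,371,980). The share of missing values per column can be computed directly with `isna().mean()`; here on a hypothetical toy frame that reuses the column names:

```python
import numpy as np
import pandas as pd

# Toy stand-in for Customers: FN/Active are often NaN, age occasionally
toy_customers = pd.DataFrame({
    "customer_id": ["a", "b", "c", "d"],
    "FN": [1.0, np.nan, np.nan, np.nan],
    "Active": [1.0, np.nan, np.nan, 1.0],
    "age": [49.0, 25.0, np.nan, 24.0],
})

# isna() marks missing cells; the mean of a boolean column is the share of True values
missing_share = toy_customers.isna().mean().sort_values(ascending=False)
print(missing_share)
```

On the real data the same call would be `Customers.isna().mean()`.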


In [9]:
Transactions=pd.read_csv("https://github.com/thousandoaks/Python4DS-I/raw/main/datasets/transactions2020.csv.zip")

In [10]:
Transactions.sample(3)

,t_dat,customer_id,article_id,price,sales_channel_id
1367834,2020-06-25,3dd2c55ad8eeebb7565517b3dcbd6663ffc4fb56117af3...,876327001,0.010153,1
4229318,2020-08-28,c7e3984a6751ada06f9630fa47e6d7ff7a469a4d0fdd85...,918292001,0.042356,2
1563079,2020-06-27,afcf094f564b93cdba75438414ba0ea737a12cc9f6b082...,878510002,0.015237,2


In [11]:
Transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5151470 entries, 0 to 5151469
Data columns (total 5 columns):
 #   Column            Dtype  
---  ------            -----  
 0   t_dat             object 
 1   customer_id       object 
 2   article_id        int64  
 3   price             float64
 4   sales_channel_id  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 196.5+ MB
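`t_dat` is stored as `object` (plain strings), so date-based operations are not yet available on it. For a time series analysis it is usually converted with `pd.to_datetime`. The explicit `format='%Y-%m-%d'` below is an assumption based on the sample values shown above (e.g. `2020-06-25`). A sketch on toy data:

```python
import pandas as pd

# Toy transactions with dates stored as strings, like t_dat above
toy_tx = pd.DataFrame({
    "t_dat": ["2020-06-01", "2020-06-25", "2020-09-22"],
    "price": [0.016932, 0.010153, 0.023712],
})

# Parse the strings into datetime64 values (format assumed to be YYYY-MM-DD)
toy_tx["t_dat"] = pd.to_datetime(toy_tx["t_dat"], format="%Y-%m-%d")

# The .dt accessor now exposes date parts such as the month
months = toy_tx["t_dat"].dt.month.tolist()
```

On the real data this would be `Transactions['t_dat'] = pd.to_datetime(Transactions['t_dat'], format='%Y-%m-%d')`.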


## 3. Merging DataFrames

#### 3.1. Transactions-Articles


In [12]:
Transactions.head(3)

,t_dat,customer_id,article_id,price,sales_channel_id
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2


In [13]:
Articles.head(3)

,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.


In [14]:
## Merge both DataFrames on the common key article_id (a left join keeps every transaction) and store the result in a new DataFrame
TransactionsAndArticles=pd.merge(Transactions, Articles, how='left',on='article_id')
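With `how='left'`, every transaction is kept even if its `article_id` has no match in `Articles` (the article columns are then NaN). Two optional `pd.merge` arguments help check this: `validate='many_to_one'` raises a `MergeError` if the key is not unique on the right side, and `indicator=True` adds a `_merge` column recording where each row was found. A toy sketch with made-up ids:

```python
import pandas as pd

toy_tx = pd.DataFrame({"article_id": [1, 1, 2, 3], "price": [0.01, 0.02, 0.03, 0.04]})
toy_articles = pd.DataFrame({"article_id": [1, 2], "prod_name": ["Strap top", "Cisco skirt"]})

merged = pd.merge(
    toy_tx, toy_articles,
    how="left", on="article_id",
    validate="many_to_one",  # each article_id must appear at most once in toy_articles
    indicator=True,          # adds a categorical _merge column: "both" or "left_only"
)

# A left many-to-one merge keeps exactly one output row per transaction
n_rows_match = len(merged) == len(toy_tx)
# Transactions whose article_id was not found in toy_articles
unmatched = merged.loc[merged["_merge"] == "left_only", "article_id"].tolist()
```

The same check applies to the customer merge below, with `customer_id` as the key.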

#### 3.2. Transactions-Articles-Customers

In [15]:
TransactionsAndArticles.head(3)

,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,2020-06-01,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,844198001,0.016932,2,844198,Saturn trs (J),296,Pyjama bottom,Nightwear,...,Nightwear,B,Lingeries/Tights,1,Ladieswear,62,"Womens Nightwear, Socks & Tigh",1017,"Under-, Nightwear",Pyjama bottoms in sweatshirt fabric with wide ...
1,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Trousers & Skirt,A,Ladieswear,1,Ladieswear,18,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned..."
2,2020-06-01,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...,820507001,0.010153,2,820507,Charlotte Hipster Primula,286,Underwear bottom,Underwear,...,Expressive Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Hipster briefs in lace with a mid waist, lined..."


In [16]:
Customers.head(3)

,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,,,ACTIVE,NONE,25.0,2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,,,ACTIVE,NONE,24.0,64f17e6a330a85798e4998f62d0930d14db8db1c054af6...


In [17]:
## Merge both DataFrames on the common key customer_id (a left join keeps every transaction) and store the result in a new DataFrame
TransactionsAndArticlesAndCustomers=pd.merge(TransactionsAndArticles, Customers, how='left',on='customer_id')

In [42]:
TransactionsAndArticlesAndCustomers.head(3).T

t_dat,2020-06-01,2020-06-01,2020-06-01
t_dat,2020-06-01 00:00:00,2020-06-01 00:00:00,2020-06-01 00:00:00
customer_id,00075ef36696a7b4ed8c83e22a4bf7ea7c90ee110991ec...,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,002d8d26c9414c981c012c6f5e4b2de7ffd3bc568c4574...
article_id,844198001,777016001,820507001
price,0.016932,0.030492,0.010153
sales_channel_id,2,1,2
product_code,844198,777016,820507
prod_name,Saturn trs (J),Cisco skirt,Charlotte Hipster Primula
product_type_no,296,275,286
product_type_name,Pyjama bottom,Skirt,Underwear bottom
product_group_name,Nightwear,Garment Lower body,Underwear
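In this output (and in the filtered result at the end of the section) `t_dat` appears both as the index and as a datetime column, while the `.info()` output of cell [19], which ran earlier, still reports a `RangeIndex` and `t_dat` as `object`. This suggests that cells not shown here converted `t_dat` with `pd.to_datetime` and set it as the index while keeping the column. One way to reach that layout, offered as an assumption about those missing cells, is `set_index('t_dat', drop=False)`; a toy sketch:

```python
import pandas as pd

toy = pd.DataFrame({
    "t_dat": ["2020-06-01", "2020-06-01", "2020-09-22"],
    "article_id": [844198001, 777016001, 562245018],
})

# Convert the strings to datetimes, then use them as the index;
# drop=False keeps t_dat as a regular column as well
toy["t_dat"] = pd.to_datetime(toy["t_dat"])
toy = toy.set_index("t_dat", drop=False)

# A sorted DatetimeIndex allows label-based slicing with partial date strings
june_rows = toy.loc["2020-06"]
```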


In [19]:
TransactionsAndArticlesAndCustomers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5151470 entries, 0 to 5151469
Data columns (total 35 columns):
 #   Column                        Dtype  
---  ------                        -----  
 0   t_dat                         object 
 1   customer_id                   object 
 2   article_id                    int64  
 3   price                         float64
 4   sales_channel_id              int64  
 5   product_code                  int64  
 6   prod_name                     object 
 7   product_type_no               int64  
 8   product_type_name             object 
 9   product_group_name            object 
 10  graphical_appearance_no       int64  
 11  graphical_appearance_name     object 
 12  colour_group_code             int64  
 13  colour_group_name             object 
 14  perceived_colour_value_id     int64  
 15  perceived_colour_value_name   object 
 16  perceived_colour_master_id    int64  
 17  perceived_colour_master_name  object 
 18  department_no         

## 4. Let's find all rows where (1) the column `garment_group_name` contains the word "Trousers" AND (2) the column `index_group_name` contains the word "Ladieswear"

In [43]:
TransactionsAndArticlesAndCustomers.groupby('index_group_name').count()

index_group_name,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
Baby/Children,88425,88425,88425,88425,88425,88425,88425,88425,88425,88425,...,88425,88425,88425,88407,41298,40628,88104,88045,87914,88425
Divided,1080962,1080962,1080962,1080962,1080962,1080962,1080962,1080962,1080962,1080962,...,1080962,1080962,1080962,1080956,490449,482639,1079239,1078667,1077290,1080962
Ladieswear,3448578,3448578,3448578,3448578,3448578,3448578,3448578,3448578,3448578,3448578,...,3448578,3448578,3448578,3444994,1554595,1532315,3444011,3440093,3435019,3448578
Menswear,303848,303848,303848,303848,303848,303848,303848,303848,303848,303848,...,303848,303848,303848,303846,139553,137577,303522,303134,302691,303848
Sport,229657,229657,229657,229657,229657,229657,229657,229657,229657,229657,...,229657,229657,229657,229653,94380,93073,229389,229023,228752,229657
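`groupby(...).count()` counts the non-null values in each of the other 34 columns, which is a lot of work when only the number of rows per group is needed (and the counts differ between columns wherever values are missing). For group sizes alone, `value_counts()` or `groupby(...).size()` is lighter; a toy example:

```python
import pandas as pd

toy = pd.DataFrame({
    "index_group_name": ["Ladieswear", "Divided", "Ladieswear", "Sport", "Ladieswear"],
    "price": [0.03, 0.01, None, 0.02, 0.05],
})

# Rows per group, sorted from most to least frequent
sizes = toy["index_group_name"].value_counts()

# Same numbers via groupby; size() counts rows, including rows with NaN elsewhere
sizes_groupby = toy.groupby("index_group_name").size()
```

By contrast, `toy.groupby("index_group_name")["price"].count()` gives 2 for Ladieswear, because one price is missing.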


In [44]:
TransactionsAndArticlesAndCustomers.groupby('garment_group_name').count()

garment_group_name,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_no,section_name,garment_group_no,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
Accessories,254328,254328,254328,254328,254328,254328,254328,254328,254328,254328,...,254328,254328,254328,254310,117210,115181,254038,253477,253312,254328
Blouses,431929,431929,431929,431929,431929,431929,431929,431929,431929,431929,...,431929,431929,431929,431900,206972,204203,431363,430942,430385,431929
Dressed,72407,72407,72407,72407,72407,72407,72407,72407,72407,72407,...,72407,72407,72407,72362,33675,33238,72306,72270,72140,72407
Dresses Ladies,388742,388742,388742,388742,388742,388742,388742,388742,388742,388742,...,388742,388742,388742,388722,189229,186668,388196,387975,387559,388742
Dresses/Skirts girls,4067,4067,4067,4067,4067,4067,4067,4067,4067,4067,...,4067,4067,4067,4067,1996,1966,4060,4057,4050,4067
Jersey Basic,586898,586898,586898,586898,586898,586898,586898,586898,586898,586898,...,586898,586898,586898,586890,256654,252834,585932,585429,584441,586898
Jersey Fancy,877520,877520,877520,877520,877520,877520,877520,877520,877520,877520,...,877520,877520,877520,877473,395462,389857,876399,875260,874030,877520
Knitwear,226507,226507,226507,226507,226507,226507,226507,226507,226507,226507,...,226507,226507,226507,226505,103797,102439,226133,225986,225715,226507
Outdoor,70172,70172,70172,70172,70172,70172,70172,70172,70172,70172,...,70172,70172,70172,70172,32565,32149,70045,70037,69941,70172
Shirts,33970,33970,33970,33970,33970,33970,33970,33970,33970,33970,...,33970,33970,33970,33970,15762,15546,33948,33894,33824,33970


In [48]:
LadiesWearFilter=TransactionsAndArticlesAndCustomers['index_group_name'].str.contains('Ladieswear')
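`str.contains` is case-sensitive by default, and on `object` columns (as in the `.info()` outputs above) it returns NaN for missing values in pandas 1.x/2.x; a mask containing NaN cannot be used directly for boolean indexing. The `case` and `na` parameters control both behaviours. A sketch on a toy Series:

```python
import numpy as np
import pandas as pd

groups = pd.Series(["Ladieswear", "ladieswear", "Menswear", np.nan], dtype=object)

# Default call: case-sensitive, missing values are not turned into True/False
default_mask = groups.str.contains("Ladieswear")

# case=False ignores capitalisation; na=False treats missing values as "no match"
safe_mask = groups.str.contains("Ladieswear", case=False, na=False)

matches = groups[safe_mask].tolist()
```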

In [50]:
regexPattern=r'\bTrousers\b'

In [53]:
TrousersFilter=TransactionsAndArticlesAndCustomers['garment_group_name'].str.contains(regexPattern, regex=True)
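In `r'\bTrousers\b'`, `\b` is a regex word boundary: the pattern matches "Trousers" as a whole word anywhere in the string (for example also in a value such as "Trousers Denim") but not when it is glued to other word characters. The raw-string prefix `r` keeps Python from reading `\b` as a backspace character. A toy check with made-up values:

```python
import pandas as pd

garments = pd.Series(["Trousers", "Trousers Denim", "Shorts", "TrousersX", "Jersey"])

# \b needs a boundary between a word character and a non-word character (or the string edge)
mask = garments.str.contains(r"\bTrousers\b", regex=True)
matched = garments[mask].tolist()

# Without the r prefix, "\b" is a backspace character, so the pattern would differ
backspace_is_different = "\bTrousers\b" != r"\bTrousers\b"
```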

In [55]:
TransactionsAndArticlesAndCustomers[LadiesWearFilter & TrousersFilter]

t_dat,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
2020-06-01,2020-06-01,000b31552d3785c79833262bbeefa484cbc43d7b612b3c...,777016001,0.030492,1,777016,Cisco skirt,275,Skirt,Garment Lower body,...,Womens Trend,1009,Trousers,"Calf-length skirt in softly draping, patterned...",1.0,1.0,ACTIVE,Regularly,59.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...
2020-06-01,2020-06-01,00357392d470d99fd23483a8bee556346b687cc357f237...,880867004,0.050831,2,880867,Juniper playsuit,272,Trousers,Garment Lower body,...,Womens Casual,1009,Trousers,Sleeveless playsuit in a viscose and linen wea...,,,ACTIVE,NONE,32.0,fb06cbc1bd9e3d9037d586c7f790f99f8c491620be9d03...
2020-06-01,2020-06-01,00357392d470d99fd23483a8bee556346b687cc357f237...,871974001,0.059305,2,871974,Tarly indigo tencel playsuit,272,Trousers,Garment Lower body,...,Womens Casual,1009,Trousers,"Playsuit in soft, washed Tencel™ lyocell denim...",,,ACTIVE,NONE,32.0,fb06cbc1bd9e3d9037d586c7f790f99f8c491620be9d03...
2020-06-01,2020-06-01,003ecc4871f0f5f1722c36e00e345d76204096c524b68a...,757347001,0.050831,2,757347,LOGG Iris linen jogger',272,Trousers,Garment Lower body,...,H&M+,1009,Trousers,Trousers in airy linen. Relaxed fit with a reg...,1.0,1.0,ACTIVE,Regularly,34.0,d4539e3081df4f09c0eb92737df51f880f7367821add99...
2020-06-01,2020-06-01,0040e2fc2d1e7931a38355aca56b2c62b87e65051b7287...,862325002,0.036000,2,862325,Felix HW,272,Trousers,Garment Lower body,...,Womens Everyday Collection,1009,Trousers,Ankle-length trousers in twill made from a vis...,,,ACTIVE,NONE,39.0,8d19ecf2057b7199b28648c237af6b048c495118325581...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-09-22,2020-09-22,fe99a0069d6b3c64c2707d0ce53b9311540917471d82df...,562245018,0.023712,2,562245,Luna skinny RW,272,Trousers,Garment Lower body,...,Womens Everyday Collection,1009,Trousers,"5-pocket jeans in washed, superstretch denim w...",1.0,1.0,ACTIVE,Regularly,48.0,3fc97fb9464411bfb0f0739a0c5040d0bc1e94cdc02363...
2020-09-22,2020-09-22,febbccd6dd7885f2501bb56c47af50b122e9f3ab9a7472...,836540005,0.064390,2,836540,Push up ankle (D),272,Trousers,Garment Lower body,...,Mama,1009,Trousers,"Ankle-length jeggings in washed, stretch denim...",1.0,1.0,ACTIVE,Regularly,33.0,3c32ab05679c20304e50b754b9d33447e62adb39b7665f...
2020-09-22,2020-09-22,fee9fcf27b395c43053f36697af5766fdaaf428b13f8ef...,886241001,0.027102,1,886241,Bahamas jumpsuit,267,Jumpsuit/Playsuit,Garment Full body,...,Womens Everyday Collection,1009,Trousers,"Jumpsuit in woven fabric with wide, flounce-tr...",1.0,1.0,ACTIVE,Regularly,28.0,cd63d5c1451b873e246e0f0bcda632b53244e02120e41c...
2020-09-22,2020-09-22,ffb72741f3bc3d98855703b55d34e05bc7893a5d6a99a3...,751471041,0.033881,2,751471,Pluto RW slacks (1),272,Trousers,Garment Lower body,...,Womens Everyday Collection,1009,Trousers,Ankle-length cigarette trousers in a stretch w...,,,ACTIVE,NONE,26.0,97f87a076cd480253739cadad692d95aa2c636c6f5141f...
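The two masks are combined element-wise with `&` (logical AND); `|` gives OR and `~` negates a mask. Python's `and` does not work element-wise on a Series (it raises a `ValueError` about an ambiguous truth value), and when conditions are written inline each one needs parentheses because `&` binds more tightly than `==`. A toy sketch with made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "index_group_name": ["Ladieswear", "Ladieswear", "Menswear", "Divided"],
    "garment_group_name": ["Trousers", "Blouses", "Trousers", "Trousers Denim"],
})

ladies = toy["index_group_name"].str.contains("Ladieswear")
trousers = toy["garment_group_name"].str.contains(r"\bTrousers\b", regex=True)

# Element-wise AND of the two boolean masks
both = toy[ladies & trousers]

# Inline version with exact equality: parentheses are required around each comparison
both_inline = toy[(toy["index_group_name"] == "Ladieswear")
                  & (toy["garment_group_name"] == "Trousers")]
```

On this toy data both forms keep only the first row; on real data, exact equality and `str.contains` can select different rows (e.g. "Trousers Denim").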
