Armut, Turkey's largest online service platform, brings together service providers and people who need services.
It gives easy access to services such as cleaning, renovation, and transportation with a few taps on a computer or smartphone.
The goal is to build a product recommendation system with Association Rule Learning, using a dataset of users and the services and categories they purchased.

The dataset consists of the services customers received and the categories of those services.

It also contains the date and time of each service received.

UserId: Customer number
ServiceId: Anonymized services belonging to each category. Example: Sofa washing service under the cleaning category
The same ServiceId can appear under different categories, where it represents a different service.
Example: Under CategoryId 7 a ServiceId is radiator cleaning, while under CategoryId 2 the same ServiceId is furniture assembly
CategoryId: Anonymized categories. Example: Cleaning, transportation, renovation
CreateDate: Date the service was purchased

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
from mlxtend.frequent_patterns import apriori, association_rules

import warnings
warnings.filterwarnings("ignore")

Data Preparation

In [3]:
df_ = pd.read_csv("/content/armut_data.csv")
df = df_.copy()

In [4]:
df.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate
0,25446,4,5,2017-08-06 16:11:00
1,22948,48,5,2017-08-06 16:12:00
2,10618,0,8,2017-08-06 16:13:00
3,7256,9,4,2017-08-06 16:14:00
4,25446,48,5,2017-08-06 16:16:00


ServiceId identifies a different service within each CategoryId, so each service is represented by a new variable, Hizmet ("service"), created by joining ServiceId and CategoryId with "_".

In [5]:
# Vectorized string concatenation builds the combined service key.
df["Hizmet"] = df["ServiceId"].astype(str) + "_" + df["CategoryId"].astype(str)
# Equivalent but slower row-wise alternatives:
# df["Hizmet"] = df[["ServiceId", "CategoryId"]].apply(lambda x: "_".join(x.astype(str)), axis=1)
# df["Hizmet"] = df.apply(lambda x: str(x["ServiceId"]) + "_" + str(x["CategoryId"]), axis=1)
# df["Hizmet"] = [str(row[1]) + "_" + str(row[2]) for row in df.values]

The dataset records the date and time each service was received, but it has no basket definition (invoice, etc.).

To apply Association Rule Learning, a basket definition must be created.

Here, a basket is the set of services a customer received in a given month. For example, for the customer with ID 7256, the services 9_4 and 46_4 received in August 2017 form one basket,
and the services 9_4 and 38_4 received in October 2017 form another. Each basket must be identified by a unique ID.

To do this, first create a new date variable that contains only the year and month. Then join UserId and this date variable with "_" and assign the result to a new variable named SepetID ("basket ID").

In [6]:
df["CreateDate"] = pd.to_datetime(df["CreateDate"])
df.head()

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Hizmet
0,25446,4,5,2017-08-06 16:11:00,4_5
1,22948,48,5,2017-08-06 16:12:00,48_5
2,10618,0,8,2017-08-06 16:13:00,0_8
3,7256,9,4,2017-08-06 16:14:00,9_4
4,25446,48,5,2017-08-06 16:16:00,48_5


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162523 entries, 0 to 162522
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype         
---  ------      --------------   -----         
 0   UserId      162523 non-null  int64         
 1   ServiceId   162523 non-null  int64         
 2   CategoryId  162523 non-null  int64         
 3   CreateDate  162523 non-null  datetime64[ns]
 4   Hizmet      162523 non-null  object        
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 6.2+ MB


In [8]:
df["NEW_DATE"] = df["CreateDate"].dt.strftime("%Y-%m")

In [9]:
df["SepetID"] = df["UserId"].astype(str) + "_" + df["NEW_DATE"]
# Alternative: df["SepetID"] = [str(row[0]) + "_" + str(row[5]) for row in df.values]
# Inspect the baskets of a single customer.
df[df["UserId"] == 7256]

Unnamed: 0,UserId,ServiceId,CategoryId,CreateDate,Hizmet,NEW_DATE,SepetID
3,7256,9,4,2017-08-06 16:14:00,9_4,2017-08,7256_2017-08
1268,7256,46,4,2017-08-09 16:15:00,46_4,2017-08,7256_2017-08
9540,7256,46,4,2017-08-29 03:53:00,46_4,2017-08,7256_2017-08
24679,7256,9,4,2017-10-01 04:59:00,9_4,2017-10,7256_2017-10
24680,7256,38,4,2017-10-01 05:01:00,38_4,2017-10,7256_2017-10
28698,7256,9,4,2017-10-11 08:06:00,9_4,2017-10,7256_2017-10
65325,7256,15,1,2017-12-31 04:17:00,15_1,2017-12,7256_2017-12
67093,7256,2,0,2018-01-03 22:06:00,2_0,2018-01,7256_2018-01
70623,7256,38,4,2018-01-11 13:07:00,38_4,2018-01,7256_2018-01
160299,7256,18,4,2018-07-25 00:51:00,18_4,2018-07,7256_2018-07
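
As a quick check of the basket definition, the sketch below rebuilds the August and October 2017 rows of customer 7256 from the output above and lists the distinct services in each basket. The `sample` frame is a hand-copied subset used only for illustration.

```python
import pandas as pd

# Rows copied from the UserId 7256 output above (only the columns needed here).
sample = pd.DataFrame({
    "SepetID": ["7256_2017-08", "7256_2017-08", "7256_2017-08",
                "7256_2017-10", "7256_2017-10", "7256_2017-10"],
    "Hizmet": ["9_4", "46_4", "46_4", "9_4", "38_4", "9_4"],
})

# Drop repeated (basket, service) pairs, then collect the services of each basket.
baskets = sample.drop_duplicates().groupby("SepetID")["Hizmet"].apply(list)
print(baskets.to_dict())
```

Each key is one basket, so a service bought twice in the same month (46_4 in August 2017) counts only once.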


Generating Association Rules

In [10]:
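# Pivot to a basket x service matrix: 1 if the service occurs in the basket, otherwise 0.
# .gt(0).astype(int) gives the same 0/1 result as applymap, which is deprecated in pandas 2.1+.
invoice_product_df = df.groupby(['SepetID', 'Hizmet'])['Hizmet'].count().unstack().fillna(0).gt(0).astype(int)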

In [11]:
invoice_product_df.head()

SepetID,0_8,10_9,11_11,12_7,13_11,14_7,15_1,16_8,17_5,18_4,19_6,1_4,20_5,21_5,22_0,23_10,24_10,25_0,26_7,27_7,28_4,29_0,2_0,30_2,31_6,32_4,33_4,34_6,35_11,36_1,37_0,38_4,39_10,3_5,40_8,41_3,42_1,43_2,44_0,45_6,46_4,47_7,48_5,49_1,4_5,5_11,6_7,7_3,8_5,9_4
0_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0
0_2017-09,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
0_2018-01,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0
0_2018-04,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10000_2017-08,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
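
The same basket-by-service matrix can be illustrated on a few rows. The sketch below uses `pd.crosstab` instead of the groupby/unstack chain, an alternative formulation rather than the notebook's own method, and the `toy` frame is a small hand-made sample.

```python
import pandas as pd

# Hand-made sample: two baskets of one customer, with one repeated service.
toy = pd.DataFrame({
    "SepetID": ["7256_2017-08", "7256_2017-08", "7256_2017-08",
                "7256_2017-10", "7256_2017-10"],
    "Hizmet": ["9_4", "46_4", "46_4", "9_4", "38_4"],
})

# Count each (basket, service) pair, then cap the counts at 1 to get a presence flag.
basket_matrix = pd.crosstab(toy["SepetID"], toy["Hizmet"]).clip(upper=1)
print(basket_matrix)
```

Rows are baskets and columns are services; a cell is 1 when the service occurs at least once in that basket.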


In [12]:
frequent_itemsets = apriori(invoice_product_df, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(2_0),(13_11),0.130286,0.056627,0.012819,0.098394,1.737574,1.0,0.005442,1.046325,0.488074,0.073635,0.044274,0.162388
1,(13_11),(2_0),0.056627,0.130286,0.012819,0.226382,1.737574,1.0,0.005442,1.124216,0.449965,0.073635,0.110491,0.162388
2,(15_1),(2_0),0.120963,0.130286,0.033951,0.280673,2.154278,1.0,0.018191,1.209066,0.609539,0.156242,0.172915,0.270631
3,(2_0),(15_1),0.130286,0.120963,0.033951,0.260588,2.154278,1.0,0.018191,1.188833,0.616073,0.156242,0.158839,0.270631
4,(15_1),(33_4),0.120963,0.02731,0.011233,0.092861,3.400299,1.0,0.007929,1.072262,0.803047,0.081967,0.067392,0.252086
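
To make the rule metrics concrete, the sketch below recomputes confidence, lift, leverage, and conviction for rule 2, (15_1) -> (2_0), from the three support values printed above. The formulas are the standard definitions, and small differences from the table come from the rounded inputs.

```python
# Values copied from rule 2 in the output above: (15_1) -> (2_0).
antecedent_support = 0.120963   # share of baskets containing 15_1
consequent_support = 0.130286   # share of baskets containing 2_0
support = 0.033951              # share of baskets containing both

# confidence: how often 2_0 appears in baskets that contain 15_1
confidence = support / antecedent_support
# lift: confidence relative to the baseline frequency of 2_0
lift = confidence / consequent_support
# leverage: observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support
# conviction: how much less often the rule fails than it would under independence
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```

A lift above 1 means the two services occur together more often than they would if purchased independently.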


In [16]:
def arl_recommender(rules_df, product_id, rec_count=1):
    # Rank rules by lift so the strongest associations come first.
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    # Iterate antecedents and consequents together: after sorting,
    # index labels no longer match row positions.
    for antecedents, consequents in zip(sorted_rules["antecedents"], sorted_rules["consequents"]):
        if product_id in antecedents:
            for item in consequents:
                # Skip duplicates while keeping the lift order.
                if item not in recommendation_list:
                    recommendation_list.append(item)
    return recommendation_list[:rec_count]

In [17]:
arl_recommender(rules, "2_0", 4)
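
The lookup logic can be tried without the full dataset. The sketch below runs a vectorized variant on a hand-made rules table. The helper name `recommend_from_rules` and all values in `toy_rules` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hand-made toy rules (values are illustrative, not taken from the Armut data).
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"2_0"}), frozenset({"15_1"}),
                    frozenset({"2_0"}), frozenset({"2_0", "15_1"})],
    "consequents": [frozenset({"13_11"}), frozenset({"2_0"}),
                    frozenset({"15_1"}), frozenset({"33_4"})],
    "lift": [1.7, 2.2, 2.2, 3.4],
})

def recommend_from_rules(rules_df, service, rec_count=1):
    """Hypothetical helper: consequents of rules whose antecedents contain `service`, highest lift first."""
    # Keep only rules where the service appears among the antecedents.
    mask = rules_df["antecedents"].apply(lambda items: service in items)
    ranked = rules_df[mask].sort_values("lift", ascending=False)
    # One row per consequent item, keeping the first (highest-lift) occurrence.
    items = ranked["consequents"].apply(sorted).explode().drop_duplicates()
    return items.tolist()[:rec_count]

print(recommend_from_rules(toy_rules, "2_0", 2))
```

Because antecedents and consequents of a rule never share an item, the queried service itself cannot appear among the recommendations.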