# Business Problem

**What Are Association Rules?**

Association rule learning is a rule-based machine learning technique used to find patterns (relationships, structures) in data.


Association analysis is among the most common applications in data science. You have probably also encountered it in the form of recommendation systems.


You may have come across these applications as "customers who bought that product also bought this one", "people who viewed that listing also viewed these listings", "we made a playlist for you", or "recommended video to watch next".

These are the scenarios that come up most often in e-commerce data science and data mining work.

Major e-commerce companies in Turkey and around the world, along with many familiar platforms such as Spotify, Amazon, and Netflix, use recommendation systems.

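So what, in short, do these association analyses do? They look for items that frequently occur together in the same transactions and express those co-occurrences as rules of the form X → Y.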


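**Apriori Algorithm**

It is the most widely used method in this area.

Association rule analysis is carried out by examining a few metrics:

- Support: the fraction of all transactions that contain both X and Y.

Support(X, Y) = Freq(X, Y) / N

Here X and Y are items and N is the total number of transactions.

- Confidence: the fraction of transactions containing X that also contain Y.

Confidence(X, Y) = Freq(X, Y) / Freq(X)

- Lift: how much more often X and Y occur together than would be expected if they were independent. A value above 1 indicates a positive association.

Lift(X, Y) = Support(X, Y) / (Support(X) * Support(Y))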
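
The three formulas above can be checked by hand on a small example. The sketch below uses five made-up baskets (they are hypothetical and not taken from `retail_dataset.csv`), and the helper names `support`, `confidence`, and `lift` are chosen here for illustration only.

```python
# Hypothetical toy baskets, not taken from retail_dataset.csv
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Cheese"},
    {"Bread", "Milk", "Cheese"},
    {"Milk"},
    {"Bread", "Milk", "Eggs"},
]


def support(*items):
    """Fraction of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets) / len(baskets)


def confidence(x, y):
    # Freq(X, Y) / Freq(X); dividing the two supports is equivalent
    # because the N in each support cancels out
    return support(x, y) / support(x)


def lift(x, y):
    return support(x, y) / (support(x) * support(y))


print(support("Bread", "Milk"))
print(confidence("Bread", "Milk"))
print(lift("Bread", "Milk"))
```

Bread and Milk each appear in 4 of the 5 baskets and together in 3, so the support is 0.6, the confidence is about 0.75, and the lift is about 0.94. A lift slightly below 1 means the two items co-occur a little less often than independence would predict.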



# Data Understanding

In [2]:
#!pip install mlxtend



In [2]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [21]:
df = pd.read_csv('data/retail_dataset.csv', sep=',')
df.head()

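|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Bread | Wine | Eggs | Meat | Cheese | Pencil | Diaper |
| 1 | Bread | Cheese | Meat | Diaper | Wine | Milk | Pencil |
| 2 | Cheese | Meat | Eggs | Milk | Wine | NaN | NaN |
| 3 | Cheese | Meat | Eggs | Milk | Wine | NaN | NaN |
| 4 | Meat | Pencil | Wine | NaN | NaN | NaN | NaN |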


In [4]:
df.shape

(315, 7)

# Data Preprocessing

In [None]:
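# Collect every distinct item across all columns, not just column '0', and drop empty (NaN) slots
items = pd.Series(df.to_numpy().ravel()).dropna().unique()
items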

In [8]:
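encoded_vals = []
for _, row in df.iterrows():
    labels = {}
    # Items missing from this transaction get 0, items present get 1
    uncommons = list(set(items) - set(row))
    commons = list(set(items).intersection(row))
    for uc in uncommons:
        labels[uc] = 0
    for com in commons:
        labels[com] = 1
    # One dict per transaction; together they form the one-hot table
    encoded_vals.append(labels)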

In [27]:
ohe_df = pd.DataFrame(encoded_vals)

In [28]:
ohe_df

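|     | Milk | Bagel | Bread | Meat | Wine | Pencil | Eggs | Cheese | Diaper |
|-----|------|-------|-------|------|------|--------|------|--------|--------|
| 0   | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1   | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 2   | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 3   | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 4   | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 310 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 311 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 312 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 313 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |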
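
The encoding step is easier to follow on a tiny input. The sketch below repeats the same idea in plain Python on three hypothetical transactions (the data and the names `transactions` and `encoded` are made up for illustration, with `None` standing in for the empty slots that pandas shows as NaN).

```python
# Hypothetical toy transactions; None marks an empty slot, like NaN in the CSV
transactions = [
    ["Bread", "Wine", "Eggs"],
    ["Milk", "Bread", None],
    ["Wine", None, None],
]

# Every distinct item, sorted so the column order is reproducible
items = sorted({item for row in transactions for item in row if item is not None})

# One dict per transaction: 1 if the item is in the basket, 0 otherwise
encoded = [{item: int(item in row) for item in items} for row in transactions]

for row in encoded:
    print(row)
```

Passing a list of dicts like `encoded` to `pd.DataFrame` would give a table with one column per item and one row per transaction, which is the layout `apriori` expects.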


# Association Rules 

In [33]:
?apriori

Signature:
apriori(
    df,
    min_support=0.5,
    use_colnames=False,
    max_len=None,
    verbose=0,
    low_memory=False,
)
Docstring:
Get frequent itemsets from a one-hot DataFrame

Parameters
-----------
df : pandas DataFrame
  pandas DataFrame the encoded format. Also supports
  DataFrames with sparse data; for more info, please
  see (https://pandas.pydata.org/pandas-docs/stable/
       user_guide/sparse.html#sparse-data-structures)

  Please note that the old pandas SparseDataFrame format
  is no longer supported in mlxtend >= 

In [41]:
freq_items = apriori(ohe_df, min_support = 0.4, use_colnames = True, verbose = 1)

Processing 56 combinations | Sampling itemset size 2


In [42]:
freq_items.head()

|   | support | itemsets |
|---|----------|----------|
| 0 | 0.501587 | (Milk) |
| 1 | 0.425397 | (Bagel) |
| 2 | 0.504762 | (Bread) |
| 3 | 0.47619 | (Meat) |
| 4 | 0.438095 | (Wine) |
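
To give a feel for what a frequent-itemset search does, here is a minimal level-wise sketch of the Apriori idea in plain Python. It is not mlxtend's implementation. The function name `apriori_sketch` and the toy baskets are hypothetical, and the code favors readability over speed.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: single items that meet the threshold
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = {s: support(s) for s in singles}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, not taken from retail_dataset.csv
toy_baskets = [
    {"Bread", "Milk"},
    {"Bread", "Cheese"},
    {"Bread", "Milk", "Cheese"},
    {"Milk"},
    {"Bread", "Milk", "Eggs"},
]

result = apriori_sketch(toy_baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most larger combinations are discarded before their support is ever counted.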


In [43]:
?association_rules

Signature:
association_rules(
    df,
    metric='confidence',
    min_threshold=0.8,
    support_only=False,
)
Docstring:
Generates a DataFrame of association rules including the
metrics 'score', 'confidence', and 'lift'

Parameters
-----------
df : pandas DataFrame
  pandas DataFrame of frequent itemsets
  with columns ['support', 'itemsets']

metric : string (default: 'confidence')
  Metric to evaluate if a rule is of interest.
  **Automatically set to 'support' if `support_only=True`.**
  Otherwise, supported metrics are 'support', 'confidence', 'lift',
  'leverage', and 'conviction'
  These metrics are computed as follows:

  - support(A->C) = support(A+C) 

In [34]:
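# Requires freq_items built with a min_support below 0.4 (e.g. 0.2): the rules listed have supports of 0.20 to 0.32
association_rules(freq_items, metric="confidence", min_threshold=0.6)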

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|----------|------------|
| 0 | (Milk) | (Cheese) | 0.501587 | 0.501587 | 0.304762 | 0.607595 | 1.211344 | 0.053172 | 1.270148 |
| 1 | (Cheese) | (Milk) | 0.501587 | 0.501587 | 0.304762 | 0.607595 | 1.211344 | 0.053172 | 1.270148 |
| 2 | (Bagel) | (Bread) | 0.425397 | 0.504762 | 0.279365 | 0.656716 | 1.301042 | 0.064641 | 1.44265 |
| 3 | (Eggs) | (Meat) | 0.438095 | 0.47619 | 0.266667 | 0.608696 | 1.278261 | 0.05805 | 1.338624 |
| 4 | (Cheese) | (Meat) | 0.501587 | 0.47619 | 0.32381 | 0.64557 | 1.355696 | 0.084958 | 1.477891 |
| 5 | (Meat) | (Cheese) | 0.47619 | 0.501587 | 0.32381 | 0.68 | 1.355696 | 0.084958 | 1.55754 |
| 6 | (Wine) | (Cheese) | 0.438095 | 0.501587 | 0.269841 | 0.615942 | 1.227986 | 0.050098 | 1.297754 |
| 7 | (Eggs) | (Cheese) | 0.438095 | 0.501587 | 0.298413 | 0.681159 | 1.358008 | 0.07867 | 1.563203 |
| 8 | (Milk, Cheese) | (Meat) | 0.304762 | 0.47619 | 0.203175 | 0.666667 | 1.4 | 0.05805 | 1.571429 |
| 9 | (Milk, Meat) | (Cheese) | 0.244444 | 0.501587 | 0.203175 | 0.831169 | 1.657077 | 0.080564 | 2.952137 |
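
As a sanity check, the metric columns can be recomputed from the three support values in a single row. The sketch below uses the numbers printed in row 9 of the table above, so its results agree with the table only up to the rounding of those inputs. The leverage and conviction formulas are the standard definitions, which mlxtend also documents.

```python
# Values copied from row 9 of the table above: (Milk, Meat) -> (Cheese)
antecedent_support = 0.244444  # support(Milk, Meat)
consequent_support = 0.501587  # support(Cheese)
rule_support = 0.203175        # support(Milk, Meat, Cheese)

# Share of (Milk, Meat) baskets that also contain Cheese
confidence = rule_support / antecedent_support

# Confidence relative to how common Cheese is overall
lift = confidence / consequent_support

# Observed joint support minus the support expected under independence
leverage = rule_support - antecedent_support * consequent_support

# Ratio of the expected to the observed rate at which the rule fails
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 3))
```

Read this way, row 9 says that about 83% of baskets containing both Milk and Meat also contain Cheese, which is roughly 1.66 times the rate at which Cheese appears across all baskets.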


# Reporting