# Ex. 01

**Step 1: Divide the data into bins of size 3.**

Bin 1: 13, 15, 16

Bin 2: 16, 19, 20

Bin 3: 20, 21, 22

Bin 4: 22, 25, 25

Bin 5: 25, 25, 30

Bin 6: 33, 33, 35

Bin 7: 35, 35, 35

Bin 8: 35, 36, 40

Bin 9: 45, 46, 52

Bin 10: 70

**Step 2: Calculate the mean of each bin and replace the values in each bin with this mean.**

Bin 1 mean: (13 + 15 + 16) / 3 = 14.67


Bin 2 mean: (16 + 19 + 20) / 3 = 18.33

Bin 3 mean: (20 + 21 + 22) / 3 = 21

Bin 4 mean: (22 + 25 + 25) / 3 = 24

Bin 5 mean: (25 + 25 + 30) / 3 = 26.67

Bin 6 mean: (33 + 33 + 35) / 3 = 33.67

Bin 7 mean: (35 + 35 + 35) / 3 = 35

Bin 8 mean: (35 + 36 + 40) / 3 = 37

Bin 9 mean: (45 + 46 + 52) / 3 = 47.67

Bin 10 mean: 70

*So, the smoothed data becomes:*

14.67, 14.67, 14.67, 18.33, 18.33, 18.33, 21, 21, 21, 24, 24, 24, 26.67, 26.67, 26.67, 33.67, 33.67, 33.67, 35, 35, 35, 37, 37, 37, 47.67, 47.67, 47.67, 70.

Smoothing by bin means reduces local variation by replacing the values within each bin with a single representative value (the mean). This dampens fluctuations and highlights the underlying trend in the data, at the cost of detail within each bin.
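The binning steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `smooth_by_bin_means` is hypothetical, the data list is simply the bins from Step 1 written out in order, and means are rounded to two decimals to match the hand calculation.

```python
# Values copied from the ten bins listed in Step 1 (already sorted).
data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(values, bin_size):
    """Replace every value with the mean of its equal-depth bin.

    Hypothetical helper: the last bin may hold fewer than bin_size values.
    """
    smoothed = []
    for start in range(0, len(values), bin_size):
        bin_values = values[start:start + bin_size]
        mean = round(sum(bin_values) / len(bin_values), 2)
        # Every member of the bin is replaced by the same mean.
        smoothed.extend([mean] * len(bin_values))
    return smoothed


smoothed = smooth_by_bin_means(sorted(data), bin_size=3)
print(smoothed)
```

Running it should reproduce the smoothed values from Step 2, with the final bin holding the single value 70.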

***(b) Outliers in the data can be determined by various methods such as:***

*Z-score method:* Flag data points that lie more than a chosen number of standard deviations (commonly 3) from the mean.

*Box plot:* Plot the data and look for points that fall outside the whiskers of the box plot.

*Tukey's method:* Flag points that lie more than 1.5 × IQR below the first quartile or above the third quartile, where IQR is the interquartile range.
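As a rough sketch of two of these checks on the data from part (a), the snippet below uses only the standard library. The function names are hypothetical, and the thresholds (3 standard deviations, 1.5 × IQR) are common defaults assumed here rather than values given in the exercise. Quartile conventions differ between tools, so the exact fences may vary slightly.

```python
import statistics

# Values copied from the bins in part (a).
data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold (assumed 3)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / std > threshold]


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k assumed 1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


z_out = zscore_outliers(data)
tukey_out = tukey_outliers(data)
print("z-score outliers:", z_out)
print("Tukey outliers:", tukey_out)
```

With these assumed thresholds, both checks single out the value 70.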

***(c) Other methods for data smoothing include:***

*Moving averages:* Replace each data point with the average of itself and its neighboring points within a specified window.

*Exponential smoothing:* Assign exponentially decreasing weights to past observations to give more importance to recent data.

*Polynomial fitting:* Fit a polynomial function to the data to smooth out fluctuations and capture overall trends.

*Kernel smoothing:* Replace each data point with a weighted average of nearby points, where the weights come from a kernel function that decreases with distance.
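The first two methods are simple enough to sketch directly. The snippet below is illustrative only: the function names, the window of 3, and the smoothing factor `alpha=0.5` are assumptions chosen for the example, and the short series is just the first six values from part (a).

```python
def moving_average(values, window=3):
    """Centered moving average; windows at the edges use the points available."""
    half = window // 2
    result = []
    for i in range(len(values)):
        neighbours = values[max(0, i - half): i + half + 1]
        result.append(sum(neighbours) / len(neighbours))
    return result


def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [values[0]]  # the first observation seeds the recursion
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


series = [13, 15, 16, 16, 19, 20]
ma = moving_average(series, window=3)
es = exponential_smoothing(series, alpha=0.5)
print(ma)
print(es)
```

A larger window or a smaller `alpha` gives a smoother result but reacts more slowly to real changes in the data.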

# Ex. 02

In [None]:
import pandas as pd

In [None]:
# Dataset:
columns = ['cust-ID', 'TID', 'items_bought (in the form of brand-item_category)']
rows = [['01', 'T100', "King's-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread"],
        ['02', 'T200', "Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread"],
        ['01', 'T300', "Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie"],
        ['03', 'T400', "Wonder-Bread, Sunset-Milk, Dairyland-Cheese"]]

df = pd.DataFrame(rows, columns = columns)
df

Unnamed: 0,cust-ID,TID,items_bought (in the form of brand-item_category)
0,1,T100,"King's-Crab, Sunset-Milk, Dairyland-Cheese, Be..."
1,2,T200,"Best-Cheese, Dairyland-Milk, Goldenfarm-Apple,..."
2,1,T300,"Westcoast-Apple, Dairyland-Milk, Wonder-Bread,..."
3,3,T400,"Wonder-Bread, Sunset-Milk, Dairyland-Cheese"


In [None]:
items_list = []

for transaction in df['items_bought (in the form of brand-item_category)']:
  items = transaction.split(', ')
  items_list.append(items)

items_list

[["King's-Crab", 'Sunset-Milk', 'Dairyland-Cheese', 'Best-Bread'],
 ['Best-Cheese',
  'Dairyland-Milk',
  'Goldenfarm-Apple',
  'Tasty-Pie',
  'Wonder-Bread'],
 ['Westcoast-Apple', 'Dairyland-Milk', 'Wonder-Bread', 'Tasty-Pie'],
 ['Wonder-Bread', 'Sunset-Milk', 'Dairyland-Cheese']]

# FP-Growth

In [None]:
from collections import defaultdict, namedtuple
import time

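def find_frequent_itemsets(transactions, minimum_support, include_support=False):
    """
    Find frequent itemsets in the given transactions using FP-growth.

    minimum_support is an absolute count, not a fraction. This function is
    a generator: it yields each itemset as a list, or an (itemset, support)
    tuple when include_support is True.
    """
    # Count the support of each individual item.
    items = defaultdict(int)

    for transaction in transactions:
        for item in transaction:
            items[item] += 1

    # Keep only the items that meet the minimum support.
    items = dict((item, support) for item, support in items.items()
        if support >= minimum_support)

    def clean_transaction(transaction):
        # Drop infrequent items and sort the rest by decreasing support.
        transaction = filter(lambda v: v in items, transaction)
        transaction_list = list(transaction)
        transaction_list.sort(key=lambda v: items[v], reverse=True)
        return transaction_list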

    master = FPTree()
    for transaction in map(clean_transaction, transactions):
        master.add(transaction)

    def find_with_suffix(tree, suffix):
        for item, nodes in tree.items():
            support = sum(n.count for n in nodes)
            if support >= minimum_support and item not in suffix:

                found_set = [item] + suffix
                yield (found_set, support) if include_support else found_set

                cond_tree = conditional_tree_from_paths(tree.prefix_paths(item))
                for s in find_with_suffix(cond_tree, found_set):
                    yield s

    for itemset in find_with_suffix(master, []):
        yield itemset

class FPTree(object):

    Route = namedtuple('Route', 'head tail')

    def __init__(self):
        self._root = FPNode(self, None, None)

        self._routes = {}

    @property
    def root(self):
        return self._root

    def add(self, transaction):
        point = self._root

        for item in transaction:
            next_point = point.search(item)
            if next_point:
                next_point.increment()
            else:
                next_point = FPNode(self, item)
                point.add(next_point)

                self._update_route(next_point)

            point = next_point

    def _update_route(self, point):
        assert self is point.tree

        try:
            route = self._routes[point.item]
            route[1].neighbor = point
            self._routes[point.item] = self.Route(route[0], point)
        except KeyError:
            self._routes[point.item] = self.Route(point, point)

    def items(self):

        for item in self._routes.keys():
            yield (item, self.nodes(item))

    def nodes(self, item):

        try:
            node = self._routes[item][0]
        except KeyError:
            return

        while node:
            yield node
            node = node.neighbor

    def prefix_paths(self, item):

        def collect_path(node):
            path = []
            while node and not node.root:
                path.append(node)
                node = node.parent
            path.reverse()
            return path

        return (collect_path(node) for node in self.nodes(item))

    def inspect(self):
        #print('Tree:')
        self.root.inspect(1)

        #print
        #print('Routes:')
        for item, nodes in self.items():
            #print('  %r' % item)
            for node in nodes:
                print('    %r' % node)

def conditional_tree_from_paths(paths):
    tree = FPTree()
    condition_item = None
    items = set()

    for path in paths:
        if condition_item is None:
            condition_item = path[-1].item

        point = tree.root
        for node in path:
            next_point = point.search(node.item)
            if not next_point:
                # Add a new node to the tree.
                items.add(node.item)
                count = node.count if node.item == condition_item else 0
                next_point = FPNode(tree, node.item, count)
                point.add(next_point)
                tree._update_route(next_point)
            point = next_point

    assert condition_item is not None


    for path in tree.prefix_paths(condition_item):
        count = path[-1].count
        for node in reversed(path[:-1]):
            node._count += count

    return tree

class FPNode(object):

    def __init__(self, tree, item, count=1):
        self._tree = tree
        self._item = item
        self._count = count
        self._parent = None
        self._children = {}
        self._neighbor = None

    def add(self, child):

        if not isinstance(child, FPNode):
            raise TypeError("Can only add other FPNodes as children")

        if not child.item in self._children:
            self._children[child.item] = child
            child.parent = self

    def search(self, item):
        try:
            return self._children[item]
        except KeyError:
            return None

    def __contains__(self, item):
        return item in self._children

    @property
    def tree(self):
        return self._tree

    @property
    def item(self):
        return self._item

    @property
    def count(self):
        return self._count

    def increment(self):
        if self._count is None:
            raise ValueError("Root nodes have no associated count.")
        self._count += 1

    @property
    def root(self):
        return self._item is None and self._count is None

    @property
    def leaf(self):
        return len(self._children) == 0

    @property
    def parent(self):
        return self._parent

    @parent.setter
    def parent(self, value):
        if value is not None and not isinstance(value, FPNode):
            raise TypeError("A node must have an FPNode as a parent.")
        if value and value.tree is not self.tree:
            raise ValueError("Cannot have a parent from another tree.")
        self._parent = value

    @property
    def neighbor(self):
        return self._neighbor

    @neighbor.setter
    def neighbor(self, value):
        if value is not None and not isinstance(value, FPNode):
            raise TypeError("A node must have an FPNode as a neighbor.")
        if value and value.tree is not self.tree:
            raise ValueError("Cannot have a neighbor from another tree.")
        self._neighbor = value

    @property
    def children(self):
        return tuple(self._children.values())

    def inspect(self, depth=0):
        #print(('  ' * depth) + repr(self))
        for child in self.children:
            child.inspect(depth + 1)

    def __repr__(self):
        if self.root:
            return "<%s (root)>" % type(self).__name__
        return "<%s %r (%r)>" % (type(self).__name__, self.item, self.count)




dataset = items_list


if __name__ == '__main__':

    start = time.time()

    frequent_itemsets = find_frequent_itemsets(dataset, minimum_support=1, include_support=True)
    #print(type(frequent_itemsets))   # print type

    result = []
    for itemset, support in frequent_itemsets:
        result.append((itemset, support))

    result = sorted(result, key=lambda i: i[0])
    for itemset, support in result:
        print(str(itemset) + ' ' + str(support))

    end = time.time()
    print('', str(end - start))

['Best-Bread'] 1
['Best-Cheese'] 1
['Best-Cheese', 'Goldenfarm-Apple'] 1
['Dairyland-Cheese'] 2
['Dairyland-Cheese', 'Best-Bread'] 1
['Dairyland-Cheese', "King's-Crab"] 1
['Dairyland-Cheese', "King's-Crab", 'Best-Bread'] 1
['Dairyland-Milk'] 2
['Dairyland-Milk', 'Best-Cheese'] 1
['Dairyland-Milk', 'Best-Cheese', 'Goldenfarm-Apple'] 1
['Dairyland-Milk', 'Goldenfarm-Apple'] 1
['Dairyland-Milk', 'Tasty-Pie'] 2
['Dairyland-Milk', 'Tasty-Pie', 'Best-Cheese'] 1
['Dairyland-Milk', 'Tasty-Pie', 'Best-Cheese', 'Goldenfarm-Apple'] 1
['Dairyland-Milk', 'Tasty-Pie', 'Goldenfarm-Apple'] 1
['Dairyland-Milk', 'Tasty-Pie', 'Westcoast-Apple'] 1
['Dairyland-Milk', 'Westcoast-Apple'] 1
['Goldenfarm-Apple'] 1
["King's-Crab"] 1
["King's-Crab", 'Best-Bread'] 1
['Sunset-Milk'] 2
['Sunset-Milk', 'Best-Bread'] 1
['Sunset-Milk', 'Dairyland-Cheese'] 2
['Sunset-Milk', 'Dairyland-Cheese', 'Best-Bread'] 1
['Sunset-Milk', 'Dairyland-Cheese', "King's-Crab"] 1
['Sunset-Milk', 'Dairyland-Cheese', "King's-Crab", 'Best-B

# Apriori

In [None]:
#!/usr/bin/python3
import os
import operator
from collections import defaultdict
from itertools import combinations, chain


class Apriori:
    """
   Parameters
   ----------

   minSupport: float
                           Minimum support value for a transaction
                           to be called interesting.
   support_count: collection.defaultdict(int)
                           Contains support count of itemsets.
                           {
                                   frozenset(): int,
                                   frozenset(): int,
                                   frozenset(): int,
                                   ...
                           }
                           frozenset(): set of items
                           int: support count of the itemset

   Methods
   -------
   read_transactions_from_file()
           Read transactions from the input file.

   get_one_itemset()
           Gets unique items from the list of transactions.

   self_cross()
           Takes union of a set with itself to form bigger sets.

   get_min_supp_itemsets()
           Returns those itemsets whose support is > minSupport

   frequent_item_set()
           Uses the Apriori algorithm to find interesting
           k-itemsets.

   subsets()
           Returns subsets of a set.
   """

    def __init__(self, minSupport):
        self.support_count = defaultdict(int)
        self.minSupport = minSupport

    def read_transactions_from_file(self, transaction_file):
        """
       Parameters
       ----------
       transaction_file: txt file


       Return Type
       -----------
       List of transactions as read from file.
       Each transaction is a set of items.
               [{a, b, c}, {b, d, p, q}, {p, e}, .....]

               {a, b, c} - 1st itemset (3-itemset)
               {b, d, p, q} - 2nd itemset (4-itemset)
               {p, e} - 3rd itemset (2-itemset)
               ...
       """
        with open(transaction_file, "r") as infile:
            transactions = [set(line.rstrip("\n").split(";"))
                            for line in infile]

            return transactions

    def get_one_itemset(self, transactions):
        """
       Parameters
       ----------
       List of transactions. Each transaction
       is a set of items.
               [{a, b, c}, {b, d, p, q}, {p, e}, .....]

               {a, b, c} - 1st itemset (3-itemset)
               {b, d, p, q} - 2nd itemset (4-itemset)
               {p, e} - 3rd itemset (2-itemset)
               ...

       Return Type
       -----------
       one_itemset: set of unique items;
               {
                       frozenset({"a"}), frozenset({"b"}), frozenset({"c"}),
                       frozenset({"d"}), frozenset({"e"}), frozenset({"p"}),
                       frozenset({"q"})
               }
       """
        one_itemset = set()
        for transaction in transactions:
            for item in transaction:
                one_itemset.add(frozenset([item]))

        return one_itemset

    def self_cross(self, Ck, itemset_size):
        """
       Parameters
       ----------
       Ck: set
               a set of k-itemsets
               Size of each itemset in Ck is k(=itemset_size-1)

       itemset_size: int
               Required size of each itemset in resulting set(=k+1)

       Ck:
       {
               frozenset({"book", "pen"}),
               frozenset({"book", "dog"}),
               frozenset({"ox", "coke"}),
               ...
       }
       for a 2-itemset


       Return Type
       -----------
       Ck_plus_1: set
               a set of (k+1)-itemsets

       Ck_plus_1:
       {
               frozenset({"book", "pen", "dog"}),
               frozenset({"book", "dog", "ox"}),
               frozenset({"book", "coke", "dog"}),
               ...
       }
       """
        Ck_plus_1 = {itemset1.union(itemset2)
                     for itemset1 in Ck for itemset2 in Ck
                     if len(itemset1.union(itemset2)) == itemset_size}
        return Ck_plus_1

    def prune_Ck(self, Ck, Lk_minus_1, itemset_size):
        """
       Parameters
       ----------
       Ck: set
               a set of k-itemsets(k=itemset_size)

       Lk_minus_1: set
               a set of (k-1)-itemsets

       itemset_size: int
               (= k)

       Ck:
       {
               frozenset({"book", "dog", "copper"}),
               frozenset({"book", "dog", "water"}),
       }
       Ck_minus_1:
       {
               frozenset({"book", "dog"}),
               frozenset({"book", "copper"}),
               frozenset({"dog", "copper"})
               frozenset({"book", "water"}),
               frozenset({"dog", "water"}),
       }
       Lk_minus_1:
       {
               frozenset({"book", "copper"}),
               frozenset({"book", "dog"}),
               frozenset({"book", "water"}),
               frozenset({"water", "dog"})
       }

       Returns
       -------
       Ck_: set
               a set of k-itemsets
       Ck_:
       {
               frozenset({"book", "dog", "water"})
       } those Ck's whose Ck_minus_1's are in Lk_minus_1

       """
        Ck_ = set()
        for itemset in Ck:
            Ck_minus_1 = list(combinations(itemset, itemset_size-1))
            flag = 0
            for subset in Ck_minus_1:
                if not frozenset(subset) in Lk_minus_1:
                    flag = 1
                    break
            if flag == 0:
                Ck_.add(itemset)
        return Ck_

    def get_min_supp_itemsets(self, Ck, transactions):
        """
       Parameters
       ----------
       Ck: set
               a set of k-itemsets
       Transactions: list
               list of transactions. Each transaction is
               a set of items.
               [{a, b, c}, {b, d, p, q}, {p, e}, .....]

       Returns
       -------
       Lk: set
               a set of k-itemsets
               set of itemsets whose support is > minSupport

       """
        temp_freq = defaultdict(int)

        # update support count of each itemset
        for transaction in transactions:
            for itemset in Ck:
                if itemset.issubset(transaction):
                    temp_freq[itemset] += 1
                    self.support_count[itemset] += 1

        N = len(transactions)
        Lk = [itemset for itemset, freq in temp_freq.items()
              if freq/N > self.minSupport]
        return set(Lk)

    def frequent_item_set(self, transactions):
        """
       Parameters
       ----------
       transactions: list
               list of transactions. Each transaction is
                       a set of items.
                       [{a, b, c}, {b, d, p, q}, {p, e}, .....]

       Returns
       -------
       K_itemsets: dict
       {
               1: {frozenset({"dog"}), frozenset({"ox"}), ....}
               2: {frozenset({"dog", "water"}), frozenset({"book", "copper"}), .....}
               3: {frozenset({"dog", "ox", "gold"}), frozenset({"water", "dog", "ox"}), ...}
       }
               key: value
               int: set of frozensets of size = value of key

               each itemset in K_itemset has support > minSupport
       """
        K_itemsets = dict()
        Ck = self.get_one_itemset(transactions)
        Lk = self.get_min_supp_itemsets(Ck, transactions)
        k = 2
        while len(Lk) != 0:
            K_itemsets[k-1] = Lk
            Ck = self.self_cross(Lk, k)
            Ck = self.prune_Ck(Ck, Lk, k)
            Lk = self.get_min_supp_itemsets(Ck, transactions)
            k += 1

        return K_itemsets

    def subsets(self, iterable):
        """
        Parameters
        ----------
        iterable: an iterable container like set

        Returns
        -------
        subsets_: list powerset of elements in the iterable container
                [
                        frozenset(),
                        frozenset({a}), frozenset({b}),
                        frozenset({a, b})
                ] if iterable is like {a, b}
       """
        list_ = list(iterable)
        # Combinations of every size from 0 to len(list_) form the powerset.
        subsets_ = chain.from_iterable(combinations(list_, size)
                                       for size in range(len(list_)+1))
        subsets_ = list(map(frozenset, subsets_))

        return subsets_

    def write_part_1(self, K_itemsets):
        """
        Writes the frequent 1-itemsets with their support to a file.
        """
        main_dir = "./results/part_1/"
        if not os.path.exists(main_dir):
            os.makedirs(main_dir)

        outfile_path = "./results/part_1/patterns.txt"
        with open(outfile_path, "w") as outfile:
            for key, values in K_itemsets.items():
                if key > 1:
                    break
                for value in values:
                    support_ct = self.support_count[value]
                    outfile.write("{support}:{label}\n".format(
                        support=support_ct,
                        label=";".join(list(value))
                    ))

    def write_part_2(self, K_itemsets):
        """
        Writes all frequent itemsets with their support to a file.
        """
        main_dir = './results/part_2'
        if not os.path.exists(main_dir):
            os.makedirs(main_dir)

        outfile_path = "./results/part_2/patterns.txt"
        with open(outfile_path, "w") as outfile:
            for key, values in K_itemsets.items():
                for value in values:
                    support_ct = self.support_count[value]
                    outfile.write("{support}:{label}\n".format(
                        support=support_ct,
                        label=";".join(list(value))
                    ))


if __name__ == "__main__":
    # in_transaction_file = "./categories.txt"

    ap = Apriori(minSupport=0.01)
    transactions = [set(items) for items in items_list]
    K_itemsets = ap.frequent_item_set(transactions)
    # ap.write_part_1(K_itemsets)
    # ap.write_part_2(K_itemsets)

    K_itemsets

In [None]:
K_itemsets

{1: {frozenset({'Dairyland-Cheese'}),
  frozenset({'Sunset-Milk'}),
  frozenset({"King's-Crab"}),
  frozenset({'Best-Cheese'}),
  frozenset({'Westcoast-Apple'}),
  frozenset({'Best-Bread'}),
  frozenset({'Goldenfarm-Apple'}),
  frozenset({'Tasty-Pie'}),
  frozenset({'Dairyland-Milk'}),
  frozenset({'Wonder-Bread'})},
 2: {frozenset({'Dairyland-Cheese', "King's-Crab"}),
  frozenset({'Dairyland-Cheese', 'Sunset-Milk'}),
  frozenset({'Dairyland-Milk', 'Westcoast-Apple'}),
  frozenset({'Tasty-Pie', 'Wonder-Bread'}),
  frozenset({'Best-Bread', "King's-Crab"}),
  frozenset({'Dairyland-Cheese', 'Wonder-Bread'}),
  frozenset({'Dairyland-Milk', 'Goldenfarm-Apple'}),
  frozenset({'Best-Bread', 'Sunset-Milk'}),
  frozenset({'Dairyland-Milk', 'Wonder-Bread'}),
  frozenset({'Best-Cheese', 'Goldenfarm-Apple'}),
  frozenset({"King's-Crab", 'Sunset-Milk'}),
  frozenset({'Best-Cheese', 'Wonder-Bread'}),
  frozenset({'Best-Cheese', 'Tasty-Pie'}),
  frozenset({'Best-Cheese', 'Dairyland-Milk'}),
  frozens

# Use Library

***FP-Growth***

In [None]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

In [None]:
from mlxtend.frequent_patterns import fpgrowth

In [None]:
items_list

[["King's-Crab", 'Sunset-Milk', 'Dairyland-Cheese', 'Best-Bread'],
 ['Best-Cheese',
  'Dairyland-Milk',
  'Goldenfarm-Apple',
  'Tasty-Pie',
  'Wonder-Bread'],
 ['Westcoast-Apple', 'Dairyland-Milk', 'Wonder-Bread', 'Tasty-Pie'],
 ['Wonder-Bread', 'Sunset-Milk', 'Dairyland-Cheese']]

In [None]:
te = TransactionEncoder()
te_ary = te.fit(items_list).transform(items_list)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

Unnamed: 0,Best-Bread,Best-Cheese,Dairyland-Cheese,Dairyland-Milk,Goldenfarm-Apple,King's-Crab,Sunset-Milk,Tasty-Pie,Westcoast-Apple,Wonder-Bread
0,True,False,True,False,False,True,True,False,False,False
1,False,True,False,True,True,False,False,True,False,True
2,False,False,False,True,False,False,False,True,True,True
3,False,False,True,False,False,False,True,False,False,True


In [None]:
fpgrowth(df, min_support=0.3)

Unnamed: 0,support,itemsets
0,0.5,(6)
1,0.5,(2)
2,0.75,(9)
3,0.5,(7)
4,0.5,(3)
5,0.5,"(2, 6)"
6,0.5,"(9, 7)"
7,0.5,"(3, 7)"
8,0.5,"(9, 3)"
9,0.5,"(9, 3, 7)"


In [None]:
fpgrowth(df, min_support=0.3, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.5,(Sunset-Milk)
1,0.5,(Dairyland-Cheese)
2,0.75,(Wonder-Bread)
3,0.5,(Tasty-Pie)
4,0.5,(Dairyland-Milk)
5,0.5,"(Dairyland-Cheese, Sunset-Milk)"
6,0.5,"(Tasty-Pie, Wonder-Bread)"
7,0.5,"(Tasty-Pie, Dairyland-Milk)"
8,0.5,"(Wonder-Bread, Dairyland-Milk)"
9,0.5,"(Tasty-Pie, Wonder-Bread, Dairyland-Milk)"


***Apriori***

In [None]:
pip install apriori-python

Collecting apriori-python
  Downloading apriori_python-1.0.4-py3-none-any.whl (5.0 kB)
Installing collected packages: apriori-python
Successfully installed apriori-python-1.0.4


In [None]:
from apriori_python import apriori

In [None]:
items_list

[["King's-Crab", 'Sunset-Milk', 'Dairyland-Cheese', 'Best-Bread'],
 ['Best-Cheese',
  'Dairyland-Milk',
  'Goldenfarm-Apple',
  'Tasty-Pie',
  'Wonder-Bread'],
 ['Westcoast-Apple', 'Dairyland-Milk', 'Wonder-Bread', 'Tasty-Pie'],
 ['Wonder-Bread', 'Sunset-Milk', 'Dairyland-Cheese']]

In [None]:
freq_items, rules = apriori(itemSetList=items_list, minSup=0.3, minConf=0.4)

In [None]:
print(freq_items)

{1: {frozenset({'Dairyland-Cheese'}), frozenset({'Wonder-Bread'}), frozenset({'Tasty-Pie'}), frozenset({'Dairyland-Milk'}), frozenset({'Sunset-Milk'})}, 2: {frozenset({'Tasty-Pie', 'Wonder-Bread'}), frozenset({'Dairyland-Cheese', 'Sunset-Milk'}), frozenset({'Tasty-Pie', 'Dairyland-Milk'}), frozenset({'Wonder-Bread', 'Dairyland-Milk'})}, 3: {frozenset({'Tasty-Pie', 'Wonder-Bread', 'Dairyland-Milk'})}}


In [None]:
for key, values in freq_items.items():
  print('- Number of items: ', key)
  print(' Items set list: ')
  print(list(values))
  print('-'*10)

- Number of items:  1
 Items set list: 
[frozenset({'Dairyland-Cheese'}), frozenset({'Wonder-Bread'}), frozenset({'Tasty-Pie'}), frozenset({'Dairyland-Milk'}), frozenset({'Sunset-Milk'})]
----------
- Number of items:  2
 Items set list: 
[frozenset({'Tasty-Pie', 'Wonder-Bread'}), frozenset({'Dairyland-Cheese', 'Sunset-Milk'}), frozenset({'Tasty-Pie', 'Dairyland-Milk'}), frozenset({'Wonder-Bread', 'Dairyland-Milk'})]
----------
- Number of items:  3
 Items set list: 
[frozenset({'Tasty-Pie', 'Wonder-Bread', 'Dairyland-Milk'})]
----------
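
Because the dataset has only four transactions, the library results can be cross-checked by counting support directly. The sketch below is a brute-force check, not Apriori or FP-Growth. The function name `brute_force_frequent_itemsets` is hypothetical, the transactions are retyped from `items_list`, and the threshold is treated as inclusive (support >= 0.3), which is an assumption about how the libraries compare.

```python
from itertools import combinations

# The four transactions from items_list, as sets.
transactions = [
    {"King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"},
    {"Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"},
    {"Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"},
    {"Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"},
]


def brute_force_frequent_itemsets(transactions, min_support):
    """Enumerate candidate itemsets by size and keep those meeting min_support."""
    n = len(transactions)
    all_items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(all_items) + 1):
        found_any = False
        for candidate in combinations(all_items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                frequent[candidate] = count / n
                found_any = True
        if not found_any:
            # No frequent k-itemset means no frequent (k+1)-itemset either.
            break
    return frequent


frequent = brute_force_frequent_itemsets(transactions, min_support=0.3)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

This approach is exponential in the number of distinct items, so it is only practical as a sanity check on toy data like this.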


# Use mlxtend library

In [None]:
df

Unnamed: 0,Best-Bread,Best-Cheese,Dairyland-Cheese,Dairyland-Milk,Goldenfarm-Apple,King's-Crab,Sunset-Milk,Tasty-Pie,Westcoast-Apple,Wonder-Bread
0,True,False,True,False,False,True,True,False,False,False
1,False,True,False,True,True,False,False,True,False,True
2,False,False,False,True,False,False,False,True,True,True
3,False,False,True,False,False,False,True,False,False,True


In [1]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth

In [3]:
df = pd.read_csv('/content/market.csv', sep=';')
df

Unnamed: 0,Bread,Honey,Bacon,Toothpaste,Banana,Apple,Hazelnut,Cheese,Meat,Carrot,...,Milk,Butter,ShavingFoam,Salt,Flour,HeavyCream,Egg,Olive,Shampoo,Sugar
0,1,0,1,0,1,1,1,0,0,1,...,0,0,0,0,0,1,1,0,0,1
1,1,1,1,0,1,1,1,0,0,0,...,1,1,0,0,1,0,0,1,1,0
2,0,1,1,1,1,1,1,1,1,0,...,1,0,1,1,1,1,1,0,0,1
3,1,1,0,1,0,1,0,0,0,0,...,1,0,0,0,1,0,1,1,1,0
4,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
459,0,1,1,0,0,1,1,1,1,1,...,1,1,0,0,1,1,1,1,1,0
460,0,0,1,0,0,0,1,0,1,0,...,0,0,1,0,0,0,1,0,0,1
461,0,0,0,0,0,1,0,1,1,0,...,1,0,0,0,0,0,1,0,0,0
462,1,0,0,1,1,0,1,1,0,1,...,1,0,0,0,1,0,1,1,0,1


In [4]:
frequent_itemsets = fpgrowth(df, min_support=0.3, use_colnames=True)
### alternatively:
#frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
#frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.448276,(Banana)
1,0.431034,(Bacon)
2,0.420259,(Hazelnut)
3,0.415948,(HeavyCream)
4,0.413793,(Carrot)
5,0.407328,(Bread)
6,0.405172,(Apple)
7,0.403017,(Egg)
8,0.366379,(Sugar)
9,0.415948,(Honey)


In [7]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric


In [8]:
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
#frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)

frequent_itemsets

Unnamed: 0,support,itemsets
0,0.407328,(Bread)
1,0.415948,(Honey)
2,0.431034,(Bacon)
3,0.383621,(Toothpaste)
4,0.448276,(Banana)
5,0.405172,(Apple)
6,0.420259,(Hazelnut)
7,0.443966,(Cheese)
8,0.387931,(Meat)
9,0.413793,(Carrot)


In [9]:
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.4)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric


# Performance Comparison of Apriori and FP-growth Algorithms for Frequent Itemset Mining

For small to medium-sized datasets, both algorithms perform reasonably well.

However, the picture changes as the dataset grows. FP-Growth is superior for smaller datasets and smaller numbers of rules, Apriori is superior for large datasets, and Apriori with preprocessing is superior when a large number of rules must be found.