# Recommendations based on Frequently Reviewed Together (association rules)
For the final part of this assignment, turn to section 5.4 of the Practical Recommender Systems book (pp. 113-127). Read this section and [download](https://www.manning.com/downloads/1927) the code accompanying the book. Explore `association_rules_calculator.py` in the `builder` directory and translate it to this notebook. Falk uses a different infrastructure, but the code is simple to adapt. The guidelines below should speed up the process.

The steps in the source code are:
1. Open the data
2. Generate the transactions (in our case, reviews)
3. Calculate support and confidence
4. Save the results

### 1. Open the data
Since we are using `.csv` files instead of a database, we can load the data into a dataframe. Decide which columns are necessary: we are looking for patterns of the form "a user who reviewed book x also reviewed book y", so the user ID and the ISBN are enough.

```python
import pandas as pd

df = pd.read_csv('data/BX-Book-Ratings-Subset.csv', sep=';', encoding='latin-1')
```
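
A minimal sketch of loading only the columns the rules need. It assumes the subset file keeps the Book-Crossing layout (semicolon-separated, quoted `User-ID`, `ISBN` and `Book-Rating` columns); the in-memory sample is made up, and with the real file you would pass the path (and `encoding='latin-1'`) instead of the buffer.

```python
import io

import pandas as pd

# Made-up sample in the assumed Book-Crossing layout; not real data.
sample = io.StringIO(
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"1";"0000000001";"5"\n'
    '"1";"0000000002";"0"\n'
    '"2";"0000000001";"8"\n'
)

# usecols keeps only what "user reviewed x and y" needs;
# dtype=str preserves leading zeros in the ISBNs.
df_sample = pd.read_csv(
    sample,
    sep=';',
    usecols=['User-ID', 'ISBN'],
    dtype={'ISBN': str},
)
print(df_sample)
```

Keeping ISBNs as strings matters later, because the saved `source` and `target` columns are ISBNs.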

### 2. Generate the reviews
We want a list of lists, where each inner list holds the books reviewed by a single user. In the shopping-basket example, the output was
`[['eggs','milk','bread'], ['bacon', 'bread'], [...], [...]]`

```python
# one list of ISBNs per user: [[isbn, isbn, ...], [isbn, ...], ...]
df_reviews = df.groupby('User-ID')['ISBN'].apply(list)
reviewed = df_reviews.values.tolist()
```
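
To see the shape this produces, here is the same `groupby` on a small made-up dataframe (`df_toy` and `reviewed_toy` are illustrative names only).

```python
import pandas as pd

# made-up ratings: user 1 reviewed x and y, user 2 reviewed x, y and z, user 3 only z
df_toy = pd.DataFrame({
    'User-ID': [1, 1, 2, 2, 2, 3],
    'ISBN': ['x', 'y', 'x', 'y', 'z', 'z'],
})

# groupby sorts by User-ID; each group becomes one list of ISBNs
reviewed_toy = df_toy.groupby('User-ID')['ISBN'].apply(list).values.tolist()
print(reviewed_toy)  # [['x', 'y'], ['x', 'y', 'z'], ['z']]
```

User 3 has a single review and therefore yields no pair, but still counts as a transaction in `N = len(reviewed)`, so it lowers every support value.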

### 3. Calculate support and confidence
This step takes some puzzling, but the source code makes the approach clear. You can reuse the subroutines from the source code and pass them the list of reviews grouped per user. Experiment with the _minimum support_ parameter: a stricter (higher) value removes more books before pairs are counted, which results in fewer associations.
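
For reference, these are the quantities computed in this step, where $N$ is the number of users (transactions) and $\mathrm{count}(\cdot)$ is the number of users who reviewed all of the listed books. A book $x$ is only kept if $\mathrm{count}(x) > \mathrm{min\_sup} \cdot N$, and for a rule $x \rightarrow y$:

$$
\mathrm{support}(x \rightarrow y) = \frac{\mathrm{count}(x, y)}{N},
\qquad
\mathrm{confidence}(x \rightarrow y) = \frac{\mathrm{count}(x, y)}{\mathrm{count}(x)}
$$

These correspond to `group_freq / N` and `group_freq / source_freq` in `calculate_association_rules`.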

```python
# This code originates from the book Practical Recommender Systems
# (association_rules_calculator.py), with minor tweaks for the current dataset.

from collections import defaultdict
from itertools import combinations
from datetime import datetime


def calculate_itemsets_one(reviewed, min_sup=0.01):
    """Count how many users reviewed each book and keep only the books
    whose count exceeds min_sup * N."""
    N = len(reviewed)
    temp = defaultdict(int)
    one_itemsets = dict()

    for items in reviewed:
        # set() so a book reviewed twice by the same user is counted once
        for item in set(items):
            temp[frozenset({item})] += 1

    # remove all items that are not supported
    for key, count in temp.items():
        if count > min_sup * N:
            one_itemsets[key] = count
    print(f"{N} users, {len(one_itemsets)} books above minimum support")
    return one_itemsets


def has_support(pair, one_itemsets):
    """True if both books of the pair passed the minimum-support filter."""
    return (frozenset({pair[0]}) in one_itemsets
            and frozenset({pair[1]}) in one_itemsets)


def calculate_itemsets_two(reviewed, one_itemsets):
    """Count how many users reviewed each pair of supported books."""
    two_itemsets = defaultdict(int)

    for items in reviewed:
        items = list(set(items))  # remove duplicates
        # combinations() yields nothing for fewer than 2 items
        # and exactly one pair for 2 items
        for pair in combinations(items, 2):
            if has_support(pair, one_itemsets):
                two_itemsets[frozenset(pair)] += 1
    return two_itemsets


def calculate_association_rules(one_itemsets, two_itemsets, N):
    """Create a directed rule source -> target for every counted pair.
    Each rule is a tuple (timestamp, source, target, confidence, support)."""
    timestamp = datetime.now()

    rules = []
    for source, source_freq in one_itemsets.items():
        for key, group_freq in two_itemsets.items():
            if source.issubset(key):
                target = key.difference(source)
                support = group_freq / N
                confidence = group_freq / source_freq
                rules.append((timestamp, next(iter(source)), next(iter(target)),
                              confidence, support))
    return rules


min_sup = 0.01
N = len(reviewed)
```



```python
one_itemsets = calculate_itemsets_one(reviewed, min_sup)
two_itemsets = calculate_itemsets_two(reviewed, one_itemsets)
rules = calculate_association_rules(one_itemsets, two_itemsets, N)

# check how many associations were found
len(rules)
```
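
To get a feel for the _minimum support_ parameter, here is a compact re-implementation on made-up transactions. It uses `collections.Counter` rather than the book's code, and `count_rules` is a hypothetical helper; it applies the same filter as `calculate_itemsets_one` (count strictly above `min_sup * N`) and, like `calculate_association_rules`, turns every supported pair into two directed rules.

```python
from collections import Counter
from itertools import combinations


def count_rules(transactions, min_sup):
    """Number of directed rules x -> y left after the minimum-support filter."""
    n = len(transactions)
    # number of users who reviewed each book (duplicates per user ignored)
    item_counts = Counter(item for t in transactions for item in set(t))
    # same filter as calculate_itemsets_one: count strictly above min_sup * N
    frequent = {item for item, c in item_counts.items() if c > min_sup * n}
    # count each unordered pair of frequent books once per user
    pair_counts = Counter(
        frozenset(pair)
        for t in transactions
        for pair in combinations(sorted(set(t)), 2)
        if pair[0] in frequent and pair[1] in frequent
    )
    # every pair {x, y} gives two rules: x -> y and y -> x
    return 2 * len(pair_counts)


# made-up transactions: one list of books per user
transactions = [['a', 'b', 'c'], ['a', 'b'], ['a', 'c'], ['b', 'd']]

# loop variable is not named min_sup, so the notebook's setting is not overwritten
for threshold in (0.1, 0.3, 0.6, 0.8):
    print(threshold, count_rules(transactions, threshold))
# prints: 0.1 8 / 0.3 6 / 0.6 2 / 0.8 0 (one per line)
```

As the threshold rises, first `d`, then `c`, and finally every book drops out, and the number of rules shrinks accordingly.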

### 4. Save the results
Create a dataframe from the results of step 3. To make it work with the current app, make sure the columns are `source;target;support;confidence` (semicolon-separated, in that order). Save the recommendations as `recommendations-seeded-associations.csv` and replace the existing file in the app directory.

```python
associations = []

# Each rule is (timestamp, source, target, confidence, support),
# so support is rule[4] and confidence is rule[3].
for rule in rules:
    association = {
        'source': str(rule[1]),
        'target': str(rule[2]),
        'support': rule[4],
        'confidence': rule[3],
    }
    associations.append(association)

# create the dataframe; columns follow the dict key order:
# source, target, support, confidence
df_associations = pd.DataFrame(associations)

df_associations.to_csv('recommendations-seeded-associations.csv', index=False, sep=';')
```
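
An alternative sketch that names the tuple fields once, so support and confidence cannot be mixed up by index. The rules and ISBNs below are made up, and the output goes to an in-memory buffer instead of the app's file; `example_rules` and `df_example` are illustrative names.

```python
import io
from datetime import datetime

import pandas as pd

# Made-up rules in the tuple layout of calculate_association_rules:
# (timestamp, source, target, confidence, support)
now = datetime.now()
example_rules = [
    (now, '0000000001', '0000000002', 2 / 3, 0.5),
    (now, '0000000002', '0000000001', 1.0, 0.5),
]

# Name the tuple fields, then select the columns in the order the app expects.
df_example = pd.DataFrame(
    example_rules, columns=['created', 'source', 'target', 'confidence', 'support']
)[['source', 'target', 'support', 'confidence']]

# Write to an in-memory buffer instead of the real file for this sketch.
buffer = io.StringIO()
df_example.to_csv(buffer, index=False, sep=';')

# Reading back with dtype=str keeps the leading zeros of the ISBNs.
check = pd.read_csv(io.StringIO(buffer.getvalue()), sep=';',
                    dtype={'source': str, 'target': str})
print(check)
```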