<h1><center>Hybrid Method - Amazon Beauty Products</center></h1>

A hybrid recommender system is one that combines multiple recommendation techniques together to produce its output. 

Hybridization can combine several techniques of the same type, for instance two different content-based recommenders. Most real-world systems, however, combine content-based and collaborative filtering to exploit the strengths of both approaches, usually to deal with the cold-start problem.

This can be done in a number of ways:

1. Combine the predictions of a content-based system and a collaborative system.
2. Incorporate content-based techniques into a collaborative approach.
3. Incorporate collaborative techniques into a content-based approach.

A unified model that draws on both kinds of signal can be highly effective.

In [1]:
import pandas as pd
import numpy as np
import surprise as sp

pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)  # None = no truncation; -1 is rejected by pandas >= 1.0

%matplotlib inline
import matplotlib.pyplot as plt
import warnings 
warnings.filterwarnings('ignore')

import gc

In [2]:
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import NMF

Read the cleaned metadata and review files into DataFrames.

In [3]:
meta = pd.read_csv('cleaned_metadata.csv',index_col=0)
reviews = pd.read_csv('cleaned_reviews.csv',index_col=0)

In [4]:
reviews[reviews['review'].isnull()]

Unnamed: 0,reviewerID,asin,overall,reviewTime,review,upvotes,downvotes,word_count,polarity
1360701,A1LF7KGH427GYC,B00552467Q,5,2014-07-08,,0,0,2,0.0
1467181,AUII249W6FBZY,B00604MV0C,3,2013-01-11,,0,0,2,0.0
1784718,A3BX2SSKN9SFRV,B00ADXZ9LY,1,2014-07-03,,0,0,4,0.0


In [5]:
d = {5: "Excellent", 4:"Very Good", 3: "Good", 2:"Not Good", 1: "Bad"}
s = reviews.overall.map(d)
reviews['review'] = reviews['review'].combine_first(s)

In [6]:
d=None
s=None

In [7]:
reviews['review'] = reviews['review'].str.strip()

In [8]:
meta['description'] = meta['description'].fillna(meta['main_cat'])
meta['brand_title'] = meta['brand_title'].fillna(meta['main_cat'])
meta['price'] = meta['price'].fillna(0)

In [9]:
reviews.head(2)

Unnamed: 0,reviewerID,asin,overall,reviewTime,review,upvotes,downvotes,word_count,polarity
0,A39HTATAQ9V7YF,205616461,5,2013-05-28,bioactive antiaging serum love moisturizer would recommend someone dry skin fine lines wrinkles using brand day night serum,0,0,34,0.283333
1,A3JM6GV9MNOF9X,558925278,3,2012-12-14,product ok im use baby kabuki moment received product deadlinei tested baby kabuki quality material best packaging cute love itthe fibers smell soft,0,1,44,0.52


### There are 7 different types of hybrid methods:

• Weighted: The scores of different recommendation components are combined numerically. 

• Switching: The system chooses among recommendation components and applies the selected one. 

• Feature Combination: Features derived from different knowledge sources are combined together and given to a single recommendation algorithm. 

• Mixed: Recommendations from different recommenders are presented together. 

• Feature Augmentation: One recommendation technique is used to compute a feature or set of features, which is then part of the input to the next technique. 

• Cascade: Recommenders are given strict priority, with the lower priority ones breaking ties in the scoring of the higher ones. 

• Meta-level: One recommendation technique is applied and produces some sort of model, which is then the input used by the next technique.   

We will explore the first 3 types with our datasets.
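
For the types that are not implemented in this notebook, the cascade hybrid is perhaps the easiest to picture. The following is only a minimal sketch on made-up scores, not part of the pipeline: `cascade_rank`, its `precision` parameter, and the toy dictionaries are hypothetical names introduced for illustration. The primary recommender's scores are rounded into coarse buckets, and the secondary recommender only breaks ties inside a bucket.

```python
def cascade_rank(primary, secondary, precision=0):
    """Rank items by rounded primary score; break ties with the secondary score.

    primary, secondary: dicts mapping item id -> score (hypothetical inputs).
    precision: decimals kept from the primary score. Coarser rounding leaves
               more ties for the secondary recommender to resolve.
    """
    return sorted(
        primary,
        key=lambda item: (round(primary[item], precision), secondary.get(item, 0.0)),
        reverse=True,
    )


# Illustrative scores only
primary = {"A": 4.4, "B": 4.2, "C": 3.1}
secondary = {"A": 0.2, "B": 0.9, "C": 0.5}

ranking = cascade_rank(primary, secondary)
print(ranking)
```

With `precision=0`, items A and B fall into the same bucket, so the secondary score decides their order; with `precision=1` the primary recommender alone determines the ranking.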

In [10]:
data = meta.merge(reviews, on='asin')

In [11]:
data[data['asin']=='B002QGJX6I'][:5]

Unnamed: 0,asin,description,price,brand_title,health_personal_care,beauty,main_cat,sub_cat,related_count,reviewerID,overall,reviewTime,review,upvotes,downvotes,word_count,polarity
908114,B002QGJX6I,medium hold antifrizz antihumidity volumizer plumps thickens creating instant body nanoionic complex emits natural negative ions leaving hair soft shiny keeping hair color vibrant suitable normal fine hair,8.79,Bio Ionic Bio Ionic Thermal Active Bodifying Blow,-1.0,84572.0,Hair Care,Styling Products,72.0,A2HSO93J235USY,3,2014-03-11,im using curling spray noticed real brought curl hair using days dont blow dry leave curly didnt give noticeable volume blow dry,0,0,43,0.022222
908115,B002QGJX6I,medium hold antifrizz antihumidity volumizer plumps thickens creating instant body nanoionic complex emits natural negative ions leaving hair soft shiny keeping hair color vibrant suitable normal fine hair,8.79,Bio Ionic Bio Ionic Thermal Active Bodifying Blow,-1.0,84572.0,Hair Care,Styling Products,72.0,A21QM9WMBTSHWW,1,2014-06-15,sticky messy like product ended throwing away recommend purchasing product,0,1,25,-0.2
908116,B002QGJX6I,medium hold antifrizz antihumidity volumizer plumps thickens creating instant body nanoionic complex emits natural negative ions leaving hair soft shiny keeping hair color vibrant suitable normal fine hair,8.79,Bio Ionic Bio Ionic Thermal Active Bodifying Blow,-1.0,84572.0,Hair Care,Styling Products,72.0,A1MTSRKTBRTGJP,5,2013-07-01,volume love fact gives thin hair body purchased multiple times amazon amazing prompt shipping definitely notice difference body shine improved use first introduced hairdresser,0,0,45,0.158333
908117,B002QGJX6I,medium hold antifrizz antihumidity volumizer plumps thickens creating instant body nanoionic complex emits natural negative ions leaving hair soft shiny keeping hair color vibrant suitable normal fine hair,8.79,Bio Ionic Bio Ionic Thermal Active Bodifying Blow,-1.0,84572.0,Hair Care,Styling Products,72.0,A3V3WCIJ4JS4QS,5,2010-01-30,great body purchased product bio ionic hairdryer work great together fact get much body lift hair need learn tame little,1,1,40,0.279167
908118,B002QGJX6I,medium hold antifrizz antihumidity volumizer plumps thickens creating instant body nanoionic complex emits natural negative ions leaving hair soft shiny keeping hair color vibrant suitable normal fine hair,8.79,Bio Ionic Bio Ionic Thermal Active Bodifying Blow,-1.0,84572.0,Hair Care,Styling Products,72.0,A3G7JVWVQ6UET2,5,2012-12-25,great adding volume must use blow dryer hair salon recommended shampoo days dont use volume shampoo spray works well though must used round brush blow dryer work best volume,2,2,46,0.533333


In [12]:
#data.loc[data['asin'] == 'B0055MYJ0U', 'reviewerID'].unique()
print (data.shape)

(2023070, 17)


In [13]:
usersperasin = data['asin'].value_counts()
asinsperuser = data['reviewerID'].value_counts()

In [14]:
data_s = data[data['asin'].isin(usersperasin[usersperasin>5].index) & data['reviewerID'].isin(asinsperuser[asinsperuser>5].index)]

#data_s = reviews[reviews['asin'].isin(usersperasin[usersperasin>10].index)]

In [15]:
data_s = data_s.reset_index()

In [16]:
data_s.head(1)

Unnamed: 0,index,asin,description,price,brand_title,health_personal_care,beauty,main_cat,sub_cat,related_count,reviewerID,overall,reviewTime,review,upvotes,downvotes,word_count,polarity
0,34,1304351475,too faced natural eyes shadow palette colors include heaven silk teddy nude beach velvet revolver pushup honey pot sexspresso erotica cocoa puff collectible tin version new box full size,33.99,Omagazee NEW EUROPEAN COLLECTION Too Faced Natural,-1.0,15567.0,Makeup,Eyes,309.0,A274NIJWOQWE30,5,2013-11-24,thank best price ever love pallet buying natural eye faced year love colors one use best price ever seen listed pallet kind comes mirror awesome buy,2,3,67,0.671429


In [17]:
print (data_s.shape)
print (data_s.reviewerID.nunique())
print (data_s.asin.nunique())

(318158, 18)
36065
48267


In [18]:
review_s = data_s[['asin', 'reviewerID', 'overall']].copy()
review_s.columns

Index(['asin', 'reviewerID', 'overall'], dtype='object')

In [19]:
print (review_s.shape)
print (review_s.reviewerID.nunique())

(318158, 3)
36065


In [20]:
# Collaborative Filtering

def predict_cf(userid, itemid):
    """Leave-one-out KNN prediction of the rating `userid` gave `itemid`.

    Returns (prediction, prediction - truth, truth).
    """
    # Locate the held-out rating; the pair must be present in data_s.
    is_target = (data_s['reviewerID'] == userid) & (data_s['asin'] == itemid)
    target = data_s[is_target].iloc[0]
    
    # Train on every rating except the held-out one.
    train_set = sp.Dataset.load_from_df(
        data_s[~is_target][['reviewerID', 'asin', 'overall']], 
        sp.Reader(rating_scale=(1,5))
    ).build_full_trainset()

    algo = sp.KNNBasic(verbose=False)
    algo.fit(train_set)
    prediction = algo.predict(target['reviewerID'], target['asin'], verbose=False)
    return prediction.est, prediction.est - target['overall'], target['overall']

In [21]:
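# Content-Based

def predict_cn(df, userid, itemid):
    """Predict the rating `userid` gave `itemid` from the item features in `df`.

    A 1-nearest-neighbour classifier is trained on the user's other rated
    items. Returns (prediction, prediction - truth, truth).
    """
    user_ratings = review_s[review_s['reviewerID'] == userid].join(df.set_index('asin'), on='asin')
    is_target = (user_ratings['asin'] == itemid)
    
    # One-hot encode the categorical columns; the rating itself is the label.
    features = pd.get_dummies(user_ratings.drop(columns=['overall']))
    train_features = features[~is_target]
    target_features = features[is_target]
    
    encoder = LabelEncoder()
    train_labels = encoder.fit_transform(user_ratings[~is_target]['overall'])
    target_label = user_ratings[is_target]['overall'].iloc[0]

    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(np.nan_to_num(train_features), train_labels)
    # Apply the same NaN handling to the target row as to the training rows.
    prediction = encoder.inverse_transform(clf.predict(np.nan_to_num(target_features)))[0]
    return prediction, prediction - target_label, target_label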

In [22]:
def test_classifier(user, item):
    pred_cf, error_cf, truth = predict_cf(user, item)
    pred_cn, error_cn, truth = predict_cn(meta, user, item)
    print("Results for {} on product with id {}:".format(user, item))
    print("Collaborative Filtering: \t prediction: {:.5f} \t error: {:.5f}".format(pred_cf, error_cf))
    print("Content-Based: \t\t\t prediction: {:.5f} \t error: {:.5f}".format(pred_cn, error_cn))

In [None]:
is_target = (data_s['reviewerID'] == 'A274NIJWOQWE30') & (data_s['asin'] == '1304351475')

In [None]:
data_s[is_target]

In [23]:
test_classifier(user='A274NIJWOQWE30', item='1304351475')

Results for A274NIJWOQWE30 on product with id 1304351475:
Collaborative Filtering: 	 prediction: 4.19354 	 error: -0.80646
Content-Based: 			 prediction: 5.00000 	 error: 0.00000


In [24]:
# Tighten the filter: keep only products and reviewers with more than 20 ratings.
data_s = data[data['asin'].isin(usersperasin[usersperasin>20].index) & data['reviewerID'].isin(asinsperuser[asinsperuser>20].index)].reset_index()
data_s.shape

(58941, 18)

In [25]:
data_s.head(2)

Unnamed: 0,index,asin,description,price,brand_title,health_personal_care,beauty,main_cat,sub_cat,related_count,reviewerID,overall,reviewTime,review,upvotes,downvotes,word_count,polarity
0,365,7806397051,an extensive range 15 multiple vibrant long wear concealer colour different skin tones create 10000 amazing looks using commonly applied shades ensures best skin colour match guarantees traceless natural finish enabling layering mixing provides total camouflage almost skin problem including blemishes scars birthmarks black circles it also suitable use bronzer the light colour suitable redness acne on the medium colour perfect dark shadows undereye area the dark colour provides exceptional camouflage adheres well skin silky glossy colour high quality ingredients together care skin around last day long it perfect professional salon wedding party home use size 154 x 102 x 13cm each diameter 26cm concealer 15 color concealer,5.04,COKA WAWO 15 Color Professionl Makeup Eyeshadow Ca,-1.0,10486.0,Makeup,Face,219.0,A3BTN14HIZET6Z,5,2013-04-15,nice palette happy get palette wish offered subscription form like makeup use daily unnamed set lot blending sculpting highlighting concealing product nicely pigmented smooth applies well blends beautifully normal dry skin im 59 years works wonderfully foundation also use silicone based primerwhich works well skin type give fresh dewy look loveyou get good amount product money well worth cost say one willing repurchase product says mere words,1,2,137,0.348718
1,488,9746427962,with age skin loses ability produce energy deep within skins cells as result wrinkles appear deeper wrinkles accentuated product action now 100 q10 ever part high performing antiwrinkle system nivea visage antiwrinkle q10plus night care activates skins energy metabolism within thereby continuously fighting wrinkles deeper layers skin during night skin receptive moisture intake intensively moisturizing formula replenishes regenerates skin effectively daily stress soft supple skin morning how use after cleansing face apply cream face neck every night,12.99,Nivea Nivea Visage Q10 Plus Anti Wrinkle Night Car,-1.0,7082.0,Skin Care,Face,124.0,A18M9192WX98HP,5,2014-04-14,love stuff used buy made usa really liked missed able use regret order cream think worth skin really likes product seems work better anything smooth lines nothing gets rid makes appear less think makes skin improve time ill keep ordering unless nivea starts selling us,1,1,87,0.259259


### Weighted Recommender

In this type, the outputs of the collaborative and content-based methods are combined using a linear weighting scheme. It is the simplest design for a hybrid system.

In [26]:
def predict_weighted(user, item):
    prediction_cf, _, truth = predict_cf(user, item)
    prediction_cn, _, truth = predict_cn(meta, user, item)
    
    # Weights can be chosen differently, depending on 
    # the (assumed) quality of the recommenders
    prediction = 0.5 * prediction_cf + 0.5 * prediction_cn
    error = prediction - truth
    return prediction, error, truth

pred_weighted, error_weighted, truth = predict_weighted(user='A3BTN14HIZET6Z', item='7806397051')
print("Weighted Hybrid: \t prediction: {:.5f} \t error: {:.5f}".format(pred_weighted, error_weighted))

pred_weighted, error_weighted, truth = predict_weighted(user='A18M9192WX98HP', item='9746427962')
print("Weighted Hybrid: \t prediction: {:.5f} \t error: {:.5f}".format(pred_weighted, error_weighted))

Weighted Hybrid: 	 prediction: 4.58907 	 error: -0.41093
Weighted Hybrid: 	 prediction: 4.58907 	 error: -0.41093
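
The 0.5/0.5 split above is an arbitrary choice. One way to pick the weight, sketched here under the assumption that predictions from both recommenders are already available for a handful of held-out user-item pairs, is a simple grid search over the mixing weight. The helpers `rmse` and `best_weight` are hypothetical, and the numbers are made up for illustration; they are not results from this dataset.

```python
import math


def rmse(pred, truth):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def best_weight(pred_cf, pred_cn, truth, steps=10):
    """Grid-search the CF weight w in [0, 1]; the content-based weight is 1 - w.

    Returns (w, rmse_at_w) for the candidate with the lowest RMSE.
    """
    candidates = [i / steps for i in range(steps + 1)]
    scored = [
        (rmse([w * cf + (1 - w) * cn for cf, cn in zip(pred_cf, pred_cn)], truth), w)
        for w in candidates
    ]
    err, w = min(scored)
    return w, err


# Illustrative held-out predictions and true ratings (not from the notebook)
cf_preds = [4.2, 3.9, 4.6, 2.8]
cn_preds = [5.0, 3.0, 5.0, 1.0]
truths = [5, 4, 5, 2]

w, err = best_weight(cf_preds, cn_preds, truths)
print("CF weight: {:.1f}, content weight: {:.1f}, RMSE: {:.5f}".format(w, 1 - w, err))
```

Because 0.5 is one of the grid candidates, the selected weight can never score worse on the held-out pairs than the equal split; whether it generalizes depends on how many pairs are held out.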


### Feature Combination

The idea of feature combination is to inject features from one source (such as collaborative recommendation) into an algorithm designed to process data from a different source (such as content-based recommendation). Here it is used to enrich the item features passed to predict_cn: the NMF class from sklearn.decomposition performs matrix factorization on the ratings matrix, and the resulting latent factors are appended to the item metadata.

In [27]:
ratings_matrix = pd.pivot_table(data=data_s, values='overall', index='asin', columns='reviewerID')

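# NaN can be filled with the mean of either the user's or the item's
# ratings; ratings_matrix.mean() is column-wise, so this uses each user's mean.
ratings_matrix = ratings_matrix.fillna(ratings_matrix.mean())

model = NMF(n_components=3, init='random', random_state=0)

# W holds one row of latent factors per asin in ratings_matrix.
W = model.fit_transform(ratings_matrix)

# Join the factors to the metadata on asin; a positional concat would pair
# each factor row with an unrelated product.
latent = pd.DataFrame(W, index=ratings_matrix.index).reset_index()
asin_plus = meta.merge(latent, on='asin', how='left')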

pd.DataFrame(W)

Unnamed: 0,0,1,2
0,1.518081,1.196979,1.103308
1,1.518095,1.197225,1.103086
2,1.518836,1.195393,1.103484
3,1.517373,1.195917,1.104289
4,1.516930,1.195497,1.106870
...,...,...,...
12580,1.519381,1.194704,1.105716
12581,1.518161,1.196824,1.103465
12582,1.518240,1.196049,1.104177
12583,1.518349,1.196672,1.103166


In [28]:
pred_fc, error_fc, truth = predict_cn(asin_plus, 'A3BTN14HIZET6Z', '7806397051')
print("Feature Combination: \t prediction: {:.5f} \t error: {:.5f}".format(pred_fc, error_fc))

Feature Combination: 	 prediction: 5.00000 	 error: 0.00000


In [29]:
pred_fc, error_fc, truth = predict_cn(asin_plus, 'A18M9192WX98HP', '9746427962')
print("Feature Combination: \t prediction: {:.5f} \t error: {:.5f}".format(pred_fc, error_fc))

Feature Combination: 	 prediction: 5.00000 	 error: 0.00000


For both test pairs, adding the latent factors to the content-based features yielded an exact prediction of the true rating (5), whereas the weighted hybrid was off by about 0.41.
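
`NMF.fit_transform` returns a plain array with one row per row of the input matrix, so the latent factors only line up with the metadata if they are joined on `asin` rather than by row position. The following is a minimal sketch on toy data; `ratings`, `meta_toy`, `latent`, and the `nmf_0`/`nmf_1` column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import NMF

# Toy item-by-user ratings (rows: asin, columns: reviewerID); values are made up.
ratings = pd.DataFrame(
    [[5, 4, 0], [4, 5, 1], [1, 0, 5], [0, 1, 4]],
    index=pd.Index(["B01", "B02", "B03", "B04"], name="asin"),
    columns=["u1", "u2", "u3"],
)

# Toy metadata in a different row order, including one item with no ratings.
meta_toy = pd.DataFrame(
    {"asin": ["B03", "B01", "B05", "B02", "B04"],
     "price": [9.5, 12.0, 3.0, 7.25, 20.0]}
)

model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
W = model.fit_transform(ratings.to_numpy(dtype=float))

# Label each latent row with the asin it was computed for, then join on asin.
latent = pd.DataFrame(W, index=ratings.index, columns=["nmf_0", "nmf_1"]).reset_index()
combined = meta_toy.merge(latent, on="asin", how="left")

print(combined)
```

Items without any ratings (here `B05`) end up with missing latent factors after the left join, so the downstream model has to handle those NaN values explicitly.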

### Switching Hybrid

This method selects a single recommender from among its constituents based on the recommendation situation. Different recommenders might get chosen for different profiles. It begins the process by selecting one of its components as appropriate in the current situation, based on its switching criteria. Once that choice is made, the unchosen component has no role in the remaining recommendation process. 

In [30]:
def predict_switching(df, user, item):
    # The selection of the recommender is done based on the 
    # number of ratings that have been recorded for the item.
    num_ratings = len(data_s[data_s['asin'] == item])
    if num_ratings > 3:
        print('Using Collaborative Filtering recommender')
        return predict_cf(user, item)
    else:
        print('Using Content-Based recommender')
        # Use the feature table that was passed in rather than the global meta.
        return predict_cn(df, user, item)


pred, error, truth = predict_switching(meta, user='A274NIJWOQWE30', item='1304351475')
print("Switching Hybrid: \t prediction: {:.5f} \t error: {:.5f}".format(pred, error))

pred, error, truth = predict_switching(meta, user='A2JAEZ0FMAMJVW', item='B009FKNGGQ')
print("Switching Hybrid: \t prediction: {:.5f} \t error: {:.5f}".format(pred, error))

Using Content-Based recommender
Switching Hybrid: 	 prediction: 5.00000 	 error: 0.00000
Using Collaborative Filtering recommender
Switching Hybrid: 	 prediction: 4.10532 	 error: 0.10532
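
The switching criterion above looks only at the item. Collaborative filtering also struggles with users who have few ratings, so a variant could check both sides before committing to it. This is a minimal sketch, assuming the rating counts have already been looked up; `choose_recommender` and its `min_ratings` default are hypothetical and simply mirror the threshold of 3 used above.

```python
def choose_recommender(n_item_ratings, n_user_ratings, min_ratings=3):
    """Return 'cf' when both the item and the user have enough ratings.

    Falls back to 'content' when either side is too sparse for
    collaborative filtering to find reliable neighbours.
    """
    if n_item_ratings > min_ratings and n_user_ratings > min_ratings:
        return "cf"
    return "content"


# Illustrative counts only
print(choose_recommender(n_item_ratings=25, n_user_ratings=40))  # well-covered pair
print(choose_recommender(n_item_ratings=0, n_user_ratings=40))   # new item
print(choose_recommender(n_item_ratings=25, n_user_ratings=2))   # new user
```

Keeping the criterion in its own function makes it straightforward to swap in a different rule without touching the two recommenders.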


In [31]:
data_s.loc[data_s['asin'] == 'B001MA0QY2', 'brand_title'].unique()  

array(['HSI PROFESSIONAL HSI PROFESSIONAL 1 CERAMIC TOURMA'], dtype=object)