# **Content-Based Recommender System**

This section builds a content-based recommender system from the product titles in the Amazon dataset. Given a product, the system ranks the other products by the cosine similarity of their titles and recommends the closest matches.

First, we configure pandas to display long text columns in full instead of truncating them.

In [9]:
#Settings to display more words in dataframe
import pandas as pd

pd.set_option('display.max_colwidth',10000)

Next, we load the dataset into a dataframe.

In [10]:
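#read_csv already returns a dataframe, so no extra conversion is needed
df = pd.read_csv('amazon_dataset_w_avgRating_v2.csv')

#Drop the leftover unnamed index column stored in the CSV file
df = df.drop(['Unnamed: 0'], axis=1)
df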

index,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date,predicted_sentiment,avg_rating
0,US,1797882.0,R3I2DHQBR577SS,B001ANOOOE,2102612.0,"The Naked Bee Vitmin C Moisturizing Sunscreen SPF 30, 5.5 oz (163 ml.)",Beauty,5.0,0.0,0.0,N,Y,Five Stars,love excel sun block,2015-08-31,5,5.0
1,US,18381298.0,R1QNE9NQFJC2Y4,B0016J22EQ,106393691.0,"Alba Botanica Sunless Tanning Lotion, 4 Ounce",Beauty,5.0,0.0,0.0,N,Y,Thank you Alba Bontanica!,great thing cream doesnt smell weird like chemic laden one get nice healthi unfak look tan isnt orang make skin soft,2015-08-31,5,5.0
2,US,19242472.0,R3LIDG2Q4LJBAO,B00HU6UQAG,375449471.0,"Elysee Infusion Skin Therapy Elixir, 2oz.",Beauty,5.0,0.0,0.0,N,Y,Five Stars,great product im year old claim,2015-08-31,5,5.0
3,US,19551372.0,R3KSZHPAEVPEAL,B002HWS7RM,255651889.0,"Diane D722 Color, Perm And Conditioner Processing Caps - 100-Pack - Clear",Beauty,5.0,0.0,0.0,N,Y,GOOD DEAL!,use shower cap condit cap like theyr bulk save lot money,2015-08-31,4,4.5
4,US,14802407.0,RAI2OIG50KZ43,B00SM99KWU,116158747.0,Biore UV Aqua Rich Watery Essence SPF50+/PA++++ (pack of 2),Beauty,5.0,0.0,0.0,N,Y,this soaks in quick and provides a nice base for makeup,goto daili sunblock leav white cast clean pleasant scent your makeup wearer soak quick provid nice base makeup ive use brand year daili use tube last coupl month,2015-08-31,5,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29995,US,6985401.0,RLWQNDS7I46ST,B00TDJ5VTO,959170259.0,"Waterproof Bluetooth Speaker, Keedox Wireless Outdoor Speaker Shower Bluetooth Speaker for Apple iPhone 6s Plus, iPhone 6s, Samsung and More.(Black+Yellow)",Mobile_Electronics,5.0,0.0,0.0,N,Y,cool,work greatbr get wet reason tobr gal water proof,2015-06-14,4,4.5
29996,US,127390.0,RIDE7LEW92C1U,B00S015EMK,231869191.0,"Game Day w/ GPLC & Pure PF3, Blue Bomb-Sicle 250g",Health & Personal Care,1.0,0.0,0.0,N,Y,One Star,old batch gameday color tast differ ive bought item,2015-08-31,3,2.0
29997,US,13150882.0,R2M3HM87H61ETP,B007A1XG5I,730919193.0,New Chapter Every Woman's One Daily 40 Plus Bonus Multivitamin Tablets,Health & Personal Care,5.0,3.0,4.0,N,Y,Amazon Prime is the best and most cost effective way to buy them,take vitamin year make differ mental physicallybr br amazon prime best cost effect way buy,2015-08-31,5,5.0
29998,US,4722303.0,RW8FGV1OTGXMK,B005NJLN86,70378667.0,"Massachusetts Engineers Arch and Logo Short Sleeve T-shirt - Maroon ,",Sports,5.0,0.0,0.0,N,Y,Five Stars,good cotton flawlessli sewb togeth good job,2015-08-31,4,4.5


Next, we use TF-IDF (term frequency-inverse document frequency) to turn the product titles into a matrix with one row per title and one column per vocabulary term.

In [11]:
#Import TfIdfVectorizer from scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer

#Define a TF-IDF Vectorizer Object. Remove all english stop words such as 'the', 'a'
tfidf = TfidfVectorizer(stop_words='english')

#Replace NaN with an empty string
df['product_title'] = df['product_title'].fillna('')

#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(df['product_title'])

#Output the shape of tfidf_matrix
tfidf_matrix.shape

(30000, 26299)
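
To make the shape of this matrix concrete, here is a minimal sketch on three made-up titles. The titles and the names toy_titles, toy_tfidf and toy_matrix are illustrative assumptions, not part of the dataset or the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy titles, not rows from the Amazon dataset
toy_titles = [
    "Acne Treatment Gel",
    "Acne Spot Treatment",
    "LED Aquarium Light",
]

toy_tfidf = TfidfVectorizer(stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_titles)

# One row per title, one column per distinct term in the vocabulary
print(toy_matrix.shape)

# Terms in column order (vocabulary_ maps each term to its column index)
print(sorted(toy_tfidf.vocabulary_))
```

The same reading applies to the real matrix: 30000 titles by 26299 distinct terms.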

Next, we compute the pairwise cosine similarity matrix of the product titles and store it in a dataframe.

In [12]:
# Import linear_kernel
from sklearn.metrics.pairwise import linear_kernel

# Compute the cosine similarity matrix
cosine_sim_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)

cosine_sim_matrix_df = pd.DataFrame(cosine_sim_matrix)
cosine_sim_matrix_df

index,0,1,2,3,4,5,6,7,8,9,...,29990,29991,29992,29993,29994,29995,29996,29997,29998,29999
0,1.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.00000,0.0000,0.00000,0.0,0.00000,0.0,0.0
1,0.0,1.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.00000,0.0000,0.00000,0.0,0.00000,0.0,0.0
2,0.0,0.0,1.0,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.00000,0.0000,0.00000,0.0,0.00000,0.0,0.0
3,0.0,0.0,0.0,1.000000,0.026772,0.0,0.000000,0.0,0.0,0.066561,...,0.0,0.0,0.042238,0.00000,0.0469,0.00000,0.0,0.00000,0.0,0.0
4,0.0,0.0,0.0,0.026772,1.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.00000,0.0000,0.00000,0.0,0.00000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29995,0.0,0.0,0.0,0.000000,0.000000,0.0,0.054269,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.07502,0.0000,1.00000,0.0,0.03572,0.0,0.0
29996,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.045709,0.00000,0.0000,0.00000,1.0,0.00000,0.0,0.0
29997,0.0,0.0,0.0,0.000000,0.000000,0.0,0.046269,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.00000,0.0000,0.03572,0.0,1.00000,0.0,0.0
29998,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.00000,0.0000,0.00000,0.0,0.00000,1.0,0.0
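
Note that linear_kernel computes plain dot products. It yields cosine similarities here only because TfidfVectorizer L2-normalizes every row by default (norm='l2'), so each title vector has unit length. The following sketch checks this on made-up titles; the toy data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy titles, not rows from the Amazon dataset
toy_titles = [
    "Benzoyl Peroxide Acne Gel",
    "Acne Treatment Gel",
    "Aquarium LED Light",
]

toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(toy_titles)

dot_products = linear_kernel(toy_matrix, toy_matrix)
cosines = cosine_similarity(toy_matrix, toy_matrix)

# Rows have unit length, so dot product and cosine similarity coincide
print(np.allclose(dot_products, cosines))

# Every title is maximally similar to itself, hence the 1.0 diagonal
print(np.allclose(np.diag(dot_products), 1.0))
```

If the vectorizer were created with norm=None, the two results would differ and cosine_similarity would be the safer choice.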


With the cosine similarity matrix in hand, we define a function that recommends products for a given product title. The function takes the 10 rows with the highest cosine similarity to the query, drops the query product itself, and returns the product title, product id, average rating and cosine similarity of the rest, so it returns at most 9 recommendations.

Because the dataset holds one row per review, the same product can appear several times. We therefore drop duplicate titles and any row whose title equals the query. If nothing remains, the function reports that no similar products were found.

In [13]:
#Function for recommendation of products based on product title

def product_recommendation(title):
    copyofdf = df.reset_index(drop=True)

    #Locate the first row that carries the query title
    matches = copyofdf.index[copyofdf['product_title'] == title]
    if len(matches) == 0:
        raise ValueError(f"Product title not found in the dataset: {title!r}")
    index = matches[0]

    #Take the 10 highest-scoring rows; the query row itself is usually among them
    top_10_index = list(cosine_sim_matrix_df[index].nlargest(10).index)
    if index in top_10_index:
        top_10_index.remove(index)

    product_df = copyofdf.iloc[top_10_index][['product_title']].copy()
    product_df['Product Id'] = copyofdf.iloc[top_10_index]['product_id']
    product_df['Average Rating'] = copyofdf.iloc[top_10_index]['avg_rating']
    product_df['Cosine Similarity'] = cosine_sim_matrix_df[index].iloc[top_10_index]

    #Drop repeated titles and any other row that carries the query title
    product_df = product_df.drop_duplicates(subset=['product_title'])
    product_df = product_df[product_df['product_title'] != title]

    if len(product_df) == 0:
        return "There are no similar products found."

    return product_df
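
The full similarity matrix has 30000 × 30000 entries, which is about 7.2 GB as dense float64 values (30000 × 30000 × 8 bytes). If memory is tight, one alternative is to compute the similarities for the query row only. The sketch below is not the approach used in this notebook; the helper name similar_titles and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical toy data standing in for df['product_title']
toy_titles = pd.Series([
    "Benzoyl Peroxide Acne Gel",
    "Acne Treatment Gel",
    "Acne Spot Treatment",
    "Aquarium LED Light",
])
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(toy_titles)

def similar_titles(index, n=2):
    """Return the n titles most similar to the title at position `index`."""
    # Compare one row against all rows: a 1 x n_titles array, not n x n
    scores = linear_kernel(toy_matrix[index], toy_matrix).ravel()
    # Exclude the query row itself before ranking
    ranked = pd.Series(scores, index=toy_titles.index).drop(index)
    top = ranked.nlargest(n)
    return pd.DataFrame({
        'product_title': toy_titles.loc[top.index],
        'Cosine Similarity': top,
    })

print(similar_titles(0))
```

This trades a one-time matrix computation for a small amount of work per query, which is usually the better deal when only a few products are looked up.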

In [14]:
#Results showing the recommended products with a product title as an input

product_recommendation('Dr Song Benzoyl Peroxide 10% Acne Cream Gel Treatment Lotion up to 8oz')

index,product_title,Product Id,Average Rating,Cosine Similarity
4571,2.5% Benzoyl Peroxide Dr Song Acne Gel Treatment Lotion,B00DFEGDV8,5.0,0.891287
3974,X-ZIT Natural Acne Control Treatment Cream - Best for Controlling Acne - For Teens and Adults - Benzoyl Peroxide-free (2oz),B00RY3ONPE,5.0,0.44412
435,Extra Strength Benzoyl Peroxide 10% Acne Cleanser for Face & Body by Beauty Facial Extreme.,B00SDUD7IQ,5.0,0.422857
19143,Dr Song Home Professional Teeth Whitening Kit 35% Carbamide Peroxide 4 XL Syringe with Light,B00DQ31IUO,5.0,0.329913
2592,"Nelsons: Pure & Clear Acne Treatment Gel, 1 oz",B000YLFHPS,4.0,0.266699
4903,Acne Spot Treatment 0.25 Oz,B00MPF1NQ8,3.0,0.245693
3228,"Dr. Teal's Lotion, Detox Ginger and Clay, 10 Ounce",B00V3QKL4A,1.0,0.228966


To show the function's output without hand-picking a product, we sample a random product title from the dataframe.

In [15]:
#Sample a random row from the original dataframe

random_data = df.sample()

#Read the title directly as a plain string; parsing str(Series)
#breaks on titles that contain double spaces
result = random_data['product_title'].iloc[0]
print(result)

JBJ MT-603-LED 28 watt Nano Cube LED Intermediate Lighting, 28-Gallon


In [16]:
#Results showing the recommended products with the random product title as an input

recommended_products = product_recommendation(result)
recommended_products

index,product_title,Product Id,Average Rating,Cosine Similarity
27480,JBJ Nano Glo LED Refugium Light for Aquarium,B003J89HEU,1.5,0.320375
29200,"API AquaWave Aquarium Kit with LED Lighting and Internal Filter, 2-1/2-Gallon",B0083S5PHO,1.0,0.258504
22840,"Spalding NBA Street Basketball - Pink & Purple - Intermediate Size 6 (28.5"")",B003TUCB82,2.0,0.250573
25853,LORIZA® Extendable LED Aquarium Lighting Aquarium Light Any Size 28 - 48 cm Dimmable Blue White fish tanks and aquariums LED Aquarium Lighting Reef Fish Tank Light (11-to-18-inch),B00P2AF32I,5.0,0.236845
25043,Tetra 29095 Cube Aquarium Kit 3 Gallon,B008CA7W7E,5.0,0.185102
26306,API Aquaview 360 Aquarium Kit with LED Lighting and Internal Filter,B002DRLESA,4.5,0.183403
26942,Amzdeal® Aquarium Light Fixture Lighting Led Fish Lights Underwater Lighting for Aquarium or Fish Tank with Suction Cup Mouted,B00XJGV0OM,4.0,0.173251
