
In [7]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import word_tokenize

In [8]:
df = pd.read_csv("https://raw.githubusercontent.com/umarmaul/Recommendation_System/main/Dataset/content_by_synopsis.csv")
df.head()

Unnamed: 0,title,overview
0,Toy Story,"Led by Woody, Andy's toys live happily in his ..."
1,Jumanji,When siblings Judy and Peter discover an encha...
2,Grumpier Old Men,A family wedding reignites the ancient feud be...
3,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom..."
4,Father of the Bride Part II,Just when George Banks has recovered from his ...


#Encode all synopses into a bank

In [10]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [11]:
bow = CountVectorizer(stop_words="english", tokenizer=word_tokenize)
bank = bow.fit_transform(df.overview)
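
To make the encoding step concrete, here is a minimal pure-Python sketch of what a bag-of-words encoder does conceptually: build a vocabulary from the corpus, then count each vocabulary term per document. The toy corpus, the tiny stop-word set, and the helper names `tokenize` and `encode` are assumptions for illustration; `CountVectorizer` uses its own tokenizer and stop-word list and returns a sparse matrix rather than nested lists.

```python
from collections import Counter

# Hypothetical toy corpus and stop-word set, for illustration only.
docs = [
    "Woody and Buzz are toys",
    "Buzz meets the toys",
]
stop_words = {"and", "are", "the"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]

# The vocabulary is the sorted set of all tokens seen in the corpus.
vocab = sorted({tok for doc in docs for tok in tokenize(doc)})

def encode(text):
    """Count how often each vocabulary term occurs in `text`."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]

# One row per document, one column per vocabulary term.
bank = [encode(doc) for doc in docs]
print(vocab)
print(bank)
```

Words outside the vocabulary are simply ignored at encoding time, which is also why `transform` can be applied to new text after `fit_transform`.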

#Step 1: Encode what the user watched

In [12]:
idx = 0

In [14]:
content = df.loc[idx, "overview"]
content

"Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences."

In [15]:
code = bow.transform([content])
code

<1x86599 sparse matrix of type '<class 'numpy.int64'>'
	with 28 stored elements in Compressed Sparse Row format>

#Step 2: Document search

In [16]:
from sklearn.metrics.pairwise import cosine_distances

In [18]:
dist = cosine_distances(code, bank)
dist

array([[0.        , 0.68698928, 0.70198022, ..., 0.88529213, 0.68931574,
        0.75277431]])
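
The first entry above is 0 because the query (Toy Story) is itself a row of the bank. As a rough sketch of the quantity involved, cosine distance is one minus the cosine of the angle between two count vectors. The helper `cosine_distance` and the example vectors below are hypothetical, and the sketch assumes neither vector is all zeros.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle) for two equal-length, non-zero count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical count vectors over a four-term vocabulary.
query = [1, 0, 1, 1]
scaled = [2, 0, 2, 2]    # same direction as the query, only longer
partial = [1, 1, 1, 0]   # shares two terms with the query
disjoint = [0, 1, 0, 0]  # shares no terms with the query

print(cosine_distance(query, scaled))    # close to 0
print(cosine_distance(query, partial))   # close to 1/3
print(cosine_distance(query, disjoint))  # 1.0
```

Because only the direction of the vectors matters, a long synopsis and a short one that use the same words in the same proportions end up at distance 0.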

In [20]:
rec_idx = dist.argsort()[0, 1:11]
rec_idx

array([14706,  2945,  9984, 36827, 40606, 13404, 22084, 14078,  6172,
       27006])
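
The slice `1:11` skips position 0 of the sorted order, which is expected to be the query itself. A small sketch with made-up distances shows the mechanics, plus a variant that drops the query by index instead of by position (useful if another item also sits at distance 0). The values and the names `order`, `topk`, and `safe_idx` are assumptions for illustration.

```python
import numpy as np

# Hypothetical distances from one query to five items; item 0 is the query itself.
dist = np.array([[0.0, 0.7, 0.2, 0.9, 0.4]])
idx = 0
topk = 3

# argsort returns item indices ordered by ascending distance, shape (1, 5).
order = dist.argsort()

# Skip position 0 (the query) and keep the next topk indices.
rec_idx = order[0, 1:topk + 1]
print(rec_idx)

# Variant: remove the query explicitly rather than relying on its position.
safe_idx = [int(i) for i in order[0] if i != idx][:topk]
print(safe_idx)
```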

#Step 3: Recommend

In [21]:
df.loc[rec_idx]

Unnamed: 0,title,overview
14706,Toy Story 3,"Woody, Buzz, and the rest of Andy's toys haven..."
2945,Toy Story 2,"Andy heads off to Cowboy Camp, leaving his toy..."
9984,The 40 Year Old Virgin,Andy Stitzer has a pleasant life with a nice a...
36827,Wabash Avenue,Andy Clark discovers he was cheated out of a h...
40606,Stasis,After a night out of partying and left behind ...
13404,The Gang's All Here,"Playboy Andy Mason, on leave from the army, ro..."
22084,The Pied Piper,"Greed, corruption, ignorance, and disease. Mid..."
14078,A Matter of Dignity,"During one of her parents many parties, Chloe ..."
6172,The Courtship of Eddie's Father,The film that started the classic TV series. A...
27006,Superdome,"It's Superbowl. And there's a lot of drama, on..."


#ML Engineering Implementation

In [26]:
from sklearn.metrics.pairwise import cosine_distances

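class RecommenderSystem:
  """Content-based recommender built on bag-of-words cosine distance."""

  def __init__(self, data, content_col):
    # data: path or URL of a CSV file; content_col: name of the text column to encode
    self.df = pd.read_csv(data)
    self.content_col = content_col
    self.encoder = None
    self.bank = None

  def fit(self):
    # Learn the vocabulary and encode every item into the bank
    self.encoder = CountVectorizer(stop_words="english", tokenizer=word_tokenize)
    self.bank = self.encoder.fit_transform(self.df[self.content_col])

  def recommend(self, idx, topk=10):
    # Read from self.df, not the notebook-level df, so the class does not depend on global state
    content = self.df.loc[idx, self.content_col]
    code = self.encoder.transform([content])
    dist = cosine_distances(code, self.bank)
    # Position 0 is normally the query itself (distance 0), so skip it
    rec_idx = dist.argsort()[0, 1:(topk+1)]
    return self.df.loc[rec_idx]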
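
For a quick sanity check without downloading a CSV, the same pipeline can be sketched as a function over an in-memory DataFrame. Everything here is an assumption for illustration: the helper `recommend_from_frame`, the toy data, the default `CountVectorizer` settings (no NLTK tokenizer, no stop words), and the use of a positional `idx`, which matches `df.loc` only under the default integer index.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances

def recommend_from_frame(frame, content_col, idx, topk=10):
    """Return the topk rows of `frame` closest to row `idx` (hypothetical helper)."""
    encoder = CountVectorizer()  # default settings, unlike the class above
    bank = encoder.fit_transform(frame[content_col])
    # bank[[idx]] selects the already-encoded query row, shape (1, n_terms)
    dist = cosine_distances(bank[[idx]], bank)[0]
    # Drop the query by index instead of assuming it sorts first
    order = [i for i in dist.argsort() if i != idx]
    return frame.iloc[order[:topk]]

# Hypothetical toy catalogue.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "metadata": [
        "animation comedy family toy",
        "animation comedy family toy sequel",
        "horror thriller",
        "animation family",
    ],
})

result = recommend_from_frame(toy, "metadata", idx=0, topk=2)
print(result["title"].tolist())
```

Item B shares all four of A's terms and D shares two, while C shares none, so B and D are the two nearest neighbours of A.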

In [29]:
recsys = RecommenderSystem("https://raw.githubusercontent.com/umarmaul/Recommendation_System/main/Dataset/content_by_synopsis.csv", content_col="overview")
recsys.fit()



In [30]:
recsys.recommend(1) #Jumanji

Unnamed: 0,title,overview
27006,Superdome,"It's Superbowl. And there's a lot of drama, on..."
40606,Stasis,After a night out of partying and left behind ...
37971,Snowed Under,"Alan Tanner's new play opens in a week, but Ta..."
18715,Wreck-It Ralph,"Wreck-It Ralph is the 9-foot-tall, 643-pound v..."
40431,Liar Game: Reborn,"To exact revenge, the Liar Game office is revi..."
38232,Enter the Battlefield: Life on the Magic - The...,Magic: The Gathering is the most popular colle...
36540,Beta Test,While testing the latest first person shooter ...
14859,Le Pont du Nord,"Marie, is just out from prison when she runs i..."
13105,Break Up,"Jimmy is married to the abusive Frank, but she..."
17918,Dante's Inferno: An Animated Epic,Dante journeys through the nine circles of Hel...


In [31]:
recsys.recommend(579) #Home Alone

Unnamed: 0,title,overview
1959,The Return of Jafar,The evil Jafar escapes from the magic lamp as ...
34229,The Princess and the Pea,"In one of the realms, there was time for the p..."
26972,For a Handful of Kisses,A girl. A boy. A love story. But also about dr...
25328,"War and Peace, Part II: Natasha Rostova","In the end of 1809, Natasha attends her first ..."
18759,The Well Digger's Daughter,It's the beginning of the WWII. South of Franc...
26411,Lost and Found,"While visiting Switzerland, an American colleg..."
27938,Streetwise,Portrays the lives of nine desperate teenagers...
41081,Ordinary Wonder,In a romantic and philosophical tale of magic ...
10890,The Tiger and the Snow,Love and injury in time of war. Attilio de Gio...
34417,Dudes & Dragons,"When the powerful wizard, Lord Tensley, is jil..."


#Multiple information using a metadata soup

In [33]:
df = pd.read_csv("https://raw.githubusercontent.com/umarmaul/Recommendation_System/main/Dataset/content_by_multiple.csv")
df.head()

Unnamed: 0,title,genres,cast,keywords,director,metadata
0,Toy Story,animation comedy family,tom_hanks tim_allen don_rickles,jealousy toy boy,john_lasseter,animation comedy family tom_hanks tim_allen do...
1,Jumanji,adventure fantasy family,robin_williams jonathan_hyde kirsten_dunst,board_game disappearance based_on_children's_book,joe_johnston,adventure fantasy family robin_williams jonath...
2,Grumpier Old Men,romance comedy,walter_matthau jack_lemmon ann-margret,fishing best_friend duringcreditsstinger,howard_deutch,romance comedy walter_matthau jack_lemmon ann-...
3,Waiting to Exhale,comedy drama romance,whitney_houston angela_bassett loretta_devine,based_on_novel interracial_relationship single...,forest_whitaker,comedy drama romance whitney_houston angela_ba...
4,Father of the Bride Part II,comedy,steve_martin diane_keaton martin_short,baby midlife_crisis confidence,charles_shyer,comedy steve_martin diane_keaton martin_short ...
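
Judging from the rows above, the `metadata` column looks like the `genres`, `cast`, `keywords`, and `director` fields joined with spaces, with empty fields skipped. The dataset's actual preprocessing is not shown in this notebook, so the following is only a sketch of how such a soup could be built; the toy rows and the name `soup_cols` are assumptions.

```python
import pandas as pd

# Hypothetical rows mirroring the columns shown above; None marks a missing field.
movies = pd.DataFrame({
    "title": ["Toy Story", "Banana"],
    "genres": ["animation comedy family", "animation comedy family"],
    "cast": ["tom_hanks tim_allen don_rickles", None],
    "keywords": ["jealousy toy boy", None],
    "director": ["john_lasseter", "adam_foulkes"],
})

soup_cols = ["genres", "cast", "keywords", "director"]

# Join the non-empty fields of each row into one space-separated string.
movies["metadata"] = (
    movies[soup_cols]
    .fillna("")
    .apply(lambda row: " ".join(part for part in row if part), axis=1)
)

print(movies["metadata"].tolist())
```

In the dataset, multi-word names are already joined with underscores (for example `tom_hanks`), which should keep each name together as one token under whitespace-based splitting instead of breaking it into first and last name.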


In [34]:
recsys = RecommenderSystem("https://raw.githubusercontent.com/umarmaul/Recommendation_System/main/Dataset/content_by_multiple.csv", content_col="metadata")
recsys.fit()



In [35]:
recsys.recommend(0) #Toy Story

Unnamed: 0,title,genres,cast,keywords,director,metadata
2963,Toy Story 2,animation comedy family,tom_hanks tim_allen joan_cusack,museum prosecution identity_crisis,john_lasseter,animation comedy family tom_hanks tim_allen jo...
14771,Toy Story 3,animation family comedy,tom_hanks tim_allen ned_beatty,hostage college toy,lee_unkrich,animation family comedy tom_hanks tim_allen ne...
24390,The Legend of Mor'du,animation family,tom_hanks tim_allen joan_cusack,toy short toy_story,steve_purcell,animation family tom_hanks tim_allen joan_cusa...
3273,Creature Comforts,animation comedy family,,animation,nick_park,animation comedy family animation nick_park
25917,A Flintstones Christmas Carol,animation comedy family,,,rein_raamat,animation comedy family rein_raamat
41974,Banana,animation comedy family,,,adam_foulkes,animation comedy family adam_foulkes
34722,Open Season: Scared Silly,animation comedy family,,,dee_hibbert-jones,animation comedy family dee_hibbert-jones
25857,"I Want a Dog for Christmas, Charlie Brown",animation comedy family,,,dony_permedi,animation comedy family dony_permedi
29030,Tom and Jerry: Shiver Me Whiskers,animation comedy family,pablo_francisco,,,animation comedy family pablo_francisco
37061,VeggieTales: The Ultimate Silly Song Countdown,animation family,,,,animation family


In [36]:
recsys.recommend(1) #Jumanji

Unnamed: 0,title,genres,cast,keywords,director,metadata
41600,The Kingdom of Fairies,adventure fantasy,,,,adventure fantasy
28394,The Rain Fairy,family fantasy,,,,family fantasy
39899,Tainá: An Amazon Adventure,family fantasy adventure,,comedy,kahane_cooperman,family fantasy adventure comedy kahane_cooperman
552,The Pagemaster,fantasy science_fiction family,macaulay_culkin christopher_lloyd patrick_stewart,library adventure part_animated,joe_johnston,fantasy science_fiction family macaulay_culkin...
40803,Princess Goldilocks,adventure family fantasy,charlie_durkin,woman_director,callie_t._wiser,adventure family fantasy charlie_durkin woman_...
14070,Playmobil: The Secret of Pirate Island,action adventure family,lee_tockar caitlin_williams,fantasy adventure cartoon,alexander_e._sokoloff,action adventure family lee_tockar caitlin_wil...
15781,Cirque du Soleil: Varekai,drama family fantasy,,,,drama family fantasy
21579,The Young and Prodigious T.S. Spivet,adventure drama family,,,,adventure drama family
12560,City of Ember,adventure family fantasy,saoirse_ronan harry_treadaway mary_kay_place,underground_world mayor adventure,gil_kenan,adventure family fantasy saoirse_ronan harry_t...
17504,G.I. Joe: The Revenge of Cobra,family fantasy action,,,,family fantasy action


In [37]:
recsys.recommend(579) #Home Alone

Unnamed: 0,title,genres,cast,keywords,director,metadata
2808,Home Alone 2: Lost in New York,comedy family adventure,macaulay_culkin joe_pesci catherine_o'hara,holiday new_york new_york_city,chris_columbus,comedy family adventure macaulay_culkin joe_pe...
19021,Nativity!,comedy family,daniel_stern braeden_lemasters stacey_travis,holiday,brian_levant,comedy family daniel_stern braeden_lemasters s...
34843,Father of Four: Never Gives Up!,comedy family,,,,comedy family
369,Ri¢hie Ri¢h,comedy family,macaulay_culkin john_larroquette edward_herrmann,family life_raft private_airplane,donald_petrie,comedy family macaulay_culkin john_larroquette...
39019,"Good Luck Charlie, It's Christmas!",comedy family,,,william_k.l._dickson_,comedy family william_k.l._dickson_
41843,50 Kilos of Sour Cherry,family drama comedy,,,,family drama comedy
25916,Norm MacDonald: Me Doing Standup,comedy,,holiday,timothy_quay,comedy holiday timothy_quay
30648,Oh How It Hurts 66,comedy family,,,bertrand_avril,comedy family bertrand_avril
23872,"Welcome, or No Trespassing",comedy family,,,elem_klimov,comedy family elem_klimov
31241,Get Santa,family comedy,,,karin_steinberger,family comedy karin_steinberger
