# Movie Recommendation Machine Learning Project - Alfandi Firnando

## Data Loading

### Import the required machine learning libraries

In [7]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


### Connect to Google Drive to access the data

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Load the data from the Drive directory

In [3]:
df = pd.read_csv('/content/drive/MyDrive/content_by_synopsis.csv')
df

Unnamed: 0,title,overview
0,Toy Story,"Led by Woody, Andy's toys live happily in his ..."
1,Jumanji,When siblings Judy and Peter discover an encha...
2,Grumpier Old Men,A family wedding reignites the ancient feud be...
3,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom..."
4,Father of the Bride Part II,Just when George Banks has recovered from his ...
...,...,...
41357,Caged Heat 3000,It's the year 3000 AD. The world's most danger...
41358,Subdue,Rising and falling between a man and woman.
41359,Century of Birthing,An artist struggles to finish his work while a...
41360,Satan Triumphant,"In a small town live two brothers, one a minis..."


## Data Preparation

### Encode the overview feature for every film

In [8]:
bow = CountVectorizer(stop_words='english', tokenizer=word_tokenize)
bank = bow.fit_transform(df.overview)
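
To make the encoding step concrete, the sketch below runs `CountVectorizer` on a tiny made-up corpus. The three sentences and the `toy_*` names are illustrative assumptions, not data from `content_by_synopsis.csv`, and the default tokenizer is used instead of `word_tokenize` so the example needs no NLTK download.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical three-document corpus (not taken from the dataset)
toy_docs = [
    "the toys live in the room",
    "a cowboy toy and a space toy",
    "two brothers live in a small town",
]

# Same idea as `bow` above: drop English stop words, count the remaining tokens
toy_bow = CountVectorizer(stop_words="english")
toy_bank = toy_bow.fit_transform(toy_docs)

# Vocabulary terms ordered by their column index in the matrix
vocab = sorted(toy_bow.vocabulary_, key=toy_bow.vocabulary_.get)
counts = toy_bank.toarray()

print(vocab)
print(counts)
```

Each row is one document and each column is one vocabulary term, so `bank` in the notebook has one row per film and one column per distinct token found in the overviews.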

### Encode the overview of the film at index 0 (Toy Story)

In [10]:
# Display the overview of the film at index 0 (Toy Story)
idx = 0
content = df.loc[idx, "overview"]
content

"Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences."

In [11]:
# Transform the overview into a sparse count-vector (1 x vocabulary size)
code = bow.transform([content])
code

<1x86740 sparse matrix of type '<class 'numpy.int64'>'
	with 28 stored elements in Compressed Sparse Row format>

In [12]:
code.toarray()

array([[0, 0, 0, ..., 0, 0, 0]])
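
The dense array above is almost entirely zeros, so it says little about what was encoded. One possible way to inspect the stored (non-zero) entries is `inverse_transform`, sketched here on a small made-up vocabulary. The sentences and `toy_*` names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus used only to build a small vocabulary
toy_docs = [
    "woody and buzz are toys",
    "buzz flies to space",
    "brothers in a small town",
]
toy_bow = CountVectorizer(stop_words="english").fit(toy_docs)

# Encode a new sentence; "against" is a stop word and "plots" is out of vocabulary
toy_code = toy_bow.transform(["woody plots against buzz"])

stored = toy_code.nnz                          # number of non-zero entries
terms = toy_bow.inverse_transform(toy_code)[0]  # vocabulary terms behind them

print(stored, sorted(terms))
```

Applied to the notebook, `bow.inverse_transform(code)[0]` should list the tokens behind the 28 stored elements reported for the Toy Story overview.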

## Modeling

### Compute the degree of similarity using cosine distance

In [13]:
from sklearn.metrics.pairwise import cosine_distances

In [15]:
dist = cosine_distances(code, bank)
dist

array([[0.        , 0.68698928, 0.70198022, ..., 0.88529213, 0.68931574,
        0.75277431]])
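
Cosine distance is defined as one minus the cosine of the angle between two vectors, so 0 means the same direction and 1 means no shared terms (for non-negative counts). The plain-Python sketch below works through that formula on made-up count vectors. The helper `cosine_distance` and the sample vectors are assumptions for illustration, not the scikit-learn implementation.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(theta) for two equal-length, non-zero count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical count vectors over a four-term vocabulary
query = [2, 1, 0, 1]
same = [2, 1, 0, 1]      # identical counts
partial = [1, 0, 1, 1]   # shares two terms with the query
disjoint = [0, 0, 3, 0]  # shares no terms with the query

d_same = cosine_distance(query, same)
d_partial = cosine_distance(query, partial)
d_disjoint = cosine_distance(query, disjoint)

print(d_same, d_partial, d_disjoint)
```

This is why the first entry of `dist` is 0: the film at index 0 is being compared with its own overview.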

## Evaluation

### Generate film recommendations

In [18]:
# Recommend the top 10 films by synopsis similarity to the film at index 0
# (position 0 of the sorted order is skipped because it is the query film itself, at distance 0)
rec_idx = dist.argsort()[0, 1:11]
rec_idx

array([14706,  2945,  9984, 36827, 40606, 13404, 22084, 14078,  6172,
       27006])
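
The slice `1:11` relies on the query film sorting into position 0. If another film had an identical overview (distance 0 as well), that assumption might not hold. A more defensive variant, sketched below with a hypothetical helper `top_k_similar` and made-up distances, removes the query index explicitly before taking the top `k`.

```python
import numpy as np

def top_k_similar(dist_row, query_idx, k=10):
    """Return indices of the k smallest distances, excluding the query itself."""
    order = np.argsort(dist_row, kind="stable")  # ascending distance
    order = order[order != query_idx]            # drop the query film
    return order[:k]

# Hypothetical distances: index 0 is the query, index 2 is an exact duplicate
toy_dist = np.array([0.0, 0.69, 0.0, 0.88, 0.31, 0.75])
rec = top_k_similar(toy_dist, query_idx=0, k=3)

print(rec)
```

In the notebook this would be called as `top_k_similar(dist[0], query_idx=idx)`, since `dist` has shape `(1, n_films)`.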

In [19]:
# Display the top 10 recommended films by synopsis similarity
df.loc[rec_idx]

Unnamed: 0,title,overview
14706,Toy Story 3,"Woody, Buzz, and the rest of Andy's toys haven..."
2945,Toy Story 2,"Andy heads off to Cowboy Camp, leaving his toy..."
9984,The 40 Year Old Virgin,Andy Stitzer has a pleasant life with a nice a...
36827,Wabash Avenue,Andy Clark discovers he was cheated out of a h...
40606,Stasis,After a night out of partying and left behind ...
13404,The Gang's All Here,"Playboy Andy Mason, on leave from the army, ro..."
22084,The Pied Piper,"Greed, corruption, ignorance, and disease. Mid..."
14078,A Matter of Dignity,"During one of her parents many parties, Chloe ..."
6172,The Courtship of Eddie's Father,The film that started the classic TV series. A...
27006,Superdome,"It's Superbowl. And there's a lot of drama, on..."
