## RECOMMENDER SYSTEM LAB



In [1]:
# Step 1: Load Libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

In [3]:
from google.colab import files
uploaded = files.upload()

Saving movies_dataset.csv to movies_dataset.csv


In [9]:
# Step 2: Load Dataset

df = pd.read_csv("movies_dataset.csv")
print("Dataset Preview:")
print(df.head(15))

Dataset Preview:
    ID                  MOVIE_TITLE  RATING               GENRE
0    1                      Sabrina       3     Romance | Drama
1    1  Father of the Bride Part II       4              Comedy
2    1            Waiting to Exhale       3     Romance | Drama
3    2                    Toy Story       4  Animation | Comedy
4    2                      Jumanji       5           Adventure
5    2                     Chatered       4           Adventure
6    3                      Encanto       5           Animation
7    3                       Wicked       5             Fantasy
8    3             Grumpier Old Men       4    Romance | Comedy
9    4                       Frozen       4           Animation
10   4                    John Wick       5              Action
11   4                    Rush Hour       5              Action
12   5                     Bad Boys       4              Action
13   5                      Mr Bean       5              Comedy
14   5                D

In [11]:
# Step 3: Create User-Item Matrix

user_item_matrix = df.pivot_table(index="ID", columns="MOVIE_TITLE", values="RATING")
print("\nUser-Item Matrix:")
print(user_item_matrix.head())


User-Item Matrix:
MOVIE_TITLE  Bad Boys  Chatered  Despicable Me  Encanto  \
ID                                                        
1                 NaN       NaN            NaN      NaN   
2                 NaN       4.0            NaN      NaN   
3                 NaN       NaN            NaN      5.0   
4                 NaN       NaN            NaN      NaN   
5                 4.0       NaN            5.0      NaN   

MOVIE_TITLE  Father of the Bride Part II  Frozen  Grumpier Old Men  John Wick  \
ID                                                                              
1                                    4.0     NaN               NaN        NaN   
2                                    NaN     NaN               NaN        NaN   
3                                    NaN     NaN               4.0        NaN   
4                                    NaN     4.0               NaN        5.0   
5                                    NaN     NaN               NaN        NaN   
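The NaN cells above are what drive the later similarity results, so it can help to measure how sparse the matrix is and how many movies each pair of users has in common. The following is a minimal sketch on a small hypothetical ratings frame (the `toy` data is made up and is not the lab's CSV; only the column names match).

```python
import pandas as pd

# Hypothetical toy ratings, reusing the lab's column names
toy = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "MOVIE_TITLE": ["A", "B", "B", "C", "D"],
    "RATING": [3, 4, 5, 4, 5],
})

matrix = toy.pivot_table(index="ID", columns="MOVIE_TITLE", values="RATING")

# Fraction of user/movie cells that have no rating
sparsity = matrix.isna().to_numpy().mean()

# Number of movies that each pair of users has both rated
rated = matrix.notna().astype(int)
overlap = rated @ rated.T

print(f"Sparsity: {sparsity:.2f}")
print(overlap)
```

An off-diagonal entry of 0 in `overlap` means the two users share no rated movie, so their cosine similarity on the zero-filled matrix will also be 0.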


In [12]:
# Step 4: Collaborative Filtering

# Fill missing ratings with 0 (this treats "not rated" the same as a rating of 0)
matrix_filled = user_item_matrix.fillna(0)

# Compute cosine similarity between users
similarity = cosine_similarity(matrix_filled)
print("\nUser Similarity Matrix:\n", similarity)

# Example: Recommend for user 1 (user_index = 0)
user_index = 0
similar_users = similarity[user_index]

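# Take the second-to-last index in ascending similarity order.
# This assumes the user's own entry (similarity 1.0) sorts last; when all other
# similarities are tied, the choice among them is arbitrary.
most_similar_user = np.argsort(similar_users)[-2]

# Get every movie rated by that user (not weighted by similarity, and not
# filtered against movies User 1 has already rated)
recommended_movies = user_item_matrix.iloc[most_similar_user].dropna().index.tolist()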
print(f"\nRecommended movies for User 1 (Collaborative Filtering): {recommended_movies}")


User Similarity Matrix:
 [[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

Recommended movies for User 1 (Collaborative Filtering): ['Bad Boys', 'Despicable Me', 'Mr Bean']
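One possible variation on the step above is to drop the target user explicitly, weight the other users' ratings by similarity, and keep only movies the target user has not rated. This is a sketch under stated assumptions, not the lab's method: `recommend_for_user` is a hypothetical helper, the `toy` matrix is made-up data, and returning an empty list when no user overlaps is a design choice.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_for_user(user_item_matrix, user_id, top_n=3):
    """Hypothetical helper: rank unseen movies by similarity-weighted ratings."""
    filled = user_item_matrix.fillna(0)
    sim = pd.DataFrame(
        cosine_similarity(filled),
        index=filled.index,
        columns=filled.index,
    )
    # Drop the target user explicitly instead of relying on sort order
    weights = sim[user_id].drop(user_id)
    if (weights <= 0).all():
        return []  # no overlapping ratings, so there is no neighbour to learn from
    # Similarity-weighted sum of the other users' ratings, per movie
    scores = filled.drop(index=user_id).mul(weights, axis=0).sum(axis=0)
    # Keep only movies the target user has not rated yet
    unseen = user_item_matrix.loc[user_id].isna()
    ranked = scores[unseen].sort_values(ascending=False)
    return ranked[ranked > 0].head(top_n).index.tolist()

# Hypothetical toy matrix: users 1 and 2 overlap, user 3 overlaps with nobody
toy = pd.DataFrame(
    {
        "A": [5, 5, np.nan],
        "B": [4, 4, np.nan],
        "C": [np.nan, 5, np.nan],
        "D": [np.nan, np.nan, 5],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

recs_user_1 = recommend_for_user(toy, 1)  # ['C']
recs_user_3 = recommend_for_user(toy, 3)  # []
print(recs_user_1, recs_user_3)
```

On a similarity matrix like the one printed above, where every off-diagonal entry is 0, this sketch would return an empty list rather than pick a neighbour arbitrarily.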


In [14]:
# Step 5: Content-Based Filtering
# Use TF-IDF on genres
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(df['GENRE'].fillna(""))

# Compute cosine similarity between movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

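# Build a mapping from movie title to row position, keeping the first row for each title.
# (Series.drop_duplicates() would compare the row numbers, not the titles.)
indices = pd.Series(df.index, index=df['MOVIE_TITLE'])
indices = indices[~indices.index.duplicated(keep="first")]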

# Example: Recommend movies similar to "Frozen"
movie_name = "Frozen"
if movie_name in indices:
    movie_index = indices[movie_name]
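    # argsort is ascending, so [-6:-1] takes five of the six highest scores and skips
    # the last position, assuming that position is the movie itself (ties can break this)
    similar_indices = cosine_sim[movie_index].argsort()[-6:-1]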
    print(f"\nMovies similar to {movie_name} (Content-Based):")
    print(df['MOVIE_TITLE'].iloc[similar_indices].tolist())
else:
    print(f"\nMovie '{movie_name}' not found in dataset!")


Movies similar to Frozen (Content-Based):
['Bad Boys', 'Rush Hour', 'Toy Story', 'Encanto', 'Frozen']
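The list above is in ascending similarity order and includes "Frozen" itself along with Action titles that share no genre with it. A minimal sketch of one way to avoid that is to sort in descending order, exclude the query movie by position, and drop zero-similarity titles. `similar_movies` is a hypothetical helper and the `toy` frame is made-up data that borrows a few titles and genres from the dataset preview.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similar_movies(movies, title, top_n=5):
    """Hypothetical helper: titles ranked by genre similarity to `title`."""
    # One row per title so each movie has a single position
    catalog = movies.drop_duplicates("MOVIE_TITLE").reset_index(drop=True)
    matches = catalog.index[catalog["MOVIE_TITLE"] == title]
    if len(matches) == 0:
        return []
    pos = matches[0]
    tfidf = TfidfVectorizer().fit_transform(catalog["GENRE"].fillna(""))
    # Similarity of the query movie to every movie in the catalog
    scores = cosine_similarity(tfidf[[pos]], tfidf).ravel()
    # Highest similarity first; stable sort keeps catalog order among ties
    order = np.argsort(-scores, kind="stable")
    # Skip the query movie itself and anything with no genre overlap
    picks = [i for i in order if i != pos and scores[i] > 0][:top_n]
    return catalog["MOVIE_TITLE"].iloc[picks].tolist()

# Hypothetical toy catalog
toy = pd.DataFrame({
    "MOVIE_TITLE": ["Frozen", "Encanto", "Toy Story", "John Wick", "Mr Bean"],
    "GENRE": ["Animation", "Animation", "Animation | Comedy", "Action", "Comedy"],
})

frozen_like = similar_movies(toy, "Frozen")  # ['Encanto', 'Toy Story']
print(frozen_like)
```

Filtering out zero scores means the function may return fewer than `top_n` titles; that trade-off is an assumption of this sketch, not something the lab specifies.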


## **Reflection Section**



**Reflection:**

1. Which method worked better (Content-Based vs Collaborative)?
   - Collaborative Filtering worked better for this small dataset, since it uses actual user ratings to make recommendations.
     For example, User 1 got movie suggestions that similar users had already rated highly, which feels more personalized.
     Content-Based Filtering also worked, but since genres are broad and limited, the recommendations were less specific.

2. What challenges did you face?
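   - Missing data: not every user rated every movie, so the user-item matrix had many NaN values. In this dataset no two users rated the same movie, which is why the user similarity matrix came out as the identity matrix.
   - Small dataset: with only five users and a handful of movies, recommendations are limited and may not be very accurate.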

3. Which method would scale better with more data?
   - Collaborative Filtering scales better when there are lots of users and ratings, because it learns from community behavior.
   - Content-Based Filtering scales better when new movies are added, since recommendations can be made from metadata (like genres),
     but it struggles with personalization if user rating history is short.
   - A Hybrid system combining both methods would likely give the best results.
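
The hybrid idea in the last point can be sketched as a weighted mix of the two methods' per-movie scores. Everything here is an assumption for illustration: `blend_scores` is a hypothetical helper, the min-max rescaling and the `alpha` weight are arbitrary choices, and the input scores are made up.

```python
import pandas as pd

def blend_scores(collab, content, alpha=0.5):
    """Hypothetical helper: weighted mix of two per-movie score Series."""
    def rescale(s):
        # Min-max rescale to [0, 1] so the two score types are comparable
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else s * 0.0

    # Movies missing from one method get a score of 0 from that method
    titles = collab.index.union(content.index)
    c1 = rescale(collab).reindex(titles, fill_value=0.0)
    c2 = rescale(content).reindex(titles, fill_value=0.0)
    return (alpha * c1 + (1 - alpha) * c2).sort_values(ascending=False)

# Made-up scores: collaborative (rating scale) and content-based (cosine scale)
collab = pd.Series({"A": 4.0, "B": 2.0})
content = pd.Series({"B": 0.9, "C": 0.1})

blended = blend_scores(collab, content, alpha=0.7)
print(blended)
```

A higher `alpha` leans on user ratings, while a lower one leans on metadata, which may be preferable for new movies with few ratings.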