# Movie Recommendation System

**Goal:** Build a system that recommends movies to a user based on similarity with other users and their ratings.

**Dataset:** MovieLens 100K Dataset (Kaggle) containing user ratings for movies.


In [None]:
import pandas as pd
import numpy as np
from google.colab import files

uploaded = files.upload()

ratings = pd.read_csv("u.data", sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])
ratings.head()

movies = pd.read_csv("u.item", sep='|', encoding='latin-1', usecols=[0,1], names=['movie_id','title'])
movies.head()

data = ratings.merge(movies, on='movie_id')
data.head()


Saving u.data to u (1).data
Saving u.item to u (1).item


Unnamed: 0,user_id,movie_id,rating,timestamp,title
0,196,242,3,881250949,Kolya (1996)
1,186,302,3,891717742,L.A. Confidential (1997)
2,22,377,1,878887116,Heavyweights (1994)
3,244,51,2,880606923,Legends of the Fall (1994)
4,166,346,1,886397596,Jackie Brown (1997)


## User-Item Matrix and User Similarity
I created a matrix where each row represents a user and each column represents a movie.
Missing ratings are filled with 0.
I then computed cosine similarity between users to find users with similar tastes.
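
To make the effect of zero-filling concrete, here is a minimal sketch of cosine similarity on three made-up users. The helper `cosine_sim` and the toy ratings are illustrative assumptions, not part of the dataset. It shows that two users with no rated movie in common get a similarity of exactly 0, because every unrated movie enters the calculation as a rating of 0.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (0 = not rated)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings for three movies; 0 stands for "not rated".
alice = [5, 4, 0]
bob = [5, 4, 0]    # same ratings as alice
carol = [0, 0, 3]  # rated only a movie that alice has not rated

sim_ab = cosine_sim(alice, bob)    # close to 1: identical rating vectors
sim_ac = cosine_sim(alice, carol)  # 0: no overlapping non-zero entries
print(sim_ab, sim_ac)
```

`sklearn.metrics.pairwise.cosine_similarity` applies the same formula to every pair of rows at once.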


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

user_item_matrix = data.pivot(index='user_id', columns='movie_id', values='rating').fillna(0)
user_item_matrix.head()

user_similarity = cosine_similarity(user_item_matrix)
user_similarity_df = pd.DataFrame(user_similarity, index=user_item_matrix.index, columns=user_item_matrix.index)
user_similarity_df.head()


user_id,1,2,3,4,5,6,7,8,9,10,...,934,935,936,937,938,939,940,941,942,943
1,1.0,0.166931,0.04746,0.064358,0.378475,0.430239,0.440367,0.319072,0.078138,0.376544,...,0.369527,0.119482,0.274876,0.189705,0.197326,0.118095,0.314072,0.148617,0.179508,0.398175
2,0.166931,1.0,0.110591,0.178121,0.072979,0.245843,0.107328,0.103344,0.161048,0.159862,...,0.156986,0.307942,0.358789,0.424046,0.319889,0.228583,0.22679,0.161485,0.172268,0.105798
3,0.04746,0.110591,1.0,0.344151,0.021245,0.072415,0.066137,0.08306,0.06104,0.065151,...,0.031875,0.042753,0.163829,0.069038,0.124245,0.026271,0.16189,0.101243,0.133416,0.026556
4,0.064358,0.178121,0.344151,1.0,0.031804,0.068044,0.09123,0.18806,0.101284,0.060859,...,0.052107,0.036784,0.133115,0.193471,0.146058,0.030138,0.196858,0.152041,0.170086,0.058752
5,0.378475,0.072979,0.021245,0.031804,1.0,0.237286,0.3736,0.24893,0.056847,0.201427,...,0.338794,0.08058,0.094924,0.079779,0.148607,0.071459,0.239955,0.139595,0.152497,0.313941


## Movie Recommendation Function
This function scores every movie the user has not rated with a similarity-weighted average of the other users' ratings, then returns the `top_n` movies with the highest predicted scores.
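
The scoring idea can be illustrated on two neighbours. This is a sketch under assumptions: `weighted_prediction` and its `rated_only` switch are hypothetical names, and the `rated_only` variant is an alternative not used in this notebook. It shows how counting an unrated movie as 0 pulls the prediction down compared with averaging only over users who actually rated the movie.

```python
import numpy as np

def weighted_prediction(sims, ratings, rated_only=False):
    """Similarity-weighted average of neighbour ratings for one movie.

    With rated_only=True (an assumed variant), neighbours whose rating is 0
    (not rated) are left out of both the numerator and the denominator.
    """
    sims = np.asarray(sims, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    if rated_only:
        mask = ratings > 0
        sims, ratings = sims[mask], ratings[mask]
    return float(sims @ ratings / sims.sum())

sims = [0.9, 0.1]  # similarity of two neighbours to the target user
ratings = [5, 0]   # the second neighbour has not rated the movie

zero_filled = weighted_prediction(sims, ratings)                   # 4.5 / 1.0
raters_only = weighted_prediction(sims, ratings, rated_only=True)  # 4.5 / 0.9
print(zero_filled, raters_only)
```

Under zero-filling, movies rated by many users accumulate larger sums, so the ranking tends to favour widely rated titles.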


In [None]:
def recommend_movies(user_id, user_item_matrix, user_similarity_df, movies, top_n=5):
    """Return the top_n unrated movies for user_id, scored by similarity-weighted ratings."""
    # Similarity of every other user to the target user (self-similarity excluded).
    sim_scores = user_similarity_df[user_id].drop(user_id)
    user_ratings = user_item_matrix.loc[user_id]
    unseen_movies = user_ratings[user_ratings == 0].index  # 0 marks "not rated"

    # Ratings the other users gave to the unseen movies (rows: users, columns: movies).
    others = user_item_matrix.drop(index=user_id)[unseen_movies]

    # Similarity-weighted average per movie. DataFrame.dot aligns on user_id,
    # whereas np.dot on two Series multiplies by position and ignores the index.
    # Unrated entries count as 0, so widely rated movies are favoured.
    pred_ratings = others.T.dot(sim_scores) / sim_scores.sum()

    top_movie_ids = pred_ratings.nlargest(top_n).index

    # isin() keeps the row order of `movies` (by movie_id), not the order of predicted score.
    return movies[movies['movie_id'].isin(top_movie_ids)][['movie_id', 'title']]
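
`recommend_movies` looks up titles with `isin`, which returns rows in the order they appear in `movies` (by `movie_id`) rather than by predicted score. If ranked output is wanted, one option is to index `movies` by `movie_id` and select the IDs in ranked order. The helper name `titles_in_rank_order` below is hypothetical, and the three-row `toy_movies` frame is made up for illustration.

```python
import pandas as pd

def titles_in_rank_order(top_movie_ids, movies):
    """Look up titles for top_movie_ids, keeping the given (ranked) order."""
    indexed = movies.set_index('movie_id')
    # .loc with a list returns rows in the order of that list.
    return indexed.loc[list(top_movie_ids), ['title']].reset_index()

# Made-up catalogue, only to show the ordering behaviour.
toy_movies = pd.DataFrame({'movie_id': [1, 2, 3], 'title': ['A', 'B', 'C']})

ranked = titles_in_rank_order([3, 1], toy_movies)           # rows: 3, then 1
by_id = toy_movies[toy_movies['movie_id'].isin([3, 1])]     # rows: 1, then 3
print(ranked)
print(by_id)
```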


## Test Recommendations
Example: Recommend top 5 movies for a specific user using the recommendation function.


In [None]:
recommend_movies(1, user_item_matrix, user_similarity_df, movies, top_n=5)


Unnamed: 0,movie_id,title
285,286,"English Patient, The (1996)"
287,288,Scream (1996)
293,294,Liar Liar (1997)
299,300,Air Force One (1997)
312,313,Titanic (1997)


## Evaluation: Precision at K
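Precision@K is the fraction of the top K recommendations that are relevant to the user.
Here, I considered a movie relevant if the user rated it 4 or higher.
Note that `recommend_movies` only returns movies the user has not rated, while relevance is defined from the user's existing ratings. The two sets cannot overlap, so this setup yields 0 by construction; a meaningful estimate needs held-out ratings that are hidden from the recommender.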


In [None]:
def precision_at_k(user_id, k=5):
    recommended = recommend_movies(user_id, user_item_matrix, user_similarity_df, movies, top_n=k)['movie_id']

    actual = ratings[(ratings['user_id'] == user_id) & (ratings['rating'] >= 4)]['movie_id']

    precision = len(set(recommended) & set(actual)) / k
    return precision

precision_at_k(1, k=5)


0.0
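
A held-out evaluation could look roughly like the sketch below. It is an assumption-laden outline, not the method used in this notebook: the helper names `split_ratings` and `precision_at_k_holdout`, the 20% test fraction, and the toy data are all hypothetical. The idea is to hide part of each user's ratings, build the user-item matrix and similarities from the remaining ratings only, and count a recommendation as a hit when it appears among the hidden ratings of 4 or higher.

```python
import pandas as pd

def split_ratings(ratings, test_frac=0.2, seed=0):
    """Randomly hold out test_frac of each user's ratings."""
    test = ratings.groupby('user_id').sample(frac=test_frac, random_state=seed)
    train = ratings.drop(test.index)
    return train, test

def precision_at_k_holdout(recommended_ids, test, user_id, k=5, threshold=4):
    """Fraction of the first k recommended IDs that the user rated >= threshold in test."""
    mask = (test['user_id'] == user_id) & (test['rating'] >= threshold)
    relevant = set(test.loc[mask, 'movie_id'])
    top_k = list(recommended_ids)[:k]
    return len(set(top_k) & relevant) / k

# Toy data: two users with five ratings each.
toy = pd.DataFrame({
    'user_id': [1] * 5 + [2] * 5,
    'movie_id': list(range(10, 15)) * 2,
    'rating': [5, 4, 3, 2, 1] * 2,
})
train, test = split_ratings(toy)

# Hypothetical held-out ratings and recommendations for user 1.
held_out = pd.DataFrame({'user_id': [1, 1, 1],
                         'movie_id': [10, 11, 12],
                         'rating': [5, 4, 2]})
score = precision_at_k_holdout([10, 12, 99, 98, 97], held_out, user_id=1, k=5)
print(score)
```

In the full pipeline, `train` would replace `ratings` when building `user_item_matrix`, and the IDs returned by `recommend_movies` would be passed as `recommended_ids`.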

## Average Precision at K (Subset of Users)
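To speed up evaluation, I averaged Precision@K over the first 50 user IDs in the order they appear in `ratings`, instead of all 943 users. The result is a mean of per-user Precision@K values, not the ranking metric usually called Average Precision.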


In [None]:
subset_users = ratings['user_id'].unique()[:50]

precision_scores = [precision_at_k(user_id, k=5) for user_id in subset_users]

average_precision_at_5 = np.mean(precision_scores)
print(f"Average Precision@5 (subset of 50 users): {average_precision_at_5:.4f}")


Average Precision@5 (subset of 50 users): 0.0000


## Top Recommendations for Multiple Users
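I display the top 5 recommended movies for the first 10 users in a table for easier comparison. Because `recommend_movies` returns rows in `movie_id` order, the columns `Top 1` to `Top 5` are not ordered by predicted score.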


In [None]:
subset_users = ratings['user_id'].unique()[:10]

recommendation_dict = {}

for user_id in subset_users:
    recommended_movies = recommend_movies(user_id, user_item_matrix, user_similarity_df, movies, top_n=5)
    recommendation_dict[user_id] = recommended_movies['title'].tolist()

recommendation_df = pd.DataFrame.from_dict(recommendation_dict, orient='index', columns=[f"Top {i+1}" for i in range(5)])
recommendation_df


user_id,Top 1,Top 2,Top 3,Top 4,Top 5
196,Star Wars (1977),Fargo (1996),Raiders of the Lost Ark (1981),Return of the Jedi (1983),Contact (1997)
186,Star Wars (1977),"Godfather, The (1972)",Raiders of the Lost Ark (1981),Return of the Jedi (1983),"English Patient, The (1996)"
22,Toy Story (1995),Pulp Fiction (1994),"Silence of the Lambs, The (1991)",Fargo (1996),"English Patient, The (1996)"
244,"Silence of the Lambs, The (1991)","Godfather, The (1972)","English Patient, The (1996)",Scream (1996),Air Force One (1997)
166,Star Wars (1977),"Silence of the Lambs, The (1991)",Fargo (1996),Raiders of the Lost Ark (1981),Return of the Jedi (1983)
298,Twelve Monkeys (1995),Pulp Fiction (1994),Fargo (1996),Contact (1997),Scream (1996)
115,Toy Story (1995),Contact (1997),"English Patient, The (1996)",Scream (1996),Air Force One (1997)
253,Raiders of the Lost Ark (1981),Return of the Jedi (1983),Contact (1997),"English Patient, The (1996)",Scream (1996)
305,Dead Man Walking (1995),Braveheart (1995),Scream (1996),Liar Liar (1997),Titanic (1997)
6,Independence Day (ID4) (1996),"Empire Strikes Back, The (1980)",Return of the Jedi (1983),Scream (1996),Air Force One (1997)


## Summary
- Built a **Movie Recommendation System** using the MovieLens 100K dataset.  
- Created a **user-item rating matrix** and computed **cosine similarity** between users.  
- Defined a **recommendation function** to suggest top-rated unseen movies for each user.  
- Evaluated the system using **Precision at K**, both for individual users and an average over a subset.  
- Displayed **top recommendations for multiple users** in a clear table.  

This notebook demonstrates a **similarity-based user recommendation system** and provides a foundation to explore more advanced techniques like matrix factorization or hybrid methods.
