# Movie Recommendation System
This notebook demonstrates how to build a movie recommendation system using different collaborative filtering techniques. We will explore:
- **User-based Collaborative Filtering**
- **Item-based Collaborative Filtering**
- **Matrix Factorization (SVD-based)**

Each method is explained in detail and implemented in Python, with the goal of recommending movies to users based on their past ratings.


In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import TruncatedSVD

# Load the datasets
ratings_df = pd.read_csv('Dataset_Rating.csv')
movies_df = pd.read_csv('Dataset_Movie.csv')

# Merge the ratings with movie information to get full details
merged_df = pd.merge(ratings_df, movies_df, on="Movie_ID")
merged_df.head()


Unnamed: 0,User_ID,Rating,Movie_ID,Year,Name
0,712664,5,3,1997,Character
1,1331154,4,3,1997,Character
2,2632461,3,3,1997,Character
3,44937,5,3,1997,Character
4,656399,4,3,1997,Character


### User-based Collaborative Filtering with Sparse Matrices and Approximate Nearest Neighbors

User-based collaborative filtering recommends movies by identifying users who have similar rating patterns. This approach uses a sparse matrix for efficient computation and Nearest Neighbors to find similar users.
Steps:

    Create a sparse pivot table where each row represents a user, and each column represents a movie. Missing ratings are filled with zero.
    Use the NearestNeighbors algorithm to find the k-nearest neighbors for each movie.
    For each rated movie, recommend movies that similar users have rated highly.



In [2]:
# 1. User-based Collaborative Filtering with Sparse Matrices and Approximate Nearest Neighbors
def user_based_collaborative_filtering(user_id, n_recommendations=5):
    # Create a sparse pivot table for ratings
    pivot_table = ratings_df.pivot(index='User_ID', columns='Movie_ID', values='Rating').fillna(0)
    pivot_sparse = csr_matrix(pivot_table.values)
    
    # Compute the similarity using Nearest Neighbors (k-nearest neighbors) for efficiency
    nn = NearestNeighbors(metric='cosine', algorithm='auto', n_neighbors=10, n_jobs=-1)
    nn.fit(pivot_sparse.T)  # Use the transpose to compute item similarity
    distances, indices = nn.kneighbors(pivot_sparse.T)  # Get k-nearest neighbors for each movie

    # Get the index of the given user
    user_idx = ratings_df[ratings_df['User_ID'] == user_id].drop_duplicates('User_ID').index[0]
    
    # Find the movies rated by the given user
    rated_movie_ids = ratings_df[ratings_df['User_ID'] == user_id]['Movie_ID'].tolist()
    
    # Initialize a list to store recommended movie IDs
    recommended_movies = []
    
    # For each movie rated by the user, recommend similar movies based on nearest neighbors
    for movie_id in rated_movie_ids:
        movie_idx = pivot_table.columns.get_loc(movie_id)
        # Get the top 5 similar movies for this movie
        similar_movies = indices[movie_idx][:n_recommendations]
        recommended_movies.extend(similar_movies)
    
    # Get unique movie IDs from the recommended list (remove duplicates)
    recommended_movie_ids = list(set([pivot_table.columns[i] for i in recommended_movies]))
    
    # Merge with movie details to get movie names and years
    recommended_movie_details = movies_df[movies_df['Movie_ID'].isin(recommended_movie_ids)]
    
    return recommended_movie_details[['Name', 'Year']]


In [3]:

# Example: Recommend movies for user 712664 based on User-based Collaborative Filtering
user_recommendations = user_based_collaborative_filtering(712664)
print("\nMovies recommended for user 712664 (User-based Collaborative Filtering):")
print(user_recommendations)



Movies recommended for user 712664 (User-based Collaborative Filtering):
                              Name  Year
2                        Character  1997
15                       Screamers  1996
16                       7 Seconds  2005
17                Immortal Beloved  1994
25                 Never Die Alone  2004
27                 Lilo and Stitch  2002
29          Something's Gotta Give  2003
43                  Spitfire Grill  1996
45  Rudolph the Red-Nosed Reindeer  1964
46       The Bad and the Beautiful  1952
51         The Weather Underground  2002
55                       Carandiru  2004
56                     Richard III  1995
57                     Dragonheart  1996
76                           Congo  1995
77              Jingle All the Way  1996
78                     The Killing  1956
82                        Silkwood  1983
96                   Mostly Martha  2002


### Item-based Collaborative Filtering with Sparse Matrices

Item-based collaborative filtering recommends movies based on the similarity of items (movies). Similar to user-based filtering, it uses a sparse matrix but focuses on finding similar movies rather than users.
Steps:

    Create a sparse pivot table of ratings.
    Use NearestNeighbors to compute the similarity between movies.
    For each movie rated by the user, recommend movies similar to it.



In [4]:
def item_based_collaborative_filtering(user_id, n_recommendations=5):
    # Create a sparse pivot table for ratings
    pivot_table = ratings_df.pivot(index='User_ID', columns='Movie_ID', values='Rating').fillna(0)
    pivot_sparse = csr_matrix(pivot_table.values)
    
    # Compute the similarity between items (movies) using Nearest Neighbors
    nn = NearestNeighbors(metric='cosine', algorithm='auto', n_neighbors=10, n_jobs=-1)
    nn.fit(pivot_sparse)  # Use the regular matrix to compute movie similarity
    distances, indices = nn.kneighbors(pivot_sparse, n_neighbors=n_recommendations)
    
    # Get the movie IDs rated by the user
    rated_movie_ids = ratings_df[ratings_df['User_ID'] == user_id]['Movie_ID'].tolist()
    
    # Initialize a list to store recommended movie IDs
    recommended_movies = []
    
    # For each movie rated by the user, recommend similar movies based on item similarity
    for movie_id in rated_movie_ids:
        movie_idx = pivot_table.columns.get_loc(movie_id)  # Get column index for the movie
        # Get the top similar movies for this movie
        similar_movies = indices[movie_idx]
        recommended_movies.extend(similar_movies)
    
    # Get unique movie IDs from the recommended list (remove duplicates)
    recommended_movie_ids = list(set([pivot_table.columns[i] for i in recommended_movies if i < len(pivot_table.columns)]))
    
    # Merge with movie details to get movie names and years
    recommended_movie_details = movies_df[movies_df['Movie_ID'].isin(recommended_movie_ids)]
    
    return recommended_movie_details[['Name', 'Year']]


In [5]:

# Example: Recommend movies for user 712664 based on Item-based Collaborative Filtering
item_recommendations = item_based_collaborative_filtering(712664)
print("\nMovies recommended for user 712664 (Item-based Collaborative Filtering):")
print(item_recommendations)



Movies recommended for user 712664 (Item-based Collaborative Filtering):
                Name  Year
17  Immortal Beloved  1994
76             Congo  1995


### Matrix Factorization (SVD-based Collaborative Filtering)

Singular Value Decomposition (SVD) is a matrix factorization technique that reduces the dimensionality of the data. This approach attempts to predict ratings by breaking down the user-item matrix into components and then recombining them.
Steps:

    Create a sparse pivot table for ratings.
    Apply SVD to reduce dimensionality and factorize the matrix.
    Compute the predicted ratings by multiplying the user vector with the item vector.

In [6]:

# 3. Matrix Factorization (SVD-based Collaborative Filtering)
def svd_collaborative_filtering(user_id, n_recommendations=5):
    # Create a sparse pivot table for ratings
    pivot_table = ratings_df.pivot(index='User_ID', columns='Movie_ID', values='Rating').fillna(0)
    pivot_sparse = csr_matrix(pivot_table.values)
    
    # Apply Singular Value Decomposition (SVD) to reduce dimensionality
    svd = TruncatedSVD(n_components=5)
    matrix_factorization = svd.fit_transform(pivot_sparse)
    
    # Compute the predicted ratings for all movies for the given user
    user_idx = ratings_df[ratings_df['User_ID'] == user_id].drop_duplicates('User_ID').index[0]
    predicted_ratings = matrix_factorization[user_idx].dot(svd.components_)
    
    # Get the movie IDs and their corresponding predicted ratings
    predicted_movie_ratings = pd.Series(predicted_ratings, index=pivot_table.columns)
    
    # Get the top n recommended movies
    top_n_recommended_movies = predicted_movie_ratings.sort_values(ascending=False).head(n_recommendations)
    
    # Merge with movie details to get movie names
    recommended_movie_details = movies_df[movies_df['Movie_ID'].isin(top_n_recommended_movies.index)]
    
    return recommended_movie_details[['Name', 'Year']]


In [7]:

# Example: Recommend movies for user 712664 based on Matrix Factorization (SVD)
svd_recommendations = svd_collaborative_filtering(712664)
print("\nMovies recommended for user 712664 (SVD-based Collaborative Filtering):")
print(svd_recommendations)



Movies recommended for user 712664 (SVD-based Collaborative Filtering):
                          Name  Year
7   What the #$*! Do We Know!?  2004
16                   7 Seconds  2005
25             Never Die Alone  2004
29      Something's Gotta Give  2003
96               Mostly Martha  2002


In [8]:
# Example: Recommend movies for user 712664 based on User-based Collaborative Filtering
user_recommendations = user_based_collaborative_filtering(712664)
print("\nMovies recommended for user 712664 (User-based Collaborative Filtering):")
print(user_recommendations)



Movies recommended for user 712664 (User-based Collaborative Filtering):
                              Name  Year
2                        Character  1997
15                       Screamers  1996
16                       7 Seconds  2005
17                Immortal Beloved  1994
25                 Never Die Alone  2004
27                 Lilo and Stitch  2002
29          Something's Gotta Give  2003
43                  Spitfire Grill  1996
45  Rudolph the Red-Nosed Reindeer  1964
46       The Bad and the Beautiful  1952
51         The Weather Underground  2002
55                       Carandiru  2004
56                     Richard III  1995
57                     Dragonheart  1996
76                           Congo  1995
77              Jingle All the Way  1996
78                     The Killing  1956
82                        Silkwood  1983
96                   Mostly Martha  2002


In [9]:
def item_based_collaborative_filtering(user_id, n_recommendations=5):
    # Create a sparse pivot table for ratings
    pivot_table = ratings_df.pivot(index='User_ID', columns='Movie_ID', values='Rating').fillna(0)
    pivot_sparse = csr_matrix(pivot_table.values)
    
    # Compute the similarity between items (movies) using Nearest Neighbors
    nn = NearestNeighbors(metric='cosine', algorithm='auto', n_neighbors=10, n_jobs=-1)
    nn.fit(pivot_sparse)  # Use the regular matrix to compute movie similarity
    distances, indices = nn.kneighbors(pivot_sparse, n_neighbors=n_recommendations)
    
    # Get the movie IDs rated by the user
    rated_movie_ids = ratings_df[ratings_df['User_ID'] == user_id]['Movie_ID'].tolist()
    
    # Initialize a list to store recommended movie IDs
    recommended_movies = []
    
    # For each movie rated by the user, recommend similar movies based on item similarity
    for movie_id in rated_movie_ids:
        movie_idx = pivot_table.columns.get_loc(movie_id)  # Get column index for the movie
        # Get the top similar movies for this movie
        similar_movies = indices[movie_idx]
        recommended_movies.extend(similar_movies)
    
    # Get unique movie IDs from the recommended list (remove duplicates)
    recommended_movie_ids = list(set([pivot_table.columns[i] for i in recommended_movies if i < len(pivot_table.columns)]))
    
    # Merge with movie details to get movie names and years
    recommended_movie_details = movies_df[movies_df['Movie_ID'].isin(recommended_movie_ids)]
    
    return recommended_movie_details[['Name', 'Year']]


In [10]:

# Example: Recommend movies for user 712664 based on Item-based Collaborative Filtering
item_recommendations = item_based_collaborative_filtering(712664)
print("\nMovies recommended for user 712664 (Item-based Collaborative Filtering):")
print(item_recommendations)



Movies recommended for user 712664 (Item-based Collaborative Filtering):
                Name  Year
17  Immortal Beloved  1994
76             Congo  1995


In [11]:

# 3. Matrix Factorization (SVD-based Collaborative Filtering)
def svd_collaborative_filtering(user_id, n_recommendations=5):
    # Create a sparse pivot table for ratings
    pivot_table = ratings_df.pivot(index='User_ID', columns='Movie_ID', values='Rating').fillna(0)
    pivot_sparse = csr_matrix(pivot_table.values)
    
    # Apply Singular Value Decomposition (SVD) to reduce dimensionality
    svd = TruncatedSVD(n_components=5)
    matrix_factorization = svd.fit_transform(pivot_sparse)
    
    # Compute the predicted ratings for all movies for the given user
    user_idx = ratings_df[ratings_df['User_ID'] == user_id].drop_duplicates('User_ID').index[0]
    predicted_ratings = matrix_factorization[user_idx].dot(svd.components_)
    
    # Get the movie IDs and their corresponding predicted ratings
    predicted_movie_ratings = pd.Series(predicted_ratings, index=pivot_table.columns)
    
    # Get the top n recommended movies
    top_n_recommended_movies = predicted_movie_ratings.sort_values(ascending=False).head(n_recommendations)
    
    # Merge with movie details to get movie names
    recommended_movie_details = movies_df[movies_df['Movie_ID'].isin(top_n_recommended_movies.index)]
    
    return recommended_movie_details[['Name', 'Year']]


In [12]:

# Example: Recommend movies for user 712664 based on Matrix Factorization (SVD)
svd_recommendations = svd_collaborative_filtering(712664)
print("\nMovies recommended for user 712664 (SVD-based Collaborative Filtering):")
print(svd_recommendations)



Movies recommended for user 712664 (SVD-based Collaborative Filtering):
                          Name  Year
7   What the #$*! Do We Know!?  2004
16                   7 Seconds  2005
25             Never Die Alone  2004
29      Something's Gotta Give  2003
96               Mostly Martha  2002
