Simple Movie Recommendation System Idea

Objective:

Create a basic movie recommendation system using Python that suggests similar movies to a user based on a movie they have already liked.

Approach:

Use the cosine similarity metric to find the movies whose genres are most similar to those of a selected movie, and then recommend them to the user. The dataset is a small, structured movie dataset that includes movie titles, genres, and user ratings; only the titles and genres are used here.
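
As a quick illustration of the metric, the sketch below computes cosine similarity by hand for made-up binary genre vectors. The genre order and the three vectors are assumptions chosen for illustration, not rows from the dataset. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so movies with no genre in common score 0.0 and movies with identical genre sets score 1.0 (up to floating-point rounding).

```python
import numpy as np

# Hypothetical genre order: [adventure, animation, children, comedy, fantasy, romance]
movie_a = np.array([1, 1, 1, 1, 1, 0])  # five genres
movie_b = np.array([1, 0, 1, 0, 1, 0])  # three of the same genres
movie_c = np.array([0, 0, 0, 0, 0, 1])  # romance only


def cosine(u, v):
    """Dot product divided by the product of the vector lengths."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


print(round(cosine(movie_a, movie_b), 3))  # 3 / (sqrt(5) * sqrt(3)), about 0.775
print(cosine(movie_a, movie_c))            # 0.0, no shared genres
```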

Implementation Plan:

Dataset: Use the MovieLens dataset (small version), which contains information on movies, their genres, and user ratings. Download the movies.csv file, which contains the columns movieId, title, and genres.

Steps to Build:

Import Libraries:

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

Upload the dataset. We are using this one: https://www.kaggle.com/datasets/shubhammehta21/movie-lens-small-latest-dataset


In [3]:
# Load dataset
movies = pd.read_csv("movies.csv")

# Preview dataset
print(movies.head())

   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  


Feature Engineering: Convert the genres into a format that can be used to compute similarity (e.g., a count matrix).

In [4]:
# Genres are pipe-separated (e.g. "Comedy|Romance"). Split on the pipe so that
# multi-part labels such as "Sci-Fi" stay single tokens.
count_vectorizer = CountVectorizer(tokenizer=lambda s: s.split('|'), token_pattern=None)

# Create the count matrix: one row per movie, one column per genre
count_matrix = count_vectorizer.fit_transform(movies['genres'])
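
The tokenizer determines which genre tokens end up as columns of the count matrix. The sketch below, run on three made-up genre strings, compares CountVectorizer's default token pattern with a tokenizer that splits on the pipe character. The sample strings and variable names are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up genre strings in the same pipe-separated format as movies.csv
genres = ["Action|Sci-Fi", "Comedy|Romance", "Action|Comedy"]

# Default token pattern: keeps runs of two or more word characters,
# so both the pipe and the hyphen in "Sci-Fi" act as separators
default_vec = CountVectorizer()
default_vec.fit(genres)
print(sorted(default_vec.vocabulary_))
# ['action', 'comedy', 'fi', 'romance', 'sci']

# Splitting on the pipe keeps each genre label as one token
pipe_vec = CountVectorizer(tokenizer=lambda s: s.split("|"), token_pattern=None)
pipe_matrix = pipe_vec.fit_transform(genres)
print(sorted(pipe_vec.vocabulary_))
# ['action', 'comedy', 'romance', 'sci-fi']

# Rows are movies, columns follow the sorted vocabulary
print(pipe_matrix.toarray())
```

With the default pattern a hyphenated genre contributes two columns instead of one, which gives it extra weight in the similarity score.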

Calculate Cosine Similarity: Use cosine similarity to find movies that are similar based on genres.

In [5]:
# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(count_matrix, count_matrix)
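
The full matrix has one row and one column per movie, so its size grows quadratically with the number of movies. When only one movie's neighbours are needed, a single row can be computed on demand instead. The sketch below checks this on a made-up count matrix; the data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up genre counts: 4 movies x 3 genres
toy_counts = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])

# Full pairwise matrix: shape (4, 4), symmetric, ones on the diagonal
full = cosine_similarity(toy_counts)

# One row only: similarities between movie 2 and every movie, shape (1, 4)
row = cosine_similarity(toy_counts[2:3], toy_counts)

print(full.shape, row.shape)
print(np.allclose(full[2], row[0]))  # True
```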


Recommendation Function: Write a function that takes a movie title as input and recommends the top 5 similar movies.

In [6]:
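def recommend_movies(title, cosine_sim=cosine_sim):
    # Get the index of the movie that matches the title
    matches = movies.index[movies['title'] == title]
    if len(matches) == 0:
        raise ValueError(f"Movie title not found: {title}")
    idx = matches[0]

    # Get the pairwise similarity scores for all movies
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Exclude the selected movie itself; movies with identical genres tie
    # with it, so it is not guaranteed to be first after sorting
    sim_scores = [score for score in sim_scores if score[0] != idx]

    # Sort the movies based on the similarity scores, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the indices of the top 5 most similar movies
    sim_scores = sim_scores[:5]

    # Get the movie titles
    movie_indices = [i[0] for i in sim_scores]
    return movies['title'].iloc[movie_indices]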
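
Because the function matches titles exactly, including the release year, it can help to look up the stored spelling first. The helper below is a hypothetical addition (find_titles is not part of the original notebook) that does a case-insensitive substring search. It is shown on a small made-up frame so that it runs on its own.

```python
import pandas as pd

# Small stand-in for the movies DataFrame (titles copied from the preview above)
toy_movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})


def find_titles(query, frame):
    """Return titles containing `query`, ignoring case (hypothetical helper)."""
    mask = frame["title"].str.contains(query, case=False, regex=False)
    return frame.loc[mask, "title"].tolist()


print(find_titles("toy story", toy_movies))  # ['Toy Story (1995)']
print(find_titles("matrix", toy_movies))     # []
```

Passing regex=False treats the query as plain text, so the parentheses around the year do not need escaping.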

Test the Recommender System: Test your recommender system by passing in a movie title.

In [7]:
# Test the recommendation function
print(recommend_movies("Toy Story (1995)"))

1706                                       Antz (1998)
2355                                Toy Story 2 (1999)
2809    Adventures of Rocky and Bullwinkle, The (2000)
3000                  Emperor's New Groove, The (2000)
3568                             Monsters, Inc. (2001)
Name: title, dtype: object
