<a href="https://colab.research.google.com/github/urvashi-agrawal-dev/Movie-recommendation/blob/main/Movie_recommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
## We'll build a system that recommends movies to a user based on ratings from similar users (user-based collaborative filtering) or on similar movies (item-based collaborative filtering).

STEP 1: Understand the Goal

Input: User's past movie ratings

Output: A list of recommended movies

We’ll:

Build a rating matrix of users vs. movies

Measure similarity with cosine similarity or correlation

Recommend movies the user has not rated yet
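
The three steps above can be sketched on a tiny, made-up rating matrix. This is only an illustration under stated assumptions: the titles, the ratings, and the helper names `cosine_item_similarity` and `recommend_for_user` are hypothetical, and a 0 stands in for "not rated", which is a simplification.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are movies, 0 means "not rated".
titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 5],
    [1, 0, 5, 4, 1],
    [0, 1, 4, 5, 0],
], dtype=float)

def cosine_item_similarity(matrix):
    """Cosine similarity between every pair of columns (movies)."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for movies nobody rated
    normalized = matrix / norms
    return normalized.T @ normalized

def recommend_for_user(user_index, matrix, top_n=2):
    """Rank the movies a user has not rated by a similarity-weighted score."""
    similarity = cosine_item_similarity(matrix)
    user_ratings = matrix[user_index]
    # Each movie's score is the sum of the user's ratings weighted by similarity.
    scores = similarity @ user_ratings
    unrated = np.flatnonzero(user_ratings == 0)
    ranked = unrated[np.argsort(scores[unrated])[::-1]]
    return [titles[i] for i in ranked[:top_n]]

print(recommend_for_user(0, ratings))
```

The rest of this notebook uses correlation on the real data instead of cosine similarity, so treat this only as a picture of the overall flow.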



In [1]:
%pip install pandas numpy scikit-learn



In [2]:
import pandas as pd

# Load ratings
ratings = pd.read_csv('u.data', sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])

# Load movie titles
movies = pd.read_csv('u.item', sep='|', encoding='latin-1', names=['movie_id', 'title'], usecols=[0, 1])

# Merge datasets
data = pd.merge(ratings, movies, on='movie_id')
data.head()


,user_id,movie_id,rating,timestamp,title
0,196,242,3,881250949,Kolya (1996)
1,186,302,3,891717742,L.A. Confidential (1997)
2,22,377,1,878887116,Heavyweights (1994)
3,244,51,2,880606923,Legends of the Fall (1994)
4,166,346,1,886397596,Jackie Brown (1997)


In [3]:
# Create the user-movie rating matrix (rows: users, columns: titles, NaN: not rated)
user_movie_matrix = data.pivot_table(index='user_id', columns='title', values='rating')
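
To see what `pivot_table` produces, here is a small sketch on made-up data. The `toy` frame and its values are hypothetical; only the column names mirror `data`. Note that `pivot_table` averages duplicates by default, so a user who rated the same title twice would get the mean of the two ratings.

```python
import pandas as pd

# Hypothetical long-format ratings, using the same column names as `data`.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
    "rating": [5, 3, 4, 2, 1],
})

# One row per user, one column per title; missing ratings become NaN.
toy_matrix = toy.pivot_table(index="user_id", columns="title", values="rating")
print(toy_matrix)

# Fraction of user/movie cells that have no rating.
sparsity = toy_matrix.isna().to_numpy().mean()
print(f"Share of missing ratings: {sparsity:.0%}")
```

Running `user_movie_matrix.isna().to_numpy().mean()` on the real matrix would show how sparse it is, which matters for the correlation step later.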


STEP 5: Choose Collaborative Filtering Type

Option 1: User-Based Collaborative Filtering

Find users who rated movies similarly

Option 2: Item-Based Collaborative Filtering

Find movies similar to each other (better for scalability)

Let’s use item-based filtering, since it’s the easier one to start with.
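
The difference between the two options comes down to which axis of the rating matrix gets compared. A minimal sketch on a made-up matrix (the values below are hypothetical, and Pearson correlation via `DataFrame.corr` is just one possible similarity measure):

```python
import pandas as pd

# Hypothetical rating matrix: rows are users, columns are movies.
toy_matrix = pd.DataFrame(
    {
        "Movie A": [5, 4, 1, 2],
        "Movie B": [4, 5, 2, 1],
        "Movie C": [1, 2, 5, 4],
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# Item-based: correlate columns (movies) with each other.
item_similarity = toy_matrix.corr()

# User-based: transpose so users become columns, then correlate them.
user_similarity = toy_matrix.T.corr()

print(item_similarity.round(2))
print(user_similarity.round(2))
```

The item-based result is a movies-by-movies table, while the user-based result is users-by-users.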



In [4]:
# Choose a target movie
target_movie = "Star Wars (1977)"

# Get movie rating vector
target_ratings = user_movie_matrix[target_movie]

# Compute correlations
similar_movies = user_movie_matrix.corrwith(target_ratings)

# Clean and sort
corr_df = pd.DataFrame(similar_movies, columns=['Correlation'])
corr_df.dropna(inplace=True)

# Add number of ratings for filtering
rating_counts = data.groupby('title')['rating'].count()
corr_df = corr_df.join(rating_counts)
corr_df.columns = ['Correlation', 'RatingCount']

# Show similar movies with more than 50 ratings
corr_df[corr_df['RatingCount'] > 50].sort_values('Correlation', ascending=False).head(10)


title,Correlation,RatingCount
Star Wars (1977),1.0,583
"Empire Strikes Back, The (1980)",0.747981,367
Return of the Jedi (1983),0.672556,507
Raiders of the Lost Ark (1981),0.536117,420
Giant (1956),0.488093,51
"Life Less Ordinary, A (1997)",0.411638,53
Austin Powers: International Man of Mystery (1997),0.377433,130
"Sting, The (1973)",0.367538,241
Indiana Jones and the Last Crusade (1989),0.350107,331
Pinocchio (1940),0.347868,101


In [5]:
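# Wrap the steps above in a reusable function
def recommend(movie_name, min_ratings=50):
    """Return the 10 titles whose ratings correlate most with `movie_name`.

    Only titles with more than `min_ratings` ratings are considered.
    The queried movie itself is included (its correlation is 1.0).
    """
    # Correlate every movie's rating column with the target movie's column
    movie_ratings = user_movie_matrix[movie_name]
    similar = user_movie_matrix.corrwith(movie_ratings)

    corr_df = pd.DataFrame(similar, columns=['Correlation'])
    corr_df.dropna(inplace=True)

    # Attach the number of ratings per title so rarely rated movies can be filtered out
    rating_counts = data.groupby('title')['rating'].count()
    corr_df = corr_df.join(rating_counts)
    corr_df.columns = ['Correlation', 'RatingCount']

    return corr_df[corr_df['RatingCount'] > min_ratings].sort_values('Correlation', ascending=False).head(10)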
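
`RatingCount` counts all ratings a title received, not how many users rated both it and the target movie. When only a couple of users overlap, the correlation can come out as exactly 1.0 by accident. The sketch below is a hypothetical variant, not part of the notebook's pipeline: `recommend_with_overlap`, its `min_overlap` parameter, and the toy matrix are assumptions for illustration. It filters on the number of shared raters and drops the queried movie from its own results.

```python
import pandas as pd

NAN = float("nan")

# Hypothetical rating matrix with missing values (rows: users, columns: movies).
toy = pd.DataFrame(
    {
        "Movie A": [5, 4, 1, 2, 3],
        "Movie B": [4, 5, 2, 1, NAN],
        "Movie C": [NAN, NAN, NAN, 1, 2],
        "Movie D": [1, 2, 5, 4, 3],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

def recommend_with_overlap(matrix, movie_name, min_overlap=3, top_n=10):
    """Hypothetical variant of recommend() that filters by shared raters."""
    correlation = matrix.corrwith(matrix[movie_name])

    # Count the users who rated both the target movie and each other movie.
    rated = matrix.notna().astype(int)
    overlap = rated.mul(rated[movie_name], axis=0).sum()

    result = pd.DataFrame({"Correlation": correlation, "Overlap": overlap})
    result = result.drop(index=movie_name).dropna()
    result = result[result["Overlap"] >= min_overlap]
    return result.sort_values("Correlation", ascending=False).head(top_n)

print(recommend_with_overlap(toy, "Movie A", min_overlap=3))
print(recommend_with_overlap(toy, "Movie A", min_overlap=2))
```

In this toy data, "Movie C" shares only two raters with "Movie A", so it tops the list at `min_overlap=2` and disappears at `min_overlap=3`.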


In [6]:
recommend("Striptease (1996)")


title,Correlation,RatingCount
Seven Years in Tibet (1997),1.0,155
"39 Steps, The (1935)",1.0,59
High Noon (1952),1.0,88
She's So Lovely (1997),1.0,53
"Mrs. Brown (Her Majesty, Mrs. Brown) (1997)",1.0,96
Flubber (1997),1.0,53
"Ice Storm, The (1997)",1.0,108
Striptease (1996),1.0,67
Excess Baggage (1997),0.970725,52
Money Talks (1997),0.927173,92
