In [None]:
%pip install pandas numpy scikit-learn scikit-surprise




In [None]:
!pip install implicit

Collecting implicit
  Downloading implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl.metadata (6.1 kB)
Downloading implicit-0.7.2-cp310-cp310-manylinux2014_x86_64.whl (8.9 MB)
Installing collected packages: implicit
Successfully installed implicit-0.7.2


**Description:** Develop a recommendation engine that uses collaborative filtering and content-based methods to suggest products to customers based on their past behavior and preferences.

**Use Case:** Companies such as Amazon and Netflix use ML to deliver personalized recommendations, increasing user engagement and sales. Students can build a recommendation system on datasets like MovieLens or Amazon Reviews to offer personalized shopping experiences.

The ratings dataset contains user-generated movie ratings. Each row represents a single rating for a movie, with information about the user who made the rating, the movie they rated, and the rating itself. This dataset is essential for collaborative filtering, as it allows us to predict the rating a user might give to unseen movies.               

The movies dataset contains information about movies, including the title and the genres they belong to. This dataset is crucial for content-based filtering, where recommendations are made based on the content features of the movies (e.g., genres, keywords).

In [None]:
import pandas as pd

# Each variable must point at its matching file
ratings_url = 'ratings.csv'
movies_url = 'movies.csv'

ratings = pd.read_csv(ratings_url)
movies = pd.read_csv(movies_url)

print(ratings.head())
print(movies.head())


   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931
   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  


In [None]:
print(ratings.columns)


Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')


In [None]:
print(movies.columns)


Index(['movieId', 'title', 'genres'], dtype='object')


**Libraries Used:**

*  **pandas:** For data manipulation and reading CSV files.
*  **scipy.sparse:** Provides sparse matrix representations; used here to store the user-item matrix efficiently for ALS.
*  **implicit.als:** The ALS algorithm from the implicit library. It is designed for collaborative filtering on implicit feedback (e.g., clicks or play counts); this notebook feeds it explicit ratings as interaction strengths.
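
To make the sparse-storage point concrete, here is a minimal pure-Python sketch of the compressed sparse row (CSR) layout. The helper names `to_csr` and `csr_row` are hypothetical and exist only for illustration; scipy's `csr_matrix` exposes the same three arrays as its `data`, `indices`, and `indptr` attributes.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero rating
                indices.append(col)   # the column it sits in
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def csr_row(csr, i):
    """Return row i as a {column: value} dict using only the CSR arrays."""
    data, indices, indptr = csr
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


# Toy user-item matrix: 3 users x 4 movies, 0 means "not rated"
ratings_matrix = [
    [4.0, 0, 0, 5.0],
    [0, 0, 3.0, 0],
    [0, 0, 0, 0],
]
csr = to_csr(ratings_matrix)
print(csr)
print(csr_row(csr, 0))
```

Only the three non-zero ratings are stored, which is why this layout suits rating matrices where most user-movie pairs are empty.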

In [None]:
import pandas as pd
from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares


ratings_url = 'ratings.csv'
movies_url = 'movies.csv'
ratings = pd.read_csv(ratings_url)
movies = pd.read_csv(movies_url)
print("Ratings columns:", ratings.columns)
print("Movies columns:", movies.columns)

user_movie_matrix = ratings.pivot(index='userId', columns='movieId', values='rating').fillna(0)

user_movie_sparse = csr_matrix(user_movie_matrix.values)


model = AlternatingLeastSquares(factors=50, regularization=0.1, iterations=15)
model.fit(user_movie_sparse)


def get_als_recommendations(user_id, top_n=10):
    # Reject users that have no row in the user-item matrix
    if user_id not in user_movie_matrix.index:
        raise ValueError(f"User ID {user_id} not found in the user-item matrix.")

    # Translate the raw userId into its row position in the matrix
    user_idx = user_movie_matrix.index.get_loc(user_id)
    user_interactions = user_movie_sparse[user_idx]

    # implicit >= 0.5 returns a pair of arrays (item column indices, scores),
    # not a list of (item, score) tuples
    item_indices, _scores = model.recommend(
        user_idx, user_interactions, N=top_n, filter_already_liked_items=True
    )

    # Column positions are not movieIds; map them back through the pivot's columns
    recommended_movie_ids = user_movie_matrix.columns[item_indices]

    # Look up the titles while keeping the model's ranking order
    titles_by_id = movies.set_index('movieId')['title']
    return titles_by_id.reindex(recommended_movie_ids).tolist()


als_recommendations = get_als_recommendations(user_id=1, top_n=10)
print("Top 10 ALS Recommendations:", als_recommendations)

Ratings columns: Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')
Movies columns: Index(['movieId', 'title', 'genres'], dtype='object')


This code builds a movie recommender with Alternating Least Squares (ALS) from the implicit library. ALS in implicit is designed for implicit feedback such as clicks or play counts; here the explicit ratings from ratings.csv are used as interaction strengths. The code builds a user-item interaction matrix, stores it in sparse form, and fits ALS to learn latent factors for each user and movie. Recommendations for a user are the highest-scoring movies that the user has not already rated, and the resulting top-N items are then mapped to their titles from the movies.csv dataset.
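
The last step, mapping recommended items back to titles, depends on translating between raw IDs and matrix positions. A matrix-factorization model only sees row and column positions, so the positions it returns have to be converted back to movieId values before any lookup. The sketch below illustrates that bookkeeping in plain Python; the toy ratings and the helper `columns_to_movie_ids` are made up for the example.

```python
# Toy ratings as (userId, movieId, rating); the IDs are deliberately non-contiguous
ratings_rows = [(1, 10, 4.0), (1, 47, 5.0), (7, 10, 3.0), (7, 50, 4.5)]

# Sorted unique IDs define the row and column order, as a pandas pivot would
user_ids = sorted({u for u, _, _ in ratings_rows})
movie_ids = sorted({m for _, m, _ in ratings_rows})

# Raw ID -> matrix position
user_to_row = {u: r for r, u in enumerate(user_ids)}
movie_to_col = {m: c for c, m in enumerate(movie_ids)}


def columns_to_movie_ids(column_indices):
    """Translate column positions returned by a model back into raw movieIds."""
    return [movie_ids[c] for c in column_indices]


# Suppose a model ranked columns 2 and 0 highest for some user (hypothetical output)
print(user_to_row[7])                # row position of userId 7
print(columns_to_movie_ids([2, 0]))  # raw movieIds in ranked order
```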

In [None]:
import pandas as pd
from surprise import Reader, Dataset, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


In [None]:

movie_ratings = pd.merge(ratings, movies[['movieId', 'title', 'genres']], on='movieId')


print(movie_ratings.head())


   userId  movieId  rating  timestamp                        title  \
0       1        1     4.0  964982703             Toy Story (1995)   
1       1        3     4.0  964981247      Grumpier Old Men (1995)   
2       1        6     4.0  964982224                  Heat (1995)   
3       1       47     5.0  964983815  Seven (a.k.a. Se7en) (1995)   
4       1       50     5.0  964982931   Usual Suspects, The (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                               Comedy|Romance  
2                        Action|Crime|Thriller  
3                             Mystery|Thriller  
4                       Crime|Mystery|Thriller  


This code merges the ratings and movies datasets on the movieId column, adding movie titles and genres to each rating. The result is a new dataframe, movie_ratings, whose first rows are displayed with **print(movie_ratings.head())**.

In [None]:

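# Take the rating scale from the data instead of hard-coding it
reader = Reader(rating_scale=(ratings['rating'].min(), ratings['rating'].max()))
data = Dataset.load_from_df(movie_ratings[['userId', 'movieId', 'rating']], reader)

# Hold out 20% of the ratings for evaluation
trainset, testset = train_test_split(data, test_size=0.2)

svd = SVD()
svd.fit(trainset)

# accuracy.rmse prints the score itself, so it does not need to be wrapped in print()
predictions = svd.test(testset)
rmse = accuracy.rmse(predictions)

# Raw IDs must have the same type as in the training data (integers here);
# string IDs would be treated as unknown and the estimate would fall back to the global mean
uid = 1
iid = 50
pred = svd.predict(uid, iid)
print(pred)


RMSE: 0.8645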


This code implements a movie recommendation system using the SVD (Singular Value Decomposition) algorithm from the Surprise library. It begins by preparing the data using the Reader and Dataset classes, loading the ratings into a format suitable for model training. The data is split into training and test sets using train_test_split. The SVD model is then trained on the training data (trainset) and evaluated on the test data (testset) using Root Mean Squared Error (RMSE) to assess prediction accuracy. Finally, it predicts the rating for a specific user-item pair (user 1, movie 50) and prints the predicted rating.
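
As a rough illustration of what the model computes, Surprise documents its SVD estimate as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors; for a user or item not seen in training, the corresponding bias and factors are taken as zero. The sketch below mimics that formula and the RMSE metric in plain Python. All parameter values are made up, and `predict` and `rmse` here are illustrative stand-ins, not the library's functions.

```python
import math

# Made-up learned parameters for one user and one movie (illustrative only)
GLOBAL_MEAN = 3.5
user_bias = {1: 0.3}
item_bias = {50: 0.4}
user_factors = {1: [0.5, -0.2]}
item_factors = {50: [0.6, 0.1]}


def predict(uid, iid, low=0.5, high=5.0):
    """Estimate a rating as mean + biases + factor dot product, clipped to the scale."""
    est = GLOBAL_MEAN + user_bias.get(uid, 0.0) + item_bias.get(iid, 0.0)
    p_u = user_factors.get(uid)
    q_i = item_factors.get(iid)
    if p_u is not None and q_i is not None:
        est += sum(p * q for p, q in zip(p_u, q_i))
    return min(high, max(low, est))


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


print(predict(1, 50))      # known user and movie
print(predict("1", "50"))  # string IDs do not match the integer keys
print(rmse([(4.0, 3.5), (3.0, 3.5)]))
```

One practical consequence: an ID passed in a different type than the training data (the string "1" instead of the integer 1) looks like an unknown user, so the estimate collapses to the global mean.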

In [None]:

# Turn each movie's genre string into a TF-IDF vector
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['genres'])

# Pairwise cosine similarity between all movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

def get_movie_recommendations(title, cosine_sim=cosine_sim):
    # Position of the movie; assumes movies keeps its default 0..n-1 index
    idx = movies[movies['title'] == title].index[0]
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Drop the query movie explicitly: movies with identical genres tie at 1.0,
    # so the query is not guaranteed to be first in the sorted list
    movie_indices = [i for i, _ in sim_scores if i != idx][:10]
    return movies['title'].iloc[movie_indices]


recommended_movies = get_movie_recommendations('Toy Story (1995)')
print(recommended_movies)


1706                                          Antz (1998)
2355                                   Toy Story 2 (1999)
2809       Adventures of Rocky and Bullwinkle, The (2000)
3000                     Emperor's New Groove, The (2000)
3568                                Monsters, Inc. (2001)
6194                                     Wild, The (2006)
6486                               Shrek the Third (2007)
6948                       Tale of Despereaux, The (2008)
7760    Asterix and the Vikings (Astérix et les Viking...
8219                                         Turbo (2013)
Name: title, dtype: object



*  This code uses TF-IDF to vectorize movie genres and calculates the cosine similarity between every pair of movies.
*  The function get_movie_recommendations takes a movie title, ranks the other movies by genre similarity, and returns the top 10.
*  It then prints the recommendations for "Toy Story (1995)".
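
To show the similarity idea on a scale that can be checked by hand, here is a simplified sketch that swaps TF-IDF weights for plain genre sets (binary vectors). For binary vectors, cosine similarity reduces to the number of shared genres divided by the square root of the product of the two genre counts. The genre strings come from the dataset preview above; `genre_set`, `cosine`, and `most_similar` are hypothetical helper names.

```python
import math

# Genre strings taken from the dataset preview
movies_genres = {
    "Toy Story (1995)": "Adventure|Animation|Children|Comedy|Fantasy",
    "Jumanji (1995)": "Adventure|Children|Fantasy",
    "Grumpier Old Men (1995)": "Comedy|Romance",
    "Heat (1995)": "Action|Crime|Thriller",
}


def genre_set(genres):
    """Split a pipe-separated genre string into a set of genres."""
    return set(genres.split("|"))


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def most_similar(title, top_n=2):
    """Rank every other movie by genre similarity to `title`."""
    target = genre_set(movies_genres[title])
    scored = [
        (other, cosine(target, genre_set(genres)))
        for other, genres in movies_genres.items()
        if other != title  # never recommend the query movie itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for name, score in most_similar("Toy Story (1995)"):
    print(f"{name}: {score:.3f}")
```

TF-IDF differs from this sketch in that it down-weights genres that appear in many movies, so rare genres count for more in the similarity.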






In [None]:
def hybrid_recommendation(user_id, movie_name, collaborative_model=svd, cosine_sim=cosine_sim):
    # Look up the movieId and pass raw IDs in their original integer type;
    # string IDs would not match the IDs the model was trained on
    movie_id = int(movies.loc[movies['title'] == movie_name, 'movieId'].iloc[0])
    collab_pred = collaborative_model.predict(user_id, movie_id)

    # Genre-based neighbours of the same movie
    content_recs = get_movie_recommendations(movie_name, cosine_sim)

    return collab_pred, content_recs


user_id = 1
movie_name = 'Toy Story (1995)'
collab_pred, content_recs = hybrid_recommendation(user_id, movie_name)
print(f"Collaborative Filtering Prediction: {collab_pred}")
print(f"Content-Based Recommendations: {content_recs}")


Content-Based Recommendations: 1706                                          Antz (1998)
2355                                   Toy Story 2 (1999)
2809       Adventures of Rocky and Bullwinkle, The (2000)
3000                     Emperor's New Groove, The (2000)
3568                                Monsters, Inc. (2001)
6194                                     Wild, The (2006)
6486                               Shrek the Third (2007)
6948                       Tale of Despereaux, The (2008)
7760    Asterix and the Vikings (Astérix et les Viking...
8219                                         Turbo (2013)
Name: title, dtype: object



*  **Hybrid Recommendation:** Pairs collaborative filtering (SVD) with content-based filtering (cosine similarity); the two results are returned side by side rather than blended into one score.
*  **Function:** hybrid_recommendation predicts the user's rating for the given movie (collab_pred) and returns the top 10 movies most similar to it by genre (content_recs).
*  **Input:** Takes user_id and movie_name.
*  **Output:** Prints the collaborative filtering prediction and the content-based recommendations for the movie "Toy Story (1995)".
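
One common way to turn two such signals into a single ranking is a weighted sum of the normalized predicted rating and the content similarity. The sketch below is an illustration of that idea, not part of the notebook's pipeline: the weight `alpha`, the helper names, and the candidate numbers are all hypothetical.

```python
def hybrid_score(predicted_rating, similarity, alpha=0.5, max_rating=5.0):
    """Blend a predicted rating (scaled to 0..1) with a 0..1 content similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * (predicted_rating / max_rating) + (1 - alpha) * similarity


def rank_candidates(candidates, alpha=0.5):
    """Sort (title, predicted_rating, similarity) tuples by blended score, best first."""
    scored = [(title, hybrid_score(r, s, alpha)) for title, r, s in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up predicted ratings and similarities for three candidate movies
candidates = [
    ("Antz (1998)", 3.0, 1.0),
    ("Heat (1995)", 4.5, 0.0),
    ("Jumanji (1995)", 4.0, 0.77),
]

print(rank_candidates(candidates, alpha=0.5))  # balanced blend
print(rank_candidates(candidates, alpha=1.0))  # collaborative signal only
```

Raising `alpha` favors movies the model expects the user to rate highly; lowering it favors movies that resemble the query title.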

In [None]:
def hybrid_recommendation(user_id, movie_name, collaborative_model=svd, cosine_sim=cosine_sim):
    # Get the movieId for collaborative filtering; keep raw IDs as integers so
    # they match the IDs the model was trained on
    movie_id = int(movies[movies['title'] == movie_name]['movieId'].values[0])
    collab_pred = collaborative_model.predict(user_id, movie_id)

    content_recs = get_movie_recommendations(movie_name, cosine_sim)

    return collab_pred, content_recs


In [None]:
def get_user_input():
    # Display the first 10 movie titles for user reference
    print("Available Movies: ", movies['title'].head(10).to_list())
    movie_name = input("Enter a movie title from the list: ").strip()

    if movie_name not in movies['title'].values:
        print("Movie not found. Please enter a valid movie title from the list.")
        return None, None

    user_id = input("Enter your user ID: ")

    try:
        user_id = int(user_id)
    except ValueError:
        print("Invalid user ID. Please enter a valid number.")
        return None, None

    return user_id, movie_name


In [None]:
def run_recommendation_system():
    print("Welcome to the Movie Recommendation System!\n")

    user_id, movie_name = get_user_input()

    if user_id is not None and movie_name is not None:
        collab_pred, content_recs = hybrid_recommendation(user_id, movie_name)

        print(f"\nCollaborative Filtering Prediction for user {user_id} and movie '{movie_name}':")
        print(f"Predicted rating: {collab_pred.est:.2f}")

        print(f"\nContent-Based Recommendations based on movie '{movie_name}':")
        for i, rec in enumerate(content_recs, 1):
            print(f"{i}. {rec}")
    else:
        print("Please enter valid inputs.")


run_recommendation_system()


Welcome to the Movie Recommendation System!

Available Movies:  ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)', 'Waiting to Exhale (1995)', 'Father of the Bride Part II (1995)', 'Heat (1995)', 'Sabrina (1995)', 'Tom and Huck (1995)', 'Sudden Death (1995)', 'GoldenEye (1995)']
Enter a movie title from the list: Toy Story (1995)
Enter your user ID: 1

Content-Based Recommendations based on movie 'Toy Story (1995)':
1. Antz (1998)
2. Toy Story 2 (1999)
3. Adventures of Rocky and Bullwinkle, The (2000)
4. Emperor's New Groove, The (2000)
5. Monsters, Inc. (2001)
6. Wild, The (2006)
7. Shrek the Third (2007)
8. Tale of Despereaux, The (2008)
9. Asterix and the Vikings (Astérix et les Vikings) (2006)
10. Turbo (2013)


### **Conclusion:** The hybrid recommendation system pairs collaborative filtering with content-based filtering to offer personalized movie suggestions. Collaborative filtering contributes rating predictions learned from user behavior, and content-based filtering contributes movies with similar genres. Used together, the two methods can give suggestions that reflect both a user's past preferences and the content characteristics of the movies they enjoy.