# TASK-2:

## **Movie Recommendation System:**
**Build a movie recommendation system using collaborative filtering and machine learning techniques in Python.**

In [None]:
# Install necessary libraries
!pip install kaggle
!pip install surprise
!pip install scikit-learn

# Import libraries
import pandas as pd
import numpy as np
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns


Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise (from surprise)
  Downloading scikit_surprise-1.1.4.tar.gz (154 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.4/154.4 kB 3.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (pyproject.toml) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.4-cp310-cp310-linux_x86_64.whl size=2357255 sha256=58623de98cba6f5004aaebda9c68f953730a5043231754f74cce0ba69fd57535
  Stored in directory: /root/.cache/pip/wheels/4b/3f/df/6acbf0a40397d9bf3ff97f582cc22fb9ce66adde75bc71fd54
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully inst

In [None]:
# Upload kaggle.json
from google.colab import files
files.upload()

# Move kaggle.json to the appropriate directory
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download the MovieLens dataset
!kaggle datasets download -d grouplens/movielens-20m-dataset

# Unzip the dataset
!unzip movielens-20m-dataset.zip


Saving kaggle.json to kaggle.json
Dataset URL: https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset
License(s): unknown
Downloading movielens-20m-dataset.zip to /content
100% 195M/195M [00:04<00:00, 45.3MB/s]
100% 195M/195M [00:04<00:00, 43.0MB/s]
Archive:  movielens-20m-dataset.zip
  inflating: genome_scores.csv       
  inflating: genome_tags.csv         
  inflating: link.csv                
  inflating: movie.csv               
  inflating: rating.csv              
  inflating: tag.csv                 


In [None]:
# Load dataset
ratings = pd.read_csv('rating.csv')
movies = pd.read_csv('movie.csv')

# Display the first few rows of the datasets
print(ratings.head())
print(movies.head())


   userId  movieId  rating            timestamp
0       1        2     3.5  2005-04-02 23:53:47
1       1       29     3.5  2005-04-02 23:31:16
2       1       32     3.5  2005-04-02 23:33:39
3       1       47     3.5  2005-04-02 23:32:07
4       1       50     3.5  2005-04-02 23:29:40
   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  


In [None]:
# Prepare the data for Surprise library
reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Split the data into train and test sets
trainset, testset = train_test_split(data, test_size=0.2)


In [None]:
# Build and train the SVD model
model = SVD()
model.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7b997c10f580>

In [None]:
# Make predictions on the test set
predictions = model.test(testset)

# Calculate RMSE as the square root of the MSE (works across scikit-learn versions)
pred_ratings = [pred.est for pred in predictions]
true_ratings = [pred.r_ui for pred in predictions]
rmse = np.sqrt(mean_squared_error(true_ratings, pred_ratings))
print(f'RMSE: {rmse}')


RMSE: 0.7863508967386441


In [None]:
# Function to get top N recommendations for a given user
def get_top_n_recommendations(user_id, model, movies, n=10):
    # Movies the user has already rated (a set gives fast membership tests)
    user_rated_movies = set(ratings.loc[ratings['userId'] == user_id, 'movieId'])
    unrated_movies = [m for m in movies['movieId'].tolist() if m not in user_rated_movies]

    # Predict a rating for every unrated movie and sort from highest to lowest
    predictions = [model.predict(user_id, movie_id) for movie_id in unrated_movies]
    predictions.sort(key=lambda x: x.est, reverse=True)

    # Look up titles one by one so the list keeps the ranking order
    # (filtering with isin() would return titles in DataFrame row order instead)
    titles = movies.set_index('movieId')['title']
    top_n_movie_titles = [titles.loc[pred.iid] for pred in predictions[:n]]

    return top_n_movie_titles

# Get top 10 recommendations for a user
user_id = 1
top_n_recommendations = get_top_n_recommendations(user_id, model, movies)
print(f'Top 10 movie recommendations for user {user_id}:')
for i, movie in enumerate(top_n_recommendations):
    print(f'{i+1}. {movie}')


Top 10 movie recommendations for user 1:
1. Life Is Beautiful (La Vita è bella) (1997)
2. Prime Suspect (1991)
3. Phone Box, The (Cabina, La) (1972)
4. Serenity (2005)
5. Amazing Journey: The Story of The Who (2007)
6. How to Train Your Dragon (2010)
7. Very Potter Musical, A (2009)
8. For the Birds (2000)
9. Frozen Planet (2011)
10. Interstellar (2014)


# **Project Explanation:**

## Purpose of the Code
This code builds a movie recommendation system using collaborative filtering in Python. It uses the Singular Value Decomposition (SVD) algorithm from the Surprise library to recommend movies to a user based on the ratings that user has already given.

## Libraries and Frameworks Used
1. **Pandas:** For data manipulation and analysis. Its DataFrame structure holds the ratings and movie tables.
2. **NumPy:** A library for numerical computations in Python.
3. **Surprise:** A Python scikit for building and analyzing recommender systems. It provides various ready-to-use algorithms, including SVD.
4. **Scikit-learn:** A machine learning library in Python, used here to compute the error metric when evaluating the model.
5. **Matplotlib and Seaborn:** Libraries for data visualization. They are imported, but this notebook does not plot anything.

## Technologies Used
1. **Collaborative Filtering:** A technique used by recommender systems to find similarities between users or items and provide recommendations.
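2. **Singular Value Decomposition (SVD):** A matrix factorization technique used in collaborative filtering to represent users and items as vectors of latent factors. Surprise's `SVD` is the Funk-style variant popularized during the Netflix Prize: it fits the factors and bias terms to the observed ratings by stochastic gradient descent rather than computing an exact decomposition of the full user-item matrix.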
3. **Kaggle API:** Used to download datasets directly from Kaggle.
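
To make the idea of "similarity between users" concrete, here is a minimal sketch in plain Python. The toy ratings and the helper name `cosine_similarity` are illustrative assumptions and are not part of the notebook, which relies on Surprise's SVD rather than an explicit similarity measure.

```python
from math import sqrt

# Toy ratings (hypothetical): user -> {movie title: rating}
toy_ratings = {
    "alice": {"Toy Story": 5.0, "Jumanji": 3.0, "Grumpier Old Men": 1.0},
    "bob":   {"Toy Story": 4.0, "Jumanji": 3.0, "Grumpier Old Men": 1.0},
    "carol": {"Toy Story": 1.0, "Jumanji": 2.0, "Grumpier Old Men": 5.0},
}

def cosine_similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

sim_ab = cosine_similarity(toy_ratings["alice"], toy_ratings["bob"])
sim_ac = cosine_similarity(toy_ratings["alice"], toy_ratings["carol"])
print(f"alice~bob: {sim_ab:.3f}, alice~carol: {sim_ac:.3f}")
```

Alice and Bob rate the three movies almost identically, so their similarity comes out higher than Alice and Carol's. A neighborhood-based recommender would therefore lean on Bob's ratings when predicting for Alice.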

## Dataset Used
The MovieLens 20M dataset from Kaggle is used in this project. This dataset contains 20 million ratings for 27,000 movies by 138,000 users. It is a popular dataset for building and benchmarking recommendation systems.

## Step-by-Step Explanation
**Step 1: Set Up Environment and Install Necessary Libraries**

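In this step, we install the required libraries using pip: `kaggle` for downloading datasets, `surprise` for building recommendation models, and `scikit-learn` for evaluation metrics. As the install log shows, the `surprise` package on PyPI is a small wrapper that pulls in `scikit-surprise`, the package that actually contains the library.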

**Code:**

```
!pip install kaggle
!pip install surprise
!pip install scikit-learn
```


We then import the necessary libraries for data handling, machine learning, and visualization.

**Code:**
```
import pandas as pd
import numpy as np
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns
```
**Step 2: Configure Kaggle API and Download Dataset**

We configure the Kaggle API to download the MovieLens dataset. You need to upload your Kaggle API key (kaggle.json) to the Colab environment. The key is used to authenticate and download datasets from Kaggle.

**Code:**
```
# Upload kaggle.json
from google.colab import files
files.upload()
```

We then create a directory for the Kaggle key, move the uploaded file to this directory, and set the appropriate permissions.

**Code:**
```
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
```
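
Outside Colab, the same three shell steps can be done from Python. The sketch below is a hedged equivalent using only the standard library; the function name `install_kaggle_key` and its `dest_dir` parameter are hypothetical, and the `~/.kaggle` default mirrors the shell commands above.

```python
import shutil
from pathlib import Path

def install_kaggle_key(src="kaggle.json", dest_dir=None):
    """Move the API key into dest_dir and restrict it to owner read/write."""
    dest_dir = Path(dest_dir) if dest_dir is not None else Path.home() / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)   # like `mkdir -p`
    dest = dest_dir / "kaggle.json"
    shutil.move(str(src), str(dest))              # like `mv`
    dest.chmod(0o600)                             # like `chmod 600`
    return dest
```

Calling `install_kaggle_key()` with no arguments would move a `kaggle.json` from the current directory into `~/.kaggle`. The restrictive file mode matters because the file contains a secret API token.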

Next, we download and unzip the MovieLens dataset.

**Code:**
```
!kaggle datasets download -d grouplens/movielens-20m-dataset
!unzip movielens-20m-dataset.zip
```

**Step 3: Load and Preprocess the Data**

We load the ratings and movies data into Pandas DataFrames.

**Code:**
```
ratings = pd.read_csv('rating.csv')
movies = pd.read_csv('movie.csv')
```
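Displaying the first few rows helps to understand the structure: `ratings` has the columns `userId`, `movieId`, `rating`, and `timestamp`, while `movies` has `movieId`, `title`, and `genres`.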

**Code:**
```
print(ratings.head())
print(movies.head())
```

**Step 4: Prepare Data for Surprise Library**

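Surprise's `Dataset.load_from_df` expects a DataFrame with exactly three columns in the order user ID, item ID, rating, so we select those columns explicitly. The `Reader` class declares the rating scale, which for MovieLens runs from 0.5 to 5.0 stars.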

**Code:**
```
reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
```

We then split the data into training and testing sets using a random 80-20 split. Because no seed is set, the split differs between runs.

**Code:**
```
trainset, testset = train_test_split(data, test_size=0.2)
```
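
Because the split is random, the exact train and test sets (and therefore the RMSE) change from run to run unless a seed is fixed. Surprise's `train_test_split` accepts a `random_state` argument for this purpose. The plain-Python sketch below only illustrates the idea of a seeded 80/20 shuffle-and-cut; `split_ratings` is a hypothetical helper, not Surprise's implementation.

```python
import random

def split_ratings(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then hold out the last test_size share."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_size))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]

# Ten toy (userId, movieId, rating) triples
toy = [(u, m, 3.5) for u in range(1, 3) for m in range(1, 6)]
train, test = split_ratings(toy)
print(len(train), len(test))
```

Calling `split_ratings(toy)` again with the same seed returns the same two lists, which is the property that makes an evaluation repeatable.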

**Step 5: Build and Train the SVD Model**

We initialize the SVD model and train it using the training set.

**Code:**
```
model = SVD()
model.fit(trainset)
```
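
Surprise's `SVD` predicts a rating as a global mean plus user and item biases plus the dot product of two latent-factor vectors, and learns these values by stochastic gradient descent. The sketch below is a heavily simplified, pure-Python illustration of that idea on toy data; the function `train_mf`, its hyperparameter values, and the toy ratings are assumptions made for illustration, not the library's code or defaults.

```python
import random

def train_mf(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD; return a predict function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}                       # user biases
    b_i = {i: 0.0 for i in items}                       # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Unknown users or items fall back to the terms that are known
        est = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
        if u in p and i in q:
            est += sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return est

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient steps on the regularized squared error
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return predict

# Users 1 and 2 prefer movie "A"; user 3 prefers movie "B"
toy = [(1, "A", 5.0), (1, "B", 1.0), (2, "A", 4.5),
       (2, "B", 1.5), (3, "A", 1.0), (3, "B", 5.0)]
predict = train_mf(toy)
print(round(predict(1, "A"), 2), round(predict(3, "A"), 2))
```

The biases alone cannot explain why user 3 disagrees with users 1 and 2 about the same two movies; the latent factors capture that difference in taste.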

**Step 6: Evaluate the Model**

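To evaluate the model, we predict ratings for the test set and compute the Root Mean Squared Error (RMSE) between the predicted and true ratings. RMSE is a common accuracy metric for rating prediction; lower is better. In the run shown above it was about 0.786, meaning predictions were typically off by roughly 0.8 stars on the 0.5 to 5.0 scale.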

**Code:**
```
predictions = model.test(testset)
pred_ratings = [pred.est for pred in predictions]
true_ratings = [pred.r_ui for pred in predictions]
rmse = np.sqrt(mean_squared_error(true_ratings, pred_ratings))
print(f'RMSE: {rmse}')
```
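
For reference, RMSE is the square root of the mean of the squared differences between true and predicted ratings. A minimal plain-Python version is sketched below; the function name `rmse` and the toy numbers are illustrative only, and the notebook itself relies on scikit-learn for this step.

```python
from math import sqrt

def rmse(true_ratings, pred_ratings):
    """Root mean squared error between two equal-length sequences."""
    if len(true_ratings) != len(pred_ratings) or not true_ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(true_ratings, pred_ratings)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Errors of 0.5, 0.0 and -1.0 stars on three toy predictions
print(rmse([4.0, 3.0, 2.0], [3.5, 3.0, 3.0]))
```

Squaring penalizes large misses more than small ones, which is why a single badly wrong prediction moves RMSE more than several slightly wrong ones. Surprise also ships `accuracy.rmse(predictions)`, which computes the same quantity directly from the prediction list.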
**Step 7: Generate Movie Recommendations**

We define a function to generate the top N movie recommendations for a given user. The function identifies movies that the user has not rated yet, predicts ratings for these movies, and returns the top N movies with the highest predicted ratings.

**Code:**
```
def get_top_n_recommendations(user_id, model, movies, n=10):
    # Movies the user has already rated (a set gives fast membership tests)
    user_rated_movies = set(ratings.loc[ratings['userId'] == user_id, 'movieId'])
    unrated_movies = [m for m in movies['movieId'].tolist() if m not in user_rated_movies]

    # Predict a rating for every unrated movie and sort from highest to lowest
    predictions = [model.predict(user_id, movie_id) for movie_id in unrated_movies]
    predictions.sort(key=lambda x: x.est, reverse=True)

    # Look up titles one by one so the list keeps the ranking order
    # (filtering with isin() would return titles in DataFrame row order instead)
    titles = movies.set_index('movieId')['title']
    top_n_movie_titles = [titles.loc[pred.iid] for pred in predictions[:n]]

    return top_n_movie_titles
```

We can then use this function to get movie recommendations for a specific user.

**Code:**
```
user_id = 1
top_n_recommendations = get_top_n_recommendations(user_id, model, movies)
print(f'Top 10 movie recommendations for user {user_id}:')
for i, movie in enumerate(top_n_recommendations):
    print(f'{i+1}. {movie}')
```
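
Sorting every prediction just to keep ten of them can be more work than needed on a catalogue of this size. As a sketch of an alternative, `heapq.nlargest` keeps only the best `n` candidates and returns them in descending score order. The helper `top_n_titles`, the stand-in `predict_fn`, and the toy catalogue below are hypothetical; with the trained model one could pass something like `lambda m: model.predict(user_id, m).est`.

```python
import heapq

def top_n_titles(candidate_ids, predict_fn, titles, n=10):
    """Return the n titles with the highest predicted score, best first."""
    best_ids = heapq.nlargest(n, candidate_ids, key=predict_fn)
    return [titles[movie_id] for movie_id in best_ids]

# Toy stand-ins for the model's predictions and the movie catalogue
toy_scores = {1: 3.9, 2: 4.7, 3: 2.1, 4: 4.2}
toy_titles = {1: "Toy Story (1995)", 2: "Jumanji (1995)",
              3: "Grumpier Old Men (1995)", 4: "Waiting to Exhale (1995)"}

print(top_n_titles(toy_scores.keys(), toy_scores.get, toy_titles, n=2))
```

Because the titles are looked up one ID at a time, the returned list stays in ranking order.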

## Conclusion

This project demonstrates how to build a movie recommendation system using collaborative filtering with the SVD algorithm. The code covers the entire pipeline from data loading and preprocessing to model training, evaluation, and generating recommendations. The MovieLens dataset is used for this purpose, and the Surprise library provides an efficient way to implement and evaluate the recommendation algorithms.