## Knowledge Based Recommenders

In [5]:
```python
# Importing necessary libraries
import pandas as pd
import numpy as np

# Load the datasets
ratings_df = pd.read_csv('Dataset_Rating.csv')
movies_df = pd.read_csv('Dataset_Movie.csv')

# Inspect the datasets
print("Ratings DataFrame:")
print(ratings_df.head())
print("\nMovies DataFrame:")
print(movies_df.head())
```



```
Ratings DataFrame:
   User_ID  Rating  Movie_ID
0   712664       5         3
1  1331154       4         3
2  2632461       3         3
3    44937       5         3
4   656399       4         3

Movies DataFrame:
   Movie_ID  Year                          Name
0         1  2003               Dinosaur Planet
1         2  2004    Isle of Man TT 2004 Review
2         3  1997                     Character
3         4  1994  Paula Abdul's Get Up & Dance
4         5  2004      The Rise and Fall of ECW
```


### 1. Merging Ratings with Movie Information:

```python
# Merge the ratings with movie information to get full details
merged_df = pd.merge(ratings_df, movies_df, on="Movie_ID")
```

- **`ratings_df`**: This DataFrame contains user ratings for movies. It includes columns like `User_ID`, `Rating`, and `Movie_ID`.
- **`movies_df`**: This DataFrame contains information about the movies, including `Movie_ID`, `Year`, and `Name`.
- **`pd.merge(ratings_df, movies_df, on="Movie_ID")`**: This merges the two DataFrames on the `Movie_ID` column. After merging, the resulting DataFrame `merged_df` will contain:
  - `User_ID`, `Rating`, and `Movie_ID` from the ratings dataset.
  - `Year` and `Name` from the movie dataset.
  
Each row of `merged_df` pairs one rating with the year and name of the rated movie. Note that the recommendation functions below query `ratings_df` and `movies_df` directly and never read `merged_df`.
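To see what the merge keeps and drops, here is a minimal sketch on made-up data. The frames `toy_ratings` and `toy_movies` and all of their values are illustrative assumptions, not rows from the real CSV files. Because `pd.merge` defaults to `how="inner"`, only `Movie_ID` values present in both frames survive.

```python
import pandas as pd

# Toy stand-ins for ratings_df and movies_df (values are invented)
toy_ratings = pd.DataFrame({
    "User_ID": [1, 2, 1],
    "Rating": [5, 4, 3],
    "Movie_ID": [3, 3, 99],   # Movie_ID 99 has no entry in toy_movies
})
toy_movies = pd.DataFrame({
    "Movie_ID": [1, 3],       # Movie_ID 1 has no ratings
    "Year": [2003, 1997],
    "Name": ["Movie A", "Movie B"],
})

# Default inner join: unmatched Movie_IDs (1 and 99) are dropped
toy_merged = pd.merge(toy_ratings, toy_movies, on="Movie_ID")

print(toy_merged.columns.tolist())  # ['User_ID', 'Rating', 'Movie_ID', 'Year', 'Name']
print(len(toy_merged))              # 2
```

If unrated movies or ratings for unknown movies should be kept, `how="left"`, `how="right"`, or `how="outer"` would be the options to consider.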

---

### 2. Recommendation Based on the Same Year:

#### Function Definition:

```python
def recommend_by_year(user_id, n_recommendations=5):
```

This function takes in two parameters:
- `user_id`: The ID of the user for whom the recommendations are being made.
- `n_recommendations`: The number of movie recommendations to return. It defaults to 5.

#### Get the Movies Rated by the User:

```python
# Get the movies rated by the user
user_ratings = ratings_df[ratings_df['User_ID'] == user_id]
```

- **`ratings_df[ratings_df['User_ID'] == user_id]`**: This filters the `ratings_df` DataFrame to get only the rows where the `User_ID` matches the provided `user_id`.
- The result is a DataFrame `user_ratings` that contains all the movies that the specific user has rated.

#### Get the Movie IDs Rated by the User:

```python
# Get the movie IDs rated by the user
rated_movie_ids = user_ratings['Movie_ID'].tolist()
```

- **`user_ratings['Movie_ID']`**: This selects the `Movie_ID` column from the `user_ratings` DataFrame.
- **`.tolist()`**: Converts the `Movie_ID` column into a list of movie IDs that the user has rated.
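The two steps above can be sketched on a small made-up table. The name `toy_ratings`, the user ID `10`, and all values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical miniature of ratings_df (values invented for illustration)
toy_ratings = pd.DataFrame({
    "User_ID": [10, 20, 10, 30],
    "Rating": [5, 4, 3, 5],
    "Movie_ID": [3, 3, 8, 1],
})

user_id = 10

# Boolean mask: True on the rows that belong to user 10
mask = toy_ratings["User_ID"] == user_id
user_ratings = toy_ratings[mask]

# Plain Python list of the movies this user has rated
rated_movie_ids = user_ratings["Movie_ID"].tolist()
print(rated_movie_ids)  # [3, 8]
```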

#### Get the Movie Details for the Rated Movies:

```python
# Get the movie details for the rated movies
rated_movies = movies_df[movies_df['Movie_ID'].isin(rated_movie_ids)]
```

- **`movies_df[movies_df['Movie_ID'].isin(rated_movie_ids)]`**: This filters the `movies_df` DataFrame to get only the rows where the `Movie_ID` is in the list of `rated_movie_ids`. The resulting `rated_movies` DataFrame contains the detailed movie information (e.g., name, year) for all the movies rated by the user.

#### Get the Unique Years of Rated Movies:

```python
# Get the years of the movies rated by the user
rated_movie_years = rated_movies['Year'].unique()
```

- **`rated_movies['Year']`**: This selects the `Year` column from the `rated_movies` DataFrame.
- **`.unique()`**: This returns an array of unique years from the movies the user has rated. This will help us identify the years of interest for recommending other movies from the same years.
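As a small sketch of the last two steps, assume a made-up `toy_movies` table and a hypothetical list of rated IDs. `isin()` keeps the rows in table order, and `unique()` returns each year once, in order of first appearance.

```python
import pandas as pd

# Hypothetical miniature of movies_df (values invented for illustration)
toy_movies = pd.DataFrame({
    "Movie_ID": [1, 2, 3, 4, 5],
    "Year": [2003, 2004, 1997, 1994, 2004],
    "Name": ["A", "B", "C", "D", "E"],
})
rated_movie_ids = [2, 5, 3]  # assumed output of the previous step

# Rows for Movie_ID 2, 3 and 5, in the order they appear in toy_movies
rated_movies = toy_movies[toy_movies["Movie_ID"].isin(rated_movie_ids)]

# 2004 occurs twice among the rated movies but is reported once
rated_movie_years = rated_movies["Year"].unique()
print(rated_movie_years)  # [2004 1997]
```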

#### Filter the Movie Dataset for Movies from the Same Year:

```python
# Filter the movie dataset to get movies from the same year
recommended_movies = movies_df[movies_df['Year'].isin(rated_movie_years)]
```

- **`movies_df[movies_df['Year'].isin(rated_movie_years)]`**: This filters the `movies_df` DataFrame to get only the movies that were released in the years present in `rated_movie_years`. The result is the `recommended_movies` DataFrame, which contains all movies from the same years as the movies rated by the user.

#### Remove Movies the User Has Already Rated:

```python
# Remove movies the user has already rated
recommended_movies = recommended_movies[~recommended_movies['Movie_ID'].isin(rated_movie_ids)]
```

- **`~recommended_movies['Movie_ID'].isin(rated_movie_ids)`**: The `~` operator negates the boolean mask returned by `isin()`, so only rows whose `Movie_ID` is **not** in `rated_movie_ids` are kept. The user is therefore never recommended a movie they have already rated.
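The same-year filter and the exclusion step can be traced on made-up data. All names and values below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical miniature of movies_df (values invented for illustration)
toy_movies = pd.DataFrame({
    "Movie_ID": [1, 2, 3, 4, 5],
    "Year": [2003, 2004, 1997, 1994, 2004],
    "Name": ["A", "B", "C", "D", "E"],
})
rated_movie_ids = [2, 3]          # assumed: the user rated movies 2 and 3
rated_movie_years = [2004, 1997]  # the years of those two movies

# Movies released in 2004 or 1997: Movie_ID 2, 3 and 5
same_year = toy_movies[toy_movies["Year"].isin(rated_movie_years)]

# True where the movie was already rated; ~ flips the mask
already_rated = same_year["Movie_ID"].isin(rated_movie_ids)
recommended = same_year[~already_rated]

print(recommended["Movie_ID"].tolist())  # [5]
```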

#### Return the Top `n` Recommendations:

```python
# Return the top n recommendations
return recommended_movies[['Name', 'Year']].head(n_recommendations)
```

- **`recommended_movies[['Name', 'Year']]`**: This selects only the `Name` and `Year` columns, since the recommendations display just the movie names and their release years.
- **`.head(n_recommendations)`**: This returns the first `n_recommendations` rows (5 by default). No ranking is applied, so these are simply the first matching rows in the order of `movies_df`.
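A short sketch of what `.head()` does to an already filtered table, using a made-up `candidates` frame. Filtering keeps the original row labels, so the labels of the result can have gaps wherever a row was removed.

```python
import pandas as pd

# Hypothetical filtered table: row label 2 is absent, as if that movie
# had been removed because the user already rated it
candidates = pd.DataFrame(
    {
        "Name": ["A", "B", "D", "E", "F", "G"],
        "Year": [2003, 2004, 1994, 2004, 1997, 2003],
    },
    index=[0, 1, 3, 4, 5, 6],
)

# First five rows in existing order; nothing is sorted or scored
top = candidates[["Name", "Year"]].head(5)
print(top.index.tolist())  # [0, 1, 3, 4, 5]
```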

---


In [6]:
```python
# Merge the ratings with movie information to get full details
merged_df = pd.merge(ratings_df, movies_df, on="Movie_ID")

# 1. Recommendation based on the same year
def recommend_by_year(user_id, n_recommendations=5):
    # Get the movies rated by the user
    user_ratings = ratings_df[ratings_df['User_ID'] == user_id]

    # Get the movie IDs rated by the user
    rated_movie_ids = user_ratings['Movie_ID'].tolist()

    # Get the movie details for the rated movies
    rated_movies = movies_df[movies_df['Movie_ID'].isin(rated_movie_ids)]

    # Get the years of the movies rated by the user
    rated_movie_years = rated_movies['Year'].unique()

    # Filter the movie dataset to get movies from the same year
    recommended_movies = movies_df[movies_df['Year'].isin(rated_movie_years)]

    # Remove movies the user has already rated
    recommended_movies = recommended_movies[~recommended_movies['Movie_ID'].isin(rated_movie_ids)]

    # Return the top n recommendations
    return recommended_movies[['Name', 'Year']].head(n_recommendations)

# Example: Recommend movies for user 712664 based on the same year
user_recommendations_year = recommend_by_year(712664)
print("\nMovies recommended for user 712664 based on the same year:")
print(user_recommendations_year)
```



```
Movies recommended for user 712664 based on the same year:
                           Name  Year
0               Dinosaur Planet  2003
1    Isle of Man TT 2004 Review  2004
3  Paula Abdul's Get Up & Dance  1994
4      The Rise and Fall of ECW  2004
5                          Sick  1997
```



### 3. Recommendation Based on Similar Ratings:

#### Get the Movies Rated by the User:

```python
# Get the movies rated by the user
user_ratings = ratings_df[ratings_df['User_ID'] == user_id]
```

- **`ratings_df[ratings_df['User_ID'] == user_id]`**: This filters the `ratings_df` DataFrame to select only the rows where the `User_ID` matches the provided `user_id`. The result, `user_ratings`, is a DataFrame containing all the movies that the specific user has rated, along with their ratings.

#### Get the Movie IDs Rated by the User:

```python
# Get the movie IDs rated by the user
rated_movie_ids = user_ratings['Movie_ID'].tolist()
```

- **`user_ratings['Movie_ID']`**: This selects the `Movie_ID` column from the `user_ratings` DataFrame, which contains the IDs of the movies rated by the user.
- **`.tolist()`**: This converts the `Movie_ID` column into a list of movie IDs that the user has rated. This list is stored in `rated_movie_ids`.

#### Get the Movie Details for the Rated Movies:

```python
# Get the movie details for the rated movies
rated_movies = movies_df[movies_df['Movie_ID'].isin(rated_movie_ids)]
```

- **`movies_df[movies_df['Movie_ID'].isin(rated_movie_ids)]`**: This filters the `movies_df` DataFrame to get only the movies whose `Movie_ID` is in the `rated_movie_ids` list. The result is a DataFrame `rated_movies` that contains the details (e.g., `Name`, `Year`) of the movies the user has rated.

#### Get the Ratings of the Movies Rated by the User:

```python
# Get the ratings of the movies rated by the user
rated_movie_ratings = rated_movies.merge(user_ratings, on="Movie_ID", how="left")
```

- **`rated_movies.merge(user_ratings, on="Movie_ID", how="left")`**: This merges the `rated_movies` DataFrame with the `user_ratings` DataFrame on the `Movie_ID` column. The `how="left"` ensures that all movies from `rated_movies` are retained, and corresponding ratings from `user_ratings` are included.
- The result, `rated_movie_ratings`, contains detailed information about the movies the user rated along with the user’s rating for each movie.
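A minimal sketch of this left merge on made-up data shows which columns end up in the result. The frames and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical details of two movies the user has rated
rated_movies = pd.DataFrame({
    "Movie_ID": [3, 8],
    "Year": [1997, 1996],
    "Name": ["Movie A", "Movie B"],
})
# Hypothetical ratings by that user (User_ID 10 is invented)
user_ratings = pd.DataFrame({
    "User_ID": [10, 10],
    "Rating": [5, 3],
    "Movie_ID": [3, 8],
})

# Left merge: every row of rated_movies is kept, rating columns are appended
rated_movie_ratings = rated_movies.merge(user_ratings, on="Movie_ID", how="left")

print(rated_movie_ratings.columns.tolist())
# ['Movie_ID', 'Year', 'Name', 'User_ID', 'Rating']
print(rated_movie_ratings["Rating"].tolist())  # [5, 3]
```

Since the next step only reads the `Rating` column, `user_ratings['Rating']` would carry the same values here; the merge matters only if the movie details are needed alongside them.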

#### Find Movies with Similar Ratings:

```python
# Find movies with a similar rating
similar_ratings = ratings_df[ratings_df['Rating'].isin(rated_movie_ratings['Rating'])]
```

- **`ratings_df[ratings_df['Rating'].isin(rated_movie_ratings['Rating'])]`**: This keeps every row of `ratings_df`, from any user, whose `Rating` equals one of the rating values the target user has given. Here "similar" means only that the rating value matches; the filter does not compare movies or users.
- The result, `similar_ratings`, holds all rating rows with a matching value. If the user has given several different rating values, this can cover most of `ratings_df`.
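To illustrate how broad this filter is, here is a sketch on made-up data. The table `toy_ratings` and the series `user_rating_values` are assumptions; the latter stands in for `rated_movie_ratings['Rating']`.

```python
import pandas as pd

# Hypothetical ratings from several users (values invented for illustration)
toy_ratings = pd.DataFrame({
    "User_ID": [10, 20, 30, 30, 40],
    "Rating": [5, 5, 4, 2, 1],
    "Movie_ID": [3, 7, 7, 9, 11],
})
# Assumed: the target user has only ever given the values 5 and 4
user_rating_values = pd.Series([5, 4])

# Keep every rating row, by any user, whose value is 5 or 4
similar_ratings = toy_ratings[toy_ratings["Rating"].isin(user_rating_values)]

print(sorted(similar_ratings["Movie_ID"].unique().tolist()))  # [3, 7]
```

Movies 9 and 11 drop out only because nobody gave them a 4 or a 5. A movie qualifies as soon as a single user has rated it with one of the matching values.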

#### Get the Movie Names of the Similar Ratings:

```python
# Merge to get the movie names of the similar ratings
recommended_movies = movies_df[movies_df['Movie_ID'].isin(similar_ratings['Movie_ID'])]
```

- **`movies_df[movies_df['Movie_ID'].isin(similar_ratings['Movie_ID'])]`**: This filters `movies_df` down to the movies whose `Movie_ID` appears in `similar_ratings`, that is, movies that at least one user has rated with one of the target user's rating values. The result, `recommended_movies`, contains their names and details.

#### Remove Movies the User Has Already Rated:

```python
# Remove movies the user has already rated
recommended_movies = recommended_movies[~recommended_movies['Movie_ID'].isin(rated_movie_ids)]
```

- **`~recommended_movies['Movie_ID'].isin(rated_movie_ids)`**: The `~` operator negates the boolean mask returned by `isin()`, which filters out movies the user has already rated.
- The result is a DataFrame of movies the user has not yet rated but that have received at least one rating equal to a value the user has given.

#### Return the Top `n` Recommendations:

```python
# Return the top n recommendations
return recommended_movies[['Name', 'Year']].head(n_recommendations)
```

- **`recommended_movies[['Name', 'Year']]`**: This selects only the `Name` and `Year` columns for display.
- **`.head(n_recommendations)`**: This returns the first `n_recommendations` rows (5 by default) in the order of `movies_df`; the rows are not ranked.

---

In [8]:

```python
# 2. Recommendation based on similar ratings
def recommend_by_rating(user_id, n_recommendations=5):
    # Get the movies rated by the user
    user_ratings = ratings_df[ratings_df['User_ID'] == user_id]

    # Get the movie IDs rated by the user
    rated_movie_ids = user_ratings['Movie_ID'].tolist()

    # Get the movie details for the rated movies
    rated_movies = movies_df[movies_df['Movie_ID'].isin(rated_movie_ids)]

    # Get the ratings of the movies rated by the user
    rated_movie_ratings = rated_movies.merge(user_ratings, on="Movie_ID", how="left")

    # Find movies with a similar rating
    similar_ratings = ratings_df[ratings_df['Rating'].isin(rated_movie_ratings['Rating'])]

    # Merge to get the movie names of the similar ratings
    recommended_movies = movies_df[movies_df['Movie_ID'].isin(similar_ratings['Movie_ID'])]

    # Remove movies the user has already rated
    recommended_movies = recommended_movies[~recommended_movies['Movie_ID'].isin(rated_movie_ids)]

    # Return the top n recommendations
    return recommended_movies[['Name', 'Year']].head(n_recommendations)

# Example: Recommend movies for user 712664 based on similar ratings
user_recommendations_rating = recommend_by_rating(712664)
print("\nMovies recommended for user 712664 based on similar ratings:")
print(user_recommendations_rating)

# 3. Recommendation based on movies that are not rated by the user
def recommend_unrated_movies(user_id, n_recommendations=5):
    # Get the movies rated by the user
    user_ratings = ratings_df[ratings_df['User_ID'] == user_id]

    # Get the movie IDs rated by the user
    rated_movie_ids = user_ratings['Movie_ID'].tolist()

    # Get the movie dataset to recommend movies that have not been rated by the user
    unrated_movies = movies_df[~movies_df['Movie_ID'].isin(rated_movie_ids)]

    # Return the top n recommendations
    return unrated_movies[['Name', 'Year']].head(n_recommendations)
```



```
Movies recommended for user 712664 based on similar ratings:
                          Name  Year
7   What the #$*! Do We Know!?  2004
15                   Screamers  1996
16                   7 Seconds  2005
27             Lilo and Stitch  2002
29      Something's Gotta Give  2003
```


In [9]:

```python
# Example: Recommend unrated movies for user 712664
user_recommendations_unrated = recommend_unrated_movies(712664)
print("\nMovies recommended for user 712664 that have not been rated yet:")
print(user_recommendations_unrated)

# 4. Recommendation based on the most popular movies (i.e., movies with the highest average rating)
def recommend_popular_movies(n_recommendations=5):
    # Compute the average rating of each movie
    movie_avg_ratings = ratings_df.groupby('Movie_ID')['Rating'].mean()

    # Sort by average rating and keep the n highest-rated Movie_IDs
    popular_movies = movie_avg_ratings.sort_values(ascending=False).head(n_recommendations)

    # Look up names and years in movies_df (rows come back in movies_df order, not rating order)
    recommended_movies = movies_df[movies_df['Movie_ID'].isin(popular_movies.index)]

    # Return the top n popular recommendations
    return recommended_movies[['Name', 'Year']]

# Example: Recommend popular movies
popular_movies_recommendations = recommend_popular_movies()
print("\nMost popular movies:")
print(popular_movies_recommendations)
```



```
Movies recommended for user 712664 that have not been rated yet:
                           Name  Year
0               Dinosaur Planet  2003
1    Isle of Man TT 2004 Review  2004
3  Paula Abdul's Get Up & Dance  1994
4      The Rise and Fall of ECW  2004
5                          Sick  1997

Most popular movies:
                                       Name  Year
1475               Six Feet Under: Season 4  2004
2101                 The Simpsons: Season 6  1994
3443  Family Guy: Freakin' Sweet Collection  2004
3455                         Lost: Season 1  2004
4237                              Inu-Yasha  2000
```
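
The "Most popular movies" rows are listed in `movies_df` order because `recommend_popular_movies` looks the titles up with an `isin()` filter, which discards the sorted order of the averages. If the list should instead run from highest to lowest average rating, one possible variant is sketched below on made-up data. The function name `popular_in_rating_order` and the column name `Avg_Rating` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for ratings_df and movies_df (values are invented)
toy_ratings = pd.DataFrame({
    "Movie_ID": [1, 1, 2, 3, 3],
    "Rating":   [3, 4, 5, 4, 5],
})
toy_movies = pd.DataFrame({
    "Movie_ID": [1, 2, 3],
    "Year": [2003, 2004, 1997],
    "Name": ["A", "B", "C"],
})

def popular_in_rating_order(ratings, movies, n_recommendations=5):
    # Average rating per movie, highest first, limited to n rows
    avg = (
        ratings.groupby("Movie_ID")["Rating"].mean()
        .sort_values(ascending=False)
        .head(n_recommendations)
        .rename("Avg_Rating")
        .reset_index()
    )
    # A left merge keeps the row order of `avg`, i.e. rating order
    return avg.merge(movies, on="Movie_ID", how="left")[["Name", "Year", "Avg_Rating"]]

result = popular_in_rating_order(toy_ratings, toy_movies, n_recommendations=2)
print(result["Name"].tolist())        # ['B', 'C']
print(result["Avg_Rating"].tolist())  # [5.0, 4.5]
```

An average over very few ratings can be misleading, so a minimum number of ratings per movie may be worth adding before sorting; that threshold would be a further assumption.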
