# __User-Based Collaborative Filtering__

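Let's build a user-based collaborative filtering recommender: it finds users whose ratings resemble those of a given user and recommends items that those similar users rated highly.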



## Step 1: Import Required Libraries

- Import pandas, NumPy, scipy.stats, seaborn, and cosine_similarity
- Import Matplotlib for visualization


In [None]:
import pandas as pd
import numpy as np
import scipy.stats
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

## Step 2: Load and Preprocess the Data

- Read the CSV files containing the anime and rating data
- Filter the ratings data
- Check the number of ratings, unique users, and unique animes


- We are using two datasets here:
  1. anime
  2. rating



In [None]:
animes = pd.read_csv('anime.csv')

In [None]:
ratings = pd.read_csv('rating.csv')

- In the rating data, a rating of -1 marks a row where the user did not give a rating.
- Let's exclude those rows.

In [None]:
ratings = ratings[ratings.rating != -1]
ratings.head()

__Observations:__
- Here, you can see a few rows from the rating data.
- The fields are user_id, anime_id, and rating.

In [None]:
animes.head()

__Observations:__
- Here, you can see the first five rows of the anime data.
- The fields are anime_id, name, genre, type, episodes, rating, and members.

## Step 3: Data Exploration

- Calculate and visualize the average number of anime rated per user
- Calculate and visualize the average number of ratings received per anime
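
As a small illustration of the two counts described above, here is a minimal sketch on a toy table. The `toy_` names and all values are made up for illustration and are not taken from rating.csv.

```python
import pandas as pd

# Toy ratings table (hypothetical values, not from rating.csv)
toy_ratings = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2, 3],
    'anime_id': [10, 20, 30, 10, 30, 20],
    'rating':   [8, 7, 9, 6, 10, 5],
})

# Number of ratings each user has given
toy_per_user = toy_ratings.groupby('user_id')['rating'].count()

# Number of ratings each anime has received
toy_per_anime = toy_ratings.groupby('anime_id')['rating'].count()

# Average of each count
toy_mean_per_user = toy_per_user.mean()
toy_mean_per_anime = toy_per_anime.mean()

print(toy_per_user)
print(toy_per_anime)
```

The same `groupby(...).count()` pattern is applied to the real ratings data in the cells that follow.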


In [None]:
len(ratings)

__Observation:__
- After filtering, the rating data contains 509577 rows.

In [None]:
len(ratings['user_id'].unique())

__Observation:__
- There are 5726 unique users.

In [None]:
len(animes['anime_id'].unique())

__Observation:__
- There are 12294 unique anime IDs.

Next, import statistics and count the ratings per user.

In [None]:
import statistics
ratings_per_user = ratings.groupby('user_id')['rating'].count()
statistics.mean(ratings_per_user.tolist())

__Observation:__
- On average, there are about 89 ratings per user (509577 ratings across 5726 users).

Create a per-user rating histogram

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
ratings_per_user.hist(bins=20, range=(0,500))

__Observations:__
- The histogram shows the number of ratings per user, using 20 bins over the range 0 to 500.
- The first bin is the tallest, so the largest group of users falls in the lowest range (0 to 25 ratings).



Now, let's check the number of ratings per anime.

In [None]:
ratings_per_anime = ratings.groupby('anime_id')['rating'].count()
statistics.mean(ratings_per_anime.tolist())

__Observation:__
- The average number of ratings per anime is 72.075.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
ratings_per_anime.hist(bins=20, range=(0,2500))

__Observation:__
- The histogram shows the number of ratings per anime, using 20 bins over the range 0 to 2500.

Create a DataFrame of ratings per anime and filter it
  - Keep anime with at least 1000 ratings to find the popular anime

In [None]:
ratings_per_anime_df = pd.DataFrame(ratings_per_anime)

filtered_ratings_per_anime_df = ratings_per_anime_df[ratings_per_anime_df.rating >= 1000]

popular_anime = filtered_ratings_per_anime_df.index.tolist()

Create a DataFrame of ratings per user
  - Keep users with at least 500 ratings and convert the index to a list of prolific users

In [None]:
ratings_per_user_df = pd.DataFrame(ratings_per_user)

filtered_ratings_per_user_df = ratings_per_user_df[ratings_per_user_df.rating >= 500]

prolific_users = filtered_ratings_per_user_df.index.tolist()

## Step 4: Filtering the Data

- Filter the data based on the number of ratings per user and per anime
- Create a pivot table from the filtered data
- Fill NaN values with 0 in the rating matrix
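
To make the pivot step concrete, here is a minimal sketch on a toy table. The `toy_` names and values are hypothetical; only `pivot_table` and `fillna` mirror the calls used in this step.

```python
import pandas as pd

# Toy long-format ratings: one row per (user, anime) pair (hypothetical values)
toy_ratings = pd.DataFrame({
    'user_id':  [1, 1, 2, 3, 3],
    'anime_id': [10, 20, 10, 20, 30],
    'rating':   [8, 7, 6, 5, 9],
})

# Rows become users, columns become anime, cells hold the rating
toy_matrix = toy_ratings.pivot_table(index='user_id', columns='anime_id', values='rating')

# Pairs with no rating are NaN after the pivot; replace them with 0
toy_matrix = toy_matrix.fillna(0)

print(toy_matrix)
```

Note that after `fillna(0)`, a 0 in the matrix means "not rated" rather than a rating of zero.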

In [None]:
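# Keep only ratings for popular anime
filtered_ratings = ratings[ratings.anime_id.isin(popular_anime)]
# From those, keep only ratings made by prolific users
filtered_ratings = filtered_ratings[filtered_ratings.user_id.isin(prolific_users)]
len(filtered_ratings)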

In [None]:
rating_matrix = filtered_ratings.pivot_table(index='user_id', columns='anime_id', values='rating')
rating_matrix = rating_matrix.fillna(0)
rating_matrix.head()

__Observation:__
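- Each row of the matrix is a user ID and each column is an anime ID; a cell holds the user's rating for that anime, with 0 where the user has not rated it.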

## Step 5: Define the similar_users Function

- Define a function to find similar_users based on the cosine similarity of their rating vectors
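
As a reminder of what cosine similarity measures, here is a minimal sketch that computes it by hand with NumPy on toy rating vectors. The helper `cosine_by_hand` and the vectors are hypothetical and only illustrate the formula: the dot product of two vectors divided by the product of their lengths.

```python
import numpy as np

def cosine_by_hand(u, v):
    # Dot product divided by the product of the vector norms
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy rating vectors over three anime (hypothetical values)
a = np.array([5.0, 0.0, 3.0])
b = np.array([10.0, 0.0, 6.0])  # same direction as a, only scaled
c = np.array([0.0, 4.0, 0.0])   # rated only an anime that a did not rate

sim_ab = cosine_by_hand(a, b)
sim_ac = cosine_by_hand(a, c)

print(sim_ab, sim_ac)
```

Vectors pointing in the same direction score close to 1, and vectors with no rated anime in common score 0, which is why the measure is used here to rank users by how alike their ratings are.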

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

def similar_users(user_id, matrix, k=3):
    # Rating vector of the target user (kept as a one-row DataFrame)
    user = matrix[matrix.index == user_id]

    # Rating vectors of every other user
    other_users = matrix[matrix.index != user_id]

    # Cosine similarity between the target user and each other user
    similarities = cosine_similarity(user, other_users)[0].tolist()

    # Pair each user ID with its similarity score
    index_similarity = dict(zip(other_users.index.tolist(), similarities))

    # Sort by similarity, highest first, and keep the top k user IDs
    index_similarity_sorted = sorted(index_similarity.items(), key=lambda pair: pair[1], reverse=True)
    users = [uid for uid, _ in index_similarity_sorted[:k]]

    return users

## Step 6: Test the similar_users Function

- Test the `similar_users` function with a sample user ID


In [None]:
current_user = 226
similar_user_indices = similar_users(current_user, rating_matrix)
print(similar_user_indices)

__Observation:__
- The output lists the IDs of the three users whose ratings are most similar to those of user 226.

## Step 7: Define the recommend_item Function

- Define a function to recommend items (animes) for a user based on the average ratings of similar users
- Test the `recommend_item` function with a sample user ID and the similar user indices obtained in Step 6
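
The idea behind the recommendation step can be sketched on a toy rating matrix: average the similar users' ratings per anime, then rank only the anime the target user has not rated. All names and values below are hypothetical and chosen for illustration.

```python
import pandas as pd

# Toy rating matrix: rows are users, columns are anime, 0 means "not rated"
toy_matrix = pd.DataFrame(
    [[9, 0, 0, 7],
     [8, 6, 0, 0],
     [10, 8, 4, 0]],
    index=pd.Index([1, 2, 3], name='user_id'),
    columns=pd.Index([10, 20, 30, 40], name='anime_id'),
)

target_user = 1
neighbours = [2, 3]  # assumed to be the users most similar to user 1

# Mean rating per anime across the similar users
mean_scores = toy_matrix.loc[neighbours].mean(axis=0)

# Anime the target user has not rated yet
unseen_mask = toy_matrix.loc[target_user] == 0

# Rank only the unseen anime by the neighbours' mean rating
toy_ranked = mean_scores[unseen_mask].sort_values(ascending=False)

print(toy_ranked)
```

In this toy case, anime 10 has the highest mean but is left out because user 1 already rated it. Also note that the zeros standing in for missing ratings pull the mean down: anime 30 scores 2.0 because one neighbour has not rated it.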


In [None]:
def recommend_item(user_index, similar_user_indices, matrix, items=5):
    # Ratings given by the similar users only
    neighbour_ratings = matrix[matrix.index.isin(similar_user_indices)]

    # Mean rating per anime across the similar users
    similar_users_df = neighbour_ratings.mean(axis=0).to_frame(name='mean')

    # The target user's ratings, one row per anime
    user_df_transposed = matrix[matrix.index == user_index].transpose()
    user_df_transposed.columns = ['rating']

    # Anime the target user has not rated (0 after fillna)
    animes_unseen = user_df_transposed[user_df_transposed['rating'] == 0].index.tolist()

    # Keep only the unseen anime, then rank them by mean rating
    similar_users_df_filtered = similar_users_df[similar_users_df.index.isin(animes_unseen)]
    similar_users_df_ordered = similar_users_df_filtered.sort_values(by=['mean'], ascending=False)

    # Look up the details of the top-ranked unseen anime
    top_n_anime_indices = similar_users_df_ordered.head(items).index.tolist()
    anime_information = animes[animes['anime_id'].isin(top_n_anime_indices)]

    return anime_information

recommend_item(226, similar_user_indices, rating_matrix)

__Observation:__

The table above shows the details of the anime recommended for user 226, based on the mean ratings of the similar users.