# **Recommendation System**

## **Data Description:**
* ### Unique ID of each anime.

* ### Anime title.

* ### Anime broadcast type, such as TV, OVA, etc.

* ### Anime genre.

* ### The number of episodes of each anime.

* ### The average rating of each anime, taken over the users who rated it.

* ### Number of community members for each anime.

## **Objective:**

* ### The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset.

## **Dataset:**
* ### Use the Anime Dataset, which contains information about various anime, including their titles, genres, number of episodes, and user ratings.

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer, StandardScaler
from sklearn.metrics import precision_score, recall_score, f1_score

## **Tasks:**

## **Data Preprocessing:**

## Load the dataset into a suitable data structure (e.g., pandas DataFrame).

In [None]:
# Load the dataset
df = pd.read_csv('/content/anime.csv')

In [None]:
# Display the first few rows of the dataset
print(df.head())

   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  


## Handle missing values, if any.

In [None]:
# Handle missing values
df.fillna({'rating': df['rating'].mean(), 'genre': ''}, inplace=True)
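
Before settling on a fill strategy, it can help to count what is actually missing in each column. The `info()` output later in this notebook reports 12,269 non-null values for `type` out of 12,294 rows, so that column still has gaps after the fill above. The sketch below uses a small made-up frame (the rows, the median fill, and the `'Unknown'` label are illustrative assumptions, not values from `anime.csv`) to show per-column counts and a per-column fill.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the anime data; all values are made up for illustration.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'genre': ['Action, Comedy', None, 'Drama', 'Sci-Fi'],
    'type': ['TV', 'Movie', None, 'OVA'],
    'rating': [8.1, np.nan, 7.4, 6.9],
})

# Count missing values per column before deciding how to fill them
missing_before = toy.isna().sum()
print(missing_before)

# Fill each column with a strategy suited to its type
toy = toy.fillna({
    'rating': toy['rating'].median(),  # assumption: median instead of mean
    'genre': '',                       # no genre information
    'type': 'Unknown',                 # assumption: explicit placeholder category
})

# Total number of missing cells left after filling
missing_after = int(toy.isna().sum().sum())
print(missing_after)
```

Whether the mean, the median, or dropping the rows is preferable depends on how the ratings are distributed, which this sketch does not settle.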

## Explore the dataset to understand its structure and attributes.

In [None]:
# Explore the dataset
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12294 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12294 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB
None


## **Feature Extraction:**

## Convert categorical features into numerical representations if necessary.

In [None]:
# Convert genres into numerical representations
mlb = MultiLabelBinarizer()
# Split on ', '; empty or missing genres become an empty list rather than ['']
df['genre'] = df['genre'].apply(lambda x: x.split(', ') if isinstance(x, str) and x else [])
genres = mlb.fit_transform(df['genre'])
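
To make the encoding step concrete, here is a minimal sketch on made-up genre strings (the strings and the names `raw_genres`, `toy_mlb`, and `toy_matrix` are illustrative, not part of the notebook). It shows how `MultiLabelBinarizer` turns lists of genres into one 0/1 column per genre, and that an entry with no genres becomes a row of zeros.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up genre strings in the same comma-separated style as the dataset
raw_genres = ['Action, Comedy', 'Drama', '', 'Action, Drama']

# Split into lists; an empty string becomes an empty list rather than ['']
genre_lists = [g.split(', ') if g else [] for g in raw_genres]

toy_mlb = MultiLabelBinarizer()
toy_matrix = toy_mlb.fit_transform(genre_lists)

print(list(toy_mlb.classes_))  # column order of the indicator matrix
print(toy_matrix)              # one row per entry, one column per genre
```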

## Decide on the features that will be used for computing similarity (e.g., genres, user ratings).

In [None]:
# Combine numerical features
features = pd.concat([pd.DataFrame(genres, columns=mlb.classes_), df[['rating', 'members']]], axis=1)

## Normalize numerical features if required.

In [None]:
# Normalize numerical features
scaler = StandardScaler()
features[['rating', 'members']] = scaler.fit_transform(features[['rating', 'members']])
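
As a small illustration of what the scaling step does, the sketch below standardizes a made-up column of member counts and reproduces the result by hand. The numbers are invented for the example. `StandardScaler` subtracts the column mean and divides by the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up member counts with one very popular title
members = np.array([[100.0], [2_000.0], [50_000.0], [800_000.0]])

scaled = StandardScaler().fit_transform(members)

# The same transformation by hand: (x - mean) / std, with std computed using ddof=0
manual = (members - members.mean(axis=0)) / members.std(axis=0)

print(scaled.ravel())
```

Standardized values are not bounded to [0, 1], so an extreme member count can carry more weight in the similarity than any single genre indicator.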

## **Recommendation System:**

## Design a function to recommend anime based on cosine similarity.

In [None]:
# Compute cosine similarity
cosine_sim = cosine_similarity(features)
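
For reference, cosine similarity between two feature vectors is their dot product divided by the product of their lengths. The sketch below computes it by hand for two made-up rows and compares it with `cosine_similarity`. The vectors `a` and `b` are illustrative assumptions, not rows from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up feature rows: three genre indicators followed by two scaled numeric features
a = np.array([1.0, 0.0, 1.0, 0.5, -0.2])
b = np.array([1.0, 1.0, 0.0, 0.4, 0.1])

# Cosine similarity by hand: dot product divided by the product of the vector norms
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The same value from scikit-learn, which expects 2-D input and returns a matrix
from_sklearn = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]

print(manual, from_sklearn)
```

For the 12,294 rows reported by `df.info()`, the full pairwise matrix has roughly 151 million entries, which is about 1.2 GB as 64-bit floats.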

## Given a target anime, recommend a list of similar anime based on cosine similarity scores.

In [None]:
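# Function to recommend anime with similarity threshold
def recommend_anime(anime_title, cosine_sim=cosine_sim, df=df, top_n=10, threshold=0.5):
    # Look up the target row; fail with a clear message if the title is absent
    matches = df.index[df['name'] == anime_title]
    if len(matches) == 0:
        raise ValueError(f"Anime title not found: {anime_title!r}")
    idx = matches[0]  # assumes the default RangeIndex, so label == row position
    # Pair each row position with its similarity to the target, most similar first
    sim_scores = sorted(enumerate(cosine_sim[idx]), key=lambda x: x[1], reverse=True)
    # Exclude the target itself explicitly, then keep only scores above the threshold
    sim_scores = [s for s in sim_scores if s[0] != idx and s[1] > threshold]
    anime_indices = [i for i, _ in sim_scores[:top_n]]
    return df['name'].iloc[anime_indices]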

## Experiment with different threshold values for similarity scores to adjust the recommendation list size.

In [None]:
# Example recommendation with different thresholds
print("Recommendations for 'Naruto' with threshold 0.3:")
print(recommend_anime('Naruto', threshold=0.3))

print("Recommendations for 'Naruto' with threshold 0.5:")
print(recommend_anime('Naruto', threshold=0.5))

print("Recommendations for 'Naruto' with threshold 0.7:")
print(recommend_anime('Naruto', threshold=0.7))

Recommendations for 'Naruto' with threshold 0.3:
615    Naruto: Shippuuden
582                Bleach
86     Shingeki no Kyojin
281          Kill la Kill
159          Angel Beats!
440            Soul Eater
804      Sword Art Online
288            Fairy Tail
643        Ao no Exorcist
40             Death Note
Name: name, dtype: object
Recommendations for 'Naruto' with threshold 0.5:
615    Naruto: Shippuuden
582                Bleach
86     Shingeki no Kyojin
281          Kill la Kill
159          Angel Beats!
440            Soul Eater
804      Sword Art Online
288            Fairy Tail
643        Ao no Exorcist
40             Death Note
Name: name, dtype: object
Recommendations for 'Naruto' with threshold 0.7:
615    Naruto: Shippuuden
582                Bleach
86     Shingeki no Kyojin
281          Kill la Kill
159          Angel Beats!
440            Soul Eater
804      Sword Art Online
288            Fairy Tail
643        Ao no Exorcist
40             Death Note
Name: name, dtype: object
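
The three lists above are identical. That is what happens whenever at least `top_n` candidates score above the highest threshold tried: the threshold only shortens the list once fewer than `top_n` candidates pass it. A minimal sketch with made-up scores (the values and the helper `list_size` are hypothetical) shows where the list starts to shrink.

```python
import numpy as np

# Made-up similarity scores of 15 candidates to one target, sorted descending
scores = np.array([0.99, 0.97, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90,
                   0.88, 0.86, 0.84, 0.80, 0.72, 0.55, 0.31])
top_n = 10

def list_size(scores, threshold, top_n):
    """Number of recommendations returned: candidates above threshold, capped at top_n."""
    return min(int((scores > threshold).sum()), top_n)

# List size for a range of thresholds
sizes = {t: list_size(scores, t, top_n) for t in (0.3, 0.5, 0.7, 0.9, 0.95)}
print(sizes)
```

In this toy case the list stays at 10 items for thresholds 0.3, 0.5, and 0.7, and only shrinks at 0.9 and above, so trying higher thresholds may be needed to see an effect on the real data.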

## **Evaluation:**

## Split the dataset into training and testing sets.

In [None]:
# Split the dataset into training and testing sets
train, test = train_test_split(df, test_size=0.2, random_state=42)

## Evaluate the recommendation system using appropriate metrics such as precision, recall, and F1-score.

In [None]:
# Evaluate the recommendation system
def evaluate_recommendation_system(train, test, cosine_sim, df, top_n=10, threshold=0.5):
    # Note: `train` is not used; similarities are precomputed over the full dataset.
    y_true = []
    y_pred = []
    for idx, row in test.iterrows():
        recommended_anime = recommend_anime(row['name'], cosine_sim, df, top_n, threshold)
        # Pair the query title (label) with each recommended title (prediction)
        for anime in recommended_anime:
            y_true.append(row['name'])
            y_pred.append(anime)
    # Titles are treated as class labels, so a pair is correct only if both titles are identical
    precision = precision_score(y_true, y_pred, average='micro')
    recall = recall_score(y_true, y_pred, average='micro')
    f1 = f1_score(y_true, y_pred, average='micro')
    return precision, recall, f1

In [None]:
# Evaluate with different thresholds
precision, recall, f1 = evaluate_recommendation_system(train, test, cosine_sim, df, threshold=0.3)
print(f'Precision with threshold 0.3: {precision}, Recall: {recall}, F1-Score: {f1}')

precision, recall, f1 = evaluate_recommendation_system(train, test, cosine_sim, df, threshold=0.5)
print(f'Precision with threshold 0.5: {precision}, Recall: {recall}, F1-Score: {f1}')

precision, recall, f1 = evaluate_recommendation_system(train, test, cosine_sim, df, threshold=0.7)
print(f'Precision with threshold 0.7: {precision}, Recall: {recall}, F1-Score: {f1}')


Precision with threshold 0.3: 0.0001220008133387556, Recall: 0.0001220008133387556, F1-Score: 0.0001220008133387556
Precision with threshold 0.5: 0.0001220008133387556, Recall: 0.0001220008133387556, F1-Score: 0.0001220008133387556
Precision with threshold 0.7: 0.00012339077859581293, Recall: 0.00012339077859581293, F1-Score: 0.00012339077859581293


## Analyze the performance of the recommendation system and identify areas of improvement.

## **Analysis of Performance**

**Measured results: All three runs give precision, recall, and F1 of about 0.00012, and within each run the three values are identical. Micro-averaged precision, recall, and F1 all reduce to accuracy in single-label multiclass scoring, and here a pair counts as correct only when a recommended title is identical to the query title, which `recommend_anime` is written to exclude. The points below therefore describe the behaviour expected under a relevance-based metric, not differences visible in these numbers.**

### 1. Threshold 0.3:

**Precision: Precision is likely to be lower because the threshold is lenient, allowing more recommendations, including less relevant ones.**

**Recall: Recall might be higher since more recommendations are considered, increasing the chance of hitting relevant ones.**

**F1-Score: Balances precision and recall. A low precision might drag the F1-score down.**

### 2. Threshold 0.5:

**Precision: Expected to be better than at 0.3 as the threshold filters out less similar anime.**

**Recall: Might decrease slightly as fewer recommendations are made.**

**F1-Score: Should improve if precision improves more than recall decreases.**

### 3. Threshold 0.7:

**Precision: Should be highest among the three since only highly similar anime are recommended.**

**Recall: Expected to be the lowest since fewer recommendations are made.**

**F1-Score: Can be high if the system finds a balance, but might drop if recall falls significantly.**
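
The trade-off described above presupposes some definition of which recommendations are relevant. The sketch below assumes a made-up ground-truth set of relevant titles for one query. The scores, the `relevant` set, and the helper `precision_recall` are all hypothetical and are not the notebook's evaluation. It computes precision and recall of the thresholded list at each threshold.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of one recommendation list against a set of relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up similarity scores of five candidates to one query anime
candidate_scores = {'B': 0.95, 'C': 0.80, 'D': 0.60, 'E': 0.40, 'F': 0.20}
# Made-up ground truth (assumption): titles known to be relevant to the query
relevant = {'B', 'C', 'E'}

results = {}
for threshold in (0.3, 0.5, 0.7):
    # Recommend every candidate whose score exceeds the threshold
    recommended = [t for t, s in candidate_scores.items() if s > threshold]
    results[threshold] = precision_recall(recommended, relevant)

print(results)
```

On this toy data, precision rises from 0.75 at threshold 0.3 to 1.0 at threshold 0.7, while recall falls from 1.0 to about 0.67, which is the pattern the analysis anticipates.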

## **Areas of Improvement**

### 1. Data Quality:

**Missing Data: Ensure all relevant data is filled accurately. Currently, missing ratings are filled with the mean, which might not be the best strategy.**

**Detailed Genres: Multi-genre entries are split into one indicator column per genre; verify that the split leaves no empty or duplicated genre labels.**

### 2. Feature Selection:

**Additional Features: Include more features like anime broadcast type and number of episodes.**

**User Preferences: Incorporate user-specific features if available (e.g., user ratings for specific anime).**
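
One way the broadcast type and episode count might be turned into features is sketched below on made-up rows. The `'Unknown'` placeholder, the median fill, and the `log1p` transform are assumptions chosen for illustration, not steps taken in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows; 'Unknown' as an episode placeholder is an assumption for illustration
toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'type': ['TV', 'Movie', 'TV'],
    'episodes': ['24', '1', 'Unknown'],
})

# One 0/1 indicator column per broadcast type
type_dummies = pd.get_dummies(toy['type'], prefix='type', dtype=int)

# Episode counts: coerce text to numbers, fill gaps with the median,
# then compress the long tail with log1p
episodes = pd.to_numeric(toy['episodes'], errors='coerce')
episodes = episodes.fillna(episodes.median())
log_episodes = np.log1p(episodes).rename('log_episodes')

extra_features = pd.concat([type_dummies, log_episodes], axis=1)
print(extra_features)
```

The resulting columns could be concatenated with the genre indicators before scaling and computing similarity.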

### 3. Algorithm:

**Hybrid Approach: Combine content-based filtering (like cosine similarity) with collaborative filtering for better recommendations.**

**Advanced Models: Use more sophisticated models like matrix factorization or neural networks.**

### 4. Evaluation Metrics:

**Diversity and Novelty: Ensure the recommendations are not only accurate but also diverse and novel.**

**User Feedback: Incorporate real user feedback for continuous improvement.**
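
Diversity can be quantified in several ways. One common choice is intra-list diversity, the average pairwise dissimilarity among the recommended items. The sketch below is a minimal version on a made-up similarity matrix. The helper `intra_list_diversity` and the numbers are hypothetical.

```python
from itertools import combinations

import numpy as np

def intra_list_diversity(indices, sim):
    """Average pairwise dissimilarity (1 - similarity) among recommended items."""
    pairs = list(combinations(indices, 2))
    if not pairs:
        return 0.0
    return float(np.mean([1.0 - sim[i, j] for i, j in pairs]))

# Made-up symmetric similarity matrix for four items
sim = np.array([
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])

similar_list = intra_list_diversity([0, 1, 2], sim)  # items close to one another
mixed_list = intra_list_diversity([0, 1, 3], sim)    # includes one dissimilar item
print(similar_list, mixed_list)
```

A list of near-duplicates scores low on this measure, so it can be reported alongside accuracy-style metrics.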

### 5. Threshold Tuning:

**Dynamic Thresholds: Adjust thresholds dynamically based on user profiles or real-time feedback.**

**Personalized Thresholds: Customize thresholds for different user segments.**