# Recommender System
Follow these instructions to complete the mini project.

### Step 1 Download the Dataset
Download the dataset from the following link:
https://www.kaggle.com/jealousleopard/goodreadsbooks/download

### Step 2 Reading the Dataset
Read the dataset into a pandas DataFrame.


In [13]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

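# Load the dataset; malformed rows are skipped (on_bad_lines requires pandas >= 1.3)
df = pd.read_csv("books.csv", on_bad_lines='skip')

# Keep only the columns the recommenders use and drop rows with missing values
df = df[['title', 'authors', 'average_rating', 'ratings_count']].dropna()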
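
As a small illustration of what `on_bad_lines='skip'` and `dropna()` do during loading, the sketch below parses a made-up in-memory CSV. The rows and the names `raw`, `toy`, and `toy_clean` are hypothetical and not taken from books.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text: the second data row has one field too many,
# and the third has an empty authors field.
raw = io.StringIO(
    "title,authors,average_rating,ratings_count\n"
    "Book A,Author One,4.10,120\n"
    "Book B,Author Two,extra field,3.90,45\n"
    "Book C,,3.50,10\n"
)

# Rows with too many fields are dropped instead of raising a parser error
toy = pd.read_csv(raw, on_bad_lines="skip")

# Rows with any missing value are removed
toy_clean = toy.dropna()

print(len(toy), len(toy_clean))
```

Skipping bad lines silently discards data, so it may be worth comparing the row count against the raw file when the dataset matters.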

In [14]:
# View the first rows (df already contains only the four selected columns)
print(df.head())

                                               title  \
0  Harry Potter and the Half-Blood Prince (Harry ...   
1  Harry Potter and the Order of the Phoenix (Har...   
2  Harry Potter and the Chamber of Secrets (Harry...   
3  Harry Potter and the Prisoner of Azkaban (Harr...   
4  Harry Potter Boxed Set  Books 1-5 (Harry Potte...   

                      authors  average_rating  ratings_count  
0  J.K. Rowling/Mary GrandPré            4.57        2095690  
1  J.K. Rowling/Mary GrandPré            4.49        2153167  
2                J.K. Rowling            4.42           6333  
3  J.K. Rowling/Mary GrandPré            4.56        2339585  
4  J.K. Rowling/Mary GrandPré            4.78          41428  


### Step 3 Popularity-based Recommender
Create a function named popularity_recommender and use it to recommend books based on popularity.
Use a weighted rating similar to the one used in the IMDb rating example in Lesson 2.
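
For reference, the IMDb-style weighted rating is commonly written as below. Here v is a book's ratings_count, R is its average_rating, m is the minimum number of ratings required to be considered, and C is the mean rating across all books. The value of m is a free choice; taking the 90th percentile of ratings_count is one reasonable assumption, not something the exercise fixes.

```latex
\[
\mathrm{WR} = \frac{v}{v + m}\, R + \frac{m}{v + m}\, C
\]
```

As v grows, the score approaches the book's own rating R. For small v it is pulled toward the global mean C.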

In [15]:
def popularity_recommender(df, top_n=10):
    # C: mean rating across all books; m: minimum ratings count (90th percentile)
    C = df['average_rating'].mean()
    m = df['ratings_count'].quantile(0.90)

    # Keep only books with at least m ratings
    popular_books = df[df['ratings_count'] >= m].copy()

    # IMDb weighted rating formula: (v / (v + m)) * R + (m / (v + m)) * C
    v = popular_books['ratings_count']
    R = popular_books['average_rating']
    popular_books['score'] = (v / (v + m)) * R + (m / (v + m)) * C

    columns = ['title', 'authors', 'average_rating', 'ratings_count', 'score']
    return popular_books.sort_values('score', ascending=False)[columns].head(top_n)
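
To see why the weighted rating is preferred over sorting by average_rating alone, here is a minimal worked example with made-up numbers. The helper `weighted_rating` and all values are hypothetical and not drawn from the dataset.

```python
def weighted_rating(v, R, m, C):
    """IMDb-style weighted rating for a single item."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical global statistics
C = 3.9    # mean rating across all books
m = 1000   # minimum number of ratings required

# A niche book that just meets the threshold, with a very high rating
niche = weighted_rating(v=1000, R=4.9, m=m, C=C)

# A widely rated book with a slightly lower rating
popular = weighted_rating(v=99000, R=4.5, m=m, C=C)

print(round(niche, 3), round(popular, 3))
```

The niche book's score is pulled halfway toward the mean because v equals m, while the widely rated book keeps almost all of its own rating and ends up ranked higher.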

In [16]:
# test popularity_recommender
print("🔝 Top 10 Popular Books:\n")
print(popularity_recommender(df))

🔝 Top 10 Popular Books:

                                                  title  \
0     Harry Potter and the Half-Blood Prince (Harry ...   
3     Harry Potter and the Prisoner of Azkaban (Harr...   
1     Harry Potter and the Order of the Phoenix (Har...   
4     Harry Potter Boxed Set  Books 1-5 (Harry Potte...   
21    J.R.R. Tolkien 4-Book Boxed Set: The Hobbit an...   
4244                                  The Complete Maus   
6587                     The Complete Calvin and Hobbes   
4254         The Two Towers (The Lord of the Rings  #2)   
4415  Harry Potter and the Chamber of Secrets (Harry...   
288   Fullmetal Alchemist  Vol. 1 (Fullmetal Alchemi...   

                             authors  average_rating  ratings_count     score  
0         J.K. Rowling/Mary GrandPré            4.57        2095690  4.562576  
3         J.K. Rowling/Mary GrandPré            4.56        2339585  4.553447  
1         J.K. Rowling/Mary GrandPré            4.49        2153167  4.483682  
4    

### Step 4 Content-based Recommender
Create a function named content_based_recommender and use it to recommend books based on content.

TF-IDF Vectorizer

Apply scikit-learn's TfidfVectorizer to the author data for each book.

Distance matrix

Use cosine similarity for the pairwise comparison between books.
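
The two ingredients can be tried on a handful of author strings before applying them to the full dataset. This is a minimal sketch using a hypothetical three-item list. It uses the vectorizer's default settings, which is an assumption made here for simplicity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical author strings in the same "Name/Name" style as the dataset
authors = [
    "J.D. Salinger",
    "J.D. Salinger/Eike Schönfeld",
    "J.K. Rowling/Mary GrandPré",
]

# The default tokenizer lowercases text and keeps only tokens of two or
# more characters, so initials such as "J" and "D" are ignored.
tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(authors)

# sim[i, j] is the cosine similarity between author strings i and j
sim = cosine_similarity(matrix)

print(sim.round(2))
```

Strings that share an author token get a similarity between 0 and 1, identical token sets score 1, and strings with no shared tokens score 0.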

In [17]:
def content_based_recommender(book_title, df, top_n=10):
    # Work on a re-indexed copy so row positions match the similarity matrix
    df = df.reset_index(drop=True)
    df['authors'] = df['authors'].fillna('Unknown')

    # TF-IDF vectors built from the author strings
    tfidf = TfidfVectorizer(stop_words='english')
    tfidf_matrix = tfidf.fit_transform(df['authors'])

    # Pairwise cosine similarity between all books
    cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

    # Map titles to row positions, keeping the first row for duplicated titles
    indices = pd.Series(df.index, index=df['title'])
    indices = indices[~indices.index.duplicated(keep='first')]

    # Find the row position for the given book title
    idx = indices.get(book_title)

    if idx is None:
        return f"Book '{book_title}' not found."

    # Rank all books by similarity to the query book
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Skip the first hit, assumed to be the query book itself. Books by the
    # same author tie at 1.0, so the skipped row may be a different book.
    sim_scores = sim_scores[1:top_n + 1]

    book_indices = [i[0] for i in sim_scores]
    return df[['title', 'authors']].iloc[book_indices]
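
The function above skips the first ranked row on the assumption that it is the query book. When several books share the same author string they all tie at a similarity of 1.0, so the query title can still show up in its own recommendations, and unrelated books with a similarity of 0 can pad the list. The sketch below is one possible variant, not part of the original exercise. The name `content_based_recommender_strict` and the toy data are hypothetical, and it uses the vectorizer's default settings. It removes the query row explicitly, drops zero-similarity rows, and computes only the one similarity row it needs instead of the full matrix.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def content_based_recommender_strict(book_title, df, top_n=10):
    """Hypothetical variant that never returns the query book or unrelated books."""
    df = df.reset_index(drop=True)
    authors = df["authors"].fillna("Unknown")

    # Row positions whose title matches exactly; use the first match
    matches = df.index[df["title"] == book_title]
    if len(matches) == 0:
        return df[["title", "authors"]].iloc[0:0]  # empty result
    idx = matches[0]

    tfidf_matrix = TfidfVectorizer().fit_transform(authors)

    # Similarity of the query row against every row (a single 1 x n row)
    scores = cosine_similarity(tfidf_matrix[idx:idx + 1], tfidf_matrix).ravel()

    # Remove the query book itself, then any book with no shared author token
    ranked = pd.Series(scores, index=df.index).drop(idx)
    ranked = ranked[ranked > 0]
    ranked = ranked.sort_values(ascending=False, kind="mergesort").head(top_n)

    return df.loc[ranked.index, ["title", "authors"]]


# Hypothetical toy data for a quick check
toy = pd.DataFrame({
    "title": ["The Catcher in the Rye", "Franny and Zooey",
              "Der Fänger im Roggen", "The Hobbit"],
    "authors": ["J.D. Salinger", "J.D. Salinger",
                "J.D. Salinger/Eike Schönfeld", "J.R.R. Tolkien"],
})
result = content_based_recommender_strict("The Catcher in the Rye", toy)
missing = content_based_recommender_strict("No Such Book", toy)
print(result)
```

Returning an empty DataFrame for an unknown title keeps the return type consistent, which may be easier for callers to handle than a message string.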
    

In [18]:
# test content_based_recommender
book_to_check = "The Catcher in the Rye"  # Change as needed
print(f"\n📚 Books similar to '{book_to_check}':\n")
print(content_based_recommender(book_to_check, df))


📚 Books similar to 'The Catcher in the Rye':

                                                  title  \
1462                             The Catcher in the Rye   
1464                                   Franny and Zooey   
1465  Raise High the Roof Beam  Carpenters & Seymour...   
3273  Raise High the Roof Beam  Carpenters and Seymo...   
4095  The Catcher in the Rye: Annotations and Study ...   
4096                               Der Fänger im Roggen   
0     Harry Potter and the Half-Blood Prince (Harry ...   
1     Harry Potter and the Order of the Phoenix (Har...   
2     Harry Potter and the Chamber of Secrets (Harry...   
3     Harry Potter and the Prisoner of Azkaban (Harr...   

                           authors  
1462                 J.D. Salinger  
1464                 J.D. Salinger  
1465                 J.D. Salinger  
3273                 J.D. Salinger  
4095  J.D. Salinger/Rudolph F. Rau  
4096  J.D. Salinger/Eike Schönfeld  
0       J.K. Rowling/Mary GrandPré  
1      