# Booktuber Bias

Title: “Are Booktubers Overhyping Your DNF Pile?”

🧠 Concept:
Compare ratings from a Booktuber’s recommendation list with your personal ratings (or Goodreads average ratings) to detect possible bias, hype, or mismatch.

✅ Goals:
Identify which books Booktubers loved but the public didn’t—or vice versa.

Spot genre/author preference patterns in Booktuber picks.

Create a fun chart: “Overhyped vs Underrated”

🔧 Tools:
Python (BeautifulSoup or requests + pandas)

Optional: scrape Goodreads or a YouTube transcript

Plot: Seaborn or Altair

🧩 Steps:
Pick one Booktuber video with a book list (Top 10 of 2023, 5-star reads, etc.).

Extract book titles manually or via web scraping.

Get average Goodreads ratings for each.

Compare the Booktuber's rating with your own rating or with a Goodreads baseline (e.g., the global mean rating).

Plot a scatterplot: x = Booktuber rating, y = Goodreads rating
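The comparison step can be sketched without any plotting library. The snippet below is a minimal illustration, not the notebook's actual method: it uses five rows copied from the data preview further down, and the cutoffs (`LOVED = 5` and a baseline equal to the sample's mean Goodreads rating) as well as the quadrant names are assumptions chosen for the example.

```python
from statistics import mean

# Rows copied from the data preview in this notebook:
# (title, Booktuber rating on a 1-5 scale, Goodreads average rating)
books = [
    ("Wandering Stars", 4, 3.84),
    ("Skipshock", 4, 4.28),
    ("Glorious Exploits", 5, 4.15),
    ("Gunk", 5, 3.95),
    ("Perfection", 4, 3.65),
]

# Assumed cutoffs: "loved" means a 5-star Booktuber rating, and the public
# baseline is the mean Goodreads rating of this small sample.
LOVED = 5
baseline = mean(avg for _, _, avg in books)


def quadrant(booktuber_rating, goodreads_avg):
    """Place a book in one of four hypothetical quadrants."""
    booktuber_loved = booktuber_rating >= LOVED
    public_loved = goodreads_avg >= baseline
    if booktuber_loved and public_loved:
        return "shared favorite"
    if booktuber_loved:
        return "possibly overhyped"
    if public_loved:
        return "possibly underrated"
    return "shared shrug"


labels = {title: quadrant(bt, avg) for title, bt, avg in books}
for title, verdict in labels.items():
    print(f"{title}: {verdict}")
```

With a larger list, the same two cutoffs can serve as the dividing lines of the quadrant chart.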

🎁 Output:
A heatmap, scatterplot, or quadrant chart:

"Hype Discrepancy: Who’s Overrated, Who’s Underrated?"

🔥 Vibe:
Witty, data-driven, perfect for Instagram stories or a blog post.
Can become a monthly series: "Hype Check: June Edition".

In [1]:
import pandas as pd

# Load the CSV into a DataFrame
df = pd.read_csv("data/jack_edwards_goodread_cleaned.csv")

# Preview the first few rows (optional)
print(df.head())


               Title                Author  Average_Rating Booktuber_Rating  \
0    Wandering Stars         Orange, Tommy            3.84  really liked it   
1          Skipshock  O'Donoghue, Caroline            4.28  really liked it   
2  Glorious Exploits        Lennon, Ferdia            4.15   it was amazing   
3               Gunk            Sams, Saba            3.95   it was amazing   
4         Perfection   Latronico, Vincenzo            3.65  really liked it   

      Date_Read    Date_Added  
0  Jun 10, 2025  Jun 08, 2025  
1  May 27, 2025  May 27, 2025  
2  May 27, 2025  May 27, 2025  
3  Jun 08, 2025  May 25, 2025  
4  May 22, 2025  May 22, 2025  


In [3]:
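# Goodreads rating labels, ordered from lowest to highest
rating_order = [
    'did not like it',
    'it was ok',
    'liked it',
    'really liked it',
    'it was amazing'
]

# Store the labels as an ordered categorical so they sort and compare correctly
df["Booktuber_Rating"] = pd.Categorical(
    df["Booktuber_Rating"],
    categories=rating_order,
    ordered=True
)

# Map each label to its star value (1-5)
rating_map = {
    'did not like it': 1,
    'it was ok': 2,
    'liked it': 3,
    'really liked it': 4,
    'it was amazing': 5
}

# Mapping a categorical column returns another categorical (see the dtype in
# the output below); cast with .astype(int) before doing arithmetic on it.
df["Booktuber_Rating_Num"] = df["Booktuber_Rating"].map(rating_map)

# Show column dtypes and non-null counts
df.info()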

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 934 entries, 0 to 933
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype   
---  ------                --------------  -----   
 0   Title                 934 non-null    object  
 1   Author                934 non-null    object  
 2   Average_Rating        934 non-null    float64 
 3   Booktuber_Rating      934 non-null    category
 4   Date_Read             932 non-null    object  
 5   Date_Added            934 non-null    object  
 6   Booktuber_Rating_Num  934 non-null    category
dtypes: category(2), float64(1), object(4)
memory usage: 38.9+ KB


In [4]:
df.head()

               Title                Author  Average_Rating Booktuber_Rating  \
0    Wandering Stars         Orange, Tommy            3.84  really liked it   
1          Skipshock  O'Donoghue, Caroline            4.28  really liked it   
2  Glorious Exploits        Lennon, Ferdia            4.15   it was amazing   
3               Gunk            Sams, Saba            3.95   it was amazing   
4         Perfection   Latronico, Vincenzo            3.65  really liked it   

      Date_Read    Date_Added  Booktuber_Rating_Num  
0  Jun 10, 2025  Jun 08, 2025                     4  
1  May 27, 2025  May 27, 2025                     4  
2  May 27, 2025  May 27, 2025                     5  
3  Jun 08, 2025  May 25, 2025                     5  
4  May 22, 2025  May 22, 2025                     4  
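
A possible next cell, sketched under assumptions: since `Booktuber_Rating_Num` is stored as a category, it has to be cast to an integer before it can be subtracted from `Average_Rating`. The column name `Hype_Gap` is hypothetical, and the five rows are copied from the preview above so the snippet runs without the CSV.

```python
import pandas as pd

# Five rows copied from the preview above, so the sketch runs without the CSV.
sample = pd.DataFrame({
    "Title": ["Wandering Stars", "Skipshock", "Glorious Exploits", "Gunk", "Perfection"],
    "Average_Rating": [3.84, 4.28, 4.15, 3.95, 3.65],
    "Booktuber_Rating_Num": pd.Categorical(
        [4, 4, 5, 5, 4], categories=[1, 2, 3, 4, 5], ordered=True
    ),
})

# Categorical columns do not support arithmetic, so cast to int first.
sample["Booktuber_Rating_Num"] = sample["Booktuber_Rating_Num"].astype(int)

# Hype_Gap (hypothetical name): positive when the Booktuber rated the book
# above the Goodreads average, negative when below.
sample["Hype_Gap"] = (
    sample["Booktuber_Rating_Num"] - sample["Average_Rating"]
).round(2)

# Largest positive gaps first
ranked = sample.sort_values("Hype_Gap", ascending=False).reset_index(drop=True)
print(ranked[["Title", "Average_Rating", "Booktuber_Rating_Num", "Hype_Gap"]])
```

On the full DataFrame the same two lines (the cast and the subtraction) would apply to `df` directly, provided no rating is missing.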
