<a href="https://colab.research.google.com/github/vidhishhah/BOOK-trope-analyzer/blob/main/BOOK.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [9]:
import requests
from bs4 import BeautifulSoup

# This URL goes to the 'Romance' genre page directly
url = 'https://www.goodreads.com/shelf/show/romance'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop early if the request was blocked or failed
soup = BeautifulSoup(response.text, 'html.parser')

# In the 'Shelf' view, titles are inside <a> tags with class 'bookTitle'
titles = []
for title in soup.find_all('a', class_='bookTitle'):
    titles.append(title.get_text().strip())

# Titles carry an edition or series suffix such as "(Paperback)"; print up to the first 100
print(f"Success! Found {len(titles)} actual Romance books.")
for i, t in enumerate(titles[:100], 1):
    print(f"{i}. {t}")



Success! Found 50 actual Romance books.
1. The Love Hypothesis (Paperback)
2. Beach Read (Paperback)
3. Book Lovers (Paperback)
4. Pride and Prejudice (Hardcover)
5. Red, White & Royal Blue (Paperback)
6. It Ends with Us (It Ends with Us, #1)
7. People We Meet on Vacation (Paperback)
8. The Fault in Our Stars (Hardcover)
9. Twilight (The Twilight Saga, #1)
10. The Hating Game (Paperback)
11. A Court of Thorns and Roses (A Court of Thorns and Roses, #1)
12. The Unhoneymooners (Unhoneymooners, #1)
13. Happy Place (Hardcover)
14. The Seven Husbands of Evelyn Hugo (Hardcover)
15. The Kiss Quotient (The Kiss Quotient, #1)
16. Funny Story (Kindle Edition)
17. Twisted Love (Twisted, #1)
18. Ugly Love (Kindle Edition)
19. The Spanish Love Deception (Love Deception, #1)
20. Icebreaker (UCMH, #1)
21. Love, Theoretically (Paperback)
22. The Deal (Off-Campus, #1)
23. A Court of Mist and Fury (A Court of Thorns and Roses, #2)
24. The Duke and I (Bridgertons, #1)
25. Love on the Brain (Kindle Editio

In [12]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

# Trope -> keyword list; each book description is checked against these keywords
romance_tropes_keywords =  {
    'Enemies to Lovers': ['enemies', 'rivals', 'foe', 'hate', 'antagonist', 'forbidden', 'rivalry', 'opponent'],
    'Grumpy Sunshine': ['grumpy', 'sunshine', 'curmudgeon', 'optimist', 'bright', 'cheery', 'moody', 'positive'],
    'Fake Dating': ['fake dating', 'pretend relationship', 'fake boyfriend', 'fake girlfriend', 'arranged', 'agreement', 'charade'],
    'Second Chance': ['second chance', 'rekindle', 'ex-lovers', 'old flame', 'reunion', 'past love'],
    'Billionaire Romance': ['billionaire', 'wealthy', 'rich', 'CEO', 'mogul', 'empire', 'money'],
    'Office Romance': ['office', 'coworker', 'colleague', 'workplace', 'boss', 'secretary', 'cubicle'],
    'Friends to Lovers': ['friends', 'best friend', 'platonic', 'childhood friends', 'confidant', 'buddy'],
    'Love Triangle': ['love triangle', 'two men', 'two women', 'choice', 'dilemma', 'rivals in love'],
    'Forbidden Love': ['forbidden love', 'taboo', 'secret', 'clandestine', 'defy', 'social rules', 'forbidden'],
    'Age Gap': ['age gap', 'older man', 'younger woman', 'older woman', 'younger man', 'different ages'],
    'Small Town Romance': ['small town', 'countryside', 'rural', 'community', 'close-knit', 'quaint'],
    'Sports Romance': ['sports', 'athlete', 'team', 'coach', 'competition', 'player', 'gym', 'arena'],
    'Supernatural Romance': ['supernatural', 'vampire', 'werewolf', 'witch', 'magic', 'ghost', 'mythical', 'paranormal'],
    'Marriage of Convenience': ['marriage of convenience', 'arranged marriage', 'contract marriage', 'agreement', 'necessity'],
    'Bad Boy/Good Girl': ['bad boy', 'good girl', 'rebel', 'innocent', 'troublemaker', 'sweet', 'opposites attract']
}

url = 'https://www.goodreads.com/shelf/show/romance'
headers = {'User-Agent': 'Mozilla/5.0'}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop early if the request was blocked or failed
soup = BeautifulSoup(response.text, 'html.parser')

# 1. Find the book containers
book_containers = soup.find_all('div', class_='elementList')
book_data = []

print("Starting deep analysis of book descriptions...")

# 2. Loop through the first 50 books (lower the slice to test on fewer)
for container in book_containers[:50]:
    title_tag = container.find('a', class_='bookTitle')
    if title_tag is None:
        continue  # Skip containers that hold no book link
    title = title_tag.text.strip()
    book_link = "https://www.goodreads.com" + title_tag['href']

    # 3. Visit the individual book page to get the description
    book_page = requests.get(book_link, headers=headers, timeout=30)
    book_soup = BeautifulSoup(book_page.text, 'html.parser')

    # Goodreads descriptions are usually in a div with data-testid="description"
    desc_tag = book_soup.find('div', {'data-testid': 'description'})
    description = desc_tag.text.lower() if desc_tag else ""

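    # 4. Check for tropes using the dictionary; the first matching trope wins.
    # Keywords are lowercased because the description is ('CEO' would never match otherwise).
    found_trope = "General Romance"
    for trope, keywords in romance_tropes_keywords.items():
        if any(word.lower() in description for word in keywords):
            found_trope = trope
            break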

    book_data.append({
        'Title': title,
        'Trope': found_trope,
        'Link': book_link
    })

    print(f"Analyzed: {title} -> Trope: {found_trope}")
    time.sleep(1) # Be nice to the website!

# 5. Create DataFrame
df = pd.DataFrame(book_data)
df.to_excel("Romance_Trope_Report.xlsx", index=False)
print("\nSuccess! Your Excel report is ready.")

Starting deep analysis of book descriptions...
Analyzed: The Love Hypothesis (Paperback) -> Trope: Grumpy Sunshine
Analyzed: Beach Read (Paperback) -> Trope: General Romance
Analyzed: Book Lovers (Paperback) -> Trope: Enemies to Lovers
Analyzed: Pride and Prejudice (Hardcover) -> Trope: General Romance
Analyzed: Red, White & Royal Blue (Paperback) -> Trope: Friends to Lovers
Analyzed: It Ends with Us (It Ends with Us, #1) -> Trope: Office Romance
Analyzed: People We Meet on Vacation (Paperback) -> Trope: Friends to Lovers
Analyzed: The Fault in Our Stars (Hardcover) -> Trope: General Romance
Analyzed: Twilight (The Twilight Saga, #1) -> Trope: Grumpy Sunshine
Analyzed: The Hating Game (Paperback) -> Trope: Enemies to Lovers
Analyzed: A Court of Thorns and Roses (A Court of Thorns and Roses, #1) -> Trope: Supernatural Romance
Analyzed: The Unhoneymooners (Unhoneymooners, #1) -> Trope: Office Romance
Analyzed: Happy Place (Hardcover) -> Trope: Friends to Lovers
Analyzed: The Seven Husban

# Task
Define a dictionary of common romance tropes and assign keywords to each trope. Confirm the list of defined tropes and keywords and discuss how they will be used for book categorization.

## Define Romance Tropes and Keywords

### Subtask:
Create a dictionary of common romance tropes and assign relevant keywords to each trope. This will be the basis for categorization.


**Reasoning**:
To define the common romance tropes and their associated keywords, I need to create a Python dictionary as instructed. This dictionary will serve as the basis for categorization later.



**Reasoning**:
To confirm the list of defined tropes and keywords, I will display the content of the `romance_tropes_keywords` dictionary.
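
A minimal sketch of this confirmation step. The dictionary below is a trimmed two-trope excerpt of `romance_tropes_keywords`, copied only so the block runs on its own, and the helper name `format_tropes` is hypothetical.

```python
# Trimmed excerpt of the notebook's dictionary so this sketch is self-contained.
romance_tropes_keywords = {
    'Enemies to Lovers': ['enemies', 'rivals', 'foe', 'hate'],
    'Fake Dating': ['fake dating', 'pretend relationship', 'charade'],
}

def format_tropes(tropes):
    """Return one 'Trope (n keywords): kw1, kw2, ...' line per trope (hypothetical helper)."""
    return [f"{name} ({len(kws)} keywords): {', '.join(kws)}"
            for name, kws in tropes.items()]

# Print every trope with its keyword count and keyword list for a visual check.
for line in format_tropes(romance_tropes_keywords):
    print(line)
```

Printing the keyword count next to each trope makes it easier to spot a trope whose list is empty or unexpectedly short.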



### How the Tropes and Keywords will be used for Book Categorization

The `romance_tropes_keywords` dictionary serves as the foundation for categorizing romance books based on their underlying themes. For each book, we will extract relevant text (e.g., title, description, summary).

Then, for each trope in the dictionary, we check whether any of its keywords appears in the extracted text. A book whose text contains at least one keyword from a trope is categorized under that trope, so a single book can match several tropes. The scraping loop in this notebook keeps only the first matching trope in dictionary order and falls back to 'General Romance' when nothing matches.
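
The matching rule described above, where a book collects every trope whose keywords appear in its text, could be sketched as follows. Several things here are assumptions: the function name `match_tropes` is hypothetical, the dictionary is a trimmed excerpt, the sample descriptions are invented, and whole-word regex matching is swapped in for the plain substring test used in the scraping cell.

```python
import re

# Trimmed excerpt of the trope dictionary, for illustration only.
romance_tropes_keywords = {
    'Enemies to Lovers': ['enemies', 'rivals', 'hate'],
    'Billionaire Romance': ['billionaire', 'rich', 'CEO'],
    'Fake Dating': ['fake dating', 'charade'],
}

def match_tropes(text, tropes):
    """Return every trope with at least one keyword present as a whole word or phrase."""
    matched = []
    for trope, keywords in tropes.items():
        # \b anchors stop 'rich' from matching inside 'enriching'
        # and 'hate' from matching inside 'whatever'.
        pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b'
        if re.search(pattern, text, flags=re.IGNORECASE):
            matched.append(trope)
    return matched

# Invented description that touches three tropes at once.
description = "Two rivals agree to a fake dating charade, whatever the CEO thinks."
print(match_tropes(description, romance_tropes_keywords))

# Invented description that a substring test would mislabel.
print(match_tropes("An enriching summer, whatever happens.", romance_tropes_keywords))
```

Returning a list rather than a single label keeps multi-trope books intact, and `re.IGNORECASE` lets mixed-case keywords such as 'CEO' match without lowercasing the text first.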

## Final Task

### Subtask:
Confirm the list of defined tropes and keywords, and discuss how they will be used for book categorization.


## Summary:

### Q&A
The list of defined tropes and keywords has been confirmed. The created dictionary, `romance_tropes_keywords`, contains 15 common romance tropes such as 'Enemies to Lovers', 'Grumpy Sunshine', and 'Fake Dating', each with a list of associated keywords.

These tropes and keywords will be used for book categorization by extracting relevant text from books (e.g., title, description). For each book, the system will check if any of the defined keywords from a specific trope appear in its text. If keywords are found, the book will be categorized under that trope. A single book can be associated with multiple tropes if its text contains keywords from various categories.

### Data Analysis Key Findings
*   A Python dictionary named `romance_tropes_keywords` was successfully created, containing 15 common romance tropes.
*   Each trope is assigned a list of relevant keywords; for example, the 'Enemies to Lovers' trope includes keywords such as 'enemies', 'rivals', 'foe', and 'hate'.
*   The complete dictionary was displayed and confirmed, ensuring the accurate definition of tropes and their associated keywords.

### Insights or Next Steps
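*   This structured dictionary provides a starting point for automated romance book categorization based on thematic elements. Because matching relies on keyword hits, generic keywords can produce questionable labels (the sample run tags Twilight as 'Grumpy Sunshine' and It Ends with Us as 'Office Romance'), so the keyword lists are worth reviewing against the results.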
*   The next step involves applying this categorization methodology to actual book data by parsing book descriptions and titles to identify and assign relevant tropes.
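
As a rough illustration of that next step, the sketch below tags a list of book records and tallies how often each trope occurs. The records are placeholders invented for this example (not scraped data), the dictionary is a trimmed excerpt, and `tag_books` is a hypothetical helper; it uses the same lowercase substring test as the scraping cell.

```python
from collections import Counter

# Trimmed excerpt of the trope dictionary, for illustration only.
tropes = {
    'Fake Dating': ['fake dating', 'charade'],
    'Small Town Romance': ['small town', 'rural'],
}

# Placeholder records, invented for illustration only (not scraped data).
books = [
    {'Title': 'Placeholder Book A', 'Description': 'A fake dating pact in a small town.'},
    {'Title': 'Placeholder Book B', 'Description': 'Two strangers share a train ride.'},
]

def tag_books(records, trope_keywords):
    """Attach a 'Tropes' list to each record; use 'General Romance' when nothing matches."""
    tagged = []
    for record in records:
        # Search title and description together, lowercased for case-insensitive matching.
        text = f"{record['Title']} {record['Description']}".lower()
        found = [trope for trope, kws in trope_keywords.items()
                 if any(k.lower() in text for k in kws)]
        tagged.append({**record, 'Tropes': found or ['General Romance']})
    return tagged

tagged_books = tag_books(books, tropes)

# Count how many books fall under each trope.
trope_counts = Counter(t for book in tagged_books for t in book['Tropes'])
print(trope_counts.most_common())
```

The list of tagged dictionaries has the same shape as `book_data` in the scraping cell, so it could be passed to `pd.DataFrame` in the same way.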
