****Project Brief****
Problem: Choice Overload. Users waste more time searching than reading because of decision paralysis and a "winner-takes-all" market.
Solution: A context-aware engine that replaces popularity bias with situational matching on mood, available time, and environment (e.g. bedtime, a 10-minute break, a long holiday or trip).
Goal: Reduce decision fatigue and unlock the "Long Tail" of publishing by giving niche books the spotlight, while helping readers find the right book for their current moment.
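
One way to picture "situational matching" is a score that looks only at how well a book fits the reader's current slot and mood, and never at popularity. The sketch below is a minimal illustration under stated assumptions: the `Book` fields, the mood tags, the 50/50 weighting, and the one-page-per-minute reading speed are all hypothetical and not part of the project yet.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    pages: int
    rating_count: int   # kept for reference; deliberately unused in scoring
    moods: frozenset    # hypothetical mood tags, e.g. {"calm", "epic"}


def context_score(book, minutes_available, mood, pages_per_minute=1.0):
    """Score a book for the current situation, ignoring popularity."""
    # Fraction of the book the reader could finish in the available time
    readable_pages = minutes_available * pages_per_minute
    time_fit = min(1.0, readable_pages / book.pages) if book.pages else 0.0
    # 1.0 when the requested mood is one of the book's tags, else 0.0
    mood_fit = 1.0 if mood in book.moods else 0.0
    # Assumed equal weighting of the two signals
    return 0.5 * time_fit + 0.5 * mood_fit


def recommend(books, minutes_available, mood, top_n=3):
    """Return the top_n books ranked by context_score."""
    ranked = sorted(
        books,
        key=lambda b: context_score(b, minutes_available, mood),
        reverse=True,
    )
    return ranked[:top_n]
```

With this scoring, a 40-page niche novella tagged "calm" outranks a 900-page bestseller for a 30-minute calm slot, because `rating_count` never enters the score.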

****Goodreads Webscraping****
Book data required 
- Genre 
- Title 
- Author
- Rating
- Rating count
- Description
- Page count
- ISBN
- Language 
- Published Year 
- Book Cover Image 
- Link to the book 
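
The field list above could be captured as a typed record so that every scraped row has the same shape. This is a sketch only: the `BookRecord` name, the raw dictionary keys, and the raw string formats (such as "1,234 ratings" or "277 pages, Paperback") are assumptions about what the scraper will return, not confirmed Goodreads markup.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookRecord:
    genre: str
    title: str
    author: str
    rating: Optional[float]
    rating_count: Optional[int]
    description: str
    pages: Optional[int]
    isbn: str
    language: str
    published_year: Optional[int]
    cover_image_url: str
    book_url: str


def first_int(text):
    """Return the first integer in text (thousands separators removed), or None."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def first_year(text):
    """Return the first standalone four-digit number in text, or None."""
    match = re.search(r"\b\d{4}\b", text or "")
    return int(match.group()) if match else None


def to_float(text):
    """Convert text to float, returning None when it is missing or malformed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def build_record(raw):
    """Map a dict of raw scraped strings (hypothetical keys) to a BookRecord."""
    return BookRecord(
        genre=raw.get("genre", ""),
        title=raw.get("title", "").strip(),
        author=raw.get("author", "").strip(),
        rating=to_float(raw.get("rating")),
        rating_count=first_int(raw.get("rating_count")),
        description=raw.get("description", ""),
        pages=first_int(raw.get("pages")),
        isbn=raw.get("isbn", ""),
        language=raw.get("language", ""),
        published_year=first_year(raw.get("published")),
        cover_image_url=raw.get("cover_image_url", ""),
        book_url=raw.get("book_url", ""),
    )
```

A list of `BookRecord` objects can be passed straight to `pd.DataFrame(records)`, since pandas accepts dataclass instances as rows.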

****Open Library API****
Identifiers: ISBN-13
Physical Specs: Number of pages, physical dimensions, weight, and binding type (Hardcover, mass-market paperback, etc.).

Publishing Info: Publisher name, specific publication date, and series name.

Table of Contents: Often includes a full list of chapters (a feature many other APIs lack).
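
As a rough sketch of how the edition-level fields above might be pulled, the snippet below builds the Open Library ISBN URL and flattens an edition record into the fields this project cares about. The helper names are hypothetical, and the field names (`number_of_pages`, `physical_format`, `table_of_contents`, and so on) should be checked against a live record, since many editions omit some of them.

```python
import json
from urllib.request import urlopen


def edition_url(isbn13):
    """Open Library edition lookup by ISBN (redirects to the edition record)."""
    return f"https://openlibrary.org/isbn/{isbn13}.json"


def fetch_edition(isbn13, timeout=10):
    """Download the edition record as a dict. Makes a network call."""
    with urlopen(edition_url(isbn13), timeout=timeout) as resp:
        return json.load(resp)


def summarise_edition(edition):
    """Pick out identifiers, physical specs, publishing info and chapters."""
    chapters = []
    for entry in edition.get("table_of_contents", []):
        # Entries are usually dicts with a "title"; tolerate plain strings too
        title = entry.get("title") if isinstance(entry, dict) else str(entry)
        if title:
            chapters.append(title)

    return {
        "isbn_13": (edition.get("isbn_13") or [None])[0],
        "pages": edition.get("number_of_pages"),
        "dimensions": edition.get("physical_dimensions"),
        "weight": edition.get("weight"),
        "binding": edition.get("physical_format"),
        "publishers": edition.get("publishers", []),
        "publish_date": edition.get("publish_date"),
        "series": edition.get("series", []),
        "chapters": chapters,
    }
```

Missing fields come back as `None` or an empty list rather than raising, which keeps the later merge with the Goodreads data simple.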

3. The "Author" Layer
Open Library treats authors as distinct entities with their own metadata.

Biographical Data: Full name, birth/death dates, and a biography.

Identifiers: Links to external authority files like VIAF, Wikidata, and Library of Congress ID.

Photos: Portraits of the author when available.
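
Author records can be flattened in the same spirit. The sketch below assumes an author dict shaped like the JSON served at `/authors/<OLID>.json`; the `remote_ids` key names (`viaf`, `wikidata`) and the helper names are assumptions to verify against a real record.

```python
def author_url(olid):
    """URL of an Open Library author record, e.g. for an ID like 'OL1A'."""
    return f"https://openlibrary.org/authors/{olid}.json"


def summarise_author(author):
    """Extract biography, external identifiers and photo URLs from an author dict."""
    # "bio" is sometimes a plain string and sometimes {"type": ..., "value": ...}
    bio = author.get("bio")
    if isinstance(bio, dict):
        bio = bio.get("value")

    remote_ids = author.get("remote_ids", {})

    # Photo IDs are integers; skip placeholders such as -1
    photo_ids = [p for p in author.get("photos", []) if isinstance(p, int) and p > 0]

    return {
        "name": author.get("name"),
        "birth_date": author.get("birth_date"),
        "death_date": author.get("death_date"),
        "bio": bio,
        "viaf": remote_ids.get("viaf"),
        "wikidata": remote_ids.get("wikidata"),
        "remote_ids": remote_ids,  # keep everything, including any LoC ID
        "photo_urls": [
            f"https://covers.openlibrary.org/a/id/{pid}-M.jpg" for pid in photo_ids
        ],
    }
```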

4. Digital & Community Data
Because Open Library is part of the Internet Archive, it includes unique "living" data:

Availability: Data on whether an eBook version is available to borrow, read online, or download.

Community Activity: User-generated Reading Logs (Want to Read, Currently Reading, Have Read), public Book Lists, and user ratings.

Revision History: Every single change made to a record is stored, meaning you can access previous "versions" of a book's data.
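
The community signals could be combined into one small summary per work. This sketch assumes the per-work `ratings.json` and `bookshelves.json` endpoints and the response shapes shown in the comments; both are assumptions to confirm against live responses before relying on them.

```python
def community_urls(work_olid):
    """Build the assumed per-work community endpoints for an ID like 'OL1W'."""
    base = f"https://openlibrary.org/works/{work_olid}"
    return {
        "ratings": f"{base}/ratings.json",
        "bookshelves": f"{base}/bookshelves.json",
    }


def summarise_community(ratings, bookshelves):
    """Merge rating and reading-log counts into one flat dict.

    Assumed shapes:
      ratings     -> {"summary": {"average": float, "count": int}, ...}
      bookshelves -> {"counts": {"want_to_read": int,
                                 "currently_reading": int,
                                 "already_read": int}}
    """
    summary = ratings.get("summary", {})
    counts = bookshelves.get("counts", {})
    return {
        "average_rating": summary.get("average"),
        "rating_count": summary.get("count"),
        "want_to_read": counts.get("want_to_read", 0),
        "currently_reading": counts.get("currently_reading", 0),
        "already_read": counts.get("already_read", 0),
    }
```

Reading-log counts may be a useful long-tail signal here, since a book with few ratings can still have a sizeable "Want to Read" shelf.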

In [1]:

import time  # used by the time.sleep() calls in the scraping and API cells

import pandas as pd
import requests
from bs4 import BeautifulSoup

In [None]:
link = 'https:/'  # target URL is incomplete in the source; set it before running
r = requests.get(link)
r.status_code

In [None]:
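# Needs `soup` from the BeautifulSoup cell below; run that cell first
prices = []
for tag in soup.select("ol.row p.prices_color"):
    # Drop the first character (presumably the currency symbol)
    prices.append(tag.get_text()[1:])
prices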

In [None]:
books = pd.DataFrame()

In [None]:
soup = BeautifulSoup(r.text, 'html.parser')
soup

In [None]:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'}
r = requests.get("https://", headers=headers, timeout=10)
print(r.status_code)

In [None]:
# Web scraping: fetch a paginated listing page by page

def scrape_all_books(max_pages=13):
    base_url = "https://www.layerlicensing.com/collab-tracker"
    all_data = []
    
    page_param = "d14182ed_page" 
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }

    for page in range(1, max_pages + 1):
        print(f"Fetching Page {page}...")
        params = {page_param: page}
        
        try:
            response = requests.get(base_url, headers=headers, params=params, timeout=10)
            if response.status_code != 200:
                print(f"Finished: Page {page} not found.")
                break
            
            soup = BeautifulSoup(response.text, 'html.parser')
            
            items = soup.find_all('div', class_='w-dyn-item')
            
            if not items:
                print("No more items found. Stopping.")
                break
            
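            for item in items:
                # Use the first <h3> as the item name when one is present
                heading = item.find('h3')
                name = heading.get_text(strip=True) if heading else "N/A"

                all_data.append({
                    "e1": page,  # page the item was found on
                    "e2": name,  # item heading
                    "e3": item.get_text(separator=" | ", strip=True)  # all text in the item
                })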
                
            time.sleep(1.5)
            
        except Exception as e:
            print(f"Error on page {page}: {e}")
            break

    return pd.DataFrame(all_data)

# Run for as many pages as you need
goodreads_df = scrape_all_books(max_pages=20)

# Save the final result
goodreads_df.to_csv("example.csv", index=False)
print(f"Success! Scraped {len(goodreads_df)} total items.")

In [None]:
#[API]
# Call the Jikan API once to check that a request succeeds
jikan_url = "https://api.jikan.moe/v4/manga/13"  # One Piece; same base URL as the cell below
response = requests.get(jikan_url, timeout=10)
response.status_code

In [None]:
#[API]
# Test the API call with three manga-originated global IP hits: 'One Piece', 'Kimetsu no Yaiba' (Demon Slayer), 'Dandadan'
def get_manga_info(manga_id):
    url = f"https://api.jikan.moe/v4/manga/{manga_id}/full"
    
    try:
        response = requests.get(url, timeout=10)
        
        if response.status_code != 200:
            print(f"Server Error for ID {manga_id}: {response.status_code}")
            return None
            
        if not response.text.strip():
            print(f"Empty envelope for ID {manga_id}. The server sent nothing.")
            return None
            
        return response.json()['data']
        
    except Exception as e:
        print(f"Connection error for ID {manga_id}: {e}")
        return None

ids = [13, 96792, 135496] # One Piece, Demon Slayer, Dandadan
manga_results = []

for m_id in ids:
    data = get_manga_info(m_id)
    if data:
        manga_results.append(data)
        print(f"Successfully pulled: {data['title']}")
    
    # Wait 2 seconds between requests to respect the API's rate limit
    time.sleep(2)