Python script to scrape www.goodreads.com bookshelves.
Updated Jan 19, 2024 · Python
Old book pages (with ground truth), formerly used for OCR studies. The set exists in several versions that differ in resolution and binarization. Noise-added and denoised sets (produced by several methods) will eventually be uploaded.
This project combines web scraping and exploratory data analysis (EDA) on book listings from "Books to Scrape". Scraped 1000 books using Python, cleaned the dataset, and uncovered insights on categories, pricing trends, and availability using Pandas and Seaborn. Part of my Data Analyst portfolio series.
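As a rough illustration of the scrape-then-summarize workflow described above, here is a minimal sketch. It is not the project's code: it uses only the Python standard library rather than Pandas and Seaborn, it parses an inline sample instead of fetching pages, and the markup (`product_pod`, `price_color`, `availability`) is an assumption about how a listing page might be structured. `SAMPLE_HTML` and `BookParser` are hypothetical names.

```python
from html.parser import HTMLParser
from statistics import mean

# Hypothetical sample listing; the class names are assumed, not verified.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="a.html" title="Book A">Book A</a></h3>
  <p class="price_color">£51.77</p>
  <p class="instock availability">In stock</p>
</article>
<article class="product_pod">
  <h3><a href="b.html" title="Book B">Book B</a></h3>
  <p class="price_color">£20.23</p>
  <p class="instock availability">In stock</p>
</article>
"""


class BookParser(HTMLParser):
    """Collects title, price, and availability for each listed book."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._field = None  # text field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            # A new book record starts here.
            self.books.append({"title": None, "price": None, "in_stock": False})
        elif not self.books:
            return
        elif tag == "a" and "title" in attrs:
            self.books[-1]["title"] = attrs["title"]
        elif tag == "p" and "price_color" in classes:
            self._field = "price"
        elif tag == "p" and "availability" in classes:
            self._field = "availability"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "price":
            # Clean the price: drop the currency symbol, convert to float.
            self.books[-1]["price"] = float(text.lstrip("£"))
        elif self._field == "availability":
            self.books[-1]["in_stock"] = "in stock" in text.lower()
        self._field = None

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None


parser = BookParser()
parser.feed(SAMPLE_HTML)
books = parser.books

# A first exploratory summary: average price and share of books in stock.
avg_price = mean(b["price"] for b in books)
in_stock_share = sum(b["in_stock"] for b in books) / len(books)
print(f"{len(books)} books, average price {avg_price:.2f}, "
      f"{in_stock_share:.0%} in stock")
```

In a fuller analysis, the list of dictionaries could be loaded into a Pandas DataFrame for grouping by category and plotting.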
Hackathon hosted on MachineHack to predict book prices: find which features affect the price most, and train a machine learning model of your choice.