
trumandaniels/bookscraper


Bookscraper

A web scraper / web crawler for books.toscrape.com

This program scrapes book data from books.toscrape.com and stores it in a local SQLite database. Developed with Linux Mint and Python 3.8. I've written a full write-up on my website at https://trumandaniels.com/2022/04/15/building-a-webscraper-with-beautifulsoup/
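The scrape-and-store flow described above can be sketched roughly as follows. This is only an illustration, not the code in bookscraper.py: it uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages, it parses a hard-coded HTML fragment shaped like a books.toscrape.com listing instead of fetching the live site, and the `books` table, its columns, and the names `BookListingParser`, `store_books`, and `scrape_sample` are all assumptions made for the example.

```python
import sqlite3
from html.parser import HTMLParser

# Hard-coded fragment imitating two listings on books.toscrape.com (assumed markup).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <h3><a href="catalogue/tipping-the-velvet_999/index.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
  <p class="price_color">£53.74</p>
</article>
"""


class BookListingParser(HTMLParser):
    """Collects (title, price) pairs from listing markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.books = []        # finished (title, price) tuples
        self._in_h3 = False    # True while inside an <h3> element
        self._in_price = False # True while inside <p class="price_color">
        self._title = None     # title waiting for its price

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            # The full title is in the title attribute; the link text may be truncated.
            self._title = attrs.get("title")
        elif tag == "p" and "price_color" in (attrs.get("class") or "").split():
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "p":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and self._title is not None:
            price = float(data.strip().lstrip("£"))
            self.books.append((self._title, price))
            self._title = None


def store_books(conn, books):
    """Write (title, price) rows into an assumed `books` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (title TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO books (title, price) VALUES (?, ?)", books
    )
    conn.commit()


def scrape_sample(conn):
    """Parse the sample fragment and store the result; returns the parsed rows."""
    parser = BookListingParser()
    parser.feed(SAMPLE_HTML)
    parser.close()
    store_books(conn, parser.books)
    return parser.books


if __name__ == "__main__":
    # In-memory database for the demo; the project itself writes to books.db.
    connection = sqlite3.connect(":memory:")
    for row_number, (title, price) in enumerate(scrape_sample(connection), start=1):
        print(f"Row appended: {row_number} ({title}, {price})")
    connection.close()
```

Using the title as the primary key with `INSERT OR REPLACE` keeps the sketch safe to re-run without duplicating rows; whether the real script deduplicates this way is not stated here.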

Quick file rundown:

  • bookscraper.py ~ where all the magic happens: this file contains all the scraping logic and function documentation
  • tests.py ~ a small unit test suite
  • books.db ~ the SQLite database where everything is stored

Set Up

First (optional) step: create and activate a Python virtual environment or Anaconda environment. Skipping this step could cause package dependency or compatibility issues if your machine is used for many projects.

Change the working directory to wherever you extracted the downloaded folder:

truman@laptop ~ $ cd /path/to/extracted/bookscraper-main

Create a Python virtual environment:

either (after installing virtualenv) by choosing a specific Python version

truman@laptop ~/bookscraper-main ~ $  virtualenv --python=/usr/bin/python3.9 /path/to/new/environment/VIRTUALENVNAME 

or by using whichever python3 your PATH resolves to (probably fine)

truman@laptop ~/bookscraper-main ~ $ python3 -m venv /path/to/new/environment/VIRTUALENVNAME 

You can then activate the environment using:

truman@laptop ~/bookscraper-main ~ $ source /path/to/new/environment/VIRTUALENVNAME/bin/activate

Then install the necessary packages to run the scripts:

(VIRTUALENVNAME) truman@laptop ~/bookscraper-main ~ $ pip install -r requirements.txt 

Run Tests:

(VIRTUALENVNAME) truman@laptop ~/bookscraper-main ~ $ pytest tests.py
collected 2 items                                                              

tests.py ..                                                              [100%]
============================== 2 passed in 0.19s ==============================

Example use:

(VIRTUALENVNAME) truman@laptop ~/bookscraper-main ~ $ python bookscraper.py
Row appended: 1
Row appended: 2
Row appended: 3
...
Row appended: 1000
Done scraping
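
Once the run finishes, one quick way to confirm that rows landed in the database is to list each table with its row count. The sketch below makes no assumption about the schema bookscraper.py creates: it reads table names from SQLite's built-in `sqlite_master` catalog. The function name `table_row_counts` is hypothetical, and the path `books.db` assumes you run it from the project folder.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    # Assumed path: books.db in the current working directory.
    connection = sqlite3.connect("books.db")
    for table, count in table_row_counts(connection).items():
        print(f"{table}: {count} rows")
    connection.close()
```

Note that `sqlite3.connect` creates an empty file if `books.db` does not exist yet, so an empty result may simply mean the scraper has not been run in that folder.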

Sources:
