A Python toolkit for scraping web novel chapters and generating self-contained offline HTML readers with compressed content.
This project consists of two main components:
- Scraper (`scraper.py`): Scrapes novel chapters from web novel sites using Selenium with stealth capabilities to bypass anti-bot protection. The scraper navigates to a novel's web page, extracts the chapter title and content, then follows "next chapter" links automatically. Chapters are saved to a local SQLite database.
- Reader Generator (`generate_reader.py`): Loads chapters from the database, gzip-compresses each chapter's content, and embeds the result as base64-encoded data inside a Jinja2-rendered HTML file. The resulting single-file HTML reader uses the Compression Streams API to decompress chapters on the fly in the browser, which keeps the file lightweight and fully usable offline.
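
The compress-and-embed step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `generate_reader.py`: the function names `encode_chapter` and `decode_chapter` are hypothetical, and the decode side only mirrors in Python what the browser would do with the Compression Streams API.

```python
import base64
import gzip


def encode_chapter(text: str) -> str:
    """Gzip-compress chapter text and return it as a base64 string.

    Hypothetical helper: the real generator may structure this differently.
    """
    compressed = gzip.compress(text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode_chapter(payload: str) -> str:
    """Reverse encode_chapter (the reader does the equivalent in the browser)."""
    return gzip.decompress(base64.b64decode(payload)).decode("utf-8")


# Sample text invented for this sketch.
chapter = "Chapter 1\n" + "The road stretched on. " * 200
payload = encode_chapter(chapter)

# The round trip must reproduce the original text exactly.
assert decode_chapter(payload) == chapter
print(len(chapter.encode("utf-8")), "bytes ->", len(payload), "base64 characters")
```

Base64 adds roughly a third to the size of the compressed bytes, so the embedded payload is smaller than the raw text only when gzip saves more than that, which is typically the case for prose.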
- Python 3.8+
- Google Chrome (for scraping)
Jinja2==3.1.6
selenium==4.40.0
selenium_stealth==1.0.6
webdriver_manager==4.0.2
git clone https://github.com/veckencshtein/novel-reader-fork.git
cd novel-reader-fork
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Using command-line arguments
python scraper.py --title "my_novel" --url "https://example.com/novel/123"
# Resume from a specific chapter
python scraper.py -t "my_novel" -u "https://..." -s 50
# Debug mode (prints content without saving)
python scraper.py -t "my_novel" -u "https://..." --debug
# Use default values defined in the script
python scraper.py

| Flag | Description |
|---|---|
| `-t`, `--title` | Novel title (used as the database table name) |
| `-u`, `--url` | Starting URL of the novel |
| `-s`, `--start` | Chapter number to resume from (default: 0) |
| `-d`, `--db` | Database file path (default: novel.db) |
| `--debug` | Print scraped content without saving |
| `-v`, `--verbose` | Enable debug-level logging |
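
For reference, the flags in the table map naturally onto an `argparse` parser. The sketch below is an assumption about how such a parser could look, not a copy of `scraper.py`: the function name `build_parser` is hypothetical, and the script's built-in defaults for the title and URL are not documented here, so they are left unset.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented scraper flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Scrape novel chapters into SQLite.")
    parser.add_argument("-t", "--title",
                        help="Novel title (used as the database table name)")
    parser.add_argument("-u", "--url", help="Starting URL of the novel")
    parser.add_argument("-s", "--start", type=int, default=0,
                        help="Chapter number to resume from")
    parser.add_argument("-d", "--db", default="novel.db",
                        help="Database file path")
    parser.add_argument("--debug", action="store_true",
                        help="Print scraped content without saving")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug-level logging")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["-t", "my_novel", "-u", "https://example.com/novel/123", "-s", "50"]
)
print(args.title, args.start, args.db, args.debug)
```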
# Interactive mode — pick a novel and configure options
python generate_reader.py
# Non-interactive with all options
python generate_reader.py -d novel.db -t "my_novel" -r pre -o output
# Split output by file size (default 10MB per file)
python generate_reader.py -t "my_novel" -s
# Split with a custom size limit (5MB)
python generate_reader.py -t "my_novel" -s 5
# Split by chapter count (100 chapters per file)
python generate_reader.py -t "my_novel" --split-by-chapters 100

Generated HTML files are saved to the output/ directory by default.
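
The two splitting modes can be pictured as simple batching over the chapter list. The following is a rough sketch under stated assumptions, not the generator's actual logic: `split_by_count` and `split_by_size` are hypothetical names, sizes are measured on the raw UTF-8 text rather than on the compressed payload, and 1 MB is taken to mean 1024 * 1024 bytes.

```python
from typing import List


def split_by_count(chapters: List[str], per_file: int) -> List[List[str]]:
    """Group chapters into consecutive batches of at most per_file chapters."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    return [chapters[i:i + per_file] for i in range(0, len(chapters), per_file)]


def split_by_size(chapters: List[str], limit_mb: float = 10) -> List[List[str]]:
    """Greedily pack chapters so each batch stays within limit_mb where possible.

    A single chapter larger than the limit still gets a batch of its own.
    """
    limit = int(limit_mb * 1024 * 1024)
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for chapter in chapters:
        size = len(chapter.encode("utf-8"))
        # Start a new batch when adding this chapter would exceed the limit.
        if current and used + size > limit:
            batches.append(current)
            current, used = [], 0
        current.append(chapter)
        used += size
    if current:
        batches.append(current)
    return batches


# Made-up data: 250 short chapters, split 100 per file.
demo = [f"Chapter {n}" for n in range(1, 251)]
print([len(batch) for batch in split_by_count(demo, 100)])
```

Each batch would then be rendered into its own HTML file. Greedy packing keeps chapters in reading order, which matters more for a reader than filling every file to the limit.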
- Stealth mode to bypass anti-bot protection
- Automatic ChromeDriver management
- Resumable scraping from any chapter
- File splitting by size or chapter count for upload-limited platforms
- Dark mode with theme persistence
- Adjustable font size (12px–32px)
- Searchable chapter dropdown with keyboard navigation
- Chapter prefetching for faster navigation
- Mobile-friendly responsive design
novel-reader-fork/
├── scraper.py # Web scraper for novel chapters
├── generate_reader.py # HTML reader generator
├── utils.py # Shared utilities (DatabaseManager, Chapter)
├── requirements.txt # Python dependencies
└── templates/
    ├── reader.html.j2 # Jinja2 HTML template
    ├── script.js # Reader JavaScript (navigation, decompression, UI)
    └── styles.css # Reader styles
This project is currently not licensed.