Home
The goal of ficscraper is to provide fanfiction readers with a way to generate & interpret stats on their reading habits on websites that provide none. For example:
- How many words of Harry Potter fanfiction did I read in the year 20XX?
- Which authors have I read the most, ranked by either word count or number of fics?
- In what order did I start reading each fandom this year, and which fic did I read first in each?
- Based on the tags of all the fics I've read, what would my "ideal fic" look like?
And so on. Fanfiction itself is a labor of love and I genuinely hope that ficscraper can provide you with some interesting ways to investigate your own personal relationship with it.
- First follow the instructions on the Installation page.
- After that's all set up, follow the instructions on the How to Use page.
Currently ficscraper only supports stats on Archive of Our Own (aka AO3). This is due to AO3's rich tagging system, which allows significantly more flexibility in finding patterns in fics read.
Other fanfiction websites such as FanFiction.net (FFN), Wattpad, and Tumblr blogs dedicated to fic writing are considered out of scope for this project until I feel ficscraper's AO3 side is sufficiently developed. I more than welcome discussion on implementing ficscraper for other websites though!
ficscraper works in three stages:
- Extract the user's interactions with fic, such as:
  - reading history
  - kudos history
  - personal bookmarks & tags
- Transfer & load the collected information into SQLite, an extremely handy no-installation-needed/in-memory/embedded database management system.
  - One could actually stop at this stage if they want to begin running stats on their interactions. See here for example queries you can run against SQLite.
- Visualize certain types of interactions as something nicely readable for humans (and shareable)!
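As a rough illustration of what stopping at the SQLite stage could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The `works` table, its columns, and the sample rows are all made up for this example; they are assumptions, not ficscraper's actual schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not ficscraper's real layout. read_date is an ISO 8601 date string.
SCHEMA = """
CREATE TABLE works (
    title TEXT,
    author TEXT,
    fandom TEXT,
    word_count INTEGER,
    read_date TEXT
)
"""

# Made-up sample rows so the queries have something to run against.
SAMPLE_ROWS = [
    ("Fic A", "author_one", "Harry Potter", 12000, "2021-03-14"),
    ("Fic B", "author_two", "Harry Potter", 8000, "2021-07-02"),
    ("Fic C", "author_one", "Star Trek", 5000, "2021-09-30"),
    ("Fic D", "author_one", "Harry Potter", 3000, "2020-12-25"),
]


def build_demo_db():
    """Create an in-memory database populated with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO works VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def words_read(conn, fandom, year):
    """Total words read in one fandom during one calendar year."""
    row = conn.execute(
        "SELECT COALESCE(SUM(word_count), 0) FROM works "
        "WHERE fandom = ? AND strftime('%Y', read_date) = ?",
        (fandom, str(year)),
    ).fetchone()
    return row[0]


def author_ranking(conn):
    """Authors ordered by total words read, with fic counts alongside."""
    return conn.execute(
        "SELECT author, SUM(word_count) AS total_words, COUNT(*) AS fic_count "
        "FROM works GROUP BY author ORDER BY total_words DESC, author"
    ).fetchall()
```

Calling `words_read(build_demo_db(), "Harry Potter", 2021)` answers the "how many words did I read in a given year" question for the sample rows, and `author_ranking` covers the author-ranking question by both word count and number of fics.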
Please submit legitimate bugs/errors with ficscraper to Issues (don't forget to add the bug label), and all other suggestions/questions to Discussions.
Q. Why didn't you make a website and have it run ficscraper for me instead? I don't want to have to do all this coding work, and it'd be nice if I could just log in and see my stats rather than have to do upkeep myself.
A. A couple key reasons.
- I'm setting up a session with AO3 by literally scraping the authentication token and using it for the whole session. Furthermore, I require your plaintext username & password to even grab said token. This is frankly way out of my comfort zone to even think about putting on a website - I'm not fluent in implementing website security, and I don't want to be on the hook for your account getting hacked.
- AO3 rate limits at approximately 20-80 requests per 10 minutes. This is perfectly fine when you're slowly reading through a multichapter fic - it's less fine when there are 200+ pages of bookmarks ficscraper is trying to grab. Multiply that across multiple users and you can quickly see how the throughput falls through the floor.
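To put rough numbers on the rate-limit point above, here is a back-of-the-envelope sketch. It assumes one request per bookmark page and that every user of a hosted site would share a single rate-limit budget; both are simplifying assumptions for illustration, and the function names are hypothetical rather than part of ficscraper.

```python
def seconds_between_requests(requests_per_window, window_minutes=10):
    """Average spacing needed to stay within the rate limit."""
    return window_minutes * 60 / requests_per_window


def minutes_to_fetch(pages, requests_per_window, window_minutes=10, users=1):
    """Minutes needed to fetch `pages` pages for each of `users` users.

    Assumes one request per page and a single shared rate-limit budget.
    """
    total_requests = pages * users
    return total_requests * window_minutes / requests_per_window


# 200 pages of bookmarks at the pessimistic and optimistic ends of the range.
slow_case = minutes_to_fetch(200, 20)             # 100.0 minutes
fast_case = minutes_to_fetch(200, 80)             # 25.0 minutes
ten_users = minutes_to_fetch(200, 20, users=10)   # 1000.0 minutes
```

Under these assumptions, a single user with 200 pages of bookmarks waits somewhere between 25 and 100 minutes, and ten such users on a shared service could wait over 16 hours at the slow end.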
Q. Why Python 3.9?
A. No real reason; it was newish at the time of implementation and bs4 is pretty straightforward to use in Python.