An end-to-end pipeline and dashboard of Fluffy Folks' statistics!
This pipeline:
- Fetches and stores FF's historical data from Anilist
- Calculates many interesting statistics that can be viewed from a dashboard
- Periodically exports a version of the `#rankings` channel content (from Fluffy Folks' discord) to accessible sheets
- Was created while learning (a lot) about Airflow, dbt, and streamlit
I had planned to deploy the dashboard-ish end result to streamlit cloud, but the dashboard is so full of images that viewing it would require downloading a lot of them. So I cancelled that plan, and the end result is the data in the sheets. I embedded a video of what the streamlit dashboard looks like (it's a pretty simplistic version of Anichart, mostly because streamlit doesn't offer a way to compose complex structures).
- The statistics seem super interesting!
- Manual data aggregation in the `#rankings` channel seems painful and not as complete
- I'm learning about pipelines, SQL, and dashboarding
See `TODOs.md` for details of unworked TODOs and bugs.
Initially I was just curious about the question, so I ran a query on data I had already retrieved (3 months' worth). At that point I hadn't notified my circles yet. Then, BOOM. It went public just like that, and got a positive response.
Moved here
- Most of the commands I use are recorded in the Makefile.
- Some custom terminal stuff that is quite useful:

      # Use below command to setup Airflow's command-line autocompletion
      eval "$(register-python-argcomplete airflow)"

      # Silence SQLAlchemy deprecation warning
      # https://stackoverflow.com/a/75109965
      export SQLALCHEMY_SILENCE_UBER_WARNING=1
Aspects | Description |
---|---|
AOTY 2022 | Anime of the year; best of several pre-defined categories |
Favourites | Top favourited anime, manga, characters, staff, and studio |
Ranked Anime | Ranked anime based on aggregated ratings |
Ranked Manga | Ranked manga based on aggregated ratings |
Seasonals | Ranked anime, viewed from a within-season perspective |
Potentials | Titles that have the potential to be included in the ranked lists |
Divisive | Some people rated it highly, others rated it badly (high stdev) |
Planning (anticipated) | What are people eagerly waiting for? |
Dropped | What's most hated... ? |
Current (followed) | What's most followed/trending? |
Questionable | Questionable ratings and titles (e.g. a high score given to a low-rated title) |
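
As a rough illustration of the "Divisive" row, the sketch below ranks titles by the standard deviation of their scores. It is not the project's actual query (that presumably lives in the dbt models); the function name `rank_divisive`, the `min_ratings` threshold, and the sample scores are all made up for this example.

```python
from statistics import mean, pstdev

# Hypothetical sample data: title -> scores given by different members.
SAMPLE_SCORES = {
    "Title A": [90, 30, 95, 25],   # loved by some, disliked by others
    "Title B": [70, 72, 68, 71],   # everyone roughly agrees
    "Title C": [100, 10],          # too few ratings to say anything
}


def rank_divisive(scores_by_title, min_ratings=3):
    """Return (title, mean, stdev) tuples, most divisive (highest stdev) first.

    Titles with fewer than `min_ratings` scores are skipped, since a spread
    computed from one or two ratings is not very meaningful.
    """
    rows = []
    for title, scores in scores_by_title.items():
        if len(scores) < min_ratings:
            continue
        rows.append((title, mean(scores), pstdev(scores)))
    # Sort by the standard deviation, largest spread first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for title, avg, spread in rank_divisive(SAMPLE_SCORES):
        print(f"{title}: mean={avg:.1f}, stdev={spread:.1f}")
```

With this sample data, "Title A" comes out ahead of "Title B", and "Title C" is left out because it has only two ratings.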
A note for my future-self as well!
- I encountered a pip dependency error when trying to install `jsonschema`, which is used by both `dbt-core` and `airflow`. Try to install `airflow` first, before `dbt-core` (the former uses `jsonschema` v3, and the latter defaults to v4.17 but can be made to work with v3).
- Be careful when testing stuff! Make sure `ENVIRONMENT_TYPE` is set according to what you're about to do. E.g. if you're testing and the command fails with an error, the env vars might not get unset, and from that point on they might cause bugs in DAG runs that you didn't intend as tests.
- Be careful when exploring `dbt` stuff while running `Airflow` operations at the same time, or doing either of those alongside `streamlit`, as `sqlite` only allows one writer at a time and will throw an error (typically `database is locked`) if two operations try to write to it at the same time.
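
The last note can be reproduced with Python's built-in `sqlite3` module. This is only a minimal sketch, assuming SQLite's default rollback-journal mode and a throwaway database file; the `scores` table and the helper names `second_writer_is_blocked` and `demo` are hypothetical and not part of this repo. While one connection holds the write lock, a second connection that tries to write fails with `database is locked`.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path: str) -> bool:
    """Return True if a second connection cannot write while the first holds the write lock."""
    # timeout=0 makes the second writer fail immediately instead of waiting.
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE scores (title TEXT, score REAL)")
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        first.execute("INSERT INTO scores VALUES ('Title A', 80)")
        try:
            # Stands in for e.g. dbt writing while an Airflow task is mid-write.
            second.execute("INSERT INTO scores VALUES ('Title B', 70)")
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


def demo() -> bool:
    # Use a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp_dir:
        return second_writer_is_blocked(os.path.join(tmp_dir, "demo.db"))


if __name__ == "__main__":
    print("second writer blocked:", demo())
```

In practice, the simplest workaround is the one described in the note: let one tool finish before starting another against the same database file.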