There was an attempt at
Hello friends and family and other creatures of the sea.
This is an end-to-end analytics pipeline for starcraft 2 matches using the Aligulac dataset from Aligulac.
Here we will attempt to explore the state of the game with regards to the pro-gamer's activity. Starcraft 2 have always had a sizeable audiance and activity around it but over the years players have noticed that its popularity is starting to wane. So here we will try to see if that is the case and if SC2 is indeed a dead game.
The pipeline comprises of multiple parts - most notably are:
- PostgreSQL to restore the Aligulac database locally (inside a docker container with Mage.AI)
- Mage.AI as the primary orchestrator and secondary data-transformation tool
- dbt (Mage-flavoured) as the primary data transformation tool running inside Mage.
- Terraformq as the IaC tool to manage cloud resources in GCP
Google Cloud Provider is the primary cloud resource being used (for now):
- BigQuery as the Data Warehouse solution with Raw, Bronze, Silver and Gold stages
- Cloud Storage bucket as the Data Lake to store various Parquet files and table
- and, Data Studio as the dashboarding tool
- The Aligulac Database extract gets downloaded and restored into the local postgresql installation
- Mage is used to funnel the extracted database into a parquet file and table format within Google Cloud Storage
- dbt-Mage is then used to transform the GCS stored data into BigQuery's Raw, Bronze, Silver and Gold layer
- Data in the gold layer can then be consumed by Streamlit cloud
Streamlit cloud is being used as a dashboarding solution (for now).
The streamlit dashboard can be accessed here Aligulac Analytics on Steamlit
Copy and rename the supplied dev.env
file to .env
Modify the content of the file if required (default setting listed below)
PROJECT_NAME=aligulac_analytics
POSTGRES_DB=aligulac_db
POSTGRES_USER=admin
POSTGRES_PASSWORD=aligulacdb
POSTGRES_HOST=postgres_aligulac
PG_HOST_PORT=5432
Create Service account(s) for Mage, Terraform and Streamlit.
For both Mage and Terraform Service Accounts uses the roles listed below:
- Cloud Storage Admin
- BigQuery Admin
Export the keys as .json
and drop them into their respective locations in
/keys/
for the Mage and dbt, next to themage-dbt_key_location.txt
placeholder/terraform/keys/
for Terraform, next to theterraform_key_location.txt
placeholder
Streamlit will require its own separate service account to access the datalake hosted on GCP's BigQuery.
Please follow the instructions in Connecting Streamlit to BigQuery
Terraform code are located in the /terraform/
directory and a terraform apply
will generate the requested resources listed below
- Google Cloud Storage bucket - aligulac_datalake used to store parquet files and table generated by the pipeline
- BigQuery datasets
- raw
- bronze
- silver
- gold
Note: In some installations - the BigQuery API may need to be activated
The Mage installation are located in /aligulac_analytics/
and a docker compose up
to activate the container.
Access the Mage interface via http://localhost:6789
In the very likely scenario of the PostgreSQL Db being empty you can generate the required data by downloading the database dump file from aligulac.sql or aligulac.sql.gz
Note: the /aligulac_export/
have been attached to the postgres installation as /tmp/aligulac_export/
wget -P /aligulac_export http://static.aligulac.com/aligulac.sql
And restoring the db via
docker exec -i postgres-aligulac psql -U admin aligulac_db < tmp/aligulac_export/aligulac.sql
or
psql -U admin aligulac_db < tmp/aligulac_export/aligulac.sql
Within Mage there should be two "Core" pipelines.
- Run the
Setup
pipeline to move the data from Local PostgreSQL to GCS - Run the
Primary
pipeline to build the Data Warehouse using the files inside GCS
The Mage-DAG uses a mix of MAGE based blocks and dbt (below)
Follow the Streamlit Cloud sign-up process. Streamlit Community edition (Free)
As mentioned earlier, follow the instructions in Connecting Streamlit to BigQuery
Use the streamlit__aligulac.py
file in the repo to generate the Streamlit dashboard.