Aligulac (Starcraft 2 matches) Analytics

Hello friends and family and other creatures of the sea.
This is an end-to-end analytics pipeline for StarCraft 2 matches, built on the public dataset from Aligulac.

Description

Here we will attempt to explore the state of the game with regard to pro gamers' activity. StarCraft 2 has always had a sizeable audience and plenty of activity around it, but over the years players have noticed that its popularity is starting to wane. So here we will try to see whether that is the case and whether SC2 is indeed a dead game.

Tools

The pipeline comprises multiple parts, most notably:

  • PostgreSQL to restore the Aligulac database locally (inside a Docker container alongside Mage.AI)
  • Mage.AI as the primary orchestrator and secondary data-transformation tool
  • dbt (Mage-flavoured) as the primary data-transformation tool, running inside Mage
  • Terraform as the IaC tool to manage cloud resources in GCP

Google Cloud Platform is the primary cloud provider (for now):

  • BigQuery as the Data Warehouse solution, with Raw, Bronze, Silver and Gold stages
  • Cloud Storage bucket as the Data Lake to store various Parquet files and tables
  • Data Studio as the dashboarding tool

Overall view and steps

(Diagram: overall look at Aligulac Analytics)

  1. The Aligulac database extract is downloaded and restored into the local PostgreSQL installation
  2. Mage funnels the extracted database into Parquet files and tables within Google Cloud Storage (a sketch of this step follows the list)
  3. dbt-Mage then transforms the GCS-stored data into BigQuery's Raw, Bronze, Silver and Gold layers
  4. Data in the Gold layer can then be consumed by Streamlit Cloud
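
As a rough illustration of step 2, the snippet below reads one Aligulac table from the local Postgres instance and writes it to the bucket as Parquet. This is a minimal sketch outside Mage, assuming pandas, SQLAlchemy and gcsfs are installed; the table and bucket paths are illustrative.

import pandas as pd
from sqlalchemy import create_engine

# Connection details mirror the defaults in dev.env
engine = create_engine("postgresql://admin:aligulacdb@localhost:5432/aligulac_db")

# "player" is one of the tables in the Aligulac dump
df = pd.read_sql_table("player", engine)

# With gcsfs installed, pandas can write straight to the Terraform-created bucket
df.to_parquet("gs://aligulac_datalake/raw/player.parquet", index=False)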

Dashboard and Visuals

Streamlit Cloud is being used as the dashboarding solution (for now).
The Streamlit dashboard can be accessed here: Aligulac Analytics on Streamlit

Installation

.ENV file

Copy and rename the supplied dev.env file to .env

Modify the content of the file if required (default settings listed below)

PROJECT_NAME=aligulac_analytics
POSTGRES_DB=aligulac_db
POSTGRES_USER=admin
POSTGRES_PASSWORD=aligulacdb
POSTGRES_HOST=postgres_aligulac
PG_HOST_PORT=5432
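
For reference, a pipeline script can pick these settings up with python-dotenv. A minimal sketch, assuming python-dotenv is installed and the variable names above:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

# Assemble a Postgres connection URL from the settings above
pg_url = (
    f"postgresql://{os.environ['POSTGRES_USER']}:{os.environ['POSTGRES_PASSWORD']}"
    f"@{os.environ['POSTGRES_HOST']}:{os.environ['PG_HOST_PORT']}/{os.environ['POSTGRES_DB']}"
)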

Google Cloud Platform

Create Service account(s) for Mage, Terraform and Streamlit.

Both the Mage and Terraform service accounts use the roles listed below:

  • Cloud Storage Admin
  • BigQuery Admin

Export the keys as .json and drop them into their respective locations:

  • /keys/ for the Mage and dbt, next to the mage-dbt_key_location.txt placeholder
  • /terraform/keys/ for Terraform, next to the terraform_key_location.txt placeholder
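
To sanity-check that a key works, you can point a BigQuery client straight at it. A minimal sketch, assuming google-cloud-bigquery is installed; the key filename is illustrative:

from google.cloud import bigquery

# Load credentials directly from the exported key file
client = bigquery.Client.from_service_account_json("keys/mage-dbt-key.json")

# Listing datasets confirms both the key and the BigQuery API are working
print([ds.dataset_id for ds in client.list_datasets()])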

Streamlit Service Account (optional)

Streamlit will require its own separate service account to access the data warehouse hosted on GCP's BigQuery.

Please follow the instructions in Connecting Streamlit to BigQuery
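
The pattern from that guide boils down to storing the key in .streamlit/secrets.toml under a [gcp_service_account] section and building the client from it:

import streamlit as st
from google.oauth2 import service_account
from google.cloud import bigquery

# Build credentials from the key stored in .streamlit/secrets.toml
credentials = service_account.Credentials.from_service_account_info(
    st.secrets["gcp_service_account"]
)
client = bigquery.Client(credentials=credentials)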

Terraform (optional)

The Terraform code is located in the /terraform/ directory; a terraform apply will generate the resources listed below:

  • Google Cloud Storage bucket - aligulac_datalake, used to store the Parquet files and tables generated by the pipeline
  • BigQuery datasets
    • raw
    • bronze
    • silver
    • gold

Note: in some installations, the BigQuery API may need to be activated

Mage

The Mage installation is located in /aligulac_analytics/; run docker compose up to start the container.

Access the Mage interface via http://localhost:6789

Local Database Seed

In the very likely scenario that the PostgreSQL database is empty, you can seed the required data by downloading the database dump file, aligulac.sql or aligulac.sql.gz

Note: the /aligulac_export/ directory is mounted into the Postgres container as /tmp/aligulac_export/

wget -P /aligulac_export http://static.aligulac.com/aligulac.sql

Then restore the database via

docker exec -i postgres-aligulac psql -U admin aligulac_db < aligulac_export/aligulac.sql

(the shell redirect resolves on the host, so the path here is the host-side export directory)

or, from inside the container,

psql -U admin aligulac_db < /tmp/aligulac_export/aligulac.sql
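
To confirm the restore worked, a quick row count from Python does the job. A minimal sketch, assuming psycopg2 is installed and the default credentials from .env:

import psycopg2

# Connect with the defaults from dev.env (host port 5432)
conn = psycopg2.connect(
    dbname="aligulac_db", user="admin", password="aligulacdb",
    host="localhost", port=5432,
)
with conn.cursor() as cur:
    # "player" is one of the tables in the Aligulac dump
    cur.execute("SELECT count(*) FROM player")
    print("player rows:", cur.fetchone()[0])
conn.close()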

Pipelines

Within Mage there should be two "Core" pipelines.

  • Run the Setup pipeline to move the data from the local PostgreSQL to GCS
  • Run the Primary pipeline to build the Data Warehouse using the files inside GCS

The Mage DAG uses a mix of Mage-based blocks and dbt blocks, shown below.

(Screenshot: Primary Mage-DAG)

(Screenshot: Primary Mage-DAG completion)

Streamlit Cloud (optional)

Follow the Streamlit Cloud sign-up process (the Streamlit Community edition is free).

As mentioned earlier, follow the instructions in Connecting Streamlit to BigQuery

Use the streamlit__aligulac.py file in the repo to generate the Streamlit dashboard.
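
As a rough idea of what that file does, a minimal dashboard page might look like the sketch below; it assumes the secrets-based client from the earlier section, and the gold-layer table name is hypothetical (the real names live in the dbt models).

import streamlit as st
from google.oauth2 import service_account
from google.cloud import bigquery

# Reuse the secrets-based client from the BigQuery connection section
credentials = service_account.Credentials.from_service_account_info(
    st.secrets["gcp_service_account"]
)
client = bigquery.Client(credentials=credentials)

st.title("Aligulac Analytics")

# Hypothetical gold-layer table; the real names live in the dbt models
df = client.query("SELECT * FROM gold.matches_per_year").to_dataframe()
st.line_chart(df, x="year", y="matches")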
