
Spotify Airflow Batch Ingestion Project

Tools & Tech Stack: Python, Apache Airflow, Terraform, AWS

TL;DR: Ingest data from Spotify's API (no SDK used), using Apache Airflow as the orchestrator.

If you listen to songs on Spotify, how would you like to know more about your listening traits and habits, such as:

  • which songs you listen to the most
  • which artists come up most often in the songs you listen to
  • what the audio characteristics of the songs you listen to are
  • and possibly what kind of lyrics the songs you listen to contain.

That is what this project is about: it ingests data daily from Spotify's API (no SDK used) and orchestrates the data pipeline with Apache Airflow. The Airflow server is deployed on AWS EC2, and Terraform is the IaC tool used to set up the AWS resources.
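The ingestion logic lives in dags/spotify/spotify-dag.py. As a rough illustration, below is a minimal sketch of a daily DAG that pulls a user's recently played tracks from Spotify's Web API with plain requests; the Airflow Variable name (spotify_access_token), the endpoint choice, and the DAG settings are illustrative assumptions, not the project's exact code.

from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

# Spotify Web API endpoint for a user's listening history (no SDK needed)
RECENTLY_PLAYED_URL = "https://api.spotify.com/v1/me/player/recently-played"


def fetch_recently_played():
    # Assumes a valid OAuth access token is stored as an Airflow Variable;
    # refresh-token rotation is omitted here for brevity.
    token = Variable.get("spotify_access_token")
    response = requests.get(
        RECENTLY_PLAYED_URL,
        headers={"Authorization": f"Bearer {token}"},
        params={"limit": 50},  # 50 is the maximum this endpoint allows
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["items"]


with DAG(
    dag_id="spotify_batch_ingestion",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    PythonOperator(
        task_id="fetch_recently_played",
        python_callable=fetch_recently_played,
    )

The retries with a short delay help ride out transient API failures, and catchup=False keeps the daily schedule from backfilling runs for dates before the DAG was deployed.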

Architecture

Database ER Diagram

Directory Tree

.
├── assets (images and draw.io diagrams)
├── dags
│   └── spotify
│       └── spotify-dag.py (file where the DAGs are defined)
├── docker-compose.yml (Docker Compose file to set up Airflow)
├── Dockerfile.airflow (Dockerfile for Airflow)
├── LICENSE
├── logs (Airflow logs folder)
├── Makefile
├── notebooks
│   ├── listens.csv
│   ├── Testing API.ipynb
│   └── Test Writer.ipynb
├── plugins (Airflow plugins folder)
├── README.md
├── requirements.txt (project dependencies not present in the Airflow Docker image)
├── sql_scripts (SQL scripts used)
│   └── create_tables.sql
└── terraform (Terraform files)

Terraform

Be sure to create a backend.conf file if you choose to change the backend used; an example backend.conf is given below.

bucket  = "my-bucket-name"
key     = "my-key-name"
region  = "my-region"
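backend.conf works as a Terraform partial backend configuration: the backend block in the Terraform code stays empty, and the values above are injected at init time. A minimal sketch, assuming the S3 backend that the bucket/key/region keys suggest (the main.tf filename is just an example):

# in the Terraform configuration (e.g. terraform/main.tf):
terraform {
  backend "s3" {}
}

# then initialize with the partial configuration file:
# terraform init -backend-config=backend.conf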
