⚠️ Disclaimer:
This repository is not recommended for use in academic or production environments.
It contains several bad practices used solely to meet coursework constraints and work within the AWS Free Tier limits.
Design and deploy a cloud-native ETL/ELT pipeline using only free-tier AWS components. This project serves both as a university coursework and a pet-project.
- Identifying a domain of interest.
- Creating
.csv
dataset related to the chosen domain. - Ingesting the datasets into an OLTP database using ETL/ELT processes.
- Transfer the data from OLTP into a DWH using ETL/ELT for further querying.
- Performing data analysis and generating visualizations based on the data from DWH.
- Terraform – Infrastructure as Code (IaC)
- AWS – S3, Lambda, RDS (OLTP and OLAP), IAM, Amazon QuickSight
- SQL – For schema creation and ETL logic
- GitHub – Project repository and version control
.
├── terraform/
│ ├── main.tf
│ ├── etl2.sql.tpl
│ ├── variables.tf
│ ├── outputs.tf
│ └── modules/
│ ├── s3/
│ ├── vpc/
│ ├── ec2-basion/
│ ├── rds_oltp/
│ ├── rds_olap/
│ └── lambda/
├── sql/
│ ├── init-olap-tables.sql
│ ├── init-oltp-tables.sql
│ ├── etl1.sql
│ ├── to_show_etl2.sql
│ └── selects
│ ├── olap/
│ │ └── selects.sql
│ └── oltp/
│ └── selects.sql
├── lambda/
│ ├── lambda-to-oltp/
│ │ └── lambda_function.py
│ └── lambda-to-oltp/
│ └── lambda_function.py
├── data/
│ └── *.csv
└── README.md
The OLTP database is based on a simplified version of the Chinook schema, fully normalized (3NF), and designed for efficient transactional operations.
Included Tables:
Artist
— stores artist metadataAlbum
— each album belongs to one artistTrack
— each track belongs to an album and contains pricing/media infoGenre
— genre classification for tracksMediaType
— file format metadataCustomer
— customer info (location, contact, etc.)Invoice
— purchases made by customersInvoiceLine
— line items per invoice (track purchases)
Notes:
- Some unused tables and attributes were removed to reduce complexity
- The schema preserves referential integrity with foreign keys
- Suitable for ingestion via CSV and Lambda
For analytical workloads, data is transformed into a star schema within the OLAP database. It supports efficient aggregation and filtering for BI tools.
Fact Tables:
fact_sales
— transactional grain, representing each track salefact_invoice_summary
— invoice-level aggregates
Dimension Tables:
dim_customer
— slowly changing dimension (Type 2)dim_track
— track details including artist, album, genredim_time
— calendar dimension (day, month, quarter, year)
Bridge Table:
bridge_invoice_track
— resolves many-to-many betweeninvoice
andtrack
This OLAP schema powers Amazon QuickSight dashboards through custom SQL queries.
This section showcases final visualizations built using Amazon QuickSight connected to the OLAP database.
All dashboards use custom SQL datasets, include interactive filters.