This repository demonstrates how to build a data warehouse using Google BigQuery and dbt, with a star schema design inspired by Kimball’s Dimensional Modeling. It showcases an end-to-end pipeline—from raw data ingestion to a curated dimensional model—focused on the Sales Orders domain from the Wide World Importers sample dataset.
This project demonstrates core data warehousing skills and best practices:
- Dimensional Modeling using facts and dimensions.
- End-to-End Pipeline using dbt for data transformation.
- Data Quality Checks leveraging dbt’s testing framework.
The final output is a star schema centered around the fact_sales_order table, with related dimension tables such as dim_customer, dim_person, dim_product, and more.
-
Kimball Dimensional Modeling
Designed conformed dimensions (e.g., Customer, Product, Person) and fact tables (Sales Orders) to support analytical queries. -
dbt (Data Build Tool)
- Automates SQL transformations.
- Manages dependencies between staging, intermediate, and analytics layers via
{{ ref('...') }}. - Simplifies documentation and testing.
-
Data Quality & Tests
- Utilizes dbt’s built-in testing features to enforce not null, unique, and referential integrity constraints.
- Supports custom tests for additional data quality checks.
-
Deployment on BigQuery
- Leverages BigQuery’s scalable infrastructure for efficient data processing.
- Integrates easily with Google Cloud Storage for raw data ingestion.
Below is a high-level star schema diagram of the Sales Orders model:
Fact Table
fact_sales_order– Contains metrics and foreign keys to related dimensions (e.g.,customer_key,product_key).
Dimension Tables
dim_customer– Contains customer attributes (e.g., name, credit limit, address).dim_person– Contains details about people involved (e.g., salespersons, contact persons).dim_product– Contains product attributes (e.g., brand, supplier, category).dim_category,dim_supplier,dim_color, etc. – Additional dimensions for product hierarchy and reference data.dim_date– Provides date-related attributes for time-based analysis.dim_sales_order_line_indicator– A junk dimension used to store miscellaneous sales order line indicators.dim_customer_attribute– Aggregates customer segmentation attributes from the fact sales order line data.- This dimension calculates key performance metrics (e.g., lifetime monetary amount, lifetime sales order quantity, last month metrics) and assigns segmentation labels (High, Medium, Low) based on percentile ranks.
-
Diagrams:
Find the project diagrams in thediagramsfolder. -
Data Models:
The data models are located in themodels/analyticsfolder. -
Data Quality Tests:
Explore the tests in themodels/analytics/testssfolder.
-
Raw Dataset:
Wide World Importers -
Database:
BigQuery -
Course:
Data Warehouse by Vitlamdata
- Kimball Dimensional Modeling:
The Data Warehouse Toolkit by Ralph Kimball. - dbt Documentation:
dbt (Data Build Tool) - Vitlamdata Course:
Data Warehouse by Vitlamdata
