This repository contains the development process and artifacts for building a data warehouse. The data warehouse is designed to store and manage data related to customers, channels, products, dates, purchase history, and visit history.
The project is organized as follows:
- models/: Contains YAML files defining the data model, including dimension and fact tables.
- README.md: This README file, providing an overview of the project.
- Other relevant files and documentation.
The data model consists of the following components:
- dim_customers: Contains details of all customers, including anonymous users who used guest checkout.
- dim_channels: Stores data related to different channels.
- dim_products: Holds information about products.
- dim_date: Stores date-related data.
- fct_purchase_history: Records customers' order history.
- fct_visit_history: Tracks customers' visit history.
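The components above form a star schema: each fact table references the dimensions by key. The following is a minimal sketch of that layout using Python's built-in sqlite3 module. Every column name (customer_key, is_guest, quantity, amount, and so on) and the choice of which dimensions each fact table references are assumptions made for illustration; the YAML files under models/ remain the source of truth.

```python
import sqlite3

# All column names are hypothetical; only the six table names come from this README.
DDL = """
CREATE TABLE dim_customers (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    is_guest INTEGER NOT NULL DEFAULT 0  -- 1 for anonymous guest-checkout users
);
CREATE TABLE dim_channels (channel_key INTEGER PRIMARY KEY, channel_name TEXT NOT NULL);
CREATE TABLE dim_products (product_key INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT NOT NULL);
CREATE TABLE fct_purchase_history (
    purchase_key INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customers (customer_key),
    channel_key INTEGER NOT NULL REFERENCES dim_channels (channel_key),
    product_key INTEGER NOT NULL REFERENCES dim_products (product_key),
    date_key INTEGER NOT NULL REFERENCES dim_date (date_key),
    quantity INTEGER NOT NULL,
    amount REAL NOT NULL
);
CREATE TABLE fct_visit_history (
    visit_key INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customers (customer_key),
    channel_key INTEGER NOT NULL REFERENCES dim_channels (channel_key),
    date_key INTEGER NOT NULL REFERENCES dim_date (date_key)
);
"""


def create_schema(conn):
    """Create the dimension and fact tables with foreign keys enforced."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)


def table_names(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(table_names(conn))
```

SQLite is used here only because it ships with Python; the same layout carries over to whichever database system the project targets.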
The development process follows these steps:
- Model Definition: Define the data model, including dimension and fact tables, in YAML format under the models/ directory.
- Validation: Validate the data model for correctness, completeness, and consistency.
- Implementation: Implement the data model in the chosen database system, ensuring proper schema design and indexing for performance.
- Data Loading: Load data into the warehouse from sources such as transactional databases, CSV files, and APIs.
- Testing: Test the data warehouse to ensure data integrity, accuracy, and performance.
- Documentation: Document the data warehouse schema, ETL processes, and any other relevant information.
- Deployment: Deploy the data warehouse to the production environment.
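As an illustration of the Validation step, the sketch below checks a model definition for two basic consistency problems: duplicate dimension names and fact tables that reference undefined dimensions. It assumes the YAML has already been parsed into a plain dictionary; the dictionary layout (the "dimensions" and "facts" keys), the validate_model helper, and the dimensions assigned to each fact table are hypothetical and may not match the repository's actual files.

```python
# Hypothetical shape of a parsed model definition; the real YAML layout may differ.
MODEL = {
    "dimensions": ["dim_customers", "dim_channels", "dim_products", "dim_date"],
    "facts": {
        "fct_purchase_history": [
            "dim_customers", "dim_channels", "dim_products", "dim_date",
        ],
        "fct_visit_history": ["dim_customers", "dim_channels", "dim_date"],
    },
}


def validate_model(model):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    dimensions = model.get("dimensions", [])
    facts = model.get("facts", {})

    # Consistency: each dimension should be declared exactly once.
    duplicates = sorted({d for d in dimensions if dimensions.count(d) > 1})
    for name in duplicates:
        problems.append(f"duplicate dimension: {name}")

    # Completeness: every fact table must reference only declared dimensions.
    known = set(dimensions)
    for fact, refs in facts.items():
        if not refs:
            problems.append(f"{fact}: references no dimensions")
        for ref in refs:
            if ref not in known:
                problems.append(f"{fact}: unknown dimension {ref}")
    return problems


print(validate_model(MODEL))
```

Returning a list of problems rather than raising on the first one lets a single run report every issue in the model.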
Contributions to the development process and improvement of the data warehouse are welcome. Please fork the repository, make your changes, and submit a pull request.
This project is licensed under the MIT License.