This project builds an ETL pipeline that extracts the data from S3, stages it in Redshift, and transforms it into a set of dimensional tables, so that Data Scientists can continue finding insights in the data stored in the data warehouse.
In this project we use two Amazon Web Services resources:
- AWS S3: storage of the raw source data.
- AWS Redshift: staging area and data warehouse for the dimensional tables.
Data Pipeline design: At a high level, the pipeline performs the following tasks (a minimal sketch of the load and transform steps follows the list).
- Extract data from multiple S3 locations.
- Load the data into the Redshift cluster.
- Transform the data into a star schema.
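To make these steps concrete, here is a minimal sketch of the kind of SQL the pipeline runs, written as Python string constants. It is illustrative only: the staging_events table, the users dimension, the bucket path, and the IAM role ARN are hypothetical placeholders, not the project's actual names.

```python
# Illustrative only: table names, columns, bucket path, and IAM role ARN
# below are hypothetical placeholders, not the project's actual schema.

# 1) Extract/Load: COPY raw JSON files from S3 into a Redshift staging table.
staging_events_copy = """
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/exampleRedshiftRole'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2';
"""

# 2) Transform: deduplicate staged rows into a dimension table of the star schema.
users_table_insert = """
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    SELECT DISTINCT user_id, first_name, last_name, gender, level
    FROM staging_events
    WHERE user_id IS NOT NULL;
"""
```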
This is the schema of the database
This is the schema of the data warehouse (star schema)
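For context, a star schema pairs a fact table with several dimension tables. The sketch below shows what such definitions could look like in Redshift, written as Python string constants in the style of sql_queries.py; it reuses the same hypothetical staging_events/users names from the sketch above, and the columns and distribution/sort keys are assumptions, not the project's actual schema.

```python
# Hypothetical star-schema definitions; not the project's actual tables.

fact_events_create = """
    CREATE TABLE IF NOT EXISTS fact_events (
        event_id    BIGINT IDENTITY(0, 1) PRIMARY KEY,
        user_id     INT       NOT NULL,
        start_time  TIMESTAMP NOT NULL SORTKEY,
        item_id     INT       NOT NULL DISTKEY
    );
"""

dim_users_create = """
    CREATE TABLE IF NOT EXISTS users (
        user_id     INT PRIMARY KEY,
        first_name  VARCHAR,
        last_name   VARCHAR,
        gender      VARCHAR,
        level       VARCHAR
    ) DISTSTYLE ALL;  -- small dimension table, replicated to every node
"""
```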
The project structure is:
- create_tables.py - This script drops old tables (if they exist) and re-creates new tables.
- etl.py - This script orchestrates the ETL process (see the sketch after this list).
- sql_queries.py - This file holds the ETL logic itself; all the SQL transformations are defined here.
- /img - Directory with images that are used in this markdown document
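As an illustration of how etl.py could orchestrate the ETL, here is a minimal sketch. It assumes psycopg2 is installed, that the configuration file described below has a CLUSTER section with the connection settings, and that sql_queries.py exports two lists of SQL strings (the names copy_table_queries and insert_table_queries are hypothetical).

```python
import configparser

import psycopg2

# Hypothetical names: sql_queries.py is assumed to export one list of COPY
# statements (staging loads) and one list of INSERT statements (star schema).
from sql_queries import copy_table_queries, insert_table_queries


def run_queries(cur, conn, queries):
    """Execute each SQL statement and commit after it finishes."""
    for query in queries:
        cur.execute(query)
        conn.commit()


def main():
    # Read the Redshift connection settings from the config file (see below).
    config = configparser.ConfigParser()
    config.read("dhw.cfg")
    cluster = config["CLUSTER"]  # assumed section name

    conn = psycopg2.connect(
        host=cluster["HOST"],
        dbname=cluster["DB_NAME"],
        user=cluster["DB_USER"],
        password=cluster["DB_PASSWORD"],
        port=cluster["DB_PORT"],
    )
    cur = conn.cursor()

    run_queries(cur, conn, copy_table_queries)    # load staging tables from S3
    run_queries(cur, conn, insert_table_queries)  # fill the dimensional tables

    conn.close()


if __name__ == "__main__":
    main()
```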
We also need an extra file named dhw.cfg with the credentials and information about the AWS resources (a hypothetical layout is sketched below).
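A possible layout of that file is sketched here. The section and key names are assumptions chosen to match the etl.py sketch above, and the values are placeholders; adapt them to the actual cluster and bucket.

```ini
[CLUSTER]
HOST=<redshift-cluster-endpoint>
DB_NAME=<database-name>
DB_USER=<database-user>
DB_PASSWORD=<database-password>
DB_PORT=5439

[IAM_ROLE]
ARN=<iam-role-arn-with-read-access-to-s3>

[S3]
SOURCE_DATA=s3://<bucket>/<path-to-source-data>
```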