Skip to content

erdemah/NanoDegree_DataEngineering_DataModelling_Postgres

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This project includes data modelling and an ETL (Extract, Transform, Load) job for song data and log data. The purpose of the project is to analyze data easily coming from the new music streaming app called Sparkify. Any insight from such analyses is important to businesses which can help them understand their customers better, provide better services. To make data analysis and reporting easier, the data are pipelined through a Postgres database. During this job, the Postgres database model has been created using a star schema with a total of 5 tables - 1 Fact, 4 dimensional tables.

sql_queries.py: PostgreSQL drop, create, insert tables. Data modelling occurs here.
create_tables.py: A wrapper built on top sql_queries.py to make the database console commands easier.
test.ipynb: Every time you manipulate the database, run this file to check if the code is doing what is supposed to be doing.Restart Jupyter Kernel for cleanup in every run.
etl.ipynb: A prototype of etl.py
etl.py: Data is extracted, transformed and loaded into PostgreSQL database.

Build Environment: Python 3.7
PostgreSQL 12.2

About

Udacity Nano Degree Data Engineering Program

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published