GitHub - timshin43/data-modeling-rdb: RDB data modeling project for Udacity

Database Description

The database sparkifydb is designed to keep track of what songs users play. The database is built as a star schema and consists of the following tables:

songplays (Fact Table) - records associated with song plays, users, artists, etc.
users (Dimension Table) - users in the app
songs (Dimension Table) - songs in the music database
artists (Dimension Table) - artists in the music database
time (Dimension Table) - timestamps of records in songplays broken down into specific units

The sparkifydb database lets you query all the necessary data about users, songs, artists, duration, etc. and build reports based on it.
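As a rough illustration of how a star schema like this is queried, the sketch below builds a tiny in-memory stand-in with Python's built-in sqlite3 module and joins the fact table to two dimension tables. The column names (songplay_id, user_id, first_name, song_id, title) and the sample rows are assumptions made for the example, not the project's actual table definitions, and the real project may use a different database engine.

```python
import sqlite3

# In-memory stand-in for sparkifydb; all column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users (user_id),
    song_id TEXT REFERENCES songs (song_id)
);
""")

# Made-up sample rows: one user who played the same song twice.
cur.execute("INSERT INTO users VALUES (1, 'Ada')")
cur.execute("INSERT INTO songs VALUES ('S1', 'Example Song')")
cur.executemany(
    "INSERT INTO songplays (user_id, song_id) VALUES (?, ?)",
    [(1, "S1"), (1, "S1")],
)

# The fact table carries the keys; the dimension tables supply readable attributes.
report = cur.execute("""
SELECT u.first_name, s.title, COUNT(*) AS plays
FROM songplays sp
JOIN users u ON u.user_id = sp.user_id
JOIN songs s ON s.song_id = sp.song_id
GROUP BY u.first_name, s.title
""").fetchall()

print(report)  # [('Ada', 'Example Song', 2)]
conn.close()
```

The same pattern extends to the artists and time tables: each report joins songplays to whichever dimensions it needs.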

How to run Python scripts

To create the database and launch the ETL process, run the two Python (.py) scripts in the following order:

create_tables.py
etl.py

To run a Python (.py) script, open a terminal (File->New->Terminal) and make sure you are in the right directory, for example by listing its contents with ls. If you are in the directory containing the script you want to run, type python3 create_tables.py or python3 etl.py.
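If you prefer to launch both scripts in one step, a small helper along these lines could do it. This is only a sketch: the helper names build_commands and run_all are hypothetical and not part of the repo, and it assumes it is started from the directory that contains the two scripts.

```python
import subprocess
import sys

# Order matters: the tables must exist before the ETL script loads data.
SCRIPTS = ["create_tables.py", "etl.py"]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Return one command per script, using the given Python interpreter."""
    return [[python, script] for script in scripts]


def run_all(commands):
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_all(build_commands()) from the repository directory would run create_tables.py and then etl.py, and raise subprocess.CalledProcessError if either script exits with an error.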

Files in the repo

test.ipynb displays the first few rows of each table to let you check your database.
create_tables.py drops and creates your tables. Run this file to reset your tables before each run of your ETL scripts.
etl.ipynb reads and processes a single file from song_data and log_data and loads the data into your tables. This notebook contains detailed instructions on the ETL process for each of the tables.
etl.py reads and processes files from song_data and log_data and loads them into your tables.
sql_queries.py contains all your SQL queries and is imported into the last three files above.
README.md provides discussion on your project.
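
The split between sql_queries.py and create_tables.py can be pictured roughly as follows: query strings are collected in lists, and a reset step runs every drop query and then every create query. This is a sketch only. The names drop_table_queries, create_table_queries and reset_tables, as well as the two-column tables, are assumptions, and sqlite3 is used here just so the example runs without a database server.

```python
import sqlite3

# Hypothetical stand-in for the query lists that sql_queries.py might expose.
TABLES = ["songplays", "users"]
drop_table_queries = [f"DROP TABLE IF EXISTS {table}" for table in TABLES]
create_table_queries = [
    "CREATE TABLE IF NOT EXISTS songplays (songplay_id INTEGER PRIMARY KEY, user_id INTEGER)",
    "CREATE TABLE IF NOT EXISTS users (user_id INTEGER PRIMARY KEY, first_name TEXT)",
]


def reset_tables(conn):
    """Drop every table, then recreate it empty."""
    cur = conn.cursor()
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
    conn.commit()


conn = sqlite3.connect(":memory:")
reset_tables(conn)
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# Running the reset again wipes the row inserted above.
reset_tables(conn)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # 0
```

Keeping the queries in one module means the notebooks and scripts share a single definition of each table.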


timshin43/data-modeling-rdb

Folders and files

Latest commit

History

Repository files navigation

Database Description

How to run Python scripts

Files in the repo

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages