SONG PLAY ANALYSIS WAREHOUSE

Purpose of the project

This project was developed to move Sparkify's analytics from a data warehouse to a data lake, in response to the tremendous growth of Sparkify's data.

About datasets

The song data and log data are JSON files stored in Amazon S3.

Tech stack

Python
Spark
Amazon RDS
Notion for project management
GitHub for version control

Usage manual

Create a dwh.cfg file to store your AWS credentials.
Run etl.py to transform the JSON files stored in S3 into Parquet files, which are written back to S3.
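
The credentials step above could be read back from Python roughly as follows. This is a minimal sketch, not the code in etl.py: the [AWS] section name, the two key names, and the load_aws_credentials helper are assumptions chosen for illustration, so adjust them to match your own dwh.cfg.

```python
import configparser
import os


def load_aws_credentials(path="dwh.cfg"):
    """Read AWS credentials from an INI-style file and export them as env vars.

    Assumed (hypothetical) layout of dwh.cfg:

        [AWS]
        AWS_ACCESS_KEY_ID = ...
        AWS_SECRET_ACCESS_KEY = ...
    """
    # interpolation=None keeps '%' characters in secrets from being
    # treated as configparser interpolation markers.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        raise FileNotFoundError(f"config file not found: {path}")

    key_id = config["AWS"]["AWS_ACCESS_KEY_ID"]
    secret = config["AWS"]["AWS_SECRET_ACCESS_KEY"]

    # Export the values so that AWS clients which read these standard
    # environment variables can pick them up.
    os.environ["AWS_ACCESS_KEY_ID"] = key_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = secret
    return key_id, secret
```

Keep dwh.cfg out of version control (for example by listing it in .gitignore) so the credentials are never committed.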

Files description

The project consists of a single script named etl.py, which is responsible for transforming the semi-structured JSON files. The project also includes etl.ipynb, which was used for testing purposes.
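
To illustrate the kind of transformation described here, the following is a hedged sketch of a JSON-to-Parquet step in PySpark. It is not the contents of etl.py: the function names, the table name "songs", and the S3 paths in the usage note are hypothetical, and the real script may select columns and build several tables.

```python
def table_path(output_root, table):
    """Build the output location for one table under the data lake root."""
    return f"{output_root.rstrip('/')}/{table}/"


def json_to_parquet(input_path, output_root, table):
    """Read JSON files with Spark and write them back out as Parquet."""
    # Imported here so table_path can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("song_play_etl_sketch").getOrCreate()
    try:
        # Spark infers the schema from the semi-structured JSON records.
        df = spark.read.json(input_path)
        # Overwrite so that repeated runs replace the previous output.
        df.write.mode("overwrite").parquet(table_path(output_root, table))
    finally:
        spark.stop()
```

A call might look like json_to_parquet("s3a://my-input-bucket/song_data/", "s3a://my-output-bucket/lake", "songs"), where both bucket names are placeholders.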

yugyesh/song_play_analysis_data_lake

Folders and files

Latest commit

History

Repository files navigation

SONG PLAY ANALYSIS WAREHOUSE

Propose of the project

About datasets

Tech stack

Usage manual

Files description

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages