
yugyesh/song_play_analysis_data_lake


SONG PLAY ANALYSIS WAREHOUSE

Purpose of the project

This project was developed to move Sparkify's analytics from a data warehouse to a data lake, driven by the tremendous growth of Sparkify's data.

About datasets

The song data and log data are JSON files stored in Amazon S3.

Tech stack

  • Python
  • Spark
  • Amazon RDS
  • Notion for project management
  • GitHub for version control

Usage manual

  • Create a dwh.cfg file to store your AWS credentials
  • Execute etl.py to transform the JSON files stored in S3 into Parquet files, which are written back to S3
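The first step, storing credentials in dwh.cfg, can be sketched with Python's standard configparser module. This is a minimal sketch, not the project's actual code: the [AWS] section name and the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY keys are assumptions, and the functions load_aws_credentials and export_credentials are hypothetical helpers. Against a real file you would call config.read("dwh.cfg") instead of read_string.

```python
import configparser
import os

# Hypothetical dwh.cfg layout: the section and key names are assumptions
# for illustration, not taken from the project. Never commit real keys.
SAMPLE_CFG = """
[AWS]
AWS_ACCESS_KEY_ID = your-access-key-id
AWS_SECRET_ACCESS_KEY = your-secret-access-key
"""


def load_aws_credentials(text: str) -> dict:
    """Parse config text and return the AWS credentials as a dict."""
    config = configparser.ConfigParser()
    config.read_string(text)  # for a real file: config.read("dwh.cfg")
    section = config["AWS"]
    return {
        "AWS_ACCESS_KEY_ID": section["AWS_ACCESS_KEY_ID"],
        "AWS_SECRET_ACCESS_KEY": section["AWS_SECRET_ACCESS_KEY"],
    }


def export_credentials(creds: dict) -> None:
    """Expose the credentials as environment variables so that AWS
    clients started later in the same process can read them."""
    for key, value in creds.items():
        os.environ[key] = value


if __name__ == "__main__":
    creds = load_aws_credentials(SAMPLE_CFG)
    export_credentials(creds)
    print(sorted(creds))
```

Exporting the values as environment variables, rather than hard-coding them in etl.py, keeps secrets out of version control; adding dwh.cfg to .gitignore is a sensible companion step.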

Files description

The project consists of a single script, etl.py, which transforms the semi-structured JSON files. The repository also includes etl.ipynb, a notebook used for testing.
