"Alexa, can you handle big data?"

Background

Amazon's shoppers depend on product reviews when making a purchase. While Amazon makes these datasets publicly available, it can exceed the capacity of average local machines to handle the data that has over 1.5 million rows; with over 40 datasets. The goal is to perform the ETL process in the cloud and upload a DataFrame to an RDS instance, and to perform a statistical analysis of selected data.

Instructions

Level 1

Use the furnished schema to create tables in RDS database.
Create two separate Google Colab notebooks.
Extract two datasets from the Review Dataset, and count the number of reviews.
Transform the dataset to fit the tables in the Schema File.
Load the DataFrames that correspond to tables into an RDS instance.

Level 2

Investigate whether Amazon's Vine Customer Reviews are free of bias by using either PySpark or SQL to analyze the data.

Number of records (number of vine reviews): 1. Video Games: 1,785,886 (4,290) 2. Software: 341,913 (10,415)
Despite the number of records for the video games were higher than that of the software, the total Vine reviews for video games was lower than the software. The result analysis indicates that the Vine reviewers tend to provide high star ratings (4 or 5) while there are limited records that are labled as "helpful votes". It can be concluded that the Amazon's Vine program may not always be trustworthy.

Further Considerations

Delete RDS password and endpoint when uploading the Jupyter Notebooks: Level 1 and Level 2
Make sure to clean up all AWS resources by referring to AWS cleanup guide and AWS check billing guide

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
Images		Images
Resources		Resources
level-1		level-1
level-2		level-2
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

"Alexa, can you handle big data?"

Background

Instructions

Level 1

Level 2

Further Considerations

About

Releases

Packages

Languages

toshitorihara/big-data-challenge

Folders and files

Latest commit

History

Repository files navigation

"Alexa, can you handle big data?"

Background

Instructions

Level 1

Level 2

Further Considerations

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages