Netflix Final Project

Rebecca Buerger, Jack Ewert, Nicholas Litterio, Waddah Moghram, Dhruv Vyas, Yuchen Zhang

Abstract

Over the last decade, Netflix has cemented its position as a leading streaming media provider. To maintain its dominance, Netflix awarded a one-million-dollar prize in 2009 for code that improved rating predictions on previously collected real-life customer data. Our team was given the same task, with two months to complete it. By the end of the first month, we had successfully read and visualized the original dataset provided by Netflix and identified strategies for proceeding with the project, including K-means clustering and Pearson's r correlation. Approaching the conclusion of our project, we were able to supplement about 60% of the movie titles with data from the IMDb online database. In addition, we included some time-series analysis of movie and user trends. This paper was submitted as part of a class entitled Big Data Analytics (IE:4172) on December 7, 2018.
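As a rough illustration of the Pearson correlation strategy mentioned above, the sketch below computes Pearson's r between two users over the movies both have rated. It is not taken from the project's source files: the function name `pearson_r`, the dictionary layout (movie ID mapped to rating), and the toy ratings are all assumptions made for this example.

```python
from math import sqrt


def pearson_r(ratings_a, ratings_b):
    """Pearson's r between two users, using only movies both have rated.

    Each argument is a hypothetical dict mapping movie ID -> rating.
    Returns None when the correlation is undefined (fewer than two
    shared movies, or one user gave identical ratings to all of them).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None

    xs = [ratings_a[m] for m in common]
    ys = [ratings_b[m] for m in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None

    return cov / sqrt(var_x * var_y)


# Toy data (made up for illustration): movie ID -> rating on a 1-5 scale.
alice = {1: 5, 2: 3, 3: 4, 4: 1}
bob = {1: 4, 2: 2, 3: 3, 5: 5}

print(pearson_r(alice, bob))
```

Only movies 1, 2, and 3 are shared here, and on those movies the second user's ratings are exactly one point lower than the first user's, so the two users are perfectly correlated even though their absolute ratings differ. This is why a correlation measure can be preferable to raw rating distance when comparing users.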

Please refer to the included report, BigDataNetFlixProjectFinalReport_Group_3_Alpaca.pdf, for more details.

Please note that some data files could not be uploaded to GitHub because they exceed the 250 MB size limit allowed by the server. However, these files are available on request, and the results can be reproduced by running the included source files on the existing data files, provided that the needed Python libraries are installed properly.

Update: The GitHub repository URL mentioned in the report is no longer valid. The updated URL is: https://github.com/waddahmoghram/BigDataNetflixProject2018
