Analysing Movies

In this project, I will be analysing movies on iTunes and other open movie databases as my capstone project for a data science course at General Assembly.

Should I see this movie in the cinema or on iTunes?

I watch a lot of movies, but I can't always afford to go to the cinema, and Netflix doesn't always have a good selection of the latest releases. So I've been watching more on Video On Demand (VOD) platforms such as iTunes, Amazon Prime Video or Hooq.

So is it really worth it for me to wait for the DVD release? How can I predict the price at which a movie will be listed on iTunes?

For a VOD provider, how can I predict the price at which a rival VOD provider, like iTunes, would list a movie?

Data Collection

The first step in my project is to collect a bunch of data. You can access the movie data here. I've gotten data from:

iTunes RSS feed - Supplies the latest and most popular movies on the iTunes store across different countries.
iTunes Search API - Lets us search the iTunes store by movie title or movie ID and returns data such as the listing price on the iTunes store.
Open Movie Database API (OMDb) - Contains information on movies such as the IMDb ID, IMDb rating, cast, directors, etc.
The Movie Database API (TMDb) - Similar to OMDb, with less complete information but support for more queries per minute.
Box Office Mojo - Box office earnings for a large number of movies over the years.
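To give a feel for the kind of data the iTunes Search API returns, here is a minimal sketch, not the code in collect_data.py. The helper names (build_search_url, extract_prices) are hypothetical, the sample payload is made up for illustration, and the field names (trackId, trackName, trackPrice, currency) are the ones I'd expect to see in a movie result, so check them against a real response before relying on them.

```python
from urllib.parse import urlencode

# Public iTunes Search endpoint; the parameters below are assumptions
# about a sensible movie query, not the settings used in this project.
ITUNES_SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(title, country="us", limit=5):
    """Build a search URL for movies matching `title` in one storefront."""
    params = {
        "term": title,
        "media": "movie",
        "entity": "movie",
        "country": country,
        "limit": limit,
    }
    return f"{ITUNES_SEARCH_URL}?{urlencode(params)}"


def extract_prices(payload):
    """Pull the price-related fields out of a decoded JSON response."""
    rows = []
    for item in payload.get("results", []):
        rows.append(
            {
                "track_id": item.get("trackId"),
                "title": item.get("trackName"),
                "price": item.get("trackPrice"),  # may be missing for some titles
                "currency": item.get("currency"),
            }
        )
    return rows


# Made-up payload in the general shape of a search response (illustrative only).
sample_payload = {
    "resultCount": 1,
    "results": [
        {
            "trackId": 123456789,
            "trackName": "Example Movie",
            "trackPrice": 9.99,
            "currency": "USD",
        }
    ],
}

search_url = build_search_url("Example Movie")
price_rows = extract_prices(sample_payload)
```

In a real run, the URL would be fetched (for example with urllib.request or requests) and the decoded JSON passed to extract_prices in place of the sample payload.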

If you'd like to follow my data collection process, you can clone my repository and run python collect_data.py from inside the cloned local repository. Before you do that, you'll need to do a few things:

Obtain OMDb and TMDb API keys from their websites.
Create a file called private.py in your local repo that defines the two keys as TMDB_API_KEY="yourkeyhere" and OMDB_API_KEY="yourkeyhere".
Tweak settings.py to set up your data collection parameters.
Run the collect_data.py script!
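As a rough illustration of the steps above, private.py only needs to define the two constants. The sketch below also shows one way a script might use the OMDb key; omdb_url is a hypothetical helper, not something defined in this repository, and the i and apikey query parameters are what I understand the OMDb API to expect. Since the file holds secrets, you will probably want to keep it out of version control.

```python
from urllib.parse import urlencode

# Contents of private.py: placeholder values, replace with your own keys.
TMDB_API_KEY = "yourkeyhere"
OMDB_API_KEY = "yourkeyhere"

# In another script these would typically be imported with:
#   from private import TMDB_API_KEY, OMDB_API_KEY


def omdb_url(imdb_id, api_key=OMDB_API_KEY):
    """Build an OMDb lookup URL for one IMDb ID (hypothetical helper)."""
    query = urlencode({"i": imdb_id, "apikey": api_key})
    return f"https://www.omdbapi.com/?{query}"


example_url = omdb_url("tt0133093")
```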

Exploration

Over the next few weeks, I'll be exploring the data and deciding how I'd like to model it. Check out my exploration of the iTunes movie dataset and of the OMDb movie dataset.

Check back for more updates in this repository.

Model Training and Selection

Check back here for updates on my modelling progress. I'll be updating this section as I learn about new machine learning models every week.

Prediction results

Check back soon for updates on whether I can successfully predict the prices of movies on the iTunes store!

Feedback

Your feedback on my data collection, exploration and modelling is much appreciated! Get in touch with me on LinkedIn or on my blog.

