
Analysing movies on iTunes and other open movie databases


zacharyang/movies-project


Analysing Movies

In this project, I will be analysing movies on iTunes and other open movie databases as part of my capstone project for a data science course at General Assembly.

Should I see this movie in the cinema or on iTunes?


I watch a lot of movies, but I can't always afford to go to the cinema, and Netflix doesn't always have a good selection of the latest releases. So I've been watching more movies on Video On Demand (VOD) platforms such as iTunes, Amazon Prime Video and Hooq.

So is it really worth waiting for the DVD release? How can I predict the price at which a movie will be listed on iTunes?

For a VOD provider, how can I predict the price at which a rival VOD provider, such as iTunes, would list a movie?

Data Collection

The first step in my project is to collect data. You can access the movie data here. I've collected data from:

  • iTunes RSS feed - Supplies the latest and hottest movies on the iTunes store across different countries.
  • iTunes Search API - Allows us to search the iTunes store by movie title or movie ID, returning data such as the listing price on the iTunes store.
  • Open Movie Database API (OMdb) - Contains information on movies such as the IMDb ID, IMDb rating, cast, directors, etc.
  • The Movie Database API (TMdb) - Similar to OMdb, but with less complete information; it supports more queries per minute.
  • Box Office Mojo - Box office earnings for a large number of movies over the years.
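As a rough illustration of the iTunes Search API step, here is a minimal Python sketch (not the code in collect_data.py) that builds a search URL and pulls out listing prices. It assumes the public endpoint https://itunes.apple.com/search with the term, media, entity, country and limit parameters, and the trackId, trackName, trackPrice and currency fields in each result; the function names are hypothetical.

```python
"""Minimal sketch: look up movie listing prices via the iTunes Search API.

Assumes the public endpoint https://itunes.apple.com/search and the result
fields trackId, trackName, trackPrice and currency. Function names are
hypothetical; this is not the project's collect_data.py.
"""
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(title, country="us", limit=5):
    """Build a search URL for movies matching `title` in one country's store."""
    params = {
        "term": title,
        "media": "movie",
        "entity": "movie",
        "country": country,
        "limit": limit,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_prices(payload):
    """Extract (trackId, trackName, trackPrice, currency) tuples from a response.

    Any field missing from a result comes back as None.
    """
    rows = []
    for item in payload.get("results", []):
        rows.append((
            item.get("trackId"),
            item.get("trackName"),
            item.get("trackPrice"),
            item.get("currency"),
        ))
    return rows


def search_prices(title, country="us"):
    """Query the live API (needs network access) and return parsed prices."""
    with urllib.request.urlopen(build_search_url(title, country), timeout=10) as resp:
        return parse_prices(json.load(resp))
```

For example, search_prices("Blade Runner 2049", country="sg") would query the Singapore store. Keeping the parsing separate from the network call makes it easy to test against saved responses and to repeat the same lookup across several countries.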

If you'd like to follow my data collection process, you can clone my repository and run python collect_data.py in the cloned local repository. Before you do that, however, you'll need to do a few things:

  1. Obtain the OMdb and TMdb API keys from their websites.
  2. Create a file named private.py in your local repo that defines two keys: TMDB_API_KEY = "yourkeyhere" and OMDB_API_KEY = "yourkeyhere".
  3. Tweak settings.py to set up your data collection parameters.
  4. Run the collect_data.py script!
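Once private.py exists, a collection script can import the keys from it. The sketch below is a hypothetical example, not the project's actual collect_data.py: it reads OMDB_API_KEY from private.py (falling back to an environment variable of the same name so it still runs without the file), builds an OMDb lookup URL by IMDb ID using the assumed i and apikey query parameters, and converts the imdbRating field to a number. Since private.py holds your keys, it is worth listing it in .gitignore so it isn't committed.

```python
"""Minimal sketch: fetch one movie's details from the OMDb API.

Assumes private.py (step 2) defines OMDB_API_KEY, and that OMDb accepts the
query parameters i (IMDb ID) and apikey. Function names are hypothetical.
"""
import json
import os
import urllib.parse
import urllib.request

try:
    from private import OMDB_API_KEY  # the file created in step 2
except ImportError:
    # Fallback so the sketch runs without private.py.
    OMDB_API_KEY = os.environ.get("OMDB_API_KEY", "")

OMDB_URL = "http://www.omdbapi.com/"


def build_omdb_url(imdb_id, api_key=OMDB_API_KEY):
    """Build an OMDb lookup URL for a single IMDb ID string."""
    return OMDB_URL + "?" + urllib.parse.urlencode({"i": imdb_id, "apikey": api_key})


def parse_rating(payload):
    """Return imdbRating as a float, or None for 'N/A' or an error response."""
    if payload.get("Response") != "True":
        return None
    rating = payload.get("imdbRating", "N/A")
    return None if rating == "N/A" else float(rating)


def fetch_movie(imdb_id):
    """Query the live API (needs network access and a valid key)."""
    with urllib.request.urlopen(build_omdb_url(imdb_id), timeout=10) as resp:
        return json.load(resp)
```

Separating URL building and parsing from the request keeps the API key handling in one place and lets the rating conversion be checked without spending any of the API's query quota.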

Exploration

Over the next few weeks, I'll be exploring the data and deciding how I'd like to model it. Check out my exploration of the iTunes movie dataset and of the OMdb movie dataset.

Check back for more updates in this repository.

Model Training and Selection

Check back here for updates on my progress for modelling. I'll be updating this section as I learn more about new machine learning models every week.

Prediction results

Check back soon for updates on whether I can successfully predict the prices of movies on the iTunes store!

Feedback

Your feedback on my data collection, exploration and modelling is much appreciated! Get in touch with me on LinkedIn or on my blog.
