In this project, I will be analysing movies on iTunes and other open movie databases as part of my capstone project for a data science course at General Assembly.
I watch a lot of movies but I can't always afford to go to the movies and Netflix doesn't always have a good selection of the latest movies. So I've been watching more on Video On Demand (VOD) platforms such as iTunes, Amazon Prime Video or Hooq.
So is it really worth it for me to wait for the DVD release? How can I predict the price at which a movie will be listed on iTunes?
For a VOD provider, how can I predict the price at which a rival VOD provider, like iTunes, would list a movie?
The first step in my project is to collect a bunch of data. You can access the movie data here. I've gotten data from:
- iTunes RSS feed - Supplying the latest and hottest movies on the iTunes store across different countries.
- iTunes Search API - Allows us to search the iTunes store by movie or movie ID and returns data such as the listing price on the iTunes store.
- Open Movie Database API (OMdb) - Contains information on movies such as the IMdb ID, IMdb Rating, Cast, Directors etc.
- The Movie Database API (TMdb) - Similar to OMdb, with less complete information but support for more queries per minute.
- Box Office Mojo - Details on box office earnings from tons of movies over the years.
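As a rough illustration of the iTunes Search API step, here is a minimal sketch of how a price lookup could work. The function names (`build_search_url`, `extract_prices`, `search_movies`) are hypothetical and are not taken from `collect_data.py`; the response fields used (`trackId`, `trackName`, `trackPrice`, `currency`) are the ones I understand the Search API to return for movies, so treat them as assumptions and check them against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ITUNES_SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(term, country="us", limit=5):
    """Build an iTunes Search API URL restricted to movies."""
    params = {
        "term": term,
        "country": country,  # two-letter store code, e.g. "us" or "sg"
        "media": "movie",
        "entity": "movie",
        "limit": limit,
    }
    return f"{ITUNES_SEARCH_URL}?{urlencode(params)}"


def extract_prices(payload):
    """Pull (id, title, price, currency) tuples out of a decoded response."""
    rows = []
    for item in payload.get("results", []):
        rows.append(
            (
                item.get("trackId"),
                item.get("trackName"),
                item.get("trackPrice"),  # assumed field; None if absent
                item.get("currency"),
            )
        )
    return rows


def search_movies(term, country="us", limit=5):
    """Query the live API (needs network access) and return price rows."""
    with urlopen(build_search_url(term, country, limit), timeout=10) as resp:
        return extract_prices(json.load(resp))
```

Calling `search_movies("Inception", country="us")` should give a list of tuples that can be written straight to a CSV or loaded into a DataFrame. Keeping URL building and parsing separate from the network call makes both easy to test offline.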
If you'd like to follow my data collection process, you can clone my repository and run `python collect_data.py` in the cloned local repository. However, before you do that, you'll need to do a few things:
- Obtain the OMdb and TMdb API keys from their websites.
- Create a file in your local repo, `private.py`, that contains two keys as `TMDB_API_KEY="yourkeyhere"` and `OMDB_API_KEY='yourkeyhere'`.
- Tweak `settings.py` to set up your data collection parameters.
- Run the `collect_data.py` script!
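If you want to confirm that `private.py` is set up correctly before starting a long collection run, a small check like the one below may help. This is only a sketch: `load_api_keys` is a hypothetical helper, not something that exists in the repository, and it assumes the two key names described above.

```python
import importlib


def load_api_keys(module_name="private"):
    """Return (TMDB_API_KEY, OMDB_API_KEY) read from the given module.

    Raises RuntimeError with a readable message if the module or a key
    is missing, or if a key still holds the placeholder value.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Could not import {module_name}.py; create it before collecting data."
        ) from exc

    keys = []
    for name in ("TMDB_API_KEY", "OMDB_API_KEY"):
        value = getattr(module, name, None)
        if not value or value == "yourkeyhere":
            raise RuntimeError(f"{name} is not set in {module_name}.py")
        keys.append(value)
    return tuple(keys)
```

Running `tmdb_key, omdb_key = load_api_keys()` from the repository root should either return both keys or fail straight away with a message naming what is missing.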
Over the next few weeks, I'll be exploring the data and deciding on how I'd like to model the data. Check out my exploration on the iTunes movie dataset and on the OMdb movie dataset.
Check back for more updates in this repository.
Check back here for updates on my modelling progress. I'll be updating this section as I learn about new machine learning models every week.
Check back soon for updates on whether I can successfully predict the prices of movies on the iTunes store!
Your feedback on my data collection, exploration and modelling is much appreciated! Get in touch with me on LinkedIn or on my blog.