
Project 8: Going to the Movies


  • Oct 27: we cover plotting in the labs on Oct 31 or Nov 1, so just leave the relevant questions until Lab-P8b is released.
  • Oct 29: fixed typo in hint for Q16
  • Oct 29: fixed hint for Q29 and Q30 to include axis label


Having worked our way through soccer and hurricanes, we are now going to work on the IMDB Movies Dataset. A very exciting fortnight lies ahead where we find out some cool facts about our favorite movies, actors, and directors.

You'll hand in a main.ipynb file for this project; use the usual #qN format. Start by downloading the following files: small_mapping.csv, small_movies.csv, mapping.csv, and movies.csv.

The Data

By stage 2, you will be working mainly with movies.csv and mapping.csv. The small_movies.csv and small_mapping.csv files have been provided to help you get your core logic working in stage 1 with some simpler data.

small_movies.csv and movies.csv have 6 columns: title, year, rating, directors, actors, and genres

Here are a few rows from movies.csv:


small_mapping.csv and mapping.csv have 2 columns: id and name

Here are a few rows from mapping.csv:

nm0000001,Fred Astaire
nm0000004,John Belushi
nm0000007,Humphrey Bogart
tt0110997,The River Wild

Each of those weird alphanumeric sequences is a unique identifier for an actor, a director, or a movie title.
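To make the id-to-name idea concrete, here is a minimal sketch that turns the four sample rows shown above into a Python dictionary. The function name build_mapping is hypothetical, and the sketch assumes the file has no header row; if mapping.csv does start with a header line, that line would need to be skipped first.

```python
import csv
import io

# Sample rows copied from the mapping.csv excerpt above.
SAMPLE_ROWS = """nm0000001,Fred Astaire
nm0000004,John Belushi
nm0000007,Humphrey Bogart
tt0110997,The River Wild
"""


def build_mapping(lines):
    """Return a dict from id to name, given an iterable of CSV lines.

    Hypothetical helper; assumes each useful row has exactly two fields.
    """
    mapping = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        key, name = row
        mapping[key] = name
    return mapping


mapping = build_mapping(io.StringIO(SAMPLE_ROWS))
print(mapping["tt0110997"])  # The River Wild
```

With a real file, the same function could be given an open file object instead of the io.StringIO wrapper, since csv.reader accepts any iterable of lines.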

The Stages

This project is bigger than usual, so it's broken into two parts, and you have more time. We recommend trying to complete stage 1 within one week so you have time for stage 2.

  • Stage 1: combine the data from the movie and mapping files into a more useful format.
  • Stage 2: use the combined data to answer questions about movies, directors, and actors.
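One way to picture the stage 1 goal is as a lookup that swaps each id in a movie record for its name. The sketch below is only an illustration under stated assumptions: the helper resolve_ids is hypothetical, the record is made up for the example and is not a real row from movies.csv, and it assumes the directors and actors fields have already been split into lists of ids.

```python
def resolve_ids(movie, mapping):
    """Return a copy of the title/directors/actors fields with ids replaced by names.

    Hypothetical helper. Ids missing from the mapping are left unchanged.
    """
    def name_of(key):
        return mapping.get(key, key)

    return {
        "title": name_of(movie["title"]),
        "directors": [name_of(d) for d in movie["directors"]],
        "actors": [name_of(a) for a in movie["actors"]],
    }


# Ids and names taken from the mapping.csv excerpt in this document.
mapping = {
    "nm0000001": "Fred Astaire",
    "nm0000004": "John Belushi",
    "nm0000007": "Humphrey Bogart",
    "tt0110997": "The River Wild",
}

# Made-up record for illustration only; not a real row from movies.csv.
movie = {
    "title": "tt0110997",
    "directors": ["nm0000007"],
    "actors": ["nm0000001", "nm0000004"],
}

combined = resolve_ids(movie, mapping)
print(combined["title"])  # The River Wild
```

Falling back to the raw id when a key is missing is a design choice in this sketch; raising an error instead would surface gaps in the mapping earlier.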