## **This is a Jupyter Notebook Showing How We Obtained the IMDb IDs of Best Animated Feature Nominees**

To determine the IMDb IDs of the Best Animated Feature nominees, we first read in a text file of nominee titles taken directly from the Academy Awards database **(list_of_nominees.txt)**. The URL of the nominee list is given in the **award_url.txt** file located in the Data folder.

In [1]:
# importing packages

import numpy as np
import pandas as pd
from imdb import Cinemagoer


In [2]:
with open("../Data/list_of_nominees.txt", "r") as file:
    nominees = file.read()
    nominees = nominees.split('\n')


In [3]:
# sample nominee 
nominees[3]

'Monsters, Inc.'
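
Splitting on '\n' keeps blank lines, including the empty string left by a trailing newline at the end of the file, and each such entry would later be sent to the search. A minimal sketch of a more defensive read, assuming one title per line; the helper name `parse_titles` and the sample text are illustrative only and are not part of the original notebook:

```python
def parse_titles(text):
    """Return non-empty, whitespace-trimmed titles, one per input line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Illustrative sample; the real input is ../Data/list_of_nominees.txt
sample = "Shrek\nMonsters, Inc.\n\n  Ice Age  \n"
titles = parse_titles(sample)
print(titles)
```

In the notebook, the same helper could be applied to the result of `file.read()`.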

Next we read in the IMDb IDs of animated films released from 2001 to 2021 **(df_of_animated_movies.csv)**. This window starts with the first film year covered by the Best Animated Feature Academy Award.

In [6]:
ids = pd.read_csv('../Data/df_of_animated_movies.csv',
                           dtype = {'movie_id':'str'})

In [7]:
ids.head()

Unnamed: 0,movie_id
0,388130
1,273772
2,243017
3,291559
4,277909


In [8]:
imdb_ids = ids['movie_id'].values.tolist() # change ids to list
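
The `dtype` argument in the read step matters because the IDs are compared as strings later on, and several of them begin with zeros (for example `0126029` in the nominee table further down). A small sketch of what happens with and without it, using an in-memory CSV as a stand-in for the real file:

```python
import io

import pandas as pd

# In-memory stand-in for the CSV; the two IDs appear in outputs of this notebook.
csv_text = "movie_id\n0126029\n0245429\n"

as_str = pd.read_csv(io.StringIO(csv_text), dtype={"movie_id": "str"})
as_default = pd.read_csv(io.StringIO(csv_text))

print(as_str["movie_id"].tolist())      # strings, leading zeros kept
print(as_default["movie_id"].tolist())  # parsed as integers, leading zeros dropped
```

Without the string dtype, an ID such as `0126029` would no longer match its zero-padded string form.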

Now that we have a DataFrame of animated film IDs and a list of nominee titles, we need to convert the nominee titles to IMDb IDs. This can be done with the Cinemagoer object from the Cinemagoer package. Information about the package is available here: https://cinemagoer.github.io/

In [9]:
# build a list of nominee ids and a 0/1 indicator array aligned with imdb_ids
ia = Cinemagoer()
is_nominee = np.zeros(len(imdb_ids), dtype=int)
nominee_ids = []

for nom in nominees:
    results = ia.search_movie(nom)
    if results: # checking if search returned something
        # take the first result; this assumes the top hit is the intended film
        movie_id = results[0].getID()
        if movie_id in imdb_ids:
            idx = imdb_ids.index(movie_id)
            is_nominee[idx] = 1
            nominee_ids.append(movie_id)
            print(nom,' is a nominee!')

Jimmy Neutron: Boy Genius  is a nominee!
Monsters, Inc.  is a nominee!
Shrek  is a nominee!
Ice Age  is a nominee!
Lilo & Stitch  is a nominee!
Spirit: Stallion of the Cimarron  is a nominee!
Spirited Away  is a nominee!
Treasure Planet  is a nominee!
Brother Bear  is a nominee!
Finding Nemo  is a nominee!
The Triplets of Belleville  is a nominee!
The Incredibles  is a nominee!
Shark Tale  is a nominee!
Shrek 2  is a nominee!
Howl's Moving Castle  is a nominee!
Corpse Bride  is a nominee!
Wallace & Gromit in The Curse of the Were-Rabbit  is a nominee!
Cars  is a nominee!
Happy Feet  is a nominee!
Monster House  is a nominee!
Persepolis  is a nominee!
Ratatouille  is a nominee!
Surf's Up  is a nominee!
Bolt  is a nominee!
Kung Fu Panda  is a nominee!
WALL-E  is a nominee!
Coraline  is a nominee!
Fantastic Mr. Fox  is a nominee!
The Princess and the Frog  is a nominee!
The Secret of Kells  is a nominee!
Up  is a nominee!
How to Train Your Dragon  is a nominee!
The Illusionist (2010)  is 
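
The loop above looks each ID up in a Python list, which is scanned on every `in` and `index` call, and it silently skips titles whose first search hit is not in the list. A hedged sketch of the same idea with a dictionary lookup and a record of unmatched titles follows. The names `flag_nominees` and `search_first_id`, the stub index, and the ID `0999999` are hypothetical, introduced here so the example runs without network access:

```python
def flag_nominees(titles, imdb_ids, search_first_id):
    """Mark which entries of imdb_ids belong to a nominated title.

    search_first_id is any callable that maps a title to an ID string,
    or to None when nothing is found (hypothetical interface).
    """
    # Map each ID to its first position, mirroring list.index().
    position = {}
    for i, movie_id in enumerate(imdb_ids):
        position.setdefault(movie_id, i)

    is_nominee = [0] * len(imdb_ids)
    matched, unmatched = [], []
    for title in titles:
        movie_id = search_first_id(title)
        if movie_id in position:
            is_nominee[position[movie_id]] = 1
            matched.append(movie_id)
        else:
            unmatched.append(title)  # keep for manual review
    return is_nominee, matched, unmatched


# Stub standing in for an IMDb search; the mapping is illustrative only.
fake_index = {"Spirited Away": "0245429", "Shrek": "0126029"}
flags, matched, unmatched = flag_nominees(
    ["Shrek", "Spirited Away", "Not A Real Film"],
    ["0126029", "0999999", "0245429"],
    fake_index.get,
)
print(flags, matched, unmatched)
```

With Cinemagoer, `search_first_id` could be a small wrapper that calls `ia.search_movie(title)` and returns `results[0].getID()` when the result list is non-empty. Reviewing `unmatched` by hand is one way to confirm that no nominee was dropped.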

In [10]:
# There are 89 nominees, so len(nominee_ids) should be 89
len(nominee_ids)


89

In [11]:
# There should be 89 1's in is_nominee so its sum should be 89
is_nominee.sum()

89

In [13]:
# create a dataframe for nominees
nominee_df = pd.DataFrame({'imdb_id': ids['movie_id'], 'is_nominee': is_nominee})

In [14]:
nominee_df.loc[nominee_df['is_nominee'] == 1]

Unnamed: 0,imdb_id,is_nominee
15,0126029,1
26,0245429,1
47,0198781,1
51,0268397,1
74,0268380,1
...,...,...
3632,5109280,1
3656,7979580,1
3702,12801262,1
3829,2953050,1


In [15]:
# output dataframe to csv 
nominee_df.to_csv('../Data/df_of_nominees.csv',index=False)

In [16]:
# read the csv back in to check that the dataframe was written correctly
nominee_df_test = pd.read_csv('../Data/df_of_nominees.csv',
                dtype = {'imdb_id':'str', 'is_nominee':'int'})

In [20]:
check = nominee_df_test == nominee_df

In [21]:
check.value_counts()

imdb_id  is_nominee
True     True          3901
dtype: int64
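
The element-wise comparison above produces a frame of booleans that still has to be inspected. As an alternative sketch, `DataFrame.equals` and `pandas.testing.assert_frame_equal` reduce the round-trip check to a single result. The frame below is a small in-memory stand-in with illustrative values, not the real data, and it is written to a text buffer rather than to a file on disk:

```python
import io

import pandas as pd

# Small stand-in for nominee_df; the values are illustrative.
nominee_df = pd.DataFrame(
    {"imdb_id": ["0126029", "0999999"], "is_nominee": [1, 0]}
)

# Round-trip through CSV text instead of a file on disk.
buffer = io.StringIO()
nominee_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, dtype={"imdb_id": "str", "is_nominee": "int64"})

# True only if shape, dtypes, and all values agree.
same = nominee_df.equals(reloaded)
print(same)

# Raises AssertionError describing the difference if the frames do not match.
pd.testing.assert_frame_equal(nominee_df, reloaded)
```

Both checks also compare dtypes, so they would flag a reload in which the IDs came back as integers.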