## Preparing the Oscar Dataset

The attached dataset contains the CSV files listed below and the Final Dataset (still a work in progress toward the *perfect* dataset, I hope).

data_csv -> https://datahub.io/rufuspollock/oscars-nominees-and-winners <br>
ratings_movieslens, movies_movieslens ->  http://files.grouplens.org/datasets/movielens/ml-latest-README.html<br>
title.ratings, title.basics -> https://www.imdb.com/interfaces/<br>
https://archive.ics.uci.edu/ml/datasets/Movie<br>
https://www.imdb.com/awards-central/?ref_=nv_sr_1?ref_=nv_sr_1<br>
We also used data from IMDb, Rotten Tomatoes, and MovieLens, along with records of which movies won awards such as BAFTA, SAG, and SA.

*Clean, trim, and modify the source datasets to get the required dataset.*

In [1]:
import pandas as pd
import numpy as np

In [2]:
oscar_data = pd.read_csv("data_csv.csv")

In [3]:
oscar_data.head()

Unnamed: 0,year,category,winner,entity
0,1927,ACTOR,False,Richard Barthelmess
1,1927,ACTOR,True,Emil Jannings
2,1927,ACTRESS,False,Louise Dresser
3,1927,ACTRESS,True,Janet Gaynor
4,1927,ACTRESS,False,Gloria Swanson


In [4]:
oscar_data.dtypes

year         int64
category    object
winner        bool
entity      object
dtype: object

In [5]:
oscar_data.category.unique()

array(['ACTOR', 'ACTRESS', 'ART DIRECTION', 'CINEMATOGRAPHY',
       'DIRECTING (Comedy Picture)', 'DIRECTING (Dramatic Picture)',
       'ENGINEERING EFFECTS', 'OUTSTANDING PICTURE',
       'UNIQUE AND ARTISTIC PICTURE', 'WRITING (Adaptation)',
       'WRITING (Original Story)', 'WRITING (Title Writing)',
       'SPECIAL AWARD', 'DIRECTING', 'WRITING', 'OUTSTANDING PRODUCTION',
       'SOUND RECORDING', 'SCIENTIFIC OR TECHNICAL AWARD (Class I)',
       'SCIENTIFIC OR TECHNICAL AWARD (Class II)',
       'SCIENTIFIC OR TECHNICAL AWARD (Class III)',
       'SHORT SUBJECT (Cartoon)', 'SHORT SUBJECT (Comedy)',
       'SHORT SUBJECT (Novelty)', 'ASSISTANT DIRECTOR', 'FILM EDITING',
       'MUSIC (Scoring)', 'MUSIC (Song)', 'DANCE DIRECTION',
       'WRITING (Screenplay)', 'ACTOR IN A SUPPORTING ROLE',
       'ACTRESS IN A SUPPORTING ROLE', 'SHORT SUBJECT (Color)',
       'SHORT SUBJECT (One-reel)', 'SHORT SUBJECT (Two-reel)',
       'IRVING G. THALBERG MEMORIAL AWARD', 'MUSIC (Original Scor

In [6]:
old_movies = oscar_data[oscar_data.category == 'BEST MOTION PICTURE']
new_movies = oscar_data[oscar_data.category == 'BEST PICTURE']
oscars = pd.concat([old_movies, new_movies])
oscars.category = 'BEST PICTURE'
oscars = oscars.reset_index(drop = True)

In [7]:
oscars['title'] = oscars['entity'] + " (" + oscars['year'].map(str) + ")"

In [8]:
oscars

Unnamed: 0,year,category,winner,entity,title
0,1944,BEST PICTURE,False,Double Indemnity,Double Indemnity (1944)
1,1944,BEST PICTURE,False,Gaslight,Gaslight (1944)
2,1944,BEST PICTURE,True,Going My Way,Going My Way (1944)
3,1944,BEST PICTURE,False,Since You Went Away,Since You Went Away (1944)
4,1944,BEST PICTURE,False,Wilson,Wilson (1944)
5,1945,BEST PICTURE,False,Anchors Aweigh,Anchors Aweigh (1945)
6,1945,BEST PICTURE,False,The Bells of St. Mary's,The Bells of St. Mary's (1945)
7,1945,BEST PICTURE,True,The Lost Weekend,The Lost Weekend (1945)
8,1945,BEST PICTURE,False,Mildred Pierce,Mildred Pierce (1945)
9,1945,BEST PICTURE,False,Spellbound,Spellbound (1945)


In [9]:
movie = pd.read_csv("movies_movieslens.csv")
movierating = pd.read_csv("ratings_movieslens.csv")

In [10]:
movie.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [11]:
movierating.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,307,3.5,1256677221
1,1,481,3.5,1256677456
2,1,1091,1.5,1256677471
3,1,1257,4.5,1256677460
4,1,1449,4.5,1256677264


In [12]:
movierating = movierating.drop("userId", axis=1)

In [13]:
movierating = pd.DataFrame(movierating.groupby("movieId")['rating'].mean())
movierating['movieId'] = movierating.index
movierating.index.names = ['index']

In [14]:
movierating.head()

index,rating,movieId
1,3.886649,1
2,3.246583,2
3,3.173981,3
4,2.87454,4
5,3.077291,5
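
The averaging step above can also be written so that `movieId` stays a regular column, which avoids copying the index back into the frame. The following is a minimal sketch on a toy frame; the `ratings` data and the name `mean_ratings` are made up for illustration and are not taken from the MovieLens files.

```python
import pandas as pd

# Toy ratings (hypothetical values, not from ratings_movieslens.csv).
ratings = pd.DataFrame({
    "userId": [1, 2, 3, 4],
    "movieId": [1, 1, 2, 2],
    "rating": [4.0, 3.5, 3.0, 3.5],
})

# as_index=False keeps movieId as an ordinary column in the result.
mean_ratings = ratings.groupby("movieId", as_index=False)["rating"].mean()

# Round to two decimals, as the notebook does after merging.
mean_ratings["rating"] = mean_ratings["rating"].round(2)
print(mean_ratings)
```

The result has one row per `movieId` and can be merged on that column directly.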


In [15]:
movie_and_rating = movie.merge(movierating, on = 'movieId', how = 'inner')
movie_and_rating.rating = round(movie_and_rating.rating, 2)
movie_and_rating.head()

Unnamed: 0,movieId,title,genres,rating
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.89
1,2,Jumanji (1995),Adventure|Children|Fantasy,3.25
2,3,Grumpier Old Men (1995),Comedy|Romance,3.17
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance,2.87
4,5,Father of the Bride Part II (1995),Comedy,3.08


In [16]:
oscars_and_movielens = oscars.merge(movie_and_rating, on = 'title', how = 'left')
oscars_and_movielens

Unnamed: 0,year,category,winner,entity,title,movieId,genres,rating
0,1944,BEST PICTURE,False,Double Indemnity,Double Indemnity (1944),3435.0,Crime|Drama|Film-Noir,4.20
1,1944,BEST PICTURE,False,Gaslight,Gaslight (1944),906.0,Drama|Thriller,4.03
2,1944,BEST PICTURE,True,Going My Way,Going My Way (1944),1937.0,Comedy|Drama|Musical,3.56
3,1944,BEST PICTURE,False,Since You Went Away,Since You Went Away (1944),9014.0,Drama|War,3.65
4,1944,BEST PICTURE,False,Wilson,Wilson (1944),32999.0,Drama,3.21
5,1945,BEST PICTURE,False,Anchors Aweigh,Anchors Aweigh (1945),3599.0,Comedy|Musical,3.56
6,1945,BEST PICTURE,False,The Bells of St. Mary's,The Bells of St. Mary's (1945),,,
7,1945,BEST PICTURE,True,The Lost Weekend,The Lost Weekend (1945),,,
8,1945,BEST PICTURE,False,Mildred Pierce,Mildred Pierce (1945),2612.0,Drama|Film-Noir,3.93
9,1945,BEST PICTURE,False,Spellbound,Spellbound (1945),931.0,Mystery|Romance|Thriller,3.89
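
Two rows above have no MovieLens match, and both titles start with "The". One possible cause is that MovieLens stores many titles with the article moved to the end, for example `Lost Weekend, The (1945)`. That is an assumption here, not something verified against `movies_movieslens.csv`. The sketch below shows how such titles could be normalized before merging, and how `indicator=True` lists the rows that still fail to match. The helper `normalize_title` and the toy frames are hypothetical.

```python
import re

import pandas as pd

# Matches "Body, The (1945)" style titles (assumed MovieLens convention).
_TRAILING_ARTICLE = re.compile(
    r"^(?P<body>.+), (?P<article>The|A|An) \((?P<year>\d{4})\)$"
)


def normalize_title(title: str) -> str:
    """Rewrite 'Lost Weekend, The (1945)' as 'The Lost Weekend (1945)'."""
    match = _TRAILING_ARTICLE.match(title)
    if match is None:
        return title  # Nothing to move; leave the title untouched.
    return f"{match['article']} {match['body']} ({match['year']})"


# Toy frames standing in for `oscars` and `movie_and_rating`.
oscars_toy = pd.DataFrame(
    {"title": ["The Lost Weekend (1945)", "Spellbound (1945)"]}
)
movielens_toy = pd.DataFrame({
    "title": ["Lost Weekend, The (1945)", "Spellbound (1945)"],
    "rating": [3.89, 3.89],
})

movielens_toy["title"] = movielens_toy["title"].map(normalize_title)

# indicator=True adds a _merge column marking where each row came from.
merged = oscars_toy.merge(movielens_toy, on="title", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
print(unmatched)
```

Any titles left in `unmatched` would still need a manual fix or a fuzzier matching rule.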


In [17]:
imdb_title = pd.read_csv("title.basics.tsv", sep='\t')
imdb_rating = pd.read_csv("title.ratings.tsv", sep='\t')

In [18]:
imdb_title.head()

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0000001,short,Carmencita,Carmencita,0,1894,\N,1,"Documentary,Short"
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,\N,5,"Animation,Short"
2,tt0000003,short,Pauvre Pierrot,Pauvre Pierrot,0,1892,\N,4,"Animation,Comedy,Romance"
3,tt0000004,short,Un bon bock,Un bon bock,0,1892,\N,\N,"Animation,Short"
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893,\N,1,"Comedy,Short"
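
The output above shows `\N` wherever IMDb has no value (for example `endYear` and one `runtimeMinutes`). As a hedged sketch, passing `na_values` to `pd.read_csv` turns those markers into real missing values at load time. The in-memory sample below reuses two rows from the output above with a reduced set of columns; it is not the full `title.basics.tsv` layout.

```python
import io

import pandas as pd

# Build a tiny tab-separated sample; raw strings keep the backslash literal.
rows = [
    ["tconst", "titleType", "primaryTitle", "startYear", "endYear", "runtimeMinutes"],
    ["tt0000001", "short", "Carmencita", "1894", r"\N", "1"],
    ["tt0000004", "short", "Un bon bock", "1892", r"\N", r"\N"],
]
sample_tsv = "\n".join("\t".join(row) for row in rows)

# Treat the IMDb missing-value marker as NaN while parsing.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t", na_values=r"\N")
print(sample.dtypes)
```

With the markers parsed as NaN, `runtimeMinutes` becomes numeric instead of a string column, which makes later arithmetic or modelling simpler.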


In [19]:
imdb_rating.head()

Unnamed: 0,tconst,averageRating,numVotes
0,tt0000001,5.8,1486
1,tt0000002,6.4,179
2,tt0000003,6.6,1119
3,tt0000004,6.4,109
4,tt0000005,6.2,1822


In [20]:
imdb_data = imdb_title.merge(imdb_rating, on = "tconst", how="left")

In [21]:
imdb_data['title'] = imdb_data['primaryTitle'] + " (" + imdb_data['startYear'].map(str) + ")"
temp1 = imdb_data[imdb_data['titleType'] == 'movie'] 
temp2 = imdb_data[imdb_data['titleType'] == 'tvMovie']
fin = pd.concat([temp1, temp2])

In [22]:
imdb_data.titleType.unique()

array(['short', 'movie', 'tvMovie', 'tvSeries', 'tvEpisode', 'tvShort',
       'tvMiniSeries', 'tvSpecial', 'video', 'videoGame'], dtype=object)

In [23]:
oscars_and_movielens_and_imdb = oscars_and_movielens.merge(fin, on = "title", how = 'left')

In [24]:
oscars_and_movielens_and_imdb.drop_duplicates(keep='first', inplace=True)
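
Note that `drop_duplicates()` without arguments only removes rows that are identical in every column. If two IMDb entries share the same `title` key, the merged rows differ in `tconst` and both survive. Whether that happens in this data is not shown here; the sketch below uses an invented film and placeholder identifiers purely to illustrate how such cases could be detected and collapsed.

```python
import pandas as pd

# Hypothetical nominee and two hypothetical IMDb entries with the same key.
left = pd.DataFrame({"title": ["Example Film (1950)"], "winner": [False]})
right = pd.DataFrame({
    "title": ["Example Film (1950)", "Example Film (1950)"],
    "tconst": ["tt_hypothetical_1", "tt_hypothetical_2"],
    "titleType": ["movie", "tvMovie"],
})

# Inspect keys that occur more than once on the right-hand side.
dup_keys = right[right["title"].duplicated(keep=False)]

merged = left.merge(right, on="title", how="left")

# Full-row de-duplication keeps both rows; de-duplicating on the key keeps one.
full_row_dedup = merged.drop_duplicates()
key_dedup = merged.drop_duplicates(subset="title", keep="first")
print(len(merged), len(full_row_dedup), len(key_dedup))
```

Keeping the first match is only one option; choosing between duplicates by `titleType` or `numVotes` may be more appropriate.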

In [25]:
oscars_and_movielens_and_imdb = oscars_and_movielens_and_imdb.reset_index(drop = True)
oscars_and_movielens_and_imdb

Unnamed: 0,year,category,winner,entity,title,movieId,genres_x,rating,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres_y,averageRating,numVotes
0,1944,BEST PICTURE,False,Double Indemnity,Double Indemnity (1944),3435.0,Crime|Drama|Film-Noir,4.20,tt0036775,movie,Double Indemnity,Double Indemnity,0.0,1944,\N,107,"Crime,Drama,Film-Noir",8.3,125662.0
1,1944,BEST PICTURE,False,Gaslight,Gaslight (1944),906.0,Drama|Thriller,4.03,tt0036855,movie,Gaslight,Gaslight,0.0,1944,\N,114,"Crime,Drama,Mystery",7.8,20733.0
2,1944,BEST PICTURE,True,Going My Way,Going My Way (1944),1937.0,Comedy|Drama|Musical,3.56,tt0036872,movie,Going My Way,Going My Way,0.0,1944,\N,126,"Comedy,Drama,Music",7.2,9228.0
3,1944,BEST PICTURE,False,Since You Went Away,Since You Went Away (1944),9014.0,Drama|War,3.65,tt0037280,movie,Since You Went Away,Since You Went Away,0.0,1944,\N,177,"Drama,Romance,War",7.6,3879.0
4,1944,BEST PICTURE,False,Wilson,Wilson (1944),32999.0,Drama,3.21,tt0037465,movie,Wilson,Wilson,0.0,1944,\N,154,"Biography,Drama,History",6.8,1314.0
5,1945,BEST PICTURE,False,Anchors Aweigh,Anchors Aweigh (1945),3599.0,Comedy|Musical,3.56,tt0037514,movie,Anchors Aweigh,Anchors Aweigh,0.0,1945,\N,140,"Comedy,Fantasy,Music",7.2,7177.0
6,1945,BEST PICTURE,False,The Bells of St. Mary's,The Bells of St. Mary's (1945),,,,tt0037536,movie,The Bells of St. Mary's,The Bells of St. Mary's,0.0,1945,\N,126,Drama,7.4,6375.0
7,1945,BEST PICTURE,True,The Lost Weekend,The Lost Weekend (1945),,,,tt0037884,movie,The Lost Weekend,The Lost Weekend,0.0,1945,\N,101,"Drama,Film-Noir",8.0,30176.0
8,1945,BEST PICTURE,False,Mildred Pierce,Mildred Pierce (1945),2612.0,Drama|Film-Noir,3.93,tt0037913,movie,Mildred Pierce,Mildred Pierce,0.0,1945,\N,111,"Crime,Drama,Film-Noir",8.0,19982.0
9,1945,BEST PICTURE,False,Spellbound,Spellbound (1945),931.0,Mystery|Romance|Thriller,3.89,tt0038109,movie,Spellbound,Spellbound,0.0,1945,\N,111,"Film-Noir,Mystery,Romance",7.6,39065.0


In [26]:
# Pass the axis by keyword; recent pandas versions reject a positional axis.
oscars_and_movielens_and_imdb.drop('title', axis=1)

Unnamed: 0,year,category,winner,entity,movieId,genres_x,rating,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres_y,averageRating,numVotes
0,1944,BEST PICTURE,False,Double Indemnity,3435.0,Crime|Drama|Film-Noir,4.20,tt0036775,movie,Double Indemnity,Double Indemnity,0.0,1944,\N,107,"Crime,Drama,Film-Noir",8.3,125662.0
1,1944,BEST PICTURE,False,Gaslight,906.0,Drama|Thriller,4.03,tt0036855,movie,Gaslight,Gaslight,0.0,1944,\N,114,"Crime,Drama,Mystery",7.8,20733.0
2,1944,BEST PICTURE,True,Going My Way,1937.0,Comedy|Drama|Musical,3.56,tt0036872,movie,Going My Way,Going My Way,0.0,1944,\N,126,"Comedy,Drama,Music",7.2,9228.0
3,1944,BEST PICTURE,False,Since You Went Away,9014.0,Drama|War,3.65,tt0037280,movie,Since You Went Away,Since You Went Away,0.0,1944,\N,177,"Drama,Romance,War",7.6,3879.0
4,1944,BEST PICTURE,False,Wilson,32999.0,Drama,3.21,tt0037465,movie,Wilson,Wilson,0.0,1944,\N,154,"Biography,Drama,History",6.8,1314.0
5,1945,BEST PICTURE,False,Anchors Aweigh,3599.0,Comedy|Musical,3.56,tt0037514,movie,Anchors Aweigh,Anchors Aweigh,0.0,1945,\N,140,"Comedy,Fantasy,Music",7.2,7177.0
6,1945,BEST PICTURE,False,The Bells of St. Mary's,,,,tt0037536,movie,The Bells of St. Mary's,The Bells of St. Mary's,0.0,1945,\N,126,Drama,7.4,6375.0
7,1945,BEST PICTURE,True,The Lost Weekend,,,,tt0037884,movie,The Lost Weekend,The Lost Weekend,0.0,1945,\N,101,"Drama,Film-Noir",8.0,30176.0
8,1945,BEST PICTURE,False,Mildred Pierce,2612.0,Drama|Film-Noir,3.93,tt0037913,movie,Mildred Pierce,Mildred Pierce,0.0,1945,\N,111,"Crime,Drama,Film-Noir",8.0,19982.0
9,1945,BEST PICTURE,False,Spellbound,931.0,Mystery|Romance|Thriller,3.89,tt0038109,movie,Spellbound,Spellbound,0.0,1945,\N,111,"Film-Noir,Mystery,Romance",7.6,39065.0


In [27]:
final = oscars_and_movielens_and_imdb[['year', 'category', 'winner', 'entity', 'genres_x', 'rating', 'runtimeMinutes',
                                      'averageRating', 'numVotes']]
final.columns = ['year', 'category', 'winner', 'title', 'genres', 'MovieLensrating',
                 'runtimeMinutes', 'IMDBRating', 'numVotes']
final.to_csv("Final_Dataset.csv")
final

Unnamed: 0,year,category,winner,title,genres,MovieLensrating,runtimeMinutes,IMDBRating,numVotes
0,1944,BEST PICTURE,False,Double Indemnity,Crime|Drama|Film-Noir,4.20,107,8.3,125662.0
1,1944,BEST PICTURE,False,Gaslight,Drama|Thriller,4.03,114,7.8,20733.0
2,1944,BEST PICTURE,True,Going My Way,Comedy|Drama|Musical,3.56,126,7.2,9228.0
3,1944,BEST PICTURE,False,Since You Went Away,Drama|War,3.65,177,7.6,3879.0
4,1944,BEST PICTURE,False,Wilson,Drama,3.21,154,6.8,1314.0
5,1945,BEST PICTURE,False,Anchors Aweigh,Comedy|Musical,3.56,140,7.2,7177.0
6,1945,BEST PICTURE,False,The Bells of St. Mary's,,,126,7.4,6375.0
7,1945,BEST PICTURE,True,The Lost Weekend,,,101,8.0,30176.0
8,1945,BEST PICTURE,False,Mildred Pierce,Drama|Film-Noir,3.93,111,8.0,19982.0
9,1945,BEST PICTURE,False,Spellbound,Mystery|Romance|Thriller,3.89,111,7.6,39065.0


## Adding Rotten Tomatoes Ratings to the Dataset

In [28]:
rotten_tomato_ratings = pd.read_csv("Final_with_rotten_ratings.csv")

In [29]:
rotten_tomato_ratings.tail(10)

Unnamed: 0.1,Unnamed: 0,year,category,winner,title,genres,MovieLensrating,runtimeMinutes,IMDBRating,imdb_numVotes,RottenCriticRating,rottencritic_numVotes,RottenRating,rotten_numVotes
402,402,2017,BEST PICTURE,True,The Shape of Water,Adventure|Drama|Fantasy,3.6,123,7.3,299111,92.0,403.0,73.0,24730.0
403,403,2017,BEST PICTURE,False,"Three Billboards Outside Ebbing, Missouri",Crime|Drama,4.01,115,8.2,332659,90.0,373.0,86.0,21942.0
404,404,2018,BEST PICTURE,False,Black Panther,Fantasy|Science Fiction,3.66,134,7.3,490200,97.0,462.0,79.0,87455.0
405,405,2018,BEST PICTURE,False,BlacKkKlansman,Drama|Crime,3.71,135,7.5,133426,96.0,391.0,83.0,9054.0
406,406,2018,BEST PICTURE,False,Bohemian Rhapsody,Drama|Biography,3.79,135,8.1,319377,61.0,371.0,86.0,20705.0
407,407,2018,BEST PICTURE,False,The Favourite,Drama|Comedy,3.82,119,7.7,103011,,,85.0,127.0
408,408,2018,BEST PICTURE,True,Green Book,Drama|Comedy,3.94,130,8.3,170827,78.0,316.0,92.0,7859.0
409,409,2018,BEST PICTURE,False,Roma,Drama,3.76,135,7.8,100158,96.0,353.0,71.0,4667.0
410,410,2018,BEST PICTURE,False,A Star Is Born,Drama|Romance,3.67,135,7.8,229903,89.0,472.0,80.0,18019.0
411,411,2018,BEST PICTURE,False,Vice,Drama|Comedy,3.49,132,7.2,58938,66.0,326.0,58.0,3902.0


## Awards Dataset Categorisation

<div>
We used six movie awards that are held before the Oscars as factors:<br>
– National Board of Review Award for Best Film (1962-2018)<br>
– Satellite Award for Best Film (1996-2018)<br>
– Directors Guild of America Award for Outstanding Directing (1962-2018)<br>
– BAFTA Award for Best Film (1962-2018)<br>
– Screen Actors Guild Award (1995-2018)<br>
– Critics’ Choice Movie Award for Best Picture (1995-2018)<br>
</div>

In [30]:
awards_dataset = pd.read_csv("award_tag.csv")

In [31]:
awards_dataset.head(10)

Unnamed: 0,Film,Rater,Year,Win,Genres
0,Lawrence of Arabia,Oscars,1962,1,Adventure
1,The Longest Day,Oscars,1962,0,
2,Meredith Willson's The Music Man,Oscars,1962,0,
3,Mutiny on the Bounty,Oscars,1962,0,Adventure
4,To Kill a Mockingbird,Oscars,1962,0,Drama
5,America America,Oscars,1963,0,
6,Cleopatra,Oscars,1963,0,Drama
7,How the West Was Won,Oscars,1963,0,Adventure
8,Lilies of the Field,Oscars,1963,0,Drama
9,Tom Jones,Oscars,1963,1,Adventure


`Win` is a dummy variable that records whether the film won that year's award: “1” if it won and “0” if it did not.
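
As a small illustration of this kind of dummy encoding, the sketch below converts a boolean flag into 0/1. The `won` column is hypothetical; `award_tag.csv` already ships with the encoded `Win` column.

```python
import pandas as pd

# Two films from the table above with a hypothetical boolean `won` flag.
nominees = pd.DataFrame({
    "Film": ["Lawrence of Arabia", "The Longest Day"],
    "won": [True, False],
})

# True becomes 1 and False becomes 0.
nominees["Win"] = nominees["won"].astype(int)
print(nominees)
```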

In [32]:
# Pass the axis by keyword; recent pandas versions reject a positional axis.
awards_dataset = awards_dataset.drop(['Genres', 'Year'], axis=1)

In [33]:
awards_dataset.columns = ['title', 'nominated_award','won_nomination']

In [34]:
awards_dataset.head(10)

Unnamed: 0,title,nominated_award,won_nomination
0,Lawrence of Arabia,Oscars,1
1,The Longest Day,Oscars,0
2,Meredith Willson's The Music Man,Oscars,0
3,Mutiny on the Bounty,Oscars,0
4,To Kill a Mockingbird,Oscars,0
5,America America,Oscars,0
6,Cleopatra,Oscars,0
7,How the West Was Won,Oscars,0
8,Lilies of the Field,Oscars,0
9,Tom Jones,Oscars,1


In [35]:
#df1 = awards_dataset.loc[awards_dataset['title'] == 'Hamlet']
#df1

In [36]:
rater_oscars = awards_dataset['nominated_award'] == 'Oscars'
df_oscars = awards_dataset[rater_oscars]
df_oscars.columns = [ 'title','nominated_for_oscars','won_oscar']
df_oscars.head()

Unnamed: 0,title,nominated_for_oscars,won_oscar
0,Lawrence of Arabia,Oscars,1
1,The Longest Day,Oscars,0
2,Meredith Willson's The Music Man,Oscars,0
3,Mutiny on the Bounty,Oscars,0
4,To Kill a Mockingbird,Oscars,0


In [37]:
rater_bafta = awards_dataset['nominated_award'] == 'BAFTA'
df_bafta = awards_dataset[rater_bafta]
df_bafta.columns = [ 'title','nominated_for_bafta','won_bafta']
df_bafta.head()

Unnamed: 0,title,nominated_for_bafta,won_bafta
664,Lawrence of Arabia,BAFTA,1
665,Tom Jones,BAFTA,1
666,Dr. Strangelove,BAFTA,1
667,My Fair Lady,BAFTA,1
668,Who's Afraid of Virginia Woolf?,BAFTA,1


In [38]:
rater_sa = awards_dataset['nominated_award'] == 'SA'
df_sa = awards_dataset[rater_sa]
df_sa.columns = [ 'title','nominated_for_sa','won_sa']
df_sa.head()

Unnamed: 0,title,nominated_for_sa,won_sa
266,Fargo,SA,1
267,Titanic,SA,1
268,The Thin Red Line,SA,1
269,The Insider,SA,1
270,Traffic,SA,1


In [39]:
rater_sag = awards_dataset['nominated_award'] == 'SAG'
df_sag = awards_dataset[rater_sag]
df_sag.columns = [ 'title','nominated_for_sag','won_sag']
df_sag.head()

Unnamed: 0,title,nominated_for_sag,won_sag
938,Apollo 13,SAG,1
939,The Birdcage,SAG,1
940,The Full Monty,SAG,1
941,Shakespeare in Love,SAG,1
942,American Beauty,SAG,1


In [40]:
rater_cc = awards_dataset['nominated_award'] == 'CC'
df_cc = awards_dataset[rater_cc]
df_cc.columns = [ 'title','nominated_for_cc','won_cc']
df_cc.head()

Unnamed: 0,title,nominated_for_cc,won_cc
1059,Sense and Sensibility,CC,1
1060,Fargo,CC,1
1061,L.A. Confidential,CC,1
1062,Saving Private Ryan,CC,1
1063,American Beauty,CC,1


In [41]:
rater_dga = awards_dataset['nominated_award'] == 'DGA'
df_dga = awards_dataset[rater_dga]
df_dga.columns = [ 'title','nominated_for_dga','won_dga']
df_dga.head()

Unnamed: 0,title,nominated_for_dga,won_dga
355,Lawrence of Arabia,DGA,1
356,Tom Jones,DGA,1
357,My Fair Lady,DGA,1
358,The Sound of Music,DGA,1
359,A Man for All Seasons,DGA,1


In [42]:
rater_nbra = awards_dataset['nominated_award'] == 'NBRA'
df_nbra = awards_dataset[rater_nbra]
df_nbra.columns = [ 'title','nominated_for_nbra','won_nbra']
df_nbra.head()

Unnamed: 0,title,nominated_for_nbra,won_nbra
205,The Longest Day,NBRA,1
206,Tom Jones,NBRA,1
207,Becket,NBRA,1
208,The Eleanor Roosevelt Story,NBRA,1
209,A Man for All Seasons,NBRA,1


In [43]:
#all_raters_with_wins = [df_oscars,df_bafta,df_sa,df_sag,df_cc,df_dga,df_nbra]

In [44]:
# Outer-join the per-award frames on title, one award at a time.
all_raters_with_wins = (
    df_oscars
    .merge(df_bafta, on='title', how='outer')
    .merge(df_sa, on='title', how='outer')
    .merge(df_sag, on='title', how='outer')
    .merge(df_cc, on='title', how='outer')
    .merge(df_dga, on='title', how='outer')
    .merge(df_nbra, on='title', how='outer')
)
all_raters_with_wins

Unnamed: 0,title,nominated_for_oscars,won_oscar,nominated_for_bafta,won_bafta,nominated_for_sa,won_sa,nominated_for_sag,won_sag,nominated_for_cc,won_cc,nominated_for_dga,won_dga,nominated_for_nbra,won_nbra
0,Lawrence of Arabia,Oscars,1.0,BAFTA,1.0,,,,,,,DGA,1.0,,
1,The Longest Day,Oscars,0.0,,,,,,,,,,,NBRA,1.0
2,Meredith Willson's The Music Man,Oscars,0.0,,,,,,,,,,,,
3,Mutiny on the Bounty,Oscars,0.0,,,,,,,,,DGA,0.0,,
4,To Kill a Mockingbird,Oscars,0.0,BAFTA,0.0,,,,,,,DGA,0.0,,
5,America America,Oscars,0.0,,,,,,,,,DGA,0.0,,
6,Cleopatra,Oscars,0.0,,,,,,,,,,,,
7,How the West Was Won,Oscars,0.0,,,,,,,,,,,,
8,Lilies of the Field,Oscars,0.0,,,,,,,,,DGA,0.0,,
9,Tom Jones,Oscars,1.0,BAFTA,1.0,,,,,,,DGA,1.0,NBRA,1.0
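
The per-award filtering and the chained outer merge above produce one `won_*` column per award. A similar wide table could be sketched with a single `pivot_table` call, shown here on a few rows taken from the outputs above. This is an alternative formulation rather than the notebook's method, and it differs in two ways: the Oscars column comes out as `won_oscars` instead of `won_oscar`, and repeated (title, award) pairs are collapsed with `max` instead of producing extra rows.

```python
import pandas as pd

# A few (title, award, win) rows in the long format of `awards_dataset`.
awards = pd.DataFrame({
    "title": [
        "Lawrence of Arabia", "Lawrence of Arabia", "Lawrence of Arabia",
        "The Longest Day", "The Longest Day",
        "Tom Jones", "Tom Jones",
    ],
    "nominated_award": ["Oscars", "BAFTA", "DGA", "Oscars", "NBRA", "Oscars", "NBRA"],
    "won_nomination": [1, 1, 1, 0, 1, 1, 1],
})

# One row per title, one column per award; missing nominations become NaN.
wide = awards.pivot_table(
    index="title",
    columns="nominated_award",
    values="won_nomination",
    aggfunc="max",
)
wide.columns = [f"won_{name.lower()}" for name in wide.columns]
wide = wide.reset_index()
print(wide)
```

A NaN in a `won_*` column means the film has no entry for that award in the data, which is distinct from a 0 (nominated but did not win).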


In [45]:
final_dataset_with_raters_nom = pd.merge(rotten_tomato_ratings,all_raters_with_wins,on = 'title',how='left')
final_dataset_with_raters_nom

Unnamed: 0.1,Unnamed: 0,year,category,winner,title,genres,MovieLensrating,runtimeMinutes,IMDBRating,imdb_numVotes,...,nominated_for_sa,won_sa,nominated_for_sag,won_sag,nominated_for_cc,won_cc,nominated_for_dga,won_dga,nominated_for_nbra,won_nbra
0,0,1944,BEST PICTURE,False,Double Indemnity,Crime|Drama|Film-Noir,4.20,107,8.3,125662,...,,,,,,,,,,
1,1,1944,BEST PICTURE,False,Gaslight,Drama|Thriller,4.03,114,7.8,20733,...,,,,,,,,,,
2,2,1944,BEST PICTURE,True,Going My Way,Comedy|Drama|Musical,3.56,126,7.2,9228,...,,,,,,,,,,
3,3,1944,BEST PICTURE,False,Since You Went Away,Drama|War,3.65,177,7.6,3879,...,,,,,,,,,,
4,4,1944,BEST PICTURE,False,Wilson,Drama,3.21,154,6.8,1314,...,,,,,,,,,,
5,5,1945,BEST PICTURE,False,Anchors Aweigh,Comedy|Musical,3.56,140,7.2,7177,...,,,,,,,,,,
6,6,1945,BEST PICTURE,False,The Bells of St. Mary's,Drama,3.79,126,7.4,6375,...,,,,,,,,,,
7,7,1945,BEST PICTURE,True,The Lost Weekend,Drama|Film-Noir,3.89,101,8.0,30176,...,,,,,,,,,,
8,8,1945,BEST PICTURE,False,Mildred Pierce,Drama|Film-Noir,3.93,111,8.0,19982,...,,,,,,,,,,
9,9,1945,BEST PICTURE,False,Spellbound,Mystery|Romance|Thriller,3.89,111,7.6,39065,...,,,,,,,,,,


In [46]:
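# Drop the award-label columns and keep only the won_* flags.
# The axis is passed by keyword; recent pandas versions reject a positional axis.
final_dataset_with_raters = final_dataset_with_raters_nom.drop(
    ['nominated_for_oscars', 'nominated_for_bafta', 'nominated_for_sa',
     'nominated_for_sag', 'nominated_for_cc', 'nominated_for_dga',
     'nominated_for_nbra'],
    axis=1,
)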
final_dataset_with_raters

Unnamed: 0.1,Unnamed: 0,year,category,winner,title,genres,MovieLensrating,runtimeMinutes,IMDBRating,imdb_numVotes,...,rottencritic_numVotes,RottenRating,rotten_numVotes,won_oscar,won_bafta,won_sa,won_sag,won_cc,won_dga,won_nbra
0,0,1944,BEST PICTURE,False,Double Indemnity,Crime|Drama|Film-Noir,4.20,107,8.3,125662,...,57.0,95.0,35629.0,,,,,,,
1,1,1944,BEST PICTURE,False,Gaslight,Drama|Thriller,4.03,114,7.8,20733,...,28.0,90.0,8460.0,,,,,,,
2,2,1944,BEST PICTURE,True,Going My Way,Comedy|Drama|Musical,3.56,126,7.2,9228,...,24.0,76.0,5748.0,,,,,,,
3,3,1944,BEST PICTURE,False,Since You Went Away,Drama|War,3.65,177,7.6,3879,...,10.0,82.0,1305.0,,,,,,,
4,4,1944,BEST PICTURE,False,Wilson,Drama,3.21,154,6.8,1314,...,8.0,41.0,363.0,,,,,,,
5,5,1945,BEST PICTURE,False,Anchors Aweigh,Comedy|Musical,3.56,140,7.2,7177,...,13.0,79.0,8276.0,,,,,,,
6,6,1945,BEST PICTURE,False,The Bells of St. Mary's,Drama,3.79,126,7.4,6375,...,18.0,77.0,7097.0,,,,,,,
7,7,1945,BEST PICTURE,True,The Lost Weekend,Drama|Film-Noir,3.89,101,8.0,30176,...,33.0,90.0,8800.0,,,,,,,
8,8,1945,BEST PICTURE,False,Mildred Pierce,Drama|Film-Noir,3.93,111,8.0,19982,...,90.0,90.0,8201.0,,,,,,,
9,9,1945,BEST PICTURE,False,Spellbound,Mystery|Romance|Thriller,3.89,111,7.6,39065,...,36.0,82.0,16880.0,,,,,,,


In [47]:
#final_dataset_with_raters.to_csv("Final_Dataset_with_rater_wins.csv")

In [48]:
# Convert the boolean winner flag to 0/1.
final_dataset_with_raters["winner"] = final_dataset_with_raters["winner"].astype(int)
final_dataset_with_raters.to_csv("Final_Dataset_with_rater_wins.csv")
final_dataset_with_raters

Unnamed: 0.1,Unnamed: 0,year,category,winner,title,genres,MovieLensrating,runtimeMinutes,IMDBRating,imdb_numVotes,...,rottencritic_numVotes,RottenRating,rotten_numVotes,won_oscar,won_bafta,won_sa,won_sag,won_cc,won_dga,won_nbra
0,0,1944,BEST PICTURE,0,Double Indemnity,Crime|Drama|Film-Noir,4.20,107,8.3,125662,...,57.0,95.0,35629.0,,,,,,,
1,1,1944,BEST PICTURE,0,Gaslight,Drama|Thriller,4.03,114,7.8,20733,...,28.0,90.0,8460.0,,,,,,,
2,2,1944,BEST PICTURE,1,Going My Way,Comedy|Drama|Musical,3.56,126,7.2,9228,...,24.0,76.0,5748.0,,,,,,,
3,3,1944,BEST PICTURE,0,Since You Went Away,Drama|War,3.65,177,7.6,3879,...,10.0,82.0,1305.0,,,,,,,
4,4,1944,BEST PICTURE,0,Wilson,Drama,3.21,154,6.8,1314,...,8.0,41.0,363.0,,,,,,,
5,5,1945,BEST PICTURE,0,Anchors Aweigh,Comedy|Musical,3.56,140,7.2,7177,...,13.0,79.0,8276.0,,,,,,,
6,6,1945,BEST PICTURE,0,The Bells of St. Mary's,Drama,3.79,126,7.4,6375,...,18.0,77.0,7097.0,,,,,,,
7,7,1945,BEST PICTURE,1,The Lost Weekend,Drama|Film-Noir,3.89,101,8.0,30176,...,33.0,90.0,8800.0,,,,,,,
8,8,1945,BEST PICTURE,0,Mildred Pierce,Drama|Film-Noir,3.93,111,8.0,19982,...,90.0,90.0,8201.0,,,,,,,
9,9,1945,BEST PICTURE,0,Spellbound,Mystery|Romance|Thriller,3.89,111,7.6,39065,...,36.0,82.0,16880.0,,,,,,,


## Conclusion

After merging the awards dataset with the ratings dataset, we obtained our final dataset. We had to add data manually for a few years under award categories such as BAFTA, SAG, and SA. Our final OscarsData.csv also required manual intervention, because the data spans 1944 all the way to 2018.

*The more data, the better the machine will predict.*

<br>
<center>
    <img src="tenor.gif" width="400" />
    *hoping for the best...*
</center>