## You have been hired by a rookie movie producer to help him decide what types of movies to produce and which actors to cast. You have to back your recommendations with a thorough analysis of the data he shared with you, which lists 3000 movies and their corresponding details.

## As a data scientist, you have to first explore the data and check its sanity.

## Further, you have to answer the following questions:
1. ### <b> Which movie made the highest profit? Who were its producer and director? Identify the actors in that film.</b>
2. ### <b>This data has information about movies made in different languages. Which language has the highest average ROI (return on investment)? </b>
3. ### <b> Find out the unique genres of movies in this dataset.</b>
4. ### <b> Make a table of all the producers and directors of each movie. Find the top 3 producers whose movies have the highest average ROI. </b>
5. ### <b> Which actor has acted in the most movies? Take a deep dive into the movies, genres and profits corresponding to this actor. </b>
6. ### <b>Which actors do the top 3 directors prefer the most? </b>



# Data Exploration

In [1]:
# Import packages
import pandas as pd
import numpy as np

In [2]:
df=pd.read_csv('imdb_data.csv')

In [3]:
df.shape

(3000, 23)

In [4]:
df.size

69000
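
The `size` of a DataFrame is its number of rows times its number of columns, which is why 3000 × 23 gives 69000 here. A minimal sketch on a made-up frame (the name `toy` and its values are illustrative, not part of the dataset):

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({"budget": [10, 20, 30], "revenue": [15, 5, 60]})

n_rows, n_cols = toy.shape   # (rows, columns)
n_cells = toy.size           # rows * columns
print(n_rows, n_cols, n_cells)
```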

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 23 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     3000 non-null   int64  
 1   belongs_to_collection  604 non-null    object 
 2   budget                 3000 non-null   int64  
 3   genres                 2993 non-null   object 
 4   homepage               946 non-null    object 
 5   imdb_id                3000 non-null   object 
 6   original_language      3000 non-null   object 
 7   original_title         3000 non-null   object 
 8   overview               2992 non-null   object 
 9   popularity             3000 non-null   float64
 10  poster_path            2999 non-null   object 
 11  production_companies   2844 non-null   object 
 12  production_countries   2945 non-null   object 
 13  release_date           3000 non-null   object 
 14  runtime                2998 non-null   float64
 15  spok

In [6]:
df.columns

Index(['id', 'belongs_to_collection', 'budget', 'genres', 'homepage',
       'imdb_id', 'original_language', 'original_title', 'overview',
       'popularity', 'poster_path', 'production_companies',
       'production_countries', 'release_date', 'runtime', 'spoken_languages',
       'status', 'tagline', 'title', 'Keywords', 'cast', 'crew', 'revenue'],
      dtype='object')

# 1. Which movie made the highest profit? Who were its producer and director? Identify the actors in that film.

In [7]:
# Select the features needed for question 1:
#   - the movie title
#   - budget and revenue, from which profit is computed
#   - crew, which holds the producers and directors
#   - cast, which holds the actors
# For convenience, keep only these features in a separate DataFrame.
# .copy() makes df1 independent of df, so later assignments to df1 are unambiguous.

df1=df[['original_title','cast','crew','budget','revenue','genres']].copy()

In [8]:
df1.head()

Unnamed: 0,original_title,cast,crew,budget,revenue,genres
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]"
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam..."
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]"
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n..."
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",0,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam..."
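
Selecting a few columns and then assigning into the result can make pandas emit a `SettingWithCopyWarning`, because it cannot tell whether the selection is a view or a copy. One common way around this, sketched below on made-up data, is to take an explicit `.copy()` of the selection. The names `full` and `subset` are illustrative only.

```python
import pandas as pd

# Made-up stand-in for the full movie table.
full = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "budget": [0, 5_000_000, 12_000_000],
    "popularity": [1.0, 2.0, 3.0],
})

# .copy() makes `subset` an independent frame, so later assignments
# modify it alone and there is no view-versus-copy ambiguity.
subset = full[["original_title", "budget"]].copy()
subset.loc[subset["budget"] == 0, "budget"] = 1_000_000

print(full["budget"].tolist())    # the original frame is untouched
print(subset["budget"].tolist())
```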


In [9]:
df1.describe()
# The minimum budget is 0 and the minimum revenue is 1; neither is plausible for a released movie,
# so these values need to be corrected.
# The replacement could be the mean, median, or mode; here we use the median.

Unnamed: 0,budget,revenue
count,3000.0,3000.0
mean,22531330.0,66725850.0
std,37026090.0,137532300.0
min,0.0,1.0
25%,0.0,2379808.0
50%,8000000.0,16807070.0
75%,29000000.0,68919200.0
max,380000000.0,1519558000.0


In [10]:
df1.dtypes

original_title    object
cast              object
crew              object
budget             int64
revenue            int64
genres            object
dtype: object

In [11]:
# Replace implausibly low budgets (anything under 1,000 dollars, including 0)
# with the median budget of the remaining movies.
# df1.loc[df1['budget'] < 1000, 'budget'] selects the budget column of the affected rows;
# the 1,000 threshold is a judgment call and any other suitable value could be used.

df1.loc[df1['budget'] < 1000,'budget']=df1[df1['budget']>=1000]['budget'].median()




In [12]:
# Do the same for revenue
df1.loc[df1['revenue']<1000,'revenue']=df1[df1['revenue']>=1000]['revenue'].median()

In [13]:
df1.describe() # the minimum values are now larger

Unnamed: 0,budget,revenue
count,3000.0,3000.0
mean,27082500.0,67058110.0
std,34927730.0,137391700.0
min,2500.0,1404.0
25%,10000000.0,2947600.0
50%,16450000.0,17487530.0
75%,29000000.0,68919200.0
max,380000000.0,1519558000.0
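
The imputation step can be sketched on a small made-up column: compute the median from the plausible values only, then write it into the implausible rows. The threshold of 1000 mirrors the cells above; the frame `toy` and its numbers are assumptions for illustration.

```python
import pandas as pd

# Made-up budgets; 0 and 500 stand in for implausible entries.
toy = pd.DataFrame({"budget": [0, 500, 2_000_000, 4_000_000, 10_000_000]})

threshold = 1000  # assumed cutoff, same as in the cells above
valid = toy["budget"] >= threshold

# Median of the plausible values only, so the bad rows do not drag it down.
median_budget = int(toy.loc[valid, "budget"].median())
toy.loc[~valid, "budget"] = median_budget

print(toy["budget"].tolist())
```

Using the median rather than the mean keeps a few very large budgets from inflating the fill value.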


In [14]:
# There is no profit feature, so add a new profit column computed as revenue minus budget
# Check whether these columns contain any null values (describe() above shows a count of 3000 for both)
# Note: this subtracts the raw df columns, so the median-imputed values in df1 are not reflected in profit
df1['profit']=df['revenue']-df['budget']
df1.head() # the profit column is added at the end



Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",14800000
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",16450000,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",3923970
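
In the table above, row 4 shows a profit equal to its revenue even though its budget is 16450000. That is what happens when profit is taken from columns that still hold the original zero budget. A minimal sketch on made-up numbers contrasts the two calculations; `raw`, `clean`, and the imputed value of 16,000,000 are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; the second budget is 0 in the raw data.
raw = pd.DataFrame({"budget": [14_000_000, 0], "revenue": [12_000_000, 4_000_000]})

clean = raw.copy()
clean.loc[clean["budget"] < 1000, "budget"] = 16_000_000  # assumed imputed value

# Subtracting within `clean` keeps profit in step with the imputed budget.
clean["profit"] = clean["revenue"] - clean["budget"]

# Subtracting the raw columns instead ignores the imputation.
profit_from_raw = raw["revenue"] - raw["budget"]

print(clean["profit"].tolist())
print(profit_from_raw.tolist())
```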


# The first part of question 1 is to find the movie with the maximum profit

In [15]:
df1['profit'].max() # the highest profit; next we want the name of the movie that earned it

1316249360

In [16]:
# To get the name of the movie with the highest profit, first select its entire row (below),
# then read the title from that row.
# To do that, we save the index value of the row with the highest profit; here it is 1761.
df1[df1['profit']==df1['profit'].max()]

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit
1761,Furious 7,"[{'cast_id': 17, 'character': 'Dominic Toretto...","[{'credit_id': '52fe4cc8c3a36847f823e681', 'de...",190000000,1506249360,"[{'id': 28, 'name': 'Action'}]",1316249360


In [17]:
# To get the index, simply use .index
df1[df1['profit']==df1['profit'].max()].index

# The result is an Int64Index; the next cell converts it to a list

Int64Index([1761], dtype='int64')

In [18]:
list(df1[df1['profit']==df1['profit'].max()].index) # the index value as a list

[1761]

In [19]:
# To access the list element, index the list with 0, its only position
list(df1[df1['profit']==df1['profit'].max()].index)[0]

# This gives the actual element of the list

1761

In [20]:
# we can save the index value in a variable 
max_profit_index=list(df1[df1['profit']==df1['profit'].max()].index)[0]

In [21]:
max_profit_index

1761

In [22]:
# now we have the index value of the row that has movie with highest profit
highest_profit_movie=df1.loc[max_profit_index,'original_title']
print('the movie with the highest profit is',highest_profit_movie)

the movie with the highest profit is Furious 7
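
The same lookup can be written more compactly with `Series.idxmax()`, which returns the index label of the largest value (the first one if there are ties). A minimal sketch on made-up data; `toy`, `best_index`, and `best_title` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "profit": [5, 42, -3],
})

best_index = toy["profit"].idxmax()                  # index label of the largest profit
best_title = toy.loc[best_index, "original_title"]   # title stored on that row
print(best_index, best_title)
```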


# The other part of the question is to find the producers, director, and actors of that movie

In [23]:
# The producer and director information is in the crew column, so we need to retrieve the crew of Furious 7
# Let's look at the column first
df1['crew']

0       [{'credit_id': '59ac067c92514107af02c8c8', 'de...
1       [{'credit_id': '52fe43fe9251416c7502563d', 'de...
2       [{'credit_id': '54d5356ec3a3683ba0000039', 'de...
3       [{'credit_id': '52fe48779251416c9108d6eb', 'de...
4       [{'credit_id': '52fe464b9251416c75073b43', 'de...
                              ...                        
2995    [{'credit_id': '52fe4494c3a368484e02ac7d', 'de...
2996    [{'credit_id': '5716b72ac3a3686678012c84', 'de...
2997    [{'credit_id': '52fe443a9251416c7502d579', 'de...
2998    [{'credit_id': '556f817b9251410866000a63', 'de...
2999    [{'credit_id': '5391990d0e0a260fb5001629', 'de...
Name: crew, Length: 3000, dtype: object

In [24]:
# Take the crew entry at the index of the highest-profit movie
df1.loc[max_profit_index,'crew']

# This is the crew information for the highest-profit movie, which sits at index 1761
# (stored in max_profit_index)

# Note that the value is a string

'[{\'credit_id\': \'52fe4cc8c3a36847f823e681\', \'department\': \'Production\', \'gender\': 2, \'id\': 12835, \'job\': \'Producer\', \'name\': \'Vin Diesel\', \'profile_path\': \'/7rwSXluNWZAluYMOEWBxkPmckES.jpg\'}, {\'credit_id\': \'52fe4cc8c3a36847f823e687\', \'department\': \'Production\', \'gender\': 2, \'id\': 11874, \'job\': \'Producer\', \'name\': \'Neal H. Moritz\', \'profile_path\': \'/cNcsEYmoS4niCz3UkVAA09dUIob.jpg\'}, {\'credit_id\': \'52fe4cc8c3a36847f823e68d\', \'department\': \'Writing\', \'gender\': 2, \'id\': 58191, \'job\': \'Writer\', \'name\': \'Chris Morgan\', \'profile_path\': \'/dUGxIwFBLrSFLImxjeda1krndMO.jpg\'}, {\'credit_id\': \'52fe4cc8c3a36847f823e693\', \'department\': \'Writing\', \'gender\': 0, \'id\': 8162, \'job\': \'Characters\', \'name\': \'Gary Scott Thompson\', \'profile_path\': \'/e2dMfqFvRsOXgWZ1VToYLmos17y.jpg\'}, {\'credit_id\': \'52fe4cc8c3a36847f823e699\', \'department\': \'Directing\', \'gender\': 2, \'id\': 2127, \'job\': \'Director\', \'nam

In [25]:
# The crew value is a string that represents a list of dictionaries,
# so it first has to be converted into an actual list.
# ast.literal_eval() parses it safely; unlike eval(), it cannot run arbitrary code.
import ast

crew_members=ast.literal_eval(df1.loc[max_profit_index,'crew'])

# crew_members is now a list of dictionaries
crew_members

[{'credit_id': '52fe4cc8c3a36847f823e681',
  'department': 'Production',
  'gender': 2,
  'id': 12835,
  'job': 'Producer',
  'name': 'Vin Diesel',
  'profile_path': '/7rwSXluNWZAluYMOEWBxkPmckES.jpg'},
 {'credit_id': '52fe4cc8c3a36847f823e687',
  'department': 'Production',
  'gender': 2,
  'id': 11874,
  'job': 'Producer',
  'name': 'Neal H. Moritz',
  'profile_path': '/cNcsEYmoS4niCz3UkVAA09dUIob.jpg'},
 {'credit_id': '52fe4cc8c3a36847f823e68d',
  'department': 'Writing',
  'gender': 2,
  'id': 58191,
  'job': 'Writer',
  'name': 'Chris Morgan',
  'profile_path': '/dUGxIwFBLrSFLImxjeda1krndMO.jpg'},
 {'credit_id': '52fe4cc8c3a36847f823e693',
  'department': 'Writing',
  'gender': 0,
  'id': 8162,
  'job': 'Characters',
  'name': 'Gary Scott Thompson',
  'profile_path': '/e2dMfqFvRsOXgWZ1VToYLmos17y.jpg'},
 {'credit_id': '52fe4cc8c3a36847f823e699',
  'department': 'Directing',
  'gender': 2,
  'id': 2127,
  'job': 'Director',
  'name': 'James Wan',
  'profile_path': '/d1LSKfzi5J6QngW

In [26]:
# Use a for loop to access each element of the list
for member in crew_members:
    print(member)

# Each element is a dictionary

{'credit_id': '52fe4cc8c3a36847f823e681', 'department': 'Production', 'gender': 2, 'id': 12835, 'job': 'Producer', 'name': 'Vin Diesel', 'profile_path': '/7rwSXluNWZAluYMOEWBxkPmckES.jpg'}
{'credit_id': '52fe4cc8c3a36847f823e687', 'department': 'Production', 'gender': 2, 'id': 11874, 'job': 'Producer', 'name': 'Neal H. Moritz', 'profile_path': '/cNcsEYmoS4niCz3UkVAA09dUIob.jpg'}
{'credit_id': '52fe4cc8c3a36847f823e68d', 'department': 'Writing', 'gender': 2, 'id': 58191, 'job': 'Writer', 'name': 'Chris Morgan', 'profile_path': '/dUGxIwFBLrSFLImxjeda1krndMO.jpg'}
{'credit_id': '52fe4cc8c3a36847f823e693', 'department': 'Writing', 'gender': 0, 'id': 8162, 'job': 'Characters', 'name': 'Gary Scott Thompson', 'profile_path': '/e2dMfqFvRsOXgWZ1VToYLmos17y.jpg'}
{'credit_id': '52fe4cc8c3a36847f823e699', 'department': 'Directing', 'gender': 2, 'id': 2127, 'job': 'Director', 'name': 'James Wan', 'profile_path': '/d1LSKfzi5J6QngWS7niN1zPJdud.jpg'}
{'credit_id': '52fe4cc8c3a36847f823e6a7', 'departm

In [27]:
# From the list of dictionaries, retrieve the producers and the director of Furious 7.
# If a member's job is 'Director', save the name as a director;
# if the job is 'Producer', save it as a producer.
producer=[]
director=[]
for member in crew_members:
    if member['job']=='Director':
        director.append(member['name'])
    elif member['job']=='Producer':
        producer.append(member['name'])
        


In [28]:
print('the director for the {} movie is {}'.format(highest_profit_movie,director))
print('the producer for the {} movie is {}'.format(highest_profit_movie,producer))

the director for the Furious 7 movie is ['James Wan']
the producer for the Furious 7 movie is ['Vin Diesel', 'Neal H. Moritz', 'Michael Fottrell', 'Brandon Birtell']
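
The parse-then-filter steps above can be wrapped in one small function. This is only a sketch: `names_by_job` is a hypothetical helper, and the crew string is a shortened, made-up example in the same shape as the `crew` column.

```python
import ast

# A shortened, made-up crew string in the same shape as the `crew` column.
crew_text = (
    "[{'job': 'Producer', 'name': 'Person A'}, "
    "{'job': 'Director', 'name': 'Person B'}, "
    "{'job': 'Producer', 'name': 'Person C'}]"
)

def names_by_job(text, job):
    """Return the names of all crew members whose job equals `job`."""
    members = ast.literal_eval(text)  # parses Python literals only, runs no code
    return [m["name"] for m in members if m.get("job") == job]

print(names_by_job(crew_text, "Director"))
print(names_by_job(crew_text, "Producer"))
```

The same function would work for the `cast` column if it filtered on a different key, since both columns store a list of dictionaries as text.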


## Now we need to find the actors of the movie Furious 7

In [29]:
# we have the info about the actors in the cast feature

df1['cast']

0       [{'cast_id': 4, 'character': 'Lou', 'credit_id...
1       [{'cast_id': 1, 'character': 'Mia Thermopolis'...
2       [{'cast_id': 5, 'character': 'Andrew Neimann',...
3       [{'cast_id': 1, 'character': 'Vidya Bagchi', '...
4       [{'cast_id': 3, 'character': 'Chun-soo', 'cred...
                              ...                        
2995    [{'cast_id': 2, 'character': 'Rock Reilly', 'c...
2996    [{'cast_id': 5, 'character': 'Bobo', 'credit_i...
2997    [{'cast_id': 10, 'character': 'Samantha Caine ...
2998    [{'cast_id': 8, 'character': 'Reuben Feffer', ...
2999    [{'cast_id': 2, 'character': 'Nathan Harper', ...
Name: cast, Length: 3000, dtype: object

In [30]:
df1.loc[max_profit_index,'cast']
# As with crew, this is a string that needs to be converted into a list

'[{\'cast_id\': 17, \'character\': \'Dominic Toretto\', \'credit_id\': \'5431dfd10e0a265915002c34\', \'gender\': 2, \'id\': 12835, \'name\': \'Vin Diesel\', \'order\': 0, \'profile_path\': \'/7rwSXluNWZAluYMOEWBxkPmckES.jpg\'}, {\'cast_id\': 19, \'character\': "Brian O\'Conner", \'credit_id\': \'5431dfe4c3a3681143002b98\', \'gender\': 2, \'id\': 8167, \'name\': \'Paul Walker\', \'order\': 1, \'profile_path\': \'/iqvYezRoEY5k8wnlfHriHQfl5dX.jpg\'}, {\'cast_id\': 18, \'character\': \'Hobbs\', \'credit_id\': \'5431dfdbc3a36831a6004376\', \'gender\': 2, \'id\': 18918, \'name\': \'Dwayne Johnson\', \'order\': 2, \'profile_path\': \'/kuqFzlYMc2IrsOyPznMd1FroeGq.jpg\'}, {\'cast_id\': 20, \'character\': \'Letty\', \'credit_id\': \'5431dfeec3a36811ef002c75\', \'gender\': 1, \'id\': 17647, \'name\': \'Michelle Rodriguez\', \'order\': 3, \'profile_path\': \'/v37VK0MNuRuJOCKPKJcZAJXRA5r.jpg\'}, {\'cast_id\': 28, \'character\': \'Roman\', \'credit_id\': \'5431e0310e0a2656e2002c95\', \'gender\': 2, 

In [31]:
import ast
cast_members=ast.literal_eval(df1.loc[max_profit_index,'cast']) # convert the string into a list of dictionaries
cast_members

[{'cast_id': 17,
  'character': 'Dominic Toretto',
  'credit_id': '5431dfd10e0a265915002c34',
  'gender': 2,
  'id': 12835,
  'name': 'Vin Diesel',
  'order': 0,
  'profile_path': '/7rwSXluNWZAluYMOEWBxkPmckES.jpg'},
 {'cast_id': 19,
  'character': "Brian O'Conner",
  'credit_id': '5431dfe4c3a3681143002b98',
  'gender': 2,
  'id': 8167,
  'name': 'Paul Walker',
  'order': 1,
  'profile_path': '/iqvYezRoEY5k8wnlfHriHQfl5dX.jpg'},
 {'cast_id': 18,
  'character': 'Hobbs',
  'credit_id': '5431dfdbc3a36831a6004376',
  'gender': 2,
  'id': 18918,
  'name': 'Dwayne Johnson',
  'order': 2,
  'profile_path': '/kuqFzlYMc2IrsOyPznMd1FroeGq.jpg'},
 {'cast_id': 20,
  'character': 'Letty',
  'credit_id': '5431dfeec3a36811ef002c75',
  'gender': 1,
  'id': 17647,
  'name': 'Michelle Rodriguez',
  'order': 3,
  'profile_path': '/v37VK0MNuRuJOCKPKJcZAJXRA5r.jpg'},
 {'cast_id': 28,
  'character': 'Roman',
  'credit_id': '5431e0310e0a2656e2002c95',
  'gender': 2,
  'id': 8169,
  'name': 'Tyrese Gibson',
 

In [32]:
# Loop over the list to look at each cast member's dictionary

for actor in cast_members:
    print(actor)
    
# Each actor is a separate dictionary (see below);
# from each one we need the actor's name

{'cast_id': 17, 'character': 'Dominic Toretto', 'credit_id': '5431dfd10e0a265915002c34', 'gender': 2, 'id': 12835, 'name': 'Vin Diesel', 'order': 0, 'profile_path': '/7rwSXluNWZAluYMOEWBxkPmckES.jpg'}
{'cast_id': 19, 'character': "Brian O'Conner", 'credit_id': '5431dfe4c3a3681143002b98', 'gender': 2, 'id': 8167, 'name': 'Paul Walker', 'order': 1, 'profile_path': '/iqvYezRoEY5k8wnlfHriHQfl5dX.jpg'}
{'cast_id': 18, 'character': 'Hobbs', 'credit_id': '5431dfdbc3a36831a6004376', 'gender': 2, 'id': 18918, 'name': 'Dwayne Johnson', 'order': 2, 'profile_path': '/kuqFzlYMc2IrsOyPznMd1FroeGq.jpg'}
{'cast_id': 20, 'character': 'Letty', 'credit_id': '5431dfeec3a36811ef002c75', 'gender': 1, 'id': 17647, 'name': 'Michelle Rodriguez', 'order': 3, 'profile_path': '/v37VK0MNuRuJOCKPKJcZAJXRA5r.jpg'}
{'cast_id': 28, 'character': 'Roman', 'credit_id': '5431e0310e0a2656e2002c95', 'gender': 2, 'id': 8169, 'name': 'Tyrese Gibson', 'order': 4, 'profile_path': '/a3tyF7QXgeEH0QuEuIzNZZ8oLNS.jpg'}
{'cast_id': 

In [33]:
# Collect the actor names from the list cast_members
actors=[]
for actor in cast_members:
    actors.append(actor['name'])
print(actors)    

['Vin Diesel', 'Paul Walker', 'Dwayne Johnson', 'Michelle Rodriguez', 'Tyrese Gibson', 'Ludacris', 'Jordana Brewster', 'Djimon Hounsou', 'Tony Jaa', 'Ronda Rousey', 'Nathalie Emmanuel', 'Kurt Russell', 'Jason Statham', 'Sung Kang', 'Gal Gadot', 'Lucas Black', 'Elsa Pataky', 'Noel Gugliemi', 'John Brotherton', 'Luke Evans', 'Ali Fazal', 'Miller Kimsey', 'Charlie Kimsey', 'Eden Estrella', 'Gentry White', 'Iggy Azalea', 'Jon Lee Brody', 'Levy Tran', 'Anna Colwell', 'Viktor Hernandez', 'Steve Coulter', 'Robert Pralgo', 'Antwan Mills', 'J.J. Phillips', 'Jorge Ferragut', 'Sara Sohn', 'Benjamin Blankenship', 'D.J. Hapa', 'T-Pain', 'Brian Mahoney', 'Brittney Alger', 'Romeo Santos', 'Jocelin Donahue', 'Stephanie Langston', 'Jorge-Luis Pallo', 'Tego Calderón', 'Nathalie Kelley', 'Shad Moss', 'Don Omar', 'Klement Tinaj', 'Caleb Walker', 'Cody Walker']


# first question is answered perfectly!!!!!

# Question 2:
# This data has information about movies made in different languages. Which language has the highest average ROI (return on investment)?


In [34]:
# First compute the ROI, which is profit as a percentage of budget

df1['roi']=100*df1['profit']/df1['budget']




In [35]:
df1

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000,296.727273
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",14800000,1233.333333
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",16450000,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",3923970,23.853921
...,...,...,...,...,...,...,...,...
2995,Chasers,"[{'cast_id': 2, 'character': 'Rock Reilly', 'c...","[{'credit_id': '52fe4494c3a368484e02ac7d', 'de...",16450000,1596687,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",1596687,9.706304
2996,Vi är bäst!,"[{'cast_id': 5, 'character': 'Bobo', 'credit_i...","[{'credit_id': '5716b72ac3a3686678012c84', 'de...",16450000,180590,"[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'n...",180590,1.097812
2997,The Long Kiss Goodnight,"[{'cast_id': 10, 'character': 'Samantha Caine ...","[{'credit_id': '52fe443a9251416c7502d579', 'de...",65000000,89456761,"[{'id': 80, 'name': 'Crime'}, {'id': 28, 'name...",24456761,37.625786
2998,Along Came Polly,"[{'cast_id': 8, 'character': 'Reuben Feffer', ...","[{'credit_id': '556f817b9251410866000a63', 'de...",42000000,171963386,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",129963386,309.436633
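
As a quick check of the ROI formula, the figures for Whiplash in the table above can be worked through by hand. The sketch below uses plain Python with the budget and revenue copied from that row.

```python
# Figures for Whiplash, taken from the table above.
budget = 3_300_000
revenue = 13_092_000

profit = revenue - budget
roi = 100 * profit / budget   # ROI as a percentage of budget

print(profit)
print(round(roi, 6))
```

This reproduces the profit of 9792000 and the ROI of about 296.73 shown for that movie.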


In [36]:
# we will add language to our df1 dataframe from df
df1['original_language']=df['original_language']



In [37]:
df1

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207,en
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588,en
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000,296.727273,en
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",14800000,1233.333333,hi
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",16450000,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",3923970,23.853921,ko
...,...,...,...,...,...,...,...,...,...
2995,Chasers,"[{'cast_id': 2, 'character': 'Rock Reilly', 'c...","[{'credit_id': '52fe4494c3a368484e02ac7d', 'de...",16450000,1596687,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",1596687,9.706304,en
2996,Vi är bäst!,"[{'cast_id': 5, 'character': 'Bobo', 'credit_i...","[{'credit_id': '5716b72ac3a3686678012c84', 'de...",16450000,180590,"[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'n...",180590,1.097812,sv
2997,The Long Kiss Goodnight,"[{'cast_id': 10, 'character': 'Samantha Caine ...","[{'credit_id': '52fe443a9251416c7502d579', 'de...",65000000,89456761,"[{'id': 80, 'name': 'Crime'}, {'id': 28, 'name...",24456761,37.625786,en
2998,Along Came Polly,"[{'cast_id': 8, 'character': 'Reuben Feffer', ...","[{'credit_id': '556f817b9251410866000a63', 'de...",42000000,171963386,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",129963386,309.436633,en


In [38]:
df1.groupby('original_language')['roi'].mean() 


original_language
ar        8.192991
bn        3.260571
cn      146.002479
cs        0.105733
da       84.702661
de      312.279086
el     5198.013245
en      952.871319
es      163.898594
fa       83.235168
fi      -30.319235
fr       99.174226
he      456.292450
hi      253.929828
hu      -28.868718
id      -41.617578
it       67.970080
ja      171.241274
ko    11322.633937
ml      370.833333
mr      193.333333
nb       18.847943
nl       25.857992
no       74.155275
pl      395.816987
pt        4.041874
ro      -37.182109
ru      105.975708
sr      -99.960400
sv       87.564781
ta      177.614250
te      440.076812
tr      625.262867
ur      350.673773
vi      -50.846154
zh      324.218912
Name: roi, dtype: float64

In [39]:
# In addition, convert the result into a DataFrame

df1.groupby('original_language')['roi'].mean().reset_index() # reset_index() converts the Series into a DataFrame

Unnamed: 0,original_language,roi
0,ar,8.192991
1,bn,3.260571
2,cn,146.002479
3,cs,0.105733
4,da,84.702661
5,de,312.279086
6,el,5198.013245
7,en,952.871319
8,es,163.898594
9,fa,83.235168


In [40]:
# Sort by ROI to get the languages with the highest average ROI

avg_roi=df1.groupby('original_language')['roi'].mean().reset_index().sort_values(by='roi',ascending=False).head(3)

# The first row is the language with the highest average ROI (ko)
avg_roi

Unnamed: 0,original_language,roi
18,ko,11322.633937
6,el,5198.013245
7,en,952.871319
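
An average taken over only a few movies can be driven by a single very high ROI, so it may help to report how many movies sit behind each mean. A minimal sketch on made-up data, assuming the same column names as above (`toy`, `summary`, and `top_language` are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "original_language": ["en", "en", "en", "ko", "fr", "fr"],
    "roi": [10.0, 20.0, 30.0, 500.0, 5.0, 15.0],
})

# Mean ROI per language plus the number of movies behind each mean.
summary = (
    toy.groupby("original_language")["roi"]
       .agg(["mean", "count"])
       .sort_values("mean", ascending=False)
)
print(summary)

top_language = summary["mean"].idxmax()  # language with the highest mean ROI
print(top_language)
```

In this toy data the top language rests on a single movie, which is the kind of case the `count` column makes visible.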


In [41]:
# Store it in the variable highest_avg_roi_lang

highest_avg_roi_lang=avg_roi.iloc[0,0]
highest_avg_roi_lang
print('the language with highest average roi is {}'.format(highest_avg_roi_lang))

the language with highest average roi is ko


# Q.3 Find out the unique genres of movies in this dataset.

In [42]:
df1.head(3)

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207,en
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588,en
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000,296.727273,en


In [43]:
df1['genres'].isnull().sum() #seven null values 

7

In [44]:
df[df1['genres'].isna()] #these are the only NaN values of genres

Unnamed: 0,id,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,original_title,overview,popularity,...,release_date,runtime,spoken_languages,status,tagline,title,Keywords,cast,crew,revenue
470,471,,2000000,,,tt0349159,en,"The Book of Mormon Movie, Volume 1: The Journey",The story of Lehi and his wife Sariah and thei...,0.079856,...,9/12/03,120.0,,Released,"2600 years ago, one family began a remarkable ...","The Book of Mormon Movie, Volume 1: The Journey",,"[{'cast_id': 1, 'character': 'Sam', 'credit_id...",,1672730
1622,1623,,400000,,,tt0261755,en,Jackpot,"Sunny Holiday, an aspiring singing star, aband...",0.218588,...,7/26/01,97.0,,Released,,Jackpot,,"[{'cast_id': 4, 'character': '', 'credit_id': ...","[{'credit_id': '52fe4d3c9251416c9110f319', 'de...",43719
1814,1815,,2700000,,,tt0110289,it,Курочка Ряба,In Soviet days an old peasant woman's hen begi...,0.677253,...,10/1/94,117.0,"[{'iso_639_1': 'ru', 'name': 'Pусский'}]",Released,,"Ryaba, My Chicken",,[],"[{'credit_id': '52fe4c139251416c910eeee3', 'de...",4635143
1819,1820,,0,,,tt0352622,ru,Небо. Самолёт. Девушка.,"The tale of a brief, life-altering love affair...",0.518078,...,9/2/02,91.0,"[{'iso_639_1': 'ru', 'name': 'Pусский'}]",Released,,Sky. Plane. Girl.,"[{'id': 187056, 'name': 'woman director'}]","[{'cast_id': 2, 'character': '', 'credit_id': ...","[{'credit_id': '52fe4728c3a368484e0b7f53', 'de...",314195
2423,2424,,500000,,,tt0984177,en,Amarkalam,Vasu is a tough street crook who lives at a mo...,0.493342,...,8/25/99,157.0,"[{'iso_639_1': 'ta', 'name': 'தமிழ்'}]",Released,,Amarkalam,,"[{'cast_id': 1, 'character': 'Vaasu', 'credit_...","[{'credit_id': '53b42af80e0a26598c00cea3', 'de...",500000
2686,2687,,0,,,tt0833448,ru,Лифт,A psychological thriller. One quite ordinary s...,0.158207,...,7/1/06,88.0,,Released,,Lift,,[],"[{'credit_id': '57b8a5d19251411bc6000587', 'de...",123182
2900,2901,,200000,,http://ritaslastfairytale.ru/,tt1766044,en,Poslednyaya skazka Rity,The film speaks about universal themes of love...,0.560685,...,11/1/12,100.0,"[{'iso_639_1': 'ru', 'name': 'Pусский'}]",Released,,Rita's Last Fairy Tale,"[{'id': 187056, 'name': 'woman director'}]","[{'cast_id': 3, 'character': '', 'credit_id': ...","[{'credit_id': '52fe4ab89251416c750ebaab', 'de...",486937


In [45]:
# The ~ (tilde) operator takes the complement of the mask above, which selects the rows where genres is not NaN
new_df=df[~df1['genres'].isna()]
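
Once the rows with missing genres are filtered out, one possible way to collect the unique genre names is to parse each string and add the names to a set. This is a sketch on made-up rows, not necessarily the approach taken in the rest of the notebook; `toy`, `has_genres`, and `unique_genres` are illustrative names.

```python
import ast
import pandas as pd

toy = pd.DataFrame({
    "genres": [
        "[{'id': 35, 'name': 'Comedy'}]",
        None,
        "[{'id': 18, 'name': 'Drama'}, {'id': 35, 'name': 'Comedy'}]",
    ]
})

# notna() selects the same rows as ~isna().
has_genres = toy[toy["genres"].notna()]

unique_genres = set()
for text in has_genres["genres"]:
    for genre in ast.literal_eval(text):   # each string holds a list of dictionaries
        unique_genres.add(genre["name"])

print(sorted(unique_genres))
```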

In [46]:
new_df # these are all non null values of genres stored in a new data frame

Unnamed: 0,id,belongs_to_collection,budget,genres,homepage,imdb_id,original_language,original_title,overview,popularity,...,release_date,runtime,spoken_languages,status,tagline,title,Keywords,cast,crew,revenue
0,1,"[{'id': 313576, 'name': 'Hot Tub Time Machine ...",14000000,"[{'id': 35, 'name': 'Comedy'}]",,tt2637294,en,Hot Tub Time Machine 2,"When Lou, who has become the ""father of the In...",6.575393,...,2/20/15,93.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,The Laws of Space and Time are About to be Vio...,Hot Tub Time Machine 2,"[{'id': 4379, 'name': 'time travel'}, {'id': 9...","[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",12314651
1,2,"[{'id': 107674, 'name': 'The Princess Diaries ...",40000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,tt0368933,en,The Princess Diaries 2: Royal Engagement,Mia Thermopolis is now a college graduate and ...,8.248895,...,8/6/04,113.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,It can take a lifetime to find true love; she'...,The Princess Diaries 2: Royal Engagement,"[{'id': 2505, 'name': 'coronation'}, {'id': 42...","[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",95149435
2,3,,3300000,"[{'id': 18, 'name': 'Drama'}]",http://sonyclassics.com/whiplash/,tt2582802,en,Whiplash,"Under the direction of a ruthless instructor, ...",64.299990,...,10/10/14,105.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,The road to greatness can take you to the edge.,Whiplash,"[{'id': 1416, 'name': 'jazz'}, {'id': 1523, 'n...","[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",13092000
3,4,,1200000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",http://kahaanithefilm.com/,tt1821480,hi,Kahaani,Vidya Bagchi (Vidya Balan) arrives in Kolkata ...,3.174936,...,3/9/12,122.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,,Kahaani,"[{'id': 10092, 'name': 'mystery'}, {'id': 1054...","[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",16000000
4,5,,0,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",,tt1380152,ko,마린보이,Marine Boy is the story of a former national s...,1.148070,...,2/5/09,118.0,"[{'iso_639_1': 'ko', 'name': '한국어/조선말'}]",Released,,Marine Boy,,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",3923970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2995,2996,,0,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",,tt0109403,en,Chasers,Military men Rock Reilly and Eddie Devane are ...,9.853270,...,4/22/94,102.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,It was supposed to be a routine prisoner trans...,Chasers,"[{'id': 378, 'name': 'prison'}, {'id': 572, 'n...","[{'cast_id': 2, 'character': 'Rock Reilly', 'c...","[{'credit_id': '52fe4494c3a368484e02ac7d', 'de...",1596687
2996,2997,,0,"[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'n...",,tt2364975,sv,Vi är bäst!,Three girls in 1980s Stockholm decide to form ...,3.727996,...,3/28/13,102.0,"[{'iso_639_1': 'sv', 'name': 'svenska'}]",Released,,We Are the Best!,"[{'id': 1192, 'name': 'sweden'}, {'id': 4470, ...","[{'cast_id': 5, 'character': 'Bobo', 'credit_i...","[{'credit_id': '5716b72ac3a3686678012c84', 'de...",180590
2997,2998,,65000000,"[{'id': 80, 'name': 'Crime'}, {'id': 28, 'name...",,tt0116908,en,The Long Kiss Goodnight,"Samantha Caine, suburban homemaker, is the ide...",14.482345,...,10/11/96,120.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,What's forgotten is not always gone.,The Long Kiss Goodnight,"[{'id': 441, 'name': 'assassination'}, {'id': ...","[{'cast_id': 10, 'character': 'Samantha Caine ...","[{'credit_id': '52fe443a9251416c7502d579', 'de...",89456761
2998,2999,,42000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",http://www.alongcamepolly.com/,tt0343135,en,Along Came Polly,Reuben Feffer is a guy who's spent his entire ...,15.725542,...,1/16/04,90.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,"For the most cautious man on Earth, life is ab...",Along Came Polly,"[{'id': 966, 'name': 'beach'}, {'id': 2676, 'n...","[{'cast_id': 8, 'character': 'Reuben Feffer', ...","[{'credit_id': '556f817b9251410866000a63', 'de...",171963386


### The filtered data set has 2993 rows, so the 7 rows with NaN genres are excluded
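The same filter can be written with `dropna`, which states the intent directly. The sketch below uses a small made-up frame (`toy` is a stand-in, not the movie data) to show that both forms keep the same rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movie table (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "genres": ["[{'id': 35, 'name': 'Comedy'}]", np.nan, "[{'id': 18, 'name': 'Drama'}]"],
})

mask_version = toy[~toy["genres"].isna()]        # complement of the NaN mask
dropna_version = toy.dropna(subset=["genres"])   # drop rows whose genres is NaN

print(mask_version.equals(dropna_version))
print(len(dropna_version))
```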

In [47]:
# now we will see what a value in the genres column looks like
new_df.loc[0,'genres']

# this is a string, so we first need to convert it into a list

"[{'id': 35, 'name': 'Comedy'}]"

In [48]:
eval(new_df.loc[0,'genres']) # eval converts the genres string of the first row into a list

[{'id': 35, 'name': 'Comedy'}]

In [51]:
# converting the second row of genres into a list
eval(new_df.loc[1,'genres']) # again eval converts the string into a list

# note that a movie can have more than one genre: the movie at row 1 (output below)
# has four different genres

[{'id': 35, 'name': 'Comedy'},
 {'id': 18, 'name': 'Drama'},
 {'id': 10751, 'name': 'Family'},
 {'id': 10749, 'name': 'Romance'}]
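`eval` executes whatever expression the string contains, so it is risky on data that did not come from a trusted source. A possible alternative is `ast.literal_eval` from the standard library, which only accepts Python literals (lists, dicts, strings, numbers). The sketch below uses a hand-written string in the same shape as the genres column.

```python
import ast

# A string shaped like one cell of the genres column (typed by hand for illustration).
raw = "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]"

parsed = ast.literal_eval(raw)           # parses literals only, runs no code
names = [g["name"] for g in parsed]
print(type(parsed))
print(names)

# A string that is not a plain literal is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```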

In [52]:
# now we need to go through all of the rows to convert each string into a list
# iterrows() iterates over the rows and yields an (index, row) pair for each one

for index,row in new_df[:5].iterrows(): # unpack each (index, row) pair
    print(row['genres'])

# here we iterate over the first five rows only

[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]


In [53]:
# the 5 rows above are still strings, so we convert them into lists using eval()

for index,row in new_df[:5].iterrows():
    genres=eval(row['genres'])
    print(genres)

[{'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 18, 'name': 'Drama'}]
[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}]
[{'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}]


In [54]:
type(genres) # genres now holds the last of the five rows, converted from a string into a list

list

In [56]:
# we have to do the same thing for all the rows of genres
gen=[]  # collects every genre name, duplicates included
for index,row in new_df.iterrows():
    genres=eval(row['genres'])  # list of dictionaries; we need the value under the key 'name' in each one
    for genre in genres:  # each genre is a dict
        gen.append(genre['name'])
print(gen)  # prints the genre names of all movies, duplicates included

['Comedy', 'Comedy', 'Drama', 'Family', 'Romance', 'Drama', 'Thriller', 'Drama', 'Action', 'Thriller', 'Animation', 'Adventure', 'Family', 'Horror', 'Thriller', 'Documentary', 'Action', 'Comedy', 'Music', 'Family', 'Adventure', 'Comedy', 'Music', 'Drama', 'Comedy', 'Drama', 'Comedy', 'Crime', 'Action', 'Thriller', 'Science Fiction', 'Mystery', 'Action', 'Crime', 'Drama', 'Horror', 'Thriller', 'Drama', 'Romance', 'Comedy', 'Romance', 'Action', 'Thriller', 'Crime', 'Adventure', 'Family', 'Science Fiction', 'Horror', 'Thriller', 'Thriller', 'Horror', 'Thriller', 'Mystery', 'Foreign', 'Horror', 'Comedy', 'Comedy', 'Horror', 'Mystery', 'Thriller', 'Crime', 'Drama', 'Mystery', 'Thriller', 'Drama', 'Comedy', 'Romance', 'Animation', 'Action', 'Adventure', 'Crime', 'Thriller', 'Drama', 'Comedy', 'Mystery', 'Drama', 'Thriller', 'Fantasy', 'Action', 'Adventure', 'Horror', 'Action', 'Comedy', 'Crime', 'Thriller', 'Action', 'Crime', 'Thriller', 'Comedy', 'Romance', 'Action', 'Drama', 'Science Ficti

In [58]:
# now we want to find the unique genres

unique_genres=list(set(gen)) # a set keeps each genre once (the order is arbitrary)

unique_genres # list of unique genres

['Foreign',
 'War',
 'History',
 'Science Fiction',
 'Action',
 'Fantasy',
 'Documentary',
 'Thriller',
 'Horror',
 'Drama',
 'Western',
 'Romance',
 'Crime',
 'Comedy',
 'Adventure',
 'Music',
 'Family',
 'Mystery',
 'Animation',
 'TV Movie']
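The loop above can also be expressed with `apply` and `explode`, which keeps the parsed names next to each movie and gives the genre frequencies for free. This is a sketch on a made-up frame; `toy` and `genre_names` are illustrative names, and `ast.literal_eval` is swapped in for `eval`.

```python
import ast
import numpy as np
import pandas as pd

# Toy stand-in for the genres column (hypothetical values).
toy = pd.DataFrame({"genres": [
    "[{'id': 35, 'name': 'Comedy'}]",
    "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]",
    np.nan,
    "[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}]",
]})

def genre_names(cell):
    """Return the genre names in one cell; a NaN cell gives an empty list."""
    if pd.isna(cell):
        return []
    return [g["name"] for g in ast.literal_eval(cell)]

names_per_movie = toy["genres"].apply(genre_names)   # one list per movie
all_names = names_per_movie.explode().dropna()       # one row per (movie, genre)
unique_sorted = sorted(all_names.unique())           # sorted for a stable order

print(unique_sorted)
print(all_names.value_counts())
```

Sorting the result makes the output reproducible, whereas the order of `list(set(...))` can change between runs.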

# Q.4 Make a table of all the producers and directors of each movie. Find the top 3 producers who have produced movies with the highest average RoI?

In [59]:
df1.head()

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207,en
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588,en
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000,296.727273,en
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",14800000,1233.333333,hi
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",16450000,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",3923970,23.853921,ko


In [60]:
df1.isnull().sum() # crew has 16 null values, which we need to exclude

original_title        0
cast                 13
crew                 16
budget                0
revenue               0
genres                7
profit                0
roi                   0
original_language     0
dtype: int64

In [61]:
df1[df1['crew'].isna()] # these rows have NaN in crew; we exclude them by taking the complement with tilde ~

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
470,"The Book of Mormon Movie, Volume 1: The Journey","[{'cast_id': 1, 'character': 'Sam', 'credit_id...",,2000000,1672730,,-327270,-16.3635,en
518,Wonder Woman,,,149000000,820580447,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",671580447,450.725132,en
680,The Day After Tomorrow,,,125000000,544272402,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",419272402,335.417922,en
906,The Dark Knight Rises,,,250000000,1084939099,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",834939099,333.97564,en
934,John Wick: Chapter 2,,,40000000,171539887,"[{'id': 53, 'name': 'Thriller'}, {'id': 28, 'n...",131539887,328.849717,en
1303,Mr. Smith Goes to Washington,,,1500000,9600000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",8100000,540.0,en
1617,The Assassination of Richard Nixon,,,16450000,3537961,"[{'id': 18, 'name': 'Drama'}, {'id': 36, 'name...",3537961,21.507362,en
1783,Logan,,,97000000,616801808,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",519801808,535.878153,en
2014,The Wolf of Wall Street,,,100000000,392000694,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",292000694,292.000694,en
2302,Happy Weekend,"[{'cast_id': 0, 'character': 'Joachim Krippo',...",,16450000,65335,"[{'id': 35, 'name': 'Comedy'}]",65335,0.397173,de


In [62]:
crew=df1[~df1['crew'].isna()] # keep only the rows where crew is not null

In [63]:
crew

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207,en
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588,en
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000,296.727273,en
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",14800000,1233.333333,hi
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",16450000,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",3923970,23.853921,ko
...,...,...,...,...,...,...,...,...,...
2995,Chasers,"[{'cast_id': 2, 'character': 'Rock Reilly', 'c...","[{'credit_id': '52fe4494c3a368484e02ac7d', 'de...",16450000,1596687,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",1596687,9.706304,en
2996,Vi är bäst!,"[{'cast_id': 5, 'character': 'Bobo', 'credit_i...","[{'credit_id': '5716b72ac3a3686678012c84', 'de...",16450000,180590,"[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'n...",180590,1.097812,sv
2997,The Long Kiss Goodnight,"[{'cast_id': 10, 'character': 'Samantha Caine ...","[{'credit_id': '52fe443a9251416c7502d579', 'de...",65000000,89456761,"[{'id': 80, 'name': 'Crime'}, {'id': 28, 'name...",24456761,37.625786,en
2998,Along Came Polly,"[{'cast_id': 8, 'character': 'Reuben Feffer', ...","[{'credit_id': '556f817b9251410866000a63', 'de...",42000000,171963386,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",129963386,309.436633,en


In [64]:
list_producer=[]  # producer names
list_director=[]  # declared for director names, not filled in this cell
list_movie=[]     # [movie title, producer name, roi] rows
for index,row in crew.iterrows():
    prod=eval(row['crew'])  # the crew string becomes a list of dictionaries
    for i in prod:
        if i['job']=='Producer':
            list_producer.append(i['name'])
            list_movie.append([row['original_title'],i['name'],row['roi']])

In [65]:
list_movie[1] # one entry of the list: [movie title, producer, roi]

['The Princess Diaries 2: Royal Engagement', 'Whitney Houston', 137.8735875]

In [66]:
# we will create a dataframe

movie_producers=pd.DataFrame(list_movie,columns=['movie_title','producers','roi'])

In [67]:
movie_producers

Unnamed: 0,movie_title,producers,roi
0,Hot Tub Time Machine 2,Andrew Panay,-12.038207
1,The Princess Diaries 2: Royal Engagement,Whitney Houston,137.873588
2,The Princess Diaries 2: Royal Engagement,Mario Iscovich,137.873588
3,The Princess Diaries 2: Royal Engagement,Debra Martin Chase,137.873588
4,Whiplash,David Lancaster,296.727273
...,...,...,...
6006,Abduction,Doug Davison,134.534729
6007,Abduction,Roy Lee,134.534729
6008,Abduction,Ellen Goldsmith-Vein,134.534729
6009,Abduction,Dan Lautner,134.534729


# Find the top 3 producers who have produced movies with the highest average RoI?

In [68]:
# find the average roi by producers

movie_producers.groupby('producers')['roi'].mean()

producers
50 Cent                -84.582270
A. M. Rathnam           64.130435
A. V. M. Saravanan     257.723458
A.J. Dix               269.986308
Aamir Khan            1631.688963
                         ...     
Zhang Weiping            1.395143
Zhongjun Wang            0.001891
장원석                   512.500000
Álvaro Augustín        228.958895
Ç. Duygu Bingöl        368.541184
Name: roi, Length: 3546, dtype: float64

In [69]:
# we want the top three, so we sort in descending order; reset_index converts the Series into a DataFrame

prd_hi_roi=movie_producers.groupby('producers')['roi'].mean().reset_index().sort_values(by='roi',ascending=False).head(3)

In [70]:
prd_hi_roi.reset_index()

Unnamed: 0,index,producers,roi
0,146,Amir Zbeda,1288939.0
1,2878,Robin Cowie,413233.3
2,1185,Gregg Hale,413233.3
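An average can rest on a single film, so it may help to show the number of movies next to each producer's mean ROI. The sketch below does this with named aggregation and `nlargest` on a made-up table (`toy` and its values are hypothetical; the column names follow `movie_producers`).

```python
import pandas as pd

# Toy stand-in for movie_producers (hypothetical values).
toy = pd.DataFrame({
    "movie_title": ["M1", "M1", "M2", "M3", "M4"],
    "producers":   ["P1", "P2", "P1", "P3", "P2"],
    "roi":         [100.0, 100.0, 300.0, 50.0, 20.0],
})

# Mean ROI and number of movies per producer in one pass.
summary = toy.groupby("producers")["roi"].agg(avg_roi="mean", n_movies="count")

# nlargest sorts by avg_roi in descending order and keeps the first three rows.
top3 = summary.nlargest(3, "avg_roi")
print(top3)
```

Whether to require a minimum number of movies before ranking is a judgment call that the question does not settle; the count column at least makes the basis of each average visible.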


In [71]:
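# the movie with director
# note: list_movie is not reset here, so it still holds the producer rows from In [64];
# the director rows are appended after them

for index,row in crew.iterrows():
    prod=eval(row['crew'])
    for i in prod:
        if i['job']=='Director':
            list_movie.append([row['original_title'],i['name'],row['roi']])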

In [72]:
movie_directors=pd.DataFrame(list_movie,columns=['movie_title','directors','roi'])

In [73]:
movie_directors

Unnamed: 0,movie_title,directors,roi
0,Hot Tub Time Machine 2,Andrew Panay,-12.038207
1,The Princess Diaries 2: Royal Engagement,Whitney Houston,137.873588
2,The Princess Diaries 2: Royal Engagement,Mario Iscovich,137.873588
3,The Princess Diaries 2: Royal Engagement,Debra Martin Chase,137.873588
4,Whiplash,David Lancaster,296.727273
...,...,...,...
9231,Chasers,Dennis Hopper,9.706304
9232,Vi är bäst!,Lukas Moodysson,1.097812
9233,The Long Kiss Goodnight,Renny Harlin,37.625786
9234,Along Came Polly,John Hamburg,309.436633
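The director loop above appends to `list_movie`, which still holds the producer rows, so the table labelled `directors` starts with producers. One way to avoid this is to build each table from its own fresh list. The sketch below does so with a small helper on made-up data; `rows_for_job` and `toy` are hypothetical names, and `ast.literal_eval` is used in place of `eval`.

```python
import ast
import pandas as pd

# Toy stand-in for the crew frame (hypothetical values).
toy = pd.DataFrame({
    "original_title": ["M1", "M2"],
    "roi": [10.0, 20.0],
    "crew": [
        "[{'job': 'Producer', 'name': 'P1'}, {'job': 'Director', 'name': 'D1'}]",
        "[{'job': 'Director', 'name': 'D2'}, {'job': 'Producer', 'name': 'P2'}, "
        "{'job': 'Producer', 'name': 'P3'}]",
    ],
})

def rows_for_job(frame, job, column_name):
    """Collect [title, name, roi] rows for one job, starting from an empty list."""
    rows = []
    for _, row in frame.iterrows():
        for member in ast.literal_eval(row["crew"]):
            if member["job"] == job:
                rows.append([row["original_title"], member["name"], row["roi"]])
    return pd.DataFrame(rows, columns=["movie_title", column_name, "roi"])

producers_only = rows_for_job(toy, "Producer", "producers")
directors_only = rows_for_job(toy, "Director", "directors")
print(producers_only)
print(directors_only)
```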


In [74]:
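# movie_directors also contains the producer rows, so this ranking covers producers and directors together
dir_hi_roi=movie_directors.groupby('directors')['roi'].mean().reset_index().sort_values(by='roi',ascending=False).head(3)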

In [75]:
dir_hi_roi

Unnamed: 0,directors,roi
202,Amir Zbeda,1288939.0
4129,Robin Cowie,413233.3
1006,Daniel Myrick,413233.3


In [76]:
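# the assignment aligns on the row index, so row i receives the 'directors' value of row i of movie_directors;
# its first 6010 rows are the producer rows, so the new column repeats the producer names
movie_producers['directors']=movie_directors['directors']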

In [77]:
movie_producers

Unnamed: 0,movie_title,producers,roi,directors
0,Hot Tub Time Machine 2,Andrew Panay,-12.038207,Andrew Panay
1,The Princess Diaries 2: Royal Engagement,Whitney Houston,137.873588,Whitney Houston
2,The Princess Diaries 2: Royal Engagement,Mario Iscovich,137.873588,Mario Iscovich
3,The Princess Diaries 2: Royal Engagement,Debra Martin Chase,137.873588,Debra Martin Chase
4,Whiplash,David Lancaster,296.727273,David Lancaster
...,...,...,...,...
6006,Abduction,Doug Davison,134.534729,Doug Davison
6007,Abduction,Roy Lee,134.534729,Roy Lee
6008,Abduction,Ellen Goldsmith-Vein,134.534729,Ellen Goldsmith-Vein
6009,Abduction,Dan Lautner,134.534729,Dan Lautner


In [78]:
table_pro_dir=movie_producers[['movie_title','producers','directors','roi']]

In [79]:
table_pro_dir # one row per (movie, producer); the directors column repeats the producer names because of the index-aligned assignment in In [76]

Unnamed: 0,movie_title,producers,directors,roi
0,Hot Tub Time Machine 2,Andrew Panay,Andrew Panay,-12.038207
1,The Princess Diaries 2: Royal Engagement,Whitney Houston,Whitney Houston,137.873588
2,The Princess Diaries 2: Royal Engagement,Mario Iscovich,Mario Iscovich,137.873588
3,The Princess Diaries 2: Royal Engagement,Debra Martin Chase,Debra Martin Chase,137.873588
4,Whiplash,David Lancaster,David Lancaster,296.727273
...,...,...,...,...
6006,Abduction,Doug Davison,Doug Davison,134.534729
6007,Abduction,Roy Lee,Roy Lee,134.534729
6008,Abduction,Ellen Goldsmith-Vein,Ellen Goldsmith-Vein,134.534729
6009,Abduction,Dan Lautner,Dan Lautner,134.534729
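Copying a column from one frame to another pairs rows by index position, not by movie. To put each movie's director next to each of its producers, the two tables can be joined on the title instead. The sketch below uses made-up frames (`producers_only` and `directors_only` are hypothetical names for a producers-only and a directors-only table).

```python
import pandas as pd

# Toy stand-ins: one row per (movie, producer) and one row per (movie, director).
producers_only = pd.DataFrame({
    "movie_title": ["M1", "M2", "M2"],
    "producers":   ["P1", "P2", "P3"],
    "roi":         [10.0, 20.0, 20.0],
})
directors_only = pd.DataFrame({
    "movie_title": ["M1", "M2"],
    "directors":   ["D1", "D2"],
})

# A left join keeps every producer row and attaches the director(s) of the same movie.
table = producers_only.merge(directors_only, on="movie_title", how="left")
table = table[["movie_title", "producers", "directors", "roi"]]
print(table)
```

This assumes titles identify movies uniquely. If two different films share a title, joining on an identifier such as `imdb_id` would be safer.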


In [80]:
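# highest average roi again, now from the combined table; its directors column repeats the producers, so the result matches the producer ranking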
dir1_hi_roi=table_pro_dir.groupby('directors')['roi'].mean().reset_index().sort_values(by='roi',ascending=False).head(3)

In [81]:
dir1_hi_roi

Unnamed: 0,directors,roi
146,Amir Zbeda,1288939.0
2878,Robin Cowie,413233.3
1185,Gregg Hale,413233.3


In [82]:
prdu1_hi_roi=table_pro_dir.groupby('producers')['roi'].mean().reset_index().sort_values(by='roi',ascending=False).head(3)

In [83]:
prdu1_hi_roi

Unnamed: 0,producers,roi
146,Amir Zbeda,1288939.0
2878,Robin Cowie,413233.3
1185,Gregg Hale,413233.3


# Q.5 Which actor has acted in the most number of movies? Deep dive into the movies, genres and profits corresponding to this actor. 

In [150]:
df1.head(2)

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207,en
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588,en


In [151]:
# first we check whether there are any null values in cast
df1[df1['cast'].isna()]

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
518,Wonder Woman,,,149000000,820580447,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",671580447,450.725132,en
680,The Day After Tomorrow,,,125000000,544272402,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",419272402,335.417922,en
906,The Dark Knight Rises,,,250000000,1084939099,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",834939099,333.97564,en
934,John Wick: Chapter 2,,,40000000,171539887,"[{'id': 53, 'name': 'Thriller'}, {'id': 28, 'n...",131539887,328.849717,en
1303,Mr. Smith Goes to Washington,,,1500000,9600000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",8100000,540.0,en
1617,The Assassination of Richard Nixon,,,16450000,3537961,"[{'id': 18, 'name': 'Drama'}, {'id': 36, 'name...",3537961,21.507362,en
1783,Logan,,,97000000,616801808,"[{'id': 28, 'name': 'Action'}, {'id': 18, 'nam...",519801808,535.878153,en
2014,The Wolf of Wall Street,,,100000000,392000694,"[{'id': 80, 'name': 'Crime'}, {'id': 18, 'name...",292000694,292.000694,en
2448,15 Minutes,,,60000000,56359980,"[{'id': 28, 'name': 'Action'}, {'id': 80, 'nam...",-3640020,-6.0667,en
2518,You Don't Mess with the Zohan,,,90000000,201596308,"[{'id': 35, 'name': 'Comedy'}, {'id': 28, 'nam...",111596308,123.995898,en


In [152]:
# then we take the complement using tilde ~ and save the result in a new data frame df2

df2=df1[~df1['cast'].isna()]
df2

Unnamed: 0,original_title,cast,crew,budget,revenue,genres,profit,roi,original_language
0,Hot Tub Time Machine 2,"[{'cast_id': 4, 'character': 'Lou', 'credit_id...","[{'credit_id': '59ac067c92514107af02c8c8', 'de...",14000000,12314651,"[{'id': 35, 'name': 'Comedy'}]",-1685349,-12.038207,en
1,The Princess Diaries 2: Royal Engagement,"[{'cast_id': 1, 'character': 'Mia Thermopolis'...","[{'credit_id': '52fe43fe9251416c7502563d', 'de...",40000000,95149435,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",55149435,137.873588,en
2,Whiplash,"[{'cast_id': 5, 'character': 'Andrew Neimann',...","[{'credit_id': '54d5356ec3a3683ba0000039', 'de...",3300000,13092000,"[{'id': 18, 'name': 'Drama'}]",9792000,296.727273,en
3,Kahaani,"[{'cast_id': 1, 'character': 'Vidya Bagchi', '...","[{'credit_id': '52fe48779251416c9108d6eb', 'de...",1200000,16000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 18, 'n...",14800000,1233.333333,hi
4,마린보이,"[{'cast_id': 3, 'character': 'Chun-soo', 'cred...","[{'credit_id': '52fe464b9251416c75073b43', 'de...",16450000,3923970,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",3923970,23.853921,ko
...,...,...,...,...,...,...,...,...,...
2995,Chasers,"[{'cast_id': 2, 'character': 'Rock Reilly', 'c...","[{'credit_id': '52fe4494c3a368484e02ac7d', 'de...",16450000,1596687,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",1596687,9.706304,en
2996,Vi är bäst!,"[{'cast_id': 5, 'character': 'Bobo', 'credit_i...","[{'credit_id': '5716b72ac3a3686678012c84', 'de...",16450000,180590,"[{'id': 18, 'name': 'Drama'}, {'id': 10402, 'n...",180590,1.097812,sv
2997,The Long Kiss Goodnight,"[{'cast_id': 10, 'character': 'Samantha Caine ...","[{'credit_id': '52fe443a9251416c7502d579', 'de...",65000000,89456761,"[{'id': 80, 'name': 'Crime'}, {'id': 28, 'name...",24456761,37.625786,en
2998,Along Came Polly,"[{'cast_id': 8, 'character': 'Reuben Feffer', ...","[{'credit_id': '556f817b9251410866000a63', 'de...",42000000,171963386,"[{'id': 35, 'name': 'Comedy'}, {'id': 10749, '...",129963386,309.436633,en


In [153]:
df2.loc[0,'cast']

"[{'cast_id': 4, 'character': 'Lou', 'credit_id': '52fe4ee7c3a36847f82afae7', 'gender': 2, 'id': 52997, 'name': 'Rob Corddry', 'order': 0, 'profile_path': '/k2zJL0V1nEZuFT08xUdOd3ucfXz.jpg'}, {'cast_id': 5, 'character': 'Nick', 'credit_id': '52fe4ee7c3a36847f82afaeb', 'gender': 2, 'id': 64342, 'name': 'Craig Robinson', 'order': 1, 'profile_path': '/tVaRMkJXOEVhYxtnnFuhqW0Rjzz.jpg'}, {'cast_id': 6, 'character': 'Jacob', 'credit_id': '52fe4ee7c3a36847f82afaef', 'gender': 2, 'id': 54729, 'name': 'Clark Duke', 'order': 2, 'profile_path': '/oNzK0umwm5Wn0wyEbOy6TVJCSBn.jpg'}, {'cast_id': 7, 'character': 'Adam Jr.', 'credit_id': '52fe4ee7c3a36847f82afaf3', 'gender': 2, 'id': 36801, 'name': 'Adam Scott', 'order': 3, 'profile_path': '/5gb65xz8bzd42yjMAl4zwo4cvKw.jpg'}, {'cast_id': 8, 'character': 'Hot Tub Repairman', 'credit_id': '52fe4ee7c3a36847f82afaf7', 'gender': 2, 'id': 54812, 'name': 'Chevy Chase', 'order': 4, 'profile_path': '/svjpyYtPwtjvRxX9IZnOmOkhDOt.jpg'}, {'cast_id': 9, 'character

In [154]:
eval(df2.loc[0,'cast'])

[{'cast_id': 4,
  'character': 'Lou',
  'credit_id': '52fe4ee7c3a36847f82afae7',
  'gender': 2,
  'id': 52997,
  'name': 'Rob Corddry',
  'order': 0,
  'profile_path': '/k2zJL0V1nEZuFT08xUdOd3ucfXz.jpg'},
 {'cast_id': 5,
  'character': 'Nick',
  'credit_id': '52fe4ee7c3a36847f82afaeb',
  'gender': 2,
  'id': 64342,
  'name': 'Craig Robinson',
  'order': 1,
  'profile_path': '/tVaRMkJXOEVhYxtnnFuhqW0Rjzz.jpg'},
 {'cast_id': 6,
  'character': 'Jacob',
  'credit_id': '52fe4ee7c3a36847f82afaef',
  'gender': 2,
  'id': 54729,
  'name': 'Clark Duke',
  'order': 2,
  'profile_path': '/oNzK0umwm5Wn0wyEbOy6TVJCSBn.jpg'},
 {'cast_id': 7,
  'character': 'Adam Jr.',
  'credit_id': '52fe4ee7c3a36847f82afaf3',
  'gender': 2,
  'id': 36801,
  'name': 'Adam Scott',
  'order': 3,
  'profile_path': '/5gb65xz8bzd42yjMAl4zwo4cvKw.jpg'},
 {'cast_id': 8,
  'character': 'Hot Tub Repairman',
  'credit_id': '52fe4ee7c3a36847f82afaf7',
  'gender': 2,
  'id': 54812,
  'name': 'Chevy Chase',
  'order': 4,
  'prof

In [172]:
# the plan is to create a dictionary with each actor's name as the key and the number of movies
# the actor appears in as the value

# we start with an empty dictionary
actor_dict1={}
# we iterate through df2 and access each row (only the first five rows here, as a test)

for index,row in df2[:5].iterrows():
    cast=eval(row['cast'])
    for actor in cast:
        # .get(name, 0) returns the current count for the actor, or the default 0 if the name is not yet a key;
        # adding 1 stores 1 the first time an actor is seen, 2 the second time, and so on
        actor_dict1[actor['name']]=actor_dict1.get(actor['name'],0)+1
print(actor_dict1)

{'Rob Corddry': 1, 'Craig Robinson': 1, 'Clark Duke': 1, 'Adam Scott': 1, 'Chevy Chase': 1, 'Gillian Jacobs': 1, 'Bianca Haase': 1, 'Collette Wolfe': 1, 'Kumail Nanjiani': 1, 'Kellee Stewart': 1, 'Josh Heald': 1, 'Gretchen Koerner': 1, 'Lisa Loeb': 1, 'Jessica Williams': 1, 'Bruce Buffer': 1, 'Mariana Paola Vicente': 1, 'Christian Slater': 1, 'Jason Jones': 1, 'Olivia Jordan': 1, 'Christine Bently': 1, 'Stacey Asaro': 1, 'John Cusack': 1, 'Adam Herschman': 1, 'Kisha Sierra': 1, 'Anne Hathaway': 1, 'Julie Andrews': 1, 'H√©ctor Elizondo': 1, 'John Rhys-Davies': 1, 'Heather Matarazzo': 1, 'Chris Pine': 1, 'Callum Blue': 1, 'Larry Miller': 1, 'Raven-Symon√©': 1, 'Kathleen Marshall': 1, 'Caroline Goodall': 1, 'Lorraine Nicholson': 1, 'Shannon Wilcox': 1, 'Greg Lewis': 1, 'Abigail Breslin': 1, 'Paul Vogt': 1, 'Joseph Leo Bwarie': 1, 'Hope Alexander-Willis': 1, 'Rowan Joseph': 1, 'Jeffrey Scott Jensen': 1, 'Miles Teller': 1, 'J.K. Simmons': 1, 'Melissa Benoist': 1, 'Austin Stowell': 1, 'Jayso

In [174]:
# now we do it for all the rows
actor_dict={}  # start from an empty dictionary so the counts cover exactly one pass over df2
for index,row in df2.iterrows():
    cast=eval(row['cast'])
    for actor in cast:
        # same counting pattern as above: default 0 for a new name, then add 1
        actor_dict[actor['name']]=actor_dict.get(actor['name'],0)+1
print(actor_dict)

# these are the actors, and each value is the number of movies the actor has acted in
# next we want the actor who has acted in the most movies,
# so we sort the dictionary by its values in descending order

{'Rob Corddry': 5, 'Craig Robinson': 7, 'Clark Duke': 6, 'Adam Scott': 13, 'Chevy Chase': 8, 'Gillian Jacobs': 3, 'Bianca Haase': 1, 'Collette Wolfe': 2, 'Kumail Nanjiani': 3, 'Kellee Stewart': 2, 'Josh Heald': 2, 'Gretchen Koerner': 3, 'Lisa Loeb': 3, 'Jessica Williams': 3, 'Bruce Buffer': 1, 'Mariana Paola Vicente': 1, 'Christian Slater': 9, 'Jason Jones': 3, 'Olivia Jordan': 2, 'Christine Bently': 2, 'Stacey Asaro': 1, 'John Cusack': 17, 'Adam Herschman': 3, 'Kisha Sierra': 1, 'Anne Hathaway': 7, 'Julie Andrews': 7, 'H√©ctor Elizondo': 9, 'John Rhys-Davies': 7, 'Heather Matarazzo': 4, 'Chris Pine': 7, 'Callum Blue': 2, 'Larry Miller': 9, 'Raven-Symon√©': 4, 'Kathleen Marshall': 3, 'Caroline Goodall': 9, 'Lorraine Nicholson': 2, 'Shannon Wilcox': 4, 'Greg Lewis': 3, 'Abigail Breslin': 9, 'Paul Vogt': 2, 'Joseph Leo Bwarie': 3, 'Hope Alexander-Willis': 2, 'Rowan Joseph': 1, 'Jeffrey Scott Jensen': 1, 'Miles Teller': 6, 'J.K. Simmons': 25, 'Melissa Benoist': 2, 'Austin Stowell': 3, 'Ja
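The counting pattern with `.get(name, 0) + 1` is what `collections.Counter` provides out of the box, and its `most_common` method returns the entries sorted by count. The sketch below runs on a made-up cast column (`toy` is hypothetical). It also counts each actor at most once per film by using a set, which is a small departure from the loop above, where an actor credited twice in one film is counted twice.

```python
import ast
from collections import Counter

import pandas as pd

# Toy stand-in for the cast column (hypothetical values).
toy = pd.DataFrame({"cast": [
    "[{'name': 'A1'}, {'name': 'A2'}]",
    "[{'name': 'A1'}, {'name': 'A3'}]",
    "[{'name': 'A1'}, {'name': 'A2'}, {'name': 'A2'}]",
]})

actor_counts = Counter()
for cell in toy["cast"]:
    # The set removes repeated credits of the same actor within one film.
    actor_counts.update({member["name"] for member in ast.literal_eval(cell)})

top_two = actor_counts.most_common(2)   # (name, count) pairs, highest count first
print(top_two)
```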

In [181]:
sorted_actor_dict={k: v for k, v in sorted(actor_dict.items(), key=lambda item: item[1],reverse=True)}

sorted_actor_dict

# these are the actors sorted by number of movies; Samuel L. Jackson and Robert De Niro are tied at the top with 30 each

{'Samuel L. Jackson': 30,
 'Robert De Niro': 30,
 'Morgan Freeman': 27,
 'J.K. Simmons': 25,
 'Bruce Willis': 25,
 'Liam Neeson': 25,
 'Susan Sarandon': 25,
 'Bruce McGill': 24,
 'John Turturro': 24,
 'Forest Whitaker': 23,
 'Willem Dafoe': 23,
 'Bill Murray': 22,
 'Owen Wilson': 22,
 'Nicolas Cage': 22,
 'Sylvester Stallone': 21,
 'Jason Statham': 21,
 'Keith David': 21,
 'John Goodman': 21,
 'Mel Gibson': 21,
 'Sigourney Weaver': 21,
 'Frank Welker': 20,
 'Michael Caine': 20,
 'George Clooney': 20,
 'Denzel Washington': 20,
 'Robert Duvall': 20,
 'Ed Harris': 20,
 'Dennis Quaid': 20,
 'Richard Jenkins': 20,
 'Matt Damon': 20,
 'Christopher Plummer': 19,
 'Gene Hackman': 19,
 'Christopher Walken': 19,
 'William H. Macy': 19,
 'James Franco': 19,
 'Jim Broadbent': 19,
 'John C. Reilly': 19,
 'Kevin Bacon': 19,
 'Christian Bale': 19,
 'Alec Baldwin': 19,
 'Allison Janney': 18,
 'Brian Cox': 18,
 'John Leguizamo': 18,
 'Julianne Moore': 18,
 'Robert Downey Jr.': 18,
 'Michael Shannon': 1
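For the deep dive into one actor, a possible next step is to select the rows whose cast contains that name and then summarise their genres and profits. The sketch below shows the idea on a made-up frame; `toy`, `names` and the actor label `"A1"` are hypothetical, the column names follow `df2`, and rows with NaN in cast or genres are assumed to have been dropped beforehand.

```python
import ast

import pandas as pd

# Toy stand-in for df2 (hypothetical values).
toy = pd.DataFrame({
    "original_title": ["M1", "M2", "M3"],
    "cast": [
        "[{'name': 'A1'}, {'name': 'A2'}]",
        "[{'name': 'A2'}]",
        "[{'name': 'A1'}]",
    ],
    "genres": [
        "[{'id': 35, 'name': 'Comedy'}]",
        "[{'id': 18, 'name': 'Drama'}]",
        "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]",
    ],
    "profit": [100, -20, 50],
})

def names(cell):
    """Return the 'name' values from a stringified list of dictionaries."""
    return [d["name"] for d in ast.literal_eval(cell)]

actor = "A1"
in_cast = toy["cast"].apply(lambda cell: actor in names(cell))   # True where the actor appears
actor_movies = toy.loc[in_cast, ["original_title", "genres", "profit"]].copy()
actor_movies["genres"] = actor_movies["genres"].apply(names)     # genre names as lists

genre_counts = actor_movies["genres"].explode().value_counts()   # films per genre
total_profit = actor_movies["profit"].sum()
mean_profit = actor_movies["profit"].mean()

print(actor_movies)
print(genre_counts)
print(total_profit, mean_profit)
```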

# Q.6  Top 3 directors prefer which actors the most? 