# James Bond Movie Data

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline



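pandas can read data stored in many file formats, including CSV, JSON, XML, Excel and Parquet. Whichever format you use, you may need to tell the reader about the file's structure, encoding and other details. The `read_csv` function reads CSV files and accepts many parameters that control parsing. The commented-out lines in the next cell show the equivalent calls for Excel, Parquet and HTML sources. Chaining `.convert_dtypes()` converts each column to the best-fitting dtype that supports missing values.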
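As a minimal sketch of what `read_csv` followed by `convert_dtypes` does, the snippet below parses a tiny in-memory CSV instead of a file. The `sample_csv` name and its three rows are assumptions made for illustration; only the column names are borrowed from the dataset.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the real data lives in ./data/james_bond_data.csv
sample_csv = io.StringIO(
    "Movie,Martinis,Avg_User_IMDB\n"
    "Dr. No,2,7.3\n"
    "Goldfinger,1,7.8\n"
    "The Spy Who Loved Me,1,\n"
)

sample = pd.read_csv(sample_csv).convert_dtypes()

# convert_dtypes switches to nullable dtypes, so the empty rating becomes <NA>
print(sample.dtypes)
print(sample)
```

Reading from a `StringIO` buffer is only a convenience here; passing a file path works the same way.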

In [4]:
# james_bond_data = pd.read_excel("james_bond_data.xlsx").convert_dtypes()
# james_bond_data = pd.read_parquet("james_bond_data.parquet").convert_dtypes()

# james_bond_tables = pd.read_html(
#       "https://en.wikipedia.org/wiki/List_of_James_Bond_novels_and_short_stories")
# james_bond_data = james_bond_tables[1].convert_dtypes()



james_bond_data = pd.read_csv("./data/james_bond_data.csv").convert_dtypes()

In [5]:
james_bond_data.head()

Unnamed: 0,Release,Movie,Bond,Bond_Car_MFG,US_Gross,World_Gross,Budget ($ 000s),Film_Length,Avg_User_IMDB,Avg_User_Rtn_Tom,Martinis,Kills_Bond
0,"June, 1962",Dr. No,Sean Connery,Sunbeam,"$16,067,035.00","$59,567,035.00","$1,000.00",110 mins,7.3,7.7,2,4
1,"August, 1963",From Russia with Love,Sean Connery,Bentley,"$24,800,000.00","$78,900,000.00","$2,000.00",115 mins,7.5,8.0,0,11
2,"May, 1964",Goldfinger,Sean Connery,Aston Martin,"$51,100,000.00","$124,900,000.00","$3,000.00",110 mins,7.8,8.4,1,9
3,"September, 1965",Thunderball,Sean Connery,Aston Martin,"$63,600,000.00","$141,200,000.00","$9,000.00",130 mins,7.0,6.8,0,20
4,"November, 1967",You Only Live Twice,Sean Connery,Toyota,"$43,100,000.00","$111,600,000.00","$9,500.00",117 mins,6.9,6.3,1,21


## Creating meaningful columns

In [6]:
new_column_names = {
    "Release": "release_date",
    "Movie": "movie_title",
    "Bond": "bond_actor",
    "Bond_Car_MFG": "car_manufacturer",
    "US_Gross": "income_usa",
    "World_Gross": "income_world",
    "Budget ($ 000s)": "movie_budget",
    "Film_Length": "film_length",
    "Avg_User_IMDB": "imdb",
    "Avg_User_Rtn_Tom": "rotten_tomatoes",
    "Martinis": "martinis_consumed",
    "Kills_Bond": "bond_kills",
}

data = james_bond_data.rename(columns=new_column_names)

In [7]:
data.columns

Index(['release_date', 'movie_title', 'bond_actor', 'car_manufacturer',
       'income_usa', 'income_world', 'movie_budget', 'film_length', 'imdb',
       'rotten_tomatoes', 'martinis_consumed', 'bond_kills'],
      dtype='object')
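One thing to be aware of: by default `rename` silently ignores mapping keys that do not match any column, so a typo in the mapping can go unnoticed. The following is a hedged sketch on a made-up two-column frame (the `frame` and `mapping` names are illustrative) that uses the `errors="raise"` option to surface such a mistake.

```python
import pandas as pd

# Made-up frame for illustration
frame = pd.DataFrame({"Movie": ["Dr. No"], "Bond": ["Sean Connery"]})

# "Bnod" is a deliberate typo: the default behaviour leaves "Bond" untouched
mapping = {"Movie": "movie_title", "Bnod": "bond_actor"}
silently_wrong = frame.rename(columns=mapping)
print(list(silently_wrong.columns))

# errors="raise" turns the unmatched key into a KeyError
try:
    frame.rename(columns=mapping, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```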

## Dealing with missing data

Use the `.info()` method to detect missing data in your DataFrame. A column whose non-null count is lower than the number of entries contains missing values.

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27 entries, 0 to 26
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   release_date       27 non-null     string 
 1   movie_title        27 non-null     string 
 2   bond_actor         27 non-null     string 
 3   car_manufacturer   27 non-null     string 
 4   income_usa         27 non-null     string 
 5   income_world       27 non-null     string 
 6   movie_budget       27 non-null     string 
 7   film_length        27 non-null     string 
 8   imdb               26 non-null     Float64
 9   rotten_tomatoes    26 non-null     Float64
 10  martinis_consumed  27 non-null     Int64  
 11  bond_kills         27 non-null     Int64  
dtypes: Float64(2), Int64(2), string(8)
memory usage: 1.9 KB


Get the rows with missing data

In [10]:
data.loc[data.isna().any(axis="columns")]

Unnamed: 0,release_date,movie_title,bond_actor,car_manufacturer,income_usa,income_world,movie_budget,film_length,imdb,rotten_tomatoes,martinis_consumed,bond_kills
10,"April, 1977",The Spy Who Loved Me,Roger Moore,Lotus,"$46,800,000.00","$185,400,000.00","$14,000.00",125 mins,,,1,31
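For a quick count of the gaps per column, `isna().sum()` is a common companion to the row filter above. A small sketch on made-up data (the `ratings` frame and its placeholder values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Made-up sample with one missing rating
ratings = pd.DataFrame(
    {
        "movie_title": ["Film A", "Film B"],
        "imdb": [6.5, None],
    }
).convert_dtypes()

# Number of missing values per column
missing_per_column = ratings.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_gaps = ratings.loc[ratings.isna().any(axis="columns")]
print(rows_with_gaps)
```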


In [12]:
data = james_bond_data.rename(columns=new_column_names).combine_first(
    pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
)

In [13]:
pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})

Unnamed: 0,imdb,rotten_tomatoes
10,7.1,6.8
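`combine_first` keeps every value that is already present and takes a value from the other frame only where the original is missing. The result holds the union of both frames' columns, which appears to be why the columns come back in alphabetical order in the outputs that follow. A minimal sketch with a made-up two-row frame (`films` and `patch` are illustrative names):

```python
import pandas as pd

# Made-up frame with a gap at index 1
films = pd.DataFrame({"title": ["A", "B"], "imdb": [7.3, None]})

# The patch only needs the cells to fill, keyed by column and then row index
patch = pd.DataFrame({"imdb": {1: 7.1}})

# Existing values win; the patch is used only where films has a missing value
filled = films.combine_first(patch)
print(filled)
```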


## Handling financial columns

The financial columns are stored as text such as `$16,067,035.00`. Remove the `$` symbols and the thousands separators so the figures can be converted to floats.

In [14]:
data[["income_usa", "income_world", "movie_budget", "film_length"]].head()

Unnamed: 0,income_usa,income_world,movie_budget,film_length
0,"$16,067,035.00","$59,567,035.00","$1,000.00",110 mins
1,"$24,800,000.00","$78,900,000.00","$2,000.00",115 mins
2,"$51,100,000.00","$124,900,000.00","$3,000.00",110 mins
3,"$63,600,000.00","$141,200,000.00","$9,000.00",130 mins
4,"$43,100,000.00","$111,600,000.00","$9,500.00",117 mins


Start with `income_usa`: strip the `$` and `,` characters and convert the column to `Float64`

In [16]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
    )
)

In [17]:
data.head()

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes
0,Sean Connery,4,Sunbeam,110 mins,7.3,16067035.0,"$59,567,035.00",2,"$1,000.00",Dr. No,"June, 1962",7.7
1,Sean Connery,11,Bentley,115 mins,7.5,24800000.0,"$78,900,000.00",0,"$2,000.00",From Russia with Love,"August, 1963",8.0
2,Sean Connery,9,Aston Martin,110 mins,7.8,51100000.0,"$124,900,000.00",1,"$3,000.00",Goldfinger,"May, 1964",8.4
3,Sean Connery,20,Aston Martin,130 mins,7.0,63600000.0,"$141,200,000.00",0,"$9,000.00",Thunderball,"September, 1965",6.8
4,Sean Connery,21,Toyota,117 mins,6.9,43100000.0,"$111,600,000.00",1,"$9,500.00",You Only Live Twice,"November, 1967",6.3


In [19]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
    )
)

In [20]:
data.head()

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes
0,Sean Connery,4,Sunbeam,110 mins,7.3,16067035.0,59567035.0,2,1000.0,Dr. No,"June, 1962",7.7
1,Sean Connery,11,Bentley,115 mins,7.5,24800000.0,78900000.0,0,2000.0,From Russia with Love,"August, 1963",8.0
2,Sean Connery,9,Aston Martin,110 mins,7.8,51100000.0,124900000.0,1,3000.0,Goldfinger,"May, 1964",8.4
3,Sean Connery,20,Aston Martin,130 mins,7.0,63600000.0,141200000.0,0,9000.0,Thunderball,"September, 1965",6.8
4,Sean Connery,21,Toyota,117 mins,6.9,43100000.0,111600000.0,1,9500.0,You Only Live Twice,"November, 1967",6.3
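The same cleaning expression is applied to all three financial columns. One possible way to factor it out is a small helper function; `clean_currency` below is a hypothetical name, and the sketch uses `str.replace` plus `pd.to_numeric` rather than the exact calls in the cell above.

```python
import pandas as pd


def clean_currency(column: pd.Series) -> pd.Series:
    """Strip "$" and "," from strings such as "$1,000.00" and return Float64."""
    stripped = column.str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(stripped).astype("Float64")


# Two values copied from the first row of the table above
money = pd.DataFrame(
    {"income_usa": ["$16,067,035.00"], "movie_budget": ["$1,000.00"]}
)

cleaned = money.assign(
    income_usa=lambda df: clean_currency(df["income_usa"]),
    movie_budget=lambda df: clean_currency(df["movie_budget"]),
)
print(cleaned.dtypes)
print(cleaned)
```

Inside a character class such as `[$,]` the dollar sign is a literal character, so no escaping is needed.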




## Correcting Invalid Data Types

In [27]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
        ),
    )
)


In [23]:
data.head()

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes
0,Sean Connery,4,Sunbeam,110,7.3,16067035.0,59567035.0,2,1000.0,Dr. No,"June, 1962",7.7
1,Sean Connery,11,Bentley,115,7.5,24800000.0,78900000.0,0,2000.0,From Russia with Love,"August, 1963",8.0
2,Sean Connery,9,Aston Martin,110,7.8,51100000.0,124900000.0,1,3000.0,Goldfinger,"May, 1964",8.4
3,Sean Connery,20,Aston Martin,130,7.0,63600000.0,141200000.0,0,9000.0,Thunderball,"September, 1965",6.8
4,Sean Connery,21,Toyota,117,6.9,43100000.0,111600000.0,1,9500.0,You Only Live Twice,"November, 1967",6.3
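Note that `removesuffix("mins")` leaves a trailing space behind (`"110 "`), which the conversion in the cell above evidently tolerates. A slightly more defensive sketch, assuming a made-up `lengths` series, strips the whitespace explicitly before converting:

```python
import pandas as pd

lengths = pd.Series(["110 mins", "115 mins", "130 mins"])

minutes = (
    lengths
    .str.removesuffix("mins")  # leaves "110 " with a trailing space
    .str.strip()               # drop the leftover whitespace explicitly
    .pipe(pd.to_numeric)       # parse the digits
    .astype("Int64")           # nullable integer dtype, as used in the pipeline
)
print(minutes)
```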


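Next, convert `release_date` from text to a datetime and derive a `release_year` column from it. Because `assign` evaluates its arguments in order, `release_year` can read the already converted `release_date`.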

In [28]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
        ),
        release_date=lambda data: pd.to_datetime(
            data["release_date"], format="%B, %Y"
        ),
        release_year=lambda data: (
            data["release_date"]
            .dt.year
            .astype("Int64")
        ),
    )
)

In [29]:
data.head()

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes,release_year
0,Sean Connery,4,Sunbeam,110,7.3,16067035.0,59567035.0,2,1000.0,Dr. No,1962-06-01,7.7,1962
1,Sean Connery,11,Bentley,115,7.5,24800000.0,78900000.0,0,2000.0,From Russia with Love,1963-08-01,8.0,1963
2,Sean Connery,9,Aston Martin,110,7.8,51100000.0,124900000.0,1,3000.0,Goldfinger,1964-05-01,8.4,1964
3,Sean Connery,20,Aston Martin,130,7.0,63600000.0,141200000.0,0,9000.0,Thunderball,1965-09-01,6.8,1965
4,Sean Connery,21,Toyota,117,6.9,43100000.0,111600000.0,1,9500.0,You Only Live Twice,1967-11-01,6.3,1967
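The format string `"%B, %Y"` tells `to_datetime` to expect a full month name, a comma and a four-digit year. Since the text carries no day, each date falls on the first of the month, as the output above shows. A minimal sketch on two values taken from the dataset, assuming an English locale for the month names:

```python
import pandas as pd

releases = pd.Series(["June, 1962", "August, 1963"])

# %B is the full month name, %Y the four-digit year
parsed = pd.to_datetime(releases, format="%B, %Y")
print(parsed)

# The .dt accessor exposes the date components
print(parsed.dt.year)
```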


<br>

Check that every column now has the expected dtype and no missing values

In [30]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 27 entries, 0 to 26
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   bond_actor         27 non-null     string        
 1   bond_kills         27 non-null     Int64         
 2   car_manufacturer   27 non-null     string        
 3   film_length        27 non-null     Int64         
 4   imdb               27 non-null     Float64       
 5   income_usa         27 non-null     Float64       
 6   income_world       27 non-null     Float64       
 7   martinis_consumed  27 non-null     Int64         
 8   movie_budget       27 non-null     Float64       
 9   movie_title        27 non-null     string        
 10  release_date       27 non-null     datetime64[ns]
 11  rotten_tomatoes    27 non-null     Float64       
 12  release_year       27 non-null     Int64         
dtypes: Float64(5), Int64(4), datetime64[ns](1), string(3)
memory usage: 2.9 KB

<br>

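## Fixing Inconsistencies in Data

The `movie_budget` values are stored in thousands of dollars (the original column was named `Budget ($ 000s)`), while the income columns are in dollars. Multiply the budget by 1000 so that all financial columns use the same unit.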

In [33]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64") * 1000
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
        ),
        release_date=lambda data: pd.to_datetime(
            data["release_date"], format="%B, %Y"
        ),
        release_year=lambda data: (
            data["release_date"]
            .dt.year
            .astype("Int64")
        ),
    )
)


<br>

Check the result: `movie_budget` is now expressed in dollars

In [35]:
data.head()

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes,release_year
0,Sean Connery,4,Sunbeam,110,7.3,16067035.0,59567035.0,2,1000000.0,Dr. No,1962-06-01,7.7,1962
1,Sean Connery,11,Bentley,115,7.5,24800000.0,78900000.0,0,2000000.0,From Russia with Love,1963-08-01,8.0,1963
2,Sean Connery,9,Aston Martin,110,7.8,51100000.0,124900000.0,1,3000000.0,Goldfinger,1964-05-01,8.4,1964
3,Sean Connery,20,Aston Martin,130,7.0,63600000.0,141200000.0,0,9000000.0,Thunderball,1965-09-01,6.8,1965
4,Sean Connery,21,Toyota,117,6.9,43100000.0,111600000.0,1,9500000.0,You Only Live Twice,1967-11-01,6.3,1967


<br>

## Correcting spelling errors

Look for typos in the text columns. A misspelled name is counted as a separate value, so figures end up attributed to the wrong category.

In [36]:
data["bond_actor"].value_counts()

bond_actor
Roger Moore       7
Sean Connery      5
Daniel Craig      5
Pierce Brosnan    4
Timothy Dalton    3
George Lazenby    1
Shawn Connery     1
Roger MOORE       1
Name: count, dtype: Int64

<br>

Go ahead and fix the names

In [37]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64") * 1000
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
        ),
        release_date=lambda data: pd.to_datetime(
            data["release_date"], format="%B, %Y"
        ),
        release_year=lambda data: (
            data["release_date"]
            .dt.year
            .astype("Int64")
        ),
        bond_actor=lambda data: (
            data["bond_actor"]
            .str.replace("Shawn", "Sean")
            .str.replace("MOORE", "Moore")
        ),
    )
)


Check the counts now

In [38]:
data["bond_actor"].value_counts()

bond_actor
Roger Moore       8
Sean Connery      6
Daniel Craig      5
Pierce Brosnan    4
Timothy Dalton    3
George Lazenby    1
Name: count, dtype: Int64
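The `str.replace` calls above fix substrings. An alternative, shown here only as a sketch on a made-up `actors` series, is to pass a mapping of whole values to `replace`, which avoids touching other names that happen to contain the same substring.

```python
import pandas as pd

actors = pd.Series(
    ["Sean Connery", "Shawn Connery", "Roger Moore", "Roger MOORE"]
)

# Map each misspelled value to its corrected form; other values are untouched
corrections = {
    "Shawn Connery": "Sean Connery",
    "Roger MOORE": "Roger Moore",
}
fixed = actors.replace(corrections)
print(fixed.value_counts())
```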

<br>

Also check the car manufacturer counts

In [40]:
data["car_manufacturer"].value_counts()

car_manufacturer
Aston Martin    8
AMC             3
Rolls Royce     3
Lotus           2
BMW             2
Astin Martin    2
Sunbeam         1
Bentley         1
Toyota          1
Mercury         1
Ford            1
Citroen         1
Bajaj           1
Name: count, dtype: Int64

<br>

Correct the misspelled manufacturer name

In [42]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64") * 1000
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
        ),
        release_date=lambda data: pd.to_datetime(
            data["release_date"], format="%B, %Y"
        ),
        release_year=lambda data: (
            data["release_date"]
            .dt.year
            .astype("Int64")
        ),
        bond_actor=lambda data: (
            data["bond_actor"]
            .str.replace("Shawn", "Sean")
            .str.replace("MOORE", "Moore")
        ),
        car_manufacturer=lambda data: (
            data["car_manufacturer"].str.replace("Astin", "Aston")
        ),
    )
)


In [43]:
data["car_manufacturer"].value_counts()

car_manufacturer
Aston Martin    10
AMC              3
Rolls Royce      3
Lotus            2
BMW              2
Sunbeam          1
Bentley          1
Toyota           1
Mercury          1
Ford             1
Citroen          1
Bajaj            1
Name: count, dtype: Int64

<br>

## Check for Invalid Outliers

Verify that the numerical data falls within a plausible range

In [46]:
data[["film_length", "martinis_consumed"]].describe()

Unnamed: 0,film_length,martinis_consumed
count,27.0,27.0
mean,168.222222,0.62963
std,206.572083,1.547905
min,106.0,-6.0
25%,123.0,0.0
50%,130.0,1.0
75%,133.0,1.0
max,1200.0,3.0
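`describe()` reveals that something is off (a 1200-minute film and a negative martini count) but not which rows are affected. One way to locate them is an explicit range check. The sketch below uses made-up rows and hypothetical plausibility bounds that are chosen only for illustration.

```python
import pandas as pd

# Made-up rows that mimic the two suspicious values reported by describe()
films = pd.DataFrame(
    {
        "movie_title": ["Film A", "Film B", "Film C"],
        "film_length": [110, 1200, 125],
        "martinis_consumed": [2, 1, -6],
    }
)

# Hypothetical plausibility bounds, chosen only for this illustration
valid_length = films["film_length"].between(60, 240)
valid_martinis = films["martinis_consumed"] >= 0

# Keep the rows that fail at least one of the checks
suspicious = films.loc[~(valid_length & valid_martinis)]
print(suspicious)
```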


Fix both issues: replace the 1200-minute film length with 120 and the martini count of -6 with 6

In [47]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64") * 1000
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
            .replace(1200, 120)
        ),
        release_date=lambda data: pd.to_datetime(
            data["release_date"], format="%B, %Y"
        ),
        release_year=lambda data: (
            data["release_date"]
            .dt.year
            .astype("Int64")
        ),
        bond_actor=lambda data: (
            data["bond_actor"]
            .str.replace("Shawn", "Sean")
            .str.replace("MOORE", "Moore")
        ),
        car_manufacturer=lambda data: (
            data["car_manufacturer"].str.replace("Astin", "Aston")
        ),
        martinis_consumed=lambda data: (
            data["martinis_consumed"].replace(-6, 6)
        ),
    )
)


<br> 

Check if the issues are fixed

In [48]:
data[["film_length", "martinis_consumed"]].describe()

Unnamed: 0,film_length,martinis_consumed
count,27.0,27.0
mean,128.222222,1.074074
std,12.454018,1.268734
min,106.0,0.0
25%,120.5,0.0
50%,128.0,1.0
75%,132.0,1.0
max,163.0,6.0


<br>

## Remove duplicate data

Check if rows of data have been duplicated

In [49]:
data.loc[data.duplicated(keep=False)]

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes,release_year
8,Roger Moore,1,AMC,125,6.7,21000000.0,97600000.0,0,7000000.0,The Man with the Golden Gun,1974-07-01,5.1,1974
9,Roger Moore,1,AMC,125,6.7,21000000.0,97600000.0,0,7000000.0,The Man with the Golden Gun,1974-07-01,5.1,1974
15,Timothy Dalton,13,Rolls Royce,130,6.7,51185000.0,191200000.0,2,40000000.0,The Living Daylights,1987-05-01,6.3,1987
16,Timothy Dalton,13,Rolls Royce,130,6.7,51185000.0,191200000.0,2,40000000.0,The Living Daylights,1987-05-01,6.3,1987


<br>

Remove the duplicates with the `drop_duplicates` method. Passing `ignore_index=True` renumbers the remaining rows.

In [None]:
data = (
    james_bond_data.rename(columns=new_column_names)
    .combine_first(
        pd.DataFrame({"imdb": {10: 7.1}, "rotten_tomatoes": {10: 6.8}})
    )
    .assign(
        income_usa=lambda data: (
            data["income_usa"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        income_world=lambda data: (
            data["income_world"]
            .replace("[$,]", "", regex=True)
            .astype("Float64")
        ),
        movie_budget=lambda data: (
            data["movie_budget"]
            .replace("[$,]", "", regex=True)
            .astype("Float64") * 1000
        ),
        film_length=lambda data: (
            data["film_length"]
            .str.removesuffix("mins")
            .astype("Int64")
            .replace(1200, 120)
        ),
        release_date=lambda data: pd.to_datetime(
            data["release_date"], format="%B, %Y"
        ),
        release_year=lambda data: (
            data["release_date"]
            .dt.year
            .astype("Int64")
        ),
        bond_actor=lambda data: (
            data["bond_actor"]
            .str.replace("Shawn", "Sean")
            .str.replace("MOORE", "Moore")
        ),
        car_manufacturer=lambda data: (
            data["car_manufacturer"].str.replace("Astin", "Aston")
        ),
        martinis_consumed=lambda data: (
            data["martinis_consumed"].replace(-6, 6)
        ),
    )
    .drop_duplicates(ignore_index=True)
)


<br> 

Check that the duplicates have been removed

In [53]:
data.loc[data.duplicated(keep=False)]

Unnamed: 0,bond_actor,bond_kills,car_manufacturer,film_length,imdb,income_usa,income_world,martinis_consumed,movie_budget,movie_title,release_date,rotten_tomatoes,release_year
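To see how `duplicated` and `drop_duplicates` interact on a small scale, here is a sketch with made-up rows (the `rows` frame is an assumption, not data from the file):

```python
import pandas as pd

rows = pd.DataFrame(
    {
        "movie_title": ["Film A", "Film B", "Film B", "Film C"],
        "release_year": [1973, 1974, 1974, 1977],
    }
)

# keep=False marks every copy, which is handy for inspection
print(rows.loc[rows.duplicated(keep=False)])

# drop_duplicates keeps the first copy; ignore_index renumbers the rows
deduplicated = rows.drop_duplicates(ignore_index=True)
print(deduplicated)
```

Without `ignore_index=True` the surviving rows would keep their original labels, leaving a gap in the index where the duplicate was dropped.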


<br>

## Storing Your Cleansed Data

Use the `to_csv` method to write the cleansed DataFrame to a CSV file. Passing `index=False` leaves out the row index.

In [None]:
data.to_csv("james_bond_data_cleansed.csv", index=False)
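Keep in mind that CSV stores everything as text, so a datetime column comes back as strings unless the reader is told to parse it. The sketch below round-trips a made-up one-row frame through an in-memory buffer; the `parse_dates` argument is a suggestion and not part of the original workflow.

```python
import io

import pandas as pd

original = pd.DataFrame(
    {
        "movie_title": ["Dr. No"],
        "release_date": pd.to_datetime(["1962-06-01"]),
    }
)

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Without parse_dates the date column is read back as text
buffer.seek(0)
as_text = pd.read_csv(buffer)

# parse_dates restores the datetime dtype
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["release_date"])

print(as_text.dtypes)
print(restored.dtypes)
```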

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

# Analysis

Let's plot some graphs

In [None]:
data = pd.read_csv("james_bond_data_cleansed.csv").convert_dtypes()

In [None]:
data.head()

Draw a scatter plot to see whether the Rotten Tomatoes ratings are related to the IMDb ratings

In [None]:
fig, ax = plt.subplots()
ax.scatter(data["imdb"], data["rotten_tomatoes"])
ax.set_title("Scatter Plot of Ratings")
ax.set_xlabel("Average IMDb Rating")
ax.set_ylabel("Average Rotten Tomatoes Rating")
fig.show()
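A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. As a rough sketch, the snippet below computes the Pearson correlation for the ratings of the first five films only (values copied from the table shown earlier), so the result is illustrative rather than a finding about the full dataset.

```python
import pandas as pd

# Ratings of the first five films, copied from the table shown earlier
ratings = pd.DataFrame(
    {
        "imdb": [7.3, 7.5, 7.8, 7.0, 6.9],
        "rotten_tomatoes": [7.7, 8.0, 8.4, 6.8, 6.3],
    }
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
correlation = ratings["imdb"].corr(ratings["rotten_tomatoes"])
print(round(correlation, 2))
```

On the full DataFrame the same call would be `data["imdb"].corr(data["rotten_tomatoes"])`.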

Check the film length distribution

In [None]:
fig, ax = plt.subplots()
length = data["film_length"].value_counts(bins=7).sort_index()
length.plot.bar(
    ax=ax,
    title="Film Length Distribution",
    xlabel="Time Range (mins)",
    ylabel="Count",
)
fig.show()


In [None]:
data["film_length"].agg(["min", "max", "mean", "std"])

Check whether the number of people Bond kills has an effect on the ratings

In [None]:
fig, ax = plt.subplots()
ax.scatter(data["imdb"], data["bond_kills"])
ax.set_title("Scatter Plot of Kills vs Ratings")
ax.set_xlabel("Average IMDb Rating")
ax.set_ylabel("Kills by Bond")
fig.show()


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
