# Question 2

Based on the existing data, have movies on average generated more or less revenue over time, measured by inflation-adjusted gross? Which directors are featured in the top 10 G-rated movies by inflation-adjusted gross in the last half century (1972 - 2022)?

**Methods and Results**

To answer Q2, we parse `release_date` as a datetime and extract the year so that gross can be grouped by year. We use two data files: disney-director.csv and disney_movies_total_gross.csv.


In [2]:
import pandas as pd
import altair as alt

In [3]:
# import the data, parsing release_date as datetime
dm_total_gross = pd.read_csv('data/disney_movies_total_gross.csv', parse_dates = ['release_date'])

dm_total_gross.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 579 entries, 0 to 578
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   movie_title               579 non-null    object        
 1   release_date              579 non-null    datetime64[ns]
 2   genre                     562 non-null    object        
 3   MPAA_rating               523 non-null    object        
 4   total_gross               579 non-null    object        
 5   inflation_adjusted_gross  579 non-null    object        
dtypes: datetime64[ns](1), object(5)
memory usage: 27.3+ KB


During data wrangling, we noticed that some columns contain special characters and/or are not of the type we need for mathematical calculations (see the `object` dtypes above). Rather than dealing with them column by column, we could write functions to do the stripping and type re-assignment for us.

```
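def replace_char(df, col, special_character):
    # strip special_character from both ends of every string in column `col`;
    # unpacking a dict lets assign() use the column name stored in `col`
    df_after = df.assign(**{col: df[col].str.strip(special_character)})

    return df_after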
```

```
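def coltype_chg(df, col, col_type):
    # cast column `col` to col_type (for example 'float');
    # unpacking a dict lets assign() use the column name stored in `col`
    df_after = df.assign(**{col: df[col].astype(col_type)})

    return df_after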
```

In [4]:
# strip the $ sign from the gross columns, remove ',' and set appropriate datatypes so calculations can be done properly
dm_total_gross = dm_total_gross.assign(total_gross = dm_total_gross['total_gross'].str.strip('$'))
dm_total_gross = dm_total_gross.assign(inflation_adjusted_gross = dm_total_gross['inflation_adjusted_gross'].str.strip('$'))
dm_total_gross = dm_total_gross.assign(total_gross = dm_total_gross['total_gross'].str.replace(',', ''))
dm_total_gross = dm_total_gross.assign(inflation_adjusted_gross = dm_total_gross['inflation_adjusted_gross'].str.replace(',', ''))

dm_total_gross = dm_total_gross.assign(total_gross = dm_total_gross['total_gross'].astype('float'))
dm_total_gross = dm_total_gross.assign(inflation_adjusted_gross = dm_total_gross['inflation_adjusted_gross'].astype('float'))
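The cleaning steps above could also be folded into a single helper. The following is only a minimal sketch: the function name `clean_currency` and the toy data are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd


def clean_currency(df, cols):
    """Return a copy of df with '$' and ',' removed from cols, cast to float.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    cleaned = {
        col: df[col].str.replace(r"[$,]", "", regex=True).astype(float)
        for col in cols
    }
    return df.assign(**cleaned)


# Toy data, made up for illustration (not taken from the Disney dataset)
toy = pd.DataFrame(
    {
        "movie_title": ["Movie A", "Movie B"],
        "total_gross": ["$1,000", "$2,500,000"],
        "inflation_adjusted_gross": ["$3,000", "$4,750,000"],
    }
)

toy_clean = clean_currency(toy, ["total_gross", "inflation_adjusted_gross"])
print(toy_clean.dtypes)
```

Because `assign` returns a new data frame, the original `toy` frame keeps its string columns, which makes it easy to compare before and after.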


In [5]:
dm_total_gross.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross
0,Snow White and the Seven Dwarfs,1937-12-21,Musical,G,184925485.0,5228953000.0
1,Pinocchio,1940-02-09,Adventure,G,84300000.0,2188229000.0
2,Fantasia,1940-11-13,Musical,G,83320000.0,2187091000.0
3,Song of the South,1946-11-12,Adventure,G,65000000.0,1078511000.0
4,Cinderella,1950-02-15,Drama,G,85000000.0,920608700.0


In [6]:
# since we are comparing by year, add a new column with the release year of each movie; set the float display format so the data is easier for a human (me) to read
dm_total_gross = dm_total_gross.assign(release_year = dm_total_gross['release_date'].dt.year)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
dm_total_gross.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
0,Snow White and the Seven Dwarfs,1937-12-21,Musical,G,184925485.0,5228953251.0,1937
1,Pinocchio,1940-02-09,Adventure,G,84300000.0,2188229052.0,1940
2,Fantasia,1940-11-13,Musical,G,83320000.0,2187090808.0,1940
3,Song of the South,1946-11-12,Adventure,G,65000000.0,1078510579.0,1946
4,Cinderella,1950-02-15,Drama,G,85000000.0,920608730.0,1950


In [7]:
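# yearly mean of the numeric columns (numeric_only avoids errors on text columns in recent pandas versions)
year_mean = dm_total_gross.groupby('release_year').mean(numeric_only=True).reset_index()
# number of movies released per year
year_count = dm_total_gross['release_year'].value_counts().rename_axis('release_year').reset_index(name='movie_count')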

merged = year_mean.merge(year_count,on = 'release_year')
merged.head()

Unnamed: 0,release_year,total_gross,inflation_adjusted_gross,movie_count
0,1937,184925485.0,5228953251.0,1
1,1940,83810000.0,2187659930.0,2
2,1946,65000000.0,1078510579.0,1
3,1950,85000000.0,920608730.0,1
4,1954,28200000.0,528279994.0,1
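The yearly mean and the movie count can also be computed in one pass with pandas named aggregation, which avoids the separate merge. This is a sketch on made-up toy data; the output column name `mean_adjusted_gross` is an assumption chosen for illustration.

```python
import pandas as pd

# Toy data, made up for illustration (not taken from the Disney dataset)
toy = pd.DataFrame(
    {
        "release_year": [1940, 1940, 1950],
        "inflation_adjusted_gross": [100.0, 300.0, 50.0],
    }
)

# One groupby producing both the yearly mean and the number of movies
yearly = (
    toy.groupby("release_year")
    .agg(
        mean_adjusted_gross=("inflation_adjusted_gross", "mean"),
        movie_count=("inflation_adjusted_gross", "size"),
    )
    .reset_index()
)
print(yearly)
```

Keeping the count next to the mean is useful when reading the table: a year with a single release, such as 1937 above, has a mean that is simply that one movie's gross.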


Below is a plot of the mean inflation-adjusted gross per release year.

In [8]:
# plot the yearly movie gross mean
plot_ym = alt.Chart(merged).mark_line().encode(
    x = 'release_year',
    y = 'inflation_adjusted_gross')
plot_ym

Below is a plot of the number of movies released each year.

In [9]:

# plot the number of movies released per year
plot_mc = alt.Chart(merged).mark_bar().encode(
    x = 'release_year',
    y = 'movie_count')
plot_mc

Next, we address the director question by merging the two data frames and then filtering the rows we need.

In [10]:
#import dataframes
directors = pd.read_csv('data/disney-director.csv')


In [11]:
directors.info()
directors.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   name      56 non-null     object
 1   director  56 non-null     object
dtypes: object(2)
memory usage: 1.0+ KB


Unnamed: 0,name,director
0,Snow White and the Seven Dwarfs,David Hand
1,Pinocchio,Ben Sharpsteen
2,Fantasia,full credits
3,Dumbo,Ben Sharpsteen
4,Bambi,David Hand


```{figure} snowwhite.jpeg
---
height: 450px
name: snowwhite
---
Based on this dataset, Snow White and the Seven Dwarfs ranks first by inflation-adjusted gross.
```

Source of image: https://movies.disney.com/snow-white-and-the-seven-dwarfs

In [12]:
# I filtered for the years I cared about
dm_loc = dm_total_gross[dm_total_gross['release_year']>= 1972]
dm_loc.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year
18,The Apple Dumpling Gang,1975-07-01,Comedy,,31916500.00,131246872.00,1975
19,Freaky Friday,1977-01-21,Comedy,,25942000.00,98067733.00,1977
20,The Many Adventures of Winnie the Pooh,1977-03-11,,,0.00,0.00,1977
21,The Rescuers,1977-06-22,Adventure,,48775599.00,159743914.00,1977
22,Herbie Goes to Monte Carlo,1977-06-24,,,28000000.00,105847527.00,1977
...,...,...,...,...,...,...,...
574,The Light Between Oceans,2016-09-02,Drama,PG-13,12545979.00,12545979.00,2016
575,Queen of Katwe,2016-09-23,Drama,PG,8874389.00,8874389.00,2016
576,Doctor Strange,2016-11-04,Adventure,PG-13,232532923.00,232532923.00,2016
577,Moana,2016-11-23,Adventure,PG,246082029.00,246082029.00,2016


In [13]:
merged = dm_loc.merge(directors, left_on = 'movie_title', right_on = 'name')
merged.head()

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year,name,director
0,The Many Adventures of Winnie the Pooh,1977-03-11,,,0.0,0.0,1977,The Many Adventures of Winnie the Pooh,Wolfgang Reitherman
1,The Rescuers,1977-06-22,Adventure,,48775599.0,159743914.0,1977,The Rescuers,Wolfgang Reitherman
2,The Fox and the Hound,1981-07-10,Comedy,,43899231.0,133118889.0,1981,The Fox and the Hound,Art Stevens
3,The Black Cauldron,1985-07-24,Adventure,,21288692.0,50553142.0,1985,The Black Cauldron,Ted Berman
4,The Great Mouse Detective,1986-07-02,Adventure,,23605534.0,53637367.0,1986,The Great Mouse Detective,Ron Clements
5,Oliver & Company,1988-11-18,Adventure,G,49576671.0,102254492.0,1988,Oliver & Company,George Scribner
6,The Little Mermaid,1989-11-15,Adventure,G,111543479.0,223726012.0,1989,The Little Mermaid,Ron Clements
7,The Rescuers Down Under,1990-11-16,Adventure,G,27931461.0,55796728.0,1990,The Rescuers Down Under,Mike Gabriel
8,Beauty and the Beast,1991-11-13,Musical,G,218951625.0,363017667.0,1991,Beauty and the Beast,Gary Trousdale
9,Aladdin,1992-11-11,Comedy,G,217350219.0,441969178.0,1992,Aladdin,Ron Clements


In [14]:
# import the custom script
from custom_filter import custom_filter 

# run it on the data
Top10_dir = custom_filter('MPAA_rating','G','inflation_adjusted_gross',merged,10)

Top10_dir

Unnamed: 0,movie_title,release_date,genre,MPAA_rating,total_gross,inflation_adjusted_gross,release_year,name,director
10,The Lion King,1994-06-15,Adventure,G,422780140.0,761640898.0,1994,The Lion King,Roger Allers
9,Aladdin,1992-11-11,Comedy,G,217350219.0,441969178.0,1992,Aladdin,Ron Clements
8,Beauty and the Beast,1991-11-13,Musical,G,218951625.0,363017667.0,1991,Beauty and the Beast,Gary Trousdale
18,Tarzan,1999-06-16,Adventure,G,171091819.0,283900254.0,1999,Tarzan,Chris Buck
13,Pocahontas,1995-06-10,Adventure,G,141579773.0,274370957.0,1995,Pocahontas,Mike Gabriel
15,101 Dalmatians,1996-11-27,Comedy,G,136189294.0,258728898.0,1996,101 Dalmatians,Wolfgang Reitherman
6,The Little Mermaid,1989-11-15,Adventure,G,111543479.0,223726012.0,1989,The Little Mermaid,Ron Clements
17,Mulan,1998-06-19,Adventure,G,120620254.0,216807832.0,1998,Mulan,Barry Cook
14,The Hunchback of Notre Dame,1996-06-21,Adventure,G,100138851.0,190988799.0,1996,The Hunchback of Notre Dame,Gary Trousdale
16,Hercules,1997-06-13,Adventure,G,99112101.0,182029412.0,1997,Hercules,Ron Clements
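The source of `custom_filter.py` is not shown in this section. Judging only from the call and its output, it appears to keep the rows where one column equals a given value and then return the top `n` rows by another column. The function below is a hypothetical stand-in written under that assumption; the name `custom_filter_sketch`, its body, and the toy data are all illustrative and may differ from the real script.

```python
import pandas as pd


def custom_filter_sketch(filter_col, filter_value, sort_col, df, n):
    """Hypothetical stand-in for custom_filter (the real script is not shown).

    Keeps rows where filter_col equals filter_value, then returns the n rows
    with the largest values in sort_col.
    """
    subset = df[df[filter_col] == filter_value]
    return subset.sort_values(sort_col, ascending=False).head(n)


# Toy data, made up for illustration (not taken from the Disney dataset)
toy = pd.DataFrame(
    {
        "movie_title": ["Film A", "Film B", "Film C", "Film D"],
        "MPAA_rating": ["G", "PG", "G", "G"],
        "inflation_adjusted_gross": [10.0, 99.0, 30.0, 20.0],
        "director": ["Director X", "Director Y", "Director Z", "Director X"],
    }
)

top2 = custom_filter_sketch("MPAA_rating", "G", "inflation_adjusted_gross", toy, 2)

# Distinct directors among the top rows, in ranking order
top2_directors = top2["director"].unique().tolist()
print(top2_directors)
```

Calling `unique()` on the `director` column of the result gives the list of distinct directors, which is the form the question asks for.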


In [16]:
#formatting

!black custom_filter.py
!black notebooks.ipynb

All done! ✨ 🍰 ✨
1 file left unchanged.
Usage: black [OPTIONS] SRC ...
Try 'black -h' for help.

Error: Invalid value for 'SRC ...': Path 'notebooks.ipynb' does not exist.


```{figure} lion-king.webp
---
height: 450px
name: lionking
---
The Lion King is another major Disney success.
```

Source of image: https://www.nytimes.com/2019/07/18/movies/disney-lion-king.html

:::{seealso}
This project did not really need any math equations, but the Jupyter Book requires two, so I have included two simple equations from the article [The Mathematics of f/stop Aperture Numbers](http://pleasemakeanote.blogspot.com/2010/10/mathematics-of-fstop-aperture-numbers.html). The equations below are reproduced from that source.
:::

The equations are as follows:

```{math}
:label: stop_cal
  S = \frac {f}{D}
```

```{math}
:label: diameter_cal
  D = \frac {f}{S}
```

where
S = stop number;
f = Focal Length;
D = Aperture Diameter.

To quote an example from the source, "a lens set at a focal length of 70mm and a stop number of 5.6 has an aperture diameter of 12.5mm."

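Additionally, "for an aperture to let in twice as much light, its diameter must increase by approximately 41%." This is because the light admitted is proportional to the aperture area, which scales with the square of the diameter, so doubling the area means multiplying the diameter by the square root of 2, as shown in the equation below.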

```{math}
:label: diameter_incr
  {D_0} = \sqrt{2}{D_1} \approx {D_1} + 0.414{D_1}
```
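As a quick numerical check of the equations above, the sketch below recomputes the quoted example (70mm focal length at a stop number of 5.6) and the roughly 41% diameter increase. The function name `aperture_diameter` is an assumption introduced only for this illustration.

```python
import math


def aperture_diameter(focal_length_mm, stop_number):
    """Aperture diameter D = f / S, in the same unit as the focal length."""
    return focal_length_mm / stop_number


# Example quoted from the source: f = 70mm, S = 5.6
d = aperture_diameter(70, 5.6)
print(round(d, 2))

# Doubling the light means doubling the area, so the diameter grows by sqrt(2)
growth = math.sqrt(2) - 1
print(round(growth, 3))

# Sanity check: a diameter sqrt(2) times larger gives twice the area
area_ratio = (math.sqrt(2) * d) ** 2 / d**2
print(round(area_ratio, 3))
```

The first value should come out at about 12.5mm, in line with the quoted example, and the growth factor at about 0.414, in line with the approximation in the last equation.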