# Question 1

Based on existing data, which Disney section(s) generated more or less revenue over the years 1991-2016? Is there a trend?

**Methods and Results**

To answer Q1, I use only `disney_revenue_1991-2016.csv` and melt the section columns so that each section can be plotted over the years.


```{figure} disney-brands.png
---
height: 300px
name: disney-brands
---
Here's an image from [Disney website](https://lifebeginsatdisney.weebly.com/family-brands.html) demonstrating its various brands.
```

In [1]:
import pandas as pd
import altair as alt


In [9]:
# import potentially useful files
revenues = pd.read_csv('data/disney_revenue_1991-2016.csv')
revenues.head(5)

Unnamed: 0,Year,Studio Entertainment[NI 1],Disney Consumer Products[NI 2],Disney Interactive[NI 3][Rev 1],Walt Disney Parks and Resorts,Disney Media Networks,Total
0,1991,2593.0,724.0,,2794.0,,6111
1,1992,3115.0,1081.0,,3306.0,,7502
2,1993,3673.4,1415.1,,3440.7,,8529
3,1994,4793.0,1798.2,,3463.6,359.0,10414
4,1995,6001.5,2150.0,,3959.8,414.0,12525


```{figure} Disney_Interactive_Logo.png 
---
height: 300px
name: disney-interactive
---
Disney Interactive
```

{numref}`disney-interactive` above shows the Disney Interactive logo. If you look closely at {numref}`disney-brands`, you'll notice a Disney Interactive Studio, which appears to have been part of Disney Interactive over time.

In [3]:
# Drop Disney Interactive from the comparison because more than half of its values are NaN
# Then drop the remaining years (rows) that still contain NaN values

revenues1 = revenues.drop(columns = 'Disney Interactive[NI 3][Rev 1]')
revenues1 = revenues1.dropna()
revenues1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 3 to 24
Data columns (total 6 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Year                            21 non-null     int64  
 1   Studio Entertainment[NI 1]      21 non-null     float64
 2   Disney Consumer Products[NI 2]  21 non-null     float64
 3   Walt Disney Parks and Resorts   21 non-null     float64
 4   Disney Media Networks           21 non-null     object 
 5   Total                           21 non-null     int64  
dtypes: float64(3), int64(2), object(1)
memory usage: 1.1+ KB


Just as a note, my approach to NaN values here is to drop them. As we've learned in the Python class, an alternative to dropping NaN values is to fill them with the values from preceding or following years.

To cite [this source](https://medium.com/@dkatzman_3920/how-to-deal-with-missing-or-na-values-in-the-dataset-7d8f1693450d) {cite:p}`Dav20`: "it often times may make more sense instead to use more mathematical ways of imputing missing values. For instance, it may make sense to impute some sort of summary statistic for that variable, whether it be the mean, median, or mode for a continuous variable, or the mode for a categorical variable."

The code cells below show what we could use if we were to fill missing values from neighbouring years instead of dropping them.


```
revenues_bfill = revenues.fillna(method='bfill')
```

```
revenues_ffill = revenues.fillna(method='ffill')
```
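
The quote above also mentions imputing a summary statistic such as the mean or median. Here is a minimal sketch of that idea, using a small made-up table (the `toy` DataFrame and its numbers are hypothetical, not taken from the Disney file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only, not the Disney revenue file
toy = pd.DataFrame({
    "Year": [2001, 2002, 2003, 2004],
    "Revenue": [10.0, np.nan, 30.0, 80.0],
})

# Replace the missing value with the column median (NaN is skipped when computing it)
toy_median = toy.fillna({"Revenue": toy["Revenue"].median()})

# Or replace it with the column mean
toy_mean = toy.fillna({"Revenue": toy["Revenue"].mean()})

print(toy_median)
print(toy_mean)
```

Whether a mean or median makes sense for revenue that trends upward over time is a judgment call; filling from neighbouring years may distort a trend less.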

In [4]:
revenues1.agg(['max','min','median'])

Unnamed: 0,Year,Studio Entertainment[NI 1],Disney Consumer Products[NI 2],Walt Disney Parks and Resorts,Disney Media Networks,Total
max,2015.0,8713.0,4499.0,16162.0,9733.0,52465.0
min,1994.0,4793.0,1798.2,3463.6,10941.0,10414.0
median,2005.0,6849.0,2590.0,9023.0,13207.0,31944.0
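
Notice that in the summary above, the `Disney Media Networks` maximum (9733.0) is smaller than its minimum (10941.0). A likely explanation, given the `object` dtype that `info()` reported for this column, is that the values are being compared as text rather than as numbers. The sketch below reproduces that effect with three illustrative strings (an assumption about the cause, not a check on the real file):

```python
import pandas as pd

# Illustrative strings only, not the full column
as_text = pd.Series(["359", "9733", "10941"])

# Strings are compared character by character, so "9..." sorts after "1..."
print(as_text.max(), as_text.min())      # 9733 10941

# Converting to numbers first gives the expected ordering
as_number = pd.to_numeric(as_text)
print(as_number.max(), as_number.min())  # 10941 359
```

If this is indeed the cause, converting the column to a numeric dtype before aggregating should give a sensible maximum and minimum.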


In [5]:
# Rename columns for plotting: the special characters (square brackets) made these columns unreadable in the plot
revenues1 = revenues1.rename(columns = {'Studio Entertainment[NI 1]':'Studio EntertainmentNI 1', 'Disney Consumer Products[NI 2]':'Disney Consumer ProductsNI 2'})

#change dtype for Disney Media Networks to float
revenues1 = revenues1.rename(columns = {'Disney Media Networks':'Disney_Media_Networks'})
revenues1 = revenues1.assign(Disney_Media_Networks = revenues1['Disney_Media_Networks'].astype('float'))

                        

In [6]:
#melt revenues section columns so they can be plotted on the same chart for visual comparison 
melted_revenues1 = (revenues1.melt(id_vars = ['Year'],
        value_vars=['Studio EntertainmentNI 1', 'Disney Consumer ProductsNI 2', 'Walt Disney Parks and Resorts', 'Disney_Media_Networks'],
        var_name = 'Sections',
        value_name = 'value'))
melted_revenues1



Unnamed: 0,Year,Sections,value
0,1994,Studio EntertainmentNI 1,4793.0
1,1995,Studio EntertainmentNI 1,6001.5
2,1997,Studio EntertainmentNI 1,6981.0
3,1998,Studio EntertainmentNI 1,6849.0
4,1999,Studio EntertainmentNI 1,6548.0
...,...,...,...
79,2011,Disney_Media_Networks,18714.0
80,2012,Disney_Media_Networks,19436.0
81,2013,Disney_Media_Networks,20356.0
82,2014,Disney_Media_Networks,21152.0


Below is a line plot of each section's revenue over time, based on the melted data.

In [7]:
plotM = alt.Chart(melted_revenues1).mark_line().encode(
    x = 'Year:Q',
    y = 'value:Q',
    color = 'Sections:N'
)
plotM

In [14]:
# Mean revenue per section across these years (input for the bar chart below)
barMean = melted_revenues1.groupby('Sections').mean(numeric_only=True).reset_index()
barMean

Unnamed: 0,Sections,Year,value
0,Disney Consumer ProductsNI 2,2004.904762,2807.866667
1,Disney_Media_Networks,2004.904762,12778.857143
2,Studio EntertainmentNI 1,2004.904762,6776.357143
3,Walt Disney Parks and Resorts,2004.904762,9062.447619


In [17]:
alt.Chart(barMean).mark_bar().encode(
    x='value',
    y='Sections',
    color = 'Sections'
)

As the graphs indicate, after excluding the Interactive section for lack of data and dropping the years with NaN values, the sections that show steady growth are Media Networks and Consumer Products. Studio Entertainment, which appears to be where the animated movies fall (at least for box office revenue, if not for revenue from reuse of the movie copyrights), seems to be more or less stagnant.
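
One way to put a number on "steady growth" versus "stagnant" is the compound annual growth rate (CAGR). This is a rough sketch and not part of the original analysis; the `cagr` helper is a hypothetical name, and the inputs are copied from the rows visible in the truncated melted table above. The two windows differ, so the output illustrates the calculation rather than a fair comparison between sections.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two revenue figures."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures copied from the melted table shown earlier (same units as the table)
studio = cagr(4793.0, 6548.0, 1999 - 1994)   # Studio Entertainment, 1994 to 1999
media = cagr(18714.0, 21152.0, 2014 - 2011)  # Media Networks, 2011 to 2014

print(f"Studio Entertainment 1994-1999: {studio:.1%}")
print(f"Media Networks 2011-2014: {media:.1%}")
```

Running the same helper on the first and last year of every section in the full data would give a like-for-like comparison.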


```{note}
Did you pay attention to the difference between revenue and net revenue? Well, I most certainly did not when I was creating this project. So, just to clarify, the section generating the most revenue may not be the most 'profitable'. See below a simple equation for explanation from [this source](https://www.investopedia.com/terms/r/revenue.asp) {cite:p}`Ada22`.

```

```{math}
:label: revenue
  \text{Revenue} = \text{Quantity Sold} \times \text{Unit Price}
```

```{math}
:label: net-revenue
  \text{Net Revenue} = (\text{Quantity Sold} \times \text{Unit Price}) - \text{Discounts} - \text{Allowances} - \text{Returns}
```
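
To make the two equations concrete, here is a small worked example. All figures are hypothetical and chosen only to keep the arithmetic easy to follow; they are not Disney data.

```python
# Hypothetical figures for illustration only, not Disney data
quantity_sold = 1_000
unit_price = 20.0
discounts = 1_500.0
allowances = 500.0
returns = 1_000.0

# Revenue = Quantity Sold x Unit Price
revenue = quantity_sold * unit_price

# Net Revenue = Revenue - Discounts - Allowances - Returns
net_revenue = revenue - discounts - allowances - returns

print(revenue)      # 20000.0
print(net_revenue)  # 17000.0
```

In this example the two figures differ by 3000.0, which is why ranking sections by revenue alone may not match a ranking by net revenue.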