# Interactive Visualization with Plotly

For this lab, you'll need to install Plotly. Make sure to follow *both* the [plotly](https://plotly.com/python/getting-started/#installation) steps and the [jupyter support](https://plotly.com/python/getting-started/#jupyterlab-support) steps.

[Plotly](https://plotly.com/python/) is an interactive visualization package that is part of the [Plotly and Dash](https://plot.ly) enterprise. Here we'll showcase just a few graphs to get you acquainted with its [Plotly Express](https://plotly.com/python/plotly-express/) module. We'll use [data from the Titanic disaster](https://www.kaggle.com/competitions/titanic/data).

In [1]:
pip install plotly

Note: you may need to restart the kernel to use updated packages.


In [2]:
import plotly.express as px
import pandas as pd

In [3]:
# Adjust this path to wherever you saved titanic.csv
df = pd.read_csv('/Users/yuritziavila-robledo/Downloads/titanic.csv')

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


## Scatter Plots

In [5]:
fig = px.scatter(data_frame=df,
                 x='Age',
                 y='Fare',
                 hover_data=['Name', 'Sex'],
                 template='plotly_white',
                 color_discrete_sequence=['#3182bd'],
                 log_y=True)
fig.show()

## Bar Charts

We can create some [interesting bar chart variations](https://plot.ly/python/bar-charts/):

In [6]:
fig = px.bar(df, 
             x='Age', 
             y='Sex',
             barmode='overlay',
             hover_data=['Name'],
             template='plotly_white',
             color_discrete_sequence=px.colors.qualitative.D3
            )
fig.show()

## Histograms

In [7]:
df['Survived'].dtype

dtype('int64')

In [8]:
fig = px.histogram(df, 
                   x='Age', 
                   color='Survived',
                   template='plotly_white',
                   color_discrete_sequence=px.colors.qualitative.D3
                  )

fig.update_layout(
    bargap=0.1, # gap between bars of adjacent location coordinates
)

fig.show()

## Bubble Plot

For this plot, we'll transform the data a bit to investigate the survival rates across different age decades.

In [9]:
# round age to the nearest 10 years (e.g. 38 -> 40)
df['Age_rounded'] = df['Age'].round(-1)
df[['Age', 'Age_rounded']].head()

    Age  Age_rounded
0  22.0         20.0
1  38.0         40.0
2  26.0         30.0
3  35.0         40.0
4  35.0         40.0
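
Note that `round(-1)` rounds to the nearest ten, so a 38-year-old lands in the 40 bin rather than in their thirties. If true decades are wanted, a floor-based variant is one option. The sketch below is only an illustration: it uses a small hand-made `ages` series instead of the Titanic file, and the column name `Age_decade` is a hypothetical choice.

```python
import pandas as pd

# Hand-made sample (assumption: stands in for df['Age'], including a missing value)
ages = pd.Series([22.0, 38.0, 26.0, 35.0, None], name='Age')

sample = pd.DataFrame({
    'Age': ages,
    'Age_rounded': ages.round(-1),     # nearest ten: 38 -> 40
    'Age_decade': (ages // 10) * 10,   # floor to the decade: 38 -> 30
})
print(sample)
```

Missing ages stay missing in both columns, so either version can be passed to `groupby` in the same way.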


In [10]:
# survivors and total passengers per class and age bin
df_plot = (df.groupby(['Pclass', 'Age_rounded'])['Survived']
             .agg(Survived='sum', Passengers='count')
             .reset_index())

df_plot["Pclass"] = df_plot["Pclass"].astype(str)

fig = px.scatter(data_frame=df_plot,
                 x='Age_rounded',
                 y='Survived',
                 size='Passengers',
                 color='Pclass',
                 color_discrete_sequence=px.colors.qualitative.D3,
                 template='plotly_white')
fig.show()

## Limitations ...

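What if we want to "animate" over the age decade of the passengers? [Be careful](https://plotly.com/python/animations/#:~:text=Animations%20are%20designed%20to%20work%20well%20when%20each%20row%20of%20input%20is%20present%20across%20all%20animation%20frames%2C%20and%20when%20categorical%20values%20mapped%20to%20symbol%2C%20color%20and%20facet%20are%20constant%20across%20frames.%20Animations%20may%20be%20misleading%20or%20inconsistent%20if%20these%20constraints%20are%20not%20met.): according to the Plotly documentation, animations are designed to work well when each row of input is present across all animation frames, and when categorical values mapped to symbol, color and facet are constant across frames. Animations may be misleading or inconsistent if these constraints are not met.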

In [11]:
fig = px.histogram(df.sort_values('Age_rounded'), 
                   x='Pclass',
                   color='Survived',
                   template='plotly_white',
                   animation_frame="Age_rounded",  # this is the value to "animate"
                   # animation_group="PassengerId",  # uncomment this ...
                   color_discrete_sequence=px.colors.qualitative.D3)

fig.update_layout(
    xaxis_tickmode = 'array',
    xaxis_tickvals = [1, 2, 3],
    xaxis_ticktext = ['First', 'Second', 'Third'],
    bargap=0.1, # gap between bars of adjacent location coordinates
)

fig["layout"].pop("updatemenus") # drop animation buttons
fig.show()

**(In class, if there's time) Can we improve on this?**

In [12]:
# fig = px.histogram(df.sort_values('Age_rounded'), 
#                    x='Pclass',
#                    facet_row='Survived',
#                    color='Age_rounded',
#                    template='plotly_white',
#                    color_discrete_sequence=px.colors.sequential.Blues)

# fig.update_layout(
#     xaxis_tickmode = 'array',
#     xaxis_tickvals = [1, 2, 3],
#     xaxis_ticktext = ['First', 'Second', 'Third'],
#     bargap=0.1, # gap between bars of adjacent location coordinates
# )

# fig["layout"].pop("updatemenus") # drop animation buttons
# fig.show()

# EXERCISES

## Exercise 1

Take a look at the `Cabin` column of the data, and investigate how it relates to at least one other column. Consider the context of the Titanic shipwreck. Try to formulate a question around this column, and visualize it using Plotly. **Build at least 2 different plots** of the same data.

Feel free to use the [gallery](https://plotly.com/python/) as a resource.

The `Cabin` column could relate to survival: depending on which deck a passenger's cabin was on, they may have been able to leave their room and reach the upper deck faster than someone whose room was far below.
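
The deck idea can be sketched by reducing each cabin code to its leading letter. This is only an illustration under assumptions: it treats the first character of `Cabin` as the deck, uses a hand-made `toy` frame rather than the Titanic file, and the names `Deck` and `deck_rate` are hypothetical.

```python
import pandas as pd

# Hand-made stand-in for df[['Cabin', 'Survived']] (assumption)
toy = pd.DataFrame({
    'Cabin':    ['C85', None, 'E46', 'C123', None, 'G6'],
    'Survived': [1, 0, 1, 0, 0, 1],
})

# Treat the first character of the cabin code as the deck letter;
# passengers without a recorded cabin get the placeholder 'Unknown'
toy['Deck'] = toy['Cabin'].str[0].fillna('Unknown')

# Share of survivors per deck
deck_rate = toy.groupby('Deck')['Survived'].mean()
print(deck_rate)
```

A column like `Deck` has only a handful of categories, so it is easier to read on an axis or in a legend than the individual cabin codes.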

In [13]:
# Plot 1: Survival vs Cabin
# (`Cabin` holds cabin codes; passenger class lives in `Pclass`)
plot1 = px.histogram(df,
                     x="Cabin",
                     color="Survived",
                     title="Survival vs Cabin",
                     labels={"Cabin": "Cabin", "Survived": "Survival"},
                     category_orders={"Survived": [0, 1]},
                     color_discrete_map={0: "red", 1: "green"})
plot1.update_layout(xaxis=dict(type='category'))

In [14]:
# Plot 2: Fare and Cabin vs Survival
plot2 = px.scatter(df,
                   x="Fare",
                   y="Survived",
                   color="Cabin",
                   title="Scatter plot of Fare vs. Survival (Colored by Cabin)",
                   labels={"Fare": "Ticket Fare", "Survived": "Survival"},
                   color_discrete_sequence=px.colors.qualitative.Set1,
                   hover_data=["Pclass", "Cabin"])
# tickmode='array' is required for tickvals/ticktext to take effect
plot2.update_layout(yaxis=dict(tickmode='array', tickvals=[0, 1], ticktext=['Not Survived', 'Survived']))


## Exercise 2

Consider the Bubble Plot, above. Try to figure out what it might be trying to communicate.

1. Point out at least three issues with this visualization.
2. Build at least two visualizations in Plotly that communicate a similar message, but which do it far better.

Three issues with this visualization:

1) Too many bubbles overlap in the same area, which makes the visualization hard to read.
   
2) The scale hides some of the data: for `Age_rounded` values of 70 and 80 there is data, but it is not visible.
   
3) The bubble sizes are hard to interpret: they encode passenger counts, while the y-axis shows a raw survivor count, so the percentage of passengers who survived in each group cannot be read off the chart.
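
The third issue can be made concrete by computing a survival rate per group instead of a raw count. The sketch below is an illustration under assumptions: `toy` is a hand-made frame, not the Titanic file, and the column names `Survivors` and `Rate` are hypothetical.

```python
import pandas as pd

# Hand-made stand-in for df[['Pclass', 'Age_rounded', 'Survived']] (assumption)
toy = pd.DataFrame({
    'Pclass':      [1, 1, 1, 3, 3, 3, 3],
    'Age_rounded': [30.0, 30.0, 40.0, 30.0, 30.0, 30.0, 40.0],
    'Survived':    [1, 1, 0, 0, 1, 0, 0],
})

# The mean of a 0/1 column is the share of survivors in each group
rates = (toy.groupby(['Pclass', 'Age_rounded'])['Survived']
            .agg(Survivors='sum', Passengers='count', Rate='mean')
            .reset_index())
print(rates)
```

Plotting `Rate` on the y-axis and keeping `Passengers` for bubble size or hover text separates "how likely was survival" from "how many people were in this group".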

In [15]:
df_plot = df.groupby(['Pclass', 'Age_rounded', 'Survived']).size().reset_index(name='Passengers')

# cast to string so Survived is colored as discrete categories, not on a continuous scale
df_plot['Survived'] = df_plot['Survived'].astype(str)

fig = px.bar(df_plot, 
             x='Age_rounded', 
             y='Passengers', 
             color='Survived', 
             facet_col='Pclass', 
             barmode='stack',
             template='plotly_white',
             labels={'Survived': 'Survival Status', 'Age_rounded': 'Age Rounded', 'Passengers': 'Passenger Count'},
             title='Passenger Count by Age, Passenger Class, and Survival Status')
fig.show()

In [16]:
fig = px.box(df, x='Pclass', y='Age', color='Survived', 
             labels={'Pclass': 'Passenger Class', 'Age': 'Age', 'Survived': 'Survival Status'},
             title='Distribution of Age by Passenger Class and Survival Status',
             template='plotly_white')
fig.update_traces(marker=dict(size=3))
fig.show()