# Plotly

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

These examples cover Plotly Express, which is suited to more basic graphs.  For more examples of Plotly Express, or of Plotly in general, see https://plotly.com/python/.

## Scatterplots

With px.scatter, each data point is represented as a marker point, whose location is given by the x and y columns.

Syntax: 

`px.scatter(df, x = , y = , color = , size = , symbol = , hover_data = , ..., labels = dict(... = "...", ...), title = "...")`

- `color` sets the marker color based on the values in a column
- `size` sets the marker size based on the values in a numeric column
- `symbol` maps marker symbols (https://plotly.com/python/marker-style/) to the values in a column
- `hover_data` adds extra columns to the hover output
- `labels` takes a dictionary that maps each column name to the text displayed for it in axis titles, legend titles, and hover output:

e.g. `labels = dict(sepal_length = "Sepal Length", petal_length = "Petal Length", species = "Species")`

- You can also change the axis labels using: `fig.update_xaxes(title_text = '...')` and `fig.update_yaxes(title_text = '...')`
- For a title, use the `title = "..."` parameter of `px.scatter`.

In [1]:
import plotly.express as px

iris = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(iris, x = "sepal_length", y = "petal_length", color = 'species', 
                 size = "petal_width", hover_data = ['petal_width'], 
                 labels = dict(sepal_length = "Sepal Length", petal_length = "Petal Length", species = "Species"), # changes all labels in graph
                title = "Iris Data")
fig.show()

### Coloring

To change the discrete colors, add `color_discrete_sequence = px.colors.qualitative.xxx`, where xxx is the name of one of the qualitative palettes shown in the swatches below.  You can also pass a list of explicitly chosen colors.

You can also use `color_discrete_map` with a dictionary that explicitly maps each value of the color column to the color you want to use.

In [2]:
fig = px.colors.qualitative.swatches()
fig.show()

In [4]:
import plotly.express as px

iris = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(iris, x = "sepal_length", y = "petal_length", color = 'species', 
                 size = "petal_width", hover_data = ['petal_width'], 
                 labels = dict(sepal_length = "Sepal Length", petal_length = "Petal Length", species = "Species"),
                title = "Iris Data", color_discrete_sequence = px.colors.qualitative.Vivid)
fig.show()

### Faceting

You can also facet by column using the `facet_col` parameter and by row using the `facet_row` parameter.

In [5]:
import plotly.express as px
tips = px.data.tips()
fig = px.scatter(tips, x = "total_bill", y = "tip", color = "smoker", facet_col = "sex", facet_row = "time")
fig.show()

### Linear Regression

You can add a trendline using the `trendline` parameter.  Options include `ols` for Ordinary Least Squares (linear regression) and `lowess` for Locally Weighted Scatterplot Smoothing (see more here: https://plotly.com/python/linear-fits/).  Trendlines require the `statsmodels` package to be installed.

In [6]:
import plotly.express as px
tips = px.data.tips()
fig = px.scatter(tips, x = "total_bill", y = "tip", color = "smoker",
                trendline = "ols")
fig.update_xaxes(title_text = "Total Bill")
fig.update_yaxes(title_text = "Tip")
fig.show()

Plotly Express fits one trendline per trace and lets you access the underlying model parameters for each fit.

In [8]:
results = px.get_trendline_results(fig)
results.query("smoker == 'Yes'").px_fit_results.iloc[0].summary() # iloc[0] selects the first (and only) matching fit result

OLS Regression Results

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.238 |
| Model: | OLS | Adj. R-squared: | 0.23 |
| Method: | Least Squares | F-statistic: | 28.48 |
| Date: | Mon, 29 Nov 2021 | Prob (F-statistic): | 6.89e-07 |
| Time: | 15:17:19 | Log-Likelihood: | -150.19 |
| No. Observations: | 93 | AIC: | 304.4 |
| Df Residuals: | 91 | BIC: | 309.4 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 1.5643 | 0.299 | 5.228 | 0.000 | 0.970 | 2.159 |
| x1 | 0.0696 | 0.013 | 5.337 | 0.000 | 0.044 | 0.095 |

| | | | |
|---|---|---|---|
| Omnibus: | 25.397 | Durbin-Watson: | 1.732 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 41.494 |
| Skew: | 1.146 | Prob(JB): | 9.77e-10 |
| Kurtosis: | 5.335 | Cond. No. | 53.9 |


### Exercise

The `experiment` dataset represents the results of 100 simulated participants on three hypothetical experiments, along with their gender and control/treatment group.

Build a scatterplot with `experiment_1` as the x-axis and `experiment_2` as the y-axis.  Colorize based on `group`.  Add a LOWESS trendline.  Change the symbol of the point based on `gender` and the size based on `experiment_3`.  Add a title and format the x and y axis labels.

In [10]:
import plotly.express as px

experiment = px.data.experiment()
fig = px.scatter(experiment, x = 'experiment_1', y = 'experiment_2', color = 'group', trendline = 'lowess',
                 symbol = 'gender', size = 'experiment_3',
                 labels = dict(experiment_1 = 'Experiment 1', experiment_2 = 'Experiment 2'), # close the dict before title
                 title = 'Experiment Dataset') # title is an argument of px.scatter, not a label
fig.show()

## Line Graphs

Line plots can be made using any type of cartesian axis, including linear, logarithmic, categorical or date axes. Line plots on date axes are often called time-series charts.

Plotly automatically sets the axis type to date when the corresponding data are ISO-formatted date strings, a pandas datetime column, or a NumPy datetime array.

Syntax:

`px.line(df, x = "...", y = "...", color = "...", title = "...", symbol = "...", color_discrete_sequence = "...", ...)`

- `x` is the variable for the x axis
- `y` is the variable for the y axis
- `color` is the column used to colorize the line(s)
- `title` adds a title to the graph
- `symbol` is the column used to assign marker shapes
- `color_discrete_sequence` sets the list of colors to use, either a built-in qualitative palette or a list you create; use `color_discrete_map` for an explicit dictionary mapping

In [13]:
import plotly.express as px

gapminder = px.data.gapminder().query("continent == 'Europe'")
fig = px.line(gapminder, x = "year", y = "lifeExp", color = "country", title = "Life Expectancy in Europe", 
              color_discrete_sequence = px.colors.qualitative.Pastel1)
fig.update_xaxes(title_text = "Year")
fig.update_yaxes(title_text = "Life Expectancy (Years)")
fig.show()

### Exercise

The `stocks` dataset shows stock prices at various dates.

Build a line plot with `AMZN` as the y-axis and `date` as the x-axis.  Change the color of the line.  Add a title and format the x and y axis labels.

In [20]:
import plotly.express as px

stocks = px.data.stocks()
fig = px.line(stocks, x = 'date', y ='AMZN', labels=dict(AMZN='Amazon', date='Date'), title='Amazon Stock Prices',
              color_discrete_sequence = ['darkorange']) # a one-color sequence changes the line color
fig.show()


## Bar Charts

With px.bar, each row of the DataFrame is represented as a rectangular mark.

Syntax:

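`px.bar(df, x = '...', y = '...', color = '...', title = '...', barmode = '...', ...)`

- `x` is the variable to be displayed on the x axis
- `y` is the variable to be displayed on the y axis
- `color` is the column used to determine coloring
- `title` defines the title of the graph
- `barmode` controls how bars that share an x value are arranged: `'relative'` (stacked, the default), `'group'` (side by side), or `'overlay'`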

Color and axis labels can be changed similarly to the above graphs.

In [22]:
import plotly.express as px

medals = px.data.medals_long()

fig = px.bar(medals, x = "nation", y = "count", color = "medal", title = "Long-Form Input", barmode='group')
fig.show()

### Exercise

Use the `x` list below for the x-axis and the `y` list below for the y-axis of a bar graph.  Give each bar its own color.  Change the axis labels and add a title.

In [31]:
import plotly.express as px
x = ['Product A', 'Product B', 'Product C']
y = [20, 14, 23]

fig = px.bar(x = x, y = y, color = x, labels = dict(x = "Product", y = "Count"), title = "Bar Graph")
fig.show()



## Pie Chart


Syntax:

`px.pie(df, values = '...', names = '...', title = '...', ...)`

- `values` is the column with the amounts that determine the wedge sizes
- `names` is the column with the names to associate with the wedges
- `title` defines the title of the graph

Colors and labels can be changed similarly to the graphs above.

In [32]:
import plotly.express as px

tips = px.data.tips()
fig = px.pie(tips, values = 'tip', names = 'day', color_discrete_sequence=px.colors.sequential.RdBu, title = "Tips per Day")
fig.show()

# Add Statistics (scipy.stats)

Suppose we want to compare groups using a boxplot to see whether there is visual evidence that the response differs between the levels of a factor.  We then want a statistical test to determine whether the difference in mean response between those levels is significant.


## Boxplots

Syntax:

`px.box(df, x = "...", y = "...", color = "...", ...)`

- `x` is the variable on the x axis, used if you want multiple boxplots side by side
- `y` is the variable used to calculate the five-number summary
- `color` is the variable to colorize based on
- `title` adds a title to the graph

The axis labels and the color mapping can be changed in the same way as for the graphs above.

You can also choose how the quartiles are computed:

`fig.update_traces(quartilemethod = "...")`

The options are `"inclusive"`, `"exclusive"`, or `"linear"` (the default).
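
To see how the choice of method can change the box, the sketch below computes Q1 and Q3 by hand for a small made-up sample.  It follows the description in the Plotly box plot documentation: both methods split the sorted data at the median, and for an odd-sized sample `exclusive` leaves the median out of both halves while `inclusive` keeps it in both.  The `quartiles` helper is hypothetical, written for illustration only, and is not Plotly's implementation; the `linear` (interpolation) method is not reproduced here.

```python
from statistics import median

def quartiles(values, method):
    """Return (Q1, Q3) using the median-split rule named by `method`."""
    if method not in ("inclusive", "exclusive"):
        raise ValueError("method must be 'inclusive' or 'exclusive'")
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even sample size: the two halves are the same for both methods
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd sample size: drop the median from both halves
        lower, upper = data[:half], data[half + 1:]
    else:
        # Odd sample size: keep the median in both halves
        lower, upper = data[:half + 1], data[half:]
    return median(lower), median(upper)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(sample, "exclusive"))  # (2.5, 7.5)
print(quartiles(sample, "inclusive"))  # (3, 7)
```

For this sample the exclusive method gives a wider box than the inclusive one, which is the kind of difference `quartilemethod` controls.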

## One-Way ANOVA Test

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Syntax:

`scipy.stats.f_oneway(*args)`

- `*args` are the numeric arrays (one per group) you want to compare
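
The F statistic behind the test is the ratio of between-group variability to within-group variability.  The sketch below computes it with the standard library for three small made-up samples.  `f_statistic` is a hypothetical helper for illustration; it does not compute the p-value, which `scipy.stats.f_oneway` obtains from the F distribution.

```python
from statistics import mean

def f_statistic(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)                         # number of groups
    n_total = sum(len(g) for g in groups)   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Made-up samples for illustration
a = [1, 2, 3]
b = [2, 3, 4]
c = [5, 6, 7]
print(f_statistic(a, b, c))  # approximately 13.0
```

A large F value means the group means differ by more than the spread within the groups would suggest.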

In [33]:
## Boxplot to see if there is visual evidence of a difference in average bill between lunch and dinner

import plotly.express as px

tips = px.data.tips()
fig = px.box(tips, x = "time", y = "total_bill", color = "time", title = "Restaurant Total Bill per Time of Day",
             color_discrete_sequence = px.colors.qualitative.Pastel1)
fig.update_xaxes(title_text = "Meal")
fig.update_yaxes(title_text = "Total Bill")
fig.show()

In [34]:
import scipy.stats as stats
#oneway ANOVA comparing means
stats.f_oneway(tips['total_bill'][tips['time'] == 'Dinner'],
               tips['total_bill'][tips['time'] == 'Lunch'])

F_onewayResult(statistic=8.396303207955597, pvalue=0.004104621407595661)

### Exercise

Re-run the above analysis on `total_bill`, but use whether or not the customer was a `smoker` as the factor.

In [43]:
import plotly.express as px

tips = px.data.tips()
fig = px.box(tips, x = "smoker", y = "total_bill", color = "smoker", title = "Restaurant Total Bill with Smoking",
             color_discrete_sequence = px.colors.qualitative.Vivid)
fig.update_xaxes(title_text = "Smoker")
fig.update_yaxes(title_text = "Total Bill")
fig.show()

In [40]:
import scipy.stats as stats
stats.f_oneway(tips['total_bill'][tips['smoker'] == 'Yes'],
               tips['total_bill'][tips['smoker'] == 'No'])

F_onewayResult(statistic=1.7914119525962602, pvalue=0.18201032884302323)