# Plotly

## Installing the dependencies

We need to make sure that all of the required dependencies are installed. If you haven't already, install the required Python packages. You can do this from inside this notebook:

In [None]:
!pip install "plotly>=4" numpy pandas

Plotly used to be pretty finicky about displaying in Jupyter/Colab, but Plotly 4 allows things to just work.

We'll also download some supporting materials for this notebook:

In [None]:
# Download data and solutions

import urllib.request
import os

def download_data(path):
    if os.path.exists(path):
        return
    if not os.path.exists('data'):
        os.mkdir('data')
    if not os.path.exists('solutions'):
        os.mkdir('solutions')
    url = 'https://raw.githubusercontent.com/ualberta-rcg/python-plotting/master/notebooks/' + path
    output_file = path
    urllib.request.urlretrieve(url, output_file)
    print("Downloaded " + path)

def show_solution(file):
    # Use a context manager so the file is closed after reading
    with open('solutions/{}'.format(file), 'r') as fp:
        print(fp.read())

download_data('data/gapminder_gdp_europe.csv')
download_data('solutions/plotly-scatter-add-trace2.py')
download_data('solutions/plotly-scatter-netherlands-france.py')

### Creating random points

We'll create some random data points: one array for the x values, one array for y values, a hundred numbers in each.

[`numpy.random.randn`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randn.html) samples points from a standard normal distribution (mean 0, standard deviation 1).

In [None]:
# Create random data with numpy
import numpy as np

N = 100

random_x = np.random.randn(N)
random_y = np.random.randn(N)

# Poke around with one of these
print("Length:", len(random_x))
print("First ten values:", random_x[0:10])
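
With only a hundred points, the sample mean and standard deviation will wander a little from 0 and 1. As a rough check of the distribution described above, this sketch draws a much larger sample. The seed and the sample size are arbitrary choices, made only so the run is repeatable.

```python
import numpy as np

# Arbitrary seed, used only to make this illustration repeatable
np.random.seed(0)

# A large sample should have a mean close to 0 and a standard deviation close to 1
big_sample = np.random.randn(100_000)

print("Mean:", big_sample.mean())
print("Standard deviation:", big_sample.std())
```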

The `graph_objects` submodule of plotly has functions that will create our graph objects for us.

A **trace** is just the name we give to a collection of data together with the specification of how we want that data plotted. This terminology is used a lot in the plotly documentation. We can have any number of traces in a plot.

## Scatter plots

Here we will create a trace for a basic scatter plot, with the data coming from our generated data above.

In [None]:
import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'markers'
))

You'll notice that a plot appears. This is because the `add_trace` method returns the `Figure` object, and that call is the last expression in the cell.

When the figure (or a method that returns the figure) isn't the last line of a cell, we can still show it using the `show` method.

In [None]:
print('before')
fig.show()
print('after')

Play around with the user interface: hover over points to see their values, drag to zoom, and try the toolbar in the top-right corner of the plot.

Rather than embedding the plot, we can also use the `write_html` method to export our plot to an HTML file that we can put online or email to a colleague. This method writes an HTML file at the location you provide in the first argument.

In [None]:
fig.write_html('my-first-scatter.html')

A warning though: by default the HTML file embeds the entire plotly.js library, so it will be several megabytes even for a small plot.

**Note**: although we are getting our mock data from NumPy, the data can just as easily come from Python lists, e.g.:

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(
    x = [1, 6, 4, 9],
    y = [100, 56, 200, 48],
    mode = 'markers'
))

There are a lot of options to modify the visual appearance of the scatter plot. For example we can connect the data points in the order they appear in the arrays:

In [None]:
fig = go.Figure()
trace = go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'lines'
)
fig.add_trace(trace)
fig.show()

Or we can have both dots and lines:

In [None]:
go.Figure().add_trace(go.Scatter(x = random_x,
                                 y = random_y,
                                 mode = 'markers+lines'))

Notice that we collapsed several separate Python statements into one above. We did not create a `fig` variable or a `trace` variable as intermediate results.

If this approach matches the style you prefer, then this is perfectly valid code. It can make things harder to read, but it might make things easier to modify in some situations.

### What about those scatterplots where the dots change size because of another variable?

This is something we can easily do by creating another random numpy array to hold size information, and set that as a configuration for the marker:

In [None]:
# Size must be positive and be big enough to see!
random_size = np.absolute(np.random.randn(N)) * 30

trace = go.Scatter(
    x = random_x,
    y = random_y,
    mode = 'markers',
    marker = {'size': random_size}
)
fig = go.Figure()
fig.add_trace(trace)
fig.show()

### But I want a graph ..., you know, a line graph ...

Let's create a hundred points in the x direction, evenly spaced between zero and one. The function [`numpy.linspace`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) does this for us.

In [None]:
linear_x = np.linspace(0, 1, N)

# Poke around ...
print("Length:", len(linear_x))
print("First ten values:", linear_x[0:10])
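
With a hundred points the pattern is hard to see at a glance. As a smaller illustration (five points, chosen arbitrarily), this sketch shows that `linspace` includes both endpoints and that the gaps between neighbouring points are all equal.

```python
import numpy as np

# Five evenly spaced points from 0 to 1, endpoints included
few_points = np.linspace(0, 1, 5)
print(few_points)

# The difference between neighbouring points is constant
print("Spacing:", np.diff(few_points))
```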

We can now make something that looks like a traditional graph:

In [None]:
fig = go.Figure()
trace = go.Scatter(
    x = linear_x,
    y = random_y,
    mode = 'lines'
)
fig.add_trace(trace)
fig.show()

### But I want two graphs...

Let's create two random sets of Y values to plot:

In [None]:
# Mean 0, standard deviation 1
random_y0 = np.random.randn(N)

# Mean 2, standard deviation 1
# Nudge the values up for these guys so the mean is 2 ...
random_y1 = np.random.randn(N) + 2

As mentioned earlier, we can plot multiple data sets by adding more traces to the figure:

In [None]:
fig = go.Figure()
trace0 = go.Scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines'
)
trace1 = go.Scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines+markers'
)
fig.add_trace(trace0)
fig.add_trace(trace1)
fig.show()

Previous versions of Plotly had a "Compare data on hover" button in the toolbar by default, but it has since been removed. We can get the same behaviour with the following command:

In [None]:
fig.update_layout(hovermode='x')

By the way, in recent versions of plotly, we can create the scatter plot and add it to the figure in one function call with the `add_scatter` method.

In [None]:
# Compare with above ...

fig = go.Figure()
fig.add_scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines'
)
fig.add_scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines+markers'
)
fig.show()

### Exercise

The code above is repeated below.

Modify the code:
* Generate another random data set called `random_y2`. This data should be sampled from a distribution with mean `-2`.
* Create a third trace called `trace2` using this data.
* Plot all three datasets. We want `trace2` to be represented as markers only.

In [None]:
# Your code here ...
# Modify this code as described above

fig = go.Figure()
fig.add_scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines'
)
fig.add_scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines+markers'
)

fig.show()

In [None]:
# PRINT SOLUTION (copy/paste output into a cell to run)
show_solution('plotly-scatter-add-trace2.py')

### We can control many more aspects of the visual representation of our plots

Some obvious choices are:
* color
* marker size
* marker color
* line width
* line color
* name (that shows up in the legend)

Some of the options to do this become a bit arcane, so the [scatter documentation](https://plot.ly/python/line-and-scatter/) will be your friend...

Here is an example:

In [None]:
trace0 = go.Scatter(
    x = linear_x,
    y = random_y0,
    mode = 'lines+markers',
    name = 'Size of thing',
    marker = dict(
      size = 10,
      color = 'rgba(255, 182, 193, .9)',
      line = dict(
        width = 2,
      )
    ),
)
trace1 = go.Scatter(
    x = linear_x,
    y = random_y1,
    mode = 'lines',
    name = 'Size of other thing',
    line = dict(
      width = 5,
      color = 'rgba(50, 50, 255, .5)',
    )
)

fig = go.Figure(data = [trace0, trace1])
fig.show()

Notice that this time we didn't use the `add_trace` method, but instead declared our figure with the traces in a list as a `data` attribute.

### No, I want a green line instead

We can edit these styles after the fact using the `update_traces` method, then re-plot.

The `selector` allows us to match certain criteria to determine which traces we want to update (no `selector` keyword argument means update all of the traces).

The other arguments determine what we want to change about the selected traces.

(Personal log: this used to be a bit easier, but less robust, in Plotly 3.)

In [None]:
fig.update_traces(selector=dict(name='Size of thing'),
                  marker=dict(color='rgba(50, 255, 50, .5)'))
fig.show()

### Yeah, but you should always label your axes and have a title

For plotly, there are many ways to label your axes and supply a title. There is a lot of flexibility so that you can change fonts and sizes and affect a lot of attributes of the **layout** of a plot.

The easiest way is to use the `update_layout` method if you want to do simple things...


In [None]:
# Create the figure
fig = go.Figure(data=[trace0, trace1])
fig.update_layout(title = 'The size of a couple of things',
                  xaxis_title = 'Time (Seconds)',
                  yaxis_title = 'Width (inches)')
# Plot the figure
fig.show()

### Yeah, but my data is in a Pandas dataframe

Let's load one of the Gapminder datasets (Europe) into a dataframe, and convert the column names to nice integer years.

In [None]:
import pandas as pd

df = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')

print("Yucky columns:", df.columns )

# Strip the characters of 'gdpPercap_' from each column name, leaving the year
years = df.columns.str.strip('gdpPercap_')

# Convert year values to integers, saving results back to dataframe
df.columns = years.astype(int)
print("Nice columns:", df.columns )

# Look at the first five rows...
df.head()
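
A caveat about `strip`: it treats its argument as a set of characters to remove from both ends, not as a prefix. The pandas `.str.strip` method follows the same rule as Python's built-in `str.strip`, so the sketch below uses plain strings. It works for our columns because digits are not in that set. The second column name and the `drop_prefix` helper are hypothetical and exist only to show the pitfall.

```python
prefix = 'gdpPercap_'

# Works here: the digits are not in the set of stripped characters
print('gdpPercap_1952'.strip(prefix))

# Hypothetical column name: most letters of 'average' are in the set too
print('gdpPercap_average'.strip(prefix))

# Hypothetical helper: slice the prefix off by length instead
def drop_prefix(label, prefix):
    return label[len(prefix):] if label.startswith(prefix) else label

print(drop_prefix('gdpPercap_average', prefix))
```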

We can now slice our dataframe to plot it. Here we plot GDP per-capita data for the Netherlands and France.

In [None]:
trace0 = go.Scatter(
    x = df.columns,
    y = df.loc['Netherlands'],
    mode = 'lines'
)
trace1 = go.Scatter(
    x = df.columns,
    y = df.loc['France'],
    mode = 'lines+markers'
)

fig = go.Figure(data=[trace0, trace1])
fig.show()

### Exercise

That last graph was awesome ... BUT:

* The graph should be titled 'GDP per-capita for the Netherlands and France'
* The Netherlands graph is called 'trace 0', and the France graph is called 'trace 1'. They should be named after the country.
* The x-axis should be labeled 'Year'
* The y-axis should be labeled 'GDP per-capita'
* The France graph should be blue (`'rgb(0, 0, 255)'`) and the Netherlands graph should be orange (`'rgb(255, 127, 0)'`)

Apply what you have learned in this notebook to fix the graph!

In [None]:
# Your code here ...

In [None]:
# PRINT SOLUTION (copy/paste output into a cell to run)
show_solution('plotly-scatter-netherlands-france.py')

There is a new component of Plotly called **Plotly Express** that has a lot of features including:

* A simple syntax for doing simple things
* Access to example data sets
* Works with Graph Objects too, so you can use methods like `add_trace`, etc.
* Nice integration with Pandas


In [None]:
import plotly.express as px

# Loads a pandas dataframe from an example data set
df = px.data.iris()

fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 title="A Plotly Express Figure")

fig.show()

### Summary

Here are some key points:

* Data is organized in traces, which hold x, y data and style information
* Traces are combined into a figure (with optional layout information).
* `fig.show()` puts a plot in a notebook, `write_html` generates a web page.
* One or more traces can be plotted as `data`
* Data can be combined with `layout` to make a figure

We will see this same overall pattern with other Plotly plot types.

What we have seen so far is the tip of the iceberg: consult the Plotly documentation (and Stack Overflow) to find out all the options.

**[On to the next notebook (Plotly Bar charts)](03-plotly-bar.ipynb) ...**