# Dealing with 3D data

... or more accurately, dealing with scalar data defined on a 2D grid

... much like you might find in a Pandas dataframe!

... or a spreadsheet!

Let's import Plotly so we can use it in our notebook:

In [None]:
import plotly.graph_objects as go

We may need to download the supporting data first ...

In [None]:
# Download data and solutions

import urllib.request
import os

def download_data(path):
    if os.path.exists(path):
        return
    if not os.path.exists('data'):
        os.mkdir('data')
    if not os.path.exists('solutions'):
        os.mkdir('solutions')
    url = 'https://raw.githubusercontent.com/ualberta-rcg/python-plotting/master/notebooks/' + path
    output_file = path
    urllib.request.urlretrieve(url, output_file)
    print("Downloaded " + path)

download_data('data/gapminder_gdp_europe.csv')

## Dataset: a paraboloid

Let's first create an artificial dataset. We will create a [paraboloid](https://en.wikipedia.org/wiki/Paraboloid).

![](assets/paraboloid.png)

(Or at least an approximation of one.)

The data for our paraboloid will be defined on a grid of points and stored as a 2D-array of Z values.

The indices of the columns go from `0 ... (width - 1)` (x direction).

The indices of the rows go from `0 ... (height - 1)` (y direction).

But we don't want these indices to be the x, y values for our calculations: in particular, we want the minimum point of our paraboloid to be roughly in the middle of the plot.

We can write functions to calculate the x, y values based on the row and column of our 2D array. We can then use those x, y values to define a function `f(x, y)` as the formula for our paraboloid.

In [None]:
# Convert column into physical x location
def get_x(column, width):
    return (column - width/2)

# Convert row into physical y location
def get_y(row, height):
    return (row - height/2)

# A paraboloid
def f(x, y):
    return(x**2 + y**2)
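As a quick sanity check, we can list which coordinates the column and row indices map to. The snippet repeats `get_x` and `get_y` so it runs on its own, and assumes `width = 18` and `height = 16` (the sizes used in the next cell). With those sizes, x runs from -9 to 8 and y from -8 to 7, so the minimum of the paraboloid (where x = y = 0) lands at column 9 and row 8: roughly, but not exactly, in the middle, because both sizes are even.

```python
# Same index-to-coordinate conversions as above
def get_x(column, width):
    return column - width / 2

def get_y(row, height):
    return row - height / 2

width, height = 18, 16

x_values = [get_x(column, width) for column in range(width)]
y_values = [get_y(row, height) for row in range(height)]

print(x_values[0], x_values[-1])  # -9.0 8.0
print(y_values[0], y_values[-1])  # -8.0 7.0

# Column and row where x == 0 and y == 0 (the paraboloid's minimum)
print(x_values.index(0), y_values.index(0))  # 9 8
```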

We will now:

* Create a 1D array of `x` values (18 of them)
* Create a 1D array of `y` values (16 of them)
* Use NumPy to create a 2D array with 16 rows (one per `y` value) and 18 columns (one per `x` value)
* Loop over the rows and columns to assign the `z` values to the 2D array


In [None]:
import numpy as np

# Create 2D array to hold values
# Number of columns is width
# Number of rows is height

width = 18
height = 16

# NEW! (?) List comprehensions to define lists!
x = [get_x(column, width) for column in range(width)]
y = [get_y(row, height) for row in range(height)]
z = np.zeros(shape=(height, width))

# Compute 2D array of z values for the paraboloid:
# rows correspond to y values, columns to x values
for row in range(height):
    for column in range(width):
        z[row][column] = f(x[column], y[row])
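The nested loops make the row/column bookkeeping explicit. For comparison, NumPy can build the same grid without loops using `np.meshgrid`, which by default (`indexing='xy'`) returns arrays with one row per y value and one column per x value, i.e. shape `(height, width)`. That is the layout Plotly expects for `z`: `z[i][j]` is drawn at `y[i]` and `x[j]`. A sketch, assuming the same sizes and coordinate conversions as above (the variable names are chosen so they don't overwrite `x`, `y`, and `z`):

```python
import numpy as np

width, height = 18, 16

# 1D coordinate arrays, equivalent to get_x / get_y applied to every index
x_vals = np.arange(width) - width / 2
y_vals = np.arange(height) - height / 2

# X[row, column] == x_vals[column] and Y[row, column] == y_vals[row]
X, Y = np.meshgrid(x_vals, y_vals)

# Evaluate the paraboloid on the whole grid at once
z_grid = X**2 + Y**2
print(z_grid.shape)  # (16, 18): 16 rows (y) by 18 columns (x)

# Same result as an explicit loop that fills z_loop[row][column]
z_loop = np.zeros((height, width))
for row in range(height):
    for column in range(width):
        z_loop[row][column] = x_vals[column]**2 + y_vals[row]**2

print(np.array_equal(z_grid, z_loop))  # True
```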

### How about a heatmap?

Now that we have our lattice, we can very quickly and easily plot a heatmap:

In [None]:
data = [
    go.Heatmap(
        z=z,
    )
]
fig = go.Figure(data=data)
fig.show()

### The x and y axes show the array indices ... I want the x, y values

We can pass our x, y values as options:

In [None]:
data = [
    go.Heatmap(
        x=x,
        y=y,
        z=z,
    )
]
fig = go.Figure(data=data)
fig.show()

### Looks kind of blocky though ...

Our data is blocky by its nature, but we can ask Plotly to smooth it out using the `zsmooth` option. This accepts `'fast'` or `'best'` (or `False`, the default, for no smoothing).

Only the best for us:

In [None]:
# To spice things up, we'll define our heatmap as a trace this time ...

trace = go.Heatmap(
    x=x,
    y=y,
    z=z,
    zsmooth='best'
)

fig = go.Figure()
fig.add_trace(trace)
fig.show()

### Nice, but the colors are kind of boring ...

Plotly maintains a collection of color scales to paint data with.

The default color scale is called `'Plasma'`.

According to the `help` documentation for `Heatmap`, the color scale names are:

```
['Greys', 'YlGnBu', 'Greens', 'YlOrRd', 'Bluered', 'RdBu',
 'Reds', 'Blues', 'Picnic', 'Rainbow', 'Portland', 'Jet',
 'Hot', 'Blackbody', 'Earth', 'Electric', 'Viridis', 'Cividis']
```

But it turns out there are a whole lot more:

https://plotly.com/python/builtin-colorscales/

Let's check out `'Jet'`:

In [None]:
data = [
    go.Heatmap(
        x=x,
        y=y,
        z=z,
        colorscale='Jet'
    )
]
fig = go.Figure(data=data)
fig.show()

### No, I don't want those -- I want one that is pink for the high values, dark grey in the middle, and turquoise for the low values.

No problem: we can set `colorscale` to a list that maps normalized values (`0` means the lowest value, `1` means the highest value) to RGB colors:

In [None]:
data = [
    go.Heatmap(
        x=x,
        y=y,
        z=z,
        colorscale = [[0.0, 'rgb(0,255,255)'],
                      [0.5, 'rgb(51,51,51)'],
                      [1.0, 'rgb(255,128,128)']]
    )
]

fig = go.Figure(data=data)
fig.show()
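To make "normalized values" concrete: roughly speaking, each z value is rescaled to the range 0 to 1 using the smallest and largest values in the data, and its color is blended linearly between the two neighbouring stops of the list. The function below is a hypothetical helper (`interpolate_color` is not part of Plotly) that sketches this idea for the turquoise/grey/pink scale above; Plotly's own rendering may differ in details such as rounding or color space.

```python
def interpolate_color(value, zmin, zmax, colorscale):
    """Sketch: map a data value to an (r, g, b) tuple using a Plotly-style
    colorscale given as [[position, (r, g, b)], ...] with positions in 0..1."""
    # Normalize: 0 at the lowest value, 1 at the highest (clamped to 0..1)
    t = (value - zmin) / (zmax - zmin)
    t = min(max(t, 0.0), 1.0)
    # Find the pair of stops that bracket t and blend linearly between them
    for (p0, c0), (p1, c1) in zip(colorscale, colorscale[1:]):
        if p0 <= t <= p1:
            w = (t - p0) / (p1 - p0)
            return tuple(a + w * (b - a) for a, b in zip(c0, c1))
    # Fallback: only reached if the stops don't cover 0..1
    return colorscale[-1][1]

# Same stops as the heatmap above, written as RGB tuples instead of strings
scale = [[0.0, (0, 255, 255)],
         [0.5, (51, 51, 51)],
         [1.0, (255, 128, 128)]]

print(interpolate_color(0, 0, 100, scale))    # lowest value: turquoise
print(interpolate_color(50, 0, 100, scale))   # middle value: dark grey
print(interpolate_color(100, 0, 100, scale))  # highest value: pink
print(interpolate_color(25, 0, 100, scale))   # halfway between turquoise and grey
```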

### I was kidding about the colors ... how about a contour plot though?

We just replace the word `Heatmap` with `Contour`:

In [None]:
data = [
    go.Contour(
        x=x,
        y=y,
        z=z,
        colorscale='Jet'
    )
]

fig = go.Figure(data=data)
fig.show()

### But why is the thing squashed?

We've glossed over the fact that the aspect ratio of the plot doesn't match what we would expect for a paraboloid: its contours (the points where `x**2 + y**2` is constant) should be circles, but here they are stretched into ellipses.

Below, we anchor the y-axis scale to the x-axis (`scaleanchor`) so that one unit of y takes up the same length on screen as one unit of x. Constraining the x-axis to `'domain'` ensures that the graph itself doesn't stretch out to fit the width of the notebook (comment out that setting to see the effect).

In [None]:
data = [
    go.Contour(
        x=x,
        y=y,
        z=z,
        colorscale='Jet'
    )
]
layout = go.Layout(
  xaxis = {
    'constrain': 'domain'
  }, 
  yaxis = {
    'scaleanchor': 'x'
  }
)
fig = go.Figure(data=data, layout=layout)
fig.show()

### Now I just want the thing to look 3D ...

Again, an easy change: we just replace `Contour` with `Surface`.

We'll also give our plot a title ... and color it with the `'Viridis'` color scheme.

In [None]:
data = [
    go.Surface(
        x=x,
        y=y,
        z=z,
        colorscale='Viridis'
    )
]
layout = go.Layout(
    title='Real 3d stuff'
)
fig = go.Figure(data=data, layout=layout)
fig.show()

## Applying what we have learned to data coming from a Pandas DataFrame

If you think about it, a lot of the data you see in a spreadsheet (or DataFrame) looks a lot like a 2D-array of values, like our paraboloid.

Let's import pandas and load our European GDP per capita data. We will also convert our column names (e.g. `gdpPercap_1952`) to integer years.

In [None]:
import pandas as pd

df = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
years = df.columns.str.strip('gdpPercap_')
df.columns = years.astype(int)
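One caution about `str.strip('gdpPercap_')` in the cell above: `strip` does not remove a prefix. It removes any of the *characters* `g`, `d`, `p`, `P`, `e`, `r`, `c`, `a` and `_` from both ends of each name. It works here because the rest of each name is all digits, but it can silently mangle other names. A plain-Python illustration (the column name `'gdpPercap_area'` is made up for the example; `str.removeprefix` needs Python 3.9 or newer):

```python
year_column = 'gdpPercap_1952'
other_column = 'gdpPercap_area'   # hypothetical column name

# strip() removes any of the listed characters from both ends
print(year_column.strip('gdpPercap_'))    # 1952 (works, since digits aren't in the set)
print(other_column.strip('gdpPercap_'))   # empty string: every character is in the set!

# removeprefix() removes exactly that prefix, once
print(year_column.removeprefix('gdpPercap_'))   # 1952
print(other_column.removeprefix('gdpPercap_'))  # area
```

On a pandas `Index`, `df.columns.str.replace('gdpPercap_', '', regex=False)` gives the same prefix-only behaviour.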

## Same song, different key ...

It's now very easy to plot a heatmap of our data.

We will use the years as the x values, the country names as the y values, and add a title:

In [None]:
# Some versions will prefer `z=df.to_numpy()` below

data = [
    go.Heatmap(
        x=years,
        y=df.index,
        z=df,
        colorscale='Jet'
    )
]
layout = go.Layout(
    title='GDP per-capita',
    yaxis=dict(tickmode='linear',
               tickangle=45)
)
fig = go.Figure(data=data, layout=layout)
fig.show()
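It helps to check which way round the data goes. For a `Heatmap`, `z[i][j]` is drawn at `y[i]` and `x[j]`. In our DataFrame the rows are countries (the index) and the columns are years, so `y=df.index` and `x=years` line up with the rows and columns of `df.to_numpy()`. A sketch with a tiny made-up DataFrame (`toy` and its numbers are placeholders, not Gapminder data):

```python
import pandas as pd

# Hypothetical stand-in for the GDP table: countries as rows, years as columns
toy = pd.DataFrame(
    {1952: [1.0, 2.0, 3.0], 1957: [4.0, 5.0, 6.0]},
    index=pd.Index(['A', 'B', 'C'], name='country'),
)

toy_z = toy.to_numpy()
print(toy_z.shape)  # (3, 2): one row per country (y), one column per year (x)

# toy_z[row][column] is the value for country toy.index[row] in year toy.columns[column]
row, column = 2, 1
print(toy.index[row], toy.columns[column], toy_z[row][column])  # C 1957 6.0
```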

### Try this (10 minutes)

What's going on here?

```python
    yaxis=dict(tickmode='linear',
               tickangle=45)
```

Try removing one or both of these options to see why they were included.

Play around with some of the options described on [this page](https://plot.ly/python/axes/#set-and-style-axes-title-labels-and-ticks) and use them to annotate your heatmap.

### But I hate those hover messages ...

A default hover text for one of the heatmap cells might look like:

```
x: 1982
y: Croatia
z: 13.22182k
```

The labels `x`, `y`, and `z` are meaningful to Plotly, but not to a person looking at our heatmap.

We can change this by giving Plotly a 2D array of hover text values to use for our heatmap.

We have to make sure that this array matches the shape of the DataFrame we are using (e.g., if our DataFrame has 20 rows and 40 columns, our array of hover text values must also have 20 rows and 40 columns).

We can use the pandas DataFrame method `iterrows` to iterate over all of the rows in the DataFrame (this also gives us each row's index, which is a country name). Then, for each row, we can iterate over its columns using the Series method `items` (which gives us the column names, i.e. the years). Older pandas versions also offered `iteritems`, but it was removed in pandas 2.0.

Here's how we would want to construct our hover text values:

In [None]:
hovertext = []
for country, row in df.iterrows():
    hoverrow = []
    # year is the column name, gdp is the value in that cell
    for year, gdp in row.items():
        hoverrow.append(str(year) + ': ' + country + ' GDP per-capita is ' + str(gdp))
    hovertext.append(hoverrow)

We're done, but we can use Pandas to visually inspect the first five rows ...

In [None]:
pd.DataFrame(hovertext).head()
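For reference, the loop that built `hovertext` can also be written as a nested list comprehension (the same tool we used for `x` and `y` earlier), and the shape requirement can be checked directly against the DataFrame's `shape`. A sketch using a tiny made-up DataFrame (`toy` and its numbers are placeholders) so it runs on its own:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the GDP table (placeholder numbers)
toy = pd.DataFrame(
    {1952: [1.5, 2.5], 1957: [3.5, 4.5], 1962: [5.5, 6.5]},
    index=pd.Index(['A', 'B'], name='country'),
)

# One inner list per country (row), one string per year (column)
toy_hovertext = [
    [f'{year}: {country} GDP per-capita is {gdp}' for year, gdp in row.items()]
    for country, row in toy.iterrows()
]

# The hover text must have the same shape as the data: 2 rows x 3 columns
print(np.array(toy_hovertext).shape == toy.shape)  # True
print(toy_hovertext[1][2])  # 1962: B GDP per-capita is 6.5
```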

Now we are ready to redo our heat map.

We use the options `hoverinfo='text'` and `text=hovertext` to make it work.

In [None]:
data = [
    go.Heatmap(
        x=years,
        y=df.index,
        z=df,
        hoverinfo='text',
        text=hovertext,
        colorscale='Jet'
    )
]
layout = go.Layout(
    title='GDP per-capita',
    yaxis=dict(tickmode='linear',
               tickangle=45)
)
fig = go.Figure(data=data, layout=layout)
fig.show()

### Sorting by one of the years? (2007)

This is more of a Pandas feature than a Plotly one, but here's how it's done:

In [None]:
df_sorted = df.sort_values(2007)

data = [
    go.Heatmap(
        x=years,
        y=df_sorted.index,
        z=df_sorted,
        colorscale='Jet'
    )
]
layout = go.Layout(
    title='GDP per-capita',
    yaxis=dict(tickmode='linear',
               tickangle=45)
)
fig = go.Figure(data=data, layout=layout)
fig.show()
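`df.sort_values(2007)` in the cell above sorts the rows by the values in the column labelled `2007` (an integer, because we converted the column names earlier), lowest first; passing `ascending=False` reverses the order. A tiny sketch with made-up numbers (`toy` is a placeholder, not Gapminder data):

```python
import pandas as pd

# Hypothetical stand-in: countries as rows, integer years as columns
toy = pd.DataFrame(
    {2002: [30.0, 10.0, 20.0], 2007: [35.0, 12.0, 25.0]},
    index=pd.Index(['A', 'B', 'C'], name='country'),
)

# Sort rows by the 2007 column, lowest first (the default)
print(list(toy.sort_values(2007).index))                   # ['B', 'C', 'A']

# Highest first instead
print(list(toy.sort_values(2007, ascending=False).index))  # ['A', 'C', 'B']
```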

### Insight

What can you say about the per capita GDP of Ireland during the period of time reported?

**[On to the next notebook (Plotly controls)](05-plotly-controls.ipynb) ...**