# Plotting participant data using plotly

[Plotly](https://plotly.com/graphing-libraries/) is an open-source graphing library for Python, R, and JavaScript which is easier to use than [matplotlib](https://matplotlib.org/) and somewhat less opinionated/constrained than [seaborn](https://seaborn.pydata.org/). When used with [Dash](https://dash.plotly.com/) it provides relatively simple ways to produce interactive plots for dashboards.

This notebook provides a quick introduction/cookbook/tutorial on producing plots using Plotly and uses it to plot some motion sensor data from the [UCI Machine Learning repository](https://archive.ics.uci.edu/ml/datasets/Activity+recognition+with+healthy+older+people+using+a+batteryless+wearable+sensor#).

## Load the data

First, we load the downloaded data into a [pandas](https://pandas.pydata.org/) dataframe and add some useful columns:

```python
from pathlib import Path

import pandas as pd

DATA_ROOT = Path('..') / 'data'

dfs = []
activity_labels = ['bed', 'chair', 'lying', 'ambulating']
default_names = ['time', 'front', 'vertical', 'lateral', 'sensor_id', 'rssi', 'phase', 'frequency', 'activity']
# Data files are named like d1p01M; the final character is the participant's gender (F or M).
for data_file in DATA_ROOT.rglob('d[12]p??[FM]'):
    df = pd.read_csv(data_file, names=default_names)
    # Activity codes start at 1, so subtract 1 to index the label list.
    df['activity_label'] = df['activity'].apply(lambda i: activity_labels[i - 1])
    df['gender_label'] = data_file.name[-1]
    df['participant'] = data_file.name

    # Add a column indicating the order of the activities for a participant.
    df = df.sort_values(by=['time'])
    df['activity_sequence'] = (df['activity'].shift(1) != df['activity']).cumsum()
    dfs.append(df)

sensor_df = pd.concat(dfs, axis='index')
sensor_df = sensor_df.sort_values(by=['participant', 'time'])

sensor_df.head()
```

|   | time | front | vertical | lateral | sensor_id | rssi | phase | frequency | activity | activity_label | gender_label | participant | activity_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.27203 | 1.0082 | -0.082102 | 1 | -63.5 | 2.4252 | 924.25 | 1 | bed | M | d1p01M | 1 |
| 1 | 0.5 | 0.27203 | 1.0082 | -0.082102 | 1 | -63.0 | 4.7369 | 921.75 | 1 | bed | M | d1p01M | 1 |
| 2 | 1.5 | 0.44791 | 0.91636 | -0.013684 | 1 | -63.5 | 3.0311 | 923.75 | 1 | bed | M | d1p01M | 1 |
| 3 | 1.75 | 0.44791 | 0.91636 | -0.013684 | 1 | -63.0 | 2.0371 | 921.25 | 1 | bed | M | d1p01M | 1 |
| 4 | 2.5 | 0.34238 | 0.96229 | -0.059296 | 1 | -63.5 | 5.892 | 920.25 | 1 | bed | M | d1p01M | 1 |
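The `activity_sequence` column uses a common pandas idiom for numbering consecutive runs of identical values. The sketch below applies it to a short, made-up series of activity codes (not taken from the dataset). It also shows `Series.map` with a dictionary as an alternative to `apply` for the label lookup:

```python
import pandas as pd

activity_labels = ['bed', 'chair', 'lying', 'ambulating']

# Made-up activity codes in time order for one participant.
activity = pd.Series([1, 1, 2, 2, 1, 3])

# True wherever a code differs from the previous row (the first row compares
# against NaN, so it is always True); the cumulative sum numbers each run.
activity_sequence = (activity.shift(1) != activity).cumsum()
print(activity_sequence.tolist())  # [1, 1, 2, 2, 3, 4]

# Alternative label lookup: map codes 1..4 to labels with a dictionary.
labels = activity.map(dict(enumerate(activity_labels, start=1)))
print(labels.tolist())  # ['bed', 'bed', 'chair', 'chair', 'bed', 'lying']
```

Note that a participant returning to an earlier activity (code 1 in the fifth row) starts a new run, which is exactly what the background rectangles need later.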


We will plot a single participant's data, so we select the rows for the first participant:

```python
participants = sensor_df['participant'].unique()
participant = participants[0]
participant_df = sensor_df[sensor_df['participant'] == participant]
participant_df.head()
```

|   | time | front | vertical | lateral | sensor_id | rssi | phase | frequency | activity | activity_label | gender_label | participant | activity_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.27203 | 1.0082 | -0.082102 | 1 | -63.5 | 2.4252 | 924.25 | 1 | bed | M | d1p01M | 1 |
| 1 | 0.5 | 0.27203 | 1.0082 | -0.082102 | 1 | -63.0 | 4.7369 | 921.75 | 1 | bed | M | d1p01M | 1 |
| 2 | 1.5 | 0.44791 | 0.91636 | -0.013684 | 1 | -63.5 | 3.0311 | 923.75 | 1 | bed | M | d1p01M | 1 |
| 3 | 1.75 | 0.44791 | 0.91636 | -0.013684 | 1 | -63.0 | 2.0371 | 921.25 | 1 | bed | M | d1p01M | 1 |
| 4 | 2.5 | 0.34238 | 0.96229 | -0.059296 | 1 | -63.5 | 5.892 | 920.25 | 1 | bed | M | d1p01M | 1 |
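If you later want to plot every participant rather than just the first, `groupby` produces the same subsets without repeated boolean indexing. Here is a minimal sketch using a tiny hypothetical stand-in for `sensor_df` (the values are invented, not from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for sensor_df: two participants, invented values.
sensor_df = pd.DataFrame({
    'participant': ['d1p01M', 'd1p01M', 'd2p01F'],
    'time': [0.0, 0.5, 0.0],
    'front': [0.27, 0.45, 0.10],
})

groups = sensor_df.groupby('participant')

# get_group returns the same rows as sensor_df[sensor_df['participant'] == 'd1p01M'].
participant_df = groups.get_group('d1p01M')

# Iterating yields (participant, sub-DataFrame) pairs, one per participant.
rows_per_participant = {name: len(group) for name, group in groups}
print(rows_per_participant)  # {'d1p01M': 2, 'd2p01F': 1}
```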


## Plotting data

Plotly comes with two APIs for producing plots: a higher-level API, `plotly.express`, which typically creates a complete figure in one command, and a lower-level API, `plotly.graph_objects`, which provides greater flexibility at the expense of some verbosity.

We will focus on the `graph_objects` API, as it makes it easier to produce plots with multiple elements (known as "traces") and to add, hide or customise legends.

Firstly, we import the `graph_objects` API.

```python
import plotly.graph_objects as go
```

Now we can plot the front sensor data for the participant. This involves:

- creating a figure using `go.Figure()`
- adding a trace to the figure using the `add_trace` method


```python
df = participant_df  # For brevity

fig = go.Figure()
fig.add_trace(go.Scatter(x=df['time'], y=df['front']))
fig.show()
```

The default figure is rather large, so we pass a `layout` dictionary to the `Figure` constructor to set the width and height of the figure and adjust its margins:

```python
layout = dict(width=400, height=300, margin=dict(l=0, r=0, t=32, b=0))
fig = go.Figure(layout=layout)
fig.add_trace(go.Scatter(x=df['time'], y=df['front']))
fig.show()
```

Note that we have used the `Scatter` class even though we are drawing line plots. The `mode` keyword argument controls whether a `Scatter` trace is drawn with lines, markers or both. The following example uses lines, markers, and lines with markers in the same plot. Plotly automatically adds an interactive legend when a figure contains more than one trace; clicking an entry shows or hides the corresponding trace. The `name` keyword argument of the `Scatter` constructor sets the label shown in the legend.

```python
fig = go.Figure(layout={**layout, 'width': 600})
fig.add_trace(go.Scatter(x=df['time'], y=df['front'], mode='lines', name='lines'))
fig.add_trace(go.Scatter(x=df['time'], y=df['vertical'], mode='markers', name='markers'))
fig.add_trace(go.Scatter(x=df['time'], y=df['lateral'], mode='lines+markers', name='lines+markers'))
fig.show()
```

We can save some keystrokes by using a for loop to plot the sensor data:

```python
fig = go.Figure(layout={**layout, 'width': 600})
for sensor in ['front', 'vertical', 'lateral']:
    fig.add_trace(go.Scatter(x=df['time'], y=df[sensor], mode='lines+markers', name=sensor))

fig.show()
```

We can now plot the sensor data. However, the data were collected during different activities, and we would like to show which activity was taking place at each point in time. To do this, we draw a rectangle in the background spanning each activity. Again, we can use the `Scatter` class, setting the `x` and `y` values to the corners of the rectangle:

```python
fig = go.Figure(layout={**layout, 'width': 600})
for _, activity_df in df.groupby('activity_sequence'):
    start = activity_df['time'].min()
    end = activity_df['time'].max()
    name = activity_df['activity_label'].unique()[0]
    fig.add_trace(
        go.Scatter(
            x=[start, start, end, end],
            y=[-2, 2, 2, -2],  # Chosen arbitrarily
            mode='lines',
            name=name))

fig.show()
```
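The start, end and label of each activity period can also be computed in one step with pandas named aggregation (available from pandas 0.25), instead of calling `min`/`max` inside the loop. This is a sketch on invented rows; the column names `start`, `end` and `label` are illustrative choices:

```python
import pandas as pd

# Invented rows for one participant, already sorted by time.
df = pd.DataFrame({
    'time': [0.0, 0.5, 1.0, 1.5, 2.0],
    'activity_label': ['bed', 'bed', 'ambulating', 'ambulating', 'bed'],
    'activity_sequence': [1, 1, 2, 2, 3],
})

# Named aggregation: each output column is an (input column, function) pair.
spans = df.groupby('activity_sequence').agg(
    start=('time', 'min'),
    end=('time', 'max'),
    label=('activity_label', 'first'),
)

# Rectangle x-coordinates for each period, in the order used by the Scatter traces.
corners = [[row.start, row.start, row.end, row.end] for row in spans.itertuples()]
print(corners[0])  # [0.0, 0.0, 0.5, 0.5]
```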

The rectangle plot has some problems: the rectangle outlines are not closed, and we don't actually want outlines at all, just filled rectangles. To fix this, we add `mode='none'` and `fill='toself'` to the `Scatter` constructor; `fill='toself'` connects the last point back to the first and fills the enclosed area.

In addition, the colors differ between occurrences of the same activity. We can define a dictionary mapping each activity to a color and pass the matching color as the `fillcolor` keyword argument of the `Scatter` constructor. We also set the `opacity` keyword argument so that the grid lines remain visible through the rectangles. For simplicity, we use a color palette conveniently defined in the `plotly.express` API:

```python
import plotly.express as px

colors = px.colors.qualitative.Pastel

activity_colors = {
    'bed': colors[0],
    'ambulating': colors[1],
    'lying': colors[2],
    'chair': colors[3],
}

fig = go.Figure(layout={**layout, 'width': 600})
for _, activity_df in df.groupby('activity_sequence'):
    start = activity_df['time'].min()
    end = activity_df['time'].max()
    name = activity_df['activity_label'].unique()[0]
    fig.add_trace(
        go.Scatter(
            x=[start, start, end, end],
            y=[-2, 2, 2, -2],
            mode='none',
            fill='toself',
            fillcolor=activity_colors[name],
            opacity=0.333,
            name=name))

fig.show()
```

Filling the rectangles is an improvement, but there are still multiple legend entries for the same activity. We can keep track of which activities have already been added to the legend using a `set`, and use the `showlegend` keyword argument of the `Scatter` constructor to decide whether a trace gets a legend entry. We also set the `legendgroup` keyword argument so that all rectangles for an activity are shown or hidden together when the corresponding legend entry is clicked.

```python
fig = go.Figure(layout={**layout, 'width': 600})

# Create the empty set
activity_legend = set()
for _, activity_df in df.groupby('activity_sequence'):
    start = activity_df['time'].min()
    end = activity_df['time'].max()
    name = activity_df['activity_label'].unique()[0]
    fig.add_trace(
        go.Scatter(
            x=[start, start, end, end],
            y=[-2, 2, 2, -2],
            mode='none',
            fill='toself',
            fillcolor=activity_colors[name],
            opacity=0.333,
            name=name,
            legendgroup=name,
            showlegend=name not in activity_legend))

    # Add the name to the set.
    activity_legend.add(name)

fig.show()
```
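The `set` bookkeeping can also be done up front with pandas. `Series.duplicated` is `False` for the first occurrence of each value, so its negation marks the rectangles that should receive a legend entry. Here is a sketch on a hypothetical sequence of labels:

```python
import pandas as pd

# Hypothetical activity labels, one per background rectangle, in time order.
labels = pd.Series(['bed', 'ambulating', 'bed', 'chair', 'ambulating'])

# duplicated() marks every repeat of an earlier value; negating it keeps
# only the first occurrence, i.e. the rectangles that need a legend entry.
show_in_legend = ~labels.duplicated()
print(show_in_legend.tolist())  # [True, True, False, True, False]
```

The resulting booleans could then be passed as `showlegend` while looping over the activity periods.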

Now, we can add the sensor data over the top of the activity rectangles:


```python
fig = go.Figure(layout={**layout, 'width': 900, 'height': 500})

colors = px.colors.qualitative.D3

sensor_colors = {
    'front': colors[0],
    'lateral': colors[1],
    'vertical': colors[2],
}

activity_legend = set()

for index, (_, activity_df) in enumerate(df.groupby('activity_sequence')):
    start = activity_df['time'].min()
    end = activity_df['time'].max()
    name = activity_df['activity_label'].unique()[0]
    fig.add_trace(
        go.Scatter(
            x=[start, start, end, end],
            y=[-2, 2, 2, -2],
            mode='none',
            fill='toself',
            fillcolor=activity_colors[name],
            opacity=0.333,
            name=name,
            # Rectangles for the same activity are shown/hidden together
            legendgroup=name,
            showlegend=name not in activity_legend))

    for sensor in ['front', 'lateral', 'vertical']:
        fig.add_trace(
            go.Scatter(
                x=activity_df['time'],
                y=activity_df[sensor],
                mode='lines+markers',
                line_color=sensor_colors[sensor],
                # Segments for the same sensor are shown/hidden together
                legendgroup=sensor,
                name=sensor,
                # Only add to the legend for the first activity period
                showlegend=index == 0))

    activity_legend.add(name)

fig.show()
```