# Build an Interactive Data Analytics Dashboard with Python

### Instructor

* Teddy Petrou
* Dunder Data
* Author of multiple books and Python libraries

### Overview

Learn how to build interactive dashboards with the Dash Python library

1. [Launch the completed Dashboard](#Launch-the-Completed-Dashboard)
1. [Get introduced to the data](#Get-Introduced-to-the-Data)
1. [Visualizations with Plotly](#Visualizations-with-Plotly)
1. Creating Choropleth maps
1. Building the Dashboard with Dash
1. Adding interactivity to the Dashboard

### Assumptions

* You have at least intermediate knowledge of Python
* You understand the basics of pandas
* You know the basics of HTML/CSS

### Goal

* Provide a thorough walkthrough of the visualization/dashboarding of a complete data application
* No coverage of data preparation/modeling
* Be able to use code as a template for your data projects

### Data

* Coronavirus cases and deaths for all countries and states
* Source - [Johns Hopkins GitHub repository](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series)
* Data updates stopped as of March 9, 2023

## Launch the Completed Dashboard

These instructions are also available at - https://github.com/tdpetrou/Build-an-Interactive-Data-Analytics-Dashboard-with-Python-Oreilly

### Setting up your environment

1. Verify you have Python 3.9+
1. Download the course material from the GitHub repository
1. Create the virtual environment
1. Launch the dashboard

### Verify you have Python 3.9+

1. Open your terminal/command prompt
1. If you installed Anaconda or Miniconda
    1. You should have a **base** environment
    1. Confirm it is active by checking that **`(base)`** is prepended to your prompt
    1. Run `python --version` to output the version
    1. If you have Python 3.8 or lower, upgrade by doing the following:
        1. Run `conda create -n py310 python=3.10`
        1. Run `conda deactivate`
        1. Run `conda activate py310`
        1. Now the `python` command is mapped to Python 3.10
1. If you don't use Anaconda or Miniconda you will need to verify you have at least Python 3.9 and complete an upgrade if necessary on your own
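
As a complement to `python --version`, the check can also be run from inside Python. This is a minimal sketch; `MIN_VERSION` and `meets_minimum` are hypothetical names used only for illustration and are not part of the course material.

```python
import sys

# Minimum version required by the setup instructions above.
MIN_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Python", ".".join(map(str, sys.version_info[:3])))
print("OK" if meets_minimum() else "Upgrade needed")
```

Comparing tuples avoids the pitfalls of comparing version strings, where `"3.10"` would sort before `"3.9"`.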

### Clone the GitHub repository

1. Navigate to the [course page][1] and click on the green **code** button
1. Click on the **Download ZIP** link from the dropdown menu. If you know git, you can clone the repository
1. Unzip the contents and move the folder to a proper location in your file system (i.e. do not keep it in your downloads folder)

### Create the virtual environment

1. Using the `cd` command
    1. Navigate to the folder you just unzipped and moved from above
    1. Navigate into the `project` directory
1. Run the command `python -m venv dashboard_venv`. This creates a new virtual environment named **dashboard_venv**
1. Deactivate the conda environment with `conda deactivate`
1. Activate the virtual environment with the following command:
    1. Mac/Linux - `source dashboard_venv/bin/activate`
    2. Windows - `dashboard_venv\Scripts\activate.bat`
1. There should be `(dashboard_venv)` prepended to your prompt
1. Run `pip install -U pip` to upgrade pip to the latest version
1. Run `pip install -r requirements.txt` to install all the necessary packages into this environment. This will take some time to complete

### Launch the dashboard

1. Run the command `python dashboard.py`
1. The following text should be printed to the screen - **Dash is running on http://127.0.0.1:8050/**
1. Open your web browser and navigate to 127.0.0.1:8050
1. You should see the coronavirus forecasting dashboard


[1]: https://github.com/tdpetrou/Build-an-Interactive-Data-Analytics-Dashboard-with-Python-Oreilly

## Get Introduced to the Data

Definitions:

* **group** - either **world** or **usa**
* **area** - a country or US state

The `notebooks/data` folder contains:

* `all_data.csv` - one row per area per date. Contains both historical and predicted values
* `summary.csv` - total deaths/cases for each area. No predictions
* `population.csv` - population and code for each area

In [None]:
# notice that last date is later than current day
# this dataset has both historical and predicted data
import pandas as pd
df_all = pd.read_csv('data/all_data.csv', parse_dates=['date'])
df_all.tail()

In [None]:
# one row per area - total deaths/cases per area
df_summary = pd.read_csv('data/summary.csv', parse_dates=['date'])
df_summary.head()

In [None]:
# population and code for each area
pd.read_csv('data/population.csv').head()

### Select Texas data

In [None]:
df_texas = df_all.query('group == "usa" and area == "Texas"')
df_texas = df_texas.set_index('date')
df_texas.head()

### Get the last actual date and first prediction date

In [None]:
last_date = df_summary['date'].iloc[0]
first_pred_date = last_date + pd.Timedelta('1D')
last_date, first_pred_date

## Visualizations with Plotly

* Plotly - Python library that creates interactive data visualizations for the web
    * [Documentation][1]
    
### Plotly vs Dash

* Both are products of the company Plotly
* Both are free and open source with an enterprise version available for extra features and services
* Closely related but different purposes
* Plotly creates visualizations, producing independent HTML files (with JavaScript and CSS) that can be embedded on any page, including notebooks
* Dash creates the dashboards with tools such as data tables, tabs, dropdowns, radio buttons, and many more widgets. It also runs the application, allowing an interactive experience for the users. All graphs in a Dash application are created with the Plotly library.

We will build our application with Dash, but must learn Plotly first

### Introduction to Plotly

* Huge library
* Cover the fundamentals

[1]: https://plotly.com/python/
[2]: https://plotly.com/python/plotly-fundamentals/

### General steps to create a plotly graph

* Multiple ways to create graphs
* Will show one straightforward path
* The documentation suggests using Plotly Express
* We will NOT do this, as Plotly Express cannot create all graphs

The following three steps will be used to create our graphs:

1. Create Figure - with `go.Figure` or `make_subplots`
2. Add trace - with `fig.add_*`
3. Update layout - with `fig.update_layout` or `fig.update_*`

## Plotly Figure Object

* Create an empty figure by importing the `graph_objects` module

In [None]:
import plotly.graph_objects as go
fig = go.Figure()
fig

### Adding traces

* A **trace** is a type of plot (scatter, bar, pie, histogram, etc...)
* Use one of the `add_*` methods
* [Visit this reference page][1] to see all traces

### Adding a line

* No `add_line` method
* Must use `add_scatter` with `mode` set to one of the following:
    * `"lines"` - connect the points without showing the markers
    * `"markers"` - show just the markers
    * `"lines+markers"` - connect the points and show the markers
* Set `x` and `y` parameters

[1]: https://plotly.com/python/reference/index/

In [None]:
x = df_texas.index
y = df_texas['Deaths']
fig = go.Figure()
fig.add_scatter(x=x, y=y, mode="lines")

### Updating the layout

In plotly, the **layout** consists of the following graph properties plus several more:

* height
* width
* title
* xaxis/yaxis
* legend
* margin
* annotations

In [None]:
fig = go.Figure()
fig.add_scatter(x=x, y=y, mode="lines+markers")
fig.update_layout(height=400, 
                  width=800,
                  title="COVID-19 Deaths in Texas")

### Finding all of the layout properties

* The `update_layout` method does not show any of its properties in its docstrings
* View [the layout reference page][1] for a complete (very long) list of layout options
* Many properties are **nested**
* Use the Jupyter Notebook to find properties with `fig.layout` + **tab**


![2]

From here, choose one of the properties and press **shift + tab + tab** to reveal the docstrings. Below, the docstrings for the `title` property are shown.

![3]

[1]: https://plotly.com/python/reference/layout/
[2]: images/layout_props.png
[3]: images/layout_docs.png

### Explore properties

* Some are deeply nested (font)
* You can expand the pop-up documentation so that it stays docked at the bottom of the screen

### Update title

In [None]:
fig.update_layout(title={
    "text": "COVID-19 Deaths in Texas",
    "x": .5,
    "y": .85,
    "font": {
        "color": "blue",
        "family": "dejavu sans",
        "size": 25
    }
})

## Creating a figure with multiple traces

* Continue calling `fig.add_*` methods
* Each successive trace will have a new color
    * [Default qualitative color sequence][1]
* Use `name` parameter to set legend label

Split actual and predicted data into separate DataFrames

[1]: https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express

In [None]:
df_texas_actual = df_texas.loc[:last_date]
df_texas_pred = df_texas.loc[first_pred_date:]
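
Label-based slicing with `.loc` on a date index includes both endpoints, which is why the prediction slice starts one day after the last actual date. A small sketch with made-up data (all `demo_*` names are hypothetical):

```python
import pandas as pd

# Five consecutive days of made-up values.
demo_dates = pd.date_range("2021-01-01", periods=5, freq="D")
demo_df = pd.DataFrame({"Deaths": [1, 2, 3, 4, 5]}, index=demo_dates)

demo_last = pd.Timestamp("2021-01-03")
demo_first_pred = demo_last + pd.Timedelta("1D")

demo_actual = demo_df.loc[:demo_last]      # up to and including 2021-01-03
demo_pred = demo_df.loc[demo_first_pred:]  # from 2021-01-04 onward

print(len(demo_actual), len(demo_pred))
```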

Plot both actual and predicted lines in same plot

Plot both actual and predicted **bar** plots

## Creating subplots

* Use `make_subplots` to create a grid of plots
    * Set `rows` and `cols` parameters to integers
* In the `add_*` methods set `row` and `col` parameters to specific grid location

In [None]:
from plotly.subplots import make_subplots

### Cleaning up the subplots

* Use two for-loops to plot two traces on each graph
* Colors made to be the same for both subplots

In [None]:
from plotly.colors import qualitative
COLORS = qualitative.T10[:2]
KINDS = 'Deaths', 'Cases'
dfs = {'actual': df_texas_actual, 'prediction': df_texas_pred}

fig = make_subplots(rows=2, cols=1, vertical_spacing=0.1)
for row, kind in enumerate(KINDS, start=1):
    for (name, df), color in zip(dfs.items(), COLORS):
        fig.add_scatter(x=df.index, 
                        y=df[kind], 
                        mode="lines+markers", 
                        name=name,
                        line={"color": color},
                        row=row,
                        showlegend=row==1,
                        col=1)
    
fig.update_layout(title={"text": "Texas", 
                         "x": 0.5, 
                         "y": 0.97, 
                         "font": {"size": 20}})
fig

## Adding annotations

* Need to add titles to subplots
* Can do in `make_subplots` with `subplot_titles`, but with no control over placement
* Instead use `fig.add_annotation` or the `annotations` parameter in `fig.update_layout`
    * `annotations` takes a list of dictionaries
* `margin`
    * Space between the four edges and the figure
    * Default is 80 pixels for the left/right/bottom margins and 100 for the top
    * Decrease to fill out figure
* `fig.update_annotations`
    * updates all annotations

In [None]:
fig.update_layout(
            annotations=[
                {"y": 0.95, "text": "<b>Deaths</b>"},
                {"y": 0.3, "text": "<b>Cases</b>"},
            ],
            margin={"t": 40, "l": 50, "r": 10, "b": 0},
            legend={
                "x": 0.5, 
                "y": -0.05, 
                "xanchor": "center", 
                "orientation": "h", 
                "font": {"size": 15}},
        )
annot_props = {
        "x": 0.1,
        "xref": "paper",
        "yref": "paper",
        "xanchor": "left",
        "showarrow": False,
        "font": {"size": 18},
    }
fig.update_annotations(annot_props)
fig

## Experiment with layout changes

## Choropleth maps

* [Choropleth trace][1] - creates a variety of polygons (states and countries for our project) colored by the value of a numeric variable
* Use `add_choropleth`

[1]: https://plotly.com/python/reference/choropleth/

In [None]:
fig = go.Figure()
fig.add_choropleth()

### Coloring countries by deaths

* Use the summary table to get countries with a population of at least 1 million (the `population` column is in millions)

In [None]:
df_world = df_summary.query("group == 'world' and population > 1")
df_world.head(3)

* Each country has a [standardized ISO-3 code][1] that plotly understands
* Assign these codes and the deaths column as their own variables

[1]: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3

In [None]:
locations = df_world['code']
z = df_world['Deaths']

### Create choropleth with coronavirus info

* Set:
    * `locations` - list of ISO codes
    * `z` - number of deaths
    * `zmin` - min number for scale
    * `colorscale` - set to a [continuous scale][1]
    

[1]: https://plotly.com/python/builtin-colorscales/

In [None]:
fig = go.Figure()
fig.add_choropleth(locations=locations, z=z, zmin=0, colorscale="orrd")
fig.update_layout(margin={"t": 0, "l": 10, "r": 10, "b": 0})

### Selecting a better range and projection

* Don't show most northern/southern areas
* Choose a different [projection][1]. The "robinson" projection is used below, but feel free to experiment with others. The latitude and longitude range and the projection are all set through the `geo` parameter in `update_layout`.

[1]: https://plotly.com/python/map-configuration/#map-projections

In [None]:
fig = go.Figure()
fig.add_choropleth(locations=locations, 
                   z=z, 
                   zmin=0, 
                   colorscale="orrd",  
                   marker_line_width=0.5)
fig.update_layout(
    geo={
        "showframe": False,
        "lataxis": {"range": [-37, 68]},
        "lonaxis": {"range": [-130, 150]},
        "projection": {"type": "robinson"}
    },
    margin={"t": 0, "l": 10, "r": 10, "b": 0})

### Customizing the hover text

* Add all statistics to hover
* Create Series with info

In [None]:
def hover_text(x):
    name = x["area"]
    deaths = x["Deaths"]
    cases = x["Cases"]
    deathsm = x["Deaths per Million"]
    casesm = x["Cases per Million"]
    pop = x["population"]
    return (
        f"<b>{name}</b><br>"
        f"Deaths - {deaths:,.0f}<br>"
        f"Cases - {cases:,.0f}<br>"
        f"Deaths per Million - {deathsm:,.0f}<br>"
        f"Cases per Million - {casesm:,.0f}<br>"
        f"Population - {pop:,.0f}M"
    )

text = df_world.apply(hover_text, axis=1)
text.head()
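
The `:,.0f` format specification used in `hover_text` rounds to zero decimal places and inserts thousands separators. A small sketch with made-up numbers (the `demo_*` names are hypothetical):

```python
# Made-up values for illustration only.
demo_deaths = 1234567.4
demo_pop = 29.1  # population in millions

# "," adds thousands separators and ".0f" rounds to zero decimals.
demo_line = f"Deaths - {demo_deaths:,.0f}<br>Population - {demo_pop:,.0f}M"
print(demo_line)  # Deaths - 1,234,567<br>Population - 29M
```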

* Set `text` parameter
* Must set `hoverinfo` to `'text'` to show only this text on hover

In [None]:
fig = go.Figure()
fig.add_choropleth(locations=locations, z=z, zmin=0, colorscale="orrd", 
                   marker_line_width=0.5, text=text, 
                   hoverinfo="text"
                  )
fig.update_layout(
    geo={
        "showframe": False,
        "lataxis": {"range": [-37, 68]},
        "lonaxis": {"range": [-130, 150]},
        "projection": {"type": "robinson"}
    },
    margin={"t": 0, "l": 10, "r": 10, "b": 0})

### USA Choropleth

* Set `locationmode` to `"USA-states"`
* Set `projection` to "albers usa" which moves Alaska and Hawaii near the other 48 states
* Colored by "Cases per Million".

In [None]:
df_states = df_summary.query("group == 'usa'")
locations = df_states['code']
z = df_states['Cases per Million']
text = df_states.apply(hover_text, axis=1)

fig = go.Figure()
fig.add_choropleth(locations=locations, locationmode='USA-states', z=z, zmin=0, 
                   colorscale="orrd", marker_line_width=0.5, text=text, hoverinfo="text")
fig.update_layout(
    geo={
        "showframe": False,
        "projection": {"type": "albers usa"}
    },
    margin={"t": 0, "l": 10, "r": 10, "b": 0})

## Plotly Summary

Plotly is a great tool for creating interactive data visualizations for the web. The three main steps for creating a visualization are:

1. Create Figure - with `go.Figure` or `make_subplots`
2. Add trace - with `fig.add_*`
3. Update layout - with `fig.update_layout` or `fig.update_*`

### Traces

* A trace is plotly terminology for a "kind of plot" (scatter, bar, pie, box, choropleth, etc...)
* Find the trace you want on [the left side of this page][1]
    * Or type `fig.add_` and press tab
* Read documentation for a specific trace once selected e.g. `fig.add_scatter` -> shift + tab + tab
* Add as many traces as you want to one figure

### Layout

* The layout is where properties such as height, width, title, xaxis/yaxis, legend, annotations, etc... are set
* Use `fig.update_layout` to set properties for entire figure
* Documentation does NOT show parameters with `fig.update_layout`
    * Discover them with `fig.layout.` + tab
    * Read documentation on specific property `fig.layout.title` -> shift + tab + tab
    
### Subplots

* Create grid of subplots with `make_subplots` using `rows` and `cols`
* All trace methods, `fig.add_*`, have `row` and `col` to specify subplot
* Use `fig.update_layout` to change properties on entire figure
* Other `fig.update_*` methods exist that have `row` and `col` parameters to change specific subplot

### Choropleth

* Colored polygons (countries and states for our project)
* Some properties are in `fig.add_choropleth`, others are in `fig.update_layout` using `geo` parameter
* Set `locations` to be code (ISO-3 for countries and two-character abbreviation for states)
* Set `locationmode` to be "USA-states" for USA
* Set projection and range (`lataxis`/`lonaxis`) for world
* Set projection to be "albers usa" for usa

[1]: https://plotly.com/python/reference/index/