# Data Visualization with Plotly Demo

## Introduction to Jupyter Notebook
Jupyter Notebook is a staple of any data scientist's toolkit. It is a free, open-source, interactive data science environment that can function as both an IDE and a visualisation tool. A Jupyter notebook is a single document in which you can run code, display the output, and add equations and explanations. Each notebook is an `.ipynb` file, a text file that describes the notebook's content in JSON format.
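
Because an `.ipynb` file is plain JSON, it can be inspected with Python's standard `json` module. The sketch below builds a minimal notebook structure in memory (the cell contents are made up for illustration) and counts its cells by type. A real file could be read the same way with `json.load`.

```python
import json
from collections import Counter

# A minimal, hand-written notebook structure (nbformat 4); contents are illustrative.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Serialise to the text that would be stored on disk, then parse it back.
raw_text = json.dumps(notebook, indent=1)
parsed = json.loads(raw_text)

# Count how many cells of each type the notebook contains.
cell_counts = Counter(cell["cell_type"] for cell in parsed["cells"])
print(dict(cell_counts))
```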

Each Jupyter notebook has a kernel, which can be thought of as a "computational engine" that executes the code within the notebook. Notebooks are made up of cells. For example, the text you are reading resides in the first cell of this notebook. A cell is either a markdown cell, which displays text in place, or a code cell. When a code cell is run, its output is displayed below the cell. The order in which cells are run matters! A cell that defines a function or variable has to be run before that function or variable can be used in another cell.
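
The effect of execution order can be imitated outside a notebook. The sketch below is only a rough simulation: it treats two strings as "cells" and runs them with `exec` against one shared dictionary standing in for the kernel's namespace. A real kernel is more involved, and the variable names are made up for illustration.

```python
# Simulate two notebook cells that share one kernel namespace.
kernel_namespace = {}

cell_defining_function = "def double(x):\n    return 2 * x"
cell_calling_function = "result = double(21)"

# Running the calling cell first fails: `double` is not defined yet.
try:
    exec(cell_calling_function, kernel_namespace)
    out_of_order_succeeded = True
except NameError:
    out_of_order_succeeded = False

# Running the cells in the right order works.
exec(cell_defining_function, kernel_namespace)
exec(cell_calling_function, kernel_namespace)
print(out_of_order_succeeded, kernel_namespace["result"])
```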

How to use a Jupyter Notebook:
- To run a cell, either click the arrow to the left of the cell or press `Ctrl + Enter` after selecting the cell. When a cell is run, a number appears in square brackets (e.g. [1]) showing the order in which the cells were run.
- To interrupt a cell while it is running, press the button with the black square in the toolbar at the top.
- To restart the kernel, open the `Kernel` menu and choose from the list of restart options available.


## Introduction to Plotly

Pandas is an open-source library providing data structures and data analysis tools for the Python language. Plotly is another open-source library that lets you put together high-quality graphs to facilitate the visualisation of data. Plotly Dash (written on top of Plotly.js and React.js) allows you to quickly build data apps that are rendered in the browser.

This notebook contains examples of how each of these libraries can be leveraged to analyse and visualise data. For more information, please check out the official documentation listed below.

#### Further Documentation
https://pandas.pydata.org/docs/ \
https://plot.ly/python/ \
https://dash.plotly.com/introduction 

## Setting Up

You can install the libraries using pip or conda. 

**N.B.** you may have to restart the kernel after installing these packages for your first run.

In [None]:
#!/bin/env python

# install packages
!pip install --user pandas
!pip install --user numpy
!pip install --user matplotlib
!pip install --user plotly
!pip install --user jupyter-dash
!pip install --user nbformat
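
After installing (and restarting the kernel if needed), one way to confirm that the packages are visible to the current environment is to look them up with the standard library's `importlib`. This is a minimal sketch; the helper name `missing_packages` is made up for illustration, and note that `jupyter-dash` is imported as `jupyter_dash`.

```python
import importlib.util

# Import names of the packages installed above (jupyter-dash imports as jupyter_dash).
required = ["pandas", "numpy", "matplotlib", "plotly", "jupyter_dash", "nbformat"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(required)
print("Missing packages:", missing or "none")
```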

Having installed the libraries, you can import them as follows.

In [None]:
# import libraries
%matplotlib inline

import requests
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output, State

# Set display row/column to show all data
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

## Access Data From Endpoint

#### Further Documentation
https://docs.python-requests.org/en/master/

**N.B.** The URL used in this example is from the demo project we have set up. Please replace it with your own URL.

In [None]:
# define endpoint url
url = "https://gee-team-test1.ew.r.appspot.com/api/text"

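# use requests library to send HTTP requests
# in this example, GET sentiment analysis data
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
data = response.json()  # parse the JSON body into Python objects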

# examine data
data
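
The dashboard at the end of this notebook reads `data[<id>]["text"]` and `data[<id>]["sentiment"]`, so the response is assumed to be a dictionary keyed by text ID. The sketch below uses a hypothetical payload of that shape (the IDs, texts and sentiment values are invented) to show how such a response could be flattened into rows.

```python
import json

# Hypothetical payload; the real endpoint's IDs, texts and sentiment values will differ.
sample_response_text = """
{
  "id-2": {"text": "I love this product", "sentiment": "positive"},
  "id-1": {"text": "The delivery was late", "sentiment": "negative"}
}
"""

sample_data = json.loads(sample_response_text)

# Flatten the nested dictionary into a list of rows, one per analysed text.
rows = [
    {"id": text_id, "text": entry["text"], "sentiment": entry["sentiment"]}
    for text_id, entry in sample_data.items()
]

for row in rows:
    print(f'{row["id"]}: {row["sentiment"]} ({row["text"]})')
```

A list of rows like this can be passed straight to `pd.DataFrame(rows)` if a table is more convenient.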

## Data Visualisation

Plotly is a commonly-used data visualisation library. The following examples will show you how to create different graphs from the sample data.

We can first read the sample data into a dataframe. The sample data is taken from the UK Met Office and shows the maximum and minimum temperature, the rainfall and the number of hours of sunlight for each month in 2018.

In [None]:
# read sample csv data into dataframe
weather = pd.read_csv('SampleData_Weather.csv')
weather

To gain more insight into a particular column, you can call the *describe()* method on that dataframe column.

In [None]:
# describe the monthly rainfall
rain_data = weather.Rain.describe()
print(rain_data)
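
`describe()` returns a pandas Series, so individual statistics can be picked out by label. A small sketch on hypothetical values (not taken from `SampleData_Weather.csv`):

```python
import pandas as pd

# Hypothetical rainfall values; not taken from SampleData_Weather.csv.
sample = pd.DataFrame({"Month": [1, 2, 3, 4], "Rain": [10.0, 20.0, 30.0, 40.0]})

summary = sample["Rain"].describe()

# Individual statistics are available by their index label.
print(summary["mean"])
print(summary["50%"])
```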

#### 1D Line Graph

In [None]:
# plot max temp. vs month

max_temp_fig = px.line(weather, x='Month', y='Tmax')
max_temp_fig.show()


# you can compare this to the following line of code, which uses pandas' Matplotlib-based plotting to draw the same data

# weather.plot.line(y='Tmax', x='Month')

We can plot multiple lines on the same graph and edit the layout to make it look more sophisticated. We can also update the dataframe with the average temperature and show that on the graph too.

In [None]:
# calculate ave. temp. and create a new column in the dataframe 
weather['Tmed'] = (weather['Tmax'] + weather['Tmin'])/2

# inspect the first 5 rows
weather.head()

In [None]:
# plot min, average and max temp. vs month
min_temp = go.Scatter(x=weather['Month'], y=weather['Tmin'], name='Min Temp')
med_temp = go.Scatter(x=weather['Month'], y=weather['Tmed'], name='Ave Temp')
max_temp = go.Scatter(x=weather['Month'], y=weather['Tmax'], name='Max Temp')

min_max_temp_fig = go.Figure()

min_max_temp_fig.add_trace(min_temp)
min_max_temp_fig.add_trace(med_temp)
min_max_temp_fig.add_trace(max_temp)

# edit the layout
min_max_temp_fig.update_layout(title="Temperature Distribution",
                               xaxis_title='Month',
                               yaxis_title='Temperature (Celsius)')

min_max_temp_fig.show()


#### Bar Chart

In [None]:
# plot rainfall vs month

rainfall_fig = px.bar(weather, x='Month', y='Rain')
rainfall_fig.update_layout(title="Rainfall Distribution",
                           xaxis_title="Month",
                           yaxis_title='Rain')
rainfall_fig.show()

# The following line of code achieves the same thing using Matplotlib

# weather.plot.bar(y='Rain', x='Month')

#### Histogram

Histograms are useful when you want to visualise the frequency distribution of the data.

In [None]:
rainfall_hist = px.histogram(weather, x='Rain', nbins=10) # you can specify the number of bins
rainfall_hist.update_layout(title="Frequency of Rainfall Amount",
                            bargap=0.1) # you can specify a gap between bars
rainfall_hist.show()
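
To make the idea of a frequency distribution concrete, the binning step can be sketched by hand. This is an illustration only: Plotly chooses its own bin edges, whereas the sketch uses an assumed fixed bin width and hypothetical rainfall values.

```python
from collections import Counter

# Hypothetical monthly rainfall values; not taken from the sample CSV.
rainfall = [12.5, 48.0, 51.2, 67.9, 70.1, 73.4, 88.8, 95.0]
bin_width = 20  # assumed fixed bin width for this sketch

# Map each value to the lower edge of its bin, then count values per bin.
bin_counts = Counter(int(value // bin_width) * bin_width for value in rainfall)

for lower_edge in sorted(bin_counts):
    print(f"[{lower_edge}, {lower_edge + bin_width}): {bin_counts[lower_edge]}")
```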

#### Multiple Charts

You can also create separate charts for each column of data. The following example shows separate line graphs of the four columns.

In [None]:
# multiple line charts 
rain = go.Scatter(x=weather['Month'], y=weather['Rain'], name="Rain")
sun = go.Scatter(x=weather['Month'], y=weather['Sun'], name="Sun")

subplots_fig = make_subplots(rows=2, cols=2,
                             subplot_titles=("Min Temp", "Max Temp", "Rain", "Sun"))

# use min_temp and max_temp plots from before
# subplot_titles are assigned row by row, so each trace is placed to match its title
subplots_fig.add_trace(min_temp, row=1, col=1)
subplots_fig.add_trace(max_temp, row=1, col=2)
subplots_fig.add_trace(rain, row=2, col=1)
subplots_fig.add_trace(sun, row=2, col=2)

subplots_fig.update_layout(height=600, width=800, title_text="Subplots Demo")

subplots_fig.show()
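
`make_subplots` assigns `subplot_titles` to the grid in row-major order (left to right, then top to bottom), while `add_trace` takes 1-based `row` and `col` arguments. The helper below, whose name is made up for this sketch, shows which grid position each title lands in, which is handy when checking that a trace sits under the right title.

```python
def title_positions(titles, cols):
    """Map each title to its 1-based (row, col) position in row-major order."""
    return {
        title: (index // cols + 1, index % cols + 1)
        for index, title in enumerate(titles)
    }


positions = title_positions(("Min Temp", "Max Temp", "Rain", "Sun"), cols=2)
for title, (row, col) in positions.items():
    print(f"{title}: row={row}, col={col}")
```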


## Introducing Jupyter Dash

Dash is Plotly's open-source Python framework for building full-stack analytic web applications in pure Python. The JupyterDash library makes these features available from a Jupyter notebook.

In [None]:
### Run ngrok to tunnel Dash app port 8050 to the outside world. 
### This command runs in the background.
get_ipython().system_raw('./ngrok http 8050 &')

In [None]:
# get ID of the most recent text (taken here as the first key in the response)
last_text_id = list(data.keys())[0]

app = JupyterDash(__name__)

app.layout = html.Div([
    html.H1("JupyterDash Demo"),
    
    
    # THESE LINES DISPLAY THE OUTPUT OF NLP API
    html.P("Most Recent Text ID: {}".format(last_text_id)),
    html.P("Text Analysed: {}".format(data[last_text_id]["text"])),
    html.P("Sentiment: {}".format(data[last_text_id]["sentiment"])),
  
    # THESE LINES DEMO ONE OF THE DASH CORE COMPONENTS (dcc), i.e. dcc.Input
    html.H3("Change the value in the text box to see callbacks in action!"),
    html.Div([
        "Input: ",
        dcc.Input(id='my-input', value='initial value', type='text')
    ]),
    html.Br(),
    html.Div(id='my-output'),
    
    # THESE LINES DEMO THE INTEGRATION OF PLOTLY GRAPHS WITH DASH
    dcc.Graph(figure=subplots_fig),

])


@app.callback(
    Output(component_id='my-output', component_property='children'),
    Input(component_id='my-input', component_property='value')
)
def update_output_div(input_value):
    return 'Output: {}'.format(input_value)
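
The logic inside a callback is ordinary Python, so it can be tried out without starting the server. A minimal sketch that repeats the function body with the Dash decorator left out:

```python
def update_output_div(input_value):
    # Same body as the callback above, with the Dash decorator left out.
    return 'Output: {}'.format(input_value)


print(update_output_div('initial value'))
```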


In [None]:
app.run_server(mode="external", port=8050)

#### If the cell below raises an error, please rerun it

In [None]:
### Get the public URL where you can access the Dash app. Copy this URL.
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

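The command above indexes `['tunnels'][0]`, which raises an `IndexError` when the list of tunnels is empty; that is one possible reason a rerun helps. The sketch below shows a more forgiving way to read the same JSON. The helper name and both sample payloads are hypothetical, shaped only after the fields used by the command above.

```python
import json


def first_public_url(tunnels_json):
    """Return the first tunnel's public URL, or None if no tunnel is listed."""
    tunnels = json.loads(tunnels_json).get("tunnels", [])
    return tunnels[0]["public_url"] if tunnels else None


# Hypothetical responses shaped like the one parsed by the command above.
ready = '{"tunnels": [{"public_url": "https://example.ngrok.io"}]}'
not_ready = '{"tunnels": []}'

print(first_public_url(ready))
print(first_public_url(not_ready))
```
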
**If you get an 'ERR_NGROK_6022' error when accessing the Dash app URL**
* Follow the steps to sign up for an ngrok account and copy the provided auth token
* Run the cell below, replacing the `<your-authtoken-here>` placeholder with your auth token, to add the token to your ngrok configuration
* Comment out the cell, as it only needs to be run once
* Wait a few minutes, restart the kernel and try accessing the Dash app URL again

In [None]:
# Comment out after running - only needs to be run once
get_ipython().system_raw('./ngrok config add-authtoken <your-authtoken-here>')