# Interactive Plots of COVID-19 Data
This notebook explores COVID-19 data interactively using [Jupyter](https://jupyter.org/) and [hvPlot](https://hvplot.holoviz.org/). We currently focus on data from the US but may expand the analysis in the near future.

## Load Johns Hopkins COVID-19 Data
Here we load the COVID-19 confirmed case data from the [Center for Systems Science and Engineering (CSSE)](https://systems.jhu.edu) at Johns Hopkins University. The CSSE COVID-19 [GitHub Repo](https://github.com/CSSEGISandData/COVID-19) has more information about these data and their sources.

In [34]:
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 1000)
import hvplot.pandas  # registers the .hvplot accessor used by the plotting cells below

In [35]:
dr='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/'

In [57]:
src = dr + 'time_series_covid19_confirmed_global.csv'

In [58]:
src2 = dr + 'time_series_covid19_deaths_global.csv'

In [59]:
src3 = dr + 'time_series_covid19_recovered_global.csv'

In [60]:
df = pd.read_csv(src)
df.rename(columns={'Country/Region': 'country', 'Province/State': 'state',
                   'Lat': 'lat', 'Long': 'lon'}, inplace = True)
df = df[(df.country=='US') & (df.state!='Diamond Princess') & 
        (df.state!='Grand Princess')].reset_index(drop=True)
df.columns = df.columns[0:4].append(pd.to_datetime(df.columns[4:]))

In [61]:
df

Unnamed: 0,state,country,lat,lon,2020-01-22 00:00:00,2020-01-23 00:00:00,2020-01-24 00:00:00,2020-01-25 00:00:00,2020-01-26 00:00:00,2020-01-27 00:00:00,...,2020-03-17 00:00:00,2020-03-18 00:00:00,2020-03-19 00:00:00,2020-03-20 00:00:00,2020-03-21 00:00:00,2020-03-22 00:00:00,2020-03-23 00:00:00,2020-03-24 00:00:00,2020-03-25 00:00:00,2020-03-26 00:00:00
0,,US,37.0902,-95.7129,1,1,2,2,5,5,...,6421,7783,13677,19100,25489,33276,43847,53740,65778,83836


In [40]:
df2 = pd.read_csv(src2)
df2.rename(columns={'Country/Region': 'country', 'Province/State': 'state',
                   'Lat': 'lat', 'Long': 'lon'}, inplace = True)
df2 = df2[(df2.country=='US') & (df2.state!='Diamond Princess') & 
        (df2.state!='Grand Princess')].reset_index(drop=True)
df2.columns = df2.columns[0:4].append(pd.to_datetime(df2.columns[4:]))

In [41]:
df3 = pd.read_csv(src3)
df3.rename(columns={'Country/Region': 'country', 'Province/State': 'state',
                   'Lat': 'lat', 'Long': 'lon'}, inplace = True)
df3 = df3[(df3.country=='US') & (df3.state!='Diamond Princess') & 
        (df3.state!='Grand Princess')].reset_index(drop=True)
df3.columns = df3.columns[0:4].append(pd.to_datetime(df3.columns[4:]))
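
The three loading cells above repeat the same rename, filter, and date-parsing steps. A minimal sketch of factoring them into one helper follows. `load_us` is a hypothetical name, and the inline `sample_csv` holds made-up rows in the CSSE column layout so that the example runs without network access.

```python
import io
import pandas as pd

RENAMES = {'Country/Region': 'country', 'Province/State': 'state',
           'Lat': 'lat', 'Long': 'lon'}
CRUISE_SHIPS = ['Diamond Princess', 'Grand Princess']

def load_us(source):
    """Read a CSSE-style time series and keep US rows, excluding cruise ships."""
    frame = pd.read_csv(source).rename(columns=RENAMES)
    keep = (frame.country == 'US') & ~frame.state.isin(CRUISE_SHIPS)
    frame = frame[keep].reset_index(drop=True)
    # The first four columns are metadata; the remaining labels are dates.
    frame.columns = frame.columns[:4].append(pd.to_datetime(frame.columns[4:]))
    return frame

# Made-up rows for illustration only (not real case counts).
sample_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",US,37.0,-95.0,1,2\n"
    "Diamond Princess,US,35.0,139.0,0,0\n"
    ",Italy,43.0,12.0,0,3\n"
)
us = load_us(sample_csv)
print(us)
```

With the URLs defined earlier, `load_us(src)`, `load_us(src2)`, and `load_us(src3)` could stand in for the three cells.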

In [42]:
df

Unnamed: 0,state,country,lat,lon,2020-01-22 00:00:00,2020-01-23 00:00:00,2020-01-24 00:00:00,2020-01-25 00:00:00,2020-01-26 00:00:00,2020-01-27 00:00:00,...,2020-03-17 00:00:00,2020-03-18 00:00:00,2020-03-19 00:00:00,2020-03-20 00:00:00,2020-03-21 00:00:00,2020-03-22 00:00:00,2020-03-23 00:00:00,2020-03-24 00:00:00,2020-03-25 00:00:00,2020-03-26 00:00:00
0,,US,37.0902,-95.7129,1,1,2,2,5,5,...,6421,7783,13677,19100,25489,33276,43847,53740,65778,83836


In [43]:
df['country'] = df.apply(lambda x: (x.country,x.state), axis=1)
df

Unnamed: 0,state,country,lat,lon,2020-01-22 00:00:00,2020-01-23 00:00:00,2020-01-24 00:00:00,2020-01-25 00:00:00,2020-01-26 00:00:00,2020-01-27 00:00:00,...,2020-03-17 00:00:00,2020-03-18 00:00:00,2020-03-19 00:00:00,2020-03-20 00:00:00,2020-03-21 00:00:00,2020-03-22 00:00:00,2020-03-23 00:00:00,2020-03-24 00:00:00,2020-03-25 00:00:00,2020-03-26 00:00:00
0,,"(US, nan)",37.0902,-95.7129,1,1,2,2,5,5,...,6421,7783,13677,19100,25489,33276,43847,53740,65778,83836


In [44]:
dfsub = df.loc[:, 'country':]  # keep 'country' and every column to its right
dfsub

Unnamed: 0,country,lat,lon,2020-01-22 00:00:00,2020-01-23 00:00:00,2020-01-24 00:00:00,2020-01-25 00:00:00,2020-01-26 00:00:00,2020-01-27 00:00:00,2020-01-28 00:00:00,...,2020-03-17 00:00:00,2020-03-18 00:00:00,2020-03-19 00:00:00,2020-03-20 00:00:00,2020-03-21 00:00:00,2020-03-22 00:00:00,2020-03-23 00:00:00,2020-03-24 00:00:00,2020-03-25 00:00:00,2020-03-26 00:00:00
0,"(US, nan)",37.0902,-95.7129,1,1,2,2,5,5,5,...,6421,7783,13677,19100,25489,33276,43847,53740,65778,83836


In [45]:
dfm=pd.melt(dfsub, id_vars=dfsub.columns.values[0:3], var_name="Date", value_name="Value")
dfm

Unnamed: 0,country,lat,lon,Date,Value
0,"(US, nan)",37.0902,-95.7129,2020-01-22,1
1,"(US, nan)",37.0902,-95.7129,2020-01-23,1
2,"(US, nan)",37.0902,-95.7129,2020-01-24,2
3,"(US, nan)",37.0902,-95.7129,2020-01-25,2
4,"(US, nan)",37.0902,-95.7129,2020-01-26,5
5,"(US, nan)",37.0902,-95.7129,2020-01-27,5
6,"(US, nan)",37.0902,-95.7129,2020-01-28,5
7,"(US, nan)",37.0902,-95.7129,2020-01-29,5
8,"(US, nan)",37.0902,-95.7129,2020-01-30,5
9,"(US, nan)",37.0902,-95.7129,2020-01-31,7
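
`pd.melt` reshapes the wide table (one column per date) into a long one (one row per region and date). The sketch below shows the same call on a tiny frame of made-up numbers; the names `wide` and `tidy` are only for illustration.

```python
import pandas as pd

# Made-up wide table: one row per region, one column per date.
wide = pd.DataFrame({
    'country': ['A', 'B'],
    'lat': [10.0, 20.0],
    'lon': [30.0, 40.0],
    pd.Timestamp('2020-01-22'): [1, 0],
    pd.Timestamp('2020-01-23'): [2, 5],
})

# id_vars stay as columns; every other column becomes a (Date, Value) row.
tidy = pd.melt(wide, id_vars=['country', 'lat', 'lon'],
               var_name='Date', value_name='Value')
print(tidy)
```

Each region now contributes one row per date, which is the layout written to `covid.csv` further down.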


In [46]:
dfm.rename(columns = {'country':'id'}, inplace = True)
dfm

Unnamed: 0,id,lat,lon,Date,Value
0,"(US, nan)",37.0902,-95.7129,2020-01-22,1
1,"(US, nan)",37.0902,-95.7129,2020-01-23,1
2,"(US, nan)",37.0902,-95.7129,2020-01-24,2
3,"(US, nan)",37.0902,-95.7129,2020-01-25,2
4,"(US, nan)",37.0902,-95.7129,2020-01-26,5
5,"(US, nan)",37.0902,-95.7129,2020-01-27,5
6,"(US, nan)",37.0902,-95.7129,2020-01-28,5
7,"(US, nan)",37.0902,-95.7129,2020-01-29,5
8,"(US, nan)",37.0902,-95.7129,2020-01-30,5
9,"(US, nan)",37.0902,-95.7129,2020-01-31,7


In [47]:
dfm.to_csv('covid.csv', index=False)


In [None]:
state = df.state.str.split(',').apply(lambda x: x[-1].strip())
county = df.state.str.split(',').apply(lambda x: x[0].strip())
county[~df.state.str.contains(',')] = None

In [None]:
df.state = state
df.insert(0, 'county', county)
df.head()

## Plot All US Cases on Log Scale
Below is a quick plot of all confirmed cases in the US on a logarithmic scale. 

hvPlot creates HoloViews objects, and the `*` operator [overlays](http://holoviews.org/reference/containers/bokeh/Overlay.html) them on shared axes. See [HoloViews plot customization](http://holoviews.org/user_guide/Customizing_Plots.html) for available options.
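
A logarithmic axis suits these data because exponential growth appears as a straight line, and the slope of that line gives the doubling time. As a rough sketch (not part of the original analysis), the snippet below fits a line to log2 of the ten most recent US totals shown in the table above. `doubling_time_days` is a hypothetical helper name.

```python
import numpy as np

def doubling_time_days(counts):
    """Estimate doubling time from the least-squares slope of log2(counts) per day."""
    counts = np.asarray(counts, dtype=float)
    days = np.arange(len(counts))
    slope, _intercept = np.polyfit(days, np.log2(counts), 1)
    return 1.0 / slope

# US confirmed cases, 2020-03-17 through 2020-03-26, copied from the table above.
recent = [6421, 7783, 13677, 19100, 25489, 33276, 43847, 53740, 65778, 83836]
print(f'Estimated doubling time: {doubling_time_days(recent):.1f} days')
```

The estimate assumes one observation per day and steady exponential growth over the window, so it is only a coarse summary.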

In [None]:
opts = {'legend':True, 'logy': True, 'grid': True, 'width': 700, 'height': 400,
        'title': 'Confirmed cases of COVID-19 in the USA', 'padding':0.1 }
s = df.select_dtypes(np.integer).sum()  # np.int was removed in NumPy 1.24
s.name = 'USA'
lines = s.hvplot(**opts) 
dots  = s.hvplot.scatter(**opts)
usa = lines * dots
usa

In [None]:
df = pd.read_csv(src)
df.rename(columns={'Country/Region': 'country', 'Province/State': 'state',
                   'Lat': 'lat', 'Long': 'lon'}, inplace = True)
df.head()

In [None]:
df2 = pd.read_csv(src2)
df2.rename(columns={'Country/Region': 'country', 'Province/State': 'state',
                   'Lat': 'lat', 'Long': 'lon'}, inplace = True)
df2.head()

In [None]:
df3 = pd.read_csv(src3)
df3.rename(columns={'Country/Region': 'country', 'Province/State': 'state',
                   'Lat': 'lat', 'Long': 'lon'}, inplace = True)
df3.head()

In [None]:
def country(name='US'):
    # The CSSE files label the United States as 'US', so that is the default.
    conf = df[(df.country==name)]
    death = df2[(df2.country==name)]
    reco = df3[(df3.country==name)]
    opts = {'legend': True, 'logy': True, 'grid': True, 'width':950, 'height': 300,
        'title': f'Cases of COVID-19 in {name}', 'padding':0.1, 'xticks':10,
        'ylim':(1.0,1.0e3)}
    # np.int was removed in NumPy 1.24; np.integer selects the integer count columns.
    s = conf.select_dtypes(np.integer).sum()
    s2 = death.select_dtypes(np.integer).sum()
    s3 = reco.select_dtypes(np.integer).sum()
    s.name = name + ' conf'
    s2.name = name + ' death'
    s3.name = name + ' reco'
    linec = s.hvplot(**opts)
    lined = s2.hvplot(**opts)
    liner = s3.hvplot(**opts)

    return linec, lined, liner

In [None]:
usa = country(name='US')
china = country(name='China')
italy = country(name='Italy')
turkey = country(name='Turkey')
japan = country(name='Japan')

In [None]:
(china[0] * china[1] * china[2]).opts(title_format='Cases of COVID-19', ylim=(1.0,1.0e5), legend_position='top_left')

In [None]:
(italy[0] * italy[1] * italy[2]).opts(title_format='Cases of COVID-19', ylim=(1.0,1.0e5))

In [None]:
(usa[0] * usa[1] * usa[2]).opts(title_format='Cases of COVID-19', ylim=(1.0,1.0e5), legend_position='top_left')

In [None]:
(usa[0] * china[0] * italy[0] * turkey[0] * japan[0]).opts(title_format='Cases of COVID-19', ylim=(1.0,1.0e5), legend_position='top_left')

In [None]:
(usa[1] * china[1] * italy[1] * turkey[1] * japan[1]).opts(title_format='Deaths from COVID-19', ylim=(1.0,1.0e4), legend_position='top_left')

## Single State Example
Here is an example of plotting data from a single US state.

In [None]:
MA = df[(df.state=='MA') | (df.state=='Massachusetts')]

In [None]:
opts = {'legend': False, 'logy': True, 'grid': True, 'width': 700, 'height': 400,
        'title': 'Confirmed cases of COVID-19 in Massachusetts', 'padding':0.1,
        'ylim':(1.0,1.0e3)}
s = MA.select_dtypes(np.integer).sum()  # np.int was removed in NumPy 1.24
lines = s.hvplot(**opts)
dots = s.hvplot.scatter(**opts)
lines * dots

## Multiple Region Example

Here we turn the single-state code above into a function so that several states are easier to explore.

In [None]:
def state(name='Massachusetts', code='MA'):
    rows = df[(df.state==name) | (df.state==code)]  # avoid shadowing the function name
    opts = {'legend': True, 'logy': True, 'grid': True, 'width': 700, 'height': 400,
        'title': f'Confirmed cases of COVID-19 in {code}', 'padding':0.1,
        'ylim':(1.0,1.0e3)}
    s = rows.select_dtypes(np.integer).sum()  # np.int was removed in NumPy 1.24
    s.name = code
    lines = s.hvplot(**opts)
    dots = s.hvplot.scatter(**opts)
    hstate = lines * dots
    return hstate
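
One possible variation, sketched here under assumptions, separates the aggregation from the plotting so that several regions can be collected in a loop. `region_totals` and the small `demo` frame are hypothetical and hold made-up numbers; the name/code matching mirrors the `state` function above.

```python
import numpy as np
import pandas as pd

def region_totals(frame, regions):
    """Return one column of summed daily counts per (name, code) pair."""
    totals = {}
    for name, code in regions:
        rows = frame[(frame.state == name) | (frame.state == code)]
        # Keep only integer count columns; lat/lon are floats and are excluded.
        totals[code] = rows.select_dtypes(np.integer).sum()
    return pd.DataFrame(totals)

# Made-up data for illustration only.
demo = pd.DataFrame({
    'state': ['Massachusetts', 'MA', 'Louisiana'],
    'lat': [42.0, 42.3, 31.0],
    'lon': [-71.0, -71.1, -92.0],
    '3/25/20': [10, 1, 7],
    '3/26/20': [20, 2, 9],
})
table = region_totals(demo, [('Massachusetts', 'MA'), ('Louisiana', 'LA')])
print(table)
```

With hvPlot imported, a call such as `table.hvplot(logy=True)` should then draw one line per column.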

In [None]:
ma = state(name='Massachusetts', code='MA')

In [None]:
la = state(name='Louisiana', code='LA')

In [None]:
pa = state(name='Pennsylvania', code='PA')

In [None]:
# `usa` is now the tuple returned by country(), so overlay its confirmed-cases curve.
(ma * la * pa * usa[0]).opts(title_format='Confirmed cases of COVID-19', ylim=(1.0,1.0e4))