# Visualizing Covid-19 data using plotly and ipywidgets

In this notebook, we'll explore Covid-19 data posted daily on the ECDC website. We'll use plotly to generate maps and charts to help us make sense of the spread of the virus throughout the world, and illustrate the capabilities of ipywidgets to make these charts interactive and enhance their flexibility.

## 0. Imports
Below are the libraries used in this project. I've included ```requirements.txt``` and ```environment.yml``` files in the same folder for reproducibility.

In [1]:
import math
import requests

import ipywidgets as widgets
from ipywidgets import interactive_output, HBox, Layout, interact_manual
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio

In [2]:
pio.renderers.default = "iframe"

## 1. Getting the data

### 1.1 ECDC daily data

The European Centre for Disease Prevention and Control (ECDC for short) has a wealth of resources on their [website](https://www.ecdc.europa.eu/en), including a file containing Covid-19 cases and deaths split by country and reported date that is, as of the time of writing, updated daily and posted [here](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide). We'll use this as the basis for our dataset, and add some information via other pages. We'll start by writing a helper function to retrieve the data using ```requests```, read it and return a clean DataFrame using ```pandas```. If you're not familiar with these libraries, I highly recommend you peruse the docs and learn how to use them, as they're ubiquitous in the data science world. You can find the documentation for ```requests``` [here](https://requests.readthedocs.io/en/master/) and ```pandas``` [here](https://pandas.pydata.org/docs/).

In [3]:
import io

def read_latest_ecdc_file(file_link=None, file_date=None, max_consecutive_dates=5, walk_back=True):
    """
    Returns a DataFrame of the latest Covid-19 numbers by country as posted daily on the ECDC website.
    
    Args:
        file_link (str): Download link prefix for the daily .xlsx file; the date and extension are appended to it. If file_link is None, the function defaults to the last known location as a convenience.
        file_date (Union[str, datetime.date, pd.Timestamp]): Starting date for file download tries. If file_date is None, the function defaults to today.
        max_consecutive_dates (int): Maximum number of dates to try, including the starting date, if the starting date's file is not available.
        walk_back (bool): Direction in which to move through dates. If walk_back is True, the function decrements the date until a valid link is found or max_consecutive_dates is reached.
            Otherwise, the function increments the date.
        
    Returns:
        pd.DataFrame, or None if no file could be retrieved.
    """
    if file_link is None:
        file_link = "https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-"
    
    if file_date is None:
        file_date = pd.Timestamp.today()
    else:
        file_date = pd.Timestamp(file_date)
    
    latest_link = file_link + "{:%Y-%m-%d}.xlsx".format(file_date)
    try:
        resp = requests.get(latest_link, timeout=30)
    except requests.exceptions.RequestException as exc:
        # Covers malformed URLs, connection errors and timeouts
        print("Request to {} failed: {}".format(latest_link, exc))
        return

    while resp.status_code != 200 and max_consecutive_dates > 1:
        print("File retrieval failed for {:%Y-%m-%d}".format(file_date))
        if walk_back:
            file_date -= pd.Timedelta("1d")
        else:
            file_date += pd.Timedelta("1d")
        max_consecutive_dates -= 1
        latest_link = file_link + "{:%Y-%m-%d}.xlsx".format(file_date)
        resp = requests.get(latest_link, timeout=30)
    
    if resp.status_code != 200:
        print("File retrieval failed for {:%Y-%m-%d}.".format(file_date))
        print("Maximum number of consecutive dates reached. Please check if URL is correct or expand date window using max_consecutive_dates argument.")
        return
    
    print("Latest file date: {:%Y-%m-%d}.".format(file_date))
    
    # Wrap the raw bytes in a file-like object; recent pandas versions deprecate passing bytes to read_excel directly
    df = pd.read_excel(io.BytesIO(resp.content))
    df.columns = ["date_rep", "day", "month", "year", "cases", "deaths", "country", "alpha_2_code", "alpha_3_code", "population_2018", "ecdc_continent"]
    
    return df
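The date-walking retry above boils down to generating a list of dated URLs and keeping the first one that answers with HTTP 200. A minimal, network-free sketch of that logic, assuming hypothetical helpers `candidate_links` and `first_available` (not part of the notebook) and a made-up `example.org` prefix; the fetch function is passed in so the logic can be exercised with a fake response:

```python
import datetime as dt
from collections import namedtuple

def candidate_links(base_link, start_date, max_consecutive_dates=5, walk_back=True):
    """List the dated URLs to try, in order, starting from start_date (hypothetical helper)."""
    step = dt.timedelta(days=-1 if walk_back else 1)
    # At least one try, mirroring read_latest_ecdc_file
    return [
        "{}{:%Y-%m-%d}.xlsx".format(base_link, start_date + i * step)
        for i in range(max(1, max_consecutive_dates))
    ]

def first_available(links, fetch):
    """Return (link, response) for the first link whose fetch gives HTTP 200, else (None, None)."""
    for link in links:
        resp = fetch(link)
        if resp.status_code == 200:
            return link, resp
    return None, None

# Offline demo: a fake fetch that only "finds" the file dated 2020-04-20
FakeResponse = namedtuple("FakeResponse", ["status_code"])

def fake_fetch(url):
    return FakeResponse(200 if url.endswith("2020-04-20.xlsx") else 404)

links = candidate_links("https://example.org/data-", dt.date(2020, 4, 21), max_consecutive_dates=3)
print(links)
print(first_available(links, fake_fetch))
```

With a real connection, the fetch argument could be something like `lambda url: requests.get(url, timeout=30)`. Injecting it keeps the retry logic testable without touching the network.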

In [4]:
df_ecdc = read_latest_ecdc_file()
df_ecdc.head()

Latest file date: 2020-04-21.


| | date_rep | day | month | year | cases | deaths | country | alpha_2_code | alpha_3_code | population_2018 | ecdc_continent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-21 | 21 | 4 | 2020 | 35 | 2 | Afghanistan | AF | AFG | 37172386.0 | Asia |
| 1 | 2020-04-20 | 20 | 4 | 2020 | 88 | 3 | Afghanistan | AF | AFG | 37172386.0 | Asia |
| 2 | 2020-04-19 | 19 | 4 | 2020 | 63 | 0 | Afghanistan | AF | AFG | 37172386.0 | Asia |
| 3 | 2020-04-18 | 18 | 4 | 2020 | 51 | 1 | Afghanistan | AF | AFG | 37172386.0 | Asia |
| 4 | 2020-04-17 | 17 | 4 | 2020 | 10 | 4 | Afghanistan | AF | AFG | 37172386.0 | Asia |


### 1.2 Wikipedia ISO country codes

We'll use the Wikipedia page of all [ISO country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) to make sure our data includes all possible countries and the correct codes. Again, we'll write a helper function to retrieve this data and return a clean DataFrame to easily generate daily updates. Instead of using ```requests``` as we did previously, we'll leverage the ```read_html``` function included with ```pandas``` to easily retrieve tables from webpages. If you're not familiar with this function, read the official documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html).

In [5]:
def read_wiki_iso_country_codes(page_link=None, table_index=0, matching_length=True):
    """
    Returns a DataFrame of country codes from Wikipedia.
    
    Args:
        page_link (str): Link to the Wikipedia page of ISO country codes.
        table_index (int): Index of the table within the HTML elements.
        matching_length (bool): If True, only keeps ISO codes that respect string length requirements for both columns.
        
    Returns:
        pd.DataFrame
    """
    # Get HTML tables
    if page_link is None:
        tables = pd.read_html(r"https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes")
    else:
        tables = pd.read_html(page_link)
    
    # Load & format
    df = tables[table_index]
    df.columns = df.columns.droplevel()
    df.columns = [c.split("[")[0].strip().lower().replace(" ", "_").replace("-", "_") for c in df.columns]
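    # Overwrite the alpha-2 code of the first row (Afghanistan) with "AF"; .loc avoids chained assignment, which may silently not modify df
    df.loc[df.index[0], "alpha_2_code"] = "AF"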
    
    # Keep only rows whose alpha_3_code is exactly 3 characters long
    if matching_length:
        len_before = df.shape[0]
        df = df[df["alpha_3_code"].str.len() == 3].copy()
        len_after = df.shape[0]
        print("{:,.0f} entries dropped from 3-letter ISO codes.".format(len_before - len_after))
        
    return df
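The header clean-up in `read_wiki_iso_country_codes` is plain string manipulation, so it can be checked on its own. A sketch that factors it into a hypothetical `clean_column_name` helper; the sample headers are illustrative and may not match the live Wikipedia table exactly:

```python
def clean_column_name(name):
    """Normalize a Wikipedia table header into a snake_case column name (hypothetical helper)."""
    # Drop footnote markers such as "[5]", then trim whitespace
    base = name.split("[")[0].strip()
    # Lower-case and turn spaces and hyphens into underscores
    return base.lower().replace(" ", "_").replace("-", "_")

headers = ["Country name[5]", "Alpha-2 code [5]", "Alpha-3 code", "Internet ccTLD"]
print([clean_column_name(h) for h in headers])
# ['country_name', 'alpha_2_code', 'alpha_3_code', 'internet_cctld']
```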

In [6]:
df_wiki = read_wiki_iso_country_codes()
df_wiki.head()

31 entries dropped from 3-letter ISO codes.


| | country_name | official_state_name | sovereignty | alpha_2_code | alpha_3_code | numeric_code | subdivision_code_links | internet_cctld |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | The Islamic Republic of Afghanistan | UN member state | AF | AFG | 4 | ISO 3166-2:AF | .af |
| 2 | Åland Islands | Åland | Finland | AX | ALA | 248 | ISO 3166-2:AX | .ax |
| 3 | Albania | The Republic of Albania | UN member state | AL | ALB | 8 | ISO 3166-2:AL | .al |
| 4 | Algeria | The People's Democratic Republic of Algeria | UN member state | DZ | DZA | 12 | ISO 3166-2:DZ | .dz |
| 5 | American Samoa | The Territory of American Samoa | United States | AS | ASM | 16 | ISO 3166-2:AS | .as |


### 1.3 Continents & regions

Finally, we'll also retrieve a table of regions/continents that will come in handy when grouping or filtering our data when we generate charts later. I found a table with the information I needed [here](http://statisticstimes.com/geography/countries-by-continents.php) by googling and built a helper function to retrieve the information in the unlikely case of an update, but any page you find will do as long as it's easily mergeable with the rest of the data (in this case, all tables have 3-letter ISO country codes, which will enable us to easily create a unified data set). Again, since the information is available directly on the webpage in a tabular form, we'll use ```pd.read_html``` to retrieve it.
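Before relying on three-letter codes as a join key, it can be worth checking which codes fail to match. A minimal sketch with made-up rows: `merge(..., how="left", indicator=True)` adds a `_merge` column, and rows marked `left_only` have no counterpart in the regions table. The unmatched code here is `XKX`, a user-assigned code commonly used for Kosovo, which has no official ISO 3166 code:

```python
import pandas as pd

# Toy stand-ins for the ECDC-derived data and the regions table (made-up values)
cases = pd.DataFrame({"alpha_3_code": ["AFG", "FRA", "XKX"], "cases": [35, 2000, 10]})
regions = pd.DataFrame({
    "iso_alpha3_code": ["AFG", "FRA"],
    "continent": ["Asia", "Europe"],
})

merged = cases.merge(regions, how="left", left_on="alpha_3_code",
                     right_on="iso_alpha3_code", indicator=True)

# Rows tagged "left_only" found no match in the regions table
unmatched = merged.loc[merged["_merge"] == "left_only", "alpha_3_code"].tolist()
print(unmatched)  # ['XKX']
```

Unmatched rows keep a missing `continent`, which is the kind of gap the notebook later patches by hand for a few countries.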

In [7]:
def read_regions_continents_table(page_link=None, table_index=2):
    """
    Returns a DataFrame of region and continent classification for each country.
    
    Args:
        page_link (str): Link to the webpage.
        table_index (int): Index of the table within the HTML elements.
        
    Returns:
        pd.DataFrame
    """
    # Get HTML tables
    if page_link is None:
        tables = pd.read_html(r"http://statisticstimes.com/geography/countries-by-continents.php")
    else:
        tables = pd.read_html(page_link)
    
    # Load & format
    df = tables[table_index]
    df.drop(columns=["No"], inplace=True)
    df.columns = [c.strip().lower().replace(" ", "_").replace("-", "_") for c in df.columns]
    
    return df

In [8]:
df_geo_agg = read_regions_continents_table()
df_geo_agg.head()

| | country_or_area | iso_alpha3_code | m49_code | region_1 | region_2 | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 4 | Southern Asia | | Asia |
| 1 | Åland Islands | ALA | 248 | Northern Europe | | Europe |
| 2 | Albania | ALB | 8 | Southern Europe | | Europe |
| 3 | Algeria | DZA | 12 | Northern Africa | | Africa |
| 4 | American Samoa | ASM | 16 | Polynesia | | Oceania |


### 1.4 Full data set

Now that we have all the pieces, we can merge the data and generate a full data set to use in our visualizations. Note that ```plotly.express``` works best with tidy (long-form) data, where each row is one observation and each column one variable (if you're not familiar with the term, more on that [here](https://www.jeannicholashould.com/tidy-data-in-python.html)), so we need to be careful about how we shape our data. If you've used ```matplotlib```/```seaborn``` (or the built-in plotting capabilities of ```pandas```, which rely on ```matplotlib```), you might be used to these libraries doing some of the data heavy lifting for you; none of that here. As we'll see, however, a little extra work on the data is worth it to leverage ```plotly```'s capabilities.
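To make "tidy" concrete, a small sketch with made-up numbers: a wide table with one column per date is reshaped with `DataFrame.melt` into one row per country per date. That long shape is what lets `plotly.express` map whole columns to chart properties, for example `color="country"` or `animation_frame="date_rep"`:

```python
import pandas as pd

# Wide form: one column per date (hypothetical numbers)
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2020-04-20": [10, 5],
    "2020-04-21": [12, 7],
})

# Tidy (long) form: one row per country per date, one column per variable
tidy = wide.melt(id_vars="country", var_name="date_rep", value_name="cum_cases")
print(tidy)
```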

In this step, we'll also compute some new indicators: cumulative cases and deaths, mortality rate (cumulative deaths over cumulative cases), growth rates, and the fraction of the population infected or deceased. These are all easy to compute from the ECDC data.
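Written out, the indicators amount to the following sketch, derived from the code in `create_full_dataset`. Here $n_{c,t}$ and $m_{c,t}$ are the new cases and new deaths reported by country $c$ on date $t$, and $P_c$ is its 2018 population:

$$C_{c,t} = \sum_{s \le t} n_{c,s}, \qquad D_{c,t} = \sum_{s \le t} m_{c,s}$$

$$\text{mortality rate}_{c,t} = \frac{D_{c,t}}{C_{c,t}}, \qquad \text{fraction infected}_{c,t} = \frac{C_{c,t}}{P_c}, \qquad \text{fraction deaths}_{c,t} = \frac{D_{c,t}}{P_c}$$

$$\text{infections growth rate}_{c,t} = \frac{C_{c,t} - C_{c,t-1}}{C_{c,t-1}} = \frac{C_{c,t}}{C_{c,t-1}} - 1$$

The deaths growth rate is defined the same way on $D_{c,t}$. In the code, the mortality rate is set to 0 when $C_{c,t} = D_{c,t} = 0$; a growth rate is missing on a country's first date and when $C_{c,t-1} = C_{c,t} = 0$, and is capped at 1 when the count jumps from zero. The two population fractions are stored as fractions, not percentages.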

In [9]:
def create_full_dataset(df_ecdc=None, df_wiki=None, df_geo_agg=None):
    """
    Returns a clean dataset built by assembling ECDC data, Wikipedia ISO country data and regional classifications. Ensures all countries are included and span all dates for compatibility with Plotly. 
    Adds additional metrics per country: cumulative cases, cumulative deaths, mortality rate, fraction of population infected, fraction of population deceased, and growth rates of cumulative cases and deaths.
    
    Args:
        df_ecdc (pd.DataFrame): DataFrame from raw data published daily by the ECDC.
        df_wiki (pd.DataFrame): DataFrame of table of country codes from Wikipedia.
        df_geo_agg (pd.DataFrame): DataFrame of regional and continental classification for each country.
        
    Returns:
        pd.DataFrame
    """
    # Safety check
    if any(x is None for x in [df_ecdc, df_wiki, df_geo_agg]):
        print("One or more DataFrame is missing from the arguments, please check.")
        return
    
    # Format return DataFrame
    df = df_ecdc.copy()
    df.drop(columns=["day", "month", "year"], inplace=True)
    df["date_rep"] = df["date_rep"].astype(str)
    df["cum_cases"] = None
    df["cum_deaths"] = None
    df.dropna(subset=["alpha_3_code"], inplace=True)
    
    # Get list of all dates covered and missing countries
    all_dates = pd.date_range(df["date_rep"].min(), df["date_rep"].max()).astype(str).tolist()
    missing_countries = list(set(df_wiki["alpha_3_code"]).difference(set(df_ecdc["alpha_3_code"])))
    frames = []
    
    # Fill in dates for countries included in the ECDC DataFrame
    for alpha_3_code in df["alpha_3_code"].unique():
        try:
            df_country = df[df["alpha_3_code"] == alpha_3_code].copy()
            # Additional dates
            dates_to_add = list(set(all_dates).difference(set(df_country["date_rep"])))
            df_to_add = pd.DataFrame({"date_rep": dates_to_add})
            df_to_add["cases"] = 0
            df_to_add["deaths"] = 0
            df_to_add["country"] = df_country["country"].iloc[0]
            df_to_add["alpha_2_code"] = df_country["alpha_2_code"].iloc[0]
            df_to_add["alpha_3_code"] = alpha_3_code
            df_to_add["population_2018"] = df_country["population_2018"].iloc[0]
            # Concatenate both
            df_temp = pd.concat([df_country, df_to_add], ignore_index=True)
            df_temp.sort_values(by="date_rep", inplace=True)
            # Add cumulative counts
            df_temp["cum_cases"] = df_temp["cases"].cumsum()
            df_temp["cum_deaths"] = df_temp["deaths"].cumsum()
            frames.append(df_temp)
        except Exception as exc:
            print("Issue encountered while adding dates to country code {}: {}".format(alpha_3_code, exc))
        
    # Fill in missing countries
    for alpha_3_code in missing_countries:
        df_missing = pd.DataFrame({"date_rep": all_dates})
        df_missing["cases"] = 0
        df_missing["deaths"] = 0
        df_missing["country"] = df_wiki.loc[df_wiki["alpha_3_code"] == alpha_3_code, "country_name"].iloc[0]
        df_missing["alpha_2_code"] = df_wiki.loc[df_wiki["alpha_3_code"] == alpha_3_code, "alpha_2_code"].iloc[0]
        df_missing["alpha_3_code"] = alpha_3_code
        df_missing["population_2018"] = np.nan
        df_missing["cum_cases"] = 0
        df_missing["cum_deaths"] = 0
        frames.append(df_missing)
    
    # Create full DataFrame and order by date
    df = pd.concat(frames, ignore_index=True)
    df.sort_values(by=["country", "date_rep"], inplace=True)
    # Add indicators
    df["mortality_rate"] = (df["cum_deaths"] / df["cum_cases"]).fillna(0)
    df["fraction_infected"] = df["cum_cases"] / df["population_2018"]
    df["fraction_deaths"] = df["cum_deaths"] / df["population_2018"]
    # Growth rates are computed within each country: the first date of every country is NaN
    # and no value is carried across the boundary between two countries
    df["infections_growth_rate"] = df.groupby("alpha_3_code")["cum_cases"].pct_change()
    df.loc[df["infections_growth_rate"] == math.inf, "infections_growth_rate"] = 1
    df["deaths_growth_rate"] = df.groupby("alpha_3_code")["cum_deaths"].pct_change()
    df.loc[df["deaths_growth_rate"] == math.inf, "deaths_growth_rate"] = 1
    # Merge region and continent classification
    df = df.merge(df_geo_agg[["iso_alpha3_code", "region_1", "region_2", "continent"]], how="left", left_on="alpha_3_code", right_on="iso_alpha3_code").drop(columns=["iso_alpha3_code"])
    # Clean
    df.loc[df["country"] == "Kosovo", "continent"] = "Europe"
    df.loc[df["country"] == "Taiwan", "continent"] = "Asia"
    df.loc[df["country"] == "Bonaire", "continent"] = "South America"
    df["country"] = df["country"].str.replace("_", " ")
    
    return df
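Because the full DataFrame stacks all countries one after the other, a plain `pct_change` on a whole column compares the last row of one country with the first row of the next unless those boundary rows are reset. A small sketch on made-up numbers contrasts that with a per-country `groupby(...).pct_change()`, which starts every country at NaN; replacing infinite values (a jump from zero) with 1 mirrors the convention used above:

```python
import numpy as np
import pandas as pd

# Two countries stacked one after the other, sorted by country then date (made-up values)
df = pd.DataFrame({
    "alpha_3_code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "cum_cases": [0, 2, 4, 10, 10, 15],
})

# Naive: pct_change runs straight across the AAA/BBB boundary (row 3 compares 10 with 4)
naive = df["cum_cases"].pct_change()

# Per country: each group starts with NaN, so nothing leaks across countries
per_country = df.groupby("alpha_3_code")["cum_cases"].pct_change()

# A jump from 0 to a positive count gives inf; cap it at 1
per_country = per_country.replace(np.inf, 1)
print(pd.DataFrame({"naive": naive, "per_country": per_country}))
```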

In [10]:
df_clean = create_full_dataset(df_ecdc, df_wiki, df_geo_agg)
df_clean.head()

| | date_rep | cases | deaths | country | alpha_2_code | alpha_3_code | population_2018 | ecdc_continent | cum_cases | cum_deaths | mortality_rate | fraction_infected | fraction_deaths | infections_growth_rate | deaths_growth_rate | region_1 | region_2 | continent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-12-31 | 0 | 0 | Afghanistan | AF | AFG | 37172386.0 | Asia | 0 | 0 | 0.0 | 0.0 | 0.0 | | | Southern Asia | | Asia |
| 1 | 2020-01-01 | 0 | 0 | Afghanistan | AF | AFG | 37172386.0 | Asia | 0 | 0 | 0.0 | 0.0 | 0.0 | | | Southern Asia | | Asia |
| 2 | 2020-01-02 | 0 | 0 | Afghanistan | AF | AFG | 37172386.0 | Asia | 0 | 0 | 0.0 | 0.0 | 0.0 | | | Southern Asia | | Asia |
| 3 | 2020-01-03 | 0 | 0 | Afghanistan | AF | AFG | 37172386.0 | Asia | 0 | 0 | 0.0 | 0.0 | 0.0 | | | Southern Asia | | Asia |
| 4 | 2020-01-04 | 0 | 0 | Afghanistan | AF | AFG | 37172386.0 | Asia | 0 | 0 | 0.0 | 0.0 | 0.0 | | | Southern Asia | | Asia |


## 2. Visualizing static data with plotly

Now that our data set is ready, we can start exploring it visually using ```plotly```. If you're not familiar with it, ```plotly``` is an open-source plotting library built on top of the plotly.js JavaScript library. It creates interactive visualizations that you can display in Jupyter notebooks, save to standalone HTML files, or serve as part of Python web applications using the Dash framework. More specifically, I'll be using ```plotly.express```, a high-level interface to ```plotly``` that lets us generate complex charts concisely, much as ```seaborn``` does on top of ```matplotlib```. You can find the documentation for ```plotly.express``` [here](https://plotly.com/python-api-reference/plotly.express.html) and a quick tutorial [here](https://plotly.com/python/px-arguments/). Plotly also has many [tutorials and examples](https://plotly.com/python/) on its website; these go beyond ```plotly.express```, so make sure to check them out if you want to learn more about what the library offers.

If you're having any issues generating the charts below, make sure to run through the [Jupyter notebook/lab setup instructions](https://plotly.com/python/getting-started/).

### 2.1 Heatmap

Plotly makes it very easy to generate maps in a few lines of Python code. Let's try it out by using ```plotly.express``` to generate a heat map of various metrics from our dataset. A map where each region is shaded according to a value is called a choropleth, and ```plotly.express``` provides the ```choropleth``` function for it. You can have a look at the documentation for it [here](https://plotly.com/python-api-reference/generated/plotly.express.choropleth.html#plotly.express.choropleth). If you're looking for different types of maps, check out Plotly's [maps page](https://plotly.com/python/maps/), which covers various examples.

In [11]:
def generate_heatmap(df=df_clean.copy(), scope="world", plot_date=None, metric="cum_cases"):
    """
    Generates a heat map from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): data to be visualized.
        scope (str): geographical scope of the map. Can be one of 'world', 'europe', 'asia', 'africa', 'north america', 'south america'. Defaults to 'world'.
        plot_date (str, datetime.date or pd.Timestamp): date to plot. Defaults to latest.
        metric (str): metric to plot. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
    
    # Default args
    if plot_date is None:
        plot_date = df["date_rep"].max()
    else:
        plot_date = "{:%Y-%m-%d}".format(pd.Timestamp(plot_date))
        
    # Checks: normalize case before validating, so that e.g. "Europe" is accepted
    scope = scope.lower()
    metric = metric.lower()
    
    if scope not in ["world", "europe", "asia", "africa", "north america", "south america"]:
        print("Value of scope not recognized ({}). Check function signature for supported values.".format(scope))
        return
    
    if metric not in title_legend_dict.keys():
        print("Value of metric not recognized ({}). Check function signature for supported values.".format(metric))
        return
    
    # Filter df to match scope & date
    if scope != "world":
        df = df[df["continent"] == scope.title()].copy()
    
    df = df[df["date_rep"] == plot_date].copy()
    
    if plot_date not in df["date_rep"].unique():
        print("Plot date {:%Y-%m-%d} not within available dates".format(pd.Timestamp(plot_date)))
        return
    
    # Generate figure
    fig = px.choropleth(df, locations="alpha_3_code", color=metric, hover_name="country", range_color=[0, df[metric].max()], color_continuous_scale=px.colors.sequential.Reds, scope=scope)
    fig.update_geos(showcountries=True, countrycolor="black")
    
    # Format legend
    fig.update_layout(coloraxis_colorbar=dict(title=title_legend_dict[metric].capitalize(), thicknessmode="pixels", thickness=25, lenmode="pixels", len=397, yanchor="middle", y=.5, ticks="outside"))
    
    # Format figure
    fig.update_layout(margin=dict(r=0, t=0, l=0, b=0), width=900, height=700, title_text="<b>Covid-19: {} by country on {:%d %b %Y}</b>".format(title_legend_dict[metric], pd.Timestamp(plot_date)), 
                      title_y=.99, title_x=.5, title_xanchor="center")
    
    pio.show(fig)

In [12]:
generate_heatmap()

Right out of the box, you can see that Plotly offers some interactive components like hover values and zoom. The ```choropleth``` object also makes it easy to personalize hover values, change colors or even automatically focus on one region. Let's try out a continent view using our helper function.

In [13]:
generate_heatmap(scope="europe")

### 2.2 Scatter map

In much the same way as our heat map above, we can generate a scatter map with minimal changes using the ```scatter_geo``` function from ```plotly.express``` (documentation [here](https://plotly.com/python-api-reference/generated/plotly.express.scatter_geo.html#plotly.express.scatter_geo)). The syntax is mostly the same, except that the metric now drives marker size (```size```) instead of fill color (```color```).
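Because the map functions repeat the same argument checks, one option is to factor them into a small shared function. A minimal sketch with a hypothetical `normalize_args` helper that is not part of the notebook: it lower-cases `scope` and `metric` before validating them, and it raises `ValueError` instead of printing, a design choice you may or may not want in a notebook:

```python
SUPPORTED_SCOPES = ("world", "europe", "asia", "africa", "north america", "south america")

def normalize_args(scope, metric, supported_metrics):
    """Lower-case and validate scope/metric before any filtering (hypothetical helper).

    Returns (scope, metric) or raises ValueError.
    """
    scope, metric = scope.lower(), metric.lower()
    if scope not in SUPPORTED_SCOPES:
        raise ValueError("Unsupported scope: {!r}".format(scope))
    if metric not in supported_metrics:
        raise ValueError("Unsupported metric: {!r}".format(metric))
    return scope, metric

def continent_label(scope):
    # The data stores continents in title case, e.g. "North America"
    return scope.title()

print(normalize_args("Europe", "CUM_CASES", {"cum_cases", "cum_deaths"}))
print(continent_label("north america"))
```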

In [14]:
def generate_scatter_map(df=df_clean.copy(), scope="world", plot_date=None, metric="cum_cases"):
    """
    Generates a scatter map from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): data to be visualized.
        scope (str): geographical scope of the map. Can be one of 'world', 'europe', 'asia', 'africa', 'north america', 'south america'. Defaults to 'world'.
        plot_date (str, datetime.date or pd.Timestamp): date to plot. Defaults to latest.
        metric (str): metric to plot. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
    
    # Default date
    if plot_date is None:
        plot_date = df["date_rep"].max()
    else:
        plot_date = "{:%Y-%m-%d}".format(pd.Timestamp(plot_date))
        
    # Checks: normalize case before validating, so that e.g. "Europe" is accepted
    scope = scope.lower()
    metric = metric.lower()
    
    if scope not in ["world", "europe", "asia", "africa", "north america", "south america"]:
        print("Value of scope not recognized ({}). Check function signature for supported values.".format(scope))
        return
    
    if metric not in title_legend_dict.keys():
        print("Value of metric not recognized ({}). Check function signature for supported values.".format(metric))
        return
    
    # Filter df to match scope & date
    if scope != "world":
        df = df[df["continent"] == scope.title()].copy()
    
    df = df[df["date_rep"] == plot_date].copy()
    
    if plot_date not in df["date_rep"].unique():
        print("Plot date {:%Y-%m-%d} not within available dates".format(pd.Timestamp(plot_date)))
        return
    
    # Generate figure
    fig = px.scatter_geo(df, locations="alpha_3_code", hover_name="country", size=metric, size_max=60, color_discrete_sequence=[px.colors.sequential.Reds[-2]], scope=scope, opacity=.7)
    fig.update_geos(showcountries=True, countrycolor="black")
    
    # Format figure
    fig.update_layout(margin=dict(r=0, t=0, l=0, b=0), width=900, height=700, title_text="<b>Covid-19: {} by country on {:%d %b %Y}</b>".format(title_legend_dict[metric], pd.Timestamp(plot_date)), 
                      title_y=.99, title_x=.5, title_xanchor="center")
    
    pio.show(fig)

In [15]:
generate_scatter_map()

In [16]:
generate_scatter_map(scope="asia")

## 3. Animating charts with plotly

To get a better idea of how the virus spread, we can animate our maps with plotly. Both ```choropleth``` and ```scatter_geo```, like most ```plotly.express``` functions, accept an ```animation_frame``` argument that makes it easy to explore how a data set changes over time. As noted earlier, this only works if your data is tidy: here, one row per country per date.

### 3.1 Animated heatmap

Since our data is already tidy and ```plotly.express``` supports animations out of the box, modifying our function to play our data through time will be as easy as adding the ```animation_frame``` parameter to our chart.

In [17]:
def generate_animated_heatmap(df=df_clean.copy(), scope="world", metric="cum_cases"):
    """
    Generates an animated heat map from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): data to be visualized.
        scope (str): geographical scope of the map. Can be one of 'world', 'europe', 'asia', 'africa', 'north america', 'south america'. Defaults to 'world'.
        metric (str): metric to plot. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
        
    # Checks: normalize case before validating, so that e.g. "Europe" is accepted
    scope = scope.lower()
    metric = metric.lower()
    
    if scope not in ["world", "europe", "asia", "africa", "north america", "south america"]:
        print("Value of scope not recognized ({}). Check function signature for supported values.".format(scope))
        return
    
    if metric not in title_legend_dict.keys():
        print("Value of metric not recognized ({}). Check function signature for supported values.".format(metric))
        return
    
    # Filter df to match scope
    if scope != "world":
        df = df[df["continent"] == scope.title()].copy()
    
    # Generate figure
    fig = px.choropleth(df, locations="alpha_3_code", color=metric, hover_name="country", range_color=[0, df[metric].max()], color_continuous_scale=px.colors.sequential.Reds, scope=scope, animation_frame="date_rep")
    fig.update_geos(showcountries=True, countrycolor="black")
    
    # Format legend
    fig.update_layout(coloraxis_colorbar=dict(title=title_legend_dict[metric].capitalize(), thicknessmode="pixels", thickness=25, lenmode="pixels", len=397, yanchor="middle", y=.5, ticks="outside"))
    
    # Format figure
    fig.update_layout(margin=dict(r=0, t=0, l=0, b=0), width=900, height=700, title_text="<b>Covid-19: {} by country</b>".format(title_legend_dict[metric]), 
                      title_y=.99, title_x=.5, title_xanchor="center")
    
    pio.show(fig)

In [18]:
generate_animated_heatmap()

### 3.2 Animated scatter map

In [19]:
def generate_animated_scatter_map(df=df_clean.copy(), scope="world", metric="cum_cases"):
    """
    Generates an animated scatter map from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): data to be visualized.
        scope (str): geographical scope of the map. Can be one of 'world', 'europe', 'asia', 'africa', 'north america', 'south america'. Defaults to 'world'.
        metric (str): metric to plot. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
        
    # Checks: normalize case before validating, so that e.g. "Europe" is accepted
    scope = scope.lower()
    metric = metric.lower()
    
    if scope not in ["world", "europe", "asia", "africa", "north america", "south america"]:
        print("Value of scope not recognized ({}). Check function signature for supported values.".format(scope))
        return
    
    if metric not in title_legend_dict.keys():
        print("Value of metric not recognized ({}). Check function signature for supported values.".format(metric))
        return
    
    # Filter df to match scope
    if scope != "world":
        df = df[df["continent"] == scope.title()].copy()
    
    # Generate figure
    fig = px.scatter_geo(df, locations="alpha_3_code", hover_name="country", size=metric, size_max=60, color_discrete_sequence=[px.colors.sequential.Reds[-2]], scope=scope, animation_frame="date_rep")
    fig.update_geos(showcountries=True, countrycolor="black")
    
    # Format figure
    fig.update_layout(margin=dict(r=0, t=0, l=0, b=0), width=900, height=700, title_text="<b>Covid-19: {} by country</b>".format(title_legend_dict[metric]), 
                      title_y=.99, title_x=.5, title_xanchor="center")
    
    pio.show(fig)

In [20]:
generate_animated_scatter_map(scope="south america")

## 4. Adding more interactivity with ipywidgets

It's possible to generate a variety of controls using ```plotly``` directly, such as [custom buttons](https://plotly.com/python/custom-buttons/), [sliders](https://plotly.com/python/sliders/), [dropdowns](https://plotly.com/python/dropdowns/) or [range sliders/selectors](https://plotly.com/python/range-slider/). However, I've found that these tend to get pretty verbose, and I was looking for something with a more concise syntax. If you're coding in a Jupyter notebook/lab environment and want to add such controls, [ipywidgets](https://ipywidgets.readthedocs.io/en/latest/user_install.html) turns out to be a great resource. We'll leverage ```ipywidgets``` to add more control to our chart and unify both functions seen previously into one figure that we can update through dropdown menus.

We'll use ipywidgets' ```interactive_output``` to display both our plotly animated map and some custom dropdowns that let us control it, for example by changing the chart type, displayed metric or region, without writing a new function call for each chart.
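A stripped-down sketch of how `interactive_output` wires dropdowns to a function: it calls the function once with the current widget values, then again every time one of the widgets' `value` changes, and routes the function's output into the returned `Output` widget. The hypothetical `show_map` below only records its arguments instead of drawing a map, so the mechanism is visible without a notebook frontend:

```python
import ipywidgets as widgets
from ipywidgets import HBox, interactive_output

calls = []

def show_map(scope, metric):
    # Hypothetical stand-in for a plotting function: record the arguments it receives
    calls.append((scope, metric))

scope_dd = widgets.Dropdown(options=[("World", "world"), ("Europe", "europe")],
                            value="world", description="Scope")
metric_dd = widgets.Dropdown(options=[("Cumulative cases", "cum_cases"), ("New cases", "cases")],
                             value="cum_cases", description="Metric")

# Keys of the dict are the keyword arguments passed to show_map
out = interactive_output(show_map, {"scope": scope_dd, "metric": metric_dd})
ui = HBox([scope_dd, metric_dd])

# Simulate a user picking "Europe" in the scope dropdown
scope_dd.value = "europe"
print(calls)

# In a notebook you would then call: display(ui, out)
```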

In [21]:
def generate_animated_map(df=df_clean.copy(), scope="world", metric="cum_cases", chart_type="choropleth"):
    """
    Generates an animated map from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): data to be visualized.
        scope (str): geographical scope of the map. Can be one of 'world', 'europe', 'asia', 'africa', 'north america', 'south america'. Defaults to 'world'.
        metric (str): metric to plot. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        chart_type (str): type of map. Can be one of 'choropleth' or 'scatter_geo'.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
        
    # Normalize & check scope before filtering (plotly expects lowercase scope names)
    scope = scope.lower()
    if scope not in ["world", "europe", "asia", "africa", "north america", "south america"]:
        print("Value of scope not recognized ({}). Check function signature for supported values.".format(scope))
        return
    
    # Filter df to match scope
    if scope != "world":
        df = df[df["continent"] == scope.title()].copy()
    
    # Generate figure
    if chart_type == "choropleth":
        fig = px.choropleth(df, locations="alpha_3_code", color=metric, hover_name="country", hover_data=["date_rep", metric], range_color=[0, df[metric].max()],
                           color_continuous_scale=px.colors.sequential.Reds, scope=scope, animation_frame="date_rep")
        fig.update_layout(coloraxis_colorbar=dict(title=title_legend_dict[metric].capitalize(), thicknessmode="pixels", thickness=25, lenmode="pixels", len=397, yanchor="middle", y=.5, ticks="outside"))
    else:
        fig = px.scatter_geo(df, locations="alpha_3_code", hover_name="country", size=metric, size_max=60, color_discrete_sequence=[px.colors.sequential.Reds[-2]], scope=scope, animation_frame="date_rep")
    
    # Format figure
    fig.update_geos(showcountries=True, countrycolor="black")
    fig.update_layout(margin=dict(r=0, t=0, l=0, b=0), width=900, height=700, title_text="<b>Covid-19 {} by country</b>".format(title_legend_dict[metric]), title_y=.99, title_yanchor="top", title_x=0.5, title_xanchor="center")
    pio.show(fig)

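Both chart functions in this section repeat the same validate-then-filter step on `scope`. A minimal sketch of factoring it into a helper, assuming (as the `scope.title()` call implies) that the `continent` column stores title-cased names such as `'North America'`. `filter_to_scope` and `SUPPORTED_SCOPES` are hypothetical names, and raising `ValueError` is swapped in for the notebook's print-and-return.

```python
import pandas as pd

SUPPORTED_SCOPES = ["world", "europe", "asia", "africa", "north america", "south america"]


def filter_to_scope(df: pd.DataFrame, scope: str) -> pd.DataFrame:
    """Hypothetical helper: validate `scope`, then keep only rows from the matching continent."""
    scope = scope.lower()  # the dropdown values and plotly's `scope` argument are lowercase
    if scope not in SUPPORTED_SCOPES:
        raise ValueError(f"Unsupported scope {scope!r}; expected one of {SUPPORTED_SCOPES}")
    if scope == "world":
        return df.copy()
    # Assumes the 'continent' column is title-cased, e.g. 'North America'
    return df[df["continent"] == scope.title()].copy()


# Made-up rows for illustration
toy = pd.DataFrame({
    "country": ["France", "Canada", "Brazil"],
    "continent": ["Europe", "North America", "South America"],
})
print(filter_to_scope(toy, "North America")["country"].tolist())  # ['Canada']
```

Raising instead of printing means a bad value fails loudly when the helper is called outside the widget UI, where a printed message is easy to miss.
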
In [22]:
# Controls
scope_list = sorted([("World", "world")] + [(c, c.lower()) for c in df_clean["continent"].unique() if c not in ["Antarctica", "Oceania"]])
scope_dropdown = widgets.Dropdown(options=scope_list, value="world", description="Scope")

title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
metric_list = [(v.capitalize(), k) for k, v in title_legend_dict.items()]
metric_dropdown = widgets.Dropdown(options=metric_list, value="cum_cases", description="Metric")

chart_type_list = [("Heatmap", "choropleth"), ("Scatter map", "scatter_geo")]
chart_dropdown = widgets.Dropdown(options=chart_type_list, value="choropleth", description="Chart type")

# Layout & display
box_layout = Layout(justify_content="flex-start", align_items="center")
ui = HBox([scope_dropdown, metric_dropdown, chart_dropdown], layout=box_layout)
out = interactive_output(generate_animated_map, dict(scope=scope_dropdown, metric=metric_dropdown, chart_type=chart_dropdown))
display(out, ui)

## 5. Other chart types

We've looked extensively at using ```plotly```'s charting capabilities with maps, as they were particularly useful for our use case of visualizing Covid-19 data. However, ```plotly.express``` also provides functions for the standard chart types, such as bar, line, scatter, histogram and pie charts.

### 5.1 Bar chart

Let's generate an animated bar chart to illustrate how our metrics evolve over time, by continent or country. We'll also use the ```observe``` method of widgets to reset the textbox values whenever a dropdown value changes.
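`observe` is not specific to ipywidgets: widgets are `traitlets` `HasTraits` objects, and `observe(handler, names=...)` registers a callback that receives a change dict whenever the named trait changes. A minimal sketch of the reset-on-change pattern with plain traitlets, assuming only that `traitlets` is installed (it is an ipywidgets dependency). The `Controls` class and `reset_filters` function are hypothetical stand-ins for the dropdowns, textboxes and `dropdown_value_change` callback defined further down.

```python
from traitlets import Float, HasTraits, Int, Unicode


class Controls(HasTraits):
    """Hypothetical stand-in for the scope dropdown and the two textboxes."""
    scope = Unicode("world")
    cutoff = Float(0.0)
    top_n = Int(10)


def reset_filters(change):
    # `change` carries 'name', 'old', 'new', 'owner' and 'type' entries
    owner = change["owner"]
    owner.cutoff = 0.0
    owner.top_n = 10


controls = Controls()
# Run reset_filters whenever the `scope` trait takes a new value
controls.observe(reset_filters, names="scope")

controls.cutoff = 5000.0
controls.top_n = 3
controls.scope = "europe"  # the value changes, so reset_filters runs
print(controls.cutoff, controls.top_n)  # 0.0 10
```

In traitlets, handlers only fire when the new value differs from the old one, so assigning the current value again leaves the other controls untouched.
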

In [23]:
# Scaling utility functions
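def scale_upper_limit(x):
    """Rounds |x| up to its leading-digit value times 1.25, 1.5, 1.75 or 2 (smallest such value above |x|), keeping the sign of x."""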
    sign = np.sign(x)
    x = abs(x)
    if x == 0:
        return 0
    elif x >= 1:
        power = int(math.log10(x))
        scale = math.pow(10, power)
        lower = math.floor(x / scale) * scale
        for step in [1.25, 1.5, 1.75, 2]:
            if x < lower * step:
                return lower * step * sign
    else:
        factor = 1
        while x < 1:
            factor *= 10
            x *= 10
        return scale_upper_limit(x * sign) / factor
    
    
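def scale_lower_limit(x):
    """Rounds |x| down to its leading-digit value times 1.75, 1.5, 1.25 or 1 (largest such value not above |x|), keeping the sign of x."""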
    sign = np.sign(x)
    x = abs(x)
    if x == 0:
        return 0
    elif x >= 1:
        power = int(math.log10(x))
        scale = math.pow(10, power)
        lower = math.floor(x / scale) * scale
        for step in [1.75, 1.5, 1.25, 1]:
            if x >= lower * step:
                return lower * step * sign
    else:
        factor = 1
        while x < 1:
            factor *= 10
            x *= 10
        return scale_lower_limit(x * sign) / factor
    
    
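def generate_scale(values):
    """Returns a (lower, upper) axis range rounded outward to 'nice' values."""
    low, high = min(values), max(values)
    # For negative numbers scale_upper_limit moves away from 0, so pick the helper that pushes each bound outside the data
    lower = scale_upper_limit(low) if low < 0 else scale_lower_limit(low)
    upper = scale_lower_limit(high) if high < 0 else scale_upper_limit(high)
    return lower, upper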

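A worked example of the stepping logic in `scale_upper_limit` and `scale_lower_limit`, traced by hand from the code above for $x = 5500$, where $p$ is the power of ten and $\ell$ the leading-digit value:

$$
\begin{aligned}
p &= \lfloor \log_{10} 5500 \rfloor = 3, \qquad \ell = \lfloor 5500 / 10^{3} \rfloor \cdot 10^{3} = 5000 \\
\text{upper} &= \min\{\ell s : s \in \{1.25, 1.5, 1.75, 2\},\ \ell s > 5500\} = 1.25 \cdot 5000 = 6250 \\
\text{lower} &= \max\{\ell s : s \in \{1, 1.25, 1.5, 1.75\},\ \ell s \le 5500\} = 1 \cdot 5000 = 5000
\end{aligned}
$$

For $x = 0.034$, the decimal point is first shifted two places (giving 3.4), which yields 3.75 and 3, so the bounds become 0.0375 and 0.03.
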
In [24]:
def generate_animated_bar_chart(df=df_clean.copy(), scope="world", x="cum_cases", y="default", x_cutoff=None, top_n=None, fit_dates=True):
    """
    Generates an animated bar chart from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): ECDC data to be visualized.
        scope (str): geographical scope of the data. Can be one of 'world', 'europe', 'asia', 'africa', 'north america', 'south america'. Defaults to 'world'.
        x (str): metric to plot. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        y (str): unit to group data. If scope is 'world', defaults to 'continent', otherwise defaults to 'country'.
        x_cutoff (float): excludes y groups whose value falls strictly below x_cutoff. If x is cumulative, the cutoff applies to the group's total on the latest date; otherwise it applies to the group's maximum within the date range.
        top_n (int): only display the top N y groups, ranked as for x_cutoff.
        fit_dates (bool): drop dates before the selected metric first becomes non-zero. Defaults to True.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
    # Normalize & check scope
    scope = scope.lower()
    if scope not in ["world", "europe", "asia", "africa", "north america", "south america"]:
        print("Value of scope not recognized ({}). Check function signature for supported values.".format(scope))
        return
    
    # Filter df to match scope
    if scope != "world":
        df = df[df["continent"] == scope.title()].copy()
        
    # Default grouping: continents for the world view, countries otherwise
    if y == "default":
        y = "continent" if scope == "world" else "country"
    
    # Metric cutoff/top n aggregates: one value per y group
    if "cum" in x:
        # Cumulative metrics: group total on the latest date
        df_check = df.loc[df["date_rep"] == df["date_rep"].max(), [y, x]].groupby(y).sum()
    else:
        # Other metrics: largest per-date group total, matching the stacked bar lengths
        df_check = df.groupby(["date_rep", y])[[x]].sum().groupby(level=y).max()
            
    if x_cutoff is not None:
        y_to_remove = df_check.loc[df_check[x] < x_cutoff].index
        df = df[~df[y].isin(y_to_remove)].copy()
        
    if top_n is not None:
        if len(df_check.index) > top_n:
            y_to_keep = df_check.nlargest(top_n, x).index
            df = df[df[y].isin(y_to_keep)].copy()
    
    # Fit dates to selection
    if fit_dates:
        min_date = df.loc[df[x] > 0, "date_rep"].min()
        df = df[df["date_rep"] >= min_date].copy()
    
    # Formatting args
    max_val = df_check.max()[x]
    graph_scale = [0, scale_upper_limit(max_val)]
    graph_height = 80 * len(df[y].unique())
    
    # Generate & layout figure
    fig = px.bar(df, x=x, y=y, color=y, orientation="h", hover_name=y, hover_data=["date_rep", "country", y, x], animation_frame="date_rep", range_x=graph_scale, labels={x: "", y: ""}, color_discrete_sequence=px.colors.cyclical.Twilight)
    fig.update_layout(width=1800, height=graph_height, title_text="<b>Covid-19 {} by {}</b>".format(title_legend_dict[x], y), title_y=.99, title_yanchor="top", title_x=0.5, title_xanchor="center")
    pio.show(fig)

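To make the `x_cutoff`/`top_n` docstring concrete, here is a minimal sketch of the cumulative branch on made-up numbers: each `y` group is reduced to its total on the latest date, groups strictly below the cutoff are dropped, and only the `top_n` largest are kept. `groups_to_keep` is a hypothetical helper that returns the surviving group names rather than filtering the chart data.

```python
import pandas as pd

# Made-up data: cumulative cases for four countries on two dates
df = pd.DataFrame({
    "date_rep": pd.to_datetime(["2020-04-01"] * 4 + ["2020-04-02"] * 4),
    "continent": ["Europe", "Europe", "Asia", "Asia"] * 2,
    "country": ["France", "Italy", "Japan", "India"] * 2,
    "cum_cases": [100, 300, 50, 20, 150, 320, 80, 25],
})


def groups_to_keep(df, y="country", x="cum_cases", x_cutoff=None, top_n=None):
    """Hypothetical helper: names of the y groups that survive the cutoff/top-n filters."""
    # Cumulative metric: one value per group, summed over its rows on the latest date
    latest = df.loc[df["date_rep"] == df["date_rep"].max(), [y, x]]
    totals = latest.groupby(y)[x].sum()
    if x_cutoff is not None:
        totals = totals[totals >= x_cutoff]  # drop groups strictly below the cutoff
    if top_n is not None:
        totals = totals.nlargest(top_n)  # keep the N largest groups
    return sorted(totals.index)


print(groups_to_keep(df, x_cutoff=50))                  # ['France', 'Italy', 'Japan']
print(groups_to_keep(df, top_n=2))                      # ['France', 'Italy']
print(groups_to_keep(df, y="continent", x_cutoff=200))  # ['Europe']
```

Because the cutoff removes the smallest groups and `top_n` keeps the largest, applying them in either order selects the same groups (ties aside).
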
In [25]:
# Controls definition
scope_list = sorted([("World", "world")] + [(c, c.lower()) for c in df_clean["continent"].unique() if c not in ["Antarctica", "Oceania"]])
scope_dropdown = widgets.Dropdown(options=scope_list, value="world", description="Scope")

title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
metric_list = [(v.capitalize(), k) for k, v in title_legend_dict.items()]
metric_dropdown = widgets.Dropdown(options=metric_list, value="cum_cases", description="Metric")

cutoff_text = widgets.FloatText(value=0, description="Metric cutoff", disabled=False)
top_n_text = widgets.IntText(value=10, description="Show top", disabled=False)

# Reset cutoff to 0 & top_n to 10 when dropdowns change
def dropdown_value_change(change):
    cutoff_text.value = 0
    top_n_text.value = 10
    
scope_dropdown.observe(dropdown_value_change, names="value")
metric_dropdown.observe(dropdown_value_change, names="value")

# Layout & display
box_layout = Layout(justify_content="flex-start", align_items="center")
ui = HBox([scope_dropdown, metric_dropdown, cutoff_text, top_n_text], layout=box_layout)
out = interactive_output(generate_animated_bar_chart, dict(scope=scope_dropdown, x=metric_dropdown, x_cutoff=cutoff_text, top_n=top_n_text))
display(ui, out)

### 5.2 Scatter/bubble chart

In [26]:
def generate_animated_scatter_plot(df=df_clean.copy(), x="cum_cases", y="mortality_rate", size="population_2018", facet_col="continent"):
    """
    Generates an animated scatter plot from ECDC Covid-19 country-level data.
    
    Args:
        df (pd.DataFrame): ECDC data to be visualized.
        x (str): metric to plot along the x axis. Can be one of 'cases', 'deaths', 'cum_cases', 'cum_deaths', 'mortality_rate', 'fraction_infected', 'fraction_deaths', 'infections_growth_rate', 'deaths_growth_rate'. Defaults to 'cum_cases'.
        y (str): metric to plot along the y axis. Values as per x. Defaults to 'mortality_rate'.
        size (str): metric used to size dots in the plot. Values as per x. Defaults to 'population_2018'.
        facet_col (str): field used to split the chart into subplots. Set to None for a single chart. Defaults to 'continent'.
        
    Returns:
        None
    """
    # Title and legend dict
    title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
    
    # Drop rows where the size metric is missing or negative (plotly marker sizes must be non-negative)
    df = df[df[size].notnull() & (df[size] >= 0)].copy()
    
    x_upper, y_upper = scale_upper_limit(df[x].max()), scale_upper_limit(df[y].max())
    x_scale = [0 - 0.1 * x_upper, x_upper]
    y_scale = [0 - 0.1 * y_upper, y_upper]
    
    fig = px.scatter(df, x=x, y=y, range_x=x_scale, range_y=y_scale, size=size, facet_col=facet_col, hover_name="country", hover_data=["continent", x, y], size_max=60, animation_frame="date_rep", 
                     animation_group="alpha_3_code", labels={x: title_legend_dict[x].capitalize(), y: title_legend_dict[y].capitalize()}, color="continent")
    fig.update_layout(width=2000, height=600, title_text="<b>Covid-19 {} vs {} by continent & country</b>".format(title_legend_dict[x], title_legend_dict[y]), title_y=.99, title_yanchor="top", title_x=0.5, title_xanchor="center")
    pio.show(fig)

In [27]:
# Controls
title_legend_dict = {"cases": "new cases", "deaths": "new deaths", "cum_cases": "cumulative cases", "cum_deaths": "cumulative deaths", "mortality_rate": "mortality rate", "fraction_infected": "% of pop. infected",
                        "fraction_deaths": "% of pop. dead", "infections_growth_rate": "infections growth rate", "deaths_growth_rate": "deaths growth rate"}
x_list = [(v.capitalize(), k) for k, v in title_legend_dict.items()]
x_dropdown = widgets.Dropdown(options=x_list, value="cum_cases", description="x")
y_dropdown = widgets.Dropdown(options=x_list, value="mortality_rate", description="y")
size_list = x_list + [("2018 population", "population_2018")]
size_dropdown = widgets.Dropdown(options=size_list, value="population_2018", description="Marker size")

# Layout & display
box_layout = Layout(justify_content="flex-start", align_items="center")
ui = HBox([x_dropdown, y_dropdown, size_dropdown], layout=box_layout)
out = interactive_output(generate_animated_scatter_plot, dict(x=x_dropdown, y=y_dropdown, size=size_dropdown))
display(ui, out)

## 6. Converting to a dashboard

### 6.1 Using voila

For dashboarding purposes, I'll include a second notebook containing only the final animated map. If you're wondering how to turn a Jupyter notebook into an interactive dashboard, the following [article](https://blog.jupyter.org/and-voil%C3%A0-f6a2c08a4a93) lays out nicely and concisely how to use voila!

### 6.2 Using Dash

To build a more robust and flexible dashboard, I've found that Dash is well worth looking into. If you're not familiar with Dash, it's a Python framework for building web applications, written on top of Flask, Plotly and React. The Dash documentation includes a great [tutorial](https://dash.plotly.com/installation) that will help you get started. To illustrate, I also went through the exercise of building a Dash app containing all 3 final charts (map, bar chart and scatter plot) by repackaging the functions used in this notebook for data collection and chart generation. You can find the code in the dashboard folder, and I've also deployed the app [here](url) if you want to check it out.