# **COVID-19 United States Exploratory Look (updated October 30)**

## Introduction

Hello! The goal of this post is to take a general look at the distribution of daily COVID-19 case and death additions over the past few months, with a slight emphasis on the United States' numbers. Due to the sheer number of countries affected by the global pandemic, I will narrow the focus of all comparative illustrations to the top ten countries by total case count. With this in mind, the analysis presents visualizations of the following target areas:

* Top Countries' Daily Aggregated Counts 
* Top Countries' Comparative Cumulative Counts (Scaled and Unscaled Counts)
* United States' Daily COVID-19 Percent Change

## Dataset

The data in this post comes from a dataset downloaded from the [EU Open Data Portal](https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data). It contains the latest available public data on COVID-19, including a daily situation update.

In [14]:
#Loading relevant packages
import pandas as pd
import numpy as np

#Loading dataset
geo = pd.read_csv('10-30_distribution.csv', encoding = "ISO-8859-1")
geo.columns = map(str.lower, geo.columns)
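#Dates in 'daterep' are stored day-first (dd/mm/yyyy), so parse them that way to avoid swapping day and month
geo["date"] = pd.to_datetime(geo['daterep'], dayfirst = True)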

#Creating columns that scale daily cases and deaths by total country population
geo["cases_popscaled"] = geo['cases']/geo['popdata2019']
geo["deaths_popscaled"] = geo['deaths']/geo['popdata2019']

#Obtaining list of top ten countries by total case count
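#Summing only the count columns; summing text or date columns fails on recent pandas versions
aggregated_table = geo.groupby('countriesandterritories', as_index = False)[['cases', 'deaths']].sum()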
top_ten = aggregated_table.sort_values(by = "cases", ascending = False).head(10)
top_ten_names = top_ten['countriesandterritories'].values

#Creating copy of table that drops unnecessary columns and only includes top ten countries
dropped_columns = ["daterep", "day", "month", 'year', "geoid", 'countryterritorycode', 'continentexp']
geo_dataset = geo.drop(columns = dropped_columns)
geo_dataset = geo_dataset.loc[geo_dataset['countriesandterritories'].isin(top_ten_names)]

#Reordering columns in dataset
geo_dataset = geo_dataset[['date', 'countriesandterritories', 'popdata2019', 'cases', 
                           'cases_popscaled', 'deaths', 'deaths_popscaled']]

#Creating a table of cumulative cases and deaths in top countries
geo_cumulative = geo_dataset[['countriesandterritories', 'date', 'cases', 
                              'deaths', 'cases_popscaled', 'deaths_popscaled']]
geo_cumulative = geo_cumulative.groupby(['countriesandterritories', 'date']).sum()
geo_cumulative = geo_cumulative.groupby(level=0).cumsum().reset_index()

#Creating a table on USA COVID-19 daily percentage change
USA_data = geo_dataset.copy()
USA_data = USA_data.loc[USA_data['countriesandterritories'].isin(["United_States_of_America"])]
case_percentages = USA_data.sort_values('date')['cases'].pct_change().sort_index(ascending = True).values * 100
death_percentages = USA_data.sort_values('date')['deaths'].pct_change().sort_index(ascending = True).values * 100
USA_data['casepercentagegrowth'] = case_percentages
USA_data['deathpercentagegrowth'] = death_percentages

#### A Quick Look at General Tables

The table that we will be working with contains seven columns as follows:

* **date**: Time data showing date of reported statistics.
* **countriesandterritories**: Categorical data showing the country of reported statistics.
* **popdata2019**: Numerical data showing the total country population count (updated 2019).
* **cases**: Numerical data showing new cases reported on the given date.
* **cases_popscaled**: Numerical data showing cases reported on the given date divided by the country's total population.
* **deaths**: Numerical data showing new deaths reported on the given date.
* **deaths_popscaled**: Numerical data showing deaths reported on the given date divided by the country's total population.
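
As a quick sanity check on the scaled columns, here is a minimal sketch that rebuilds them for a single row, using the Argentina figures for 2020-10-30 from the table preview. The `cases_per_100k` column is my own addition for readability and is not part of the dataset or the tables in this post.

```python
import pandas as pd

# One row copied from the table preview (Argentina, 2020-10-30)
row = pd.DataFrame({
    "countriesandterritories": ["Argentina"],
    "popdata2019": [44780675.0],
    "cases": [12691],
    "deaths": [371],
})

# Same scaling as in the loading cell: daily count divided by total population
row["cases_popscaled"] = row["cases"] / row["popdata2019"]
row["deaths_popscaled"] = row["deaths"] / row["popdata2019"]

# Illustrative extra column (an assumption, not in the original table):
# the same ratio expressed per 100,000 residents
row["cases_per_100k"] = row["cases_popscaled"] * 100_000

print(row.round(6))
```

The scaled values are fractions of the population rather than percentages, which is why they look so small.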

In [15]:
geo_dataset.head(5)

| | date | countriesandterritories | popdata2019 | cases | cases_popscaled | deaths | deaths_popscaled |
|---|---|---|---|---|---|---|---|
| 1728 | 2020-10-30 | Argentina | 44780675.0 | 12691 | 0.000283 | 371 | 8e-06 |
| 1729 | 2020-10-29 | Argentina | 44780675.0 | 13924 | 0.000311 | 341 | 8e-06 |
| 1730 | 2020-10-28 | Argentina | 44780675.0 | 14308 | 0.00032 | 429 | 1e-05 |
| 1731 | 2020-10-27 | Argentina | 44780675.0 | 11712 | 0.000262 | 405 | 9e-06 |
| 1732 | 2020-10-26 | Argentina | 44780675.0 | 9253 | 0.000207 | 283 | 6e-06 |


In addition, a subset table of USA-specific values has the following additional columns:

* **casepercentagegrowth**: Numerical data showing the change in cases from the prior date to the current date, divided by the prior date's cases and multiplied by 100.
* **deathpercentagegrowth**: Numerical data showing the change in deaths from the prior date to the current date, divided by the prior date's deaths and multiplied by 100.
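
To make the percentage columns concrete, here is a small sketch that recomputes `casepercentagegrowth` for three consecutive USA rows taken from the table preview. It is only an illustration: the `manual` column is a name I made up to show that pandas' `pct_change` matches the formula (current - prior) / prior * 100.

```python
import pandas as pd

# Three consecutive USA rows from the table preview, newest first as in the source file
usa = pd.DataFrame({
    "date": pd.to_datetime(["2020-10-30", "2020-10-29", "2020-10-28"]),
    "cases": [88130, 78371, 75129],
})

# Sort oldest-first so each row is compared with the previous calendar day
usa = usa.sort_values("date")

# pct_change returns (current - prior) / prior; multiply by 100 for a percentage
usa["casepercentagegrowth"] = usa["cases"].pct_change() * 100

# The same calculation written out by hand (hypothetical helper column)
prior = usa["cases"].shift(1)
usa["manual"] = (usa["cases"] - prior) / prior * 100

print(usa)
```

The oldest row has no prior day, so its percentage is `NaN`. The 2020-10-30 row works out to roughly 12.45, which agrees with the value in the USA table.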

In [16]:
USA_data.head(5)

| | date | countriesandterritories | popdata2019 | cases | cases_popscaled | deaths | deaths_popscaled | casepercentagegrowth | deathpercentagegrowth |
|---|---|---|---|---|---|---|---|---|---|
| 49948 | 2020-10-30 | United_States_of_America | 329064917.0 | 88130 | 0.000268 | 968 | 3e-06 | 12.45231 | -0.921187 |
| 49949 | 2020-10-29 | United_States_of_America | 329064917.0 | 78371 | 0.000238 | 977 | 3e-06 | 4.315244 | -1.11336 |
| 49950 | 2020-10-28 | United_States_of_America | 329064917.0 | 75129 | 0.000228 | 988 | 3e-06 | 9.903597 | 95.643564 |
| 49951 | 2020-10-27 | United_States_of_America | 329064917.0 | 68359 | 0.000208 | 505 | 2e-06 | 15.005047 | 52.567976 |
| 49952 | 2020-10-26 | United_States_of_America | 329064917.0 | 59440 | 0.000181 | 331 | 1e-06 | -28.433828 | -63.384956 |


## Visualization 1: Top Ten Countries' Aggregated Counts

In [17]:
import altair as alt

chart_cases = alt.Chart(geo_dataset).mark_area().encode(
    alt.X('date:T',
        axis=alt.Axis(domain=False, tickSize=0)
    ),
    alt.Y('sum(cases):Q'),
    alt.Color('countriesandterritories:N',
        scale=alt.Scale(scheme='category20b')
    )
).interactive(
).properties(
    width = 800,
    height = 500,
    title = "Daily COVID-19 Aggregated Cases in Top Countries"
)

chart_cases.configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

In [18]:
chart_deaths = alt.Chart(geo_dataset).mark_area().encode(
    alt.X('date:T',
        axis=alt.Axis(domain=False, tickSize=0)
    ),
    alt.Y('sum(deaths):Q'),
    alt.Color('countriesandterritories:N',
        scale=alt.Scale(scheme='category20b')
    )
).interactive(
).properties(
    width = 800,
    height = 500,
    title = "Daily COVID-19 Aggregated Deaths in Top Countries"
)

chart_deaths.configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

## Visualization 2: Top Ten Countries' Individual Counts

In [19]:
highlight = alt.selection(type='single', on='mouseover',
                         fields=['countriesandterritories'], nearest=True)

base = alt.Chart(geo_cumulative).encode(
    x='date:T',
    y='cases:Q',
    color = alt.Color('countriesandterritories:N',
        scale=alt.Scale(scheme='category20')
))

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width = 800,
    height = 500,
    title = "Cumulative COVID-19 Cases in Top Countries"
).interactive()

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(3), alt.value(5))
)

(points + lines).configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

In [20]:
highlight = alt.selection(type='single', on='mouseover',
                         fields=['countriesandterritories'], nearest=True)

base = alt.Chart(geo_cumulative).encode(
    x='date:T',
    y='cases_popscaled:Q',
    color = alt.Color('countriesandterritories:N',
        scale=alt.Scale(scheme='category20')
))

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width = 800,
    height = 500,
    title = "Cumulative COVID-19 Cases as a Share of Population in Top Countries"
).interactive()

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(3), alt.value(5))
)

(points + lines).configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

In [21]:
highlight = alt.selection(type='single', on='mouseover',
                         fields=['countriesandterritories'], nearest=True)

base = alt.Chart(geo_cumulative).encode(
    x='date:T',
    y='deaths:Q',
    color = alt.Color('countriesandterritories:N',
        scale=alt.Scale(scheme='category20')
))

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width = 800,
    height = 500,
    title = "Cumulative COVID-19 Deaths in Top Countries"
).interactive()

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(3), alt.value(5))
)

(points + lines).configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

In [22]:
highlight = alt.selection(type='single', on='mouseover',
                         fields=['countriesandterritories'], nearest=True)

base = alt.Chart(geo_cumulative).encode(
    x='date:T',
    y='deaths_popscaled:Q',
    color = alt.Color('countriesandterritories:N',
        scale=alt.Scale(scheme='category20')
))

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width = 800,
    height = 500,
    title = "Cumulative COVID-19 Deaths as a Share of Population in Top Countries"
).interactive()

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(3), alt.value(5))
)

(points + lines).configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

## Visualization 3: USA Daily Percentage Change

In [23]:
chart_cases = alt.Chart(USA_data).mark_bar().encode(
    x = "date:T",
    y = "casepercentagegrowth:Q",
    color = alt.condition(
    alt.datum.casepercentagegrowth > 0,
    alt.value("steelblue"),
    alt.value("darkred")) 
).interactive(
).properties(
    width = 800,
    height = 500,
    title = "Daily Percentage Change USA COVID-19 Cases"
)

chart_cases.configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

In [24]:
chart_deaths = alt.Chart(USA_data).mark_bar().encode(
    x = "date:T",
    y = "deathpercentagegrowth:Q",
    color = alt.condition(
    alt.datum.deathpercentagegrowth > 0,
    alt.value("steelblue"),
    alt.value("darkred")) 
).interactive(
).properties(
    width = 800,
    height = 500,
    title = "Daily Percentage Change USA COVID-19 Deaths"
)

chart_deaths.configure_title(
    fontSize=20,
    font='Courier',
    color='black'
).display(renderer = 'svg')

## Moving Forward

Looking at the visualization of the top ten countries' aggregated daily counts over time, we see a concerning trend: daily counts show no sign of decreasing anytime soon, so the cumulative totals keep climbing.

#### Potential Next Steps

To get a better sense of how effective certain policies are in relation to COVID-19 rates, a good next step is to group countries by policy strictness and compare the differences in their COVID-19 growth rates. In addition, if strict policies do play a big role in shaping the COVID-19 case curve, that relationship could support predictive models that estimate the outlook moving forward.
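
As a rough sketch of what that grouping could look like, the example below compares average daily growth between two policy groups. Everything in it is made up for illustration: the countries `A`, `B`, and `C`, their case counts, and the `strict`/`lenient` labels are hypothetical placeholders, since this post does not include a policy strictness dataset.

```python
import pandas as pd

# Toy daily case counts; the countries and numbers are invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "date": pd.to_datetime(["2020-10-01", "2020-10-02", "2020-10-03"] * 3),
    "cases": [100, 110, 121, 100, 150, 225, 200, 210, 231],
})

# Hypothetical strictness labels; a real analysis would need an actual policy dataset
strictness = {"A": "strict", "B": "lenient", "C": "strict"}

# Daily percentage growth computed separately within each country
toy = toy.sort_values(["country", "date"])
toy["growth_pct"] = toy.groupby("country")["cases"].pct_change() * 100

# Attach the policy group and average the growth rate per group (NaN rows are skipped)
toy["policy_group"] = toy["country"].map(strictness)
summary = toy.groupby("policy_group")["growth_pct"].mean()

print(summary)
```

A simple mean like this ignores differences in population, testing, and timing, so it would only be a starting point for the comparison.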

In [13]:
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')