# COVID-19 Deaths for Selected Countries

This notebook uses the data provided by [Johns Hopkins CSSE](https://github.com/CSSEGISandData/COVID-19).

In [35]:
import altair as alt
import pandas as pd

## Get Data from GitHub

In [36]:
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/" \
      "csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"

df = pd.read_csv(url)
df.sample(5)

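| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 3/21/20 | 3/22/20 | 3/23/20 | 3/24/20 | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 | 3/30/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 85 | | Costa Rica | 9.7489 | -83.7534 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 240 | | Libya | 26.3351 | 17.228331 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 168 | Sint Maarten | Netherlands | 18.0425 | -63.0548 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 174 | | North Macedonia | 41.6086 | 21.7453 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 2 | 2 | 3 | 3 | 3 | 4 | 6 | 7 |
| 112 | Reunion | France | -21.1351 | 55.2471 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |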


## Clean Up Data

Remove columns that are not needed.

In [37]:
df = df.drop(["Province/State", "Lat", "Long"], axis=1)
df = df.rename(columns={"Country/Region": "Country"})
df.sample(5)

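| | Country | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | ... | 3/21/20 | 3/22/20 | 3/23/20 | 3/24/20 | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 | 3/30/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 130 | Iceland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 147 | Lebanon | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 4 | 4 | 4 | 6 | 6 | 8 | 8 | 10 | 11 |
| 226 | Uzbekistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 |
| 159 | Moldova | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 |
| 13 | Australia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |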


Some countries, such as the UK and US, appear on multiple rows because the data is reported per province or region. Group the rows by country and sum the values so that each country has a single row.

In [38]:
df = df.groupby(["Country"], as_index=False).sum()
df.sample(5)

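| | Country | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | ... | 3/21/20 | 3/22/20 | 3/23/20 | 3/24/20 | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 | 3/30/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 138 | Saint Kitts and Nevis | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 55 | Estonia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 3 | 3 |
| 172 | Uzbekistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 |
| 26 | Burkina Faso | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 4 | 4 | 4 | 4 | 7 | 9 | 11 | 12 | 12 |
| 147 | Slovakia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |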
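
To see what the grouping step does in isolation, here is a minimal sketch on a made-up frame. The `toy` and `totals` names and all values are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Made-up data: one country reported as two regional rows, one as a single row.
toy = pd.DataFrame({
    "Country": ["France", "France", "Sweden"],
    "1/22/20": [1, 2, 5],
    "1/23/20": [3, 4, 6],
})

# Collapse to one row per country; every date column is summed across regions.
totals = toy.groupby("Country", as_index=False).sum()
print(totals)
```

Passing `as_index=False` keeps `Country` as an ordinary column, which is what the later filtering and melting steps expect.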


Restrict the dataset to a small set of countries.

In [39]:
countries = ["Sweden", "Italy", "Spain", "Germany", "France", "US", "United Kingdom", "Korea, South"]
df = df[df["Country"].isin(countries)]
df


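| | Country | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | ... | 3/21/20 | 3/22/20 | 3/23/20 | 3/24/20 | 3/25/20 | 3/26/20 | 3/27/20 | 3/28/20 | 3/29/20 | 3/30/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | France | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 563 | 676 | 862 | 1102 | 1333 | 1698 | 1997 | 2317 | 2611 | 3030 |
| 64 | Germany | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 84 | 94 | 123 | 157 | 206 | 267 | 342 | 433 | 533 | 645 |
| 83 | Italy | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4825 | 5476 | 6077 | 6820 | 7503 | 8215 | 9134 | 10023 | 10779 | 11591 |
| 89 | Korea, South | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 102 | 111 | 111 | 120 | 126 | 131 | 139 | 144 | 152 | 158 |
| 151 | Spain | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1375 | 1772 | 2311 | 2808 | 3647 | 4365 | 5138 | 5982 | 6803 | 7716 |
| 155 | Sweden | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 20 | 21 | 25 | 36 | 62 | 77 | 105 | 105 | 110 | 146 |
| 166 | US | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 307 | 417 | 557 | 706 | 942 | 1209 | 1581 | 2026 | 2467 | 2978 |
| 170 | United Kingdom | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 234 | 282 | 336 | 423 | 466 | 580 | 761 | 1021 | 1231 | 1411 |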


Altair, the library used for visualization, works best with "long" data (one row per country and date), so melt the dataset into long format.

In [40]:
df = df.melt("Country", var_name="Date", value_name="Deaths")
df

| | Country | Date | Deaths |
|---|---|---|---|
| 0 | France | 1/22/20 | 0 |
| 1 | Germany | 1/22/20 | 0 |
| 2 | Italy | 1/22/20 | 0 |
| 3 | Korea, South | 1/22/20 | 0 |
| 4 | Spain | 1/22/20 | 0 |
| ... | ... | ... | ... |
| 547 | Korea, South | 3/30/20 | 158 |
| 548 | Spain | 3/30/20 | 7716 |
| 549 | Sweden | 3/30/20 | 146 |
| 550 | US | 3/30/20 | 2978 |
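
The wide-to-long reshaping is easier to follow on a tiny frame. The sketch below reuses the Italy and Spain totals for the last two dates shown above; the `wide` and `long_df` names are illustrative.

```python
import pandas as pd

# Wide format: one row per country, one column per date.
wide = pd.DataFrame({
    "Country": ["Italy", "Spain"],
    "3/29/20": [10779, 6803],
    "3/30/20": [11591, 7716],
})

# Long format: one row per (country, date) pair.
# "Country" is kept as the identifier; the former column names go into "Date"
# and the cell values go into "Deaths".
long_df = wide.melt("Country", var_name="Date", value_name="Deaths")
print(long_df)
```

Two countries times two dates gives four rows, which is the same pattern that turns 8 countries into the several hundred rows shown above.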


Finally, convert the `Date` column from strings to a proper datetime type.

In [41]:
df["Date"] = pd.to_datetime(df["Date"])
df

| | Country | Date | Deaths |
|---|---|---|---|
| 0 | France | 2020-01-22 | 0 |
| 1 | Germany | 2020-01-22 | 0 |
| 2 | Italy | 2020-01-22 | 0 |
| 3 | Korea, South | 2020-01-22 | 0 |
| 4 | Spain | 2020-01-22 | 0 |
| ... | ... | ... | ... |
| 547 | Korea, South | 2020-03-30 | 158 |
| 548 | Spain | 2020-03-30 | 7716 |
| 549 | Sweden | 2020-03-30 | 146 |
| 550 | US | 2020-03-30 | 2978 |
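
The conversion above lets pandas infer the date layout. As a possible alternative, the layout can be stated explicitly, which avoids any month/day ambiguity and fails loudly if the upstream format changes. This is a sketch that assumes the columns stay in month/day/two-digit-year form; the `dates` and `parsed` names are illustrative.

```python
import pandas as pd

# Two date strings in the same layout as the dataset's column headers.
dates = pd.Series(["1/22/20", "3/30/20"])

# %m = month, %d = day, %y = two-digit year.
parsed = pd.to_datetime(dates, format="%m/%d/%y")
print(parsed)
```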


## Plot the Dataset

In [44]:
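# Start the log-scaled y-axis at 10 deaths and end it at the largest value in the data.
domain = (10, int(df.Deaths.max()))
alt.Chart(df).transform_filter(
    # Hide points below the axis minimum (this also excludes zeros, which have no logarithm).
    alt.datum.Deaths >= 10
).mark_line(point=True).encode(
    alt.X("Date:T"),
    alt.Y("Deaths:Q", scale=alt.Scale(type="log", domain=domain)),
    # Distinguish countries by both line color and point shape.
    color="Country:N",
    shape=alt.Shape("Country"),
    tooltip=["Country", "Deaths", "Date"]
).properties(
    width=1000,
    height=600
).configure_point(
    size=75
).interactive()  # enable panning and zooming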
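
The `transform_filter` step only affects what the chart draws; `df` itself is left unchanged. A rough pandas equivalent of the same threshold, sketched on made-up rows (`toy` and `visible` are illustrative names), looks like this:

```python
import pandas as pd

# Made-up cumulative counts for a single country.
toy = pd.DataFrame({
    "Country": ["Sweden"] * 4,
    "Deaths": [0, 7, 25, 110],
})

# Keep only rows at or above the chart's lower axis bound of 10.
visible = toy[toy["Deaths"] >= 10]
print(visible)
```

Filtering inside the chart, as the notebook does, keeps the full dataset available for other views without a second DataFrame.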