# COVID-19 GRAPH DATA ANALYTICS as of 06/24/2020
## by Thi Van Nguyen
### Master of Statistics at Binghamton University
[**Linkedin**](https://www.linkedin.com/in/thi-van-nguyen-563147104/)




![](https://d3nuqriibqh3vw.cloudfront.net/c0481846-wuhan_novel_coronavirus_illustration-spl.jpg?iFJ36W0T_gU2dfelG0z_E2oaOm_7Gnmv)


**Abstract**: 
Coronavirus disease 2019 (COVID-19) has had a strong impact on every aspect of life around the world. It is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The COVID-19 outbreak was first reported in December 2019 in Wuhan, China, and has resulted in an ongoing pandemic. At the time of writing, the total number of confirmed cases has passed 9.3 million in over 185 countries and territories around the world, resulting in more than 480,000 deaths. This study aims to visualize data for the countries hit hardest by the outbreak. 
[Source of information](https://coronavirus.jhu.edu/map.html)

In [3]:

# data storage and analysis
import numpy as np
import pandas as pd
pd.set_option('display.float_format', '{:.2f}'.format)

# visualization
import matplotlib.pyplot as plt
import matplotlib.dates as dt
import seaborn as sns
sns.set(style="whitegrid")
# interactive notebook backend (a second %matplotlib magic would override the first)
%matplotlib notebook


# hide warnings
import warnings
warnings.filterwarnings('ignore')

from datetime import timedelta  
from datetime import datetime

In [2]:

data = pd.read_csv('https://raw.githubusercontent.com/datasets/covid-19/master/data/time-series-19-covid-combined.csv', 
                         parse_dates=['Date'])
data.head()

Unnamed: 0,Date,Country/Region,Province/State,Lat,Long,Confirmed,Recovered,Deaths
0,2020-01-22,Afghanistan,,33.0,65.0,0.0,0.0,0.0
1,2020-01-23,Afghanistan,,33.0,65.0,0.0,0.0,0.0
2,2020-01-24,Afghanistan,,33.0,65.0,0.0,0.0,0.0
3,2020-01-25,Afghanistan,,33.0,65.0,0.0,0.0,0.0
4,2020-01-26,Afghanistan,,33.0,65.0,0.0,0.0,0.0


In [3]:
# cases 
cases = ['Confirmed', 'Deaths', 'Recovered']
# filling missing values 
data[['Province/State']] = data[['Province/State']].fillna('')
data[cases] = data[cases].fillna(0)
data
                                

Unnamed: 0,Date,Country/Region,Province/State,Lat,Long,Confirmed,Recovered,Deaths
0,2020-01-22,Afghanistan,,33.00,65.00,0.00,0.00,0.00
1,2020-01-23,Afghanistan,,33.00,65.00,0.00,0.00,0.00
2,2020-01-24,Afghanistan,,33.00,65.00,0.00,0.00,0.00
3,2020-01-25,Afghanistan,,33.00,65.00,0.00,0.00,0.00
4,2020-01-26,Afghanistan,,33.00,65.00,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...
41113,2020-06-19,Zimbabwe,,-20.00,30.00,479.00,63.00,4.00
41114,2020-06-20,Zimbabwe,,-20.00,30.00,479.00,63.00,4.00
41115,2020-06-21,Zimbabwe,,-20.00,30.00,489.00,64.00,6.00
41116,2020-06-22,Zimbabwe,,-20.00,30.00,512.00,64.00,6.00
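
The effect of the two `fillna` calls above can be checked on a small frame. This is only a sketch: the three-row frame `toy` and its values are invented for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame that mimics the dataset's columns
toy = pd.DataFrame({
    'Province/State': [np.nan, 'Hubei', np.nan],
    'Confirmed': [1.0, np.nan, 3.0],
    'Deaths': [np.nan, 0.0, 1.0],
    'Recovered': [0.0, np.nan, np.nan],
})

cases = ['Confirmed', 'Deaths', 'Recovered']
toy['Province/State'] = toy['Province/State'].fillna('')  # missing province -> empty string
toy[cases] = toy[cases].fillna(0)                         # missing counts -> 0

# Count the missing values that remain after filling
remaining_missing = int(toy.isna().sum().sum())
print(remaining_missing)
```

Filling the province with an empty string keeps country-level rows from being dropped by a later `groupby`, which by default skips missing keys.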


In [4]:
full_latest = data[data['Date'] == data['Date'].max()].reset_index()
full_latest

Unnamed: 0,index,Date,Country/Region,Province/State,Lat,Long,Confirmed,Recovered,Deaths
0,153,2020-06-23,Afghanistan,,33.00,65.00,29481.00,9260.00,618.00
1,307,2020-06-23,Albania,,41.15,20.17,2047.00,1195.00,45.00
2,461,2020-06-23,Algeria,,28.03,1.66,12076.00,8674.00,861.00
3,615,2020-06-23,Andorra,,42.51,1.52,855.00,797.00,52.00
4,769,2020-06-23,Angola,,-11.20,17.87,189.00,77.00,10.00
...,...,...,...,...,...,...,...,...,...
262,40501,2020-06-23,West Bank and Gaza,,31.95,35.23,1169.00,442.00,3.00
263,40655,2020-06-23,Western Sahara,,24.22,-12.89,10.00,8.00,1.00
264,40809,2020-06-23,Yemen,,15.55,48.52,992.00,356.00,261.00
265,40963,2020-06-23,Zambia,,-15.42,28.28,1477.00,1213.00,18.00
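
Because the counts in this dataset are cumulative, keeping only the rows for the most recent date is enough to get current totals. The snippet below sketches the same filter on a made-up frame; the countries `A` and `B` and their numbers are hypothetical.

```python
import pandas as pd

# Hypothetical frame: two countries observed on two dates (cumulative counts)
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-06-22', '2020-06-23', '2020-06-22', '2020-06-23']),
    'Country/Region': ['A', 'A', 'B', 'B'],
    'Confirmed': [10, 12, 5, 7],
})

latest_date = toy['Date'].max()
# Keep one row per country: the snapshot on the latest date
latest = toy[toy['Date'] == latest_date].reset_index(drop=True)
latest_total = int(latest['Confirmed'].sum())
print(latest)
```

Here `reset_index(drop=True)` discards the old row labels; the cell above uses `reset_index()` without `drop`, which is why its output carries an extra `index` column.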


In [5]:
full_latest["Confirmed"].sum(axis=0)

9263466.0

In [6]:
# select several columns with a list; the bare-tuple form fails on recent pandas versions
data_groupby = full_latest.groupby('Country/Region')[['Confirmed', 'Recovered', 'Deaths']].sum().reset_index()
data_groupby['Mortality Rate']= round((data_groupby['Deaths']/data_groupby['Confirmed'])*100, 2)
data_groupby.head()

Unnamed: 0,Country/Region,Confirmed,Recovered,Deaths,Mortality Rate
0,Afghanistan,29481.0,9260.0,618.0,2.1
1,Albania,2047.0,1195.0,45.0,2.2
2,Algeria,12076.0,8674.0,861.0,7.13
3,Andorra,855.0,797.0,52.0,6.08
4,Angola,189.0,77.0,10.0,5.29
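
The mortality rate used here is deaths divided by confirmed cases, expressed as a percentage. As a quick check, the sketch below recomputes it for the first three rows of the table above. The zero-case guard (`replace(0, np.nan)`) is an addition for illustration and is not in the original cell.

```python
import numpy as np
import pandas as pd

# Values copied from the first three rows of the table above
df = pd.DataFrame({
    'Country/Region': ['Afghanistan', 'Albania', 'Algeria'],
    'Confirmed': [29481.0, 2047.0, 12076.0],
    'Deaths': [618.0, 45.0, 861.0],
})

# Assumption: treat zero confirmed cases as undefined instead of dividing by zero
confirmed = df['Confirmed'].replace(0, np.nan)
df['Mortality Rate'] = (df['Deaths'] / confirmed * 100).round(2)
print(df)
```

For Algeria this gives 861 / 12076 × 100 ≈ 7.13, matching the table.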


## Top 30 Countries By Confirmed Cases


In [7]:
data_groupby.sort_values(by='Confirmed', ascending=False, ignore_index=True).head(30).style.background_gradient(cmap='Reds')



Unnamed: 0,Country/Region,Confirmed,Recovered,Deaths,Mortality Rate
0,US,2347022.0,647548.0,121228.0,5.17
1,Brazil,1145906.0,627963.0,52645.0,4.59
2,Russia,598878.0,355847.0,8349.0,1.39
3,India,456183.0,258685.0,14476.0,3.17
4,United Kingdom,307682.0,1330.0,43011.0,13.98
5,Peru,260810.0,148437.0,8404.0,3.22
6,Chile,250767.0,210570.0,4505.0,1.8
7,Spain,246752.0,150376.0,28325.0,11.48
8,Italy,238833.0,184585.0,34675.0,14.52
9,Iran,209970.0,169160.0,9863.0,4.7
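
As an aside, `DataFrame.nlargest` is another way to pick the top rows without sorting the whole frame first. The sketch below compares the two approaches on invented totals; the frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical per-country totals, invented for illustration
toy = pd.DataFrame({
    'Country/Region': ['A', 'B', 'C', 'D'],
    'Confirmed': [120.0, 950.0, 30.0, 400.0],
})

# Approach used in the cell above: full sort, then head
top2_sorted = toy.sort_values(by='Confirmed', ascending=False, ignore_index=True).head(2)
# Alternative: ask directly for the two largest rows
top2_nlargest = toy.nlargest(2, 'Confirmed').reset_index(drop=True)
print(top2_nlargest)
```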



## Top 30 Countries By Mortality Rate


In [9]:
fig = plt.figure()
sns.set_color_codes("pastel")
ax = sns.barplot(x="Mortality Rate", y="Country/Region", data=data_groupby.sort_values(by='Mortality Rate', ascending=False).head(30))


In [17]:

# select several columns with a list; the bare-tuple form fails on recent pandas versions
data_groupby_date = data.groupby('Date')[['Confirmed', 'Recovered', 'Deaths']].sum().reset_index()

data_groupby_date.head()


Unnamed: 0,Date,Confirmed,Recovered,Deaths
0,2020-01-22,555.0,28.0,17.0
1,2020-01-23,654.0,30.0,18.0
2,2020-01-24,941.0,36.0,26.0
3,2020-01-25,1434.0,39.0,42.0
4,2020-01-26,2118.0,52.0,56.0


In [18]:
data_groupby_date = data_groupby_date.melt(id_vars="Date",
                 value_vars=['Confirmed', 'Deaths', 'Recovered'])
data_groupby_date.head()

Unnamed: 0,Date,variable,value
0,2020-01-22,Confirmed,555.0
1,2020-01-23,Confirmed,654.0
2,2020-01-24,Confirmed,941.0
3,2020-01-25,Confirmed,1434.0
4,2020-01-26,Confirmed,2118.0
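
`melt` reshapes the frame from wide (one column per case type) to long (one row per date and case type), which is the layout `sns.lineplot` needs for its `hue` argument. This sketch applies it to the first two daily totals from the table above.

```python
import pandas as pd

# Wide frame with two dates; values taken from the daily totals shown earlier
wide = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-22', '2020-01-23']),
    'Confirmed': [555.0, 654.0],
    'Recovered': [28.0, 30.0],
    'Deaths': [17.0, 18.0],
})

# Each (date, case type) pair becomes its own row
long = wide.melt(id_vars='Date', value_vars=['Confirmed', 'Deaths', 'Recovered'])
print(long.shape)
```

Two dates and three case types give six rows, with the default column names `variable` and `value`.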


In [19]:
fig = plt.figure()
sns.set_color_codes("pastel")
ax = sns.lineplot(x="Date", y="value", hue="variable", data=data_groupby_date)

ax.xaxis.set_major_locator(plt.MaxNLocator(10));



In [5]:

url = 'https://raw.githubusercontent.com/datasets/covid-19/master/data/us_confirmed.csv'
data1 = pd.read_csv(url, parse_dates=['Date'])
data1.head()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Lat,Combined_Key,Date,Case,Long,Country/Region,Province/State
0,16,AS,ASM,16,60.0,,-14.27,"American Samoa, US",2020-01-22,0,-170.13,US,American Samoa
1,16,AS,ASM,16,60.0,,-14.27,"American Samoa, US",2020-01-23,0,-170.13,US,American Samoa
2,16,AS,ASM,16,60.0,,-14.27,"American Samoa, US",2020-01-24,0,-170.13,US,American Samoa
3,16,AS,ASM,16,60.0,,-14.27,"American Samoa, US",2020-01-25,0,-170.13,US,American Samoa
4,16,AS,ASM,16,60.0,,-14.27,"American Samoa, US",2020-01-26,0,-170.13,US,American Samoa


In [13]:
# keep only the most recent date, then total the cases per state
full = data1[data1['Date'] == data1['Date'].max()].reset_index()
full_groupby = full.groupby('Province/State')['Case'].sum().reset_index()
# full_groupby.sort_values(by='Case', ascending=False, ignore_index=True).head(30).style.background_gradient(cmap='Reds')
full1 = full_groupby.sort_values(by='Case', ascending=False, ignore_index=True).head(20)
fig = plt.figure()
sns.set_color_codes("pastel")
ax = sns.barplot(x="Case", y="Province/State", data=full1)
plt.title('20 States in the U.S. hit hard by COVID-19')

# save the bar plot as an image
fig.savefig('line plot.jpg', bbox_inches='tight', dpi=150)
plt.show()

