# Sem-IV Mini Project Phase-II

# Topic: Covid-19 Visualisation and Prediction

### Group Members:
##### Vaibhavi Chincholkar 2019140012
##### Shria Srivastava 2019140064
##### Karen Castelino 2019140010

Coronavirus disease 2019 (COVID-19) is a contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 

# What is SARS-CoV-2?

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus that causes coronavirus disease 2019 (COVID-19), the respiratory illness responsible for the COVID-19 pandemic.
SARS-CoV-2 is a positive-sense single-stranded RNA virus (and hence Baltimore class IV) that is contagious in humans. As described by the US National Institutes of Health, it is the successor to SARS-CoV-1, the virus that caused the 2002–2004 SARS outbreak. 

# Infection and transmission

Human-to-human transmission of SARS-CoV-2 was confirmed on 20 January 2020, during the COVID-19 pandemic. Transmission was initially assumed to occur primarily via respiratory droplets from coughs and sneezes within a range of about 1.8 metres (6 ft). Laser light scattering experiments suggest that speaking is an additional, far-reaching and under-researched mode of transmission, particularly indoors with little air flow. Other studies have suggested that the virus may also be airborne, with aerosols potentially able to transmit it. During human-to-human transmission, an average of 1000 infectious SARS-CoV-2 virions are thought to initiate a new infection.

The degree to which the virus is infectious during the incubation period is uncertain, but research has indicated that the pharynx reaches peak viral load approximately four days after infection or during the first week of symptoms, and declines thereafter.

# Covid-19 Data Visualization

In [None]:
import pandas as pd    # Data Handling library
import numpy as np     # Numerical Handling Library
import matplotlib.pyplot as plt    #Data Visualization
import seaborn as sns              #Data Visualization

# COVID-19 Interactive Analysis Dashboard

In [None]:
# importing libraries
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display, HTML

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import folium
import plotly.graph_objects as go
import seaborn as sns
import ipywidgets as widgets

In [None]:
# loading data right from the source:
death_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recovered_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
country_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')
confirmed_df1 = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
#confirmed_df=confirmed_df1.dropna()

In [None]:
confirmed_df = pd.read_csv('time_series_covid19_confirmed_global.csv')
confirmed_df.head()

In [None]:
recovered_df.head()

In [None]:
death_df.head()

In [None]:
country_df.head()

In [None]:
# data cleaning

# renaming the df column names to lowercase
country_df.columns = map(str.lower, country_df.columns)
confirmed_df.columns = map(str.lower, confirmed_df.columns)
death_df.columns = map(str.lower, death_df.columns)
recovered_df.columns = map(str.lower, recovered_df.columns)

# changing province/state to state and country/region to country
confirmed_df = confirmed_df.rename(columns={'province/state': 'state', 'country/region': 'country'})
recovered_df = recovered_df.rename(columns={'province/state': 'state', 'country/region': 'country'})
death_df = death_df.rename(columns={'province/state': 'state', 'country/region': 'country'})
country_df = country_df.rename(columns={'country_region': 'country'})
# country_df.head()
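
The cleaning step above repeats the same two operations for every frame, which makes it easy to rename the wrong frame by accident. A minimal sketch of one way to factor it into a helper follows. The name `clean_columns` and the toy values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def clean_columns(frame):
    """Return a copy with lowercase column names and short location names."""
    out = frame.copy()
    out.columns = out.columns.str.lower()
    return out.rename(columns={
        "province/state": "state",
        "country/region": "country",
        "country_region": "country",
    })

# Toy frame mimicking the layout of the JHU time-series files (values are made up).
toy = pd.DataFrame({
    "Province/State": [None, "Hubei"],
    "Country/Region": ["Italy", "China"],
    "Lat": [41.9, 30.9],
    "Long": [12.6, 112.3],
    "1/22/20": [0, 444],
})

cleaned = clean_columns(toy)
print(list(cleaned.columns))
```

Because the helper always transforms the frame it is given, each dataset keeps its own data, e.g. `recovered_df = clean_columns(recovered_df)`.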

In [None]:
# total number of confirmed, death and recovered cases
confirmed_total = int(country_df['confirmed'].sum())
deaths_total = int(country_df['deaths'].sum())
recovered_total = int(country_df['recovered'].sum())
active_total = int(country_df['active'].sum())

In [None]:
# displaying the total stats

display(HTML("<div style = 'background-color: #504e4e; padding: 30px '>" +
             "<span style='color: #fff; font-size:30px;'> Confirmed: "  + str(confirmed_total) +"</span>" +
             "<span style='color: red; font-size:30px;margin-left:20px;'> Deaths: " + str(deaths_total) + "</span>"+
             "<span style='color: lightgreen; font-size:30px; margin-left:20px;'> Recovered: " + str(recovered_total) + "</span>"+
             "</div>")
       )

## COVID-19 Confirmed/Death/Recovered cases by countries

### Enter number of countries you want the data for

In [None]:
# sorting the values by confirmed cases in descending order
# country_df.sort_values('confirmed', ascending= False).head(10).style.background_gradient(cmap='copper')
fig = go.FigureWidget( layout=go.Layout() )
def highlight_col(x):
    r = 'background-color: red'
    y = 'background-color: purple'
    g = 'background-color: grey'
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    df1.iloc[:, 4] = y
    df1.iloc[:, 5] = r
    df1.iloc[:, 6] = g
    
    return df1

def show_latest_cases(n):
    n = int(n)
    return country_df.sort_values('confirmed', ascending= False).head(n).style.apply(highlight_col, axis=None)

interact(show_latest_cases, n='10')

ipywLayout = widgets.Layout(border='solid 2px green')
ipywLayout.display='none' # hides the empty placeholder figure; comment this line out and rerun the cell to show it
widgets.VBox([fig], layout=ipywLayout)

In [None]:
sorted_country_df = country_df.sort_values('confirmed', ascending= False)

In [None]:
# plotting the n worst hit countries (default: 10)

def bubble_chart(n):
    fig = px.scatter(sorted_country_df.head(n), x="country", y="confirmed", size="confirmed", color="country",
               hover_name="country", size_max=60)
    fig.update_layout(
    title=str(n) +" Worst hit countries",
    xaxis_title="Countries",
    yaxis_title="Confirmed Cases",
    width = 700
    )
    fig.show();

interact(bubble_chart, n=10)

ipywLayout = widgets.Layout(border='solid 2px green')
ipywLayout.display='none'
widgets.VBox([fig], layout=ipywLayout)

In [None]:
def plot_cases_of_a_country(country):
    labels = ['confirmed', 'deaths']
    colors = ['blue', 'red']
    mode_size = [6, 8]
    line_size = [4, 5]
    
    # use the cleaned frames: both have a lowercase 'country' column
    df_list = [confirmed_df, death_df]
    
    fig = go.Figure();
    
    for i, df in enumerate(df_list):
        # columns 0-3 are state, country, lat, long; plotting starts at column 20
        # x and y must use the same column slice so dates and counts line up
        x_data = np.array(list(df.iloc[:, 20:].columns))
        if country.lower() == 'world':
            y_data = np.sum(np.asarray(df.iloc[:, 20:]), axis=0)
        else:
            y_data = np.sum(np.asarray(df[df['country'] == country].iloc[:, 20:]), axis=0)
            
        fig.add_trace(go.Scatter(x=x_data, y=y_data, mode='lines+markers',
        name=labels[i],
        line=dict(color=colors[i], width=line_size[i]),
        connectgaps=True,
        text = "Total " + str(labels[i]) +": "+ str(y_data[-1])
        ));
    
    fig.update_layout(
        title="COVID 19 cases of " + country,
        xaxis_title='Date',
        yaxis_title='No. of Cases',
        margin=dict(l=20, r=20, t=40, b=20),
        paper_bgcolor="lightgrey",
        width = 800,
        
    );
    
    fig.update_yaxes(type="linear")
    fig.show();
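
The aggregation inside `plot_cases_of_a_country` can be checked separately from the plotting. The sketch below assumes a cleaned frame whose first four columns are `state`, `country`, `lat` and `long`, followed by one column per date. The helper name `country_series` and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def country_series(frame, country, first_date_col=4):
    """Return (dates, totals) for one country, or for every row if country is 'world'."""
    date_cols = frame.columns[first_date_col:]
    if country.lower() == "world":
        rows = frame
    else:
        rows = frame[frame["country"] == country]
    # Sum over provinces/states so each date has one total.
    totals = rows[date_cols].sum(axis=0).to_numpy()
    return np.array(date_cols), totals

# Made-up cumulative counts in the wide JHU layout.
toy = pd.DataFrame({
    "state": ["Hubei", "Beijing", None],
    "country": ["China", "China", "Italy"],
    "lat": [30.9, 40.2, 41.9],
    "long": [112.3, 116.4, 12.6],
    "1/22/20": [444, 14, 2],
    "1/23/20": [444, 22, 3],
})

dates, china = country_series(toy, "China")
_, world = country_series(toy, "World")
print(dates, china, world)
```

Taking the date columns once and reusing them for both axes keeps the x and y arrays the same length.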


# Check the details of your country or the World

* Enter the name of your country in capitalized form (e.g. Italy), or enter World for the global totals.

In [None]:
interact(plot_cases_of_a_country, country='World')

ipywLayout = widgets.Layout(border='solid 2px green')
ipywLayout.display='none' 
widgets.VBox([fig], layout=ipywLayout)

# 10 worst hit countries - Confirmed cases

In [None]:
px.bar(
    sorted_country_df.head(10),
    x = "country",
    y = "confirmed",
    title= "Top 10 worst affected countries - Confirmed cases", # chart title
    color_discrete_sequence=["pink"], 
    height=500,
    width=800
)

# 10 worst hit countries - Death cases

In [None]:
px.bar(
    sorted_country_df.head(10),
    x = "country",
    y = "deaths",
    title= "Deaths in the 10 worst affected countries", # chart title
    color_discrete_sequence=["pink"], 
    height=500,
    width=800
)

# 10 worst hit countries - Recovered cases

In [None]:
px.bar(
    sorted_country_df.head(10),
    x = "country",
    y = "recovered",
    title= "Recovered cases in the 10 worst affected countries", # chart title
    color_discrete_sequence=["pink"], 
    height=500,
    width=800
)

# Global spread of COVID-19

In [None]:
world_map = folium.Map(location=[11,0], tiles="cartodbpositron", zoom_start=2, max_zoom = 6, min_zoom = 2)


for i in range(0,len(confirmed_df)):
    folium.Circle(
        location=[confirmed_df.iloc[i]['lat'], confirmed_df.iloc[i]['long']],
        fill=True,
        radius=(int((np.log(confirmed_df.iloc[i,-1]+1.00001)))+0.2)*50000,
        color='red',
        fill_color='indigo',
        tooltip = "<div style='margin: 0; background-color: black; color: white;'>"+
                    "<h4 style='text-align:center;font-weight: bold'>"+confirmed_df.iloc[i]['country'] + "</h4>"+
                    "<hr style='margin:10px;color: white;'>"+
                    "<ul style='color: white;list-style-type:circle;align-item:left;padding-left:20px;padding-right:20px'>"+
                        "<li>Confirmed: "+str(confirmed_df.iloc[i,-1])+"</li>"+
                        "<li>Deaths:   "+str(death_df.iloc[i,-1])+"</li>"+
                        "<li>Death Rate: "+ str(np.round(death_df.iloc[i,-1]/(confirmed_df.iloc[i,-1]+1.00001)*100,2))+ "</li>"+
                    "</ul></div>",
        ).add_to(world_map)

world_map
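
The map cell sizes each circle by the logarithm of the confirmed count, so countries with millions of cases do not swamp the rest, and adds 1.00001 so that a count of zero causes neither `log(0)` nor a division by zero. The sketch below isolates those two formulas. The function names `circle_radius` and `death_rate` are illustrative, not from the notebook.

```python
import numpy as np

def circle_radius(confirmed, scale=50_000):
    """Radius in metres, growing in steps with the natural log of the case count."""
    return (int(np.log(confirmed + 1.00001)) + 0.2) * scale

def death_rate(deaths, confirmed):
    """Deaths as a percentage of confirmed cases, rounded to two decimals."""
    return np.round(deaths / (confirmed + 1.00001) * 100, 2)

for cases in (0, 100, 1_000_000):
    print(cases, circle_radius(cases))

print(death_rate(0, 0), death_rate(50, 1000))
```

Because of the `int(...)` truncation, the radius changes in discrete steps: two countries whose counts share the same integer part of the logarithm get circles of equal size.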


# Covid-19 Prediction using Machine Learning

# Linear Regression

In [None]:
import pandas as pd 
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error,r2_score, mean_absolute_error
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

In [None]:
#!pip install xgboost

In [None]:
df = pd.read_csv("covid_19_data.csv.zip",index_col=[0])

In [None]:
df

In [None]:
df.shape

In [None]:
# Checking for null values
print("Checking for null values:\n",df.isnull().sum())

In [None]:
#Checking data type of each column
print("Checking Data-type of each column:\n",df.dtypes)

In [None]:
# Dropping the Province/State column as it has a lot of missing values

In [None]:
df.drop(columns=["Province/State"], inplace=True)

In [None]:
df

In [None]:
df["ObservationDate"]=pd.to_datetime(df["ObservationDate"])

In [None]:
grouped_country=df.groupby(["Country/Region","ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})

In [None]:
grouped_country

In [None]:
grouped_country["Active Cases"]=grouped_country["Confirmed"]-grouped_country["Recovered"]-grouped_country["Deaths"]

In [None]:
grouped_country

In [None]:
datewise=df.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})

In [None]:
datewise["Days Since"]=datewise.index-datewise.index.min()

In [None]:
datewise

In [None]:
print("Basic Information")
print("=======================================================================================")
print("Total number of countries with Disease Spread: ",len(df["Country/Region"].unique()))
print("=======================================================================================")
print("Total number of Confirmed Cases around the World: ",datewise["Confirmed"].iloc[-1])
print("=======================================================================================")
print("Total number of Recovered Cases around the World: ",datewise["Recovered"].iloc[-1])
print("=======================================================================================")
print("Total number of Death Cases around the World: ",datewise["Deaths"].iloc[-1])
print("=======================================================================================")
print("Total number of Active Cases around the World: ",(datewise["Confirmed"].iloc[-1]-datewise["Recovered"].iloc[-1]-datewise["Deaths"].iloc[-1]))
print("=======================================================================================")
print("Approximate number of Confirmed Cases per Day around the World: ",np.round(datewise["Confirmed"].iloc[-1]/datewise.shape[0]))
print("=======================================================================================")
print("Approximate number of Recovered Cases per Day around the World: ",np.round(datewise["Recovered"].iloc[-1]/datewise.shape[0]))
print("=======================================================================================")
print("Approximate number of Death Cases per Day around the World: ",np.round(datewise["Deaths"].iloc[-1]/datewise.shape[0]))
print("=======================================================================================")
print("Approximate number of Confirmed Cases per hour around the World: ",np.round(datewise["Confirmed"].iloc[-1]/((datewise.shape[0])*24)))
print("=======================================================================================")
print("Approximate number of Recovered Cases per hour around the World: ",np.round(datewise["Recovered"].iloc[-1]/((datewise.shape[0])*24)))
print("=======================================================================================")
print("Approximate number of Death Cases per hour around the World: ",np.round(datewise["Deaths"].iloc[-1]/((datewise.shape[0])*24)))
print("=======================================================================================")
print("Number of Confirmed Cases in last 24 hours: ",datewise["Confirmed"].iloc[-1]-datewise["Confirmed"].iloc[-2])
print("=======================================================================================")
print("Number of Recovered Cases in last 24 hours: ",datewise["Recovered"].iloc[-1]-datewise["Recovered"].iloc[-2])
print("=======================================================================================")
print("Number of Death Cases in last 24 hours: ",datewise["Deaths"].iloc[-1]-datewise["Deaths"].iloc[-2])
print("=======================================================================================")

In [None]:
fig=px.bar(x=datewise.index,y=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"])
fig.update_layout(title="Distribution of Number of Active Cases",
                  xaxis_title="Date",yaxis_title="Number of Cases",)
fig.show()

In [None]:
fig=px.bar(x=datewise.index,y=datewise["Recovered"]+datewise["Deaths"])
fig.update_layout(title="Distribution of Number of Closed Cases",
                  xaxis_title="Date",yaxis_title="Number of Cases")
fig.show()

In [None]:
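# DatetimeIndex.weekofyear was removed in pandas 2.0; isocalendar().week is its replacement
datewise["WeekOfYear"]=datewise.index.isocalendar().week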

week_num=[]
weekwise_confirmed=[]
weekwise_recovered=[]
weekwise_deaths=[]
w=1
for i in list(datewise["WeekOfYear"].unique()):
    weekwise_confirmed.append(datewise[datewise["WeekOfYear"]==i]["Confirmed"].iloc[-1])
    weekwise_recovered.append(datewise[datewise["WeekOfYear"]==i]["Recovered"].iloc[-1])
    weekwise_deaths.append(datewise[datewise["WeekOfYear"]==i]["Deaths"].iloc[-1])
    week_num.append(w)
    w=w+1

fig=go.Figure()
fig.add_trace(go.Scatter(x=week_num, y=weekwise_confirmed,
                    mode='lines+markers',
                    name='Weekly Growth of Confirmed Cases'))
fig.add_trace(go.Scatter(x=week_num, y=weekwise_recovered,
                    mode='lines+markers',
                    name='Weekly Growth of Recovered Cases'))
fig.add_trace(go.Scatter(x=week_num, y=weekwise_deaths,
                    mode='lines+markers',
                    name='Weekly Growth of Death Cases'))
fig.update_layout(title="Weekly Growth of different types of Cases around the World",
                 xaxis_title="Week Number",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()
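
The loop above groups by week-of-year number, which repeats once the data spans more than one calendar year. A possible alternative, shown here as a sketch on made-up data and not as the notebook's method, is pandas `resample`, which keys each week by its end date. The toy index and counts are assumptions for illustration.

```python
import pandas as pd

# Fourteen days of made-up cumulative counts, Monday 20 Jan 2020 to Sunday 2 Feb 2020.
dates = pd.date_range("2020-01-20", periods=14, freq="D")
toy = pd.DataFrame({"Confirmed": [10 * (i + 1) for i in range(14)]}, index=dates)

# Last cumulative value of each calendar week (weeks end on Sunday by default).
weekly = toy["Confirmed"].resample("W").last()

# Week-over-week increase; the first week has no predecessor, so fill with 0.
weekly_increase = weekly.diff().fillna(0)

print(weekly)
print(weekly_increase)
```

The same two lines would apply to `datewise["Confirmed"]`, `datewise["Recovered"]` and `datewise["Deaths"]`, since `datewise` is indexed by date.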

In [None]:
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(25,5))
sns.barplot(x=week_num,y=pd.Series(weekwise_confirmed).diff().fillna(0),ax=ax1)
sns.barplot(x=week_num,y=pd.Series(weekwise_deaths).diff().fillna(0),ax=ax2)
ax1.set_xlabel("Week Number")
ax2.set_xlabel("Week Number")
ax1.set(ylim = (0,7000000))
ax2.set(ylim = (0,110000))
ax1.set_ylabel("Number of Confirmed Cases")
ax2.set_ylabel("Number of Death Cases")
ax1.set_title("Weekly increase in Number of Confirmed Cases")
ax2.set_title("Weekly increase in Number of Death Cases")

In [None]:
fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"],
                    mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Recovered"],
                    mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Deaths"],
                    mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="Growth of different types of cases",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

In [None]:
datewise["Mortality Rate"]=(datewise["Deaths"]/datewise["Confirmed"])*100
datewise["Recovery Rate"]=(datewise["Recovered"]/datewise["Confirmed"])*100
datewise["Active Cases"]=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"]
datewise["Closed Cases"]=datewise["Recovered"]+datewise["Deaths"]

print("Average Mortality Rate",datewise["Mortality Rate"].mean())
print("Median Mortality Rate",datewise["Mortality Rate"].median())
print("Average Recovery Rate",datewise["Recovery Rate"].mean())
print("Median Recovery Rate",datewise["Recovery Rate"].median())

#Plotting Mortality and Recovery Rate 
fig = make_subplots(rows=2, cols=1,
                   subplot_titles=("Recovery Rate", "Mortality Rate"))
fig.add_trace(
    go.Scatter(x=datewise.index, y=(datewise["Recovered"]/datewise["Confirmed"])*100,name="Recovery Rate"),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=datewise.index, y=(datewise["Deaths"]/datewise["Confirmed"])*100,name="Mortality Rate"),
    row=2, col=1
)
fig.update_layout(height=1000,legend=dict(x=-0.1,y=1.2,traceorder="normal"))
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_yaxes(title_text="Recovery Rate", row=1, col=1)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_yaxes(title_text="Mortality Rate", row=2, col=1)
fig.show()

In [None]:
# print("Average increase in number of Confirmed Cases every day: ",np.round(datewise["Confirmed"].diff().fillna(0).mean()))
print("Average increase in number of Recovered Cases every day: ",np.round(datewise["Recovered"].diff().fillna(0).mean()))
print("Average increase in number of Death Cases every day: ",np.round(datewise["Deaths"].diff().fillna(0).mean()))

fig=go.Figure()
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Confirmed"].diff().fillna(0),mode='lines+markers',
                    name='Confirmed Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Recovered"].diff().fillna(0),mode='lines+markers',
                    name='Recovered Cases'))
fig.add_trace(go.Scatter(x=datewise.index, y=datewise["Deaths"].diff().fillna(0),mode='lines+markers',
                    name='Death Cases'))
fig.update_layout(title="Daily increase in different types of Cases",
                 xaxis_title="Date",yaxis_title="Number of Cases",legend=dict(x=0,y=1,traceorder="normal"))
fig.show()

In [None]:
datewise["Days Since"]=datewise.index-datewise.index[0]

In [None]:
datewise["Days Since"]=datewise["Days Since"].dt.days

In [None]:
train_ml=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.95):]
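
The split above is chronological, not random: the first 95% of days train the model and the most recent 5% are held out, which mirrors how a forecast is actually used. A minimal sketch on synthetic data follows. The helper name `time_split` and the toy frame are assumptions.

```python
import pandas as pd

def time_split(frame, train_fraction=0.95):
    """Split a date-ordered frame into an early training part and a late validation part."""
    cut = int(frame.shape[0] * train_fraction)
    return frame.iloc[:cut], frame.iloc[cut:]

# Synthetic, already date-ordered data standing in for `datewise`.
toy = pd.DataFrame({
    "Days Since": range(40),
    "Confirmed": [d * d for d in range(40)],
})

train, valid = time_split(toy)
print(len(train), len(valid))
print(train["Days Since"].max(), valid["Days Since"].min())
```

No validation day precedes a training day, so the model is never evaluated on dates it has already seen.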

In [None]:
lin_reg=LinearRegression()

In [None]:
lin_reg.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))

In [None]:
prediction_valid_linreg=lin_reg.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

In [None]:
np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_linreg))
print("Root Mean Square Error for Linear Regression: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_linreg)))

In [None]:
r2_score(valid_ml["Confirmed"],prediction_valid_linreg)
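
The R2 score reported here compares the model's squared error with that of always predicting the mean of the validation values, so it is 1 for a perfect fit, 0 for a model no better than the mean, and negative for a model that is worse. The hand-written version below is a sketch for intuition only. The function name `r2_by_hand` and the numbers are made up.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of predicting the mean
    return float(1 - ss_res / ss_tot)

y = [1, 2, 3]
print(r2_by_hand(y, [1, 2, 3]))   # perfect prediction
print(r2_by_hand(y, [2, 2, 2]))   # always predicting the mean
print(r2_by_hand(y, [3, 2, 1]))   # worse than the mean
```

A negative score on the last 5% of days is therefore possible, and would indicate that a straight line fitted to the earlier data extrapolates poorly.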

In [None]:
plt.scatter(valid_ml["Days Since"], valid_ml["Confirmed"], color='blue')   # actual confirmed cases
plt.plot(valid_ml["Days Since"], prediction_valid_linreg, color='red')     # linear regression prediction

# SVR

In [None]:
train_ml=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.95):]

In [None]:
svr = SVR()

In [None]:
svr.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]))

In [None]:
prediction_valid_svr=svr.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

In [None]:
np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_svr))
print("Root Mean Square Error for SVR: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_svr)))

# XGBoost

In [None]:
xg = XGBRegressor()

In [None]:
xg.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))

In [None]:
prediction_valid_xg=xg.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

In [None]:
np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_xg))
print("Root Mean Square Error for XGBoost: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_xg)))

# Random Forest Regressor

In [None]:
rtr = RandomForestRegressor()

In [None]:
rtr.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]))

In [None]:
prediction_valid_rtr=rtr.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

In [None]:
np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_rtr))
print("Root Mean Square Error for Random Forest Regressor: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_rtr)))

# Decision Tree Regressor

In [None]:
train_ml=datewise.iloc[:int(datewise.shape[0]*0.95)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.95):]

In [None]:
dtr = DecisionTreeRegressor()

In [None]:
dtr.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))

In [None]:
prediction_valid_dtr=dtr.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

In [None]:
np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_dtr))
print("Root Mean Square Error for Decision Tree Regressor: ", np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_dtr)))

# Polynomial Regression

In [None]:
from sklearn.preprocessing import PolynomialFeatures

In [None]:
poly = PolynomialFeatures(degree = 8)

In [None]:
train_poly = poly.fit_transform(np.array(train_ml["Days Since"]).reshape(-1,1))
valid_poly = poly.transform(np.array(valid_ml["Days Since"]).reshape(-1,1))

In [None]:
lin_reg.fit(train_poly,np.array(train_ml["Confirmed"]).reshape(-1,1))

In [None]:
prediction_poly = lin_reg.predict(valid_poly)

In [None]:
np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_poly))
print("Root Mean Square Error for Polynomial Regression with degree 8: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_poly)))

In [None]:
r2_score(valid_ml["Confirmed"],prediction_poly)
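
Polynomial regression is ordinary linear regression on expanded features (1, x, x², ...). One way to keep the expansion and the regressor together, so that the validation data is only ever transformed and never refitted, is a scikit-learn pipeline. The sketch below uses a synthetic quadratic curve and degree 2. The data and variable names are assumptions, and the degree is chosen only to match the toy curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for "Days Since" vs "Confirmed": an exact quadratic.
days = np.arange(30).reshape(-1, 1)
cases = (3 * days.ravel() ** 2 + 5).astype(float)

# Chronological split: first 25 days for training, last 5 for validation.
train_x, valid_x = days[:25], days[25:]
train_y, valid_y = cases[:25], cases[25:]

# The pipeline fits the feature expansion and the regressor on training data only.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(train_x, train_y)

pred = poly_model.predict(valid_x)
max_abs_error = float(np.max(np.abs(pred - valid_y)))
print(max_abs_error)
```

A separate estimator also leaves the earlier `lin_reg` model untouched. With a high degree such as 8, the raw day counts are raised to large powers, so scaling the input first may help numerically.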

# FbProphet

In [None]:
#!pip install pystan
#!pip install fbprophet

In [None]:
import fbprophet
from fbprophet import Prophet
from fbprophet.diagnostics import performance_metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from fbprophet.plot import plot_cross_validation_metric

In [None]:
# list the attributes and methods available on Prophet
dir(Prophet)

In [None]:
df = pd.read_csv('covid_19_clean_complete.csv')

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
df['Date']=pd.to_datetime(df['Date'])

In [None]:
df.dtypes

In [None]:
df.isnull().sum()

In [None]:
df['Date'].nunique()

In [None]:
total=df.groupby(['Date'])[['Confirmed','Deaths','Recovered','Active']].sum().reset_index()

In [None]:
total.head()

In [None]:
df_prophet=total.rename(columns={'Date':'ds','Confirmed':'y'})

In [None]:
df_prophet.head()

In [None]:
m=Prophet()

In [None]:
model=m.fit(df_prophet)

In [None]:
# Forecasting needs future dates: extend the frame 7 days past the training data
future_global=model.make_future_dataframe(periods=7,freq='D')

In [None]:
future_global.head()

In [None]:
df_prophet.shape

In [None]:
future_global.shape

In [None]:
df_prophet['ds'].tail()

In [None]:
future_global.tail()

In [None]:
prediction=model.predict(future_global)
prediction[['ds','yhat','yhat_lower','yhat_upper']].tail()

In [None]:
# plot the predictions; the fitted line shows yhat
model.plot(prediction)

### Conclusion: The plot above shows our forecast. The overall direction of the case numbers is probably right, and the curve shows how quickly cases rise over time.

In [None]:
# Visualize each component (trend, weekly)
model.plot_components(prediction)

In [None]:
from fbprophet.plot import add_changepoints_to_plot

In [None]:
fig=model.plot(prediction)

a=add_changepoints_to_plot(fig.gca(),model,prediction)

In [None]:
from fbprophet.diagnostics import cross_validation

In [None]:
df_cv=cross_validation(model,horizon='7 days',period='4 days',initial='21 days')

In [None]:
df_cv.head()

In [None]:
df_cv.shape

#### Obtaining the Performance Metrics
We use the `performance_metrics` utility to compute the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the coverage of the `yhat_lower` and `yhat_upper` estimates.
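
To make these metrics concrete, the sketch below computes each one by hand over a handful of made-up rows with the same column meanings as the cross-validation output (`y`, `yhat`, `yhat_lower`, `yhat_upper`). It is an illustration of the definitions only: it averages over all rows at once, whereas `performance_metrics` reports values per forecast horizon.

```python
import numpy as np

# Made-up actuals, forecasts and interval bounds.
y = np.array([100.0, 200.0, 400.0, 800.0])
yhat = np.array([110.0, 190.0, 420.0, 760.0])
yhat_lower = np.array([90.0, 195.0, 380.0, 700.0])
yhat_upper = np.array([120.0, 210.0, 440.0, 790.0])

err = y - yhat

mse = float(np.mean(err ** 2))                 # mean squared error
rmse = float(np.sqrt(mse))                     # same units as y
mae = float(np.mean(np.abs(err)))              # mean absolute error
mape = float(np.mean(np.abs(err) / np.abs(y))) # as a fraction, not a percentage
# Share of actual values that fall inside the predicted interval.
coverage = float(np.mean((y >= yhat_lower) & (y <= yhat_upper)))

print(mse, rmse, mae, mape, coverage)
```

RMSE penalises the single large miss on the last row more heavily than MAE does, which is why the two can rank models differently.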

In [None]:
df_performance=performance_metrics(df_cv)
df_performance[['horizon','rmse']].head()

In [None]:
print("The R2 score is",r2_score(df_cv.y, df_cv.yhat))

# Country Wise FbProphet

In [None]:
division = 'country'  #regional data is available for some countries
region=input("Enter a country name (e.g. India): ")
prediction = 'ConfirmedCases' #ConfirmedDeaths is also available for forecasting.

In [None]:
DATA_URL = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
full_df = pd.read_csv(DATA_URL,
                usecols=['Date','CountryName','RegionName','Jurisdiction',
                           'ConfirmedCases','ConfirmedDeaths'],
                parse_dates=['Date'],
                encoding="ISO-8859-1",
                dtype={"RegionName": str,
                        "CountryName":str})

#Filter the region we want to predict
if division == 'country':
    df = full_df[(full_df['Jurisdiction'] == 'NAT_TOTAL') 
    & (full_df['CountryName'] == region)][:-1]
elif division == 'state':
    df = full_df[(full_df['Jurisdiction'] == 'STATE_TOTAL') 
    & (full_df['RegionName'] == region)][:-1]

    
df = df[['Date',prediction]].rename(columns = {'Date':'ds', prediction:'y'})

# set how many days to forecast
forecast_length = 7
# instantiate and fit the model
m = Prophet()
model=m.fit(df)
# create the prediction dataframe 'forecast_length' days past the fit data
future = m.make_future_dataframe(periods=forecast_length)
# make the forecast to the end of the 'future' dataframe
forecast = m.predict(future)

to_plot = forecast[forecast.ds > '2020-12-01'].merge(df, how='left')

plt.figure(figsize = (10,7))
plt.plot(to_plot['ds'], to_plot['yhat'], label='Forecasted Cases')
plt.plot(to_plot['ds'], to_plot['y'], label='True Cases')
plt.fill_between(to_plot['ds'], to_plot['yhat_upper'], to_plot['yhat_lower'],
                 alpha=.2, label='Uncertainty interval')
plt.title('Facebook Prophet Forecasted COVID-19 cases')
plt.legend()
plt.savefig('prophet_forecast.png')
plt.show()
print('\n The "forecast" DataFrame \n')
forecast[['ds','yhat','yhat_lower','yhat_upper']]



In [None]:
from fbprophet.diagnostics import cross_validation
df_cv=cross_validation(model,horizon='7 days',period='4 days',initial='21 days')
from fbprophet.diagnostics import performance_metrics
df_performance=performance_metrics(df_cv)
df_performance.head()

In [None]:
#from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
print("R2 score is",r2_score(df_cv.y,df_cv.yhat))