# 📉PREDICTING COVID-19 

In this notebook, we go through an example machine learning project with the goal of predicting COVID-19 cases in the Philippines 📉.

Coronaviruses are a family of viruses named after the crown-like spikes on their surface. The novel coronavirus, also known as SARS-CoV-2, is a contagious respiratory virus that was first reported in Wuhan, China. On February 11, 2020, the World Health Organization designated the name COVID-19 for the disease caused by the novel coronavirus. This notebook aims to explore COVID-19 through data analysis and projections.

A major goal for scientists and public-health authorities around the world has been to "flatten the curve". COVID-19 cases have been growing exponentially around the world, as we will see later in this notebook. Flattening the curve means that even if the number of confirmed cases keeps increasing, those cases are spread over a longer period of time. Put simply, if COVID-19 is going to infect 100K people, it is better for those infections to happen over a year rather than within a month.
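
As a back-of-the-envelope illustration of the 100K example, here is a minimal sketch that assumes, purely for simplicity, that infections are spread evenly over the period (real epidemic curves are not flat). The helper `average_daily_infections` is hypothetical and only does the division.

```python
def average_daily_infections(total_infections: int, days: int) -> float:
    """Average infections per day if the total were spread evenly over `days` days.

    The even spread is a simplifying assumption for illustration only.
    """
    return total_infections / days


total = 100_000
per_day_month = average_daily_infections(total, 30)   # same total within one month
per_day_year = average_daily_infections(total, 365)   # same total over one year

print(f"Over 30 days:  ~{per_day_month:,.0f} infections/day")
print(f"Over 365 days: ~{per_day_year:,.0f} infections/day")
print(f"Reduction in daily load: ~{per_day_month / per_day_year:.1f}x")
```

This prints roughly 3,333 versus 274 infections per day: the same number of people fall ill, but the daily load on hospitals is about 12 times smaller.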

# 1.) Problem definition
> How well can we predict future COVID-19 cases, given the dataset's features and the history of previously reported cases?
> The objective of this notebook is to study the COVID-19 outbreak with the help of some basic visualization techniques, to compare China, where COVID-19 originated, with the rest of the world, and to perform predictions and time-series forecasting in order to study the impact and spread of COVID-19 in the coming days.



# 2.) Data
> The data was downloaded from the WHO, and the Philippines subset of the COVID-19 case data was extracted from it. Coronavirus case data is provided by Johns Hopkins University.
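
The extraction step can be sketched with pandas as below. This assumes the full multi-country table uses the same `iso_code` and `location` columns that appear in the `head()` output later in this notebook; the tiny in-memory table (`world_df`) and its numbers are made up purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the full multi-country dataset (values are made up).
world_df = pd.DataFrame(
    {
        "iso_code": ["PHL", "PHL", "JPN", "PHL"],
        "location": ["Philippines", "Philippines", "Japan", "Philippines"],
        "date": ["2020-01-30", "2020-01-31", "2020-01-30", "2020-02-01"],
        "new_cases": [1.0, 0.0, 2.0, 0.0],
    }
)

# Keep only the Philippines rows and renumber the index from 0.
philippines_only = world_df[world_df["iso_code"] == "PHL"].reset_index(drop=True)
print(philippines_only)
```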


# 3.) Evaluation

# 4.) Features

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.model_selection import train_test_split

In [2]:
philippine_df = pd.read_csv("https://raw.githubusercontent.com/zalven/covid-19-prediction-2021/main/covid_prediction_data/philippines_covid_cases.csv")

In [3]:


philippine_df.head()

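|   | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | excess_mortality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PHL | Asia | Philippines | 2020-01-30 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | ... | NaN | 370.437 | 7.07 | 7.8 | 40.8 | 78.463 | 1.0 | 71.23 | 0.718 | NaN |
| 1 | PHL | Asia | Philippines | 2020-01-31 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | 370.437 | 7.07 | 7.8 | 40.8 | 78.463 | 1.0 | 71.23 | 0.718 | 2.83 |
| 2 | PHL | Asia | Philippines | 2020-02-01 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | 370.437 | 7.07 | 7.8 | 40.8 | 78.463 | 1.0 | 71.23 | 0.718 | NaN |
| 3 | PHL | Asia | Philippines | 2020-02-02 | 2.0 | 1.0 | NaN | 1.0 | 1.0 | NaN | ... | NaN | 370.437 | 7.07 | 7.8 | 40.8 | 78.463 | 1.0 | 71.23 | 0.718 | NaN |
| 4 | PHL | Asia | Philippines | 2020-02-03 | 2.0 | 0.0 | NaN | 1.0 | 0.0 | NaN | ... | NaN | 370.437 | 7.07 | 7.8 | 40.8 | 78.463 | 1.0 | 71.23 | 0.718 | NaN |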


In [4]:
# train, test = train_test_split( philippine_df , test_size = 0.25 , random_state = 42)

In [5]:
# train2, validation = train_test_split(train , test_size = 0.15 , random_state = 42)
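
Because this is a time series, a random `train_test_split` like the commented-out cells above would mix later days into the training set and leak future information into the evaluation. A common alternative is a chronological split: train on the earliest days and hold out the most recent ones. The sketch below uses a made-up daily series and a hypothetical helper `chronological_split`; the 25% and 15% fractions simply mirror the `test_size` values above.

```python
import numpy as np
import pandas as pd


def chronological_split(df: pd.DataFrame, test_fraction: float):
    """Split a DataFrame into (earlier, later) parts by date, without shuffling."""
    df = df.sort_values("date")
    n_test = int(round(len(df) * test_fraction))
    split_at = len(df) - n_test
    return df.iloc[:split_at], df.iloc[split_at:]


# Made-up daily series standing in for philippine_df (illustration only).
dates = pd.date_range("2020-01-30", periods=100, freq="D")
demo_df = pd.DataFrame({"date": dates, "new_cases": np.arange(100, dtype=float)})

train, test = chronological_split(demo_df, test_fraction=0.25)
train2, validation = chronological_split(train, test_fraction=0.15)

# Every training date precedes every validation date, which precedes every test date.
print(train2["date"].max() < validation["date"].min() < test["date"].min())
print(len(train2), len(validation), len(test))
```

scikit-learn's `train_test_split(..., shuffle=False)` produces the same kind of ordered split, provided the rows are already sorted by date.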

In [6]:
print("Size/Shape of the dataset: ",philippine_df.shape)
print("Checking for null values:\n",philippine_df.isnull().sum())
print("Checking Data-type of each column:\n",philippine_df.dtypes)

Size/Shape of the dataset:  (499, 60)
Checking for null values:
 iso_code                                   0
continent                                  0
location                                   0
date                                       0
total_cases                                0
new_cases                                  0
new_cases_smoothed                         5
total_deaths                               3
new_deaths                                 3
new_deaths_smoothed                        5
total_cases_per_million                    0
new_cases_per_million                      0
new_cases_smoothed_per_million             5
total_deaths_per_million                   3
new_deaths_per_million                     3
new_deaths_smoothed_per_million            5
reproduction_rate                         47
icu_patients                             499
icu_patients_per_million                 499
hosp_patients                            499
hosp_patients_per_million          
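
The null counts above point to two different kinds of missing data: `icu_patients` and `hosp_patients` are null in all 499 rows and carry no information for this country, while count columns such as `new_deaths` are missing only for the first few days. One possible way to handle both is sketched below on a small made-up frame. Filling the early gaps with 0 is an assumption (that nothing had been reported yet), not something the dataset states.

```python
import numpy as np
import pandas as pd

# Small made-up frame mimicking the null pattern reported above.
demo_df = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-30", periods=4, freq="D"),
        "new_cases": [1.0, 0.0, 0.0, 1.0],
        "new_deaths": [np.nan, np.nan, np.nan, 1.0],
        "icu_patients": [np.nan] * 4,  # entirely empty, like in the real data
    }
)

# Drop columns in which every value is missing.
cleaned = demo_df.dropna(axis=1, how="all")

# Assumption: missing daily counts at the start mean "none reported yet".
cleaned = cleaned.assign(new_deaths=cleaned["new_deaths"].fillna(0.0))

print(cleaned.columns.tolist())
print(cleaned.isnull().sum().sum())
```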

# Converting the "date" column into datetime format

In [7]:
philippine_df["date"]=pd.to_datetime(philippine_df["date"])

# Grouping the different types of cases by date

In [8]:
datewise=philippine_df.groupby(["date"]).agg({"new_cases":'sum',"new_deaths":'sum' , 'people_vaccinated': 'sum'})
datewise["Days Since"]=datewise.index-datewise.index.min()
datewise

| date | new_cases | new_deaths | people_vaccinated | Days Since |
|---|---|---|---|---|
| 2020-01-30 | 1.0 | 0.0 | 0.0 | 0 days |
| 2020-01-31 | 0.0 | 0.0 | 0.0 | 1 days |
| 2020-02-01 | 0.0 | 0.0 | 0.0 | 2 days |
| 2020-02-02 | 1.0 | 1.0 | 0.0 | 3 days |
| 2020-02-03 | 0.0 | 0.0 | 0.0 | 4 days |
| ... | ... | ... | ... | ... |
| 2021-06-07 | 6526.0 | 71.0 | 4491948.0 | 494 days |
| 2021-06-08 | 4769.0 | 95.0 | 4632826.0 | 495 days |
| 2021-06-09 | 5444.0 | 126.0 | 0.0 | 496 days |
| 2021-06-10 | 7470.0 | 122.0 | 0.0 | 497 days |
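
Two details of this aggregation matter for modelling. First, `Days Since` is a `Timedelta` column, so most regression models need it converted to a plain integer with the `.dt.days` accessor. Second, a grouped `sum` turns an all-missing group into `0.0`, so a `people_vaccinated` value of `0.0` (as on 2021-06-09 and 2021-06-10 above, right after values in the millions) may simply mean the figure was not reported; passing `min_count=1` keeps such days as `NaN`. The sketch below shows both on three rows taken from the table above, with the 2021-06-09 vaccination figure assumed to be missing.

```python
import numpy as np
import pandas as pd

# Three rows from the table above; the last vaccination value is assumed missing.
demo_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-06-07", "2021-06-08", "2021-06-09"]),
        "new_cases": [6526.0, 4769.0, 5444.0],
        "people_vaccinated": [4491948.0, 4632826.0, np.nan],
    }
)

grouped = demo_df.groupby("date")

# Default sum: the missing value silently becomes 0.0.
print(grouped["people_vaccinated"].sum().tolist())

# min_count=1: a date needs at least one non-missing value, otherwise NaN is kept.
vaccinated = grouped["people_vaccinated"].sum(min_count=1)
print(vaccinated.tolist())

# Build "Days Since" as in the notebook, then convert the Timedelta to integer days.
datewise_demo = grouped[["new_cases"]].sum()
datewise_demo["Days Since"] = datewise_demo.index - datewise_demo.index.min()
datewise_demo["Days Since"] = datewise_demo["Days Since"].dt.days
print(datewise_demo["Days Since"].tolist())
```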


In [10]:
print("Basic Information")

print("New confirmed cases in the Philippines on the last recorded day: ",datewise["new_cases"].iloc[-1])
print("People vaccinated in the Philippines on the last recorded day: ",datewise["people_vaccinated"].iloc[-1])
# The commented-out lines below refer to "Confirmed", "Recovered" and "Deaths" columns that `datewise` does not contain.
# print("Total number of Deaths Cases around the World: ",datewise["Deaths"].iloc[-1])
# print("Total number of Active Cases around the World: ",(datewise["Confirmed"].iloc[-1]-datewise["Recovered"].iloc[-1]-datewise["Deaths"].iloc[-1]))
# print("Total number of Closed Cases around the World: ",datewise["Recovered"].iloc[-1]+datewise["Deaths"].iloc[-1])
# print("Approximate number of Confirmed Cases per Day around the World: ",np.round(datewise["Confirmed"].iloc[-1]/datewise.shape[0]))
# print("Approximate number of Recovered Cases per Day around the World: ",np.round(datewise["Recovered"].iloc[-1]/datewise.shape[0]))
# print("Approximate number of Death Cases per Day around the World: ",np.round(datewise["Deaths"].iloc[-1]/datewise.shape[0]))
# print("Approximate number of Confirmed Cases per hour around the World: ",np.round(datewise["Confirmed"].iloc[-1]/((datewise.shape[0])*24)))
# print("Approximate number of Recovered Cases per hour around the World: ",np.round(datewise["Recovered"].iloc[-1]/((datewise.shape[0])*24)))
# print("Approximate number of Death Cases per hour around the World: ",np.round(datewise["Deaths"].iloc[-1]/((datewise.shape[0])*24)))
# print("Number of Confirmed Cases in last 24 hours: ",datewise["Confirmed"].iloc[-1]-datewise["Confirmed"].iloc[-2])
# print("Number of Recovered Cases in last 24 hours: ",datewise["Recovered"].iloc[-1]-datewise["Recovered"].iloc[-2])
# print("Number of Death Cases in last 24 hours: ",datewise["Deaths"].iloc[-1]-datewise["Deaths"].iloc[-2])

Basic Information
New confirmed cases in the Philippines on the last recorded day:  6662.0
People vaccinated in the Philippines on the last recorded day:  0.0
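
Note that `datewise["new_cases"].iloc[-1]` is only the number of new cases reported on the last day. Since `new_cases` and `new_deaths` hold daily counts, cumulative totals come from summing the columns (which should agree closely with the last value of `total_cases`), and the "last 24 hours" figure is simply the last row, with no subtraction of consecutive rows needed. The sketch below computes a few of the statistics from the commented-out lines using these daily columns; the five-day series (`datewise_demo`) is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up daily counts standing in for datewise (illustration only).
datewise_demo = pd.DataFrame(
    {"new_cases": [1.0, 0.0, 0.0, 1.0, 3.0], "new_deaths": [0.0, 0.0, 0.0, 1.0, 0.0]},
    index=pd.date_range("2020-01-30", periods=5, freq="D", name="date"),
)

total_cases = datewise_demo["new_cases"].sum()    # cumulative confirmed cases
total_deaths = datewise_demo["new_deaths"].sum()  # cumulative deaths
n_days = datewise_demo.shape[0]

print("Total confirmed cases:", total_cases)
print("Total deaths:", total_deaths)
print("Approximate confirmed cases per day:", np.round(total_cases / n_days))
# Daily columns already hold 24-hour counts, so the last row is the last-day figure.
print("New cases on the last recorded day:", datewise_demo["new_cases"].iloc[-1])
```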


# Problem solving
