## References
* [Kaggle, Coronavirus (COVID-19) Visualization & Prediction](https://www.kaggle.com/code/therealcyberlord/coronavirus-covid-19-visualization-prediction/notebook#US-Medical-Data-on-Testing)
* [Kaggle, COVID-19 - Analysis, Visualization & Comparisons](https://www.kaggle.com/code/imdevskp/covid-19-analysis-visualization-comparisons#Date-vs)
* [Worldometers Coronavirus](https://www.worldometers.info/coronavirus/#countries)
* [Johns Hopkins Center for Systems Science and Engineering COVID-19 GitHub](https://github.com/CSSEGISandData/COVID-19)
* [Johns Hopkins Coronavirus Resource Center](https://coronavirus.jhu.edu/map.html)
* [World Population](https://worldpopulationreview.com/countries)

# Libraries 

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors 
import pandas as pd
import random 
import math 
import time 
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR 
from sklearn.metrics import mean_squared_error, mean_absolute_error 
import datetime
import operator 
plt.style.use('seaborn-poster')  # named 'seaborn-v0_8-poster' in matplotlib >= 3.6
%matplotlib inline 
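# IPython.display.set_matplotlib_formats is deprecated; use the matplotlib_inline version
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina')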
import warnings 
warnings.filterwarnings('ignore')

# interactive visualization
import plotly.express as px
import plotly.graph_objs as go
# import plotly.figure_factory as ff
from plotly.subplots import make_subplots



# Data

## Query Countries

In [3]:
countries = ['Taiwan*', 'US', 'Hong Kong', 'Vietnam', 'China', 'India']
# countries = ['US']

## Import Data

In [4]:
confirmed_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recoveries_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
population_df = pd.read_csv('population.csv')

In [5]:
recoveries_df

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,3/27/22,3/28/22,3/29/22,3/30/22,3/31/22,4/1/22,4/2/22,4/3/22,4/4/22,4/5/22
0,,Afghanistan,33.939110,67.709953,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,,Albania,41.153300,20.168300,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,,Algeria,28.033900,1.659600,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,,Andorra,42.506300,1.521800,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,,Angola,-11.202700,17.873900,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264,,West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
265,,Winter Olympics 2022,39.904200,116.407400,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
266,,Yemen,15.552727,48.516388,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
267,,Zambia,-13.133897,27.849332,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [6]:
population_df

Unnamed: 0,Rank,name,pop2022,pop2021,GrowthRate,area,Density
0,203,Monaco,39.783,39.511,1.0069,2,19891.5000
1,115,Singapore,5943.546,5896.686,1.0079,710,8371.1915
2,105,Hong Kong,7604.299,7552.810,1.0068,1104,6887.9520
3,156,Bahrain,1783.983,1748.296,1.0204,765,2332.0039
4,176,Maldives,540.985,543.617,0.9952,300,1803.2833
...,...,...,...,...,...,...,...
204,183,Iceland,345.393,343.353,1.0059,103000,3.3533
205,146,Namibia,2633.874,2587.344,1.0180,825615,3.1902
206,173,Western Sahara,626.161,611.875,1.0233,266000,2.3540
207,137,Mongolia,3378.078,3329.289,1.0147,1564110,2.1597


## Process Data

In [7]:
def daily_increase(data):
    d = [] 
    for i in range(len(data)):
        if i == 0:
            d.append(data[0])
        else:
            d.append(data[i] - data[i-1])
    return d 

def moving_average(data, window_size):
    # trailing mean over the current value and up to window_size - 1 earlier values;
    # the slice is never empty, so the first entry is not NaN
    averages = []
    for i in range(len(data)):
        start = max(0, i - window_size + 1)
        averages.append(np.mean(data[start : i + 1]))
    return averages

window = 7
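
The two helpers above can also be written with pandas instead of explicit loops. The following is a minimal sketch, not the notebook's own code: `daily_increase_series` and `trailing_mean` are hypothetical names, and the rolling window here includes the current day, which may differ by one position from a loop that averages only earlier days.

```python
import pandas as pd


def daily_increase_series(cumulative):
    """Day-over-day change; the first day keeps its own cumulative value."""
    s = pd.Series(cumulative, dtype="float64")
    # diff() leaves NaN in the first slot; fill it from the original series
    return s.diff().fillna(s)


def trailing_mean(values, window_size):
    """Mean over the current value and up to window_size - 1 earlier values."""
    s = pd.Series(values, dtype="float64")
    # min_periods=1 lets the first few days use a shorter window instead of NaN
    return s.rolling(window=window_size, min_periods=1).mean()


# Toy cumulative counts (made-up numbers, for illustration only)
daily = daily_increase_series([1, 3, 6, 10])
smoothed = trailing_mean(daily, window_size=2)
```

Both functions return a `pd.Series`, so `.tolist()` or `.to_numpy()` is needed wherever the rest of the notebook expects a plain list or array.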


In [8]:
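# TODO: make the start date and end date configurable

# Get dates
offsetDays = 300  # number of leading days to skip when building each DataFrame
cols = confirmed_df.keys()
date = np.array(cols)[4:-1]  # skip the 4 metadata columns and the last date column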

for country in countries: 
    # get population in millions (pop2022 is listed in thousands)
    population = population_df[population_df['name']==country]['pop2022']
    population = int(population.iloc[0])/1e3
    print( population )

    
    # Search from 'Country/Region' or 'Province/State'
    searchFrom = 'Country/Region'
    aRow = confirmed_df[confirmed_df[searchFrom]==country]
    if aRow.shape[0] == 0:
        searchFrom = 'Province/State'

    # Get data
    totalConfirmedRows = confirmed_df[confirmed_df[searchFrom]==country]
    totalDeathsRows = deaths_df[deaths_df[searchFrom]==country]

    totalConfirmed = []
    totalDeaths = []
    # skip the first 4 columns. they are state, country, lat, and long 
    for i in range(4, len(date)+4):
        ConfirmedVal = sum(np.array(totalConfirmedRows)[:,i])
        DeathVal = sum(np.array(totalDeathsRows)[:,i])
        totalConfirmed.append(ConfirmedVal)
        totalDeaths.append(DeathVal)

    totalMortality = []
    for i in range(len(totalConfirmed)):
        if totalConfirmed[i] == 0:
            totalMortality.append(0)
        else: 
            totalMortality.append(totalDeaths[i]/totalConfirmed[i])

    # Create DataFrame 
    new_df = pd.DataFrame( 
            {   'Date': date[offsetDays:-1], 
                'Total Confirmed': moving_average(totalConfirmed, window)[offsetDays:-1], 
                'Daily Confirmed': moving_average(daily_increase(totalConfirmed), window)[offsetDays:-1], 
                'Total Confirmed Per 1M Population' : np.array(moving_average(totalConfirmed, window)[offsetDays:-1])/population, 
                'Total Deaths': moving_average(totalDeaths, window)[offsetDays:-1],
                'Daily Deaths': moving_average(daily_increase(totalDeaths), window)[offsetDays:-1], 
                'Total Deaths Per 1M Population' : np.array(moving_average(totalDeaths, window)[offsetDays:-1])/population, 
                'Total Mortality': moving_average(totalMortality, window)[offsetDays:-1],
                # 'Daily Mortality': daily_increase(totalMortality), 
            } 
        )


    # Save DataFrame 
    country = country.replace('*', '')  # 'Taiwan*' -> 'Taiwan'
    new_df.to_csv(country+'.csv')

    print('Processing: {}'.format(country))
    # print(new_df.tail())




23.888
Processing: Taiwan
334.805
Processing: US
7.604
Processing: Hong Kong
98.953
Processing: Vietnam
1448.471
Processing: China
1406.631
Processing: India
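
The processing cell notes that the start and end dates should ideally be configurable. One possible approach is sketched below, assuming the JHU wide layout of four metadata columns followed by `m/d/yy` date columns. `select_date_columns` is a hypothetical helper and the toy table is made up for illustration.

```python
import pandas as pd


def select_date_columns(df, start, end, meta_cols=4):
    """Return the date-column labels of a wide table that fall within [start, end]."""
    date_labels = df.columns[meta_cols:]
    # JHU column labels look like '1/22/20' (month/day/two-digit year)
    parsed = pd.to_datetime(date_labels, format="%m/%d/%y")
    mask = (parsed >= pd.Timestamp(start)) & (parsed <= pd.Timestamp(end))
    return list(date_labels[mask])


# Toy table with two provinces of one country (made-up numbers)
toy = pd.DataFrame({
    "Province/State": ["A", "B"],
    "Country/Region": ["X", "X"],
    "Lat": [0.0, 0.0],
    "Long": [0.0, 0.0],
    "1/22/20": [1, 2],
    "1/23/20": [2, 4],
    "1/24/20": [3, 6],
})

cols = select_date_columns(toy, "2020-01-23", "2020-01-24")
# Sum all rows of the country for each selected date in one step
totals = toy.loc[toy["Country/Region"] == "X", cols].sum(axis=0)
```

Selecting columns by parsed date rather than by a fixed integer offset keeps the window stable as new date columns are added to the source file.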


# Visualization

In [9]:
def ploty_line(col, log_plot=False):
    frames = []
    for country in countries:
        # read the per-country CSV written by the processing step
        country = country.replace('*', '')
        df = pd.read_csv(country+'.csv')
        # tag every row with its country
        df['Country'] = country
        frames.append(df)
    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    df2 = pd.concat(frames, ignore_index=True)

    fig = px.line(
        df2, x='Date', y=col, color='Country',
        height=600, width=750, title=col, log_y=log_plot, 
        # color_discrete_sequence = px.colors.cyclical.mygbm 
    )
    fig.update_layout(showlegend=True)     
    fig.show()
    # fig.write_image("fig.png")
    # fig.to_image(format="png", engine="orca")

    
ploty_line('Total Confirmed', log_plot=True)

ValueError: 
The orca executable is required to export figures as static images,
but it could not be found on the system path.

If you haven't installed orca yet, you can do so using conda as follows:

    $ conda install -c plotly plotly-orca

Alternatively, see other installation methods in the orca project README at
https://github.com/plotly/orca

After installation is complete, no further configuration should be needed.

If you have installed orca, then for some reason plotly.py was unable to
locate it. In this case, set the `plotly.io.orca.config.executable`
property to the full path of your orca executable. For example:

    >>> plotly.io.orca.config.executable = '/path/to/orca'

After updating this executable property, try the export operation again.
If it is successful then you may want to save this configuration so that it
will be applied automatically in future sessions. You can do this as follows:

    >>> plotly.io.orca.config.save()

If you're still having trouble, feel free to ask for help on the forums at
https://community.plot.ly/c/api/python


In [None]:
ploty_line('Total Confirmed', log_plot=True)


ValueError: 
Cannot infer image type from output path 'Total Confirmedpng'.
Please add a file extension or specify the type using the format parameter.
For example:

    >>> import plotly.io as pio
    >>> pio.write_image(fig, file_path, format='png')
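
The error above comes from an output path without a dot before the extension (`'Total Confirmedpng'`). A small sketch of building the file name explicitly follows. `image_path` is a hypothetical helper, and the commented `fig.write_image` call assumes a static-export engine such as kaleido (or orca) is installed.

```python
from pathlib import Path


def image_path(title, fmt="png", out_dir="."):
    """Build an output path such as ./total_confirmed.png from a plot title."""
    # lower-case the title and join its words with underscores
    stem = "_".join(title.lower().split())
    return Path(out_dir) / f"{stem}.{fmt}"


path = image_path("Total Confirmed")

# With a plotly figure `fig` in scope, the export would then be:
# fig.write_image(str(path))  # requires a static-export engine, e.g. kaleido
```

Because the path ends in `.png`, plotly can infer the image type and the `format` argument becomes optional.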


In [None]:
# ,Date,Total Confirmed,Daily Confirmed,Total Deaths,Daily Deaths,Total Mortality
ploty_line('Daily Confirmed', log_plot=True)

In [None]:
ploty_line('Total Confirmed Per 1M Population', log_plot=True)

In [None]:
ploty_line('Total Deaths', log_plot=True)

In [None]:
ploty_line('Daily Deaths', log_plot=True)

In [None]:
ploty_line('Total Mortality', log_plot=True)


In [None]:
ploty_line('Total Deaths Per 1M Population', log_plot=True)