# Project Title: Exploring the Impact of COVID-19: A Multi-faceted Analysis

# Temporal Analysis:

## Evolution of COVID-19 Cases, Deaths, and Recoveries:

Early Phase (December 2019 - March 2020): Initial outbreak in Wuhan, China, followed by gradual global spread. Exponential growth in cases, concentrated in Asia and Europe.

First Wave (April 2020 - June 2020): Widespread community transmission across continents. Lockdowns and public health measures implemented globally, leading to a plateau in cases in some regions.

Summer 2020: Relative lull in transmission for some countries, while others experienced secondary waves.

Fall/Winter 2020-2021: Resurgence of cases driven by colder weather and new virus variants. Record highs in daily cases and deaths globally.

Vaccination Rollout (2021 onwards): Gradual decline in cases and deaths in countries with high vaccination rates. Emergence of new variants posing challenges.

Present Day (January 2024): Fluctuations in case numbers depending on variant dominance and public health measures. Focus on managing long-term impacts and equitable access to vaccines and boosters.

## Key Milestones:

December 2019: First reported cases in Wuhan, China.

January 2020: WHO declares the outbreak a Public Health Emergency of International Concern.

March 2020: WHO characterizes COVID-19 as a pandemic.

December 2020: First vaccines approved for emergency use.

January 2021: Vaccination campaigns expand globally.

November 2021: Omicron variant detected, driving another wave of infections.

# Geographic Distribution:
## Regions and Countries:

Highest case burden: North America, Europe, South America, India, Southeast Asia.

Highest death rates: Central and Eastern Europe, South America, Africa.

Highest recovery rates: Oceania, Western Europe, East Asia.

# Data Collection:

In [32]:
import pandas as pd # data manipulation, analysis, cleaning
import numpy as np # mathematical calculations
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [33]:
country_wise = pd.read_csv('dataset/country_wise_latest.csv')
day_wise = pd.read_csv('dataset/day_wise.csv')
worldometer_data = pd.read_csv('dataset/worldometer_data.csv')
DD_CW1 = pd.read_csv('dataset/full_grouped.csv')
DD_CW2 = pd.read_csv('dataset/covid_19_clean_complete.csv')
usa = pd.read_csv('dataset/usa_county_wise.csv')

# Data Exploration:

# Country wise Analysis

In [34]:
country_wise.head()

Unnamed: 0,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
0,Afghanistan,36263,1269,25198,9796,106,10,18,3.5,69.49,5.04,35526,737,2.07,Eastern Mediterranean
1,Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.0,Europe
2,Algeria,27973,1163,18837,7973,616,8,749,4.16,67.34,6.17,23691,4282,18.07,Africa
3,Andorra,907,52,803,52,10,0,0,5.73,88.53,6.48,884,23,2.6,Europe
4,Angola,950,41,242,667,18,1,0,4.32,25.47,16.94,749,201,26.84,Africa


## 1. Total confirmed cases from each country

In [35]:
def plot_map(df, location_names, location_mode, data_col, scope, hover_name=None, title=None, palette='Sunset'):
    # Fall back to the location column for hover labels when none is given
    if hover_name is None:
        hover_name = location_names
    fig = px.choropleth(df,
                        locations=location_names,
                        locationmode=location_mode,
                        color=data_col,
                        scope=scope,
                        hover_name=hover_name,
                        hover_data=[data_col],  # pass a list of column names
                        title=title,
                        color_continuous_scale=palette)
    fig.update_layout(margin={"r": 0, "l": 0, "b": 0})
    fig.show()
    
plot_map(country_wise,location_names='Country/Region',location_mode='country names',data_col='Confirmed',scope='world',palette='Peach',title='Confirmed cases in world')
# plot_map(country_wise,location_names='Country/Region',location_mode='country names',data_col='Deaths',scope='world',palette='amp', title='Death cases in world')
plot_map(country_wise,location_names='Country/Region',location_mode='country names',data_col='Recovered',scope='world',palette='Greens',title='Recovered cases in world')
# plot_map(country_wise,location_names='Country/Region',location_mode='country names',data_col='Active',scope='world',palette='Oranges',title='Active cases in world')

* The USA has the most confirmed cases, approximately 4.29M, followed by Brazil (2.44M) and India (1.48M).
* The USA also ranks first in deaths (148K), followed by Brazil (87.6K) and India (33.4K). It has the highest number of active cases as well, at 2.81M.
* In recovered cases, Brazil leads with over 1.84M, followed by the USA (1.32M) and India (951.16K).
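A ranking like this can also be read straight from the table instead of off the map. The sketch below is a minimal illustration: `top_countries` is a hypothetical helper that is not part of the notebook, and the sample holds only the five rows shown in the `country_wise.head()` output, so its result describes that sample and not the full dataset.

```python
import pandas as pd

# Sample limited to the five rows displayed by country_wise.head() above.
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
})

def top_countries(df, col, n=3):
    """Return the n rows with the largest values in `col` (hypothetical helper)."""
    return df.nlargest(n, col)[["Country/Region", col]].reset_index(drop=True)

top_confirmed = top_countries(sample, "Confirmed")
print(top_confirmed)
```

Calling `top_countries(country_wise, 'Deaths')` on the full table should give the same kind of ranking for any numeric column.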

In [36]:
# top 30 countries

def get_countries(df,col,color=px.colors.qualitative.Light24):
    highest_col_cases = df.sort_values(col,ascending=False)[:30]

    fig = px.bar(highest_col_cases, 
                 x='Country/Region',
                 y=col,
                 color='WHO Region',
                 title=f'Top 30 Countries with highest {col} cases',
                 text_auto='.2s',
                 color_discrete_sequence=color
                 )
    fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
    fig.update_layout(xaxis_categoryorder = 'total descending')
    fig.show()
    
get_countries(df=country_wise,col='Confirmed',color=px.colors.qualitative.Prism)

In [37]:
get_countries(country_wise,'Deaths',px.colors.qualitative.Set1)

In [38]:
get_countries(country_wise,'Recovered',px.colors.qualitative.Dark2)

In [39]:
get_countries(country_wise,'1 week % increase',px.colors.qualitative.Set2)

The bar charts above show the top 30 countries by Confirmed, Deaths, Recovered and 1 week % increase.
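The `1 week change` and `1 week % increase` columns appear to be derived from `Confirmed` and `Confirmed last week`. The sketch below checks that reading against the five rows shown in the `country_wise.head()` output; `weekly_pct_increase` is a hypothetical helper, and the formula is an assumption that happens to reproduce those five rows, not a documented definition from the dataset.

```python
import pandas as pd

# Five rows copied from the country_wise.head() output above.
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Confirmed last week": [35526, 4171, 23691, 884, 749],
})

def weekly_pct_increase(df):
    """Percentage growth in confirmed cases relative to last week's count."""
    change = df["Confirmed"] - df["Confirmed last week"]
    return (100 * change / df["Confirmed last week"]).round(2)

sample["1 week change"] = sample["Confirmed"] - sample["Confirmed last week"]
sample["1 week % increase"] = weekly_pct_increase(sample)
print(sample)
```

Because the denominator is last week's count, countries with few cases can show a large percentage increase from a small absolute change, which is worth keeping in mind when reading the last bar chart.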

# Data Analysis:

## Day wise

In [48]:
day_wise.head(10)

Unnamed: 0,Date,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,No. of countries
0,2020-01-22,555,17,28,510,0,0,0,3.06,5.05,60.71,6
1,2020-01-23,654,18,30,606,99,1,2,2.75,4.59,60.0,8
2,2020-01-24,941,26,36,879,287,8,6,2.76,3.83,72.22,9
3,2020-01-25,1434,42,39,1353,493,16,3,2.93,2.72,107.69,11
4,2020-01-26,2118,56,52,2010,684,14,13,2.64,2.46,107.69,13
5,2020-01-27,2927,82,61,2784,809,26,9,2.8,2.08,134.43,16
6,2020-01-28,5578,131,107,5340,2651,49,46,2.35,1.92,122.43,16
7,2020-01-29,6166,133,125,5908,588,2,18,2.16,2.03,106.4,18
8,2020-01-30,8234,171,141,7922,2068,38,16,2.08,1.71,121.28,20
9,2020-01-31,9927,213,219,9495,1693,42,78,2.15,2.21,97.26,24


In [41]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=day_wise.Date, y=day_wise.Confirmed,
                         name='Confirmed',
                         mode='lines+markers',
                         fill='tozeroy'
                    ))
fig.add_trace(go.Scatter(x=day_wise.Date, y=day_wise.Deaths,
                         name='Deaths',
                         mode= 'lines+markers',
                         fill='tozeroy'))
fig.add_trace(go.Scatter(x=day_wise.Date, y=day_wise.Recovered,
                         name='Recovered',
                         mode= 'lines+markers',
                         fill='tozeroy'))

fig.update_layout(title='Cases over time',
                 xaxis_title='Date',
                 yaxis_title='Number of cases')

fig.show()

As of July 27, 2020, there were around 16.48M confirmed cases, 9.46M recovered cases and 654K deaths worldwide.
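Totals like these come from the most recent row of the cumulative table. The sketch below shows one way to pull that row, and it also checks two relationships that hold in the rows displayed by `day_wise.head(10)`: `Active` equals `Confirmed` minus `Deaths` minus `Recovered`, and `New cases` is the day-over-day difference of `Confirmed`. The sample uses only the first three displayed rows, so `latest` here is January 24, not July 27.

```python
import pandas as pd

# First three rows of the day_wise.head(10) output above (not the full file).
sample = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-23", "2020-01-24"],
    "Confirmed": [555, 654, 941],
    "Deaths": [17, 18, 26],
    "Recovered": [28, 30, 36],
    "Active": [510, 606, 879],
})
sample["Date"] = pd.to_datetime(sample["Date"])

# The row with the most recent date holds the cumulative totals to date.
latest = sample.loc[sample["Date"].idxmax()]

# In these rows, Active equals Confirmed minus Deaths minus Recovered.
derived_active = sample["Confirmed"] - sample["Deaths"] - sample["Recovered"]
active_matches = bool((derived_active == sample["Active"]).all())

# Daily new cases are the day-over-day difference of the cumulative count.
new_cases = sample["Confirmed"].diff().fillna(0).astype(int)

print(latest[["Confirmed", "Deaths", "Recovered"]])
print(active_matches, new_cases.tolist())
```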

In [42]:
fig = px.bar(day_wise, 
             x="Date",
             y=["Deaths / 100 Cases",'Recovered / 100 Cases'],
             color_discrete_map = {'Deaths / 100 Cases': '#d43d3d', 
                                  'Recovered / 100 Cases': '#94e864',
                                  },
             barmode="stack",
             title='Change in Death and Recovered per 100 cases over time')
fig.show()

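* The ratio of recovered cases to confirmed cases was higher than the ratio of deaths to confirmed cases for most of the period. The exception is the first days of the outbreak, when the two were close and deaths per 100 cases briefly exceeded recoveries per 100 cases (for example, 2.16 versus 2.03 on 2020-01-29).
* On average, there were around 34 recovered cases and 5 deaths per 100 confirmed cases.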
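The per-100 columns can be reproduced from the cumulative counts. The sketch below assumes each ratio is 100 times the numerator divided by confirmed cases, rounded to two decimals, which matches the rows shown in `day_wise.head(10)`; `per_100` is a hypothetical helper. Only three rows are used, so the averages it prints differ from the full-period figures quoted above.

```python
import pandas as pd

# First three rows of the day_wise.head(10) output above.
sample = pd.DataFrame({
    "Confirmed": [555, 654, 941],
    "Deaths": [17, 18, 26],
    "Recovered": [28, 30, 36],
})

def per_100(numerator, denominator):
    """Express numerator as a rate per 100 units of denominator, rounded to 2 decimals."""
    return (100 * numerator / denominator).round(2)

sample["Deaths / 100 Cases"] = per_100(sample["Deaths"], sample["Confirmed"])
sample["Recovered / 100 Cases"] = per_100(sample["Recovered"], sample["Confirmed"])

# Column means over the sample rows only.
averages = sample[["Deaths / 100 Cases", "Recovered / 100 Cases"]].mean()
print(sample)
print(averages)
```

On the full table, `day_wise[['Deaths / 100 Cases', 'Recovered / 100 Cases']].mean()` would give the period averages directly.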

In [49]:
# parsing dates
DD_CW1['Date'] = pd.to_datetime(DD_CW1['Date'],errors='raise')
DD_CW1['month'] = DD_CW1['Date'].dt.month
DD_CW1['weekday'] = DD_CW1['Date'].dt.day_name()
DD_CW1.head(10)

Unnamed: 0,Date,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,WHO Region,month,weekday
0,2020-01-22,Afghanistan,0,0,0,0,0,0,0,Eastern Mediterranean,1,Wednesday
1,2020-01-22,Albania,0,0,0,0,0,0,0,Europe,1,Wednesday
2,2020-01-22,Algeria,0,0,0,0,0,0,0,Africa,1,Wednesday
3,2020-01-22,Andorra,0,0,0,0,0,0,0,Europe,1,Wednesday
4,2020-01-22,Angola,0,0,0,0,0,0,0,Africa,1,Wednesday
5,2020-01-22,Antigua and Barbuda,0,0,0,0,0,0,0,Americas,1,Wednesday
6,2020-01-22,Argentina,0,0,0,0,0,0,0,Americas,1,Wednesday
7,2020-01-22,Armenia,0,0,0,0,0,0,0,Europe,1,Wednesday
8,2020-01-22,Australia,0,0,0,0,0,0,0,Western Pacific,1,Wednesday
9,2020-01-22,Austria,0,0,0,0,0,0,0,Europe,1,Wednesday


In [44]:
def plot_bubble(df, x, y, color=None, size=None, palette=px.colors.qualitative.G10,log=True):
    fig = px.scatter(df,
                     x=x,
                     y=y,
                     size=size,
                     color=color,
                     hover_name="Country/Region",
                     size_max=50,
                     color_discrete_sequence=palette,
                     title=f'{x} Versus {y} across countries coloured by {color}',
                     log_y=log,
                     log_x=log)
    fig.show()
    
plot_bubble(df=worldometer_data, x="TotalCases", y="TotalDeaths", size="Population", color="WHO Region", palette=px.colors.qualitative.G10)
plot_bubble(df=worldometer_data, x="TotalCases", y="TotalRecovered", size="Population", color="WHO Region", palette=px.colors.qualitative.Dark2)
# plot_bubble(df=worldometer_data, x="TotalCases", y="ActiveCases", size="Population", color="WHO Region", palette=px.colors.qualitative.D3)
# plot_bubble(df=worldometer_data, x="TotalCases", y="Serious,Critical", size="Population", color="WHO Region", palette=px.colors.qualitative.Light24)

As total cases increase, deaths, recoveries, active cases and serious cases increase as well.
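The visual trend in the bubble plots can be backed by a number. Since both axes are logarithmic, one reasonable summary is the Pearson correlation of the log-transformed columns. The sketch below is an illustration only: `log_correlation` is a hypothetical helper, and the sample reuses the five `country_wise.head()` rows (columns `Confirmed` and `Deaths`) rather than `worldometer_data`, whose rows are not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Five rows copied from the country_wise.head() output above.
sample = pd.DataFrame({
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
})

def log_correlation(df, x, y):
    """Pearson correlation of log10(x) and log10(y), skipping non-positive values."""
    valid = df[(df[x] > 0) & (df[y] > 0)]
    return float(np.corrcoef(np.log10(valid[x]), np.log10(valid[y]))[0, 1])

r = log_correlation(sample, "Confirmed", "Deaths")
print(round(r, 3))
```

For the bubble plots themselves, the equivalent call would be `log_correlation(worldometer_data, 'TotalCases', 'TotalDeaths')`. A value near 1 supports the observation, although five rows are far too few to draw conclusions from.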

# Technical and Analytical Approach:

In [45]:
def plot_bar(df, x, y, palette=px.colors.qualitative.Pastel):
    weekday_order = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
    # Plot the DataFrame passed in as df rather than the global DD_CW1
    fig = px.bar(df,
                 x=x,
                 y=y,
                 facet_col='month',
                 color='Country/Region',
                 category_orders={"weekday": weekday_order},
                 color_discrete_sequence=palette,
                 title=f'{y} cases from each country on different {x} through different months')
    fig.show()
    
plot_bar(DD_CW1, 'weekday', 'Confirmed',px.colors.qualitative.Safe )

In [46]:
plot_bar(DD_CW1, 'weekday', 'Recovered',px.colors.qualitative.Alphabet )

After June, the total cases recorded on each weekday were very close to one another. The UK had the most deaths after April.
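The weekday comparison can also be made numerically by summing a column per weekday and putting the result in calendar order. The sketch below uses the ten `New cases` values from the `day_wise.head(10)` output, so it covers late January 2020 only and says nothing about the months discussed above.

```python
import pandas as pd

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Dates and New cases copied from the day_wise.head(10) output above.
sample = pd.DataFrame({
    "Date": pd.date_range("2020-01-22", periods=10, freq="D"),
    "New cases": [0, 99, 287, 493, 684, 809, 2651, 588, 2068, 1693],
})
sample["weekday"] = sample["Date"].dt.day_name()

# Sum per weekday, then reorder from alphabetical to Monday..Sunday.
by_weekday = (
    sample.groupby("weekday")["New cases"]
    .sum()
    .reindex(weekday_order)
)
print(by_weekday)
```

Adding `month` to the `groupby` keys would give the per-month breakdown that the faceted bar charts display.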

In [50]:
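fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=("Date vs Confirmed", "Date vs Recovered", "Date vs Deaths", "Date vs Active"),
    shared_xaxes='all')

fig.add_trace(go.Scatter(x=DD_CW1['Date'], y=DD_CW1['Confirmed'], marker=dict(color='#FDB344')),
              row=1, col=1)

fig.add_trace(go.Scatter(x=DD_CW1['Date'], y=DD_CW1['Recovered'], marker=dict(color='#94e864')),
              row=1, col=2)

# Deaths and Active fill the second row, so the grid matches the four series discussed below
fig.add_trace(go.Scatter(x=DD_CW1['Date'], y=DD_CW1['Deaths'], marker=dict(color='#d43d3d')),
              row=2, col=1)

fig.add_trace(go.Scatter(x=DD_CW1['Date'], y=DD_CW1['Active']),
              row=2, col=2)

fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)

fig.update_yaxes(title_text="Confirmed Cases", row=1, col=1)
fig.update_yaxes(title_text="Recovered Cases", row=1, col=2)
fig.update_yaxes(title_text="Death Cases", row=2, col=1)
fig.update_yaxes(title_text="Active Cases", row=2, col=2)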

fig.update_layout(title_text="Subplots showing different stats across Date", height=700,showlegend=False)

fig.show()

We see a sudden increase in death cases in May, which may be due to certain nations such as the UK and the US.

The scatter subplots above show confirmed, recovered, death and active cases over time.
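Because `full_grouped.csv` holds one row per country per date (as the `DD_CW1.head(10)` output shows), passing its columns directly to a single trace mixes every country's points into one series. One option, sketched below, is to sum across countries per date before plotting. The values are made-up toy numbers for two hypothetical countries `A` and `B`, chosen only to show the reshaping step.

```python
import pandas as pd

# Made-up toy values for two hypothetical countries; not taken from the dataset.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-02"]),
    "Country/Region": ["A", "B", "A", "B"],
    "Confirmed": [10, 5, 14, 9],
    "Recovered": [2, 1, 3, 4],
})

# One row per date, with counts summed over all countries.
daily_totals = toy.groupby("Date", as_index=False)[["Confirmed", "Recovered"]].sum()
print(daily_totals)
```

The same `groupby` applied to `DD_CW1` would give a worldwide curve per column. Grouping by `['Date', 'Country/Region']` and filtering to a few countries would instead show which nations drive a given spike.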