# Will Covid-19 Coexist with Humans Forever?

#### Covid-19 is caused by a virus called SARS-CoV-2. It is a member of the coronavirus family, which includes both the common viruses behind ordinary colds and viruses that cause more serious but less frequent diseases. Coronaviruses, like many other respiratory viruses, spread quickly through droplets emitted from the mouth or nose when breathing, coughing, sneezing, or speaking. Covid-19 was first identified in December 2019 in Wuhan, China. It is highly infectious and has spread rapidly around the world. Almost three years after it was first identified, the disease is still spreading in most countries. This raises the question: will Covid-19 coexist with humans forever?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import altair as alt

In [2]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

#### To dig further into the topic, we searched for datasets online and decided to use "United States COVID-19 Cases and Deaths by State over Time" from DATA.GOV, together with "time_series_covid19_confirmed_global.csv" and "time_series_covid19_deaths_global.csv" from the Johns Hopkins University CSSEGISandData repository. The CDC publishes daily totals of COVID-19 cases and deaths online. The figures on its COVID-19 website and COVID Data Tracker are based on the most recent numbers reported by states, territories, and other jurisdictions.

In [3]:
# main dataset:
URL = 'https://data.cdc.gov/api/views/9mfq-cb36/rows.csv?accessType=DOWNLOAD'
pandemic = pd.read_csv(URL)

# supportive datasets:
URL1 = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
global_confirm = pd.read_csv(URL1)

URL2 = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
global_deaths = pd.read_csv(URL2)

## Global Covid-19 Trend Analysis

In [4]:
# countries we selected
countries = ['China', 'US', 'United Kingdom', 'Australia', 'Argentina', 
             'Brazil', 'Canada', 'Denmark', 'France', 'India', 'Korea, South', 
             'Japan', 'Russia', 'Singapore', 'Ukraine', 'Qatar', 'South Africa', 
             'Italy', 'Poland', 'Spain', 'Saudi Arabia', 'Norway']

# Extract these countries' data from the big data frame
def extract_specific_country_df(countries, process_df, df):
    
    for country in countries:
        sub_data = df.loc[df['Country/Region'] == country]
        
        # For these 4 countries, drop the rows that contain missing values
        if country in ['China', 'United Kingdom', 'Denmark', 'France']:
            sub_data = sub_data.dropna()
        
        process_df.append(sub_data)
    
    return process_df


# Concatenate these countries' data into the big data frame
def concatenate_df(df_to_process, concat_df, val_name, columns):
    
    for x in df_to_process:
        x = x.reset_index().melt('Country/Region', value_vars=columns, var_name='Date', value_name=val_name)
        x['Year-Month'] = pd.to_datetime(x['Date']).dt.strftime('%Y-%m')
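        # Select the value column explicitly so the string 'Date' column is not aggregated.
        # Note: the JHU series are cumulative, so this is a monthly sum of daily running totals.
        data = x.groupby(['Country/Region', 'Year-Month'])[val_name].sum().reset_index()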
        concat_df.append(data)
        
    final_df = pd.concat(concat_df)    
    
    return final_df
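
#### The reshaping performed by `concatenate_df` can be illustrated on a toy frame. The sketch below is not the notebook's code: it assumes the JHU columns hold cumulative counts (running totals), and the helper name `monthly_new_counts` and all sample numbers are invented for illustration. Instead of summing daily running totals, it keeps the last cumulative value of each month and differences consecutive months to obtain that month's new counts.

```python
import pandas as pd

# Toy wide-format frame mimicking the JHU layout (values are invented).
wide = pd.DataFrame({
    'Country/Region': ['A', 'B'],
    '1/30/20': [1, 0],
    '1/31/20': [3, 0],
    '2/1/20': [4, 2],
    '2/2/20': [10, 5],
})

def monthly_new_counts(wide_df, val_name):
    """Hypothetical helper: cumulative wide table -> monthly new counts."""
    date_cols = [c for c in wide_df.columns if c != 'Country/Region']
    long_df = wide_df.melt('Country/Region', value_vars=date_cols,
                           var_name='Date', value_name=val_name)
    long_df['Date'] = pd.to_datetime(long_df['Date'], format='%m/%d/%y')
    long_df['Year-Month'] = long_df['Date'].dt.strftime('%Y-%m')
    # Sum provinces per day, then keep the last running total of each month.
    daily = long_df.groupby(['Country/Region', 'Date', 'Year-Month'],
                            as_index=False)[val_name].sum()
    month_end = (daily.sort_values('Date')
                      .groupby(['Country/Region', 'Year-Month'],
                               as_index=False)[val_name].last())
    # Difference consecutive month-end totals; the first month keeps its total.
    month_end[val_name] = (month_end.groupby('Country/Region')[val_name]
                                    .diff().fillna(month_end[val_name]))
    return month_end

monthly = monthly_new_counts(wide, 'Confirmed Cases')
print(monthly)
```

#### Whether to plot running totals or monthly new counts is a design choice. New counts make rises and falls directly visible, while running totals can only flatten.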

In [5]:
process_df_confirm = [] # list of per-country dataframes of confirmed cases to process
concate_df_confirm = [] # list of reshaped dataframes to concatenate
columns = global_confirm.columns[4:] # date columns only (skip the 4 metadata columns)
df_to_process_confirmed = extract_specific_country_df(countries, process_df_confirm, global_confirm)
global_confirmed_data = concatenate_df(df_to_process_confirmed, concate_df_confirm, 'Confirmed Cases', columns)

process_df_death = [] # list of per-country dataframes of deaths to process
concate_df_death = [] # list of reshaped dataframes to concatenate
columns = global_deaths.columns[4:] # date columns only (skip the 4 metadata columns)
df_to_process_death = extract_specific_country_df(countries, process_df_death, global_deaths)
global_death_data = concatenate_df(df_to_process_death, concate_df_death, 'Death Cases', columns)

print('Other Countries Confirmed Cases Distribution:')
print(global_confirmed_data)
print()
print('Other Countries Death Cases Distribution:')
print(global_death_data)

Other Countries Confirmed Cases Distribution:
   Country/Region Year-Month  Confirmed Cases
0           China    2020-01            38008
1           China    2020-02          1633361
2           China    2020-03          2515196
3           China    2020-04          2500064
4           China    2020-05          2605289
..            ...        ...              ...
31         Norway    2022-08         45207680
32         Norway    2022-09         43840346
33         Norway    2022-10         45370654
34         Norway    2022-11         44009523
35         Norway    2022-12          1469233

[792 rows x 3 columns]

Other Countries Death Cases Distribution:
   Country/Region Year-Month  Death Cases
0           China    2020-01          889
1           China    2020-02        46417
2           China    2020-03        98488
3           China    2020-04       118290
4           China    2020-05       143763
..            ...        ...          ...
31         Norway    2022-08       118593

#### To get a sense of how the virus has spread globally, we import two datasets recorded by JHU (link: https://github.com/CSSEGISandData/COVID-19), one with confirmed cases and one with deaths. Since the data covers almost 200 countries, we pick 22 countries from different regions of the world as targets.
#### The two visualizations below show these data. The sharp drop at December 2022 may look confusing: this report was written at the very beginning of December, so that month is incomplete and its value says nothing about the trend. For the period between January 2020 and November 2022, however, we can estimate a trend for Covid-19. In January 2020, a few countries such as China started to report cases, and other countries followed later. The curves clearly flatten over time, which suggests that case and death numbers are gradually declining. From these two visualizations, we conclude that the death rate of Covid-19 is dropping quickly but never reaches zero. Covid-19 seems likely to coexist with humans for an unpredictable length of time.
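
#### One way to check the claim about a falling death rate is a case fatality ratio (deaths divided by confirmed cases) per month. The sketch below uses invented numbers. The function name `case_fatality_ratio` and the input frames are assumptions, with column names chosen to match `global_confirmed_data` and `global_death_data`.

```python
import pandas as pd

# Invented monthly counts for one country, shaped like the notebook's frames.
confirmed = pd.DataFrame({
    'Country/Region': ['A', 'A', 'A'],
    'Year-Month': ['2020-03', '2021-03', '2022-03'],
    'Confirmed Cases': [1000, 5000, 20000],
})
deaths = pd.DataFrame({
    'Country/Region': ['A', 'A', 'A'],
    'Year-Month': ['2020-03', '2021-03', '2022-03'],
    'Death Cases': [50, 100, 40],
})

def case_fatality_ratio(confirmed_df, death_df):
    """Hypothetical helper: deaths / confirmed cases per country and month."""
    merged = confirmed_df.merge(death_df, on=['Country/Region', 'Year-Month'])
    # Months with zero confirmed cases give NaN instead of a division by zero.
    cases = merged['Confirmed Cases'].where(merged['Confirmed Cases'] > 0)
    merged['CFR'] = merged['Death Cases'] / cases
    return merged

cfr = case_fatality_ratio(confirmed, deaths)
print(cfr[['Year-Month', 'CFR']])
```

#### A ratio built from same-month counts ignores the delay between diagnosis and death, so it is only a rough indicator.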

In [6]:
# Build an interactive line chart; the plotted column depends on which dataframe is passed in
def draw_line_chart(df):
    if 'Confirmed Cases' in df.columns:
        category = 'Confirmed Cases'
    else:
        category = 'Death Cases'

    df.loc[df[category]==0, category] = np.nan
    dropdown = alt.binding_select(options=df['Country/Region'].unique(), name='Country/Region')
    highlight = alt.selection(type='single', on='mouseover',
                      fields=['Country/Region'], nearest=True)
    selection = alt.selection_single(fields=['Country/Region'], bind=dropdown)

    color = alt.condition(selection,
                 alt.Color('Country/Region:N'), # the country in the dropdown list is selected
                 alt.value('lightgray')) # the country in the dropdown list is not selected

    # opacity inspired by code from "https://github.com/UIUC-iSchool-DataViz/is445_oauoag_fall2022/blob/main/week12/inClass_week12.ipynb"
    opacity = alt.condition(selection, alt.value(1.0), alt.value(0.15))

    chart_base = alt.Chart(df, title= category + " of Different Countries").encode(
        x='Year-Month',
        y=alt.Y(category+':Q', scale=alt.Scale(type='log')),
        color='Country/Region:N',
        tooltip=['Country/Region:N', category+':Q'],
    )
    
    # create the highlight part of the line through increasing the opacity of the circle
    point_chart = chart_base.mark_circle().encode(
        opacity=alt.value(0)
    ).add_selection(highlight).properties(width=800)

    line_chart = chart_base.mark_line().encode(
        color=color, 
        opacity=opacity, 
        size=alt.condition(~highlight, alt.value(1), alt.value(5))
    ).add_selection(selection)

    # Both categories use the same layered chart.
    return point_chart + line_chart


In [7]:
countries_confirmed_lines = draw_line_chart(global_confirmed_data)
countries_confirmed_lines

In [8]:
countries_death_lines = draw_line_chart(global_death_data)
countries_death_lines
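
#### The sharp drop at December 2022 in these charts comes from a partial month. One option is to drop the last month whenever the data ends before that month's final day. This is a minimal sketch, assuming a long-format frame that still has a daily `Date` column (that is, before the monthly grouping). The helper `drop_incomplete_last_month` is hypothetical and not part of the notebook.

```python
import pandas as pd

def drop_incomplete_last_month(long_df, date_col='Date'):
    """Hypothetical helper: remove rows of the final month if it is partial."""
    dates = pd.to_datetime(long_df[date_col])
    last_day = dates.max()
    if last_day.is_month_end:
        return long_df  # the final month is complete, nothing to drop
    in_last_month = dates.dt.to_period('M') == last_day.to_period('M')
    return long_df[~in_last_month]

# Invented daily rows: November is complete, December stops on the 2nd.
toy = pd.DataFrame({
    'Date': ['2022-11-29', '2022-11-30', '2022-12-01', '2022-12-02'],
    'Confirmed Cases': [10, 12, 13, 15],
})
trimmed = drop_incomplete_last_month(toy)
print(trimmed)
```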

## US Domestic Trend Analysis

In [9]:
# Change the date format, keeping the year and month only.
pandemic['submission_date'] = pd.to_datetime(pandemic['submission_date']).dt.strftime('%Y-%m')

# Replace negative new_case and new_death values with 0.
# Assigning the clipped columns back avoids pandas' SettingWithCopyWarning.
pandemic['new_case'] = pandemic['new_case'].clip(lower=0)
pandemic['new_death'] = pandemic['new_death'].clip(lower=0)

# Extract the needed columns and aggregate by state and month.
# A list of columns (double brackets) is required when selecting several columns.
sub = pandemic.groupby(['state', 'submission_date'])[['new_case', 'new_death']].sum().reset_index()
# Zero counts cannot be drawn on a log scale, so mark them as missing.
sub.loc[sub['new_case'] == 0, 'new_case'] = np.nan


In [10]:
# Define domains for the visualizations.
state = sub.state.unique()
time = sub.submission_date.unique()

scale = alt.Scale(domain=state)
color = alt.Color('state:N',scale = scale)
brush = alt.selection_interval(encodings=['x','y'])

#### Comparing countries reveals a similar pattern: the number of people affected by the virus rises sharply at first, then stabilizes, and gradually decreases this year. This suggests that the virus is gradually becoming less severe. What about the United States? Does each state show the same trend?
#### To better understand the domestic situation, we use data recorded by the CDC (link: https://data.cdc.gov/). The bubble plot below shows the number of new Covid-19 cases in each US state by month. The vertical position of a bubble gives the number of new cases, and its size gives the number of new deaths. Because of differences in population, large states such as California, Texas, and Florida consistently have the highest numbers of new cases. This is also reflected in the bar chart of total new cases below. Even though the graph contains many elements, we can still estimate a trend in new case counts. In the data we have, from March 2020 to October 2022, the counts sometimes increase and sometimes decrease, with no significant upward or downward trend. Based on this analysis, we think Covid-19 will remain in circulation and may coexist with humans forever.
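
#### Because raw counts track population size, dividing by population makes states comparable. The sketch below is illustrative only: the population figures are rounded placeholders rather than official numbers, the case counts are invented, and the helper `cases_per_100k` is hypothetical.

```python
import pandas as pd

# Placeholder populations (rounded, for illustration only; not official data).
population = pd.Series({'CA': 39_000_000, 'WY': 580_000}, name='population')

# Invented monthly counts shaped like the notebook's `sub` frame.
toy_sub = pd.DataFrame({
    'state': ['CA', 'WY'],
    'submission_date': ['2022-01', '2022-01'],
    'new_case': [390_000, 11_600],
})

def cases_per_100k(df, pop):
    """Hypothetical helper: add new cases per 100,000 residents."""
    out = df.copy()
    out['cases_per_100k'] = out['new_case'] / out['state'].map(pop) * 100_000
    return out

per_capita = cases_per_100k(toy_sub, population)
print(per_capita)
```

#### In this toy data the larger state has far more cases in absolute terms, but the smaller state has the higher rate per 100,000 residents.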

In [11]:
# First Visualization: Bubble Plot
points = alt.Chart(sub).mark_point().encode(
    alt.X('submission_date', title = 'Time/Month'),
    alt.Y('new_case', title = 'New Cases Number', scale=alt.Scale(type='log')),
    color=alt.condition(brush, color, alt.value('lightgray')),
    size=alt.Size('new_death', scale=alt.Scale(range=[0, 1178]))).add_selection(brush)

#Second Visualization: Bar Chart
bars = alt.Chart(sub).mark_bar().encode(
    y='state:N',
    x='sum(new_case):Q',
    color = alt.condition(~brush,color,alt.value('lightgray'))
).add_selection(brush).transform_filter(brush)


# Forming a Dashboard
dashboard = alt.vconcat(
    points,
    bars,
    data=sub,
    title="New Cases Count of Each State by Month (2020-Current)"
)

dashboard

#### The most likely long-term outcome is that Covid-19 becomes endemic in significant portions of the world, continuously circulating among people but causing fewer cases of severe illness. Years or even decades from now, COVID-19 may eventually turn into a mild childhood illness, similar to the four endemic human coronaviruses that cause the common cold.

## Works Cited

1. Multi-Line Highlight. Altair 4.2.0 documentation. (n.d.). Retrieved December 2, 2022, from https://altair-viz.github.io/gallery/multiline_highlight.html
2. Main dataset: https://data.cdc.gov/
3. Contextual datasets: https://github.com/CSSEGISandData/COVID-19