## **Data Visualization**

Data Source : https://health-infobase.canada.ca/covid-19/#a2

The data visualization part of this notebook focuses on the data source above, specifically the date, gender, age_group, status, and rate_per_100000 columns. By visualising these columns we try to form segments that can be used to identify the passengers who need immediate attention for a potential RT-PCR test.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import numpy as np
from matplotlib.pyplot import figure
from IPython.display import display
from plotly.graph_objects import Layout

import matplotlib as mpl
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use('ggplot') #available styles: https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html

## **Data Extraction**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
os.listdir('/content/drive/MyDrive')

['Untitled document.gdoc',
 'Colab Notebooks',
 'TTC_Delays-main',
 'BANK DATASET',
 'Five Alchemists',
 'ONTARIO-COVID']

In [None]:
AgeGenderDataFile = pd.read_csv('/content/drive/MyDrive/ONTARIO-COVID/ageGender.csv')
AgeGenderDataFile.drop_duplicates(keep='last',inplace=True)
AgeGenderDataFile.dropna(inplace=True)

In [None]:
AgeGenderDataFile

Unnamed: 0,date,status,age_group,gender,count,rate_per_100000
0,2020-01-18,cases,0 to 11,male,1,0.040786
1,2020-01-18,cases,0 to 11,female,1,0.042840
2,2020-01-18,cases,0 to 11,all,2,0.041788
3,2020-01-18,cases,12 to 19,female,1,0.059791
4,2020-01-18,cases,12 to 19,all,1,0.029328
...,...,...,...,...,...,...
18633,2023-05-20,deaths,80+,female,1,0.096246
18634,2023-05-20,deaths,80+,all,2,0.113615
18635,2023-05-20,deaths,all,male,1,0.005166
18636,2023-05-20,deaths,all,female,1,0.005109


In [None]:
AgeGenderDataFile.shape

(18100, 6)
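
The index in the table above runs past 18,600 while the shape reports 18,100 rows, which is consistent with rows being removed by `drop_duplicates` and `dropna`. A minimal sketch of how the two removals could be counted separately, using a made-up toy frame (the values below are illustrative and not taken from `ageGender.csv`):

```python
import pandas as pd

# Toy frame standing in for ageGender.csv; all values are made up.
raw = pd.DataFrame({
    "date": ["2023-01-07", "2023-01-07", "2023-01-07", "2023-01-14"],
    "status": ["cases", "cases", "deaths", "cases"],
    "age_group": ["80+", "80+", "80+", "80+"],
    "gender": ["male", "male", "male", "female"],
    "count": [3, 3, 1, 2],
    "rate_per_100000": [0.5, 0.5, None, 0.3],
})

n_raw = len(raw)

# Step 1: drop exact duplicate rows, keeping the last occurrence.
deduped = raw.drop_duplicates(keep="last")
n_duplicates = n_raw - len(deduped)

# Step 2: drop rows that contain any missing value.
cleaned = deduped.dropna()
n_missing = len(deduped) - len(cleaned)

print(f"raw={n_raw}, duplicates removed={n_duplicates}, rows with NaN removed={n_missing}")
```

Working on copies instead of `inplace=True` keeps the raw frame available, so the row counts before and after each step can be compared.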




## **Basic bar graph of age_group as compared to rate_per_100000**
It can easily be observed that the age groups above 70 account for a large share of the average rate_per_100000.

In [None]:
# Keep rows from December 2022 onward, excluding the aggregate 'all' rows,
# then average rate_per_100000 within each age group
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['age_group'] != 'all') & (AgeGenderDataFile['gender'] != 'all')]
age_group_counts = age_group_counts.groupby('age_group')['rate_per_100000'].mean().reset_index()

# Create a bar plot using Plotly Express
fig = px.bar(age_group_counts, x='age_group', y='rate_per_100000', color='age_group')

# Update layout and labels (the y-axis is a mean rate, not a count)
fig.update_layout(
    title='Mean Rate per 100,000 by Age Group',
    xaxis_title='Age Group',
    yaxis_title='Mean rate per 100,000'
)

# Show the plot
fig.show()
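
The cell above filters on `date` as a string and then averages the rate per age group. The sketch below repeats the same filter-and-aggregate step on a made-up toy frame, with one change that is an assumption rather than part of the notebook: the `date` column is parsed with `pd.to_datetime` first. ISO-8601 strings already sort chronologically, so the string comparison works, but parsing makes the intent explicit.

```python
import pandas as pd

# Toy rows; values are made up for illustration.
df = pd.DataFrame({
    "date": ["2022-11-26", "2022-12-03", "2022-12-03", "2022-12-10", "2022-12-10"],
    "age_group": ["80+", "80+", "all", "80+", "0 to 11"],
    "gender": ["male", "male", "male", "female", "all"],
    "rate_per_100000": [9.0, 4.0, 1.0, 2.0, 0.5],
})

# Parse the dates so the comparison below is a true date comparison.
df["date"] = pd.to_datetime(df["date"])

# Same three conditions as the notebook cell: recent rows only,
# and no aggregate 'all' rows for age group or gender.
recent = df[
    (df["date"] >= "2022-12-01")
    & (df["age_group"] != "all")
    & (df["gender"] != "all")
]

# Mean rate per age group (a rate, not a count of people).
mean_rate = recent.groupby("age_group")["rate_per_100000"].mean()
print(mean_rate)
```

Only the two recent `80+` rows for male and female survive the filter, so the result is a single mean for that group.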


### **Age segmentation according to status (deaths, hospitalizations, icu)**

Patients in the 70-and-above age segment are the most likely to be hospitalized, so they should get top priority for a hospital bed, ahead of patients aged 50 to 69 (second priority) and 0 to 49 (third priority).

In [None]:
# Keep hospitalizations, icu and deaths from December 2022 onward (no 'all' rows),
# then sum rate_per_100000 for each status and age group
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['age_group'] != 'all') & (AgeGenderDataFile['status'] != 'cases') & (AgeGenderDataFile['gender'] != 'all')]
age_group_counts = age_group_counts.groupby(['status','age_group'])['rate_per_100000'].sum().reset_index()

# Create a bar plot using Plotly Express, stacked by status
fig = px.bar(age_group_counts, x='age_group', y='rate_per_100000', color='status')

# Update layout and labels (the y-axis is a summed rate, not a count)
fig.update_layout(
    title='Summed Rate per 100,000 by Age Group and Status',
    xaxis_title='Age Group',
    yaxis_title='Summed rate per 100,000'
)

# Show the plot
fig.show()
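
The stacked bars above can also be read as a table. A minimal sketch, on made-up toy values, of the same status-by-age-group aggregation expressed with `pivot_table`:

```python
import pandas as pd

# Toy weekly rows; values are made up for illustration.
weekly = pd.DataFrame({
    "status": ["hospitalizations", "hospitalizations", "icu", "deaths", "hospitalizations"],
    "age_group": ["80+", "80+", "80+", "80+", "50 to 59"],
    "rate_per_100000": [10.0, 8.0, 1.5, 2.0, 1.0],
})

# One row per age group, one column per status, summed rates in the cells.
# fill_value=0 covers combinations that have no rows at all.
summary = weekly.pivot_table(
    index="age_group",
    columns="status",
    values="rate_per_100000",
    aggfunc="sum",
    fill_value=0,
)
print(summary)
```

Each row of `summary` corresponds to one bar in the chart and each column to one coloured segment of that bar.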



## **Basic bar graph of age_group as compared to rate_per_100000 and gender**
It can be seen that the proportions for males and females are almost the same for people aged 60 and above. They differ slightly for people aged 50 and below, where the figure for females is a bit higher than for males.

In [None]:
# Keep rows from December 2022 onward, excluding the aggregate 'all' rows,
# then average rate_per_100000 for each gender and age group
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['gender'] != 'all') & (AgeGenderDataFile['age_group'] != 'all')]
age_group_counts = age_group_counts.groupby(['gender','age_group'])['rate_per_100000'].mean().reset_index()

# Create a bar plot using Plotly Express, stacked by gender
fig = px.bar(age_group_counts, x='age_group', y='rate_per_100000', color='gender')

# Update layout and labels (the y-axis is a mean rate, not a count)
fig.update_layout(
    title='Mean Rate per 100,000 by Age Group and Gender',
    xaxis_title='Age Group',
    yaxis_title='Mean rate per 100,000'
)

# Show the plot
fig.show()
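
Comparing the male and female shares by eye on a stacked bar is imprecise. One possible way to quantify the comparison, sketched here on made-up values, is to pivot the aggregated frame so each gender is a column and compute a female-to-male ratio per age group (the `female_to_male` column name is illustrative):

```python
import pandas as pd

# Toy aggregated rates; values are made up for illustration.
rates = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "age_group": ["12 to 19", "12 to 19", "80+", "80+"],
    "rate_per_100000": [2.0, 3.0, 10.0, 10.0],
})

# One row per age group, one column per gender.
wide = rates.pivot(index="age_group", columns="gender", values="rate_per_100000")

# A ratio above 1 means the female rate is higher than the male rate.
wide["female_to_male"] = wide["female"] / wide["male"]
print(wide)
```

A ratio close to 1 corresponds to the "almost the same" case described above.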


## **Age segmentation according to sex_age**
Overall, the rate_per_100000 recorded for males is lower than for females, yet males occupied more hospital beds in every age segment except 12-19. Even so, priority for a hospital bed should be based on age segment: 70+ first, 50-69 second, and 0-49 third. Within each segment, the next criterion can be the date gap of the RT-PCR report.

In [None]:
# Keep hospitalizations, icu and deaths from December 2022 onward (no 'all' rows),
# then sum rate_per_100000 for each status, gender and age group
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['age_group'] != 'all') & (AgeGenderDataFile['status'] != 'cases') & (AgeGenderDataFile['gender'] != 'all')]
age_group_counts = age_group_counts.groupby(['status','gender','age_group'])['rate_per_100000'].sum().reset_index()
# Despite its name, sex_age combines gender and status (e.g. 'male-icu')
age_group_counts['sex_age'] = age_group_counts['gender'] + '-' + age_group_counts['status']

# Create a grouped bar plot using Plotly Express
fig = px.bar(age_group_counts, x='age_group', y='rate_per_100000', color='sex_age', barmode='group')

# Update layout and labels (the y-axis is a summed rate, not a count)
fig.update_layout(
    title='Summed Rate per 100,000 by Age Group, Gender and Status',
    xaxis_title='Age Group',
    yaxis_title='Summed rate per 100,000'
)

# Show the plot
fig.show()
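
The priority rule described above (age segment first, then the RT-PCR date gap) can be sketched as a two-key sort. Everything in this example is hypothetical: the patient table, the `patient_id` and `days_since_rt_pcr` columns, and in particular the assumption that a larger date gap should be served earlier, which the notebook does not specify.

```python
import pandas as pd

# Hypothetical patient list; not part of the dataset.
patients = pd.DataFrame({
    "patient_id": ["A", "B", "C", "D"],
    "age_segment": ["0-49", "70+", "50-69", "70+"],
    "days_since_rt_pcr": [2, 5, 1, 9],
})

# Lower number means higher priority, following the order in the text.
priority = {"70+": 1, "50-69": 2, "0-49": 3}
patients["priority"] = patients["age_segment"].map(priority)

# Sort by age-segment priority, then by date gap.
# Assumption: within a segment, the larger gap goes first.
queue = patients.sort_values(
    ["priority", "days_since_rt_pcr"], ascending=[True, False]
).reset_index(drop=True)
print(queue[["patient_id", "age_segment", "days_since_rt_pcr"]])
```

If a shorter gap should come first instead, the second entry of `ascending` would be `True`.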


In [None]:
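# Collapse the detailed age groups into three priority segments.
# Note: any value not listed here, including the aggregate 'all' group, falls into '0-49'.
AgeGenderDataFile['age_segment'] = AgeGenderDataFile['age_group'].apply(
    lambda x: '70+' if x in ('70 to 79', '80+')
    else ('50-69' if x in ('50 to 59', '60 to 69') else '0-49')
)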

In [None]:
AgeGenderDataFile['gender_age'] = AgeGenderDataFile['gender'] + '-' + AgeGenderDataFile['age_segment']
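
The segment assignment above sends every unlisted age group to `0-49`, and that includes the aggregate `all` group. A possible alternative, shown as a sketch and not as the notebook's method, is an explicit dictionary that keeps `all` separate and only defaults the remaining (younger) groups to `0-49`:

```python
import pandas as pd

# Explicit mapping for the groups named in the notebook; 'all' stays 'all'.
segment_of = {
    "50 to 59": "50-69",
    "60 to 69": "50-69",
    "70 to 79": "70+",
    "80+": "70+",
    "all": "all",
}

# A few age_group values that appear in the dataset.
groups = pd.Series(["0 to 11", "50 to 59", "80+", "all"])

# Unmapped groups become NaN after map(); they default to '0-49' here.
segments = groups.map(segment_of).fillna("0-49")
print(segments.tolist())
```

With this mapping, rows for the `all` age group can be filtered out by segment before plotting instead of being averaged into the youngest segment.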


### **Hospitalization according to gender and age_group**

It can be observed that:

**Females:**
- For females aged 0-49, the hospitalization rate peaked in December at 1.7 per 100,000 population, decreased slightly over time, and had dropped to 0.1 by the end of May.
- For females aged 50-69, the hospitalization rate peaked in December at approximately 3.45, fluctuated around that figure for a month, and then decreased sharply to 0.76 by the end of May.
- Females aged 70 and above had the highest hospitalization rate per 100,000 population, and it showed one of the largest falls, from 25.84 to approximately 1.07 on May 20.

**Males:**
- Males aged 0-49 followed a trend similar to females aged 0-49: the rate_per_100000 was 1.70 in early December and had dropped to 0.09 by May 20.
- Males aged 50-69 also followed the same trend as females aged 0-49, but some fluctuations differed after February 18, 2023.
- Males aged 70 and above followed a trend similar to females in the same age group, but it is clear that more males were hospitalized in April.

In [None]:
# Hospitalization rows from December 2022 onward, excluding the aggregate 'all' gender
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['status'] == 'hospitalizations') & (AgeGenderDataFile['gender'] != 'all')]
# Mean rate per date for each gender and age segment, drawn as a stacked area chart
age_group_by_date = age_group_counts.groupby(['date','gender_age'])['rate_per_100000'].mean().reset_index()
fig = px.area(age_group_by_date, x="date", y="rate_per_100000", color='gender_age')
fig.show()
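
Peak and end-of-period figures like the ones quoted above can be read from the aggregated frame directly instead of from the chart. A minimal sketch on made-up values, assuming a frame shaped like `age_group_by_date` (columns `date`, `gender_age`, `rate_per_100000`):

```python
import pandas as pd

# Toy series for two segments; values are made up for illustration.
series = pd.DataFrame({
    "date": pd.to_datetime(["2022-12-03", "2022-12-10", "2023-05-20"] * 2),
    "gender_age": ["female-70+"] * 3 + ["male-70+"] * 3,
    "rate_per_100000": [20.0, 25.0, 1.0, 22.0, 21.0, 1.5],
})

# Sort chronologically so "last" means "most recent".
series = series.sort_values("date")

# Row holding the maximum rate for each segment (date and value of the peak).
peak_rows = series.loc[series.groupby("gender_age")["rate_per_100000"].idxmax()]
peaks = peak_rows.set_index("gender_age")[["date", "rate_per_100000"]]

# Most recent rate for each segment.
latest = series.groupby("gender_age")["rate_per_100000"].last()

print(peaks)
print(latest)
```

Pairing each peak with its date makes statements such as "peaked in December" straightforward to check.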

### **ICU according to gender and age_group**

It can be observed that females and males aged 70 and above have more critical conditions, so their ICU admission numbers are high.

**Aged 70+:**
- Females and males aged 70 and above show almost the same trend, but the numbers for males nearly doubled from December 10 to December 24, a surge that was not matched in the numbers for females.

**Aged 50-69:**
- The trend for males and females in this age segment was the same except for the period from February 18 to March 18, when the numbers for females decreased and then surged once again, while the numbers for males surged and formed an upside-down arc.
- The trends were also completely opposite from early to mid-February, when the numbers for males were increasing and the numbers for females were decreasing.


In [None]:
# ICU rows from December 2022 onward, excluding the aggregate 'all' gender
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['status'] == 'icu') & (AgeGenderDataFile['gender'] != 'all')]
# Mean rate per date for each gender and age segment, drawn as a stacked area chart
age_group_by_date = age_group_counts.groupby(['date','gender_age'])['rate_per_100000'].mean().reset_index()
fig = px.area(age_group_by_date, x="date", y="rate_per_100000", color='gender_age')
fig.show()
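
A claim such as "nearly doubled between two dates" can be checked with a simple ratio. The sketch below uses made-up values and assumes a frame shaped like `age_group_by_date` with a parsed `date` column. It pivots the segments into columns and divides the later date by the earlier one:

```python
import pandas as pd

# Toy ICU rates on two dates; values are made up for illustration.
icu = pd.DataFrame({
    "date": pd.to_datetime(["2022-12-10", "2022-12-24"] * 2),
    "gender_age": ["male-70+", "male-70+", "female-70+", "female-70+"],
    "rate_per_100000": [0.5, 1.0, 0.6, 0.6],
})

# One row per date, one column per segment.
wide = icu.pivot(index="date", columns="gender_age", values="rate_per_100000")

# Ratio of the later date to the earlier date; 2.0 means the rate doubled.
start, end = pd.Timestamp("2022-12-10"), pd.Timestamp("2022-12-24")
ratio = wide.loc[end] / wide.loc[start]
print(ratio)
```

A ratio near 2 for one segment and near 1 for the other corresponds to a surge seen in only one of the two groups.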

### **Deaths according to gender and age_group**

It is easily visible that the distribution is close to even across age groups and sexes; however, the 70+ age group accounts for the major proportion of the numbers.

- The distribution between males and females in the 70+ age group is close to even.
- The trend for males and females in the 50-69 age group is almost the same, but the numbers for females remain steady from February 12 to March 12 while the numbers for males keep fluctuating.
- The numbers for males aged 0-49 surge from December 3 to 10, and the same surge is seen again from January 28 to February 11. Apart from that, the trend was almost the same.



In [None]:
# Death rows from December 2022 onward, excluding the aggregate 'all' gender
age_group_counts = AgeGenderDataFile.loc[(AgeGenderDataFile['date'] >= '2022-12-01') & (AgeGenderDataFile['status'] == 'deaths') & (AgeGenderDataFile['gender'] != 'all')]
# Mean rate per date for each gender and age segment, drawn as a stacked area chart
age_group_by_date = age_group_counts.groupby(['date','gender_age'])['rate_per_100000'].mean().reset_index()
fig = px.area(age_group_by_date, x="date", y="rate_per_100000", color='gender_age')
fig.show()
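
The hospitalization, ICU and deaths cells differ only in the `status` value, so the shared steps could be factored into one helper. The function below is a hypothetical sketch (`weekly_rate_by_segment` is not part of the notebook). It also adds one filter that the original cells do not have, `age_group != 'all'`, on the assumption that the aggregate rows should not be averaged into a segment.

```python
import pandas as pd

def weekly_rate_by_segment(df, status, start="2022-12-01"):
    """Mean rate_per_100000 per date and gender_age for one status."""
    mask = (
        (df["date"] >= start)
        & (df["status"] == status)
        & (df["gender"] != "all")
        & (df["age_group"] != "all")  # assumption: extra filter, see lead-in
    )
    return (
        df.loc[mask]
        .groupby(["date", "gender_age"])["rate_per_100000"]
        .mean()
        .reset_index()
    )

# Toy rows; values are made up for illustration.
demo = pd.DataFrame({
    "date": ["2022-12-03", "2022-12-03", "2022-12-03", "2022-12-03", "2022-11-26"],
    "status": ["deaths", "deaths", "deaths", "icu", "deaths"],
    "age_group": ["80+", "70 to 79", "all", "80+", "80+"],
    "gender": ["male"] * 5,
    "gender_age": ["male-70+", "male-70+", "male-0-49", "male-70+", "male-70+"],
    "rate_per_100000": [2.0, 1.0, 0.1, 5.0, 9.0],
})

deaths_by_date = weekly_rate_by_segment(demo, "deaths")
print(deaths_by_date)
```

The returned frame has the same columns as `age_group_by_date`, so it could be passed to `px.area` in the same way as in the cells above.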