# Technical Report: Analysis of Confirmed Positive Cases of COVID-19 in Ontario

**Introduction**

The objective of this analysis is to understand the patterns and trends in the COVID-19 case data for Ontario, Canada. The dataset contains detailed information about confirmed positive cases, including demographic information, dates of reporting and testing, and geographical details of public health units. By exploring and analyzing this data, we aim to gain insights into the spread of COVID-19 across different regions and over time. This report will detail the data preprocessing steps, exploratory data analysis, model selection and evaluation, and the key findings from our analysis.


**Brief description of the dataset**

The dataset contains information about confirmed COVID-19 cases in Ontario, Canada. Its key attributes are the dates associated with each case (Accurate Episode Date, Case Reported Date, Test Reported Date, Specimen Date), demographic information (Age Group, Client Gender), and details of the reporting public health unit. This analysis first cleans the data and then explores it through visualizations to gain insights into the COVID-19 cases.

**Dataset description**

In [1]:
import pandas as pd

# Path to the downloaded CSV file
data_path = r"C:\Users\ENG WAHEED\Downloads\Confirmed Positive Cases of COVID-19 in Ontario.csv"

# Load the dataset into a DataFrame
df = pd.read_csv(data_path)

# Display the first few rows of the DataFrame
print(df.head())

   Row_ID Accurate_Episode_Date Case_Reported_Date Test_Reported_Date  \
0       1            1934-09-28         2022-09-29         2022-09-29   
1       2            1989-02-21         2022-11-08         2022-11-07   
2       3            2000-03-01         2022-01-30                NaN   
3       4            2002-07-06         2022-07-06         2022-07-07   
4       5            2002-08-08         2022-08-15         2022-08-15   

  Specimen_Date Age_Group Client_Gender Outcome1  Reporting_PHU_ID  \
0    2022-09-27       <20        FEMALE      NaN              2262   
1    2022-11-06       <20        FEMALE      NaN              2270   
2    2000-03-01       <20        FEMALE      NaN              2243   
3    2002-07-06       20s        FEMALE      NaN              2270   
4    2022-08-14       60s          MALE      NaN              2233   

                                      Reporting_PHU  Reporting_PHU_Address  \
0                  Thunder Bay District Health Unit    999 Bal
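
The preview above shows episode dates such as 1934-09-28 and 1989-02-21, which cannot be COVID-19 episode dates, and the date columns are read as plain strings. A minimal sketch of how such rows could be flagged is given below. It runs on a small stand-in frame copied from the five preview rows rather than on the full CSV, and the cutoff `EARLIEST_PLAUSIBLE` is an assumption (no Ontario case is expected before 2020), not something defined by the dataset.

```python
import pandas as pd

# Stand-in for the full CSV: the five rows shown in the preview above.
sample = pd.DataFrame({
    "Row_ID": [1, 2, 3, 4, 5],
    "Accurate_Episode_Date": ["1934-09-28", "1989-02-21", "2000-03-01", "2002-07-06", "2002-08-08"],
    "Case_Reported_Date": ["2022-09-29", "2022-11-08", "2022-01-30", "2022-07-06", "2022-08-15"],
    "Test_Reported_Date": ["2022-09-29", "2022-11-07", None, "2022-07-07", "2022-08-15"],
    "Specimen_Date": ["2022-09-27", "2022-11-06", "2000-03-01", "2002-07-06", "2022-08-14"],
})

date_cols = [
    "Accurate_Episode_Date",
    "Case_Reported_Date",
    "Test_Reported_Date",
    "Specimen_Date",
]

# Convert the string columns to datetime64; unparseable values become NaT.
for col in date_cols:
    sample[col] = pd.to_datetime(sample[col], format="%Y-%m-%d", errors="coerce")

# Assumed cutoff (hypothetical, chosen for illustration).
EARLIEST_PLAUSIBLE = pd.Timestamp("2020-01-01")

# True where a date is present and earlier than the cutoff; NaT compares as False.
suspect = sample[date_cols].lt(EARLIEST_PLAUSIBLE)
sample["has_suspect_date"] = suspect.any(axis=1)

print(suspect.sum())  # number of suspect values per date column
print(sample[["Row_ID", "has_suspect_date"]])
```

On the full dataset the same loop would apply to `df` directly. Whether flagged rows should be dropped, corrected from another date column, or kept is a separate decision that this sketch does not make.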

In [2]:
# Display basic information about the dataset (info() prints directly and returns None)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1717434 entries, 0 to 1717433
Data columns (total 16 columns):
 #   Column                     Dtype  
---  ------                     -----  
 0   Row_ID                     int64  
 1   Accurate_Episode_Date      object 
 2   Case_Reported_Date         object 
 3   Test_Reported_Date         object 
 4   Specimen_Date              object 
 5   Age_Group                  object 
 6   Client_Gender              object 
 7   Outcome1                   object 
 8   Reporting_PHU_ID           int64  
 9   Reporting_PHU              object 
 10  Reporting_PHU_Address      object 
 11  Reporting_PHU_City         object 
 12  Reporting_PHU_Postal_Code  object 
 13  Reporting_PHU_Website      object 
 14  Reporting_PHU_Latitude     float64
 15  Reporting_PHU_Longitude    float64
dtypes: float64(2), int64(2), object(12)
memory usage: 209.6+ MB
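
The `info()` output above lists no non-null counts, most likely because the frame has more rows than pandas' `display.max_info_rows` option allows by default, and the `+` in the memory figure indicates that object columns are not measured in full. The sketch below shows one way to recover both pieces of information and to shrink low-cardinality text columns. It uses a small stand-in frame built from the preview rows, so the printed numbers do not describe the full dataset.

```python
import pandas as pd

# Stand-in frame with a few of the columns from the preview rows.
sample = pd.DataFrame({
    "Age_Group": ["<20", "<20", "<20", "20s", "60s"],
    "Client_Gender": ["FEMALE", "FEMALE", "FEMALE", "FEMALE", "MALE"],
    "Outcome1": [None, None, None, None, None],
    "Test_Reported_Date": ["2022-09-29", "2022-11-07", None, "2022-07-07", "2022-08-15"],
})

# Missing values per column, independent of how info() is configured.
null_counts = sample.isna().sum()
print(null_counts)

# Force non-null counts and an exact memory figure in the info() report.
sample.info(show_counts=True, memory_usage="deep")

# Columns with few distinct values can be stored as categories to save memory.
compact = sample.astype({"Age_Group": "category", "Client_Gender": "category"})
print(compact.dtypes)
```

On the full frame, `memory_usage="deep"` can take noticeably longer because every string is inspected, so it may be worth running once rather than in every cell.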


In [3]:
# Display summary statistics of the dataset
print(df.describe())

             Row_ID  Reporting_PHU_ID  Reporting_PHU_Latitude  \
count  1.717434e+06      1.717434e+06            1.717434e+06   
mean   8.587175e+05      2.685810e+03            4.396700e+01   
std    4.957806e+05      7.631429e+02            1.153449e+00   
min    1.000000e+00      2.226000e+03            4.230880e+01   
25%    4.293592e+05      2.244000e+03            4.346288e+01   
50%    8.587175e+05      2.257000e+03            4.365659e+01   
75%    1.288076e+06      3.895000e+03            4.404802e+01   
max    1.717434e+06      5.183000e+03            4.976961e+01   

       Reporting_PHU_Longitude  
count             1.717434e+06  
mean             -7.973390e+01  
std               2.396228e+00  
min              -9.448825e+01  
25%              -7.987134e+01  
50%              -7.948024e+01  
75%              -7.937936e+01  
max              -7.473630e+01  
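
By default `describe()` summarizes only numeric columns, so the table above reports means and quartiles for `Row_ID` and `Reporting_PHU_ID`, which are identifiers rather than measurements. A sketch of a summary better suited to the categorical fields follows. It again uses a stand-in frame copied from the preview rows, and treating `Reporting_PHU_ID` as a label is an assumption about how the column is meant to be used.

```python
import pandas as pd

# Stand-in frame: identifier and demographic columns from the preview rows.
sample = pd.DataFrame({
    "Reporting_PHU_ID": [2262, 2270, 2243, 2270, 2233],
    "Age_Group": ["<20", "<20", "<20", "20s", "60s"],
    "Client_Gender": ["FEMALE", "FEMALE", "FEMALE", "FEMALE", "MALE"],
})

# Treat every column as a label so describe() reports count, unique, top, freq.
categorical_summary = sample.astype("category").describe(include="all")
print(categorical_summary)

# Cases per public health unit, counted on the original integer IDs.
phu_counts = sample["Reporting_PHU_ID"].value_counts()
print(phu_counts)
```

The latitude and longitude statistics in the table above remain meaningful as a rough check that all health units fall within Ontario's bounding box.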
