## Dataset Parameters

| Column Name (unit)   | Description |
|----------------------|------------------------------------------------------------------|
| **Date** (YYYY-MM-DD) | The calendar day for which data is recorded. The data spans 2018-01-01 to 2022-12-31, one row per day. |
| **Malaria_Cases** (count) | Number of reported malaria cases per day. Dependent on rainfall, humidity, population density, and vaccinations. |
| **Chikungunya_Cases** (count) | Number of reported chikungunya cases per day. Correlated with humidity, temperature, and population density. |
| **Dengue_Cases** (count) | Number of reported dengue cases per day. Influenced by rainfall, humidity, and temperature. |
| **Avg_Temperature** (°C) | The average daily temperature (in Celsius). Higher temperatures can influence mosquito activity and disease transmission. |
| **Rainfall** (mm) | Amount of rainfall recorded in millimeters per day. Heavy rainfall creates breeding grounds for mosquitoes, affecting disease spread. |
| **Humidity** (%) | The average daily humidity percentage. Higher humidity levels are favorable for mosquito survival and disease transmission. |
| **Population_Density** (people/km²) | Number of people per square kilometer in the recorded region. Denser populations generally experience higher disease transmission rates. |
| **Day_Length** (hours) | The duration of daylight in hours for each day. Indirectly affects temperature and humidity. |
| **Vaccinations_Given** (count) | The number of vaccinations administered on a given day. A higher count is expected to reduce the number of disease cases over time. |
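
The column definitions above can be turned into a lightweight sanity check before any analysis. The following is a minimal sketch, not part of the original notebook: the helper name `validate_schema` and the specific bounds (humidity within 0 to 100 %, day length within 0 to 24 hours, non-negative counts) are assumptions derived from the units in the table, and the sample frame is hand-made with values rounded from the first two rows of the data.

```python
import pandas as pd

# Column names come from the table above; the checks themselves are
# illustrative assumptions based on each column's unit.
EXPECTED_COLUMNS = [
    "Date", "Malaria_Cases", "Chikungunya_Cases", "Dengue_Cases",
    "Avg_Temperature", "Rainfall", "Humidity", "Population_Density",
    "Day_Length", "Vaccinations_Given",
]
COUNT_COLUMNS = [
    "Malaria_Cases", "Chikungunya_Cases", "Dengue_Cases", "Vaccinations_Given",
]


def validate_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Counts can never be negative.
    for col in COUNT_COLUMNS:
        if col in df.columns and (df[col] < 0).any():
            problems.append(f"negative values in {col}")
    # Percentages and hours have natural physical bounds.
    if "Humidity" in df.columns and not df["Humidity"].between(0, 100).all():
        problems.append("Humidity outside 0-100 %")
    if "Day_Length" in df.columns and not df["Day_Length"].between(0, 24).all():
        problems.append("Day_Length outside 0-24 hours")
    return problems


# Hand-made stand-in for the real CSV (values rounded for readability).
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "Malaria_Cases": [7, 43],
    "Chikungunya_Cases": [6, 3],
    "Dengue_Cases": [7, 43],
    "Avg_Temperature": [29.3, 33.0],
    "Rainfall": [24.3, 16.7],
    "Humidity": [63.3, 62.6],
    "Population_Density": [1457.2, 980.8],
    "Day_Length": [13.4, 10.9],
    "Vaccinations_Given": [70, 7],
})

print(validate_schema(sample))
```

On the real data, the same function could be called as `validate_schema(df)` right after loading; returning a list rather than raising keeps all problems visible at once.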


## Load the dataset

In [20]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

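# Load the raw CSV (path is relative to the notebook's working directory).
df = pd.read_csv("../../data/raw/data.csv")

# Parse the date strings, then derive calendar features from them.
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day            # day of month (1-31)
df["Weekday"] = df["Date"].dt.day_name()  # e.g. "Monday"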


## Basic Information

In [3]:
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1826 entries, 0 to 1825
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Date                1826 non-null   datetime64[ns]
 1   Malaria_Cases       1826 non-null   int64         
 2   Chikungunya_Cases   1826 non-null   int64         
 3   Dengue_Cases        1826 non-null   int64         
 4   Avg_Temperature     1826 non-null   float64       
 5   Rainfall            1826 non-null   float64       
 6   Humidity            1826 non-null   float64       
 7   Population_Density  1826 non-null   float64       
 8   Day_Length          1826 non-null   float64       
 9   Vaccinations_Given  1826 non-null   int64         
 10  Year                1826 non-null   int32         
 11  Month               1826 non-null   int32         
dtypes: datetime64[ns](1), float64(5), int32(2), int64(4)
memory usage: 157.1 KB


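|   | Date | Malaria_Cases | Chikungunya_Cases | Dengue_Cases | Avg_Temperature | Rainfall | Humidity | Population_Density | Day_Length | Vaccinations_Given | Year | Month |
|---|------|---------------|-------------------|--------------|-----------------|----------|----------|--------------------|------------|--------------------|------|-------|
| 0 | 2018-01-01 | 7 | 6 | 7 | 29.267709 | 24.254733 | 63.297068 | 1457.209854 | 13.41847 | 70 | 2018 | 1 |
| 1 | 2018-01-02 | 43 | 3 | 43 | 33.033575 | 16.68142 | 62.621327 | 980.844837 | 10.921979 | 7 | 2018 | 1 |
| 2 | 2018-01-03 | 11 | 7 | 11 | 28.559146 | 39.57791 | 54.264945 | 640.280185 | 11.310265 | 456 | 2018 | 1 |
| 3 | 2018-01-04 | 40 | 4 | 40 | 20.455806 | 22.564706 | 64.671933 | 1140.66867 | 10.839571 | 385 | 2018 | 1 |
| 4 | 2018-01-05 | 33 | 6 | 33 | 33.96423 | 9.17211 | 53.979787 | 693.131656 | 11.967504 | 400 | 2018 | 1 |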


## Check Missing Values

In [4]:
df.isnull().sum()

Date                  0
Malaria_Cases         0
Chikungunya_Cases     0
Dengue_Cases          0
Avg_Temperature       0
Rainfall              0
Humidity              0
Population_Density    0
Day_Length            0
Vaccinations_Given    0
Year                  0
Month                 0
dtype: int64
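
A zero count of nulls does not by itself rule out missing or repeated calendar days in a daily series. The sketch below shows one way to check for both; the helper name `date_gaps` and the demo series are hypothetical and stand in for `df["Date"]`.

```python
import pandas as pd


def date_gaps(dates: pd.Series) -> tuple[pd.DatetimeIndex, int]:
    """Return (missing calendar days, number of duplicated dates)."""
    dates = pd.to_datetime(dates)
    # Every day between the first and last observed date.
    expected = pd.date_range(dates.min(), dates.max(), freq="D")
    missing = expected[~expected.isin(dates)]
    duplicates = int(dates.duplicated().sum())
    return missing, duplicates


# Synthetic stand-in for df["Date"]: one day is missing, one is repeated.
demo = pd.Series(
    pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-02", "2018-01-04"])
)
missing, duplicates = date_gaps(demo)
print(list(missing.strftime("%Y-%m-%d")), duplicates)

# A gap-free daily index for 2018 through 2022 has 5 * 365 + 1 = 1826 days
# (2020 is a leap year), the same as the row count reported by df.info().
print(len(pd.date_range("2018-01-01", "2022-12-31", freq="D")))
```

A matching row count is consistent with a complete daily series but does not prove it, since a gap and a duplicate could cancel out; calling `date_gaps(df["Date"])` would settle the question.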

## Summary Statistics

In [5]:
df.describe()

|       | Date | Malaria_Cases | Chikungunya_Cases | Dengue_Cases | Avg_Temperature | Rainfall | Humidity | Population_Density | Day_Length | Vaccinations_Given | Year | Month |
|-------|------|---------------|-------------------|--------------|-----------------|----------|----------|--------------------|------------|--------------------|------|-------|
| count | 1826 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 | 1826.0 |
| mean  | 2020-07-01 12:00:00 | 39.763965 | 4.445783 | 48.694962 | 27.468268 | 24.660515 | 69.877969 | 1226.807182 | 11.995921 | 248.413472 | 2020.0 | 6.523549 |
| min   | 2018-01-01 00:00:00 | 5.0 | 0.0 | 5.0 | 20.000175 | 0.001536 | 50.057134 | 500.361507 | 10.000211 | 0.0 | 2018.0 | 1.0 |
| 25%   | 2019-04-02 06:00:00 | 19.0 | 2.0 | 21.0 | 23.81692 | 12.044708 | 59.82182 | 861.906017 | 11.035964 | 129.0 | 2019.0 | 4.0 |
| 50%   | 2020-07-01 12:00:00 | 33.0 | 4.0 | 35.0 | 27.542864 | 24.4003 | 69.447367 | 1217.33517 | 11.994003 | 249.5 | 2020.0 | 7.0 |
| 75%   | 2021-09-30 18:00:00 | 47.0 | 7.0 | 50.0 | 31.104997 | 37.094683 | 80.251225 | 1577.051212 | 12.955781 | 367.0 | 2021.0 | 10.0 |
| max   | 2022-12-31 00:00:00 | 146.0 | 9.0 | 237.0 | 34.991206 | 49.977885 | 89.978427 | 1998.35705 | 13.998021 | 499.0 | 2022.0 | 12.0 |
| std   |  | 28.802942 | 2.910268 | 43.460561 | 4.313082 | 14.345292 | 11.614937 | 425.683922 | 1.137093 | 142.444192 | 1.414214 | 3.449478 |


## Effect of Temperature on Dengue, Malaria, and Chikungunya Cases

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=df["Avg_Temperature"], y=df["Dengue_Cases"], 
                         mode="markers", name="Dengue Cases",
                         marker=dict(size=8, color="red", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Avg_Temperature"], y=df["Malaria_Cases"], 
                         mode="markers", name="Malaria Cases",
                         marker=dict(size=8, color="blue", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Avg_Temperature"], y=df["Chikungunya_Cases"], 
                         mode="markers", name="Chikungunya Cases",
                         marker=dict(size=8, color="green", opacity=0.6)))

fig.update_layout(title="Effect of Temperature on Disease Cases",
                  xaxis_title="Average Temperature (°C)",
                  yaxis_title="Number of Cases",
                  legend_title="Disease Type")

fig.show()

## Effect of Rainfall on Dengue, Malaria, and Chikungunya Cases

In [22]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=df["Rainfall"], y=df["Dengue_Cases"], 
                         mode="markers", name="Dengue Cases",
                         marker=dict(size=8, color="red", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Rainfall"], y=df["Malaria_Cases"], 
                         mode="markers", name="Malaria Cases",
                         marker=dict(size=8, color="blue", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Rainfall"], y=df["Chikungunya_Cases"], 
                         mode="markers", name="Chikungunya Cases",
                         marker=dict(size=8, color="green", opacity=0.6)))

fig.update_layout(title="Effect of Rainfall on Disease Cases",
                  xaxis_title="Rainfall (mm)",
                  yaxis_title="Number of Cases",
                  legend_title="Disease Type")

fig.show()


## Effect of Humidity on Dengue, Malaria, and Chikungunya Cases

In [23]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=df["Humidity"], y=df["Dengue_Cases"], 
                         mode="markers", name="Dengue Cases",
                         marker=dict(size=8, color="red", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Humidity"], y=df["Malaria_Cases"], 
                         mode="markers", name="Malaria Cases",
                         marker=dict(size=8, color="blue", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Humidity"], y=df["Chikungunya_Cases"], 
                         mode="markers", name="Chikungunya Cases",
                         marker=dict(size=8, color="green", opacity=0.6)))

fig.update_layout(title="Effect of Humidity on Disease Cases",
                  xaxis_title="Humidity (%)",
                  yaxis_title="Number of Cases",
                  legend_title="Disease Type")

fig.show()


## Effect of Population Density on Dengue, Malaria, and Chikungunya Cases

In [24]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=df["Population_Density"], y=df["Dengue_Cases"], 
                         mode="markers", name="Dengue Cases",
                         marker=dict(size=8, color="red", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Population_Density"], y=df["Malaria_Cases"], 
                         mode="markers", name="Malaria Cases",
                         marker=dict(size=8, color="blue", opacity=0.6)))

fig.add_trace(go.Scatter(x=df["Population_Density"], y=df["Chikungunya_Cases"], 
                         mode="markers", name="Chikungunya Cases",
                         marker=dict(size=8, color="green", opacity=0.6)))

fig.update_layout(title="Effect of Population Density on Disease Cases",
                  xaxis_title="Population Density (people/km²)",
                  yaxis_title="Number of Cases",
                  legend_title="Disease Type")

fig.show()
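
Scatter plots with 1,826 overlapping points per disease can be hard to read by eye, so a numeric summary is a useful complement. The following is a minimal sketch using pandas' built-in Pearson correlation. The data here is synthetic and deliberately constructed (dengue counts are built to depend on humidity, malaria counts are random), so the printed numbers say nothing about the real dataset; only the column names are taken from it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for df; NOT the real data.
humidity = rng.uniform(50, 90, n)
demo = pd.DataFrame({
    "Humidity": humidity,
    "Rainfall": rng.uniform(0, 50, n),
    # Constructed to rise with humidity, plus noise.
    "Dengue_Cases": np.round(0.8 * humidity + rng.normal(0, 5, n)),
    # Constructed to be independent of every driver.
    "Malaria_Cases": rng.integers(5, 150, n),
})

drivers = ["Humidity", "Rainfall"]
diseases = ["Dengue_Cases", "Malaria_Cases"]

# Full correlation matrix, then keep only the driver-vs-disease block.
# Pearson correlation captures linear association only.
corr = demo[drivers + diseases].corr().loc[drivers, diseases]
print(corr.round(2))
```

With the real frame, `drivers` could be extended to `Avg_Temperature` and `Population_Density` and `diseases` to `Chikungunya_Cases`. A coefficient near zero would suggest that the corresponding scatter plot shows no linear relationship, though it would not rule out a nonlinear or lagged one.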


## Time Series Trends of Disease Cases

In [25]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=df["Date"], y=df["Malaria_Cases"], mode='lines', name="Malaria Cases"))
fig.add_trace(go.Scatter(x=df["Date"], y=df["Chikungunya_Cases"], mode='lines', name="Chikungunya Cases"))
fig.add_trace(go.Scatter(x=df["Date"], y=df["Dengue_Cases"], mode='lines', name="Dengue Cases"))
fig.update_layout(title="Trends of Disease Cases Over Time", xaxis_title="Date", yaxis_title="Number of Cases")
fig.show()
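
Daily case counts plotted as raw lines tend to be noisy, which can hide any underlying trend. One common remedy is to smooth or aggregate before plotting. The sketch below shows both a centered rolling mean and monthly totals on synthetic Poisson counts; the 7-day window and the data itself are illustrative assumptions, not choices made in the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts standing in for df[["Date", "Dengue_Cases"]].
dates = pd.date_range("2018-01-01", periods=90, freq="D")
rng = np.random.default_rng(1)
demo = pd.DataFrame({"Date": dates, "Dengue_Cases": rng.poisson(40, len(dates))})

# Resampling needs a DatetimeIndex, so index the series by date.
daily = demo.set_index("Date")["Dengue_Cases"]

# Centered 7-day rolling mean; min_periods=1 avoids NaN at the edges.
smoothed = daily.rolling(window=7, center=True, min_periods=1).mean()

# Monthly totals, labelled by the first day of each month ("MS").
monthly = daily.resample("MS").sum()

print(smoothed.head())
print(monthly)
```

Either result can be passed to `go.Scatter` in place of the raw column, for example with `x=smoothed.index, y=smoothed`.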