# Philippine Cities Temperature Prediction

This notebook predicts hourly temperature for 135 Philippine cities using weather data from 2020 to 2024.

Because of their file size, I also stored backup copies of the datasets on Google Drive.

Original Sources (Kaggle):
- [Philippine Cities Weather Data (2024)](https://www.kaggle.com/datasets/bwandowando/philippine-cities-weather-data-ytd-2024)
- [Philippine Cities Weather Data (2020-2023)](https://www.kaggle.com/datasets/bwandowando/philippine-cities-weather-data-2020-2023)

Google Drive Links:
- [Philippine Cities Weather Data (2024)](https://drive.google.com/file/d/10wxJ3x6NKIGQCL5xpngcgbja6M7hvqJf/view?usp=drive_link)
- [Philippine Cities Weather Data (2020-2023)](https://drive.google.com/file/d/1cwdqQC3Idr7rfCS2AMbROv3be0eZ0SIB/view?usp=drive_link)
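The Google Drive backups can also be fetched from a script. The sketch below only turns a share link into a direct-download URL using the commonly used `uc?export=download&id=` pattern; the helper names are hypothetical. For large files, Google Drive may answer with a virus-scan confirmation page instead of the file, which a tool such as `gdown` handles.

```python
import re

# Assumption: share links look like https://drive.google.com/file/d/<FILE_ID>/view?...
_FILE_ID_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url: str) -> str:
    """Extract the file ID from a Google Drive share link (hypothetical helper)."""
    match = _FILE_ID_PATTERN.search(share_url)
    if match is None:
        raise ValueError(f"no file ID found in {share_url!r}")
    return match.group(1)


def drive_download_url(share_url: str) -> str:
    """Build a direct-download URL for the file behind a share link (hypothetical helper)."""
    return f"https://drive.google.com/uc?export=download&id={drive_file_id(share_url)}"


link_2024 = "https://drive.google.com/file/d/10wxJ3x6NKIGQCL5xpngcgbja6M7hvqJf/view?usp=drive_link"
print(drive_download_url(link_2024))
```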

In [None]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# --create-dirs makes curl create ./data if it does not exist yet
!curl -L --create-dirs -o './data/Philippine_Cities_Weather_Data_(2024).zip' 'https://www.kaggle.com/api/v1/datasets/download/bwandowando/philippine-cities-weather-data-ytd-2024'

!curl -L --create-dirs -o './data/Philippine_Cities_Weather_Data_(2020-2023).zip' 'https://www.kaggle.com/api/v1/datasets/download/bwandowando/philippine-cities-weather-data-2020-2023'

In [None]:
# -o overwrites existing files without prompting, so the cell can be re-run
!unzip -o './data/Philippine_Cities_Weather_Data_(2024).zip' -d './data/2024'
!unzip -o './data/Philippine_Cities_Weather_Data_(2020-2023).zip' -d './data/2020-2023'
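If `curl` or `unzip` is not available (for example on Windows), the download and extraction steps can be sketched with the Python standard library. `download` and `extract` are hypothetical helper names; `urlretrieve` follows HTTP redirects much like `curl -L`, and this assumes the Kaggle URLs can be downloaded without authentication, as the `curl` commands do.

```python
import urllib.request
import zipfile
from pathlib import Path


def download(url: str, dest: Path) -> Path:
    """Download url to dest, creating parent folders first (like curl --create-dirs)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(zip_path: Path, dest_dir: Path) -> list[str]:
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# Usage (requires network access):
# zip_2024 = download(
#     "https://www.kaggle.com/api/v1/datasets/download/bwandowando/philippine-cities-weather-data-ytd-2024",
#     Path("data/Philippine_Cities_Weather_Data_(2024).zip"),
# )
# extract(zip_2024, Path("data/2024"))
```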

In [None]:
df_hourly_2020_2023 = pd.read_csv('data/2020-2023/hourly_data_combined_2020_to_2023.csv')
df_hourly_2024 = pd.read_csv('data/2024/hourly_data_combined_2024.csv')
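Without `parse_dates`, `pd.read_csv` loads the `datetime` column as plain strings, so the date filters later in the notebook compare text. That works for zero-padded timestamps such as `2022-08-03 00:00:00`, but a typo in a filter string silently changes the result. A sketch of parsing the column while reading, assuming ISO-style timestamps; the two sample rows are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the hourly CSVs
# (only a few of the real columns are shown).
sample_csv = io.StringIO(
    "city_name,datetime,temperature_2m\n"
    "Vigan,2022-08-03 00:00:00,26.1\n"
    "Vigan,2022-08-03 01:00:00,25.8\n"
)

# parse_dates converts the datetime column to datetime64 while reading,
# so no separate pd.to_datetime call is needed afterwards.
df_sample = pd.read_csv(sample_csv, parse_dates=["datetime"])
print(df_sample.dtypes)

# Filters then compare timestamps instead of strings.
early = df_sample[df_sample.datetime < pd.Timestamp("2022-08-03 01:00:00")]
print(early)
```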

In [None]:
df_hourly_2020_2023.info()

In [None]:
df_hourly_2024.info()

## Data Preparation

### Check whether the same cities appear in both datasets

In [None]:
cities_2020_2023 = df_hourly_2020_2023.city_name.unique()
cities_2024 = df_hourly_2024.city_name.unique()
diff_city_2020_2023 = set(cities_2020_2023) - set(cities_2024)
diff_city_2024 = set(cities_2024) - set(cities_2020_2023)

diff_city_2020_2023, diff_city_2024

In [None]:
# Drop the cities that appear in only one of the two datasets
df_hourly_2020_2023 = df_hourly_2020_2023[df_hourly_2020_2023.city_name != 'Bago City']
df_hourly_2024 = df_hourly_2024[df_hourly_2024.city_name != 'Santiago']
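Hardcoding the two city names works for this snapshot of the data, but the same filter can be derived from the set comparison itself, so it keeps working if the Kaggle files are updated. A minimal sketch; `keep_shared_cities` is a hypothetical helper, used as `df_hourly_2020_2023, df_hourly_2024 = keep_shared_cities(df_hourly_2020_2023, df_hourly_2024)`:

```python
import pandas as pd


def keep_shared_cities(df_a: pd.DataFrame, df_b: pd.DataFrame, column: str = "city_name"):
    """Return copies of both frames restricted to the cities present in each."""
    shared = set(df_a[column].unique()) & set(df_b[column].unique())
    # isin keeps only the rows whose city is in the intersection
    return df_a[df_a[column].isin(shared)].copy(), df_b[df_b[column].isin(shared)].copy()
```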

In [None]:
cities_2020_2023 = df_hourly_2020_2023.city_name.unique()
cities_2024 = df_hourly_2024.city_name.unique()
diff_city_2020_2023 = set(cities_2020_2023) - set(cities_2024)
diff_city_2024 = set(cities_2024) - set(cities_2020_2023)

diff_city_2020_2023, diff_city_2024

### Fix Null Values

In [None]:
df_hourly_2020_2023.isnull().sum()

In [None]:
df_hourly_2024.isnull().sum()

In [None]:
pd.to_datetime(df_hourly_2024[df_hourly_2024.temperature_2m.isnull()].datetime).dt.date.unique()

In [None]:
df_hourly_2024.datetime.max()

Many null values fall on 10/23/2024 and 10/24/2024, and 10/24/2024 is also the latest date in the 2024 dataset. These two trailing dates are therefore removed from the 2024 data.

In [None]:
df_hourly_2024 = df_hourly_2024[df_hourly_2024.datetime < '2024-10-23']
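The decision to drop 10/23/2024 and 10/24/2024 can also be made by measuring the share of null values per date instead of reading it off the list of dates. A sketch with a hypothetical helper and a hypothetical cut-off of 50% nulls per date:

```python
import pandas as pd


def dates_with_many_nulls(df: pd.DataFrame, column: str, threshold: float = 0.5) -> list:
    """List the dates whose share of null values in `column` exceeds `threshold`."""
    dates = pd.to_datetime(df["datetime"]).dt.date
    # Mean of a boolean null mask per date = fraction of null rows on that date
    null_share = df[column].isnull().groupby(dates).mean()
    return sorted(null_share[null_share > threshold].index)
```

Running it as `dates_with_many_nulls(df_hourly_2024, "temperature_2m")` before the filter would show which dates cross the cut-off.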

Snow is not relevant to the Philippines, a tropical country, so the `snowfall` and `snow_depth` values are set to 0.

In [None]:
df_hourly_2020_2023['snowfall'] = 0.0
df_hourly_2020_2023['snow_depth'] = 0.0

df_hourly_2024['snowfall'] = 0.0
df_hourly_2024['snow_depth'] = 0.0
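Before overwriting `snowfall` and `snow_depth`, a quick check can confirm that the raw columns contain only zeros or missing values, so no real measurement is discarded. A minimal sketch with a hypothetical helper name; it would be run on the frames before the cell above:

```python
import pandas as pd


def only_zero_or_null(df: pd.DataFrame, columns) -> bool:
    """True if every value in `columns` is either 0 or missing."""
    # Treat missing values as 0, then require every cell to equal 0
    return bool(df[columns].fillna(0).eq(0).all().all())
```

For example, `only_zero_or_null(df_hourly_2024, ["snowfall", "snow_depth"])`.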

Fix the null values in the 2020-2023 dataset.

In [None]:
pd.to_datetime(df_hourly_2020_2023[df_hourly_2020_2023.global_tilted_irradiance.isnull()].datetime)

In [None]:
pd.to_datetime(df_hourly_2020_2023[df_hourly_2020_2023.global_tilted_irradiance_instant.isnull()].datetime)

In [None]:
df_hourly_2020_2023[df_hourly_2020_2023.global_tilted_irradiance.isnull()]

In [None]:
df_hourly_2020_2023[(df_hourly_2020_2023.datetime >= '2022-08-03 00:00:00') & 
                    (df_hourly_2020_2023.datetime < '2022-08-04 00:00:00') &
                    (df_hourly_2020_2023.city_name == 'Vigan')].global_tilted_irradiance

In [None]:
df_hourly_2020_2023[(df_hourly_2020_2023.datetime >= '2022-08-03 00:00:00') & 
                    (df_hourly_2020_2023.datetime < '2022-08-04 00:00:00') &
                    (df_hourly_2020_2023.city_name == 'Vigan')].global_tilted_irradiance_instant
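The two cells above select one day for Vigan by comparing strings. A sketch of the same selection with parsed timestamps and a half-open interval `[day, day + 1 day)`, which includes midnight and excludes the next day by construction; `city_day` and the sample rows are hypothetical:

```python
import pandas as pd


def city_day(df: pd.DataFrame, city: str, day: str, column: str) -> pd.Series:
    """Return `column` for one city over one calendar day [day 00:00, next day 00:00)."""
    start = pd.Timestamp(day)
    end = start + pd.Timedelta(days=1)
    times = pd.to_datetime(df["datetime"])
    mask = (times >= start) & (times < end) & (df["city_name"] == city)
    return df.loc[mask, column]
```

Usage: `city_day(df_hourly_2020_2023, "Vigan", "2022-08-03", "global_tilted_irradiance")`.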

Fill the remaining null values with 0.

In [None]:
df_hourly_2020_2023 = df_hourly_2020_2023.fillna(0)
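`fillna(0)` on the whole frame also fills nulls in columns that were never inspected. Assuming, based on the cells above, that the remaining nulls are only in the two `global_tilted_irradiance` columns, a narrower sketch fills just those columns so that any unexpected null elsewhere still shows up in the recheck. The helper and constant names are hypothetical:

```python
import pandas as pd

# Assumption: these are the only columns whose nulls should become 0
IRRADIANCE_COLUMNS = ["global_tilted_irradiance", "global_tilted_irradiance_instant"]


def fill_columns_with_zero(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with nulls replaced by 0 in `columns` only."""
    out = df.copy()
    out[columns] = out[columns].fillna(0)
    return out
```

Usage: `df_hourly_2020_2023 = fill_columns_with_zero(df_hourly_2020_2023, IRRADIANCE_COLUMNS)`.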

Recheck the null values.

In [None]:
df_hourly_2020_2023.isnull().sum()

In [None]:
df_hourly_2024.isnull().sum()
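The recheck relies on reading the printed counts. An assertion-style sketch (hypothetical helper) fails loudly instead, which helps when the notebook is re-run on updated data:

```python
import pandas as pd


def assert_no_nulls(df: pd.DataFrame, name: str = "frame") -> None:
    """Raise ValueError listing every column of df that still contains nulls."""
    counts = df.isnull().sum()
    remaining = counts[counts > 0]
    if not remaining.empty:
        raise ValueError(f"{name} still has nulls:\n{remaining}")
```

Usage: `assert_no_nulls(df_hourly_2020_2023, "2020-2023")` and `assert_no_nulls(df_hourly_2024, "2024")`.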

### Combine the 2020-2023 and 2024 data

In [None]:
df_full = pd.concat([df_hourly_2020_2023, df_hourly_2024])

In [None]:
df_full = df_full.reset_index(drop=True)
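`pd.concat` followed by `reset_index(drop=True)` can be written as a single call with `ignore_index=True`. Because the model predicts hourly values per city, it can also help to check that the two periods do not overlap and to sort the rows by city and time. A sketch, assuming `(city_name, datetime)` uniquely identifies a row; `combine_hourly` is a hypothetical helper, and sorting the `datetime` strings is only chronological for zero-padded ISO-style timestamps:

```python
import pandas as pd


def combine_hourly(frames) -> pd.DataFrame:
    """Concatenate hourly frames, reject duplicate (city, time) rows, sort by city and time."""
    combined = pd.concat(frames, ignore_index=True)
    duplicated = combined.duplicated(subset=["city_name", "datetime"])
    if duplicated.any():
        raise ValueError(f"{int(duplicated.sum())} duplicate (city_name, datetime) rows")
    return combined.sort_values(["city_name", "datetime"], ignore_index=True)
```

Usage: `df_full = combine_hourly([df_hourly_2020_2023, df_hourly_2024])`.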

In [None]:
df_full