# Philippine Cities Temperature Prediction

This notebook predicts hourly temperature for 135 Philippine cities, using data from 2020 to 2024.

Because the datasets are large, I also stored a backup copy on Google Drive.

Original Source:
- [Philippine Cities Weather Data (2024)](https://www.kaggle.com/datasets/bwandowando/philippine-cities-weather-data-ytd-2024)
- [Philippine Cities Weather Data (2020-2023)](https://www.kaggle.com/datasets/bwandowando/philippine-cities-weather-data-2020-2023)

Google Drive Links:
- [Philippine Cities Weather Data (2024)](https://drive.google.com/file/d/10wxJ3x6NKIGQCL5xpngcgbja6M7hvqJf/view?usp=drive_link)
- [Philippine Cities Weather Data (2020-2023)](https://drive.google.com/file/d/1cwdqQC3Idr7rfCS2AMbROv3be0eZ0SIB/view?usp=drive_link)

In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
# Make sure the download directory exists before curl writes into it
!mkdir -p data

!curl -L -o './data/Philippine_Cities_Weather_Data_(2024).zip' 'https://www.kaggle.com/api/v1/datasets/download/bwandowando/philippine-cities-weather-data-ytd-2024'

!curl -L -o './data/Philippine_Cities_Weather_Data_(2020-2023).zip' 'https://www.kaggle.com/api/v1/datasets/download/bwandowando/philippine-cities-weather-data-2020-2023'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 54.3M  100 54.3M    0     0  96.3M      0 --:--:-- --:--:-- --:--:-- 96.3M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  273M  100  273M    0     0   179M      0  0:00:01  0:00:01 --:--:--  220M


In [None]:
# -p creates parent directories and does not fail if they already exist
!mkdir -p data/2024 data/2020-2023

!unzip './data/Philippine_Cities_Weather_Data_(2024).zip' -d './data/2024'
!unzip './data/Philippine_Cities_Weather_Data_(2020-2023).zip' -d './data/2020-2023'

Archive:  ./data/Philippine_Cities_Weather_Data_(2024).zip
  inflating: ./data/2024/cities.csv  
  inflating: ./data/2024/daily_data_combined_2024.csv  
  inflating: ./data/2024/daily_units_2024.csv  
  inflating: ./data/2024/hourly_data_combined_2024.csv  
  inflating: ./data/2024/hourly_units_2024.csv  
Archive:  ./data/Philippine_Cities_Weather_Data_(2020-2023).zip
  inflating: ./data/2020-2023/cities.csv  
  inflating: ./data/2020-2023/daily_data_combined_2020_to_2023.csv  
  inflating: ./data/2020-2023/daily_units_2020_to_2023.csv  
  inflating: ./data/2020-2023/hour_units_2020_to_2023.csv  
  inflating: ./data/2020-2023/hourly_data_combined_2020_to_2023.csv  
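
The shell commands above depend on `mkdir` and `unzip` being available. As a rough alternative, the same extraction can be sketched with Python's standard library. The helper name `extract_archive` is hypothetical and is not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir, creating the directory if needed."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # Return the member names so the caller can confirm what was unpacked.
        return sorted(archive.namelist())
```

For this notebook the call would presumably look like `extract_archive('./data/Philippine_Cities_Weather_Data_(2024).zip', './data/2024')`.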


In [4]:
df_hourly_2020_2023 = pd.read_csv('data/2020-2023/hourly_data_combined_2020_to_2023.csv')
df_hourly_2024 = pd.read_csv('data/2024/hourly_data_combined_2024.csv')

In [5]:
df_hourly_2020_2023.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4944024 entries, 0 to 4944023
Data columns (total 44 columns):
 #   Column                            Dtype  
---  ------                            -----  
 0   city_name                         object 
 1   datetime                          object 
 2   temperature_2m                    float64
 3   relative_humidity_2m              float64
 4   dew_point_2m                      float64
 5   apparent_temperature              float64
 6   precipitation                     float64
 7   rain                              float64
 8   snowfall                          float64
 9   snow_depth                        float64
 10  weather_code                      float64
 11  pressure_msl                      float64
 12  surface_pressure                  float64
 13  cloud_cover                       float64
 14  cloud_cover_low                   float64
 15  cloud_cover_mid                   float64
 16  cloud_cover_high                  fl

In [6]:
df_hourly_2024.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 979824 entries, 0 to 979823
Data columns (total 44 columns):
 #   Column                            Non-Null Count   Dtype  
---  ------                            --------------   -----  
 0   city_name                         979824 non-null  object 
 1   datetime                          979824 non-null  object 
 2   temperature_2m                    974481 non-null  float64
 3   relative_humidity_2m              974481 non-null  float64
 4   dew_point_2m                      974481 non-null  float64
 5   apparent_temperature              974481 non-null  float64
 6   precipitation                     974481 non-null  float64
 7   rain                              974481 non-null  float64
 8   snowfall                          974481 non-null  float64
 9   snow_depth                        787952 non-null  float64
 10  weather_code                      974481 non-null  float64
 11  pressure_msl                      974481 non-null  f

## Data Preparation

### Check that the cities match in both datasets

In [7]:
cities_2020_2023 = df_hourly_2020_2023.city_name.unique()
cities_2024 = df_hourly_2024.city_name.unique()
diff_city_2020_2023 = set(cities_2020_2023) - set(cities_2024)
diff_city_2024 = set(cities_2024) - set(cities_2020_2023)

diff_city_2020_2023, diff_city_2024

({'Bago City'}, {'Santiago'})

In [8]:
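# Drop the cities that appear in only one dataset; .copy() avoids
# SettingWithCopyWarning when columns are assigned later on
df_hourly_2020_2023 = df_hourly_2020_2023[df_hourly_2020_2023.city_name != 'Bago City'].copy()
df_hourly_2024 = df_hourly_2024[df_hourly_2024.city_name != 'Santiago'].copy()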

In [9]:
cities_2020_2023 = df_hourly_2020_2023.city_name.unique()
cities_2024 = df_hourly_2024.city_name.unique()
diff_city_2020_2023 = set(cities_2020_2023) - set(cities_2024)
diff_city_2024 = set(cities_2024) - set(cities_2020_2023)

diff_city_2020_2023, diff_city_2024

(set(), set())
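
The compare, filter, and recheck steps above could be wrapped in small reusable helpers. The sketch below is one possible shape, run on toy frames rather than the real data. The function names `city_mismatch` and `keep_common_cities` are assumptions made for illustration.

```python
import pandas as pd


def city_mismatch(left, right, column="city_name"):
    """Return (only_in_left, only_in_right) for the values of `column`."""
    left_cities = set(left[column].unique())
    right_cities = set(right[column].unique())
    return left_cities - right_cities, right_cities - left_cities


def keep_common_cities(left, right, column="city_name"):
    """Keep only rows whose city appears in both frames; returns copies."""
    common = sorted(set(left[column]) & set(right[column]))
    return (left[left[column].isin(common)].copy(),
            right[right[column].isin(common)].copy())


# Toy frames standing in for the two hourly datasets
old = pd.DataFrame({"city_name": ["Vigan", "Bago City"]})
new = pd.DataFrame({"city_name": ["Vigan", "Santiago"]})

old_common, new_common = keep_common_cities(old, new)
print(city_mismatch(old, new))
print(city_mismatch(old_common, new_common))
```

Filtering on the intersection avoids hard-coding the mismatched city names, which may help if the source datasets are updated later.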

### Fix Null Values

In [10]:
df_hourly_2020_2023.isnull().sum()

city_name                                0
datetime                                 0
temperature_2m                           0
relative_humidity_2m                     0
dew_point_2m                             0
apparent_temperature                     0
precipitation                            0
rain                                     0
snowfall                                 0
snow_depth                          596088
weather_code                             0
pressure_msl                             0
surface_pressure                         0
cloud_cover                              0
cloud_cover_low                          0
cloud_cover_mid                          0
cloud_cover_high                         0
et0_fao_evapotranspiration               0
vapour_pressure_deficit                  0
wind_speed_10m                           0
wind_speed_100m                          0
wind_direction_10m                       0
wind_direction_100m                      0
wind_gusts_

In [11]:
df_hourly_2024.isnull().sum()

city_name                                0
datetime                                 0
temperature_2m                        5304
relative_humidity_2m                  5304
dew_point_2m                          5304
apparent_temperature                  5304
precipitation                         5304
rain                                  5304
snowfall                              5304
snow_depth                          191232
weather_code                          5304
pressure_msl                          5304
surface_pressure                      5304
cloud_cover                           5304
cloud_cover_low                       5304
cloud_cover_mid                       5304
cloud_cover_high                      5304
et0_fao_evapotranspiration            5304
vapour_pressure_deficit               5304
wind_speed_10m                        5304
wind_speed_100m                       5304
wind_direction_10m                    5304
wind_direction_100m                   5304
wind_gusts_

In [12]:
pd.to_datetime(df_hourly_2024[df_hourly_2024.temperature_2m.isnull()].datetime).dt.date.unique()

array([datetime.date(2024, 10, 23), datetime.date(2024, 10, 24)],
      dtype=object)

In [13]:
df_hourly_2024.datetime.max()

'2024-10-24 23:00:00'

The null temperature values fall only on 2024-10-23 and 2024-10-24, which are also the last two dates in the 2024 dataset, so both dates will be removed from it.

In [14]:
# ISO-formatted strings sort chronologically, so a string comparison is enough here
df_hourly_2024 = df_hourly_2024[df_hourly_2024.datetime < '2024-10-23'].copy()
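
To back a decision like this with numbers, one option is to count missing rows per calendar date before dropping anything. The following is a minimal sketch on a toy frame. The helper `null_rows_by_date` is hypothetical and only assumes the `datetime` and `temperature_2m` columns shown earlier.

```python
import pandas as pd


def null_rows_by_date(df, value_col="temperature_2m", time_col="datetime"):
    """Per calendar date, count rows where `value_col` is missing."""
    flags = pd.DataFrame({
        "date": pd.to_datetime(df[time_col]).dt.date,
        "is_missing": df[value_col].isnull(),
    })
    # Sum of booleans = missing rows, size = all rows on that date
    summary = flags.groupby("date")["is_missing"].agg(["sum", "size"])
    summary.columns = ["missing", "total"]
    return summary[summary["missing"] > 0]


# Toy data: one complete date followed by a partially missing one
toy = pd.DataFrame({
    "datetime": ["2024-10-22 23:00:00", "2024-10-23 00:00:00", "2024-10-23 01:00:00"],
    "temperature_2m": [27.0, None, 26.5],
})
print(null_rows_by_date(toy))
```

Comparing `missing` against `total` shows whether a date is only partly affected, which matters when choosing between dropping the date and imputing it.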

Snow is irrelevant here because the Philippines is a tropical country. Therefore, all `snowfall` and `snow_depth` values will be set to 0.

In [15]:
df_hourly_2020_2023['snowfall'] = 0.0
df_hourly_2020_2023['snow_depth'] = 0.0

df_hourly_2024['snowfall'] = 0.0
df_hourly_2024['snow_depth'] = 0.0
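
Before overwriting whole columns, it may be worth confirming that they hold nothing but zeros and missing values. The check below is a sketch on toy data. `nonzero_counts` is a hypothetical helper, and it would need to run before the assignment above to be meaningful.

```python
import pandas as pd


def nonzero_counts(df, columns=("snowfall", "snow_depth")):
    """Count, per column, the values that are neither missing nor zero."""
    return {col: int((df[col].fillna(0) != 0).sum()) for col in columns}


# Toy frame: snow columns contain only zeros and NaN
toy_snow = pd.DataFrame({
    "snowfall": [0.0, 0.0, None],
    "snow_depth": [None, 0.0, None],
})
counts = nonzero_counts(toy_snow)
print(counts)
```

If every count is 0, replacing the columns with 0.0 loses no information.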

Fix the remaining null values in the 2020-2023 dataset.

In [16]:
pd.to_datetime(df_hourly_2020_2023[df_hourly_2020_2023.global_tilted_irradiance.isnull()].datetime)

4896582   2022-08-03 06:00:00
Name: datetime, dtype: datetime64[ns]

In [17]:
pd.to_datetime(df_hourly_2020_2023[df_hourly_2020_2023.global_tilted_irradiance_instant.isnull()].datetime)

4896582   2022-08-03 06:00:00
Name: datetime, dtype: datetime64[ns]

In [18]:
df_hourly_2020_2023[df_hourly_2020_2023.global_tilted_irradiance.isnull()]

Unnamed: 0,city_name,datetime,temperature_2m,relative_humidity_2m,dew_point_2m,apparent_temperature,precipitation,rain,snowfall,snow_depth,...,diffuse_radiation,direct_normal_irradiance,global_tilted_irradiance,terrestrial_radiation,shortwave_radiation_instant,direct_radiation_instant,diffuse_radiation_instant,direct_normal_irradiance_instant,global_tilted_irradiance_instant,terrestrial_radiation_instant
4896582,Vigan,2022-08-03 06:00:00,24.8,94.0,23.9,29.9,1.9,1.9,0.0,0.0,...,2.0,0.0,,38.5,5.0,0.0,5.0,0.0,,96.5


In [19]:
df_hourly_2020_2023[(df_hourly_2020_2023.datetime >= '2022-08-03 00:00:00') &
                    (df_hourly_2020_2023.datetime < '2022-08-04 00:00:00') &
                    (df_hourly_2020_2023.city_name == 'Vigan')].global_tilted_irradiance

4896576      0.0
4896577      0.0
4896578      0.0
4896579      0.0
4896580      0.0
4896581      0.0
4896582      NaN
4896583     50.0
4896584     76.0
4896585    375.0
4896586    386.0
4896587    630.0
4896588    556.0
4896589    710.0
4896590    684.0
4896591    439.0
4896592    362.0
4896593    113.0
4896594     30.0
4896595      6.0
4896596      0.0
4896597      0.0
4896598      0.0
4896599      0.0
Name: global_tilted_irradiance, dtype: float64

In [20]:
df_hourly_2020_2023[(df_hourly_2020_2023.datetime > '2022-08-03 00:00:00') & 
                    (df_hourly_2020_2023.datetime < '2022-08-04 00:00:00') &
                    (df_hourly_2020_2023.city_name == 'Vigan')].global_tilted_irradiance_instant

4896577      0.0
4896578      0.0
4896579      0.0
4896580      0.0
4896581      0.0
4896582      NaN
4896583     80.7
4896584     95.6
4896585    430.2
4896586    419.9
4896587    659.3
4896588    563.2
4896589    697.0
4896590    648.9
4896591    399.2
4896592    309.2
4896593     85.5
4896594     14.4
4896595      0.0
4896596      0.0
4896597      0.0
4896598      0.0
4896599      0.0
Name: global_tilted_irradiance_instant, dtype: float64

Both missing irradiance values fall at 06:00, immediately after a run of 0.0 readings, so fill NA with 0.

In [21]:
df_hourly_2020_2023 = df_hourly_2020_2023.fillna(0)
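
Note that `fillna(0)` on the whole frame replaces NaNs in every column. A narrower variant, shown here only as a sketch on toy data, limits the fill to the two irradiance columns so that unexpected gaps elsewhere stay visible. The helper name `fill_columns_with_zero` and the constant `IRRADIANCE_COLS` are assumptions, not part of the notebook.

```python
import pandas as pd

IRRADIANCE_COLS = ["global_tilted_irradiance", "global_tilted_irradiance_instant"]


def fill_columns_with_zero(df, columns):
    """Return a copy of df where NaNs in `columns` only are replaced by 0."""
    out = df.copy()
    out[columns] = out[columns].fillna(0)
    return out


# Toy frame: NaN in both irradiance columns and, separately, in temperature
toy_irr = pd.DataFrame({
    "temperature_2m": [24.8, None, 25.0],
    "global_tilted_irradiance": [0.0, None, 50.0],
    "global_tilted_irradiance_instant": [0.0, None, 80.7],
})
filled = fill_columns_with_zero(toy_irr, IRRADIANCE_COLS)
print(filled.isnull().sum())
```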

Recheck Null Values

In [22]:
df_hourly_2020_2023.isnull().sum()

city_name                           0
datetime                            0
temperature_2m                      0
relative_humidity_2m                0
dew_point_2m                        0
apparent_temperature                0
precipitation                       0
rain                                0
snowfall                            0
snow_depth                          0
weather_code                        0
pressure_msl                        0
surface_pressure                    0
cloud_cover                         0
cloud_cover_low                     0
cloud_cover_mid                     0
cloud_cover_high                    0
et0_fao_evapotranspiration          0
vapour_pressure_deficit             0
wind_speed_10m                      0
wind_speed_100m                     0
wind_direction_10m                  0
wind_direction_100m                 0
wind_gusts_10m                      0
soil_temperature_0_to_7cm           0
soil_temperature_7_to_28cm          0
soil_tempera

In [23]:
df_hourly_2024.isnull().sum()

city_name                           0
datetime                            0
temperature_2m                      0
relative_humidity_2m                0
dew_point_2m                        0
apparent_temperature                0
precipitation                       0
rain                                0
snowfall                            0
snow_depth                          0
weather_code                        0
pressure_msl                        0
surface_pressure                    0
cloud_cover                         0
cloud_cover_low                     0
cloud_cover_mid                     0
cloud_cover_high                    0
et0_fao_evapotranspiration          0
vapour_pressure_deficit             0
wind_speed_10m                      0
wind_speed_100m                     0
wind_direction_10m                  0
wind_direction_100m                 0
wind_gusts_10m                      0
soil_temperature_0_to_7cm           0
soil_temperature_7_to_28cm          0
soil_tempera

### Combine data of 2020-2023 and 2024

In [24]:
df_full = pd.concat([df_hourly_2020_2023, df_hourly_2024])

In [25]:
df_full = df_full.reset_index(drop=True)
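
After concatenating, a few quick checks can confirm that the two periods joined cleanly. The sketch below uses toy frames. `check_combined` is a hypothetical helper that assumes `city_name` and `datetime` together identify a row.

```python
import pandas as pd


def check_combined(df, key_cols=("city_name", "datetime")):
    """Summarise a concatenated frame: rows, duplicate keys, and time span."""
    timestamps = pd.to_datetime(df["datetime"])
    return {
        "rows": len(df),
        "duplicate_keys": int(df.duplicated(subset=list(key_cols)).sum()),
        "start": timestamps.min(),
        "end": timestamps.max(),
    }


# Toy frames standing in for the two periods
part_a = pd.DataFrame({"city_name": ["Vigan"], "datetime": ["2020-01-01 00:00:00"]})
part_b = pd.DataFrame({"city_name": ["Vigan"], "datetime": ["2024-10-22 23:00:00"]})

# ignore_index=True renumbers rows in one step, like concat + reset_index(drop=True)
combined = pd.concat([part_a, part_b], ignore_index=True)
report = check_combined(combined)
print(report)
```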

In [26]:
df_full

Unnamed: 0,city_name,datetime,temperature_2m,relative_humidity_2m,dew_point_2m,apparent_temperature,precipitation,rain,snowfall,snow_depth,...,diffuse_radiation,direct_normal_irradiance,global_tilted_irradiance,terrestrial_radiation,shortwave_radiation_instant,direct_radiation_instant,diffuse_radiation_instant,direct_normal_irradiance_instant,global_tilted_irradiance_instant,terrestrial_radiation_instant
0,Alaminos,2020-01-01 00:00:00,24.8,82.0,21.6,28.3,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alaminos,2020-01-01 01:00:00,23.8,81.0,20.3,26.3,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alaminos,2020-01-01 02:00:00,23.3,81.0,19.8,25.4,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alaminos,2020-01-01 03:00:00,23.3,80.0,19.7,25.3,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alaminos,2020-01-01 04:00:00,23.1,80.0,19.5,25.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5875099,Zamboanga City,2024-10-22 19:00:00,27.5,80.0,23.7,30.2,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5875100,Zamboanga City,2024-10-22 20:00:00,27.5,82.0,24.2,30.3,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5875101,Zamboanga City,2024-10-22 21:00:00,27.8,77.0,23.5,30.3,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5875102,Zamboanga City,2024-10-22 22:00:00,27.8,77.0,23.3,30.2,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
