# RAMP on relative humidity prediction in Morocco

*Zakaria Akil, Zakarya Elmimouni, Ahmed Khairaldin, Yassine Oj, Khadija slim*

## Introduction

Relative humidity is a measure of how much moisture is in the air compared to the maximum amount it can hold at a given temperature. It is expressed as a percentage, where 100% means the air is fully saturated with water vapor and cannot hold any more, leading to possible condensation (such as dew or fog).

The ability of air to hold moisture depends on temperature: warmer air can contain more water vapor, while cooler air holds less. This is why relative humidity changes with temperature even if the actual amount of water vapor in the air hasn’t changed: warming the air lowers its relative humidity, and cooling it raises it. For example, a relative humidity of 50% means the air contains half the moisture it could hold at that temperature. A high relative humidity (80–90%) makes the air feel damp and heavy, while a low relative humidity (20–30%) makes it feel dry, which can cause discomfort such as dry skin or irritation.
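
As a rough numerical illustration of the temperature dependence described above, the sketch below uses the Magnus approximation for saturation vapor pressure over water. The function names and the coefficient set (6.112 hPa, 17.62, 243.12 °C) are choices made for this example; they are not taken from the dataset or the starting kit.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapor pressure over water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity_pct(vapor_pressure_hpa, temp_c):
    """Relative humidity (%) = actual vapor pressure / saturation vapor pressure."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)


# Hold the amount of water vapor fixed (air saturated at 10 °C) and warm the air.
vapor_pressure = saturation_vapor_pressure_hpa(10.0)
for temp_c in (10.0, 20.0, 30.0):
    rh = relative_humidity_pct(vapor_pressure, temp_c)
    print(f"{temp_c:.0f} °C: {rh:.0f}% relative humidity")
```

With the vapor content held constant, the computed relative humidity falls as the temperature rises, because the saturation vapor pressure in the denominator grows.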


### Why relative humidity in Morocco?

In Morocco, relative humidity plays a vital role in predicting droughts, managing water resources, and understanding climate variability. The country’s semi-arid and arid regions are highly sensitive to fluctuations in humidity, which affect soil moisture and crop yields. Low relative humidity accelerates evaporation from reservoirs and irrigation systems, intensifying water scarcity. In coastal and mountainous areas, humidity variations influence cloud formation and precipitation patterns, impacting agriculture and hydropower generation. By integrating relative humidity data into climate models, scientists and policymakers can better anticipate drought risks and optimize water management strategies to mitigate their effects.

### Dataset

Our data comes from the ERA5 data archive, which provides hourly estimates for a large number of atmospheric, ocean-wave, and land-surface quantities. We collected data on pressure levels from 2020 to the present (01/03/2025) for both public and private datasets.  

Our dataset contains observations spaced 4 days apart within this period for geographical coordinates in Morocco and some adjacent land areas. The variables included in our dataset are:

- **Temperature**:  
  Temperature affects the saturation point of air, directly influencing relative humidity. Warmer air can hold more moisture, impacting humidity levels.

- **Wind (`u-component`, `v-component`)**:  
  Wind transports moisture and heat, influencing local humidity levels. Wind patterns determine moisture advection and drying effects in a region.

- **Cloud Cover**:  
  Clouds impact the radiation balance and surface evaporation, affecting local humidity levels.

- **Vertical Velocity**:  
  Vertical air movements influence condensation and evaporation processes, regulating relative humidity through adiabatic cooling and warming.

- **Geopotential**:  
  Geopotential height is related to atmospheric pressure and influences moisture transport and condensation processes.

- **Ozone**:  
  Ozone affects atmospheric temperature and stability, indirectly impacting humidity distribution.

- **Vorticity**:  
  Vorticity is an indicator of swirling air movements, which can redistribute moisture and influence humidity patterns.

- **Divergence**:  
  Wind divergence is related to the convergence and divergence of air masses, affecting humidity by regulating moisture fluxes in the atmosphere.

- **Target variable**: `relative_humidity`


### Table structure

Each row in the table represents an observation at a given time, at a fixed pressure level (1000 hPa), and at a specific geographic position (`latitude`, `longitude`). The columns contain the values of the different variables.

For the public dataset, we kept observations from 2020 until the end of 2022 as the training set and observations from 2023 as the test set. For the private data, we aim to predict relative humidity for the year 2024 and the beginning of 2025 using data from 2020 until the end of 2023 as the training set. The idea is that predicting relative humidity one year in advance is crucial for effective planning in various sectors, including agriculture, water resource management, and climate adaptation strategies.
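
The public split described above can be sketched on a small synthetic table. The column names `valid_time`, `latitude`, `longitude`, and `relative_humidity` follow the starting-kit code; the two grid points and the random values are placeholders invented for illustration, and the actual split in the challenge may have been produced differently.

```python
import numpy as np
import pandas as pd

# Placeholder timestamps every 4 days, mimicking the spacing of the dataset.
times = pd.date_range("2020-01-01", "2023-12-31", freq="4D")
# Two made-up grid points; the real dataset covers many more coordinates.
points = [(31.0, -8.0), (34.0, -5.0)]

rng = np.random.default_rng(0)
rows = [
    {"valid_time": t, "latitude": lat, "longitude": lon,
     "relative_humidity": rng.uniform(10, 100)}
    for t in times
    for lat, lon in points
]
df = pd.DataFrame(rows)

# Split by date rather than at random, so the test set lies strictly
# after the training set in time.
train_end = pd.Timestamp("2023-01-01")
test_end = pd.Timestamp("2024-01-01")
train = df[df["valid_time"] < train_end]
test = df[(df["valid_time"] >= train_end) & (df["valid_time"] < test_end)]

print(train["valid_time"].dt.year.unique())
print(test["valid_time"].dt.year.unique())
```

Splitting on the timestamp keeps every test observation later than every training observation, which matches the forecasting setting better than a random shuffle would.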


## Submission

The goal of the RAMP is to predict the relative humidity at a specific coordinate at a future time instant, spaced 4 time steps apart. Here is an example of a scikit-learn pipeline to fit the training input data using a subset of the columns.

```python
import numpy as np
import pandas as pd

from sklearn.preprocessing import FunctionTransformer
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer


def extract_date_components(df):
    """Replace the `valid_time` column with year, month, and day columns."""
    df = df.copy()
    df["valid_time"] = pd.to_datetime(df["valid_time"])
    df["year"] = df["valid_time"].dt.year
    df["month"] = df["valid_time"].dt.month
    df["day"] = df["valid_time"].dt.day

    return df.drop(columns=["valid_time"])


datetime_transformer = FunctionTransformer(extract_date_components)

# Subset of the input columns passed to the model unchanged.
cols = [
    'latitude',
    'longitude',
    'temperature',
    'divergence',
    'u_component_wind',
    'v_component_wind',
    'cloud_cover'
]

# Expand the date into numeric features and pass the selected columns through.
# Columns not listed here are dropped (the ColumnTransformer default).
transformer = make_column_transformer(
    (datetime_transformer, ["valid_time"]),
    ('passthrough', cols)
)

# Impute missing values with the column mean, then fit a small random forest.
pipe = make_pipeline(
    transformer,
    SimpleImputer(strategy='mean'),
    RandomForestRegressor(max_depth=5, n_estimators=10)
)


def get_estimator():
    return pipe
```

### Testing using a scikit-learn pipeline

```python
from sklearn.model_selection import cross_val_score

df = pd.read_csv("Data_extraction/data/public/train.csv")
X_df, y = df.drop(columns=["relative_humidity"]), df["relative_humidity"]

# scikit-learn reports the negated MSE so that higher scores are always better.
scores = cross_val_score(get_estimator(), X_df, y, cv=2, scoring='neg_mean_squared_error')
print(scores)
```

```
[-134.18945296 -146.9081392 ]
```
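
Because the `neg_mean_squared_error` scores are negated squared errors, they are easier to read once converted to a root mean squared error, which is in the units of the target (percentage points of relative humidity). The sketch below applies that conversion to the two values printed above. The variable names are illustrative, and RMSE is used here only for readability, not because it is known to be the challenge's official metric.

```python
import numpy as np

# Scores copied from the cross-validation output above (negated MSE per fold).
scores = np.array([-134.18945296, -146.9081392])

# Flip the sign to recover the MSE, then take the square root to get the RMSE.
rmse_per_fold = np.sqrt(-scores)

print(rmse_per_fold)
print(rmse_per_fold.mean())
```

Both folds come out at an RMSE between 11 and 13 percentage points of relative humidity, which gives a baseline to compare later models against.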
