<table style="width:100%; border: none;">
    <tr>
        <td colspan="3" style="text-align:center; border: none;">
            <img src="assets/banner.svg" alt="Banner Image" style="width:100%;">
        </td>
    </tr>
</table>


This notebook provides a complete workflow for downloading, processing, analyzing, and visualizing temperature data for various capital cities. The steps below explain what each part of the code does:

1. **Loading Configuration**: The function `load_capitals_coordinates` reads a YAML configuration file to get the coordinates (latitude and longitude) of different capital cities.

2. **Downloading the Dataset**: The `get_cacheB_dataset` function downloads data from a specified dataset URL. It uses the xarray library to handle the dataset efficiently.

3. **Preprocessing the Data**: The `preprocess` function extracts temperature data for a specific city, averages it over the desired period, converts the temperature from Kelvin to Celsius, and loads the data into memory.

4. **Basic Plotting**: The `basic_plot` function creates a simple line plot of the daily average temperature for a given city.

5. **Training a Forecast Model**: The `train_model` function prepares the data and trains a forecasting model using the Prophet library. It splits the data into training and testing sets (80% and 20%) and fits the model to the training data. For more information, see the [Prophet quick start](https://facebook.github.io/prophet/docs/quick_start.html).


6. **Making Predictions**: The `make_predictions` function uses the trained model to make temperature predictions on the test data and calculates error metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

7. **Plotting Forecasts**: The `plot_forecast` function plots the training data, test data, and forecasted temperatures, allowing you to visually compare the model's predictions with the actual temperatures. It also provides options to display the plot and save it as an SVG file.
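
Steps 5 and 6 can be illustrated without Prophet. The sketch below is not the code in `utils`: it assumes a chronological 80/20 split and uses a naive "repeat the last training value" forecast as a stand-in model, only to show how MAE and RMSE are computed. The names `split_train_test`, `mae`, and `rmse`, as well as the temperature values, are hypothetical.

```python
import math


def split_train_test(values, train_fraction=0.8):
    """Split a time-ordered sequence chronologically (no shuffling)."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up daily temperatures in degrees Celsius (illustrative only)
temperatures = [2.0, 3.0, 1.0, 4.0, 5.0, 3.0, 2.0, 4.0, 6.0, 2.0]
train, test = split_train_test(temperatures)

# Naive stand-in forecast (an assumption, not Prophet): repeat the last training value
forecast = [train[-1]] * len(test)

print(f"train={len(train)} test={len(test)}")
print(f"MAE={mae(test, forecast):.2f} RMSE={rmse(test, forecast):.2f}")
```

RMSE squares each error before averaging, so it penalizes large misses more strongly than MAE; reporting both gives a rough sense of whether the errors are evenly spread or dominated by a few outliers.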



In [None]:
from utils import (load_capitals_coordinates, get_cacheB_dataset,
                   basic_plot, make_predictions,
                   plot_forecast, preprocess, train_model)
import ipywidgets as widgets

In [None]:
# Load capital coordinates from the YAML file
capitals_coordinates = load_capitals_coordinates('config.yaml')


def display_coordinates(city):
    """Store the selected city and its coordinates in module-level variables."""
    global selected_coordinates
    global selected_city
    selected_coordinates = capitals_coordinates[city]
    selected_city = city
    return selected_coordinates, selected_city


# Create and display the dropdown
widgets.interact(display_coordinates, city=sorted(capitals_coordinates.keys()))

In [None]:
from datetime import datetime, timedelta

# Start and end dates (both inclusive)
start_date = datetime.strptime("20200101", "%Y%m%d")
end_date = datetime.strptime("20200110", "%Y%m%d")

# Build a "/"-separated string with one YYYYMMDD entry per day in the range
n_days = (end_date - start_date).days + 1
date = "/".join((start_date + timedelta(days=i)).strftime("%Y%m%d")
                for i in range(n_days))

In [None]:
import yaml
with open("config.yaml", "r") as config_file:
    config = yaml.safe_load(config_file)

polytope_url = config["polytope_url"]
polytope_request = config["polytope_request"]
grid = config['grid']
# polytope_request["date"] = date
polytope_request

In [None]:
import earthkit.data
import earthkit.regrid
URL_DATASET = "polytope.lumi.apps.dte.destination-earth.eu"
data = earthkit.data.from_source("polytope", "destination-earth", polytope_request, address=polytope_url, stream=False)
out_grid = {"grid": [grid['lat'], grid['lon']]}
data_latlon = earthkit.regrid.interpolate(data, out_grid=out_grid, method=grid['method'])
ds = data_latlon.to_xarray()
ds = ds["t2m"]
dataset = ds.sel(latitude=selected_coordinates[0], longitude=selected_coordinates[1], method="nearest")
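
With `method="nearest"`, the selection picks, independently along each coordinate axis, the grid value closest to the requested one. The plain-Python sketch below illustrates that idea only; the grid axes, the city coordinates, and the helper `nearest_index` are hypothetical and are not part of the notebook or of xarray.

```python
def nearest_index(axis_values, target):
    """Return the index of the axis value closest to target."""
    return min(range(len(axis_values)),
               key=lambda i: abs(axis_values[i] - target))


# Hypothetical 1-degree grid axes (illustrative, not the notebook's grid)
latitudes = [50.0, 51.0, 52.0, 53.0]
longitudes = [12.0, 13.0, 14.0, 15.0]

# Approximate (latitude, longitude) of a city, in the same order the notebook uses
city = (52.52, 13.40)

i = nearest_index(latitudes, city[0])
j = nearest_index(longitudes, city[1])
print(latitudes[i], longitudes[j])
```

The selected grid point does not generally coincide with the city itself; how far away it can be depends on the resolution chosen in the `grid` section of the configuration.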


In [None]:

# dataset = dataset.resample(time="D").mean(dim="time")

In [None]:
import pandas as pd
index = dataset.time
df = pd.DataFrame(data={"time": index,
                        "temperature": dataset.values.flatten()})
# Convert Kelvin to Celsius
df["temperature"] = df["temperature"] - 273.15
basic_plot(df, city=selected_city, coord=selected_coordinates)
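
The overview describes averaging the temperature over a period and converting it from Kelvin to Celsius. The sketch below shows one way to do both with pandas on synthetic hourly values. It is not the `preprocess` function from `utils`; the data and the name `to_daily_celsius` are hypothetical.

```python
import pandas as pd


def to_daily_celsius(frame):
    """Average sub-daily Kelvin temperatures per day and convert to Celsius."""
    daily = frame.set_index("time")["temperature"].resample("D").mean()
    return (daily - 273.15).rename("temperature_c").reset_index()


# Synthetic hourly data: one day at 283.15 K followed by one day at 293.15 K
times = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"time": times,
                       "temperature": [283.15] * 24 + [293.15] * 24})

daily_df = to_daily_celsius(hourly)
print(daily_df)
```

Averaging before converting gives the same result as converting first, because the conversion is a constant offset.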

In [None]:
import xarray as xr
# Load the configuration
with open("config.yaml", "r") as config_file:
    config = yaml.safe_load(config_file)

polytope_url = config["polytope_url"]
polytope_request = config["polytope_request"]
grid = config['grid']

# Generate list of dates for N days
N = 30  # Number of days
start_date = pd.to_datetime(polytope_request["date"])  # Assume the start date is provided in the config
dates = [start_date + pd.Timedelta(days=i) for i in range(N)]

# Initialize an empty list to store datasets
datasets = []

for date in dates:
    # Modify the polytope_request for the current date
    polytope_request["date"] = date.strftime("%Y%m%d")

    # Query the data
    data = earthkit.data.from_source("polytope", "destination-earth", polytope_request, address=polytope_url, stream=False)
    out_grid = {"grid": [grid['lat'], grid['lon']]}
    data_latlon = earthkit.regrid.interpolate(data, out_grid=out_grid, method=grid['method'])

    # Convert to xarray
    ds = data_latlon.to_xarray()
    ds = ds["t2m"]

    # Select the nearest point
    selected_ds = ds.sel(latitude=selected_coordinates[0], longitude=selected_coordinates[1], method="nearest")

    # Add the dataset to the list
    datasets.append(selected_ds)

# Concatenate the per-day datasets along the 'time' dimension
final_dataset = xr.concat(datasets, dim="time")

In [None]:
index = final_dataset.time
df = pd.DataFrame(data={"time": index,
                        "temperature": final_dataset.values.flatten()})
# Convert Kelvin to Celsius
df["temperature"] = df["temperature"] - 273.15
basic_plot(df, city=selected_city, coord=selected_coordinates)