This service provides streamflow predictions based on the weather for 3 locations in Sweden: Abisko, Spånga and Uppsala. It consists of a Feature Pipeline and a Training Pipeline, which run on Modal as serverless functions. The Training Pipeline performs parallel hyperparameter tuning for XGBoost. A Batch Inference Pipeline has also been implemented to make predictions and create diagrams. Moreover, our service provides a UI with a map of the future predictions as well as a monitor UI.
- Olivia Höft
- Chrysoula Dikonimaki
The APIs used are the following:
- Hydrological Observations (https://opendata-download-hydroobs.smhi.se/api) from The Swedish Meteorological and Hydrological Institute (SMHI) to get the Streamflow Data.
- Weather Open API (https://open-meteo.com/en/docs) from Open-Meteo for the weather data. The parameters used are the following: temperature_2m_max, temperature_2m_min, precipitation_sum, rain_sum, snowfall_sum, precipitation_hours, windspeed_10m_max, windgusts_10m_max, winddirection_10m_dominant, et0_fao_evapotranspiration.
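As an illustration, a daily Open-Meteo request with these parameters can be assembled as below. The helper name and the Abisko coordinates are illustrative assumptions; the parameter names are the ones listed above.

```python
# Sketch of assembling a daily Open-Meteo forecast request for one location.
# build_request and the coordinates are illustrative assumptions, not the
# project's actual code.
DAILY_PARAMS = [
    "temperature_2m_max", "temperature_2m_min", "precipitation_sum",
    "rain_sum", "snowfall_sum", "precipitation_hours",
    "windspeed_10m_max", "windgusts_10m_max",
    "winddirection_10m_dominant", "et0_fao_evapotranspiration",
]

def build_request(latitude, longitude):
    """Return (url, query params) for one daily Open-Meteo forecast call."""
    return (
        "https://api.open-meteo.com/v1/forecast",
        {
            "latitude": latitude,
            "longitude": longitude,
            "daily": ",".join(DAILY_PARAMS),
            "timezone": "Europe/Stockholm",
        },
    )

# Example usage (approximate coordinates for Abisko):
url, params = build_request(68.35, 18.82)
# The actual HTTP call, e.g. requests.get(url, params=params), is omitted here.
```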
- Finding open APIs: finding free data that combines historical observations with future predictions was challenging; especially for the weather, many datasets were evaluated.
- Data exploration and deciding how to use the data.
- Get historical data and store them in Feature Groups in Hopsworks.
- Create a Feature Pipeline to get data daily and store them in the Feature Groups.
- Deploy the Feature Pipeline in Modal.
- Create training pipelines to get the new data from Hopsworks as a Feature View, train a model and save it in the Model Registry in Hopsworks.
- Create a batch inference pipeline to predict the streamflow daily, save the predictions in a Feature Group in Hopsworks and create images for the monitor app.
- A UI was implemented to show a map with the locations and their predictions for the following 7 days.
- A monitor UI was implemented to show graphs and tables for both historical and future predictions.
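The daily feature pipeline essentially reduces to fetching yesterday's weather and streamflow for each location and appending one row per location to the feature group. A minimal sketch of that row-building step (the field names are illustrative, not the project's actual schema):

```python
from datetime import date

LOCATIONS = ["Abisko", "Spånga", "Uppsala"]

def build_feature_row(location, day, weather, streamflow):
    """Combine one day's weather features and streamflow label into a row.

    `weather` is a dict of the Open-Meteo daily parameters; `streamflow`
    is the SMHI observation for the same day. Field names are illustrative.
    """
    row = {"location": location, "date": day.isoformat()}
    row.update(weather)
    row["streamflow"] = streamflow
    return row

# Example usage with made-up values:
row = build_feature_row(
    "Abisko", date(2023, 1, 9),
    {"temperature_2m_max": -7.1, "precipitation_sum": 0.4},
    streamflow=3.2,
)
# In the real pipeline, each such row is inserted into the Hopsworks
# Feature Group by the Modal function that runs daily.
```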
Three training pipelines were developed:
- training-pipeline.py: trains a Gradient Boosting Regressor on the training set and saves it.
- training-pipeline-2.py: performs parallel hyperparameter tuning on an XGBoost Regressor by running a separate serverless function for each set of hyperparameters. The set of hyperparameters that gives the best score on the test data is then used to train the XGBoost Regressor on the whole dataset.
- training-pipeline-3.py: XGBoost performed better in our offline experiments, so it was selected and deployed to Modal as a serverless function that runs every 7 days on the data generated during the latest 7 days.
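The tuning step in training-pipeline-2.py can be sketched as follows. On Modal, each evaluation would run as its own serverless function (fanned out in parallel); here a plain `map` over the grid stands in for that fan-out, and both the search space and the scoring function are placeholder assumptions, not the project's real ones.

```python
import itertools

# Hypothetical XGBoost search space; the real grid lives in
# training-pipeline-2.py.
GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}

def expand_grid(grid):
    """Enumerate every hyperparameter combination in the grid."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]

def evaluate(params):
    """Placeholder for 'train XGBoost with params, score on the test data'.

    On Modal, each call would be a separate serverless invocation; the
    scoring rule below is a toy stand-in, not the real metric.
    """
    score = -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)
    return score, params

def tune(grid):
    # map(evaluate, ...) stands in for Modal's parallel fan-out; keep the
    # combination with the best test score.
    _, best_params = max(map(evaluate, expand_grid(grid)), key=lambda r: r[0])
    return best_params

best = tune(GRID)  # best_params is then used to retrain on the whole dataset
```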
- Multiple models were tried: Gradient Boosting Regressor and XGBoost Regressor.
- Parallel Hyperparameter Tuning: used to decide on the hyperparameters of XGBoost.
- Multiple days forecast: The Recursive Multi-step Forecast technique is used to predict the streamflow for the next 6 days.
- A Training Pipeline that retrains the model every 7 days on the latest data.
- Tests using pytest: we refactored the code into functions and tested them with pytest.
- Batch Inference Pipeline
- Monitor UI: a monitor UI was implemented to monitor the predictions, showing both the predictions for the last few days and the future predictions.
- Diagrams for a better user experience
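The Recursive Multi-step Forecast technique mentioned above can be sketched like this: the model predicts one day ahead, and each prediction is fed back into the input window as the latest lag value to produce the next day. The one-step model below is a toy stand-in, not the trained XGBoost.

```python
def recursive_forecast(predict_one_step, history, horizon=6):
    """Predict `horizon` days ahead by feeding each prediction back in.

    `predict_one_step` maps the most recent observations (lag features)
    to the next day's streamflow; `history` is the observed series so far.
    """
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = predict_one_step(series)
        forecasts.append(next_value)
        series.append(next_value)  # the prediction becomes a lag feature
    return forecasts

# Toy one-step model (assumption): average of the last three values.
toy_model = lambda s: sum(s[-3:]) / 3
six_day = recursive_forecast(toy_model, [3.0, 3.0, 3.0], horizon=6)
```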
The UI is deployed on HuggingFace: https://huggingface.co/spaces/Chrysoula/StreamflowPredictions.
It is developed using Streamlit, a Python framework.

The Monitor UI is deployed on HuggingFace: https://huggingface.co/spaces/Chrysoula/StreamflowMonitor. It is developed using Gradio, a Python framework.
The feature pipeline runs on Modal every day as a serverless function.
The training pipeline runs on Modal every 7 days as a serverless function and uses multiple serverless functions to perform hyperparameter tuning.
Parallel Hyperparameter tuning on Modal:
The batch inference pipeline runs on Modal every day.