TIME is a task-centric time series forecasting benchmark comprising various fresh datasets, tailored for zero-shot TSFM evaluation. This codebase provides a full workflow spanning from data preprocessing to model evaluation.
- Official release of our TIME codebase.
- Clean features and ProcessedCSV on HuggingFace.
- Leaderboard Results Updates:
- First release of our arxiv paper and leaderboard.
- We recommend using Conda to manage the environment
conda create -n timebench python=3.11 -y
conda activate timebench
pip install -e .
- Download the dataset from Hugging Face.
- Define the path for HF datasets in .env (used as storage_env_var in Dataset).
echo "TIME_DATASET=PATH_TO_DATASET" >> .env
We provide the complete codebase and scripts required to reproduce all results from our benchmark.
For each model, use the corresponding script in the scripts/ directory to automatically set up the Conda environment and run evaluations across all tasks.
# Example: Running the evaluation for Chronos2
bash scripts/run_chronos2.sh
# We recommend using nohup to run the scripts in the background
nohup bash scripts/run_chronos2.sh > run_chronos2.txt 2>&1 &
For each task, window-level predictions (quantiles) and metrics will be saved in output/results/{model_name}/{dataset}/{freq}/{term}/.
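The directory layout above can be traversed with a few lines of standard-library Python, for example to check which tasks have already produced output. This is a minimal sketch: the helper name list_completed_tasks is hypothetical and not part of the codebase, and it assumes only the {model_name}/{dataset}/{freq}/{term} layout described above, not any particular file names inside a task folder.

```python
from pathlib import Path


def list_completed_tasks(results_dir, model_name):
    """Return sorted (dataset, freq, term) tuples found under results_dir/model_name.

    Hypothetical helper: a task counts as present if its term directory
    exists and contains at least one entry.
    """
    model_dir = Path(results_dir) / model_name
    tasks = []
    if not model_dir.is_dir():
        return tasks
    # Layout assumed from the text: {dataset}/{freq}/{term}/
    for term_dir in model_dir.glob("*/*/*"):
        if term_dir.is_dir() and any(term_dir.iterdir()):
            dataset, freq, term = term_dir.relative_to(model_dir).parts
            tasks.append((dataset, freq, term))
    return sorted(tasks)


if __name__ == "__main__":
    # "your_model" is a placeholder model name.
    for dataset, freq, term in list_completed_tasks("output/results", "your_model"):
        print(f"{dataset}/{freq}/{term}")
```

Comparing the returned list against the tasks you expected to run is a quick way to spot evaluations that failed silently in a background job.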
Once the evaluations are complete, use the following script to aggregate the raw outputs into the overall leaderboard metrics. This process automatically fetches the Seasonal Naive results from Hugging Face and computes the aggregated metrics across all tasks.
# Compute Overall Leaderboard based on `output/results` (sorted by MASE)
python scripts/compute_local_leaderboard.py
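As a rough illustration of what such an aggregation might look like, the sketch below divides each task's metric by the Seasonal Naive value for the same task and takes the geometric mean of the ratios. This normalization scheme is an assumption borrowed from common benchmark practice, not a description of what compute_local_leaderboard.py actually does; the function name, the task keys, and the numbers are all hypothetical.

```python
import math


def aggregate_relative_metric(model_scores, baseline_scores):
    """Geometric mean of per-task ratios model / baseline.

    Both arguments map a task key such as "dataset/freq/term" to a
    positive metric value (e.g. MASE). Only tasks present in both
    mappings are used. Assumed scheme, not the official script.
    """
    shared = sorted(set(model_scores) & set(baseline_scores))
    if not shared:
        raise ValueError("no overlapping tasks to aggregate")
    log_sum = 0.0
    for task in shared:
        ratio = model_scores[task] / baseline_scores[task]
        log_sum += math.log(ratio)
    return math.exp(log_sum / len(shared))


# Made-up numbers purely for illustration.
model = {"ds_a/H/short": 0.8, "ds_b/D/long": 1.2}
naive = {"ds_a/H/short": 1.0, "ds_b/D/long": 1.5}
print(round(aggregate_relative_metric(model, naive), 3))
```

Under this scheme a value below 1 means the model beats the Seasonal Naive baseline on average, and the geometric mean keeps a single task with a very large metric from dominating the result.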
For deeper analysis, including dataset-level breakdowns, pattern-level evaluation and visualizations, you can download and locally run our Leaderboard App.
To add a new model, follow these steps:
- Implement your model in experiments/
Create a new Python script in the experiments/ directory (e.g., experiments/your_model.py). You can use existing implementations like experiments/chronos2.py as a reference template.
- Use the Dataset class
The Dataset class is adapted from GIFT-Eval and provides a unified interface for loading time series data:
from timebench.evaluation.data import Dataset, get_dataset_settings, load_dataset_config

# ⚠️ Important: Set to_univariate based on your model's capabilities.
# Option A: your model only supports univariate forecasting.
# Split multivariate datasets into univariate series; leave univariate datasets unchanged.
to_univariate = Dataset(name=dataset_name, term=term, to_univariate=False).target_dim != 1
# Option B: your model supports multivariate forecasting natively.
# to_univariate = False

dataset = Dataset(
    name=dataset_name,
    term=term,  # "short", "medium", or "long"
    to_univariate=to_univariate,
    prediction_length=prediction_length,
    test_length=test_length,
    val_length=val_length,
)
- Generate predictions and save results
TIME uses a flexible evaluation interface that doesn't rely on GluonTS. Simply compute quantile predictions (fc_quantiles) externally and pass them to save_window_predictions:
from timebench.evaluation.saver import save_window_predictions

# Generate fc_quantiles with shape:
# - (num_total_instances, num_quantiles, prediction_length) for univariate
# - (num_total_instances, num_quantiles, num_variates, prediction_length) for multivariate
# where num_total_instances = num_series_exp * num_windows
save_window_predictions(
    dataset=dataset,
    fc_quantiles=fc_quantiles,
    ds_config=f"{dataset_name}/{freq}/{term}",
    output_base_dir="output/results",
    seasonality=season_length,
    model_hyperparams={"model_name": "your_model"},
)
This function automatically computes per-window metrics and saves predictions, metrics, and configuration files to output/results/{model_name}/{dataset}/{freq}/{term}/.
- Create a run script in scripts/
Create a shell script (e.g., scripts/run_your_model.sh) to run your model across all tasks. The script should:
  - Set up the Conda environment with required dependencies
  - Call your experiment script for each task
  - Include the specific hyperparameter configuration and ensure reproducibility
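Two details from the steps above are easy to get wrong: when to split a multivariate dataset into univariate series, and what shape fc_quantiles must have. The sketch below restates both rules as small standalone functions. The function names are hypothetical and not part of timebench, and num_series_exp is taken as given, since the text does not define how it is derived.

```python
def needs_univariate_split(target_dim, supports_multivariate):
    """True when a multivariate dataset must be split for a univariate-only model."""
    return target_dim > 1 and not supports_multivariate


def expected_fc_quantiles_shape(num_series_exp, num_windows, num_quantiles,
                                prediction_length, num_variates=None):
    """Shape that fc_quantiles should have, following the convention in the text.

    Pass num_variates only for native multivariate forecasts.
    """
    num_total_instances = num_series_exp * num_windows
    if num_variates is None:
        return (num_total_instances, num_quantiles, prediction_length)
    return (num_total_instances, num_quantiles, num_variates, prediction_length)


# Example values are illustrative only.
print(needs_univariate_split(target_dim=3, supports_multivariate=False))
print(expected_fc_quantiles_shape(10, 4, 9, 24))
print(expected_fc_quantiles_shape(10, 4, 9, 24, num_variates=3))
```

Comparing the expected tuple against the shape of your own fc_quantiles array before calling save_window_predictions catches transposed axes early.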
Once your evaluation is complete and you are ready to be featured on the TIME leaderboard:
- Open a Pull Request to upload your output/results/{model_name}/ folder to the TIME-Output repository on Hugging Face.
from huggingface_hub import HfApi

api = HfApi()
model_name = "YOUR_MODEL_NAME"
api.upload_folder(
    folder_path=f"output/results/{model_name}",  # Path to your local results folder
    path_in_repo=f"results/{model_name}",
    repo_id="Real-TSF/TIME-Output",
    repo_type="dataset",
    commit_message=f"Submit evaluation results for {model_name}",
    create_pr=True,
)
- The results will be automatically included in the leaderboard after review
- To ensure reproducibility, we highly recommend contributing your experiment code and execution scripts to this GitHub repository.
Our codebase provides utilities for data preprocessing and computing time series features. For detailed instructions, please refer to the documentation in the docs/ directory:
- Data Preprocessing Guide: Screen, preprocess, and clean raw CSV datasets
- Data Format Specification: Convert processed CSV files into the efficient Arrow format
- Time Series Features: Compute TSfeatures from processed CSV files
If you want to add a new dataset to TIME:
- Preprocess your data following the documentation in docs/:
  - Generate processed CSV files
  - Create Arrow datasets (hf_dataset)
  - Compute time series features
- Upload processed data to Hugging Face by PR:
  - Upload processed CSV files to TIME-ProcessedCSV
  - Upload hf_dataset to TIME
  - Upload features to TIME-Output
- Update the configuration:
  - Update src/timebench/config/datasets.yaml on GitHub to include your forecasting tasks
  - Open a Pull Request with your changes
- Review and integration:
After review and approval, we will:
  - Add your dataset to TIME
  - Evaluate existing models on your new datasets
  - Update the leaderboard with new results
The core components of this repository include code adapted from the following excellent projects:
We also extend our sincere gratitude to the authors of the evaluated TSFMs for open-sourcing their work and driving progress in the time series community.
If you find this benchmark useful, please consider citing:
@article{qiao2026s,
title={It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks},
author={Qiao, Zhongzheng and Pan, Sheng and Wang, Anni and Zhukova, Viktoriya and Liu, Yong and Jiang, Xudong and Wen, Qingsong and Long, Mingsheng and Jin, Ming and Liu, Chenghao},
journal={arXiv preprint arXiv:2602.12147},
year={2026}
}