Cast-R1 is a novel framework that reformulates Time Series Forecasting as an evidence-driven sequential decision-making process. It combines memory-based state management with tool-augmented reasoning and a two-stage learning strategy (supervised fine-tuning + multi-turn reinforcement learning) to enable adaptive evidence gathering, reasoning-based prediction, and iterative forecast refinement.
📝 "Cast-R1: Learning Tool-Augmented Sequential Decision Policies for Time Series Forecasting"
Preprint | 📄 Paper
Existing time series forecasting methods often adopt a model-centric, single-pass formulation that treats prediction as a fixed mapping from historical observations to future values. Cast-R1 introduces an agentic paradigm:
- Memory-based State Management: Maintains decision-relevant context across forecasting steps, enabling accumulation of contextual evidence for long-horizon reasoning.
- Tool-Augmented Workflow: Relies on a modular Forecasting Toolkit—data quality assessment, global statistical characterization, structural and dynamic analysis, event-level summarization, residual diagnostics, and forecasting model invocation—for context-aware predictive analysis.
- Two-Stage Learning: Supervised fine-tuning initializes basic forecasting competence and stable tool-usage behaviors; multi-turn reinforcement learning (GRPO) optimizes long-horizon decision policies under curriculum learning.
- Sequential Decision Formulation: Forecasting as a sequence of interdependent decisions (feature preparation → model selection → reasoning-based prediction → iterative refinement) rather than single-pass inference.
- Forecasting Toolkit: Data quality assessment, statistical characterization, structural analysis, event summarization, residual diagnostics, and adaptive model invocation (ARIMA, PatchTST, iTransformer, Chronos-2, etc.).
- Memory-based State Abstraction: Dynamic state management that retains critical historical information across decision steps, enabling effective credit assignment.
- Self-Reflective Refinement: Agent validates intermediate forecasts against historical constraints and refines unreasonable artifacts (e.g., impossible spikes, negative prices).
- Strong Benchmark Performance: Consistent improvements over statistical, deep learning, foundation, and LLM-based baselines on ETT, Wind, and electricity price forecasting (NP, PJM, BE, FR, DE).
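The sequential formulation and the self-reflective refinement step listed above can be illustrated with a toy sketch. This is not the repository's implementation: the `Memory` class, the `summarize`, `naive_forecast`, `refine`, and `run_episode` names, and the 3-sigma clipping rule are all hypothetical stand-ins chosen for illustration. In Cast-R1 a learned policy decides which tool to call at each step, whereas the order here is hard-coded.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Memory:
    """Hypothetical memory state: evidence accumulated across decision steps."""
    evidence: dict = field(default_factory=dict)

    def write(self, key, value):
        self.evidence[key] = value


def summarize(history):
    """Toy stand-in for a statistical characterization tool."""
    return {"mean": mean(history), "std": pstdev(history)}


def naive_forecast(history, horizon):
    """Toy stand-in for forecasting model invocation: repeat the last value."""
    return [history[-1]] * horizon


def refine(forecast, stats, k=3.0, non_negative=False):
    """Clip values outside mean +/- k*std of the history (assumed rule)."""
    lo = stats["mean"] - k * stats["std"]
    hi = stats["mean"] + k * stats["std"]
    if non_negative:
        lo = max(lo, 0.0)  # e.g. forbid negative values when the domain requires it
    return [min(max(v, lo), hi) for v in forecast]


def run_episode(history, horizon, non_negative=False):
    """Fixed-order walk through the four stages, writing each result to memory."""
    memory = Memory()
    memory.write("stats", summarize(history))      # 1. feature preparation
    memory.write("model", "naive")                 # 2. model selection (fixed here)
    draft = naive_forecast(history, horizon)       # 3. prediction
    memory.write("draft", draft)
    final = refine(draft, memory.evidence["stats"], non_negative=non_negative)
    memory.write("final", final)                   # 4. refinement
    return final, memory
```

Calling `run_episode(history, horizon)` returns the refined forecast together with the memory object, so later steps (or a reward function) can inspect which evidence was gathered along the way.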
git clone https://github.com/Xiaoyu-Tao/Cast-R1-TS
cd Cast-R1-TS
conda create -n cast-r1 python=3.10
conda activate cast-r1
pip install -r requirements.txt
Download the Chronos-2 model, and train PatchTST and iTransformer using the TSLib repository.
After training, place the checkpoints into:
recipe/time_series_forecast/models/
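Before starting the model server, it may help to confirm that the checkpoints are where the example structure in this README expects them. The following is a minimal sketch, not part of the repository: `missing_entries` is a hypothetical helper, and the expected entries are taken from the example structure in this README.

```python
from pathlib import Path

# Paths taken from the example structure in this README.
MODELS_DIR = Path("recipe/time_series_forecast/models")
EXPECTED = ["chronos-2", "PatchTST/checkpoint.pth", "iTransformer/checkpoint.pth"]


def missing_entries(models_dir=MODELS_DIR, expected=EXPECTED):
    """Return the expected entries that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

Run from the repository root, `missing_entries()` returns an empty list when every expected entry is present, and otherwise lists what still needs to be downloaded or trained.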
# run model_server
sh recipe/time_series_forecast/start_model_server.sh
Example structure:
recipe/time_series_forecast/models/
├── chronos-2/
├── PatchTST/
│   └── checkpoint.pth
└── iTransformer/
    └── checkpoint.pth
Cast-R1 is evaluated on diverse real-world benchmarks: ETT, Wind, and electricity price forecasting (NP, PJM, BE, FR, DE).
Place datasets in the dataset directory:
mkdir -p dataset
# Download datasets to ./dataset/
# Example: run long-term forecasting on ETT
sh scripts/run_ett.sh
# Example: run short-term electricity price forecasting
sh examples/time_series_forecast/run_qwen3-1.7B.sh
Main Results (MSE / MAE):
Cast-R1 achieves the lowest MSE on all evaluated datasets and ranks first or second on most MAE metrics, outperforming statistical (ARIMA, Prophet), deep learning (PatchTST, iTransformer, ConvTimeNet, TimeXer, DLinear), foundation (Chronos-2, TimesFM), and LLM-based (OFA, TimeLLM, TimeReasoner) baselines.
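For readers unfamiliar with the two reported metrics, the standard definitions of mean squared error and mean absolute error can be written as below. This is a plain-Python sketch of the textbook formulas; whether the repository computes them on normalized or raw values is not stated here, and the function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because MSE squares each error, a single large miss raises it far more than it raises MAE, which is one reason a method can rank differently on the two metrics.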
Ablation Studies:
- Toolkit Components: Removing feature extraction or model prediction degrades performance; model prediction has the largest impact on complex datasets.
- Memory-based State: Removing dynamic memory consistently increases forecasting error across all benchmarks.
- Training Strategy: Both supervised fine-tuning and reinforcement learning are essential; removing either stage leads to suboptimal results.
- Curriculum Learning: Exclusion of curriculum RL results in higher errors, especially on volatile benchmarks like PJM.
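The reinforcement learning stage referred to in the training-strategy ablation uses GRPO, whose core idea is to score each sampled rollout relative to the other rollouts for the same input. A minimal sketch of that group-relative advantage is shown below. The reward values are hypothetical (for forecasting they might be derived from negative forecast error), the use of the population standard deviation is an assumption, and this is not the repository's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one; when every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.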
Cast-R1-TS/
├── README.md
├── requirements.txt
├── dataset/ # Place benchmark data here
├── assets/ # Figures for README (main.png, ablation plots, etc.)
└── scripts/ # Run scripts for each benchmark
This project is licensed under the MIT License — see the LICENSE file for details.
@article{tao2026cast,
title={Cast-R1: Learning Tool-Augmented Sequential Decision Policies for Time Series Forecasting},
author={Tao, Xiaoyu and Cheng, Mingyue and Jiang, Chuang and Gao, Tian and Zhang, Huanjian and Liu, Yaguo},
journal={arXiv preprint arXiv:2602.13802},
year={2026}
}



