VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones
Paper Link | Huggingface_model
- 🚩 News (Aug 2025): The VisionTSpp preprint is now available on arXiv, and the VisionTSpp-1.0-base model is available on Hugging Face. The model is continually pre-trained on the Large-scale Open Time Series Archive (LOTSA data), starting from a Masked AutoEncoder (MAE) visual backbone.
- In this paper, we propose a new time series foundation model that performs continual pre-training on visual backbones.
- Compared to VisionTS, VisionTS++ introduces three key innovations and therefore supports multivariate and probabilistic time series forecasting more effectively.
- Clone repository:
git clone https://github.com/HALF111/VisionTSpp.git
cd VisionTSpp
- Create virtual environment:
virtualenv venv
. venv/bin/activate
- Build from source:
pip install -e '.[notebook]'
- Create a .env file:
touch .env

We provide the scripts for starting the continual pre-training process on Large-scale Open Time Series Archive (LOTSA data) based on Masked AutoEncoder base (MAE-base) visual backbone.
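The pre-training steps that follow record two paths in the .env file, `LOTSA_V1_PATH` and `VISIONTS_CHECKPOINT_PATH`. Before launching a long run, it can help to confirm that both are present. The snippet below is a minimal sketch and not part of the repository: the helper names `read_env_file` and `missing_keys` are hypothetical, and it assumes plain `KEY=VALUE` lines such as those written by the `echo` commands in this README.

```python
from pathlib import Path

# Variables that the steps in this README append to .env.
REQUIRED_KEYS = ("LOTSA_V1_PATH", "VISIONTS_CHECKPOINT_PATH")


def read_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = read_env_file(env_path) if env_path.exists() else {}
    for key in missing_keys(env):
        print(f"{key} is not set in .env")
```

Run from the repository root, the script prints one line per missing variable and prints nothing when both are set. It only checks that the keys exist; it does not verify that the paths point to valid data or checkpoints.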
- Start by preparing the pre-training data: download the Large-scale Open Time Series Archive (LOTSA data).
Assuming you've already created a .env file, run the following commands.
huggingface-cli download Salesforce/lotsa_data --repo-type=dataset --local-dir PATH_TO_SAVE
echo "LOTSA_V1_PATH=PATH_TO_SAVE" >> .env
- Afterwards, download the MAE-base model from the following link: MAE-base. You can choose to download MAE-large or MAE-huge as well.
You should also write the path where you save the MAE models in the .env file, for example:
echo "VISIONTS_CHECKPOINT_PATH=./project/benchmarks/ckpt" >> .env
- Finally, you can simply run the following script to start the continual pre-training (the same as in run.sh).
# base model
python -m cli.train -cp conf/pretrain run_name=VisionTSpp_base model=visionts data=lotsa_v1_weighted
You can also try continual pre-training on MAE-large or MAE-huge:
# large model:
python -m cli.train -cp conf/pretrain run_name=VisionTSpp_large model=visionts_large data=lotsa_v1_weighted
# huge model:
python -m cli.train -cp conf/pretrain run_name=VisionTSpp_huge model=visionts_huge data=lotsa_v1_weighted

If you're using VisionTSpp in your research or applications, please cite it using this BibTeX:
@misc{shen2025visiontscrossmodaltimeseries,
title={VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones},
author={Lefei Shen and Mouxiang Chen and Xu Liu and Han Fu and Xiaoxue Ren and Jianling Sun and Zhuo Li and Chenghao Liu},
year={2025},
eprint={2508.04379},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.04379},
}

We deeply appreciate the following GitHub repos for their valuable code base or datasets:

