
Ran Single Dataset with UniST, got weird results. #14

Closed
KL4805 opened this issue Jul 23, 2024 · 5 comments

Comments

@KL4805
Contributor

KL4805 commented Jul 23, 2024

Hello Yuan,

This is Yilun Jin (HKUST). Thanks very much for your insightful work and for sharing the code!

To get a first look at how the code works, I ran a simple experiment: training (pretrain.sh) on one dataset and then zero-shot evaluating (zero_shot.sh) on the same dataset, which I suppose should be equivalent to ordinary, dataset-specific spatio-temporal forecasting.

What I did was:

  1. Run pretrain.sh with the following line:
    python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --mask_strategy_random 'batch' --lr 3e-4 --used_data 'diverse' --prompt_ST 0 --few_ratio 1.0
    and the model is saved at experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_1.0/model_save/model_best.

  2. Run zero_shot.sh with the following line:
    python main.py --device_id 0 --machine machine --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --dataset BikeNYC --used_data 'diverse' --file_load_path experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_0.5/model_save/model_best --few_ratio 0.0
    and the evaluation results (I suppose) should be in src/experiments/Test_Dataset_BikeNYC_Task_short_FewRatio_0.0/result.txt.

However, the results I got were:

stage:0, epoch:0, best rmse: 0.0052734476131455635
{'BikeNYC': {'temporal': {0.5: {'rmse': 11.118299285272428, 'mae': 5.709491623269124}}}}

which is even worse than the HA (historical average) baseline.

I think I might be doing something wrong here, so I have listed everything I did above; perhaps you can help me see where I went wrong.

Best,
Yilun

@YuanYuan98
Collaborator

Hi, Yilun,

Thank you for your question and for trying out our code!

The issue you're encountering is related to the initialization and training of certain modules in the model. Specifically, during the pretraining stage, we primarily focus on training the core model. However, the prompting stage introduces new components, such as memory pools, which are crucial for effective performance. These components are not optimized during the pretraining stage and are randomly initialized.

Therefore, if you run the zero-shot evaluation script immediately after the pretraining stage, without going through the prompt-tuning stage, the results will not be meaningful because these newly introduced modules have not been trained yet.

The correct pipeline for zero-shot inference is as follows:

  1. Pretrain the model using multiple datasets.

  2. Run the prompt-tuning script with these same datasets to train the new modules and optimize the model parameters.

  3. Finally, execute the zero-shot inference script using the fully optimized model parameters.

By following this pipeline, you will ensure that all components of the model are properly trained and optimized.
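For concreteness, here is a minimal sketch of that three-stage pipeline. The flag names are copied from the commands quoted above; the exact prompt-tuning invocation, the checkpoint paths, and the use of a single dataset (BikeNYC) instead of multiple datasets are assumptions for illustration only.

# Stage 1 (assumed): pretrain the core model with prompting disabled (prompt_ST 0).
python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --mask_strategy_random 'batch' --lr 3e-4 --used_data 'diverse' --prompt_ST 0 --few_ratio 1.0

# Stage 2 (assumed): prompt-tune on the same data with prompting enabled (prompt_ST 1), so the memory pools and other newly introduced modules are optimized; load the pretrained checkpoint.
python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --used_data 'diverse' --file_load_path experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_1.0/model_save/model_best --few_ratio 1.0

# Stage 3 (assumed): zero-shot evaluation (few_ratio 0.0), loading the prompt-tuned checkpoint rather than the pretrained one.
python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --used_data 'diverse' --file_load_path <path_to_prompt_tuned_checkpoint> --few_ratio 0.0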

@KL4805
Contributor Author

KL4805 commented Jul 23, 2024

Hi Yuan,

Thanks for your reply!

One additional question: if I want to train on one dataset (say, BikeNYC) and zero-shot evaluate on a different dataset (say, TrafficCD), do I need to run the prompt-tuning script? If yes, I should run prompt-tuning on BikeNYC, right?

@YuanYuan98
Collaborator

You have two options:

1. Evaluate the Base Model without Prompting:

You can disable the prompting design by modifying the is_prompt hyperparameter. This lets you evaluate the transfer capability of the base model directly. In this case, you do not need to run the prompt-tuning script.

2. Retain the Prompting Design:

You need to run the prompt-tuning script on the source dataset (BikeNYC, in your example). This step is necessary to optimize the additional model parameters, such as the memory pools, before performing zero-shot evaluation on the target dataset.
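As a rough illustration of the two options (the flag names follow the commands quoted earlier in this thread; treating is_prompt as corresponding to the --prompt_ST switch, and the checkpoint paths, are assumptions):

# Option 1 (assumed): evaluate the pretrained base model directly on the target dataset with prompting disabled; no prompt-tuning is needed.
python main.py --device_id 0 --machine machine --dataset TrafficCD --task short --size middle --prompt_ST 0 --used_data 'diverse' --file_load_path experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_1.0/model_save/model_best --few_ratio 0.0

# Option 2 (assumed): prompt-tune on the source dataset (BikeNYC) first, then zero-shot evaluate on the target dataset with prompting enabled, loading the prompt-tuned checkpoint.
python main.py --device_id 0 --machine machine --dataset TrafficCD --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --used_data 'diverse' --file_load_path <prompt_tuned_BikeNYC_checkpoint> --few_ratio 0.0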

@KL4805
Contributor Author

KL4805 commented Jul 23, 2024 via email

@KL4805
Contributor Author

KL4805 commented Jul 24, 2024

Thanks, I succeeded in getting reasonable results.

@KL4805 KL4805 closed this as completed Jul 24, 2024