
Ran Single Dataset with UniST, got weird results. #14

Closed
KL4805 opened this issue Jul 23, 2024 · 5 comments

Comments

@KL4805
Contributor

KL4805 commented Jul 23, 2024

Hello Yuan,

This is Yilun Jin (HKUST). Thanks very much for your insightful work and for sharing the code!

To get a first look at how the code works, I ran a simple experiment: training (pretrain.sh) on one dataset and then zero-shot evaluating (zero_shot.sh) on the same dataset, which I suppose should be equivalent to ordinary, dataset-specific spatio-temporal forecasting.

What I did was:

  1. Run pretrain.sh with the following line:
    python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --mask_strategy_random 'batch' --lr 3e-4 --used_data 'diverse' --prompt_ST 0 --few_ratio 1.0
    and the model is saved at experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_1.0/model_save/model_best.

  2. Run zero_shot.sh with the following line:
    python main.py --device_id 0 --machine machine --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --dataset BikeNYC --used_data 'diverse' --file_load_path experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_0.5/model_save/model_best --few_ratio 0.0
    and the evaluation results (I suppose) should be in src/experiments/Test_Dataset_BikeNYC_Task_short_FewRatio_0.0/result.txt.

However, the results I got were:

stage:0, epoch:0, best rmse: 0.0052734476131455635
{'BikeNYC': {'temporal': {0.5: {'rmse': 11.118299285272428, 'mae': 5.709491623269124}}}}

which is even worse than the HA (historical average) baseline.

I think I might be doing something wrong here, so I have listed everything I did above; perhaps you can help me see where I went wrong.

Best,
Yilun

@YuanYuan98
Collaborator

Hi, Yilun,

Thank you for your question and for trying out our code!

The issue you're encountering is related to the initialization and training of certain modules in the model. Specifically, during the pretraining stage, we primarily focus on training the core model. However, the prompting stage introduces new components, such as memory pools, which are crucial for effective performance. These components are not optimized during the pretraining stage and are randomly initialized.

Therefore, if you run the zero-shot evaluation script immediately after the pretraining stage, without going through the prompt-tuning stage, the results will not be meaningful because these newly introduced modules have not been trained yet.

The correct pipeline for zero-shot inference is as follows:

  1. Pretrain the model using multiple datasets.

  2. Run the prompt-tuning script with these same datasets to train the new modules and optimize the model parameters.

  3. Finally, execute the zero-shot inference script using the fully optimized model parameters.

By following this pipeline, you will ensure that all components of the model are properly trained and optimized.
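For concreteness, here is a minimal sketch of that three-stage pipeline. The flag names are copied from the commands quoted above; the exact prompt-tuning invocation, the checkpoint paths, and the use of a single dataset (BikeNYC) instead of multiple datasets are assumptions for illustration only.

# Stage 1 (assumed): pretrain the core model with prompting disabled (prompt_ST 0).
python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --mask_strategy_random 'batch' --lr 3e-4 --used_data 'diverse' --prompt_ST 0 --few_ratio 1.0

# Stage 2 (assumed): prompt-tune on the same data with prompting enabled (prompt_ST 1), so the memory pools and other newly introduced modules are optimized; load the pretrained checkpoint.
python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --used_data 'diverse' --file_load_path experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_1.0/model_save/model_best --few_ratio 1.0

# Stage 3 (assumed): zero-shot evaluation (few_ratio 0.0), loading the prompt-tuned checkpoint rather than the pretrained one.
python main.py --device_id 0 --machine machine --dataset BikeNYC --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --used_data 'diverse' --file_load_path <path_to_prompt_tuned_checkpoint> --few_ratio 0.0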

@KL4805
Contributor Author

KL4805 commented Jul 23, 2024

Hi Yuan,

Thanks for your reply!

One additional question: if I want to train on one dataset (say, BikeNYC) and zero-shot evaluate on a different dataset (say, TrafficCD), do I need to run the prompt-tuning script? If yes, I should run prompt-tuning on BikeNYC, right?

@YuanYuan98
Collaborator

You have two options:

1. Evaluate the Base Model without Prompting:

You can disable the prompting design by modifying the is_prompt hyperparameter. This lets you evaluate the transfer capability of the base model directly. In this case, you do not need to run the prompt-tuning script.

2. Retain the Prompting Design:

You need to run the prompt-tuning script on the source dataset (BikeNYC, in your example). This step is necessary to optimize the additional model parameters, such as the memory pools, before performing zero-shot evaluation on the target dataset.
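As a rough illustration of the two options (the flag names follow the commands quoted earlier in this thread; treating is_prompt as corresponding to the --prompt_ST switch, and the checkpoint paths, are assumptions):

# Option 1 (assumed): evaluate the pretrained base model directly on the target dataset with prompting disabled; no prompt-tuning is needed.
python main.py --device_id 0 --machine machine --dataset TrafficCD --task short --size middle --prompt_ST 0 --used_data 'diverse' --file_load_path experiments/Pretrain_Dataset_BikeNYC_Task_short_FewRatio_1.0/model_save/model_best --few_ratio 0.0

# Option 2 (assumed): prompt-tune on the source dataset (BikeNYC) first, then zero-shot evaluate on the target dataset with prompting enabled, loading the prompt-tuned checkpoint.
python main.py --device_id 0 --machine machine --dataset TrafficCD --task short --size middle --prompt_ST 1 --pred_len 6 --his_len 6 --num_memory_spatial 512 --num_memory_temporal 512 --prompt_content 's_p_c' --used_data 'diverse' --file_load_path <prompt_tuned_BikeNYC_checkpoint> --few_ratio 0.0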

@KL4805
Contributor Author

KL4805 commented Jul 23, 2024 via email

@KL4805
Contributor Author

KL4805 commented Jul 24, 2024

Thanks, I succeeded in getting reasonable results.

@KL4805 KL4805 closed this as completed Jul 24, 2024