EarlyStopping's occurrence #33

Closed
alexzhang0825 opened this issue Feb 23, 2021 · 10 comments

@alexzhang0825

Hello,

I am currently trying to run your code to see how it works, but every time it terminates too soon because of EarlyStopping. The resulting MSE and MAE were also quite far off from the results shown here. I have not been involved with programming for a long time, so my knowledge is too limited to solve the problem myself. That said, I did try setting the EarlyStopping patience to 100, but the code still ended on its own even though the EarlyStopping counter was only at 3 out of 100. Also, at the start the code prints "Use GPU: cuda:0", which made me wonder whether the training was actually being done on the GPU; when I checked the Task Manager, GPU usage was at almost 100%, so I believed it was fine, but the fact that the code terminates too early every time still makes me wonder whether it is using the GPU properly. It would be great if you could provide me some help with this.

In case any information on specs is needed:
OS: Windows Server 2019 64-bit
Processor: Intel Xeon CPU @ 2.20GHz
Memory: 30GB
GPU: Nvidia Tesla V100

Thank you in advance. Let me know if there is any additional information you need.

@cookieminions
Collaborator

cookieminions commented Feb 24, 2021

Hi, please try setting a larger train_epochs (the default is 6), such as 20, and then set a larger EarlyStopping patience.
We add `args.use_gpu = True if torch.cuda.is_available() else False` in main_informer.py. If the program prints 'Use GPU: cuda:0', that means it is using the GPU.
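
For reference, a minimal, self-contained sketch of that device check (the actual main_informer.py wires the flag through its argparse args rather than plain variables):

```python
import torch

# Use the GPU only if CUDA is available (mirrors args.use_gpu in main_informer.py).
use_gpu = torch.cuda.is_available()

if use_gpu:
    device = torch.device('cuda:0')
    print('Use GPU: cuda:0')  # the message mentioned in this issue
else:
    device = torch.device('cpu')
    print('Use CPU')
```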

@zhouhaoyi
Owner

Hi.
You can upload some error logs so that we can do some basic analysis.

@alexzhang0825
Author

> Hi.
> You can upload some error logs so that we can do some basic analysis.

Hello. Thank you for your swift response. I did what cookieminions suggested, and the code made it to epoch 20 with the EarlyStopping counter at 18 out of 100 before terminating. The resulting MSE and MAE were 0.45 and 0.50 respectively. However, soon after terminating, the code ran the experiment again on its own, and it again only made it to epoch 20, this time with the EarlyStopping counter at 19 out of 100. The MSE and MAE were similar to those from the previous run. I tried to find the error log files but could not, and the code did not produce any error prompt anywhere during its run, so I was wondering if you could point me to a place where I can look for them.

Thank you

@cookieminions
Collaborator

Hi,
The program runs again because the default number of repeated experiments, itr, is set to 2. The EarlyStopping patience means that if the validation loss fails to drop for that many consecutive epochs, the experiment will stop. But if train_epochs is reached first, the experiment will also stop.
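
For illustration, a minimal sketch of the patience logic described here (not necessarily the exact EarlyStopping implementation in this repository, which also handles checkpoint saving):

```python
class EarlyStopping:
    """Minimal sketch of patience-based early stopping."""

    def __init__(self, patience=7):
        self.patience = patience
        self.counter = 0
        self.best_loss = None
        self.early_stop = False

    def __call__(self, val_loss):
        if self.best_loss is None or val_loss < self.best_loss:
            # Validation loss improved: remember it and reset the counter.
            self.best_loss = val_loss
            self.counter = 0
        else:
            # No improvement this epoch.
            self.counter += 1
            print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            if self.counter >= self.patience:
                # Stop only after `patience` consecutive epochs without improvement.
                self.early_stop = True
```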

@alexzhang0825
Author

alexzhang0825 commented Feb 25, 2021 via email

@cookieminions
Collaborator

If you use the default configuration, you will get a multivariate prediction result with a prediction length of 24, which corresponds to the upper-left corner of Figure 5.
When the validation loss stops dropping, it indicates that the model is overfitting, so we need to use the model parameters saved before the validation loss increased (this operation is already in the code).
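
As a rough illustration of that save-and-restore behaviour (the function name below is hypothetical; in the repository this happens inside its EarlyStopping/checkpoint handling):

```python
import torch

def maybe_save_checkpoint(model, val_loss, best_loss, path='checkpoint.pth'):
    """Save the model whenever the validation loss improves; return the best loss so far."""
    if best_loss is None or val_loss < best_loss:
        torch.save(model.state_dict(), path)  # keep the parameters from the best epoch
        return val_loss
    return best_loss

# After training, reload the parameters saved at the minimum validation loss:
# model.load_state_dict(torch.load('checkpoint.pth'))
```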

@alexzhang0825
Author

> If you use the default configuration, you will get a multivariate prediction result with a prediction length of 24, which corresponds to the upper-left corner of Figure 5.
> When the validation loss stops dropping, it indicates that the model is overfitting, so we need to use the model parameters saved before the validation loss increased (this operation is already in the code).

Oh, now I understand. I was looking at Figure 4 and was confused about why the results were so far off. Now that I am looking at the correct figure, everything seems to be working fine.
As for the overfitting, I am guessing that is something caused by the code itself as it looks for the most optimal result. Does that mean the only thing I can do about it is to increase the patience to prevent early termination?

@cookieminions
Collaborator

Maybe you can reduce d_model and d_ff by using `--d_model xxx --d_ff xxx` to prevent the model from overfitting too quickly. Increasing the patience will make the model continue to train, but eventually the model will load the parameters saved at the minimum validation error. If you do not need early stopping, you can comment out lines 198-200 (`if early_stopping.early_stop: print("Early stopping") break`) and line 205 (`self.model.load_state_dict(torch.load(best_model_path))`) of exp/exp_informer.py, and the model will be trained until the train_epochs you set is reached.
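
For clarity, those lines sit roughly as follows in the training loop of exp/exp_informer.py (the surrounding loop is paraphrased and line numbers may shift between versions):

```python
for epoch in range(self.args.train_epochs):
    # ... one epoch of training and validation happens here, producing vali_loss ...
    early_stopping(vali_loss, self.model, path)

    # Lines ~198-200: comment these out to disable early termination.
    if early_stopping.early_stop:
        print("Early stopping")
        break

# Line ~205: comment this out to keep the last-epoch weights instead of
# reloading the checkpoint saved at the minimum validation loss.
self.model.load_state_dict(torch.load(best_model_path))
```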

@zhouhaoyi
Owner

I suppose there is no more discussion. I will close this issue in 12h.

@alexzhang0825
Author

> I suppose there is no more discussion. I will close this issue in 12h.

Sorry, I forgot to check back on this thread. The issue is solved for now and I will close it.
