
Question about the number of epochs needed for convergence #4

Closed
xuyifeng-nwpu opened this issue May 13, 2017 · 6 comments


@xuyifeng-nwpu

Using the same hyperparameters (26 2x32d), the model converges at about epoch 1850.

After several training runs with other hyperparameters, however, training with that same 26 2x32d configuration now takes more than 3500 epochs to converge.

The two experiments have the same hyperparameters and the same code. The only difference between the second experiment and the first is that I trained the model several times in between.

@xgastaldi
Owner

xgastaldi commented May 13, 2017

Can you give me a bit more information? I'm not sure I understand your exact problem.
What exactly do you mean by converging? And what do you mean by having trained the data several times?

@xuyifeng-nwpu
Author

xuyifeng-nwpu commented May 13, 2017

@xgastaldi Thank you for your reply.

First image: test_top1

The image named "test_top1" shows the test top-1 error for two experiments that used the same 26 2x32d hyperparameters. I ran the command "CUDA_VISIBLE_DEVICES=0 th main.lua -dataset cifar10 -nGPU 1 -batchSize 64 -depth 26 -shareGradInput false -optnet true -nEpochs 1800 -netType shakeshake -lrShape cosine -widenFactor 2 -LR 0.1 -forwardShake true -backwardShake true -shakeImage true", copied from your readme.md. Why do two identical experiments differ so much?


Second image: loss

The image named "loss" shows the loss value for the same two experiments with the same hyperparameters and the same code. The blue line decreases more quickly.

The first experiment, whose result is shown by the blue line, reproduced the result from your paper. Then I ran the code while changing several parameters such as depth, width, shortcut mode, and so on. Finally, I was surprised to find that I could not reproduce the result of the first experiment. In the last experiment I changed the parameters back to those of the first experiment, and I used a code-comparison tool to check that the code of the two experiments is identical.

Does the training data change after several training runs?

@xgastaldi
Owner

The training data does not change.
My guess is that you changed -nEpochs to something other than 1800. If it had been 1800, training would have stopped at 1800. If you use more epochs, the learning rate decay schedule is "stretched": the learning rate is based on the progress percentage (epoch / nEpochs) rather than the absolute epoch number:
lr at epoch 400 when -nEpochs 1800 = lr at epoch 800 when -nEpochs 3600
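
For illustration only (this is not the repository's train.lua code, and cosineLR is just a hypothetical helper name), a minimal sketch of a progress-based cosine schedule shows the stretching:

```lua
-- Sketch: the learning rate depends only on the progress ratio epoch/nEpochs,
-- so doubling nEpochs stretches the same schedule over twice as many epochs.
local function cosineLR(baseLR, epoch, nEpochs)
   return 0.5 * baseLR * (1 + math.cos(math.pi * epoch / nEpochs))
end

print(cosineLR(0.1, 400, 1800))  -- ~0.088
print(cosineLR(0.1, 800, 3600))  -- identical value: same progress ratio (2/9)
```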

@xuyifeng-nwpu
Author

xuyifeng-nwpu commented May 14, 2017

@xgastaldi Thank you.

Epochs and training time

I think you are right. In the second experiment, shown by the red line, nEpochs was indeed set to 3600. I am now running the code with nEpochs 1800 and hope the results will match your paper. The full 1800-epoch training will take more than 2 days on my PC, whose graphics card is an NVIDIA 980 Ti. Can you tell me roughly how long training (1800 epochs, 26 2x32d) takes on your machine, and which graphics card you use?

The best number of epochs

If I set nEpochs to 900, can I still achieve the result shown in your paper? My second question is whether 1800 is the best value for nEpochs. Was 1800 chosen based on multiple experiments?

Learning rate

Does the learning rate decrease with the epochs? Linearly or non-linearly?

@xgastaldi
Owner

Training time:
1 TITAN X (Pascal), 26 2x32d, -LR 0.1 -batchSize 64: 51s per epoch

Number of epochs:
Due to the ICLR deadline, I simply had to choose a conservative estimate that would give me the best possible result at 96d. I could not do that based on 32d tests because the higher capacity of 96d models also changes the ideal training time. It could very well be that even at 96d there is no need for so many epochs, but I had neither the time nor the GPUs to run 3 tests at 900 epochs, 3 tests at 1200 epochs and 3 tests at 1500 epochs.

Learning rate:
The model uses a cosine annealing function as described in https://arxiv.org/abs/1608.03983.
You can find the code at the end of train.lua.
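
For reference, the annealing formula in that paper (written out here as a sketch, not quoted from train.lua) has the form

```
lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))
```

with t the current epoch and T the total number of epochs, so the decay is non-linear: almost flat at the start, fastest around the middle of training, and flat again near the end.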

@xgastaldi
Owner

I will close this issue. Feel free to comment if you still have this problem.
