Question about the number of epochs needed for convergence #4
Comments
Can you give me a bit more information? I'm not sure I understand your exact problem.
@xgastaldi Thanks for your reply. The image named "test_top1" shows the test top-1 error for two experiments with the same hyperparameters (22632). I ran the command `CUDA_VISIBLE_DEVICES=0 th main.lua -dataset cifar10 -nGPU 1 -batchSize 64 -depth 26 -shareGradInput false -optnet true -nEpochs 1800 -netType shakeshake -lrShape cosine -widenFactor 2 -LR 0.1 -forwardShake true -backwardShake true -shakeImage true`, copied from your README.md. Why do two identical experiments differ so much? The second image, named "loss", shows the loss values for the same two experiments, which used the same hyperparameters and the same code. The blue line decreases more quickly, and the first experiment (blue line) reproduces the result in your paper. I then ran the code while changing several parameters such as depth, width, and shortcut mode. To my surprise, I could no longer reproduce the result of the first experiment. In the last experiment I changed the parameters back to those of the first experiment, and I used code-comparison software to verify that the code of the two experiments was identical. Does the training data change after several training runs?
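One possible explanation for two identical runs diverging is that weight initialization, data shuffling, and the shake-shake noise itself are all driven by the random number generator, which is reseeded differently on each run. As a minimal sketch (not the repository's own code), the RNG seeds could be pinned before training using the standard Torch7/cutorch calls; note that some cuDNN kernels are non-deterministic, so small run-to-run differences may remain even with fixed seeds:

```lua
-- Sketch (assumption): pin all RNG seeds so that two runs with
-- identical hyperparameters follow the same trajectory.
require 'torch'
require 'cutorch'  -- requires a CUDA-capable GPU

torch.manualSeed(0)        -- CPU-side RNG (init, shuffling)
cutorch.manualSeedAll(0)   -- RNG on every visible GPU
```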
The training data does not change.
@xgastaldi Thank you, I think you are right.

**Epochs and training time:** In the second experiment (red line), nEpochs was actually set to 3600. I am now running the code with nEpochs 1800 and hope the results will match your paper. The full 1800 epochs will take more than two days on my PC, which has an NVIDIA 980 Ti. Can you tell me how long it takes you to train (1800 epochs, 22632), and what graphics card you use?

**The best number of epochs:** If I set nEpochs to 900, can I still reach the result shown in your paper? My second question is whether 1800 is the best value for nEpochs. Was 1800 obtained through multiple experiments?

**Learning rate:** Does the learning rate decrease with the epochs? Is the decay linear or nonlinear?
**Training time:**
**Number of epochs:**
**Learning rate:**
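Regarding the learning rate question: the command above passes `-lrShape cosine`, so the decay is nonlinear. A minimal sketch of the standard cosine annealing schedule (SGDR, Loshchilov & Hutter), which is presumably what `-lrShape cosine` implements; `lr0` and `nEpochs` here mirror the `-LR 0.1` and `-nEpochs 1800` values from the command above:

```lua
-- Sketch (assumption): cosine annealing from lr0 at epoch 0
-- down to ~0 at nEpochs, as in SGDR.
local lr0, nEpochs = 0.1, 1800

local function cosineLR(epoch)
   return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / nEpochs))
end

print(cosineLR(0))     -- 0.1  (start)
print(cosineLR(900))   -- 0.05 (midpoint)
print(cosineLR(1800))  -- 0.0  (end)
```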
I will close this issue. Feel free to comment if you still have this problem.
With the same hyperparameters (22632), the run converges at around epoch 1850.
After several training runs with other hyperparameters, however, a run with the same hyperparameters (22632) does not converge until beyond epoch 3500.
The two experiments have the same hyperparameters and the same code.
The only difference in the second experiment relative to the first is that the model was trained several times in between.