Weird loss progression #10

Closed
RaphaelRoyerRivard opened this issue Jun 20, 2019 · 4 comments

@RaphaelRoyerRivard

Since I am training the model on VLOG with a very small batch size, the training will take forever (8 days). Because I don't want to wait that long, I'll stop training before 30 epochs. But the losses shown in the logs seem odd to me. Can someone provide the log of a complete training run so I can compare the losses and see whether my early results are normal? Thanks

Learning Rate	Train Loss	Theta Loss	Theta Skip Loss	
0.000200	-0.002401	0.366067	0.331109	
0.000200	-0.002381	0.369635	0.328924	
0.000200	-0.001740	0.402181	0.374113	
0.000200	-0.001929	0.378956	0.342752
@xiaolonw (Owner)

An example log follows.

Note that the current code will not give you the exact same loss, but the trend of how the loss develops should be similar.

Learning Rate Train Loss Theta Loss Theta Skip Loss
0.000200 -0.023201 0.223515 0.185768
0.000200 -0.082967 0.149054 0.120956
0.000200 -0.121153 0.138757 0.109839
0.000200 -0.141511 0.132837 0.103349
0.000200 -0.154124 0.130685 0.101065
0.000200 -0.164161 0.126941 0.097509
0.000200 -0.171910 0.124375 0.094423
0.000200 -0.177002 0.123230 0.092237
0.000200 -0.182402 0.120037 0.089529
0.000200 -0.186588 0.118543 0.086799
0.000200 -0.189803 0.116007 0.084808
0.000200 -0.192916 0.114425 0.082736
0.000200 -0.196440 0.112402 0.080228
0.000200 -0.198626 0.111003 0.079104
0.000200 -0.200321 0.109698 0.077720
0.000200 -0.201791 0.108161 0.076239
0.000200 -0.204281 0.105937 0.073543
0.000200 -0.207024 0.104847 0.071410
0.000200 -0.207578 0.102365 0.069629
0.000200 -0.209727 0.101646 0.069230
0.000200 -0.210965 0.100404 0.067125
0.000200 -0.213229 0.097842 0.064572
0.000200 -0.214765 0.096944 0.063795
0.000200 -0.215127 0.095416 0.062738
0.000200 -0.215839 0.094996 0.062121
0.000200 -0.217097 0.093684 0.060339
0.000200 -0.219261 0.092733 0.059287
0.000200 -0.219723 0.091869 0.058745
0.000200 -0.221097 0.091318 0.058428
0.000200 -0.221912 0.090675 0.058063

@RaphaelRoyerRivard (Author) commented Jun 25, 2019

The only things I modified in your code are YOUR_DATASET_FOLDER (to put my own path) and another hardcoded path.
I ran the following command on VLOG (resized to 256):
python train_video_cycle_simple.py --checkpoint pytorch_checkpoints/release_model_simple --batchSize 4 --workers 4
but the losses are very different from yours...

Learning Rate	Train Loss	Theta Loss	Theta Skip Loss	
0.000200	-0.002401	0.366067	0.331109	
0.000200	-0.002381	0.369635	0.328924	
0.000200	-0.001740	0.402181	0.374113	
0.000200	-0.001929	0.378956	0.342752	
0.000200	-0.001893	0.402664	0.362544	
0.000200	-0.001851	0.384101	0.343538	
0.000200	-0.001888	0.392817	0.348998	
0.000200	-0.002026	0.373430	0.329414	
0.000200	-0.002127	0.374545	0.322591	
0.000200	-0.002059	0.373383	0.322823	
0.000200	-0.002283	0.347109	0.295166	
0.000200	-0.002365	0.354452	0.294233	
0.000200	-0.002127	0.369732	0.314337	
0.000200	-0.002101	0.369753	0.312066	
0.000200	-0.002192	0.354708	0.296371	
0.000200	-0.002064	0.373753	0.311506	
0.000200	-0.002031	0.386576	0.323555	
0.000200	-0.001990	0.379806	0.317385	
0.000200	-0.001882	0.391573	0.329034	
0.000200	-0.002011	0.374667	0.311523	
0.000200	-0.001822	0.412275	0.347809	
0.000200	-0.001636	0.460999	0.391921	
0.000200	-0.001858	0.373273	0.313632	
0.000200	-0.001881	0.371901	0.308502	

The train loss is increasing slightly instead of decreasing like yours, and the other two losses are not really changing... Do you have an idea of what is going on?
Thank you

@xiaolonw (Owner)

A very small batch size works badly with batch norm. You will also need to adjust the learning rate according to the batch size: if you divide the batch size by 8, you should also divide the learning rate by 8.
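
For reference, this is the standard linear LR scaling rule. A minimal sketch of the arithmetic, assuming the 0.000200 seen in the logs corresponds to the reference batch size (taken here as 32, i.e. 8x the batch size of 4 used above; neither value is confirmed in this thread):

# Linear LR scaling rule (sketch; the reference batch size of 32 is an assumption)
base_lr = 2e-4         # learning rate seen in the logs above
base_batch_size = 32   # assumed reference batch size
batch_size = 4         # --batchSize used in the command above

# scale the learning rate in proportion to the batch size change
scaled_lr = base_lr * batch_size / base_batch_size
print(scaled_lr)       # 2.5e-05

So the run above would use a learning rate of 2.5e-5 instead of 2e-4, assuming the training script exposes a flag to override the default.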

@RaphaelRoyerRivard (Author)

Thank you for your fast answer; I will try that.
