
about the loss function #40

Closed
guiji0812 opened this issue Jun 17, 2019 · 10 comments

guiji0812 commented Jun 17, 2019

Hi, when I trained with CharbonnierLoss the loss was very large, but when I trained with L1 loss it looked normal. What causes this? Could you give me some advice?

LI945 commented Jun 19, 2019

When I trained with L1 loss, it was also large.
How large is your L1 loss?

LI945 commented Jun 19, 2019

All three losses are large. What is the problem?

yinnhao commented Jun 19, 2019

> All three losses are large. What is the problem?

Same here. My loss is around 1e+4. How about yours?

LI945 commented Jun 19, 2019

> All three losses are large. What is the problem?
>
> Same here. My loss is around 1e+4. How about yours?

My loss is around 1e+4 too.

zzzzwj commented Jun 20, 2019

> All three losses are large. What is the problem?
>
> Same here. My loss is around 1e+4. How about yours?
>
> My loss is around 1e+4 too.

That's because the loss is reduced by sum. If you divide it by (GT_size * GT_size * batch_size), you'll find the loss is in the usual range, around 1e-2. You can replace the reduction with 'mean'.
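
A minimal sketch of the difference, assuming an eps of 1e-6 and random tensors standing in for the network output and ground truth (not the repository's exact implementation):

```python
import torch

def charbonnier_loss(pred, target, eps=1e-6, reduction='mean'):
    # Charbonnier penalty: sqrt(diff^2 + eps), a smooth approximation of L1.
    loss = torch.sqrt((pred - target) ** 2 + eps)
    return loss.sum() if reduction == 'sum' else loss.mean()

# Stand-in tensors: a batch of 16 RGB patches with GT_size = 128 (assumed shapes).
pred = torch.rand(16, 3, 128, 128)
target = torch.rand(16, 3, 128, 128)

loss_sum = charbonnier_loss(pred, target, reduction='sum')
loss_mean = charbonnier_loss(pred, target, reduction='mean')

# The two reductions differ exactly by the element count (16 * 3 * 128 * 128 here),
# so a per-element mean of ~1e-2 during training corresponds to a sum of ~1e+4.
print(loss_sum, loss_mean, loss_sum / pred.numel())
```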

LI945 commented Jun 20, 2019

I have another problem: the loss doesn't go down. Does anybody else have this problem?

xinntao (Owner) commented Jun 20, 2019

@zzzzwj has pointed it out. CharbonnierLoss is in sum mode. The L1/L2 losses also have mean and sum modes (you can see them in the PyTorch docs).
What matters during training is the gradient rather than the loss value, so even with large losses, training is fine as long as the gradients are reasonable.
When switching between the mean and sum modes you may need to adjust the learning rate, but the Adam optimizer can compensate for this to some extent.

@LI945 During training you may observe the loss decreasing very slowly, but if you evaluate the checkpoints, the performance (PSNR) actually increases as training goes on.
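
For reference, a small sketch showing the same reduction choice on PyTorch's built-in L1 loss (tensor shapes are assumptions):

```python
import torch
import torch.nn as nn

pred = torch.rand(16, 3, 128, 128)
target = torch.rand(16, 3, 128, 128)

l1_mean = nn.L1Loss(reduction='mean')(pred, target)  # averaged over all elements
l1_sum = nn.L1Loss(reduction='sum')(pred, target)    # summed over all elements

# The two values (and their gradients) differ by the element count, which is why
# the learning rate may need adjusting when switching between modes.
print(l1_mean.item(), l1_sum.item(), (l1_sum / pred.numel()).item())
```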

zzzzwj commented Jun 21, 2019

I met the same problem @LI945 mentioned. When I train with my own dataset, the loss decreases very slowly. When I train a SISR model (for example, EDSR), the PSNR increases very quickly and reaches almost its best value, around 37.0 dB, within 20-30 epochs. However, when I train EDVR with the raw training code, the PSNR rises quickly in the first 10 epochs, reaching ~33.0 dB, and then seems to plateau: over the next 20 epochs it increases by less than 1.0 dB. Have you seen the same behavior when training on the REDS or Vimeo90K datasets? And could you share your training log? Hoping for your reply @xinntao.

xinntao (Owner) commented Jun 23, 2019

@zzzzwj I will upload a training log example tomorrow. Actually, 1) we use a different training scheme with restarts, which improves performance; 2) we usually measure in iterations rather than epochs.
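
As an illustration of the restart idea, a minimal sketch using PyTorch's built-in cosine annealing with warm restarts (not necessarily the exact scheduler or hyperparameters used in this repository; T_0, lr, and eta_min are assumed values):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Conv2d(3, 3, 3, padding=1)        # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)

# Restart the cosine schedule every T_0 iterations.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=50_000, eta_min=1e-7)

for iteration in range(150_000):
    # ... forward pass, loss.backward() and optimizer.step() go here ...
    scheduler.step()                                # stepped once per iteration, not per epoch
    if iteration % 50_000 == 0:
        # The learning rate decays within each cycle and jumps back up at every restart.
        print(iteration, optimizer.param_groups[0]['lr'])
```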

zzzzwj commented Jun 24, 2019

> @zzzzwj I will upload a training log example tomorrow. Actually, 1) we use a different training scheme with restarts, which improves performance; 2) we usually measure in iterations rather than epochs.

Well, thanks a lot.
