Training Loss is Puzzling #11

Closed
XYZ-qiyh opened this issue Jan 8, 2020 · 12 comments

XYZ-qiyh commented Jan 8, 2020

[screenshot: training loss curve]
I have re-trained the network using the same configuration, but the training loss doesn't converge.

Author

XYZ-qiyh commented Jan 9, 2020

[screenshot: training loss curve]
Hello @xy-guo. Thanks for your amazing work. I have re-trained the network using the same configuration, but the training loss doesn't converge.
The training loss doesn't look very plausible to me. Did you have this problem?
Regards.


kwea123 commented Feb 3, 2020

Hi, did you continue training? What does the final loss curve look like? Also, does the depth in the image tab seem to be improving? My training loss also fluctuates a lot.

@zhiwenfan

> [screenshot: training loss curve]
> Hello @xy-guo. Thanks for your amazing work. I have re-trained the network using the same configuration, but the training loss doesn't converge.
> The training loss doesn't look very plausible to me. Did you have this problem?
> Regards.

Have you ever tried using a larger batch size?

Owner

xy-guo commented Feb 13, 2020

I use 8 GPUs to train the model. It seems your training loss is converging; try using a larger smoothing weight in TensorBoard.
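
For reference, TensorBoard's smoothing slider applies an exponential moving average to the logged scalars. Below is a minimal sketch of the same smoothing applied offline, with a synthetic noisy loss curve standing in for the real one:

```python
import numpy as np

def ema_smooth(values, weight=0.9):
    """Exponential moving average, roughly what TensorBoard's smoothing slider does."""
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return np.array(smoothed)

# Synthetic stand-in for a noisy training-loss curve: decaying trend plus noise.
steps = np.arange(2000)
raw_loss = 2.0 * np.exp(-steps / 500.0) + 0.5 * np.random.rand(len(steps))

print(raw_loss[-5:])                           # raw values fluctuate a lot
print(ema_smooth(raw_loss, weight=0.99)[-5:])  # heavy smoothing exposes the downward trend
```

With a weight close to 1 the per-step fluctuation is averaged out and the overall trend, if there is one, becomes visible.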


kwea123 commented Feb 13, 2020

Do you have the final metrics for abs_depth_error, thres2mm_error, thres4mm_error, and thres8mm_error? I want to compare them with my results, thank you.
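
For context, these metric names follow the usual MVS depth-evaluation pattern: mean absolute depth error over valid pixels, plus per-threshold error rates at 2/4/8 mm. Here is a sketch of one plausible implementation; whether this repo counts pixels above or below each threshold is an assumption:

```python
import torch

def depth_metrics(pred, gt, mask, thresholds=(2.0, 4.0, 8.0)):
    """Common MVS depth metrics; the exact definitions in this repo may differ.

    pred, gt: depth maps in mm; mask: boolean tensor marking valid ground-truth pixels.
    """
    err = (pred[mask] - gt[mask]).abs()
    metrics = {"abs_depth_error": err.mean().item()}
    for t in thresholds:
        # Assumed here: fraction of valid pixels whose absolute error exceeds t mm.
        metrics[f"thres{int(t)}mm_error"] = (err > t).float().mean().item()
    return metrics

# Example with random tensors, just to show the call signature.
pred = torch.rand(1, 128, 160) * 1000
gt = torch.rand(1, 128, 160) * 1000
mask = gt > 0
print(depth_metrics(pred, gt, mask))
```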


kwea123 commented Feb 16, 2020

@xy-guo

Owner

xy-guo commented Feb 16, 2020

Sorry, I have lost the final metrics information. @kwea123

@whubaichuan

@kwea123 Have you tried a bigger batch size? How does the training loss compare to batchsize=1?


kwea123 commented Feb 20, 2020

The model consumes too much memory, so I cannot try a bigger batch size (batchsize=1 already requires ~8 GB). You would need multiple GPUs.
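
A minimal sketch of what a larger batch spread over several GPUs could look like, assuming PyTorch's nn.DataParallel, which splits each batch across the visible devices; the tiny Sequential module below is only a placeholder for the actual network:

```python
import torch
import torch.nn as nn

# Placeholder module; the real network in this repo is far larger (~8 GB per sample).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)

# nn.DataParallel splits each batch across the visible GPUs, so the batch size
# can grow to roughly one sample per card without exceeding one card's memory.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

batch_size = max(1, torch.cuda.device_count())  # e.g. 8 on an 8-GPU machine
```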

@whubaichuan

@QTODD Hi, have you solved the problem of the training loss not looking plausible?

@whubaichuan

@xy-guo Do you also use 8 GPUs for testing?


kwea123 commented Feb 29, 2020

I tried; even 8 GPUs with batch size 4 still result in fluctuating losses. I think it's unsolvable, then, although it doesn't worsen the model's performance.
