Training Loss is Puzzling #11
Hi, did you continue training? What does the final loss curve look like? Also, does the depth in the images tab seem to be improving? My training loss also fluctuates a lot.
Have you ever tried using a larger batch size?
I use 8 GPUs to train the model. It seems your training loss is converging; try using a larger smoothing weight in TensorBoard.
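For reference, TensorBoard's smoothing slider applies an exponential moving average (EMA) to the logged scalars. Here is a minimal sketch of the same idea (not code from this repo) that shows why a fluctuating but converging loss looks much flatter at a higher smoothing weight:

```python
def smooth(values, weight=0.9):
    """TensorBoard-style EMA smoothing; `weight` plays the role of the slider
    (0 = raw curve, values near 0.99 = very smooth)."""
    smoothed = []
    last = values[0]  # initialize with the first logged point
    for v in values:
        last = last * weight + (1 - weight) * v  # EMA update
        smoothed.append(last)
    return smoothed

# Example: a noisy but downward-trending loss flattens into a clear trend.
raw = [1.0, 0.4, 0.9, 0.3, 0.8, 0.25, 0.7, 0.2]
print(smooth(raw, weight=0.9))
```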
Do you have the final metrics of
Sorry, I have lost the final metrics information. @kwea123
@kwea123 Have you tried a bigger batch size? How does the training loss differ from batchsize=1?
The model consumes too much memory, so I cannot try a bigger batch size (batchsize=1 requires ~8GB). You will need multiple GPUs.
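For anyone who does have multiple GPUs, here is a minimal sketch of splitting a batch across devices, assuming the training script is PyTorch-based (the tiny model below is a stand-in, not the repo's actual network):

```python
import torch
import torch.nn as nn

# Stand-in for the real network; substitute the repo's model here.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())

if torch.cuda.device_count() > 1:
    # DataParallel scatters each batch along dim 0 across the visible GPUs
    # and gathers the outputs back on the default device.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# If one sample costs ~8 GB, batchsize=8 on 8 GPUs keeps the per-GPU
# memory footprint roughly the same as batchsize=1 on a single GPU.
x = torch.randn(8, 3, 64, 64, device=device)
out = model(x)
print(out.shape)
```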
@QTODD Hi, have you solved the problem of the training loss not being plausible?
@xy-guo Do you use 8 GPUs for testing?
I tried; even 8 GPUs with batch size 4 still result in fluctuating losses. I think it's unsolvable then, although it doesn't worsen the model's performance.
I have re-trained the network using the same configuration, but the training loss doesn't converge.