Hi @ycszen, thank you for the wonderful codebase. May I ask you several questions?

What is "min_kept" in the ProbOhemCrossEntropy2d function? ohem_criterion = ProbOhemCrossEntropy2d(ignore_label=255, thresh=0.7, min_kept=250000, use_weight=False)

What does the 16 mean in this equation? Is it the batch size? min_kept=int(config.batch_size // len(engine.devices) * config.image_height * config.image_width // 16)

Actually my situation is: I can start training, but it triggers RuntimeError: cuda runtime error (59) : device-side assert at random epochs. It is similar to issue 10, but mine happens randomly during training. I already checked my labels; they should be correct, with a range of 0 to 18. The only modification I made to your config file is changing the batch size to 12, because I only have 2 GPUs. Could you or anyone help or give me any hints on how to debug this? Thank you very much.
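For what it's worth, a device-side assert in the loss usually means a label value outside [0, num_classes-1] that is not the ignore value. A minimal sketch for sanity-checking a label map before training (the function name `check_labels` is mine, not from the repo; run each batch's label array through it, or set CUDA_LAUNCH_BLOCKING=1 to get a precise stack trace):

```python
import numpy as np

def check_labels(label, num_classes=19, ignore_label=255):
    """Return the set of label values that are neither a valid class id
    (0..num_classes-1) nor the ignore value. An empty set means the
    label map is safe to feed to the loss."""
    values = set(np.unique(label).tolist())
    allowed = set(range(num_classes)) | {ignore_label}
    return values - allowed

# Example: a 2x3 label map containing an out-of-range value 200
bad = np.array([[0, 18, 255], [5, 200, 7]], dtype=np.int64)
print(check_labels(bad))  # -> {200}
```

If this ever returns a non-empty set on your data, that batch is the one tripping the assert.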
@bryanyzhu
1st question:
min_kept sets the minimum number of pixels used to compute the loss: if fewer than min_kept pixels exceed the hard-example threshold, the threshold is relaxed so that at least min_kept pixels are still kept. You can read the code for details.
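A minimal numpy sketch of that selection rule (my own simplification, not the repo's exact code; `prob` is the predicted probability of the ground-truth class per pixel):

```python
import numpy as np

def ohem_mask(prob, thresh=0.7, min_kept=4):
    """Return a boolean mask of the pixels used in the loss.
    A pixel is 'hard' if its probability for the correct class is
    below `thresh`; if fewer than `min_kept` pixels qualify, the
    threshold is raised so the `min_kept` hardest pixels are kept."""
    prob = np.asarray(prob, dtype=float)
    if min_kept >= prob.size:
        return np.ones(prob.size, dtype=bool)  # keep everything
    # probability of the min_kept-th hardest pixel
    kth = np.partition(prob, min_kept - 1)[min_kept - 1]
    # relax the threshold if too few pixels fall below it
    return prob <= max(thresh, kth)

probs = [0.9, 0.95, 0.99, 0.8, 0.85, 0.2]
print(ohem_mask(probs, thresh=0.7, min_kept=3).sum())  # -> 3
```

Only one pixel (0.2) is below 0.7, so the effective threshold rises to 0.85 and three pixels are kept.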
2nd question:
min_kept is set to (height // 4) * (width // 4) pixels per image, i.e. 1/16 of the whole image's pixels. For one GPU card, the input contains batch_size // gpu_num * height * width pixels, so min_kept = batch_size // gpu_num * height * width // 16.
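Plugging in some concrete numbers makes the 1/16 factor tangible (the config values below are hypothetical; substitute your own batch size, GPU count, and crop size):

```python
# Hypothetical config values, not taken from the repo's config file
batch_size, gpu_num = 16, 8
height, width = 768, 768

per_gpu_pixels = batch_size // gpu_num * height * width
min_kept = per_gpu_pixels // 16

# Dividing the pixel count by 16 equals shrinking each spatial
# dimension by 4 (exact when height and width are divisible by 4)
assert min_kept == batch_size // gpu_num * (height // 4) * (width // 4)
print(min_kept)  # -> 73728
```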