The epoch_IoU of the retrained refinement network only reaches 0.35 on the DeepGlobe dataset #18

Closed
DwRolin opened this issue Jun 2, 2022 · 4 comments

@DwRolin

DwRolin commented Jun 2, 2022

I tried to retrain the segmentation backbone and the refinement network following the guide in the README: https://github.com/VinAIResearch/MagNet#training-backbone-networks.
The best_mIoU of the retrained FPN backbone is 0.6363, which is close to the baseline IoU of 0.6722 reported in the README.
image
Given this, the refinement network trained on the retrained backbone should perform about as well as one trained on the pretrained backbone.
When retraining the refinement network with the pretrained backbone, epoch_IoU evolved as shown in the following image:
image1
With the retrained backbone, epoch_IoU evolved as shown in the following image:
image2
With the retrained backbone, epoch_IoU only reaches 0.35.
I tried to find the difference between the pretrained backbone and the retrained backbone.
I extracted the validation part of backbone/train.py to evaluate the pretrained backbone: https://github.com/DwRolin/temp_code/blob/main/eval_pretrain.py
Strangely, the MeanIU of the pretrained backbone is only 0.07.
I would like to know what causes this contradiction and how to make the retrained refinement network work well.
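
For readers who want to sanity-check a reported MeanIU value independently of the training script, here is a minimal sketch of how mean IoU is commonly computed from a confusion matrix. It is not the code from backbone/train.py; the function names are hypothetical, and the choice to skip classes that never appear is an assumption that may differ from the repository's metric.

```python
def confusion_matrix(labels, preds, num_classes, ignore_label=None):
    """Count (true class, predicted class) pairs over flat label lists."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for y, p in zip(labels, preds):
        if y == ignore_label:
            continue  # skip pixels marked as "ignore" in the ground truth
        matrix[y][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both labels and predictions are skipped.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                      # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
labels = [0, 0, 1, 1]
preds = [0, 1, 1, 1]
score = mean_iou(confusion_matrix(labels, preds, num_classes=2))
print(round(score, 4))
```

Running the same predictions through a small independent check like this can help tell a genuinely poor model apart from a problem in the evaluation setup.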

@DwRolin
Author

DwRolin commented Jun 2, 2022

The following is the training log for the FPN backbone:
resnet_fpn_train_612x612_sgd_lr1e-2_wd5e-4_bs_12_epoch484_2022-05-05-16-57_train.log

@hmchuong
Collaborator

hmchuong commented Jun 5, 2022

Hi,
Are you working on the DeepGlobe dataset? Can you check that CUDNN.ENABLED is the same in both the backbone and the refinement training scripts? It should be True.
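
The check suggested above could be sketched roughly as follows. This is only an illustration: the helper name `cudnn_mismatches` is hypothetical, and the `CUDNN` section layout with `BENCHMARK` and `DETERMINISTIC` keys next to `ENABLED` is an assumption about how the configs are structured.

```python
def cudnn_mismatches(backbone_cfg, refinement_cfg, section="CUDNN"):
    """Return {key: (backbone value, refinement value)} for differing keys.

    Both arguments are plain dicts, e.g. the result of parsing a YAML config.
    """
    a = backbone_cfg.get(section, {})
    b = refinement_cfg.get(section, {})
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical config contents, for illustration only.
backbone_cfg = {"CUDNN": {"BENCHMARK": True, "DETERMINISTIC": False, "ENABLED": True}}
refinement_cfg = {"CUDNN": {"BENCHMARK": True, "DETERMINISTIC": False, "ENABLED": False}}

diff = cudnn_mismatches(backbone_cfg, refinement_cfg)
print(diff)  # {'ENABLED': (True, False)}
```

In practice the two dicts could come from loading each YAML file, for example with PyYAML's `yaml.safe_load`. An empty result means the two runs agree on the cuDNN settings.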

@DwRolin
Author

DwRolin commented Jun 6, 2022

Thank you for your thoughtful reply!
I set CUDNN.ENABLED to true, and the issue is resolved.
I cloned this code about six months ago; at that time, CUDNN.ENABLED was set to false.
I am also curious why CUDNN.ENABLED is set to false in hrnet_ocr_w18_train_256x128_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml, while it is set to true in resnet_fpn_train_612x612_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml.

@hmchuong
Collaborator

hmchuong commented Jun 7, 2022

Hi,

The reason is that the behavior of BatchNorm2d depends on cuDNN.

@hmchuong hmchuong closed this as completed Jun 7, 2022