Deeplab V3+ and Xception #78
Hi @lromor, for DeepLab V3+ with an Xception backbone, the backbone used here is not really the same: if you go through the code, you'll see that the checkpoint we're using from pretrained-models.pytorch is a smaller version of the Xception that DeepLab V3+ uses, and the layers not present in the checkpoint are initialized from the last layer in the checkpoint. I think this is why you might have some problems when training. I suggest not using a differential learning rate (where the backbone is trained at 0.1x the LR), and instead using the same learning rate across the whole model so the backbone is trained too. If you like, I'm sure you can find some ported DeepLab Xception checkpoints, and in that case you can load the correct weights and train normally (I might do this if I have some time).
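In practice, that suggestion amounts to building the optimizer with a single parameter group instead of a separate backbone group at 0.1x the LR. A minimal sketch, assuming a typical SGD setup; the hyperparameter values and the placeholder model are illustrative, not this repo's exact code:

```python
# Minimal sketch: one learning rate across the whole model, instead of the
# differential setup where the backbone gets 0.1x the base LR.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 21, kernel_size=1)  # placeholder for the segmentation model
base_lr = 0.01                           # assumed value

# Differential LR (avoid this while the Xception backbone is only partially
# initialized from the smaller checkpoint):
# optimizer = torch.optim.SGD([
#     {"params": backbone_params, "lr": 0.1 * base_lr},
#     {"params": decoder_params, "lr": base_lr},
# ], momentum=0.9, weight_decay=1e-4)

# Single LR so the backbone is trained at the full rate:
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)
```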
I see, thank you for your answer. I'm also noticing another strange behavior with ResNet: the validation loss tends to go up after 600 iterations, but the mIoU keeps rising (still on the validation set). Have you also observed this?
Not really. The validation loss might go up from time to time, but as a general trend I'd expect it to go down. Maybe try a lower LR.
I'm having the very same diverging validation loss as in #29. I don't know what the problem is. Did you achieve the same result with a single GPU as well? I'm starting to think that this parallel batch norm might be faulty...
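One way to test that suspicion is to compare a single-GPU run using plain BatchNorm against the synchronized version. A rough sketch, assuming the custom sync BN can be swapped for PyTorch's built-in one; the module below is just a stand-in for the real network:

```python
# Sketch of one way to rule out a faulty custom synchronized BatchNorm:
# run on a single GPU with plain nn.BatchNorm2d, or swap in PyTorch's
# built-in SyncBatchNorm (which requires a torch.distributed process group).
import torch.nn as nn

model = nn.Sequential(               # stand-in for the segmentation network
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),              # plain BN: fine for a single-GPU sanity check
    nn.ReLU(),
)

# Multi-GPU alternative, inside a DistributedDataParallel setup:
# model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```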
@yassouali It seems to me that by default, the validation set performs scaling augmentation; this might actually enable scale=true.
@lromor The current config applies batched validation, and for validation, in base_dataset, the augmentations used are different: we only apply center crops, where the smaller side is automatically rescaled to the crop size. If you want to get more precise results, you can remove it.
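For reference, the validation-time preprocessing described above (smaller side rescaled to the crop size, then a deterministic center crop, no random scaling or flipping) looks roughly like the sketch below; the crop size and the file path are made-up values, and this is not the repo's exact base_dataset code:

```python
# Rough sketch of the described validation preprocessing: rescale so the
# smaller side equals the crop size, then take a center crop.
from PIL import Image
from torchvision import transforms

crop_size = 380  # assumed value; use whatever the config specifies

val_transform = transforms.Compose([
    transforms.Resize(crop_size),      # smaller side -> crop_size, aspect ratio kept
    transforms.CenterCrop(crop_size),  # deterministic crop, no scaling augmentation
    transforms.ToTensor(),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical image path
tensor = val_transform(image)
```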
You are right, I just noticed that the validation does skip the augmentations. I'm searching for possible reasons. I did update the ResNet-101 backbone and achieved better results, but only by 1-2%, reaching ~75% on the validation set. It's similar to what's happening in the plot at the bottom of this page: One last thing: the original paper uses multi-grid. I'm not sure what that means exactly, but maybe this implementation lacks that method, leading to lower accuracies.
I think the multi-grid refers to using different dilation rates within the units of the last ResNet block. And of course, please make a pull request with your findings and corrections to the code if you find any improvements.
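For context, "multi-grid" in DeepLab v3/v3+ assigns the residual units of the last ResNet stage different dilation rates (e.g. a base rate multiplied by 1, 2, 4) rather than one shared rate. A rough illustration on a torchvision ResNet; the helper name and the way dilation is patched are assumptions, not this repo's implementation:

```python
# Illustrative sketch of the multi-grid idea: the three bottleneck units in
# ResNet's layer4 get dilation rates base_rate * (1, 2, 4) instead of a single
# shared rate. Stride handling for a real output-stride-16 setup is omitted.
import torchvision

def apply_multi_grid(resnet, multi_grid=(1, 2, 4), base_rate=2):
    for unit, grid in zip(resnet.layer4, multi_grid):
        rate = base_rate * grid
        unit.conv2.dilation = (rate, rate)  # 3x3 conv of each bottleneck unit
        unit.conv2.padding = (rate, rate)   # keep the spatial size unchanged
    return resnet

backbone = apply_multi_grid(torchvision.models.resnet101(weights=None))
```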
@yassouali This is a run using the torchvision backbone. The training loss goes down, the val loss goes up. The upward trend is still relatively small, but the interesting thing is that the mIoU keeps improving, as does the per-class accuracy. I wonder how that's possible.
Sorry for the delay. Yeah, the behavior is certainly interesting, but maybe since the backbone is pretrained we are already close to a local minimum, and the model only needs to search the vicinity of this minimum for optimal performance, which might explain why the loss remains relatively small and suggests that no real overfitting is taking place.
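A toy illustration of why the two metrics can diverge: cross-entropy reacts to how confident the per-pixel probabilities are, while mIoU only looks at the argmax, so a model can grow over-confident on a few wrong pixels (loss goes up) while still labelling more pixels correctly (mIoU goes up). The mean_iou helper below is a simplified stand-in, not the repo's metric code:

```python
import torch
import torch.nn.functional as F

def mean_iou(pred, target, num_classes):
    # Simplified per-image mIoU over classes present in pred or target.
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().float()
        union = ((pred == c) | (target == c)).sum().float()
        if union > 0:
            ious.append(inter / union)
    return torch.stack(ious).mean()

num_classes = 3
logits = torch.randn(1, num_classes, 8, 8)         # fake network output
target = torch.randint(0, num_classes, (1, 8, 8))  # fake ground truth

loss = F.cross_entropy(logits, target)             # sensitive to confidence
miou = mean_iou(logits.argmax(dim=1), target, num_classes)  # only uses argmax
```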
I see, thanks for your help. In my fork I have an updated version with newer backbones, but as you can see, they are not really changing the results much. Regarding the increasing loss: that makes sense. I just thought it strange for the validation mIoU and loss to have diverging trends.
Hi!
Great repo.
Could you recommend a configuration file for running an experiment using DeepLab V3+ and Xception
that achieves results at least somewhat comparable to those presented in the paper https://arxiv.org/pdf/1802.02611.pdf?
I'm constantly getting very low mIoUs with the following:
PSPNet has an initial mIoU which quickly scales up; in my case, I observe a very small increase (after one epoch it's around 0.06).
Any ideas/suggestions? It seems to be a problem with the Xception model; with ResNet I don't see the issue.