
Unsupervised Saliency Model #1

Closed
yucornetto opened this issue Feb 22, 2021 · 8 comments

Comments

@yucornetto

Hi, thanks so much for the great work; it is really interesting and inspiring. I wonder if you will provide the implementation and/or pretrained weights for the unsupervised saliency model, so that we can also generate saliency masks and try your method on datasets other than PASCAL VOC?

@wvangansbeke
Owner

Hi @yucornetto,

Thank you for the kind words.

First of all, we will update the repository upon acceptance of the paper.
We might add the download link to the unsupervised saliency model in this future update.
It is actually quite straightforward: you first make predictions with DeepUSPS on MSRA (or a similar dataset) using the publicly available code, and then train BASNet on the obtained pseudo-labels. We follow this strategy because we noticed that training BASNet on the pseudo-labels from DeepUSPS resulted in higher-quality masks. This bootstrapping procedure is also explained on page 6, Section 4.1, of our paper.
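
For illustration, a rough sketch of that bootstrapping step (not the authors' code): it assumes PyTorch models built from the linked DeepUSPS and BASNet repos, passed in as `deepusps` and `basnet`, and stands in a plain BCE loss for BASNet's actual hybrid loss.

```python
# Illustrative sketch only. `deepusps` and `basnet` are assumed to be PyTorch
# models constructed from the respective repos; the data loaders and the plain
# BCE loss (BASNet actually uses a hybrid loss) are placeholders.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pseudo_labels(deepusps, msra_loader, out_dir):
    deepusps.eval()
    for images, names in msra_loader:
        probs = torch.sigmoid(deepusps(images))      # saliency probabilities in [0, 1]
        masks = (probs > 0.5).float()                # binarize into pseudo ground truth
        for mask, name in zip(masks, names):
            torch.save(mask.cpu(), f"{out_dir}/{name}.pt")

def train_basnet_on_pseudo_labels(basnet, pseudo_loader, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(basnet.parameters(), lr=lr)
    basnet.train()
    for _ in range(epochs):
        for images, pseudo_masks in pseudo_loader:
            logits = basnet(images)                  # predicted saliency logits
            loss = F.binary_cross_entropy_with_logits(logits, pseudo_masks)
            opt.zero_grad()
            loss.backward()
            opt.step()
```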

After training, you need to threshold the output at 0.5 to obtain a binary mask. We additionally filter out images for which the area of the salient object is smaller than 1% of the total area. For more complex scenes, you might also apply connected components to get the non-overlapping object masks. However, this was not necessary for PASCAL.
We also discuss a few future directions in the paper to improve the current pixel grouping prior.
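
For concreteness, a minimal sketch of the post-processing described above (threshold at 0.5, drop images whose salient area is below 1%, optional connected components), assuming the saliency output is a NumPy probability map:

```python
# Minimal sketch of the post-processing described above. Assumes `prob` is a
# 2D numpy array of saliency probabilities in [0, 1]; uses scipy for the
# optional connected-component step.
import numpy as np
from scipy import ndimage

def postprocess_saliency(prob, thresh=0.5, min_area_frac=0.01):
    mask = prob > thresh                               # binary mask at 0.5
    if mask.sum() < min_area_frac * mask.size:         # filter out images with tiny salient area
        return None
    labeled, num = ndimage.label(mask)                 # non-overlapping object masks
    components = [labeled == i for i in range(1, num + 1)]
    return mask, components
```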

@WrongWhp

Hi @wvangansbeke @yucornetto, I'm also interested in applying the unsupervised saliency mask contrast to a different dataset.
Thanks for your detailed response @wvangansbeke and for releasing your code so quickly! I could find publicly available code for BASNet but not for DeepUSPS. Could you please link to it? Thanks so much!

@WrongWhp

Just realized that the code for DeepUSPS is not linked in the arXiv version but is present in the NeurIPS version. Link: https://tinyurl.com/wtlhgo3

@wvangansbeke
Owner

Hi @WrongWhp,
Thank you for your interest.
Yes indeed, you can find the link to the code of DeepUSPS in their paper. The code for BASNet can be found here.

@yucornetto
Author

It is really helpful! Thanks again for your great work and good luck with your paper submission!

@yucornetto
Author

yucornetto commented Feb 23, 2021

Hi @wvangansbeke, sorry to bother you again. I just checked the paper and code of DeepUSPS and found that they seem to build the model on a Cityscapes-pretrained DRN (Sec. 4.1: "We use the DRN-network (Chen et al., 2018) which is pretrained on CityScapes (Cordts et al., 2016)."; also in their code, DeepUSPS.py, Line 264: single_model = DRNSeg(args.arch, 2, None, pretrained=True)). So it seems that DeepUSPS uses the labeled data of Cityscapes, which may mean your method also benefits from labeled Cityscapes data?

My understanding is that, for the unsupervised saliency detection task, it may be acceptable to claim "unsupervised" as long as no saliency labels are used. But I feel that in the SSL area people tend to avoid using any sort of labels, regardless of whether they are related to the target dataset. I also noticed that in your paper the backbone is initialized with MoCo v2 pre-trained weights instead of ImageNet supervised pre-trained weights, so I assume you are also trying to avoid using any labeled data. That is why this situation confuses me.

I am not sure if I am misunderstanding anything; thanks in advance for your time.

@wvangansbeke
Owner

Hi @yucornetto,

In our repo we refer to the code of DeepUSPS to obtain the saliency estimator for now.
Indeed, if you don't want to use any prior knowledge for the saliency model, you should use the ResNet-based model without Cityscapes pre-training (see the DeepUSPS paper for more details: its influence is quite small). Moreover, with different initialization strategies (e.g. SimCLRv2 or SwAV) and optionally a deeper network for the saliency estimation (ResNet-152), you can get results similar to those DeepUSPS reports.

After performing the bootstrapping procedure for the completely unsupervised case (see Section 4.1), you can get 35% mIoU with K-means, as reported in the paper. Keep in mind that, in contrast to our method, prior art requires either image-level tags, bounding boxes, or point annotations on PASCAL (or COCO). We don't.

Finally, it depends on how far you want to go, of course (unsupervised saliency, no supervised ImageNet weights?). In the completely unsupervised scenario, the proposed method works fine, as presented in the paper. We will add the download link to the weights of our bootstrapped saliency model for full transparency.
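
As an aside, a hedged sketch of how a K-means evaluation like the one mentioned above is commonly set up (this is not the repo's evaluation code): cluster the pixel embeddings, match clusters to ground-truth classes with Hungarian matching, then compute mIoU.

```python
# Illustrative sketch, not the repo's evaluation code. Clusters pixel
# embeddings with K-means, matches clusters to ground-truth classes via the
# Hungarian algorithm, and reports mean IoU.
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def kmeans_miou(embeddings, labels, num_classes):
    """embeddings: (N, D) pixel features; labels: (N,) ground-truth class ids."""
    preds = KMeans(n_clusters=num_classes, n_init=10).fit_predict(embeddings)
    # Confusion matrix between clusters (rows) and classes (columns).
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (preds, labels), 1)
    # Hungarian matching maximizes total overlap between clusters and classes.
    rows, cols = linear_sum_assignment(-conf)
    mapping = dict(zip(rows, cols))
    mapped = np.array([mapping[p] for p in preds])
    ious = []
    for c in range(num_classes):
        inter = np.sum((mapped == c) & (labels == c))
        union = np.sum((mapped == c) | (labels == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```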

@yucornetto
Author

yucornetto commented Feb 24, 2021

I see, thanks again for the explanation and your great work! 👍
