
pretext + kmeans results of ImageNet-50 #61

ZhiyuanDang opened this issue May 12, 2021 · 9 comments

@ZhiyuanDang commented May 12, 2021

Hi, thanks for your excellent work.

I ran K-means (faiss-gpu based) on the L2-normalized pretext features (i.e., from the MoCo v2 checkpoint). The result on the training set is around 66% ACC, but on the test set it is only around 38% ACC (test samples assigned to the training centroids by L2 distance). However, Table 4 of the paper reports a test ACC of around 65% on ImageNet-50.

Have you encountered this problem? What is your K-means implementation scheme?
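
For reference, here is a minimal sketch of the setup described above (fit faiss-gpu K-means on L2-normalized training features, then assign test samples to the training centroids by L2 distance). The iteration counts, seed, and `n_clusters` below are placeholder choices, not values from this repo:

```python
# Minimal sketch of the evaluation setup described above (assumed, not the
# repo's script): K-means on L2-normalized train features with faiss-gpu,
# then nearest-centroid assignment for both splits.
import faiss
import numpy as np

def fit_and_assign(train_feats, test_feats, n_clusters=50, seed=0):
    """train_feats / test_feats: (N, D) arrays of pretext features."""
    train = np.ascontiguousarray(train_feats, dtype=np.float32)
    test = np.ascontiguousarray(test_feats, dtype=np.float32)
    faiss.normalize_L2(train)  # in-place L2 normalization
    faiss.normalize_L2(test)

    # niter / nredo are placeholder hyperparameters, not the paper's values.
    kmeans = faiss.Kmeans(train.shape[1], n_clusters,
                          niter=100, nredo=5, seed=seed, gpu=True)
    kmeans.train(train)

    # Nearest training centroid (squared L2 distance) for each sample.
    _, train_assign = kmeans.index.search(train, 1)
    _, test_assign = kmeans.index.search(test, 1)
    return train_assign.ravel(), test_assign.ravel()
```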

@wvangansbeke (Owner)

Hi @ZhiyuanDang,

I noticed that quite a lot of people have trouble with the implementation of K-means. I might upload a script after the NeurIPS deadline.
For now, did you check the previous issues, e.g., issue #49?

@ZhiyuanDang (Author)

Thanks for your reply. I checked the related issue #49 before opening this one.

It is strange that the train and test K-means results on CIFAR-10/20 are consistent, but on the ImageNet subset there is a large gap between the two.

How would you address that?

@wvangansbeke (Owner)

Hi @ZhiyuanDang,

You should obtain more or less the same accuracy on the train and test sets for ImageNet (the distributions are very similar). I always report the numbers on the test set, so I'm not sure what the issue is; I will verify it later. It is indeed strange that you see such a large discrepancy between train and test accuracy, especially since you get the correct results on CIFAR-10. My guess is that something is still wrong with your evaluation.

In the meantime, can you try on the complete ImageNet dataset or other datasets?

@ZhiyuanDang (Author) commented May 13, 2021

With the same evaluation method (from #49), CIFAR-10/20 and STL-10 achieve matching train and test ACC.

However, ImageNet-50 still obtains 65% training ACC and 38% test ACC. I will report results for ImageNet-100/200 later.

Update:
- ImageNet-100: 59% training ACC, 37% test ACC.
- ImageNet-200: 52% training ACC, 37% test ACC.
- ImageNet-10 (another ImageNet subset): 96% training ACC, 62% test ACC.

@wvangansbeke (Owner)

Hi @ZhiyuanDang,

FYI, I double-checked and got the same results as in the paper.

@ZhiyuanDang (Author)

Hi @wvangansbeke,

Thanks for your reply. Could you please release the K-means code for reference?

@Li-Hyn commented Jun 15, 2022


@wvangansbeke @ZhiyuanDang

I'm sorry to bother you, but has there been any progress on this after a year?

I'm also reproducing the pretext + K-means part and cannot reproduce the accuracy reported in the paper at the moment. Could you please explain it in detail?

Could you please release the K-means code for reference?

@wvangansbeke (Owner)

Make sure that you L2-normalize the features and that you report the results on the test set (use the train set to fit the K-means). It doesn't even matter much which implementation or which distance metric you use; the results are all pretty close. Also have a look at our repo on Unsupervised Semantic Segmentation and check out the K-means clustering part; it's more or less the same for classification. Honestly, the implementation is quite straightforward here. I have seen papers that were able to reproduce it. I will release it if I find the time.
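
To make the evaluation step concrete, here is a minimal sketch of the standard clustering-accuracy computation (Hungarian matching between cluster assignments and ground-truth labels via scipy's `linear_sum_assignment`). This is the common convention for cluster ACC, not necessarily the exact script used for the paper, and it assumes the number of clusters equals the number of classes:

```python
# Minimal sketch (assumed convention, not the authors' script): map K-means
# cluster ids to ground-truth classes with the Hungarian algorithm, then score.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred, num_classes):
    """y_true: ground-truth labels; y_pred: cluster assignments (same length).
    Assumes the number of clusters equals num_classes."""
    # Count matrix: rows = cluster ids, cols = ground-truth classes.
    counts = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Hungarian matching maximizes the total count over matched pairs.
    row_ind, col_ind = linear_sum_assignment(counts, maximize=True)
    return counts[row_ind, col_ind].sum() / len(y_true)
```

One convention worth pinning down when comparing train and test numbers: whether the cluster-to-class mapping is derived once (e.g., from the train assignments) and reused for the test split, or re-matched per split.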

@Li-Hyn commented Jun 15, 2022


Thanks for your kind reply. I'll try the methods you mentioned!
