Over-clustering in GCN-V #69

Open
zhaoxin111 opened this issue Oct 21, 2020 · 8 comments


@zhaoxin111

Thank you very much for sharing such great work; the code is also very nice!
I tried running GCN-V on my own dataset. The clustering quality on small-scale data is very good, but on large-scale data many identities are split, resulting in many duplicate clusters that belong to the same person.
I have tried adjusting k, tau_0, and tau, but the results are similar.

PS: I used my own feature extractor trained on cleaned MS1M, and the GCN-V network was retrained from scratch.
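(Editorial aside: when tuning k or tau against a small labeled subset, the split-identity effect shows up directly as lower pairwise recall. A minimal standalone implementation of the standard pairwise F-score is sketched below; this is the textbook definition, not necessarily the exact evaluation code in this repo.)

```python
from collections import Counter

def pairwise_fscore(gt_labels, pred_labels):
    """Standard pairwise F-score: precision/recall over all
    pairs of samples that share a cluster."""
    pairs = lambda n: n * (n - 1) // 2
    # Pairs that share a cluster in both ground truth and prediction.
    tp = sum(pairs(n) for n in Counter(zip(gt_labels, pred_labels)).values())
    pred_pairs = sum(pairs(n) for n in Counter(pred_labels).values())
    gt_pairs = sum(pairs(n) for n in Counter(gt_labels).values())
    if tp == 0:
        return 0.0
    precision, recall = tp / pred_pairs, tp / gt_pairs
    return 2 * precision * recall / (precision + recall)

# Over-clustering (one identity split in two) lowers recall:
print(pairwise_fscore([0, 0, 0, 0], [0, 0, 1, 1]))  # 0.5
```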

@yjhuasheng

I have encountered this problem too. I also tried extracting 512-dimensional features and adjusting the KNN value for training gcn_v, but the result was not very good.

@jquesadap

I am also encountering this problem. With the default configurations for k and tau, each identity is split into several clusters. Any help is appreciated.

@yl-1993
Owner

yl-1993 commented Nov 17, 2020

Hi @zhaoxin111 @yjhuasheng @jquesadap, thanks for the discussion. I think solving the mentioned problem is the key to increasing the recall of the clustering results. We may need to develop a better algorithm to pinpoint robust linkages, or use a dynamic/learnable threshold for different structures on the entire graph.

If you are tackling a practical problem (not hunting for novelty), here are a few simple suggestions:

  1. You may want to try dsgcn, as it is more flexible for adjusting the balance between recall and precision.
  2. You may refer to papers on interactive clustering, which may provide better capability for splitting and merging.
  3. You may apply post-processing to the clustering results, e.g., searching for nearby clusters and using a criterion to decide whether to merge them.
  4. You may ensemble different clustering results. For example, (1) you can design your own confidence measure and ensemble several confidences in GCN-V; (2) you can merge the results from different methods, such as CDP, GCN-D/GCN-S, and K-means, and apply a voting scheme, e.g., majority vote.
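(Editorial aside: suggestion 3 can be sketched in a few lines. The pass below is only illustrative, assuming L2-normalizable features and integer cluster labels; `merge_nearby_clusters` and the 0.8 threshold are made up for this example, not part of the repo.)

```python
import numpy as np

def merge_nearby_clusters(features, labels, sim_thresh=0.8):
    """Merge clusters whose (L2-normalized) centroids have cosine
    similarity above sim_thresh, via a simple union-find."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    cluster_ids = np.unique(labels)
    # Centroid of each cluster, re-normalized for cosine similarity.
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in cluster_ids])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    parent = list(range(len(cluster_ids)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sims = centroids @ centroids.T
    for i in range(len(cluster_ids)):
        for j in range(i + 1, len(cluster_ids)):
            if sims[i, j] >= sim_thresh:
                parent[find(i)] = find(j)

    # Remap every original label to its merged root cluster.
    root = {c: find(i) for i, c in enumerate(cluster_ids)}
    return np.array([root[l] for l in labels])
```

Merging on centroid similarity is crude (a single threshold again), but it directly counters the split-identity symptom: two fragments of the same person sit close in feature space and get re-joined, while genuinely distinct clusters stay apart.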

Feel free to share your ideas :)

@Youskrpig

Youskrpig commented Nov 23, 2020

Hello, I ran the test experiments with the provided pretrained_gcn_v_fashion.pth, but the clustering results do not seem to line up with the paper. Here are the results:
[screenshots: clustering metrics from the test run]

@yl-1993
Owner

yl-1993 commented Dec 2, 2020

@Youskrpig Hi, thanks for trying. I suspect the reason may lie in the version of PyTorch, as the result of GCN-V in your test, F_P=33.59 and F_B=59.41, is close to our reported results, F_P=33.07 and F_B=57.26. The number of clusters differs greatly, 6913 vs. 4998, but as this number is strongly affected by small clusters, it may look similar after discarding them, e.g., clusters containing only one instance.

For your information, we used torch==0.5.0a0+e31ab99 in our experiments. Feel free to report detailed settings or more information if you want to reproduce the results exactly.
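(Editorial aside: the cluster-count comparison after dropping tiny clusters can be done with a short helper. Illustrative only; `min_size=2` is an arbitrary cutoff, not a setting from the repo.)

```python
from collections import Counter

def num_clusters(labels, min_size=2):
    """Count clusters with at least min_size members."""
    sizes = Counter(labels)
    return sum(1 for n in sizes.values() if n >= min_size)

labels = [0, 0, 1, 2, 2, 2, 3]
print(num_clusters(labels, min_size=1))  # 4: all clusters
print(num_clusters(labels, min_size=2))  # 2: singletons 1 and 3 discarded
```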

@Youskrpig

Thanks for your reply. I have two other questions:

  1. Does the number of training epochs in your experiments (unlabeled clustering on MS-Celeb-1M at different sizes: 584k, 1.74M, 2.89M) stay the same, or does it increase? I ask because I set the labels of the unlabeled data to -1, so their loss is ignored throughout training.
  2. Have you tried training and testing on data that is not entirely from the same dataset? For example, training on the MS-Celeb-1M training part plus the DeepFashion dataset and testing on the MS-Celeb-1M test part; or a training set with no intersection with the test set, but large enough?

Looking forward to your reply.
Best wishes

@yjhuasheng


Hi, have you trained this method on your own face data? I trained it on my own face data, and both the clustering F-score and the clustering quality are rather poor. I have also adjusted the KNN k value and the thresholds, but no matter how I tune them the improvement is not obvious. I also tried methods such as CDP and L-GCN, and the results are not very good either. How were the results when you trained on your own data?

@Youskrpig


Ah, for me the results are not much worse; the difference is not very noticeable, and some parameters can still be fine-tuned. But there is one thing I have not figured out: with large-scale data, how do you train with multiple GPUs? I would appreciate some advice, many thanks!
