
??? Performance when trained sfm-120k #10

Closed
songcheolhwan opened this issue Dec 15, 2021 · 3 comments

Comments

@songcheolhwan

songcheolhwan commented Dec 15, 2021

Hi~
Are there any experimental results using the SfM-120k train set?
There are signs in your code that you intend to support SfM-120k as a training set.
(https://github.com/tonyngjichun/SOLAR/blob/master/solar_global/examples/train.py#L40)

In my personal opinion, many studies in this field use different training sets but the same evaluation sets, and I don't think that is fair. I think methods should be compared with the same training data.
So, for a fair comparison, I'd like to know:
by any chance, do you have any experimental results?
thanks~
@tonyngjichun

@songcheolhwan songcheolhwan changed the title ??? Performance when learning sfm-120k ??? Performance when trained sfm-120k Dec 15, 2021
@tonyngjichun
Owner

Hi,

Yes, I have configured the GLDv1 training code to be compatible with the original training script from GeM (https://github.com/filipradenovic/cnnimageretrieval-pytorch), so if you download the SfM120k dataset and the corresponding pickle files, you should be able to train on it directly.
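Since the script follows the GeM CLI, a run on SfM120k might look like the sketch below. This is a hypothetical invocation: the flag names and the export-directory argument follow cnnimageretrieval-pytorch's train.py and may differ in SOLAR's version, and the dataset/pickle files must already be in the data root the script expects.

```shell
# Hypothetical invocation, mirroring cnnimageretrieval-pytorch's CLI;
# check solar_global/examples/train.py for the actual flag names.
# Assumes the SfM120k images and pickle files are already downloaded.
python solar_global/examples/train.py exports/sfm120k-run \
    --training-dataset 'retrieval-SfM-120k' \
    --test-datasets 'roxford5k,rparis6k'
```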

I totally agree about the inconsistency of training sets in this field, which has been a well-known problem for years. That is why, in the paper, I compared against GeM trained on GLDv1 as the baseline rather than against its original SfM120k results. That said, I see recent image retrieval papers converging on GLDv2-clean as the default training set, which is a good sign for the community, as both SfM120k and GLDv1 suffer from quite a lot of mislabelling and noisy data.

Unfortunately, GLDv2 had not yet been made public at the time of this work's submission, and there is no plan to re-train or improve the SOLAR models. I would highly encourage you to use the new GLDv2-clean set rather than SfM120k, which has frankly become something of a relic of the past as the field keeps progressing at a rapid rate.

@songcheolhwan
Author

songcheolhwan commented Dec 16, 2021

Thank you for the answer.

With many algorithms to reproduce, the Google Landmarks datasets are simply too large, so training takes a long time.
So I am going with SfM-120k as the default (Filip's code is what many people use as the standard baseline), and if I have time, I will consider the GLDv2-clean version.

thank you.

@tonyngjichun
Owner

I completely understand your concern, as I've been in the same situation: GLD simply takes too long to train without industry-grade resources, so it's impossible to replicate all baselines on the same data. We can only hope that subsequent work converges on the same dataset (whether it's GLDv2 or not, only time will tell).

Good luck with your work, looking forward to seeing it soon!

Closing the issue for now.
