
??? Performance when trained sfm-120k #10

Closed
songcheolhwan opened this issue Dec 15, 2021 · 3 comments

Comments

@songcheolhwan

songcheolhwan commented Dec 15, 2021

Hi~
Are there any experimental results using the SfM-120k train set?
There are signs in your code that you intend to support SfM-120k as a training set.
(https://github.com/tonyngjichun/SOLAR/blob/master/solar_global/examples/train.py#L40)

In my personal opinion, many studies in this field use different training sets but the same evaluation sets, and I don't think that is fair. I think methods should be compared with the same training data.
So, for a fair comparison, I'd like to know:
by any chance, do you have any experimental results?
thanks~
@tonyngjichun

@songcheolhwan songcheolhwan changed the title ??? Performance when learning sfm-120k ??? Performance when trained sfm-120k Dec 15, 2021
@tonyngjichun
Owner

Hi,

Yes, I have configured the GLDv1 training code to be compatible with the original training script from GeM (https://github.com/filipradenovic/cnnimageretrieval-pytorch), so if you download the SfM120k dataset and the corresponding pickle files, you should be able to train on it directly.
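Since the script follows the GeM CLI, a run on SfM120k might look like the sketch below. This is a hypothetical invocation: the flag names and the export-directory argument follow cnnimageretrieval-pytorch's train.py and may differ in SOLAR's version, and the dataset/pickle files must already be in the data root the script expects.

```shell
# Hypothetical invocation, mirroring cnnimageretrieval-pytorch's CLI;
# check solar_global/examples/train.py for the actual flag names.
# Assumes the SfM120k images and pickle files are already downloaded.
python solar_global/examples/train.py exports/sfm120k-run \
    --training-dataset 'retrieval-SfM-120k' \
    --test-datasets 'roxford5k,rparis6k'
```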

I totally agree about the inconsistency of training sets in this field, which has been a well-known problem for years. That is why, in the paper, I compared against GeM trained on GLDv1 as the baseline rather than against its original SfM120k results. That said, I see recent image retrieval papers converging on GLDv2-clean as the default training set, which is a good sign for the community, as both SfM120k and GLDv1 suffer from quite a lot of mislabelling and noisy data.

Unfortunately, GLDv2 had not yet been made public at the time of this work's submission, and there is no plan to re-train or improve the SOLAR models. I would highly encourage you to use the new GLDv2-clean set rather than SfM120k, which has frankly become something of a relic of the past as the field keeps progressing at a rapid rate.

@songcheolhwan
Author

songcheolhwan commented Dec 16, 2021

Thank you for the answer.

With many algorithms to reproduce, the Google Landmarks datasets are simply too large, so training takes a long time.
So I am going with SfM-120k as the default (Filip's code is what many people use as the standard baseline), and if I have time, I will consider the GLDv2-clean version.

thank you.

@tonyngjichun
Owner

I completely understand your concern, as I've been in the same situation: GLD simply takes too long to train without industry-grade resources, so it's impossible to replicate all baselines on the same data. We can only hope that subsequent work converges on the same dataset (whether it's GLDv2 or not, only time will tell).

Good luck with your work, looking forward to seeing it soon!

Closing the issue for now.
