Recall stays 0.0001 #2

Closed
GertjanBrouwer opened this issue Aug 6, 2019 · 2 comments


GertjanBrouwer commented Aug 6, 2019

I am trying to generate a graph using efanna_graph. I have a dataset of 4000 images. I computed all SIFT descriptors, wrote them to a .fvecs file, and used that file to generate the graph. Unfortunately, the efanna_graph recall never went above 0.0001. I believe the issue is in the way I write the descriptors to .fvecs.

I have tried writing the .fvecs file in several ways. The code I am using now is this: write_to_fvecs. After computing the descriptors for each image and concatenating them into a single array, I write them to .fvecs using:
vectorArray.astype(np.int32).tofile('./my_sift_descriptors.fvecs')
As you can see, I use np.int32, which seems wrong to me. The reason for using np.int32 is as follows.

First I tried writing to file like this:
vectorArray.astype().tofile('./vanbeeklederwaren_astype_int32.fvecs')
But when I start efanna_graph test_nndescent I get this message: "data dimension: 1124073472
Floating point exception (core dumped)".

Then I tried writing to the file like this (which seems to me to be the correct way to do it):
vectorArray.astype(np.float32).tofile('./vanbeeklederwaren_astype_int32.fvecs')
But again when running efanna_graph I get this message: "data dimension: 1124073472
Floating point exception (core dumped)".

Then I used this snippet, read_fvecs, which reads fvecs files in Python, to read the first 4 bytes of 4 different files:

The fvecs file provided by TexMex showed the first 4 bytes to be of type float32, with the value 1.8e-43.

The file saved without specifying a type was also of type float32 but displayed 128.0 when printed.

The file saved as float32 also was of type float32 and also displayed 128.0 when printed.

The file saved as int32 was also of type float32 but displayed 1.8e-43 when printed.
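The values reported above are consistent with the same four bytes being read under two different types. A minimal sketch using only Python's standard `struct` module illustrates this; little-endian byte order is an assumption here, and the variable names are illustrative only:

```python
import struct

# 128.0 stored as float32, then reinterpreted as an unsigned 32-bit integer.
# This matches the "data dimension" value printed before the crash.
float_bytes = struct.pack('<f', 128.0)
as_uint = struct.unpack('<I', float_bytes)[0]
print(as_uint)  # 1124073472

# 128 stored as int32, then reinterpreted as float32.
# The result is a tiny denormal number, roughly 1.8e-43.
int_bytes = struct.pack('<i', 128)
as_float = struct.unpack('<f', int_bytes)[0]
print(as_float)
```

So a first value that prints as 1.8e-43 when read as float32 is what an integer header of 128 looks like, while a first value that prints as 128.0 would be read by an integer-based loader as a dimension of 1124073472.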

I assumed the last file should be correct, so I continued: I calculated all my descriptors, saved them to .fvecs, and started efanna_graph. However, the training did not go as expected and the recall never went above 0.0001. The parameters I used: 200 200 20 10 100.

I can't seem to find a solution. Can you please provide your snippet on how you compute SIFT descriptors and save these to .fvecs file?

Thank you.

@fc731097343
Member

The way we read the file is to read the first number of each vector (the dimension, 128 in your case) as an "unsigned int" in C++. Then we read the vector itself as a sequence of floats in C++ (float32 in Python). Please see any test*.cpp in the test dir for an example.

I think you should write these two parts separately.
Specifically, you should write an int32 then write 128 float32 for each row of the matrix. Hope it can solve your problem.
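The per-row layout described above can be sketched with Python's standard `struct` module. This is an illustration only: the helper name `pack_fvecs_record` is hypothetical, little-endian byte order is assumed, and the header is packed as a signed int32, which has the same bytes as an unsigned int for a small positive value such as 128.

```python
import struct

def pack_fvecs_record(vector):
    """Return the bytes of one record: an int32 dimension, then float32 components."""
    dim = len(vector)
    # '<' = little-endian without padding, 'i' = int32 header,
    # f'{dim}f' = dim float32 values.
    return struct.pack(f'<i{dim}f', dim, *vector)

record = pack_fvecs_record([0.0] * 128)
print(len(record))                         # 516 = 4 + 128 * 4
print(struct.unpack('<i', record[:4])[0])  # 128
```

Each 128-dimensional vector therefore occupies 516 bytes, and a file of N such vectors should be exactly 516 * N bytes long.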

@GertjanBrouwer
Author

GertjanBrouwer commented Aug 7, 2019

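Thank you very much. I misread the documentation and saved every value as int32. For anyone else with this problem, I use this to save the descriptors of a single image:
from array import array

# output_file is a file object opened in binary write mode
for descriptor in descriptors:
    # header: the dimension as a 32-bit integer
    dimension_array = array('i', [128])
    dimension_array.tofile(output_file)
    # payload: the 128 descriptor values as float32
    float_array = array('f', descriptor)
    float_array.tofile(output_file)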
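Before running efanna_graph, a file written this way can be sanity-checked by walking its record headers. The following is a rough sketch, not part of efanna_graph: the function name `check_fvecs` is hypothetical, and it assumes little-endian data and a single dimension shared by all vectors.

```python
import os
import struct

def check_fvecs(path):
    """Return (num_vectors, dim) if every record starts with the same int32 dimension."""
    size = os.path.getsize(path)
    count, dim, offset = 0, None, 0
    with open(path, 'rb') as f:
        while offset < size:
            f.seek(offset)
            header = f.read(4)
            if len(header) < 4:
                raise ValueError(f'truncated header at byte {offset}')
            (d,) = struct.unpack('<i', header)
            if d <= 0 or (dim is not None and d != dim):
                raise ValueError(f'unexpected dimension {d} in record {count}')
            dim = d
            # Skip over the float32 payload without loading it into memory.
            offset += 4 + 4 * d
            if offset > size:
                raise ValueError(f'truncated record {count}')
            count += 1
    return count, dim
```

For a file produced by the loop above, `check_fvecs` should return the number of descriptors and 128. A file written entirely as float32, with no integer headers, would be expected to fail with a `ValueError` instead.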
