Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Byte" object type #23

Closed
chunjiangzhu opened this issue Feb 23, 2019 · 6 comments
Closed

"Byte" object type #23

chunjiangzhu opened this issue Feb 23, 2019 · 6 comments

Comments

@chunjiangzhu
Copy link

Hi, is the object type "Byte" well developed and fully tested in the current code? By the way, I assume the algorithms work for custom distance function, am I right? Thank you!

@chunjiangzhu
Copy link
Author

By the way, I often got the following error in my run: "Error! startEpsilon should be reduced for the specified range.". Can you see any obvious implications from it? Thank you so much!

"got a train set of size (941 * 1024)
got 100 queries
ngt::reconstructGraph: Extract the graph data.
Reconstruction time=0.00072567:0.00114803:0.00380066
original edge size=10
reverse edge size=60
ngt::Graph reconstruction time=0.00719446 (sec)
GraphReconstructor::adjustPaths: graph preparing time=0.000302463 (sec)
GraphReconstructor::adjustPaths extracting removed edge candidates time=0.152378 (sec)
ngt::Path adjustment time=0.182657 (sec)
adjustBaseSearchEdgeSize::Extract queries for GT...
adjustBaseSearchEdgeSize::create GT...
adjustBaseSearchEdgeSize::explore for the mergin 0.2...
adjustBaseSearchEdgeSize::Base edge size=10, distance computation=1.48209
Warning: Cannot adjust the base edge size for mergin 0.2. /home/app/ngt/lib/NGT/Optimizer.h:427: exploreEpsilonForAccuracy:
Error! startEpsilon should be reduced for the specified range.
Try again for the next mergin.
adjustBaseSearchEdgeSize::explore for the mergin 0.25...
adjustBaseSearchEdgeSize::Base edge size=10, distance computation=1.48209
Warning: Cannot adjust the base edge size for mergin 0.25. /home/app/ngt/lib/NGT/Optimizer.h:427: exploreEpsilonForAccuracy:
Error! startEpsilon should be reduced for the specified range.
Try again for the next mergin.
adjustBaseSearchEdgeSize::explore for the mergin 0.3...
adjustBaseSearchEdgeSize::Base edge size=10, distance computation=1.48209
Warning: Cannot adjust the base edge size for mergin 0.3. /home/app/ngt/lib/NGT/Optimizer.h:427: exploreEpsilonForAccuracy:
Error! startEpsilon should be reduced for the specified range.
Try again for the next mergin."

@masajiro
Copy link
Member

Since the “Byte” type is used for our applications, it basically works well. Note that the “Byte” is implemented as an unsigned variable. If you properly implement your distance function, the “Byte” works. According to your logs, you use NGT on ann benchmarks, don’t you? If so, the size of dataset (941 objects) you used is too small, because NGT on ann benchmarks try to optimize search parameters for the specified dataset. However, even though you find errors during the optimization, default search parameters are used for the benchmarks.

@chunjiangzhu
Copy link
Author

chunjiangzhu commented Feb 25, 2019

Thank you so much for the response!
You are right, I am running the ann benchmarking. I noticed that you updated the python driver "onng_ngt.py" under that project.
[https://github.com/erikbern/ann-benchmarks/blob/master/ann_benchmarks/algorithms/onng_ngt.py]
For the fit function, could you please give some descriptions to help me understand the logic? Thank you!

@masajiro
Copy link
Member

For the fit function, could you please give some descriptions to help me understand the logic? Thank you!

The process is the same of the following.
https://github.com/yahoojapan/NGT/blob/master/bin/ngt/README.md#onng
First, create ANNG. Second, convert the ANNG to ONNG.
The conversion consists of

  • Adjust incoming and outgoing edges
  • Reduce shortcut edges
  • Optimize search parameters.

Does that answer your question?

@chunjiangzhu
Copy link
Author

Yes, that makes sense!

What confused me was the log information, e.g. print('ONNG: create ANNG'), print('ONNG: ANNG construction time(sec)=' + str(time.time() - t)). In some runs, the log of the subprocess seems to dominate the stdout, and the log I mentioned above does not appear. Maybe this is because of the subprocess calling. But I believe the code were executed successfully because the self.index was successfully set.

I think we are approaching to close this issue. Thanks!

@masajiro
Copy link
Member

There are messages of the onng_ngt.py in our log below, although they are a little out of order. The difference might be due to the docker or os environment.

Trying to instantiate ann_benchmarks.algorithms.onng_ngt.ONNG(['euclidean', 'Float', 0.1, {'edge': 100, 'outdegree': 10, 'indegree': 120}])
ONNG: edge_size=100
ONNG: outdegree=10
ONNG: indegree=120
ONNG: edge_size_for_search=-2
ONNG: epsilon=0.1
ONNG: metric=euclidean
ONNG: object_type=Float
got a train set of size (1000000 * 128)
got 10000 queries
Processed 100000 objects. time= 199.102 (sec)
Processed 200000 objects. time= 449.442 (sec)
Processed 300000 objects. time= 538.9 (sec)
Processed 400000 objects. time= 613.441 (sec)
Processed 500000 objects. time= 627.649 (sec)
Processed 600000 objects. time= 708.047 (sec)
Processed 700000 objects. time= 696.557 (sec)
Processed 800000 objects. time= 779.332 (sec)
Processed 900000 objects. time= 748.134 (sec)
Processed 1000000 objects. time= 837.672 (sec)
ngt::reconstructGraph: Extract the graph data.
Processed 1000000 objects.
Processed 100000 nodes
Processed 200000 nodes
Processed 300000 nodes
Processed 400000 nodes
Processed 500000 nodes
Processed 600000 nodes
Processed 700000 nodes
Processed 800000 nodes
Processed 900000 nodes
Processed 1000000 nodes
Reconstruction time=0.739276:18.1838:7.54052
original edge size=10
reverse edge size=120
ngt::Graph reconstruction time=27.9895 (sec)
GraphReconstructor::adjustPaths: graph preparing time=0.992566 (sec)
GraphReconstructor::adjustPaths extracting removed edge candidates time=719.449 (sec)
ngt::Path adjustment time=905.001 (sec)
adjustBaseSearchEdgeSize::Extract queries for GT...
adjustBaseSearchEdgeSize::create GT...
adjustBaseSearchEdgeSize::explore for the mergin 0.2...
adjustBaseSearchEdgeSize::Base edge size=10, distance computation=2.25153
adjustBaseSearchEdgeSize::Base edge size=20, distance computation=2.2905
Reconstruct Graph: adjust the base search edge size. 10
ngtpy::Index read only!
ONNG: start indexing...
ONNG: # of data=1000000
ONNG: dimensionality=128
ONNG: index=indexes/ONNG-100-10-120
ONNG: create ANNG
ONNG: ANNG construction time(sec)=6211.593377828598
ONNG: degree adjustment
ONNG: degree adjustment time(sec)=951.6256878376007
ONNG: index already exists! indexes/ONNG-100-10-120
ONNG: open time(sec)=5.4101881980896
ONNG: end of fit
Built index in 7168.630221128464
Index size:  2625880.0
Running query argument group 1 of 8...
ONNG: epsilon=0.6
Run 1/1...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants