Custom distances #21

Open
MilesCranmer opened this issue Oct 28, 2018 · 6 comments

Comments

@MilesCranmer

Hi,
I was wondering whether you are planning to add support for custom distance_type values in the Python API? I have a complicated function in Python that I wish to use as a custom distance metric, and I am interested in the scalability of this package.

For example, I would like to set distance_type=func.
As with sklearn's NearestNeighbors, the function would take the two points as its arguments (they could also be passed as 2D NumPy arrays, one point per row). Would this work for NGT?

import numpy as np

def func(x1, x2):  # L1 (Manhattan) distance, one value per row
    return np.sum(np.abs(x1 - x2), axis=1)
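
To make the request concrete, here is a minimal sketch of the behaviour being asked for, written as an exact brute-force search in plain NumPy. It does not use NGT or sklearn at all; `brute_force_knn` is a hypothetical helper named only for this illustration, and the sample data is made up.

```python
import numpy as np

def func(x1, x2):
    # L1 (Manhattan) distance between matching rows of two 2D arrays.
    return np.sum(np.abs(x1 - x2), axis=1)

def brute_force_knn(data, query, k, distance=func):
    # Hypothetical helper (not an NGT or sklearn API): compare the query
    # against every row of `data` with the user-supplied distance.
    query_rows = np.broadcast_to(query, data.shape)
    dists = distance(data, query_rows)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

# Made-up sample data: four 2-dimensional points.
data = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.5], [0.2, 0.1]])
ids, dists = brute_force_knn(data, np.array([0.0, 0.0]), k=2)
```

This scans every point for each query, so it only shows the intended semantics of distance_type=func, not a scalable search.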
@masajiro
Member

Currently I have no plan to implement a custom distance. However, since I know that custom distances are important for some applications, I would like to add them to the C++ API in the future. I do not think that a custom distance defined in slow Python would be effective, though, because the distance function largely determines the search time.

@MilesCranmer
Author

Thanks. Just to check: if I want to add a custom distance myself in the C++ code without adding all the architecture for it, could I potentially just replace the following lines:

inline static double compareL2(const OBJECT_TYPE *a, const OBJECT_TYPE *b, size_t size) {
    const OBJECT_TYPE *last = a + size;
    const OBJECT_TYPE *lastgroup = last - 3;
    COMPARE_TYPE diff0, diff1, diff2, diff3;
    double d = 0.0;
    while (a < lastgroup) {
        diff0 = (COMPARE_TYPE)(a[0] - b[0]);
        diff1 = (COMPARE_TYPE)(a[1] - b[1]);
        diff2 = (COMPARE_TYPE)(a[2] - b[2]);
        diff3 = (COMPARE_TYPE)(a[3] - b[3]);
        d += diff0 * diff0 + diff1 * diff1 + diff2 * diff2 + diff3 * diff3;
        a += 4;
        b += 4;
    }
    while (a < last) {
        diff0 = (COMPARE_TYPE)(*a++ - *b++);
        d += diff0 * diff0;
    }

and refer to my custom distance as "l2"? (My distance is similar to L2.) Or is there a lot more I would need to replace?

Cheers,
Miles

@masajiro
Member

You are right. You do not need to update any other part of the source code. But you have to be careful when implementing your function. Objects (vector data) are automatically stored in the index as 16-byte boundary objects, padded with zeros.

inline static double compareL2(const OBJECT_TYPE *a, const OBJECT_TYPE *b, size_t size) {

The size above is the byte length of the 16-byte boundary object. You cannot get the original size of the object inside the function. For example, when you specify a 3-dimensional floating-point object to create an empty index, the size parameter of the function is 16 and the added fourth floating-point value of the object is always zero.
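
The practical consequence can be sketched in NumPy rather than in the NGT source. The sketch assumes 4-byte floats; `pad_to_16_bytes`, `l1` and `mean_abs` are hypothetical helpers that only mimic the padding described above and are not NGT functions. A distance that sums per-element differences is unchanged by the zero padding, while one that depends on the element count is not.

```python
import numpy as np

def pad_to_16_bytes(v):
    # Hypothetical helper: zero-pad a float32 vector so that its byte
    # length is a multiple of 16, mimicking the stored object layout.
    v = np.asarray(v, dtype=np.float32)
    padded_bytes = -(-v.nbytes // 16) * 16  # round up to a multiple of 16
    out = np.zeros(padded_bytes // v.itemsize, dtype=np.float32)
    out[:v.size] = v
    return out

def l1(a, b):
    # Sum of absolute differences: padded zeros contribute nothing.
    return float(np.sum(np.abs(a - b)))

def mean_abs(a, b):
    # Depends on the element count, so the padding changes the result.
    return float(np.mean(np.abs(a - b)))

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([2.0, 0.0, 3.5], dtype=np.float32)
pa, pb = pad_to_16_bytes(a), pad_to_16_bytes(b)

same_l1 = l1(a, b) == l1(pa, pb)                  # padding is harmless here
same_mean = mean_abs(a, b) == mean_abs(pa, pb)    # padding changes this one
```

So a custom function that replaces compareL2 should give the same value whether or not trailing zero elements are present in both objects.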

@masajiro
Member

If you use the default CMake definitions, you have to replace

inline static double compareL2(const float *a, const float *b, size_t size) {

or

inline static double compareL2(const unsigned char *a, const unsigned char *b, size_t size) {

instead, because AVX must be available with those definitions.

@raulcarlomagno

Is there any plan to add support for Bray-Curtis and Canberra distances? Thanks.

@masajiro
Member

masajiro commented Mar 2, 2021

We have no plan to implement Bray-Curtis and Canberra distances. However, it might not be too difficult to implement these distances yourself by referring to PR #91.
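
As a reference for anyone attempting this, here is a NumPy sketch of the two formulas using their standard definitions. It is not taken from PR #91 or from NGT. Treating a 0/0 term as 0 in Canberra follows the usual convention; returning 0.0 from Bray-Curtis when the denominator is zero is a choice made only for this sketch. With those choices, zero-padded objects give the same value as the originals.

```python
import numpy as np

def canberra(a, b):
    # Canberra distance: sum of |a_i - b_i| / (|a_i| + |b_i|),
    # where a term whose denominator is zero counts as 0.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    terms = np.zeros_like(num)
    np.divide(num, den, out=terms, where=den != 0)
    return float(terms.sum())

def braycurtis(a, b):
    # Bray-Curtis distance: sum |a_i - b_i| / sum |a_i + b_i|.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    den = np.abs(a + b).sum()
    if den == 0:
        return 0.0  # assumption for this sketch: two all-zero vectors
    return float(np.abs(a - b).sum() / den)

a = [1.0, 2.0, 3.0]
b = [2.0, 0.0, 3.5]
a_pad = a + [0.0]  # same vectors with one zero-padded element
b_pad = b + [0.0]
```

A C++ version would compute the same sums inside a function with the compareL2 signature.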
