Bug: Slow index.add #176

vprelovac · 2023-08-01T05:51:57Z

Describe the bug

Index creation, vector addition and search loop is 90x slower than identical loop for faiss.

Reproduced in Google colab.

Steps to reproduce

import faiss
from usearch.index import Index

import numpy as np

# Set the seed for reproducibility
np.random.seed(42)

# Generate 500 random vectors of length 1024 with values between 0 and 1
vectors = np.random.rand(500, 1024)
vector=np.random.rand(1024)

%%timeit
# FAISS

indexf = faiss.IndexFlatL2(vectors.shape[-1])
indexf.add(np.array(vectors).astype(np.float32))
D, I = indexf.search(np.array([vector]), 50)
# 1.14 ms ± 4.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%%timeit

index = Index(
    ndim=vectors.shape[-1], # Define the number of dimensions in input vectors
    metric='l2_sq', # Choose 'cos', 'l2sq', 'haversine' or other metric, default = 'ip'
  
   )
index.add(labels= np.arange(len(vectors)), vectors=vectors)
matches, distances, count = index.search(vector, 50, exact=True)
  
# 94.7 ms ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Expected behavior

n/a

USearch version

latest pip

Operating System

ubuntu

Hardware architecture

x86

Which interface are you using?

Python bindings

Contact Details

No response

Is there an existing issue for this?

I have searched the existing issues

Code of Conduct

I agree to follow this project's Code of Conduct

The text was updated successfully, but these errors were encountered:

ashvardanian · 2023-08-01T06:01:45Z

@vprelovac, oh, I see! So IndexFlatL2 is not actually an index. It just appends vectors and brute-forces on search. On tiny collections with 1000 entries, it makes sense. The kNN benchmarks are generally meaningful from 1M entries onwards. The correct comparison there would be against IndexHNSW from FAISS.

That being said, we can probably automatically switch to an exact search on tiny collections :) I will consider it in the next release.

vprelovac · 2023-08-01T13:25:08Z

the kNN benchmarks are generally meaningful from 1M entries onwards.

Except when they are not :) Our production use case is 500-50000 items, and the index has to be recreated every time. Every ms matters.

Is there a configuration where usearch can be faster than faiss on this use case?

ashvardanian · 2023-08-01T13:33:29Z

@vprelovac, probably. You can pass the additional exact=True for the' search' queries, which will switch to brute force.
As for construction, I've just added a task to the 1.0 TODO list.
Can you please provide more details about your setup?

What's the dimensionality of the vectors?
Which metrics do you use?
Which interface/platform do you prefer? Python, C++, WASM, ...

We may be able to adjust the implementation accordingly 🤗

vprelovac · 2023-08-01T13:37:55Z

Thanks for looking into it.

512-1024
l2sq
Python

This is the typical retrieval scenario in web search when LLMs are used.

I am passing exact=True for search as you see in my test code.

Closes unum-cloud#176

ashvardanian · 2023-08-05T13:39:20Z

Hey, @vprelovac! I have implemented an exact search shortcut, which avoids index construction altogether. The interface is as simple as it gets:

from usearch.index import search, MetricKind

import numpy as np

# Set the seed for reproducibility
np.random.seed(42)

# Generate 500 random vectors of length 1024 with values between 0 and 1
vectors = np.random.rand(500, 1024).astype(np.float32)
vector = np.random.rand(1024).astype(np.float32)

search(vectors, vector, 50, MetricKind.L2sq, exact=True)

I have repeated your benchmark on my MacBook Pro with the M2 CPU, and the speed is higher than for FAISS, even without having a BLAS dependency and no SimSIMD optimizations for the L2 metric. This is going to be merged into 1.0. More optimizations coming soon.

# [0.23.0](v0.22.3...v0.23.0) (2023-08-05) ### Add * `Matches` and `BatchMatches` simple API ([1b40f13](1b40f13)) * Add node offsets in a serialized file ([c600ffd](c600ffd)) * Batch add ([74860d6](74860d6)) * Batch add test ([5f99b05](5f99b05)) * Changing the metric at runtime ([d7bfac7](d7bfac7)) * Compactions ([434c1da](434c1da)) * efficiency estimate in `recall_members` ([64a60b4](64a60b4)) * Exact search shortcut ([a005084](a005084)), closes [#176](#176) * Multi-`Index` lookups ([c5b7ccd](c5b7ccd)) * Parallel View ([ed3f845](ed3f845)) * Prefetching functionality for external memory ([b544ddb](b544ddb)), closes [#170](#170) [#171](#171) * Streaming and in-memory serialization in C++ ([7da44a2](7da44a2)) * Vector alignment ([ea230e0](ea230e0)) ### Break * Final patches for 1.0 release ([8d557e2](8d557e2)) ### Docs * add descriptions of match-related classes ([637e5ef](637e5ef)) * Annotating C 99 and GoLang interfaces ([4b910a8](4b910a8)) * Documenting Python tests ([1f89e0a](1f89e0a)) * Shorten name ([9a6a01c](9a6a01c)) * Spelling and details ([6f25ed9](6f25ed9)) * Spelling and links ([20566e0](20566e0)) * TypeScript docs factual errors ([fe8103c](fe8103c)) * Update benchmarking sections ([96baa09](96baa09)) ### Fix * `reset` and serialization code ([11d7844](11d7844)) * Avoid exception in `.xbin` file is missing ([4863bea](4863bea)) * Avoid spawning needless threads ([9dff0fb](9dff0fb)) * Concurrent file access issues in tests ([5ae6db1](5ae6db1)) * Dead-lock on post-removal insertions ([284b058](284b058)), closes [#175](#175) * Excpetion handling for `index_dense_metadata` ([d9627ba](d9627ba)) * Heap overflow for fractional-size scalars ([459abcd](459abcd)) * Imports in Python benchmarks ([cffe507](cffe507)) * Inferring OS-dependant file path in Python ([7743709](7743709)), closes [#174](#174) * JavaScript bindings ([ee04856](ee04856)) * JS keys should be `bigint` ([e1fbec4](e1fbec4)), closes [#178](#178) * Memory leak and multi-index lookup overflow ([597b0d5](597b0d5)) * Narrowing conversions for WASM 32-bit builds ([79add97](79add97)) * Portable way of matching 32-bit builds ([604e634](604e634)) * Progress reporting issue ([b2565e5](b2565e5)) * Reclaiming file descriptor ([05e908f](05e908f)) * Report error if `reserve` hasn't been called ([f94f358](f94f358)) * Typo in metric name ([34f5530](34f5530)) * Undefined behaviour on duplicate labels ([c04a5cc](c04a5cc)) ### Improve * `usearch_remove` C99 interface ([2072540](2072540)) * Align allocations to page size by default ([134a6f0](134a6f0)) * Broader types support in `usearch.io` ([b1a1439](b1a1439)) * Exposing search stats to users ([2779ffc](2779ffc)) * Feature-complete GoLang bindings ([e2058d1](e2058d1)) * More flexibility for Python args ([6aa06cb](6aa06cb)) * Out-of-bounds checks ([54cecb6](54cecb6)) * Task scheduling with STL threads ([9131287](9131287)) ### Make * Add CMake for C builds ([4d2127b](4d2127b)) * All targets enabled for debugging ([ea0f835](ea0f835)) * Build only WASM tests ([372738b](372738b)) * Typescript ([dacfbed](dacfbed)) * Upgrade to the newest SimSIMD ([368d853](368d853)) ### Refactor * `label_t` to `key_t` ([0d6c800](0d6c800)) * Add ([5d62180](5d62180)) * Index serialization in a file ([ba72585](ba72585)) * JS and GoLang tests ([a45fc40](a45fc40)) * Keep only batch requests in CPython ([44c0318](44c0318)) * Rename `f8` to `i8` to match IEEE ([c37f80b](c37f80b)) * Revert `Matches` ([5731e70](5731e70)) * Splitting proximity-graphs and vectors ([e996b38](e996b38)) * Use Executor instead of std::thread ([c3a3693](c3a3693)) * Vector alignment issue ([b02d0ad](b02d0ad)) ### Test * Set vector alignment ([0acb54a](0acb54a)) * Wrong buffer size caused illegal access ([830e280](830e280))

ashvardanian · 2023-08-05T14:10:02Z

🎉 This issue has been resolved in version 0.23.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

vprelovac · 2023-08-06T00:17:27Z

I can indeed confirm vastly better performance on my setup (by a factor 20). I can only say fantastic work! 🚀

Question: How does one get the distances?

ashvardanian · 2023-08-06T01:44:44Z

Those should be in matches.distances. This method also supports batch search, so you can search for many entries at once. It also supports an additional threads argument, to specify the amount of CPU resources to allocate.

I am glad it worked so well, @vprelovac! We are also preparing major releases in UCall, UForm, and UStore, so stay tuned and don’t hesitate to share feedback. Myself and the team will be happy to find new ways we can improve our projects 🤗

vprelovac · 2023-08-06T04:12:23Z

Thanks this solves the use case for small size, real time, vector based search.

Given another use case of say 1-5M items, whre index can be built before search - how would you approach that?

ashvardanian · 2023-08-06T04:45:32Z

@vprelovac for that I would indeed use the Index class. I’d recommend experimenting with the dtype constructor argument. Try “i8”. If recall doesn’t drop, use that. Otherwise, use “f16”. It will be both space-efficient and fast. I am also experimenting with “i4”, which should work in case you use heavily quantized models. Will have more updates for this in the next couple of days. You can also control the balance between speed and accuracy using the connectivity constructor argument. Higher value, will probably lead to slower construction time, may result in higher accuracy, and will definitely use more memory.

vprelovac · 2023-10-19T01:54:08Z

@ashvardanian Sorry for reopening an old thread.

Finally had some more time to do testing with new version. Here is replication code.
Findings at the end.

!pip install faiss-cpu   git+https://github.com/vioshyvo/mrpt/  usearch


from usearch.index import search, MetricKind
import numpy as np
import mrpt
import faiss
import timeit
import matplotlib.pyplot as plt

def run_mrpt(vector, vectors, k=15):
  index = mrpt.MRPTIndex(vectors)
  res=index.exact_search(vector, k, return_distances=False)
  return res

def run_faiss(vector, vectors,k=15):
  index = faiss.IndexFlatL2(vectors.shape[-1])
  index.add(vectors)
  D, I = index.search(np.array([vector]), k)
  return I

def run_usearch(vector, vectors, k=15):
  matches= search (vectors, vector, k, MetricKind.L2sq, exact=True)
  return matches


sizes = [16, 24, 36, 54, 81, 122, 182, 273, 410, 615, 923, 1384, 2076]
#sizes = [3570, 4500, 5500, 6700, 7700, 8800, 9900, 11000, 15000,37481, 67466, 121440, 218591, 500000]

# Initialize lists to store the execution times
mrpt_times = []
faiss_times = []
usearch_times = []

number=20

# Benchmark the functions for each dataset size
for size in sizes:
    print(size)
    # Generate random dataset and query vector
    vectors = np.random.rand(size, 768).astype(np.float32)
    vector = np.random.rand(768).astype(np.float32)

    # Time the execution of each function
    mrpt_time = timeit.timeit(lambda: run_mrpt(vector, vectors, k=15), number=number)
    faiss_time = timeit.timeit(lambda: run_faiss(vector, vectors, k=15), number=number)
    usearch_time = timeit.timeit(lambda: run_usearch(vector, vectors, k=15), number=number)

    # Append the execution times to the corresponding lists
    mrpt_times.append(mrpt_time)
    faiss_times.append(faiss_time)
    usearch_times.append(usearch_time)

# Plot the results
plt.figure(figsize=(20, 12))
plt.plot(sizes, mrpt_times, label='MRPT', marker='o')
plt.plot(sizes, faiss_times, label='FAISS', marker='o')
plt.plot(sizes, usearch_times, label='usearch', marker='o')

plt.xlabel('Number of Vectors')
plt.ylabel('Execution Time (s)')
plt.title('Nearest Neighbor Search Performance')
plt.legend()
plt.grid()
plt.show()

What is interesting is that for small number of items (<2000), Faiss is still the king of the hill and both mrpt and usearch show great varience.

Going above that, Faiss starts to lag sinigifcantly, and MRPT consistently comes on top.

So perfect verctor search at the moment is a combination of Faiss for <2000 items and MRPT for more than that. Can you make one to rule them all? ;)

ashvardanian · 2023-10-19T02:01:25Z

For sure, @vprelovac! Thanks for sharing updates! Which hardware are you running on? Can you please check my SimSIMD library? I have upgraded it recently, but haven't merged into USearch yet.

vprelovac · 2023-10-19T02:45:01Z

This is just CPU on colab. Not sure what SimSIMD is, but if it affects the speed of usearch on CPU, it should be installed and used by default? Or an example using my code would be great.

ashvardanian · 2023-10-19T07:28:27Z

@vprelovac, I will double check in the next couple of days, in the meantime, here is what I mean by SimSIMD - it now has it's own light weight Python bindings. Here is a set of benchmarks for SimSIMD vs NumPy vs SciPy on a few different machines.

ashvardanian · 2023-10-19T07:30:28Z

Also, if you are running on recent hardware, you may prefer to use half-precision floating point numbers.

ashvardanian · 2023-10-24T23:36:22Z

@vprelovac, can you please check the newest release and please try half-precision as well 🤗

vprelovac · 2023-10-25T01:36:49Z

@ashvardanian Sure - Can you let me know if you tried it with the above code and do I need to make any changes to it?

ashvardanian · 2023-10-26T17:54:14Z

Here are the results I get running your script, @vprelovac:

If I understand correctly, MRPTIndex is just a wrapper around the C++ Eigen library. It's a pretty good library for numeric computations. It might be more accurate to compare it to the faiss.IndexFlatIP and the usearch.MetricKind.IP.

All three systems support batch queries. Both FAISS and USearch support half-precision. With half-precision, FAISS becomes much slower, but we retain good performance. Here are the charts I'm getting for batch size 10, enabling half-precision in USearch, leading to 2x lower memory consumption.

Here is the refreshed script:

!pip install faiss-cpu   git+https://github.com/vioshyvo/mrpt/  search

from usearch.index import search, MetricKind, ScalarKind
import numpy as np
import mrpt
import faiss
import timeit
import matplotlib.pyplot as plt

def run_mrpt(vector, vectors, k=15):
  index = mrpt.MRPTIndex(vectors)
  res=index.exact_search(vector, k)
  return res

def run_faiss(vector, vectors,k=15):
  index = faiss.IndexFlatIP(vectors.shape[-1])
  index.add(vectors)
  D, I = index.search(vector, k)
  return I

def run_usearch(vector, vectors, k=15):
  matches= search (vectors, vector, k, MetricKind.IP, exact=True)
  return matches


from usearch.compiled import hardware_acceleration

print("USearch hardware acceleration:", hardware_acceleration(dtype=ScalarKind.F16, ndim=768, metric_kind=MetricKind.IP), hardware_acceleration(dtype=ScalarKind.F32, ndim=768, metric_kind=MetricKind.IP))

count_dataset = [16, 24, 36, 54, 81, 122, 182, 273, 410, 615, 923, 1384, 2076]

# Initialize lists to store the execution times
mrpt_times = []
faiss_times = []
usearch_times = []

k = 10
count_repetitions = 100
count_queries = 10

# Benchmark the functions for each dataset size
for size in count_dataset:
    print(size)
    # Generate random dataset and query vector
    dataset = np.random.rand(size, 768).astype(np.float32)
    queries = np.random.rand(count_queries, 768).astype(np.float32)

    dataset_f16 = np.random.rand(size, 768).astype(np.float16)
    queries_f16 = np.random.rand(count_queries, 768).astype(np.float16)

    # Time the execution of each function
    mrpt_time = timeit.timeit(lambda: run_mrpt(queries, dataset, k=k), number=count_repetitions)
    faiss_time = timeit.timeit(lambda: run_faiss(queries, dataset, k=k), number=count_repetitions)
    usearch_time = timeit.timeit(lambda: run_usearch(queries_f16, dataset_f16, k=k), number=count_repetitions)

    # Append the execution times to the corresponding lists
    mrpt_times.append(mrpt_time)
    faiss_times.append(faiss_time)
    usearch_times.append(usearch_time)

# Plot the results
plt.figure(figsize=(20, 12))
plt.plot(count_dataset, mrpt_times, label='MRPT', marker='o')
plt.plot(count_dataset, faiss_times, label='FAISS', marker='o')
plt.plot(count_dataset, usearch_times, label='usearch', marker='o')

plt.xlabel('Number of Vectors')
plt.ylabel('Execution Time (s)')
plt.title('Nearest Neighbor Search Performance')
plt.legend()
plt.grid()
plt.show()

vprelovac added bug Something isn't working invalid This doesn't seem right labels Aug 1, 2023

ashvardanian mentioned this issue Aug 1, 2023

1.0: Separating vector store from the Index #168

Merged

7 tasks

ashvardanian added a commit to ashvardanian/usearch that referenced this issue Aug 5, 2023

Add: Exact search shortcut

a005084

Closes unum-cloud#176

ashvardanian closed this as completed Aug 5, 2023

ashvardanian mentioned this issue Aug 5, 2023

Add: Exact search shortcut #182

Merged

ashvardanian added the released label Aug 5, 2023

ashvardanian self-assigned this Aug 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Slow index.add #176

Bug: Slow index.add #176

vprelovac commented Aug 1, 2023

ashvardanian commented Aug 1, 2023 •

edited

Loading

vprelovac commented Aug 1, 2023 •

edited

Loading

ashvardanian commented Aug 1, 2023

vprelovac commented Aug 1, 2023 •

edited

Loading

ashvardanian commented Aug 5, 2023

ashvardanian commented Aug 5, 2023

vprelovac commented Aug 6, 2023 •

edited

Loading

ashvardanian commented Aug 6, 2023

vprelovac commented Aug 6, 2023

ashvardanian commented Aug 6, 2023

vprelovac commented Oct 19, 2023 •

edited

Loading

ashvardanian commented Oct 19, 2023

vprelovac commented Oct 19, 2023 •

edited

Loading

ashvardanian commented Oct 19, 2023

ashvardanian commented Oct 19, 2023

ashvardanian commented Oct 24, 2023

vprelovac commented Oct 25, 2023 •

edited

Loading

ashvardanian commented Oct 26, 2023 •

edited

Loading

Bug: Slow index.add #176

Bug: Slow index.add #176

Comments

vprelovac commented Aug 1, 2023

Describe the bug

Steps to reproduce

Expected behavior

USearch version

Operating System

Hardware architecture

Which interface are you using?

Contact Details

Is there an existing issue for this?

Code of Conduct

ashvardanian commented Aug 1, 2023 • edited Loading

vprelovac commented Aug 1, 2023 • edited Loading

ashvardanian commented Aug 1, 2023

vprelovac commented Aug 1, 2023 • edited Loading

ashvardanian commented Aug 5, 2023

ashvardanian commented Aug 5, 2023

vprelovac commented Aug 6, 2023 • edited Loading

ashvardanian commented Aug 6, 2023

vprelovac commented Aug 6, 2023

ashvardanian commented Aug 6, 2023

vprelovac commented Oct 19, 2023 • edited Loading

ashvardanian commented Oct 19, 2023

vprelovac commented Oct 19, 2023 • edited Loading

ashvardanian commented Oct 19, 2023

ashvardanian commented Oct 19, 2023

ashvardanian commented Oct 24, 2023

vprelovac commented Oct 25, 2023 • edited Loading

ashvardanian commented Oct 26, 2023 • edited Loading

ashvardanian commented Aug 1, 2023 •

edited

Loading

vprelovac commented Aug 1, 2023 •

edited

Loading

vprelovac commented Aug 1, 2023 •

edited

Loading

vprelovac commented Aug 6, 2023 •

edited

Loading

vprelovac commented Oct 19, 2023 •

edited

Loading

vprelovac commented Oct 19, 2023 •

edited

Loading

vprelovac commented Oct 25, 2023 •

edited

Loading

ashvardanian commented Oct 26, 2023 •

edited

Loading