Low index performance after clear() #417

Open
3 tasks done
mz1979 opened this issue May 20, 2024 · 2 comments
mz1979 commented May 20, 2024

Describe the bug

Inserting vectors is extremely slow when using non-contiguous keys (Python SDK).

Steps to reproduce

Run the code below. It times index insertion first with contiguous keys and then, after clear(), with non-contiguous keys:

from usearch.index import Index
from random import random
import numpy as np

vectors = np.random.rand(600000, 256)
keys = np.arange(len(vectors))
offset = 1_000_000

keys_non_contiguous = []

for _ in range(0, len(vectors), 50000):
    file_index = int(random() * 10)  # random file index in 0..9
    batch = int(random() * 256)  # random batch id in 0..255
    batch_index = batch << 32  # place the 8-bit batch id above the low 32 bits
    keys_non_contiguous.extend(batch_index + file_index * offset + u for u in range(50000))

keys_non_contiguous = np.array(keys_non_contiguous)

index = Index(
    ndim=256, # Define the number of dimensions in input vectors
    metric='cos', # Choose 'l2sq', 'haversine' or other metric, default = 'ip'
    dtype='f32', # Quantize to 'f16' or 'i8' if needed, default = 'f32'
    connectivity=16, # How frequent should the connections in the graph be, optional
    expansion_add=128, # Control the recall of indexing, optional
    expansion_search=64 # Control the quality of search, optional
)

# This takes about 20 sec on a 32 vCPU machine
index.add(keys, vectors, log=True, copy=False)

index.clear()

# This takes about 1 min 15 s on a 32 vCPU machine
index.add(keys_non_contiguous, vectors, log=True, copy=False)
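The key layout used in the loop above can be sketched as plain bit packing: an 8-bit batch id shifted above the low 32 bits, plus a file index times the offset, plus the position inside the batch. The helper names below (`make_non_contiguous_keys`, `count_duplicates`) and the `seed` parameter are hypothetical and not part of the report or of USearch. The duplicate count is included only because two batches that happen to draw the same (batch id, file index) pair would produce identical keys, which is worth ruling out before comparing timings.

```python
import random


def make_non_contiguous_keys(n_vectors, batch_size=50_000, offset=1_000_000, seed=None):
    """Build keys laid out like the loop in the report (hypothetical helper)."""
    rng = random.Random(seed)
    keys = []
    for start in range(0, n_vectors, batch_size):
        file_index = rng.randrange(10)  # 0..9
        batch = rng.randrange(256)      # 0..255
        # Batch id sits in bits 32-39; the low 32 bits hold file_index * offset + u.
        base = (batch << 32) + file_index * offset
        count = min(batch_size, n_vectors - start)
        keys.extend(base + u for u in range(count))
    return keys


def count_duplicates(keys):
    """Number of keys that repeat an earlier key."""
    return len(keys) - len(set(keys))


keys = make_non_contiguous_keys(600_000, seed=0)
print(len(keys), "keys,", count_duplicates(keys), "duplicates")
```

If the duplicate count is non-zero, redrawing the keys (or using a different seed) before timing the insert keeps the comparison about key layout only.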

Expected behavior

Insertion performance should be the same for contiguous and non-contiguous keys.

USearch version

Built from source, branch main-dev

Operating System

Ubuntu 24.04 LTS

Hardware architecture

x86

Which interface are you using?

Python bindings

Contact Details

No response

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

mz1979 added the "bug" label (Something isn't working) on May 20, 2024
mz1979 changed the title from "Bug: Slow index add performances when keys are not contigious" to "Bug: Slow index add performances when keys are not contiguous" on May 20, 2024

ashvardanian (Contributor) commented

The problem is in clear()! If you reinitialize the index variable with a new constructor, it works just as fast. Neat finding! Will investigate.

ashvardanian self-assigned this on May 21, 2024
ashvardanian changed the title from "Bug: Slow index add performances when keys are not contiguous" to "Low index performance after clear()" on May 21, 2024
ashvardanian added the "invalid" label (This doesn't seem right) and removed the "bug" label (Something isn't working) on May 21, 2024


mz1979 commented May 21, 2024

I get the same performance issue if I do not call clear() but instead redefine my index variable:
[image: screenshot attached to the original comment]
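Since the two comments above disagree on whether clear() or the key layout is responsible, one way to separate the two factors is to time all four combinations. The following is only a sketch: `timed_add` and `compare_add_paths` are hypothetical helpers, not USearch functions, and they assume nothing about the index beyond the `add(keys, vectors)` and `clear()` methods already used in the report.

```python
import time


def timed_add(index, keys, vectors):
    """Return the wall-clock seconds spent in index.add(keys, vectors)."""
    start = time.perf_counter()
    index.add(keys, vectors)
    return time.perf_counter() - start


def compare_add_paths(make_index, contiguous_keys, other_keys, vectors):
    """Time add() for each key set on a fresh index and on a cleared one."""
    timings = {}
    for label, keys in (("contiguous", contiguous_keys),
                        ("non-contiguous", other_keys)):
        # Case 1: a brand-new index that has never held data.
        timings[f"fresh + {label}"] = timed_add(make_index(), keys, vectors)
        # Case 2: an index that was filled once and then cleared.
        reused = make_index()
        reused.add(contiguous_keys, vectors)
        reused.clear()
        timings[f"cleared + {label}"] = timed_add(reused, keys, vectors)
    return timings
```

With the setup from the report, `make_index` could be a function or lambda returning `Index(ndim=256, metric='cos', dtype='f32', connectivity=16, expansion_add=128, expansion_search=64)`, and the arguments would be `keys`, `keys_non_contiguous`, and `vectors`. If only the "cleared" rows are slow, clear() is the likely cause; if both "non-contiguous" rows are slow, the key layout is.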
