Question/Feature: Does "view" allow adding new items directly to disk? #97

Closed
2 of 3 tasks
leoplusx opened this issue Jun 9, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

leoplusx commented Jun 9, 2023

Describe what you are looking for

AFAIK, "view" allows us to memory-map a saved index from disk. This way, we can use an index that doesn't fit into RAM.

I was just wondering if that will also work for adding items to an index.

If so, what is the process?

  1. Instantiate the index
  2. index.save()
  3. index.view() from file
  4. index.add()

If that does work, is it necessary to call index.save() again at any point, or will each index.add() operation directly write to disk?
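Steps 1 to 3 can be modelled with a toy in plain Python. This is a minimal sketch under explicit assumptions. The flat file of float32 records and the helpers `save`, `view`, and `search` are hypothetical stand-ins. They are not the `index.save()`/`index.view()` methods discussed here and do not reproduce USearch's HNSW file format. The sketch illustrates one reason "add on a view" is not straightforward: a read-only memory mapping has a fixed length, so adding a vector would mean growing the file, and the mapping refuses to grow.

```python
import mmap
import os
import struct
import tempfile

# Toy flat file format (hypothetical, NOT USearch's on-disk HNSW format):
# N records of DIM little-endian float32 values; a vector's key is its position.
DIM = 4
RECORD = struct.Struct(f"<{DIM}f")


def save(path, vectors):
    """Write all vectors to disk (the 'create in RAM, then save' step)."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


def view(path):
    """Map the saved file read-only; the OS pages records in on demand."""
    with open(path, "rb") as f:
        # mmap keeps its own handle, so the file object can be closed here.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def search(mm, query):
    """Exact nearest neighbor by squared L2 distance, reading from the mapping."""
    best_key, best_dist = -1, float("inf")
    for key in range(len(mm) // RECORD.size):
        v = RECORD.unpack_from(mm, key * RECORD.size)
        d = sum((a - b) ** 2 for a, b in zip(v, query))
        if d < best_dist:
            best_key, best_dist = key, d
    return best_key, best_dist


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.bin")
    save(path, [(0, 0, 0, 0), (1, 1, 1, 1), (5, 5, 5, 5)])
    mm = view(path)
    print(search(mm, (0.9, 1.0, 1.1, 1.0)))  # key 1 is the closest vector
    try:
        # An "add" would need the mapping (and the file) to grow.
        mm.resize(len(mm) + RECORD.size)
    except TypeError as err:
        print("read-only view cannot grow:", err)
```

In this toy, any write has to go through a new or rewritten file that is then re-viewed. That matches the maintainer's answer below that additions will come through a merge step rather than through `add` on a view.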

If memory mapping does not work for adding items, then we will always need a machine with enough RAM to hold the entire index at least for the creation of that index, or for any adding operation. Is that correct?

Thanks.

Can you contribute to the implementation?

  • I can contribute

Is your feature request specific to a certain interface?

Python bindings

Contact Details

No response

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@leoplusx leoplusx added the enhancement New feature or request label Jun 9, 2023
@ashvardanian
Contributor

For now, it's not supported, but it's two minor releases away. It won't be done through add; instead, it will use the upcoming merge feature #84.

@leoplusx
Author

Let me see if I understand how it would work:

Let's say I have a machine with 128 GB of RAM and 300 GB of index data, so more data than fits into RAM.
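The sizing in this scenario is simple arithmetic. Each sub-index has to fit in RAM while it is being built, so the shard count is the data size divided by the usable RAM, rounded up. This is a minimal sketch: `plan_shards` and its `headroom` parameter are hypothetical names, and the 20% RAM reserve is an assumption, not a USearch figure.

```python
import math


def plan_shards(total_gb, ram_gb, headroom=0.2):
    """Return (shard_count, shard_gb) so that each shard fits in RAM while built.

    headroom is the fraction of RAM reserved for the OS and index construction
    overhead; 0.2 is an illustrative assumption, not a measured USearch figure.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    budget_gb = ram_gb * (1 - headroom)
    count = math.ceil(total_gb / budget_gb)
    return count, total_gb / count


print(plan_shards(300, 128))  # (3, 100.0): three 100 GB sub-indices
```

With these assumptions the result matches the three 100 GB sub-indices below. A larger reserve, such as `headroom=0.5`, would give five 60 GB shards instead.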

It sounds as though I could then assemble such an index like this:

  1. Create sub-indices:

    • 100 GB -> index1 (create in RAM, then write to disk)
    • 100 GB -> index2 (create in RAM, then write to disk)
    • 100 GB -> index3 (create in RAM, then write to disk)
  2. Use merge to merge those indices on disk into one large index on disk, without loading any of them into RAM.

  3. Use view to search that large index, without loading it into RAM.

Is that how it would work?
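Step 2 depends on the merge feature (#84), and this thread does not describe how it is implemented. For intuition only, here is an out-of-core merge for a hypothetical flat format that stores fixed-size vector records back to back. This is not USearch's format. The shards are streamed into one file in fixed-size chunks, so RAM use is bounded by the chunk size rather than the shard size. `merge_on_disk` is a hypothetical helper. Merging real HNSW graphs is harder because neighbor links across shards must be rebuilt, and this sketch does not attempt that.

```python
import os
import shutil
import tempfile


def merge_on_disk(shard_paths, out_path, chunk_bytes=1 << 20):
    """Concatenate flat shard files into out_path, streaming chunk_bytes at a time.

    Peak memory is roughly one chunk, however large the shards are. Returns the
    byte offset at which each shard starts in the merged file, so positional keys
    from shard i can be remapped. Only valid for a flat, fixed-size-record format;
    a graph index such as HNSW also needs its neighbor links merged.
    """
    offsets = []
    with open(out_path, "wb") as out:
        for path in shard_paths:
            offsets.append(out.tell())
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, chunk_bytes)
    return offsets


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    shards = []
    for i, payload in enumerate([b"aaaa", b"bb", b"cccccc"]):
        shards.append(os.path.join(tmp, f"shard{i}.bin"))
        with open(shards[-1], "wb") as f:
            f.write(payload)
    print(merge_on_disk(shards, os.path.join(tmp, "merged.bin"), chunk_bytes=3))  # [0, 4, 6]
```

The merged file can then be memory-mapped read-only for step 3, just like any single saved file.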

@ashvardanian
Contributor

Yes, you are right
