Index implementation rethinking #562

Open
xiaoyifang opened this issue Apr 21, 2023 · 6 comments

@xiaoyifang (Owner) commented Apr 21, 2023

Is your feature request related to a problem? Please describe.
GoldenDict uses a custom B-tree implementation to store the index, and it serves the purpose well.

Describe the solution you'd like
Since we already use Xapian as the full-text search engine, is it possible (and necessary) to use Xapian to replace the custom B-tree implementation?

I'll just leave the issue here for further consideration.

Some drawbacks:

  1. 0 sensitive in word
  2. custom tokenization is needed to process phrases such as "a lot of"
  3. performance: the more words, the slower it gets
  4. unsorted

Possible solution
If we don't adopt Xapian, I would really like to try RocksDB.
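
To illustrate why the "unsorted" drawback matters, here is a minimal sketch of a headword index kept in sorted key order, which is what an ordered key-value store such as RocksDB provides by default (byte-wise key order). It uses `std::map` only as a stand-in for such a store; `HeadwordIndex` and `prefixLookup` are hypothetical names, not GoldenDict or RocksDB APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical index: headword -> offset of the article in the dictionary file.
// std::map keeps std::string keys in byte-wise lexicographic order,
// standing in here for an ordered key-value store.
using HeadwordIndex = std::map<std::string, std::uint32_t>;

// Return up to `limit` headwords that start with `prefix`, in sorted order.
std::vector<std::string> prefixLookup(const HeadwordIndex& index,
                                      const std::string& prefix,
                                      std::size_t limit) {
  std::vector<std::string> out;
  // lower_bound seeks to the first key >= prefix in O(log n).
  for (auto it = index.lower_bound(prefix);
       it != index.end() && out.size() < limit; ++it) {
    // Keys sharing the prefix are contiguous, so stop at the first mismatch.
    if (it->first.compare(0, prefix.size(), prefix) != 0) break;
    out.push_back(it->first);
  }
  return out;
}
```

With sorted keys, prefix completion is one seek plus a short forward scan; an unsorted (hash-based) index would have to visit every key to answer the same query.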

@data-man (Contributor)

Maybe https://github.com/greg7mdp/parallel-hashmap is a good choice.

@xiaoyifang (Owner, author) commented Apr 22, 2023

Does parallel-hashmap provide a way to save the structure to a file?

@xiaoyifang (Owner, author)

greg7mdp/parallel-hashmap#146
The last time I used the parallel structure, I got an error.

@data-man (Contributor)

> Does parallel-hashmap provide a way to save the structure to a file?

Yes.

> Dump/load feature: when a flat hash map stores data that is std::trivially_copyable, the table can be dumped to disk and restored as a single array, very efficiently, and without requiring any hash computation. This is typically about 10 times faster than doing element-wise serialization to disk, but it will use 10% to 60% extra disk space. See examples/serialize.cc. (flat hash map/set only)
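
The idea behind that feature: if a table's storage is one contiguous array of trivially copyable slots, it can be written and read back with a single bulk I/O call, and every slot returns to the same position, so nothing has to be rehashed. Below is a minimal standard-library sketch of that idea only. It is not parallel-hashmap's API (examples/serialize.cc shows the real one), and `Slot`, `dumpSlots` and `loadSlots` are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size index slot: no pointers, no std::string inside.
struct Slot {
  std::uint64_t hash;    // precomputed hash of the headword
  std::uint32_t offset;  // article offset in the dictionary file
  std::uint32_t used;    // 1 if the slot is occupied, 0 if empty
};
static_assert(std::is_trivially_copyable_v<Slot>,
              "a raw dump requires trivially copyable data");

// Write the slot count, then the whole array (empty slots included) in one fwrite.
bool dumpSlots(const std::string& path, const std::vector<Slot>& slots) {
  std::FILE* f = std::fopen(path.c_str(), "wb");
  if (!f) return false;
  std::uint64_t n = slots.size();
  bool ok = std::fwrite(&n, sizeof n, 1, f) == 1 &&
            (n == 0 || std::fwrite(slots.data(), sizeof(Slot), n, f) == n);
  return std::fclose(f) == 0 && ok;  // fclose runs first, so the file is always closed
}

// Read the count, size the vector, then read the array back in one fread.
// No hashing happens: each slot lands exactly where it was before the dump.
bool loadSlots(const std::string& path, std::vector<Slot>& slots) {
  std::FILE* f = std::fopen(path.c_str(), "rb");
  if (!f) return false;
  std::uint64_t n = 0;
  bool ok = std::fread(&n, sizeof n, 1, f) == 1;
  if (ok) {
    slots.resize(n);
    ok = n == 0 || std::fread(slots.data(), sizeof(Slot), n, f) == n;
  }
  std::fclose(f);
  return ok;
}
```

Because empty slots are written too, the file is larger than the live data, which is presumably where the quoted extra disk space comes from. A raw dump also depends on the platform's struct layout and endianness, so it should only be read back on the same kind of machine.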

@data-man (Contributor)

Or use SQLite or another suitable key-value DB.

@xiaoyifang (Owner, author)

RocksDB.
