Indexes Being Dropped #254
Comments
Hello,
This may be normal depending on your Sonic configuration. Sonic has a maximum number of words per object configured as
Also, regarding the indexing of
Can you try temporarily altering this game name and adding a non-stopword word, just to be sure? (Stopwords for English are listed in: https://github.com/valeriansaliou/sonic/blob/master/src/stopwords/eng.rs)
Also, you should probably force English as the detected language in the ingest query instead of relying on the built-in ngram locale detector. The detector would pick the wrong language for a lot of those short strings, and thus rely on a different stopwords dictionary that may interfere. If you deem that all that you ingest is English-based, then you should force "eng" as
The stopwords seem to be the culprit :~)
I don't think the issue is with
How would I now go about solving the stopword issue? Is there any way this can be disabled? (I'm not sure how crucial stopwords are to how the library works.)
Unfortunately, for now there is no way to disable the stopwords "cleanup" system. I'll add a protocol option to disable it on all requests, though.
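To illustrate the failure mode discussed in the comments above, here is a minimal sketch of how stopword removal can leave a short title with nothing to index. The stopword list, the tokenizer, and the function names are all assumptions made up for this example; this is not Sonic's actual dictionary or its real cleanup code, only a rough model of the behavior described.

```javascript
// Illustrative stopword list only. This is NOT Sonic's real English
// dictionary (see src/stopwords/eng.rs in the Sonic repository for that).
const EXAMPLE_STOPWORDS = new Set(["the", "of", "a", "an", "and", "us", "last"]);

// Split text into lowercase alphanumeric terms (a simplified tokenizer,
// assumed for this sketch).
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

// Return the terms that would survive stopword removal.
function indexableTerms(text, stopwords = EXAMPLE_STOPWORDS) {
  return tokenize(text).filter((term) => !stopwords.has(term));
}

// Under the example list, every word of the first title is a stopword,
// so nothing is left to index; the second title keeps two terms.
console.log(indexableTerms("The Last of Us"));
console.log(indexableTerms("The Legend of Zelda"));
```

If a title ends up with no indexable terms under the real dictionary, a later flush of that object would plausibly report zero removed entries, which matches the symptom reported in the issue. A wrongly detected language would swap in a different dictionary and change which terms survive.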
Problem
I'm having issues where a predictable amount (around 10%) of indexed searches are being dropped. I'm not sure if it's an issue with my dataset, but it doesn't seem so.
This is occurring in development using Sonic 1.3.0 (installed via Homebrew, running with this config) and in production on Sonic 1.3.0 (pulled from Docker, running on Kubernetes), both connecting via Node.js on sonic-channel@1.2.6.
FYI: I'm trying to add "interests" to the search index.
Test Script
This script indexes an interest and then immediately drops it. If the removed variable is equal to zero, then it means that the search term wasn't properly indexed.
After running on 1,000 interests, the same "The Last of Us Part II" interest (the 504th interest in the array) is consistently not being indexed properly:
After upping to 10,000 interests, the same 19 interests are being dropped from the search index:
Conclusion
I'm not seeing anything in my logs, and changing the collection id has no effect either. For "The Last of Us Part II", here are the config params I'm using:
After indexing ~40,000 interests, I'm seeing exactly 4040 being consistently dropped :~(
I'd really appreciate some help with this issue. Thanks!