Search issues #216
Comments
|
Another example, my app Go For It! can barely be found: Only when running Attracting new users isn't a huge issue as searching for todo/timer/... works fine, but finding the app by name is basically impossible. |
|
Hmm, I can't reproduce any of this. If you turn on verbose mode with |
Tokenize the search query with tokenize-and-fold, do not compare search terms with an English-only word blacklist and speed up the search token validity check. Also, unittest the stemming feature. See #216
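For readers unfamiliar with "tokenize-and-fold": the idea is to split text into tokens and normalize case and accents, so that queries and indexed text compare equal regardless of locale. The sketch below is a rough Python approximation of that idea, not AppStream's actual code; the function name `tokenize_and_fold` and the exact splitting rule are assumptions made for illustration.

```python
import re
import unicodedata


def tokenize_and_fold(text: str) -> list[str]:
    """Split text into case-folded, accent-stripped alphanumeric tokens.

    Illustrative approximation only; the real implementation may differ.
    """
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, then keep runs of letters and digits as tokens.
    return re.findall(r"[^\W_]+", stripped.casefold())


print(tokenize_and_fold("Go For It!"))
print(tokenize_and_fold("Café-Calculator"))
```

Note that nothing here filters tokens against a word list, so short words such as "for" and "it" survive tokenization regardless of language.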
No idea, probably? (
Locale has no effect for me, en_US and nl_NL yield ~ the same results. (System locale is nl_NL) Using So it seems you are right that the individual tokens are dropped. |
|
Reproducing the behavior observed by @peteruithoven: |
|
Hehe ^^ |
I noticed that, yes. That also means that the issue I'm having has a different cause, as it isn't a result of stemming. In both cases a naive substring search would "solve" the issue, however (assuming excessively large processing power and memory space). I do think that individual tokens should not be discarded when searching for application names, as this would make the situation for apps with names like the one I maintain rather hopeless. |
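To make the "naive substring search" idea concrete, here is a minimal Python sketch. It is not how AppStream searches; the function name `substring_search` and the sample names are made up for illustration. It scans every name for every query, which is why it would only be viable with generous processing power.

```python
def substring_search(names: list[str], query: str) -> list[str]:
    """Return every name that contains the query, ignoring case."""
    needle = query.casefold()
    # Linear scan over all names: simple, but the cost grows with the
    # total amount of text searched on every single query.
    return [name for name in names if needle in name.casefold()]


names = ["Go For It!", "Calculator", "Calendar"]
print(substring_search(names, "go for it"))
print(substring_search(names, "calculat"))
```

With this approach neither tokenization nor stemming can discard part of the query, so both "go for it" and the partial word "calculat" find their targets.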
This will make it possible for users to find apps like "Go for it!" which otherwise would be impossible to search for. (Note: We do not keep such small search tokens in the cache, except for high-value texts, like name and summary) CC: #216
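A rough sketch of the rule described in this commit message, in Python: short tokens are dropped from the index unless they come from a high-value field. The threshold `MIN_TOKEN_LEN`, the field names, and the component ID `goforit.example` are all assumptions for illustration; the actual cache code may use different values and structure.

```python
import re
from collections import defaultdict

MIN_TOKEN_LEN = 3                       # hypothetical cutoff for "small" tokens
HIGH_VALUE_FIELDS = {"name", "summary"}  # fields named in the commit message


def build_index(components: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each kept token to the set of component IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for comp_id, fields in components.items():
        for field, text in fields.items():
            for token in re.findall(r"[^\W_]+", text.casefold()):
                # Small tokens are kept only when they come from a
                # high-value field such as the name or summary.
                if len(token) < MIN_TOKEN_LEN and field not in HIGH_VALUE_FIELDS:
                    continue
                index[token].add(comp_id)
    return dict(index)


components = {
    "goforit.example": {  # hypothetical component ID
        "name": "Go For It!",
        "description": "A to do list and timer",
    }
}
index = build_index(components)
print(sorted(index))
```

Here "go" and "it" stay in the index because they come from the name, while "a", "to" and "do" from the description are dropped.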
|
As part of the caching rework, I also landed a few search optimizations which should improve the results.
$ appstreamcli s cal | grep Identifier | wc -l
134
$ appstreamcli s calc | grep Identifier | wc -l
2
$ appstreamcli s calcula | grep Identifier | wc -l
41
$ appstreamcli s calculato | grep Identifier | wc -l
18
$ appstreamcli s calculator | grep Identifier | wc -l
18
The numbers look odd at first, but are easy to explain: Nothing matches "cal" or "calcula" (they also don't get stemmed), so a prefix match is performed and we get broad results of stuff with tokens that do have these prefixes. "calc" however is a direct-match token, so we will find "libreoffice calc" (as that's its name). The other queries will match calculator apps with varying precision. How many calculators do we find? (grep for "calculator")
As for the "Go for It!" oddity, with the new algorithm changes AppStream will keep small tokens in the index if they are of high value (= they stem from the component's name, summary or ID). With a very recent change to the user search query preprocessing, you should also now be able to search using small search tokens.
I hope this helps! I am not done with improving search; there is still a lot of stuff to be looked into and potentially improved. I am also not sure whether the "prefer exact match only" approach ("calc" only finding Calc) is actually a good thing or confusing to users.
If you want to test things, the changes are in master. Because the changes are massive and invasive, I expect it will take a bit of time for the dust to settle and all issues to be found. The API is also not yet behaving 100% the same as before, which needs to be addressed prior to the release. |
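The "direct match first, prefix match as fallback" behavior explained in this comment can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the index contents and component IDs are invented, stemming is not modeled, and the function name `search` is hypothetical. It is meant to show why a longer query can return more hits than a shorter one.

```python
def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return components for a direct token match, else for prefix matches."""
    token = query.casefold()
    # A token that exists in the index wins outright: narrow results.
    if token in index:
        return set(index[token])
    # Otherwise fall back to every indexed token starting with the query.
    hits: set[str] = set()
    for indexed_token, comp_ids in index.items():
        if indexed_token.startswith(token):
            hits |= comp_ids
    return hits


# Toy index with made-up component IDs.
index = {
    "calc": {"office-calc"},
    "calculator": {"calculator-a", "calculator-b"},
    "calendar": {"calendar-a"},
}
print(sorted(search(index, "cal")))      # prefix fallback: broad
print(sorted(search(index, "calc")))     # direct match: narrow
print(sorted(search(index, "calcula")))  # prefix fallback again
```

In this model "cal" returns all four components, "calc" returns only the one whose token matches exactly, and "calcula" returns the two calculators, mirroring the non-monotonic counts in the transcript.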
|
I think this is fixed now: search works well and the original issue of this bug report is addressed. |
|
Thanks for looking into this! |
|
I am planning to make a new release soon :-) |
I'm on elementary OS (Built on Ubuntu 18.04 LTS)
AppStream version: 0.12.5
I noticed a search issue in the elementary OS AppCenter: elementary/appcenter#942, which is apparently reproducible when using
appstreamcli search directly.
Summary of search results:
cal / calc / calcu / calcul / calculator: Expected results like all sorts of calculator apps.
calcula / calculat: Only 2 seemingly irrelevant results (these were also included in the results above).
calculato: No results
Searching calcul (using grep for brevity):
Searching calculat
Searching calculato
I looked through the existing issues and all search issues seemed fixed quite a while ago.
Please let me know if I can provide more information.