PostgreSQL partial_match not working in PageChooserPanel and SnippetChooserPanel #7720
Comments
I was having the same issue, and it looks like this happens because the Lexeme prefix option is False. That option is set by the class constant LAST_TERM_IS_PREFIX, and there is no setting to change this behavior. My dirty hack/fix:
    from wagtail.search.backends.database.postgres.postgres import PostgresSearchQueryCompiler
    PostgresSearchQueryCompiler.LAST_TERM_IS_PREFIX = True
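For anyone wondering what that flag changes, here is a minimal sketch of how a "last term is prefix" switch can end up in a PostgreSQL tsquery string. This is not Wagtail's actual query compiler: `build_tsquery` and its `last_term_is_prefix` parameter are hypothetical names for illustration, and the sketch does no escaping or stemming. The only part taken from PostgreSQL itself is the `:*` suffix, which marks a lexeme as a prefix match in tsquery syntax.

```python
def build_tsquery(query: str, last_term_is_prefix: bool = False) -> str:
    """Build a to_tsquery-style string from a plain query (hypothetical helper)."""
    terms = query.split()
    parts = []
    for i, term in enumerate(terms):
        is_last = i == len(terms) - 1
        # ':*' asks PostgreSQL to treat the lexeme as a prefix, not a whole word.
        suffix = ":*" if (is_last and last_term_is_prefix) else ""
        parts.append(f"'{term}'{suffix}")
    # All terms must match, so they are joined with the AND operator.
    return " & ".join(parts)


exact_query = build_tsquery("Hello wor")
prefix_query = build_tsquery("Hello wor", last_term_is_prefix=True)
```

With the flag off, the last term is matched as a whole lexeme, so "wor" will not find "world"; with it on, only the last term becomes a prefix pattern.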
Having the same issue with the latest wagtail 3.0, but this is not only a postgres issue, it does not work with sqlite either. This can be reproduced easily:
Similar to the UI, this code example also won't find it:
    from wagtail.models import Page
    Page.objects.search("Hel")
Fortunately, the quick fix from @krukas works for postgres and therefore for our projects in production.
Edit: I've just tested it with wagtail 2.15.5, which uses the old search backends by default, and partial matches work there. So this seems to be missing in the new implementations. For 3.0 the old implementation has been removed with 00582ba.
I hope this will be fixed in new releases ... (Still not working)
Hey fellas, a PR would help speed this up. As far as I am aware, there is no single change that works across all backends, so this needs digging into.
I think the first question should be: why is the difference there? Should a normal search on A quick look in all the database search backends shows that they all use I don't know if we want to remove it completely and always do partial search, or set it to True but leave the possibility to extend it and set it to False, or add a setting to change this.
My preference would be for this to be configurable on the database backend options in I don't have the full record of the decisions made over time, so I can't comment on that. It makes sense that it is on by default for the autocomplete search query compiler, as such is its nature. The rest is very much open to interpretation, imho.
I disagree - for standard search queries, a search for "cat" should not return results for "caterpillar".
I checked the Elasticsearch code and also tested this, and Elasticsearch does partial search. Have a snippet with name
I haven't used the old DB search backend, but I understand that this used to work for DB. So the question is: did the way search should work change with the new search backend? If so, then either Elasticsearch has a bug that gives results when it shouldn't, or the DB has a bug where it does not show results when it should.
@gasman Why should cat not match caterpillar? I think you're making a big assumption there, and one that isn't necessarily very user friendly... Most users working with search are starting with an incomplete idea of exactly what they're looking for and often use jargon... cat could be catalytic converter to a mechanic, or caterpillar to a construction worker. Weighting should take care of ensuring that full word matches take precedence over partial matches.... Search is a discovery process; putting hard constraints on matching reduces what is discoverable through it, making it less effective. At the very least it should work when partial_match=True is passed to the backend's search method, which it doesn't.
I have created a PR that fixes it for all DB search backends: #9591
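As a rough illustration of the weighting idea in that comment, here is a toy ranking sketch in which a full-word match outscores a prefix match. It is not how any Wagtail backend scores results; `tokenize`, `score`, `ranked_search` and the weight values are all made up for this example.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def score(title, term, exact_weight=2.0, prefix_weight=1.0):
    """Return the best score any token in the title earns for the term."""
    term = term.lower()
    best = 0.0
    for token in tokenize(title):
        if token == term:
            best = max(best, exact_weight)   # whole-word match
        elif token.startswith(term):
            best = max(best, prefix_weight)  # partial (prefix) match
    return best


def ranked_search(titles, term):
    """Return matching titles, full-word matches first."""
    scored = [(score(title, term), title) for title in titles]
    matches = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so equal scores keep their input order.
    matches = sorted(matches, key=lambda pair: -pair[0])
    return [title for _, title in matches]


results = ranked_search(["Caterpillar care", "Cat food", "Dog food"], "cat")
```

Under these assumed weights, "Cat food" ranks above "Caterpillar care" and "Dog food" is dropped, so partial matches stay discoverable without displacing exact ones.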
@dopry The basic functionality of a search engine is to tokenise the source text word-by-word and provide querying capabilities based on matching those words. On top of that it might apply additional techniques of varying sophistication such as stemming (cats -> cat), synonyms (cat->catalytic), and weighting - and indeed prefix matching is one such feature that might be enabled, if the situation calls for it (e.g. we're providing results for an autocompletion dropdown). However, it isn't technically sound for substring matching to be the default mode of operation, because then that closes off the possibility of applying those more advanced kinds of word-based analysis. To be clear, this isn't a comment on the overall validity of this issue - if partial match results are indeed not being returned when specifically requested, that's a bug - but switching to partial matching as default isn't the right solution.
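A minimal sketch of the base case described above, assuming a plain inverted index: text is tokenised word-by-word, the default lookup is an exact token match, and prefix matching is an opt-in layer on top. `build_index` and `lookup` are hypothetical names, and real backends do far more (stemming, ranking, and so on).

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def lookup(index, term, prefix=False):
    """Exact token lookup by default; scan tokens by prefix only on request."""
    term = term.lower()
    if not prefix:
        return set(index.get(term, set()))
    hits = set()
    for token, doc_ids in index.items():
        if token.startswith(term):
            hits |= doc_ids
    return hits


docs = {1: "The cat sat", 2: "A caterpillar crawled"}
index = build_index(docs)
exact_hits = lookup(index, "cat")
prefix_hits = lookup(index, "cat", prefix=True)
```

Here the exact lookup for "cat" returns only document 1, while the opt-in prefix lookup also returns the caterpillar document.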
@gasman I'm familiar with how search engines work. I agree changing the default behavior of search probably isn't the solution to this issue. The assertion that substring matching closes off the possibility of applying more advanced kinds of word based analysis is false. Substring matching can be applied to stems as easily as full word. Some search engines even offer different match scoring based on the magnitude of the intersection of the search term, stem, and full word. Who came up with
and why? Were there issues with partial matching in the search? Was it tested with users?
@dopry My point is that exact matching on tokens is the most basic, minimum-viable-product mode of operation for a search engine - everything else is essentially adding layers of "fuzziness" on top of that base case. (Yes, you could take that MVP search engine and swap out the exact lookup for a substring lookup with relatively little effort, and thus make the fuzzy variant the base case - but then you'd also need to implement weighting to get back similarly useful results.) It doesn't necessarily follow that the developer-facing API should map to the internal architecture of the search implementation, but I think it does make sense in this case, given that we want to give developers access to the low-level configuration if they choose to use it. Even if we had consensus that substring matching with weighting is the most helpful configuration for real-world searches - and I'm not convinced that it is - I believe developers would be better served by a new method with that configuration baked in, perhaps something like
As far as I remember, search was part of the initial release of Wagtail, at a time when we were delivering the complete CMS as a 6-month project from a ~5 person team. Even if we'd had resources for user testing back then, I don't think exact vs partial matching would have registered on our radar as a candidate for user testing, among the thousands of other design decisions we had to make based on intuition. Either way, I stand by that decision, not just because it would be a backwards-compatibility hassle to change it now, but because I believe it was a sound architectural choice for the reasons given above.
I think your points on backwards compatibility make sense. I'm not really talking about the technical aspects of .search() or the architecture. I'm talking about the experience of using search as an end user. That is off topic, so I'll try to structure my thoughts in an RFC. I did experience search not returning partial matches with postgres with
@gasman Don't really have an opinion either way on what the default should be, and your points in support of exact matching make a lot of sense to me. However, as someone who just lost an afternoon trying to figure this out before finding this issue, I'd like to reiterate @m5seppal's original point that Wagtail's default behavior really should be documented properly. As it stands:
Have now done some digging, and I think the problem is captured by this note in https://docs.wagtail.org/en/latest/topics/search/searching.html#searching-pages:
It appears that when the database full-text search backends were introduced in Wagtail 2.15, they prematurely adopted the proposed "future" behaviour - namely that you must use The search function within the chooser modals (which I believe ought to be performing an autocomplete / partial match search, since it returns results immediately as you type) is still using
Fixes wagtail#7720.

Unfortunately the different search backends have diverged massively in the autocomplete APIs they support:

* Postgres FTS (and probably the other database FTS backends) ignores the `partial_match` kwarg on `.search()` and `SearchField`, and only supports the `.autocomplete()` / `AutocompleteField` API
* The database fallback backend (and probably Elasticsearch) accepts `partial_match` on `.search()` (in fact I think it always does partial matches regardless of that setting - untested), but raises `NotImplementedError` on `.autocomplete()`

As a result, to perform autocompletion across all backends, we need to try both methods, with a try/except. (Also, user-defined models such as snippets will need either or both of `AutocompleteField` or `SearchField(partial_match=True)` in `search_fields` depending on the backend in use.)

Additionally, while tests have been added to cover the autocomplete behaviour, the test suite only ever runs under the fallback backend (with the exception of the backend-specific tests in wagtail.search), and so these would have succeeded anyhow.

Much more cleanup work is needed here, but this will serve as a stopgap solution to get the choosers working as intended again.
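The try/except approach described in that commit message might look roughly like the sketch below. This is not the code from the commit: `autocomplete_with_fallback` is a hypothetical helper, it only assumes a queryset-like object exposing `.autocomplete(query)` and `.search(query, partial_match=True)` as named above, and it assumes `NotImplementedError` is raised when `.autocomplete()` is called rather than later when results are evaluated.

```python
def autocomplete_with_fallback(queryset, query):
    """Try the autocomplete API first, then fall back to a partial-match search.

    Hypothetical helper: `queryset` is anything with `.autocomplete()` and
    `.search()` methods shaped like the ones discussed in this thread.
    """
    try:
        # Backends that implement the autocomplete API handle the query here.
        return queryset.autocomplete(query)
    except NotImplementedError:
        # Backends without autocomplete support are asked for partial matches.
        return queryset.search(query, partial_match=True)
```

A chooser view could then call something like `autocomplete_with_fallback(pages, q)` without knowing which backend is configured, at the cost of hiding the API divergence rather than fixing it.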
We just updated to Wagtail 2.15.1 and switched to the new database search backend. Previously we used wagtail.contrib.postgres_search.backend, and partial match has never worked for us, so we have overridden search for PageChooserPanel and SnippetChooserPanel with our own implementation to make the search usable without Elasticsearch. In the newest 2.15.1 there is no mention that partial match isn't supported with PostgreSQL, so I assume either there is a bug or the documentation should be updated.
Issue Summary
We have an ArticlePage model which uses Wagtail's default search_fields for 'title', as follows:
When selecting an ArticlePage using PageChooserPanel, search does not match partial words. For example, a page titled "Hello world" matches "Hello" or "world", but not "Hel". The same is true for SnippetChooserPanel when selecting snippets. Autocomplete works: for example, if there are "Hi world" and "Hello world" pages, both appear immediately once "world" is typed in full, but nothing appears for "worl", contrary to what one would expect.
To overcome this problem, we have patched Wagtail's default search with our own search implementation, as such:
With that override, the search for PageChooserPanel and SnippetChooserPanel works as expected when it comes to partial matching, but it doesn't support other features like boost. Still, this gives us a much better user experience, although we would like to get rid of our ugly hacks.
Maybe we are missing something, or else PostgreSQL really doesn't support partial matching. In either case, I feel the documentation should be improved to make it clear whether it works or how to make it work.
Technical details