Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL partial_match not working in PageChooserPanel and SnippetChooserPanel #7720

Closed
m5seppal opened this issue Nov 19, 2021 · 17 comments
Closed

Comments

@m5seppal
Copy link

We just updated to Wagtail 2.15.1 and updated to use new database search backend. Previously we have used wagtail.contrib.postgres_search.backend and partial match has never worked us, thus we have actually overridden search for PageChooserPanel and SnippetChooserPanel with our own implementation to make the search usable without using Elasticsearch. In the newest 2.15.1 there is no mention that partial match isn't supported with PostgreSQL, so I assume there is a bug, or then documentation should be updated.

Issue Summary

We have an ArticlePage model which is using Wagtail's default search_fields, for 'title' as following:

index.SearchField('title', partial_match=True, boost=2),
index.AutocompleteField('title'),

When trying to select ArticlePage using PageChooserPanel, search will not match partial words, for example page with title "Hello world" will only match with "Hello" or "world", but not with "Hel". The same is true for SnippetChooserPanel when selecting snippets. Autocomplete works, for example if there are "Hi world" and "Hello world" pages, searching "world" will show both pages immediately when "world" is written, but nothing with "worl" like expected.

To overcome this problem, we have patched Wagtail's default search with our own search implementation, us such:

def search(search_query, items, model):
   """Search override for Wagtail's default search to support partial matching"""
   conditions = Q()
   for term in search_query.split():
       for field in model.search_fields:
           # Ensure only SearchFields are used for filtering
           if isinstance(field, index.SearchField):
               conditions |= Q(**{'%s__icontains' % field.field_name: term})
   return items.filter(conditions)

With that override, the search for PageChooserPanel and SnippetChooserPanel works as expected what comes to partial matching, but this doesn't support other features like boost. Still, this gives us much better user experience, but we would like to get rid of our ugly hacks.

Maybe we are missing something or then PostgreSQL really doesn't support partial matching. In either case, I feel like documentation should be improved to make it clear whether it works or how to make it work.

Technical details

  • Python version: Python 3.9.4
  • Django version: 3.2.6
  • Wagtail version: 2.15.1
  • Browser version: Safari 15
@krukas
Copy link
Contributor

krukas commented Jan 18, 2022

I was having the same issue and looks like this is happening because Lexeme prefix is False. The Lexeme prefix option is set by the class constant LAST_TERM_IS_PREFIX, and there is no option to change this behavior.

My dirty hack/fix:

from wagtail.search.backends.database.postgres.postgres import PostgresSearchQueryCompiler
PostgresSearchQueryCompiler.LAST_TERM_IS_PREFIX = True

@th3hamm0r
Copy link
Contributor

th3hamm0r commented Jun 1, 2022

Having the same issue with the latest wagtail 3.0, but this is not only a postgres issue, it does not work with sqlite either.

This can be reproduced easily:

  1. Start a new project with wagtail start myproject
  2. Create a superuser and login
  3. Create a page with the title "Hello World"
  4. Try to find it by typing "Hel" in the search box. You won't find it, only "Hello", "World" or "Hello World" works.

Similar to the UI, this code example also won't find it:

from wagtail.models import Page
Page.objects.search("Hel")

Fortunately, the quickfix from @krukas works for postgres and therefore for our projects in production.
Setting it on the SQLiteSearchQueryCompiler does not fix it for sqlite (but that's not a big issue for us).

edit: I've just tested it with wagtail 2.15.5, which uses the old search backends by default, and partial matches work there. So this seems to be missing in the new implementations. For 3.0 the old implementation has been removed with 00582ba.
Some related PRs:
#7305
#7420

@kiyoshi-satoo
Copy link

kiyoshi-satoo commented Oct 6, 2022

I hope this will be fixed in new releases ... (Still not working)

@zerolab
Copy link
Contributor

zerolab commented Oct 7, 2022

Hey fellas, a PR would help speed this up. There is not one change to work across all backends, as far as I am aware, so this needs digging into

@krukas
Copy link
Contributor

krukas commented Oct 7, 2022

I think first question should be why is the difference there? Should a normal search on wagta match on wagtail. I think normal search should also be matching on search_word*, to give a beter user experience.

A quick look in all the database search backends shows that the all use LAST_TERM_IS_PREFIX and that his is set to True only for the *AutocompleteQueryCompiler. I don't know what the behavior is for Elasticsearch.

I don't know if we want to completely remove it and always do partial search. Or only set it to True leave the possibility to extend it and set it to False or add a setting to change this.

@zerolab
Copy link
Contributor

zerolab commented Oct 7, 2022

My preference would be for this to be configurable on the database backend options in WAGTAILSEARCH_BACKENDS.

I don't the have the full record of the decisions along the time so can't comment on that. It makes sense that it is on by default for the autocomplete search query compiler as such is the nature. The rest is very much open for interpretation, imho

@gasman
Copy link
Collaborator

gasman commented Oct 7, 2022

Should a normal search on wagta match on wagtail. I think normal search should also be matching on search_word*, to give a beter user experience.

I disagree - for standard search queries, a search for "cat" should not return results for "caterpillar".

@krukas
Copy link
Contributor

krukas commented Oct 13, 2022

I checked the Elasticsearch code and also tested this, and Elasticsearch does partial search.

Have a snippet with name wagtail on field that is searchable, Searching for wagta:

  • Default DB backend: No results
  • Elasticsearch6: snippet with name wagtail is shown
  • Default DB with LAST_TERM_IS_PREFIX = True: snippet with name wagtail is shown

I don't have used the old DB search backend and I understand that this use to work for DB. So the question is the way how search should work changed with the new search backend? If so then Elasticsearch has a bug that give results when it shouldn't or the DB has a bug it is not showing results when it should show results?

@dopry
Copy link
Contributor

dopry commented Oct 14, 2022

@gasman Why should cat not match caterpillar? I think you're making a big assumption there, and one that isn't necessarily very user friendly... Most users working with search are starting with an incomplete idea of exactly what they're looking for and often use jargon... cat could be catalytic converted to a mechanic, or catepillar to a constuction worker. Weighting should be taking care of ensuring full word matches take precendece over partial matches.... Search is a discovery process, putting hard constraints on matching reduces what is discoverable through it, making it less effective.

At the very least is should work when partial_match=True is passed to the backends search method, which it doesn't.

@krukas
Copy link
Contributor

krukas commented Nov 3, 2022

I have created a PR that fix it for all DB search backends: #9591

@gasman
Copy link
Collaborator

gasman commented Nov 3, 2022

@dopry The basic functionality of a search engine is to tokenise the source text word-by-word and provide querying capabilities based on matching those words. On top of that it might apply additional techniques of varying sophistication such as stemming (cats -> cat), synonyms (cat->catalytic), and weighting - and indeed prefix matching is one such feature that might be enabled, if the situation calls for it (e.g. we're providing results for an autocompletion dropdown). However, it isn't technically sound for substring matching to be the default mode of operation, because then that closes off the possibility of applying those more advanced kinds of word-based analysis.

To be clear, this isn't a comment on the overall validity of this issue - if partial match results are indeed not being returned when specifically requested, that's a bug - but switching to partial matching as default isn't the right solution.

@dopry
Copy link
Contributor

dopry commented Nov 11, 2022

@gasman I'm familiar with how search engines work. I agree changing the default behavior of search probably isn't the solution to this issue. The assertion that substring matching closes off the possibility of applying more advanced kinds of word based analysis is false. Substring matching can be applied to stems as easily as full word. Some search engines even offer different match scoring based on the magnitude of the intersection of the search term, stem, and full word.

Who came up with

the intended behaviour of the backends, which is to match only on complete words when performing a standard (non-autocomplete) search

and why? Were the issues with partial matching in the search? Was tested it with users?

@gasman
Copy link
Collaborator

gasman commented Nov 11, 2022

@dopry My point is that exact matching on tokens is the most basic, minimum-viable-product mode of operation for a search engine - everything else is essentially adding layers of "fuzziness" on top of that base case. (Yes, you could take that MVP search engine and swap out the exact lookup for a substring lookup with relatively little effort, and thus make the fuzzy variant the base case - but then you'd also need to implement weighting to get back similarly useful results.)

It doesn't necessarily follow that the developer-facing API should map to the internal architecture of the search implementation, but I think it does make sense in this case, given that we want to give developers access to the low-level configuration if they choose to use it. Even if we had consensus that substring matching with weighting is the most helpful configuration for real-world searches - and I'm not convinced that it is - I believe developers would be better served by a new method with that configuration baked in, perhaps something like Page.objects.natural_search("foo"), rather than having it as standard in the search method and requiring developers who do want something different to 'undo' that configuration to get back to the base case.

Were the issues with partial matching in the search? Was tested it with users?

As far as I remember, search was part of the initial release of Wagtail, at a time when we were delivering the complete CMS as a 6-month project from a ~5 person team. Even if we'd had resources for user testing back then, I don't think exact vs partial matching would have registered on our radar as a candidate for user testing, among the thousands of other design decisions we had to make based on intuition. Either way, I stand by that decision, not just because it would be a backwards-compatibility hassle to change it now, but because I believe it was a sound architectural choice for the reasons given above.

@dopry
Copy link
Contributor

dopry commented Nov 12, 2022

I think you points on backwards compatibility make sense. I'm not really talking about the technical aspects of .search() or the architecture. I'm talking about the experience of using search as an end user. Which is off topic, so I'll try to structure my thoughts in an RFC.

I did experience search not returning partial matches with postgres with partial_match=True, which is what landed me here in the first place. PostgresSearchQueryCompiler.LAST_TERM_IS_PREFIX = True solved my problem.

@mikaraunio
Copy link

@gasman Don't really have an opinion either way on what the default should be, and your points in support of exact matching make a lot of sense to me. However, as someone who just lost an afternoon trying to figure this out before finding this issue, I'd like to reiterate @m5seppal's original point that Wagtail's default behavior really should be documented properly.

As it stands:

  • The default behaviour varies by backend: elasticsearch does partial matching while database doesn't, yet this is documented nowhere
  • Instead, search/indexing.html creates the expectation that partial matching wil just work:

    "partial_match (boolean) - Setting this to true allows results to be matched on parts of words. For example, this is set on the title field by default, so a page titled Hello World! will be found if the user only types Hel into the search box."

@gasman
Copy link
Collaborator

gasman commented Dec 15, 2022

Have now done some digging, and I think the problem is captured by this note in https://docs.wagtail.org/en/latest/topics/search/searching.html#searching-pages:

Before the autocomplete() method was introduced, the search method also did partial matching. This behaviour is deprecated and you should either switch to the new autocomplete() method or pass partial_match=False into the search method to opt-in to the new behaviour. The partial matching in search() will be completely removed in a future release.

It appears that when the database full-text search backends were introduced in Wagtail 2.15, they prematurely adopted the proposed "future" behaviour - namely that you must use autocomplete() to perform partial match queries, and search() never does so, regardless of the partial_match argument.

The search function within the chooser modals (which I believe ought to be performing an autocomplete / partial match search, since it returns results immediately as you type) is still using search() with no partial_match argument - i.e. the deprecated convention that should, officially, still be doing partial matches (and still is, on Elasticsearch).

gasman added a commit to gasman/wagtail that referenced this issue Jan 16, 2023
Fixes wagtail#7720. Unfortunately the different search backends have diverged massively in the autocomplete APIs they support:

* Postgres FTS (and probably the other database FTS backends) ignores the `partial_match` kwarg on `.search()` and `SearchField`, and only supports the `.autocomplete()` / `AutocompleteField` API
* The database fallback backend (and probably Elasticsearch) accepts `partial_match` on `.search()` (in fact I think it always does partial matches regardless of that setting - untested), but raises `NotImplementedError` on `.autocomplete()`

As a result, to perform autocompletion across all backends, we need to try both methods, with a try/except. (Also, user-defined models such as snippets will need either or both of `AutocompleteField` or `SearchField(partial_match=True)` in `search_fields` depending on the backend in use.)

Additionally, while tests have been added to cover the autocomplete behaviour, the test suite only ever runs under the fallback backend (with the exception of the backend-specific tests in wagtail.search), and so these would have succeeded anyhow. Much more cleanup work is needed here, but this will serve as a stopgap solution to get the choosers working as intended again.
gasman added a commit to gasman/wagtail that referenced this issue Jan 19, 2023
Fixes wagtail#7720. Unfortunately the different search backends have diverged massively in the autocomplete APIs they support:

* Postgres FTS (and probably the other database FTS backends) ignores the `partial_match` kwarg on `.search()` and `SearchField`, and only supports the `.autocomplete()` / `AutocompleteField` API
* The database fallback backend (and probably Elasticsearch) accepts `partial_match` on `.search()` (in fact I think it always does partial matches regardless of that setting - untested), but raises `NotImplementedError` on `.autocomplete()`

As a result, to perform autocompletion across all backends, we need to try both methods, with a try/except. (Also, user-defined models such as snippets will need either or both of `AutocompleteField` or `SearchField(partial_match=True)` in `search_fields` depending on the backend in use.)

Additionally, while tests have been added to cover the autocomplete behaviour, the test suite only ever runs under the fallback backend (with the exception of the backend-specific tests in wagtail.search), and so these would have succeeded anyhow. Much more cleanup work is needed here, but this will serve as a stopgap solution to get the choosers working as intended again.
@gasman gasman closed this as completed in f5633db Jan 19, 2023
@gasman
Copy link
Collaborator

gasman commented Jan 19, 2023

This is fixed for the PostgreSQL search backend in #9900. Autocompletion does not currently work at all on the sqlite or mysql FTS backends, and so these retain the existing non-autocompleting search - #9903 has been opened to cover this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants