Adding multiple backends with wagtail.search.backends.database results in index being overwritten with last in the list #9253

Open
enzedonline opened this issue Sep 23, 2022 · 2 comments
Labels
status:Unconfirmed Issue, usually a bug, that has not yet been validated as a confirmed problem. type:Bug

Comments

@enzedonline

When defining multiple database backends for a multi-language site, the index is overwritten for each backend in turn, so the whole site ends up indexed with the search config of the last backend in the list.

For example:

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'english',
    },
    'fr': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'french',
    },    
    'es': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'spanish',
    },
}

In the above example, the entire site will be indexed according to the Spanish dictionary and unaccenting/stemming rules.

If I create a page with the title “We are going walking in the mountains today” with backends:

    'spanish': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'spanish',
    },    
    'english': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'english',
    },

The indexed title appears as expected, with stop words omitted and the remaining words stemmed:

'go':3B 'mountain':7B 'today':8B 'walk':4B

Searching with the default (English) backend and the term ‘mountains’ returns the page.

With the same backends in reverse order (English first, Spanish last), the indexed title becomes:

'are':2B 'going':3B 'in':5B 'mountains':7B 'the':6B 'today':8B 'walking':4B 'we':1B

The stop words are no longer filtered and the words are no longer stemmed, because the title is now parsed with the Spanish config.

Searching the term ‘mountains’ with the english backend produces no match.

When running update_index, I see the following:

Updating backend: english
english: Rebuilding index default
…
Updating backend: spanish
spanish: Rebuilding index default
…

I take this to mean that the default index is rebuilt from scratch for each backend, which would explain why the site ends up indexed by the last entry.

Although the output refers to a default index, the documentation says nothing about creating multiple indexes.
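
To make the suspected mechanism concrete, here is a toy model in plain Python. It is only a sketch and not Wagtail's implementation: the `rebuild_all` and `analyse` helpers and the tiny stop-word lists are hypothetical, and stemming is left out. It assumes one shared index keyed by page id that every backend clears and refills in turn.

```python
# Toy model only: NOT Wagtail's code. One shared index, no backend column.
# The stop-word lists are illustrative assumptions, not PostgreSQL's dictionaries.
STOP_WORDS = {
    "english": {"we", "are", "in", "the"},
    "spanish": {"el", "la", "los", "las", "en", "de"},
}


def analyse(title, config):
    """Map each kept word to its 1-based position, dropping stop words (no stemming)."""
    words = title.lower().split()
    return {
        word: position
        for position, word in enumerate(words, start=1)
        if word not in STOP_WORDS[config]
    }


def rebuild_all(backends, pages):
    """Rebuild the single shared index once per backend, in settings order."""
    index = {}
    for _name, config in backends.items():
        index.clear()  # every backend wipes what the previous one wrote
        for page_id, title in pages.items():
            index[page_id] = analyse(title, config)
    return index


pages = {1: "We are going walking in the mountains today"}

english_last = rebuild_all({"spanish": "spanish", "english": "english"}, pages)
spanish_last = rebuild_all({"english": "english", "spanish": "spanish"}, pages)

print(english_last[1])  # English stop words removed
print(spanish_last[1])  # all eight words kept
```

In this model only the order of the settings dict decides which config the stored terms reflect, which matches what I see in wagtailsearch_indexentry.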

Issues:

  1. The documentation needs to state that this is the behaviour. Knowing this in advance would have saved days of troubleshooting knock-on effects, and of building a search coding/infrastructure strategy that ultimately won't work as is.
  2. The 'wagtail.search.backends.database' backend needs to support multiple search configs.

Perhaps multiple search configs could be supported by adding a backend name column to the wagtailsearch_indexentry table, with one row per backend per indexed object. Searching with the backend 'es', for example, would then return results only from rows with that backend name. (This is not the same as filtering pages by locale; it filters the index to match the required search config.) This would only work if the whole index were not rebuilt for each backend entry. Alternatively, there could be a separate index per WAGTAILSEARCH_BACKENDS key, or one named in the config declaration.
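
As a rough illustration of the first idea, the following toy model keys each index row by backend name and page id, and rebuilding one backend removes only that backend's rows. This is a sketch under stated assumptions, not a proposal for actual Wagtail code: `rebuild_backend`, `search`, `analyse` and the stop-word lists are all hypothetical names and data.

```python
# Toy model only: index rows keyed by (backend name, page id).
# The stop-word lists are illustrative assumptions.
STOP_WORDS = {
    "english": {"we", "are", "in", "the"},
    "spanish": {"el", "la", "los", "las", "en", "de"},
}


def analyse(title, config):
    """Return the set of words kept after dropping the config's stop words."""
    return {w for w in title.lower().split() if w not in STOP_WORDS[config]}


def rebuild_backend(index, backend, config, pages):
    """Rebuild only the rows belonging to one backend, leaving the others alone."""
    for key in [k for k in index if k[0] == backend]:
        del index[key]
    for page_id, title in pages.items():
        index[(backend, page_id)] = analyse(title, config)


def search(index, backend, term):
    """Return ids of pages whose rows for this backend contain the term."""
    return sorted(
        page_id
        for (name, page_id), terms in index.items()
        if name == backend and term in terms
    )


pages = {1: "We are going walking in the mountains today"}
index = {}
rebuild_backend(index, "en", "english", pages)
rebuild_backend(index, "es", "spanish", pages)

print(search(index, "en", "the"))        # English rows dropped the stop word
print(search(index, "es", "the"))        # Spanish rows kept it
print(search(index, "en", "mountains"))
```

The cost is one extra row per backend per object, so index size grows with the number of configured backends.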

Wagtail: 3.0.3
Django: 4.0.2
Python: 3.10.6

@enzedonline enzedonline added status:Unconfirmed Issue, usually a bug, that has not yet been validated as a confirmed problem. type:Bug labels Sep 23, 2022
@enzedonline
Author

As a follow-up, I tried adding 'INDEX' to the WAGTAILSEARCH_BACKENDS definitions:

    'es': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'spanish_extended',
        'INDEX' : 'es'
    },
    'english': {
        'BACKEND': 'wagtail.search.backends.database',
        'SEARCH_CONFIG': 'english_extended',
        'INDEX' : 'en'
    },

On rebuilding, the output shows indexes with those names being updated:

Updating backend: es
es: Rebuilding index es
...
Updating backend: en
english: Rebuilding index en
...

However, the values in wagtailsearch_indexentry remain unchanged: only one index exists, and it is destroyed and rebuilt for each WAGTAILSEARCH_BACKENDS entry.

@neil-justice

I would say this is confirmed - wagtailsearch_indexentry has no way to disambiguate which search backend an entry is for, so there's no way this could work right now, as far as I can see.

It would be useful for us if this was possible - I can't think of a workaround for postgres search on multi-lingual sites to do stemming differently in different languages without this.

How about this as a fix - add a backend field to BaseIndexEntry (and BaseIndexEntry's unique_together)?
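
To show what that uniqueness rule would allow, here is a small stand-in using Python's built-in sqlite3 rather than Django or PostgreSQL. It is a sketch only: the table name `indexentry` and its columns are assumptions made for illustration and do not reproduce the real BaseIndexEntry schema. The stored title strings are the two indexed values quoted earlier in this issue.

```python
import sqlite3

# Stand-in table: names and columns are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE indexentry (
        backend TEXT NOT NULL,
        content_type_id INTEGER NOT NULL,
        object_id TEXT NOT NULL,
        title TEXT NOT NULL,
        UNIQUE (backend, content_type_id, object_id)
    )
    """
)

rows = [
    ("english", 1, "42", "'go':3B 'mountain':7B 'today':8B 'walk':4B"),
    ("spanish", 1, "42", "'are':2B 'going':3B 'in':5B 'mountains':7B "
                         "'the':6B 'today':8B 'walking':4B 'we':1B"),
]
# Same object, two backends: both rows are accepted.
conn.executemany("INSERT INTO indexentry VALUES (?, ?, ?, ?)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM indexentry").fetchone()[0]

# A second row for the same backend and object violates the constraint.
try:
    conn.execute("INSERT INTO indexentry VALUES (?, ?, ?, ?)", rows[0])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A search would then filter on the backend column first.
english_rows = conn.execute(
    "SELECT object_id FROM indexentry WHERE backend = ?", ("english",)
).fetchall()

print(row_count, duplicate_rejected, english_rows)
```

With the backend in the uniqueness rule, each object can hold one entry per backend, and a rebuild for one backend could delete only its own rows.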
