Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some cleanups in PostgreSQL search module #5953

Merged
merged 10 commits into from
May 18, 2020

Conversation

kaedroho
Copy link
Contributor

This reduces the amount of raw SQL in the PostgreSQL search module and makes it easier to follow (I think anyway).

The new query.py is taken from django/django#8313, I plan to add tests/docs for this and resubmit it to Django, so it will only live in Wagtail temporarily.

@squash-labs
Copy link

squash-labs bot commented Apr 14, 2020

Manage this branch in Squash

Test this branch here: https://kaedrohopostgres-search-cleanu-r7gy2.squash.io

Copy link
Contributor

@tomdyson tomdyson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor documentation points. I'm not qualified to review the query code, but I like seeing the removal of the raw SQL. Thanks @kaedroho!

- Excellent performance for sites with up to 200 000 pages.
Stays decent for sites up to a million pages.
- Faster to reindex than Elasticsearch if you use PostgreSQL 9.5 or more.
- Excellent performance for sites with up to 200 000 pages and stays decent for sites up to a million pages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any data to support this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I made a test using data from a ~ 200 000 pages Wagtail site from Torchbox that @gasman handed me during the Wagtail space where we created this. Response time was still under a second for search queries on those 200 000 pages, including ranking (which is the most expensive operation).
I then extrapolated the data by cloning existing pages until I reached 2 millions if I remember correctly. It was struggling a bit after around a million pages, where response time went over 5 seconds.

But I don’t have a more detailed report nor did a clean script to create “realistic” fake data of that size.

Also, I deleted the data, I don’t remember which site it was from, as it didn’t matter that much.

So if you prefer being cautious about this, you can remove this claim. Even though it was true at that time, it may not stay true in the future.

docs/reference/contrib/postgres_search.rst Outdated Show resolved Hide resolved
wagtail/contrib/postgres_search/backend.py Show resolved Hide resolved
wagtail/contrib/postgres_search/query.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@gasman gasman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a big improvement over the "building format strings with format strings" business that was here before :-) Other than my question over RawSearchQuery (and the one-word docstring fix), this looks good to me.

wagtail/contrib/postgres_search/backend.py Outdated Show resolved Hide resolved
return '%s', [combined_value]


class RawSearchQuery(SearchQueryCombinable, Expression):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaedroho Please can you explain what this class is doing that Django's own SearchQuery class doesn't do? My best guess is that it's the equivalent of the two-line patch to SearchQuery in django/django#12727 , but that implementation has changed so drastically between Django 3.0 and master that I can't see how one relates to the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up until Django 3.1, it wasn't possible to pass in expressions as the query string: django/django@3baf92c#diff-81374e85d785546664c64c9ecb89b085

@kaedroho kaedroho merged commit 58f6aec into wagtail:master May 18, 2020
@kaedroho kaedroho deleted the postgres-search-cleanup branch May 18, 2020 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants