-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some cleanups in PostgreSQL search module #5953
Conversation
Manage this branch in SquashTest this branch here: https://kaedrohopostgres-search-cleanu-r7gy2.squash.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor documentation points. I'm not qualified to review the query code, but I like seeing the removal of the raw SQL. Thanks @kaedroho!
- Excellent performance for sites with up to 200 000 pages. | ||
Stays decent for sites up to a million pages. | ||
- Faster to reindex than Elasticsearch if you use PostgreSQL 9.5 or more. | ||
- Excellent performance for sites with up to 200 000 pages and stays decent for sites up to a million pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any data to support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @BertrandBordage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I made a test using data from a ~ 200 000 pages Wagtail site from Torchbox that @gasman handed me during the Wagtail space where we created this. Response time was still under a second for search queries on those 200 000 pages, including ranking (which is the most expensive operation).
I then extrapolated the data by cloning existing pages until I reached 2 millions if I remember correctly. It was struggling a bit after around a million pages, where response time went over 5 seconds.
But I don’t have a more detailed report nor did a clean script to create “realistic” fake data of that size.
Also, I deleted the data, I don’t remember which site it was from, as it didn’t matter that much.
So if you prefer being cautious about this, you can remove this claim. Even though it was true at that time, it may not stay true in the future.
2f133d3
to
81c8f28
Compare
Pulled in some code that will be included in a future Django release (query.py)
81c8f28
to
cb6b980
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a big improvement over the "building format strings with format strings" business that was here before :-) Other than my question over RawSearchQuery
(and the one-word docstring fix), this looks good to me.
return '%s', [combined_value] | ||
|
||
|
||
class RawSearchQuery(SearchQueryCombinable, Expression): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaedroho Please can you explain what this class is doing that Django's own SearchQuery
class doesn't do? My best guess is that it's the equivalent of the two-line patch to SearchQuery
in django/django#12727 , but that implementation has changed so drastically between Django 3.0 and master that I can't see how one relates to the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up until Django 3.1, it wasn't possible to pass in expressions as the query string: django/django@3baf92c#diff-81374e85d785546664c64c9ecb89b085
This reduces the amount of raw SQL in the PostgreSQL search module and makes it easier to follow (I think anyway).
The new
query.py
is taken from django/django#8313, I plan to add tests/docs for this and resubmit it to Django, so it will only live in Wagtail temporarily.