
Avoid N+1 in users Index view #10477

Merged
merged 1 commit on Jul 17, 2023

Conversation

Tijani-Dia
Collaborator

@Tijani-Dia Tijani-Dia commented May 24, 2023

I found an N+1 coming from the wagtail_userprofile field in the users' index view.
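For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what an N+1 looks like. FakeDB and the field names are hypothetical stand-ins for the ORM; in the real view, the fix is to fetch the related wagtail_userprofile together with the user list (Django's select_related does this via a JOIN) so the per-row queries disappear:

```python
# Hypothetical, self-contained illustration of the N+1 pattern (no
# Django involved): one query fetches the user list, then each row
# triggers an extra query for its wagtail_userprofile.
class FakeDB:
    def __init__(self):
        self.query_count = 0

    def fetch_users(self):
        self.query_count += 1  # one query for the whole user list
        return [{"id": i} for i in range(5)]

    def fetch_profile(self, user_id):
        self.query_count += 1  # one query per user: the "+1" part
        return {"user_id": user_id}

db = FakeDB()
users = db.fetch_users()
profiles = [db.fetch_profile(u["id"]) for u in users]
assert db.query_count == 1 + len(users)  # 6 queries for 5 users
```

With an eager-loading fix, the profile data arrives in the initial query and the count stays at one regardless of the number of rows.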

@squash-labs

squash-labs bot commented May 24, 2023

Manage this branch in Squash

Test this branch here: https://tijani-diafeatureusers-index-n-4d5ig.squash.io

@@ -82,12 +82,13 @@ class Index(IndexView):
is_searchable = True
page_title = gettext_lazy("Users")

model_fields = set(f.name for f in User._meta.get_fields())
Member


This is a breaking change: if someone customises this with self.model_fields + [...], it will now raise a TypeError.

I'd suggest keeping this as a list, as it won't impact the performance of the view, nor the original N+1 fix.
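Concretely, the override pattern this comment has in mind would break as follows (a minimal sketch; the field names are made up):

```python
# A list-based model_fields supports extension via concatenation:
model_fields = ["username", "email"]
extended = model_fields + ["custom_field"]  # works: list + list
assert extended == ["username", "email", "custom_field"]

# If model_fields becomes a set, the same override raises TypeError:
model_fields = {"username", "email"}
try:
    model_fields + ["custom_field"]
    raised = False
except TypeError:  # unsupported operand type(s) for +: 'set' and 'list'
    raised = True
assert raised
```

A set-based attribute would force existing overrides to switch to set union (model_fields | {"custom_field"}), which is exactly the kind of silent API break being flagged.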

Collaborator Author

I couldn't think of a use case for modifying self.model_fields. Could you please give an example of when someone would need to do that?

it won't impact the performance of the view

Well, I did a quick check on a project I'm working on, and model_fields contains 37 entries:

>>> [f.name for f in get_user_model()._meta.get_fields()]
['field_1', 'field_2', 'customdocument', 'customimage', 'field_3', 'field_4', 'field_5', 'wagtail_userprofile', 'document', 'uploadeddocument', 'image', 'uploadedimage', 'locked_pages', 'owned_pages', 'wagtail_revisions', 'requested_workflows', 'finished_task_states', 'wagtail_admin_comments', 'comments_resolved', 'comment_replies', 'page_subscriptions', 'logentry', 'id', 'password', 'last_login', 'is_superuser', 'field_b', 'uuid', 'first_name', 'last_name', 'email', 'is_staff', 'is_active', 'date_joined', 'groups', 'user_permissions', 'index_entries']

In get_users_filter_query, we query the list 4 times, and we do the same 3 times in the get_queryset method (counting the if 'wagtail_userprofile' in self.model_fields line I've added). Most of the fields we look for are located near the end of the list or aren't in it at all (username, for example), so I think using a set is more efficient.

I'd agree that this is a negligible part of the work done in the view but why do work we don't need to do?

Member

For a list of ~30 items, the performance difference isn't going to be huge; in fact, I honestly doubt it's detectable (hashing is often slower than iteration for small lists). It's at significantly larger sizes that a set can make a difference: the in operator on a list is already pretty fast. This feels firmly in the realm of premature optimisation and extra noise for a PR.

Collaborator Author

I honestly doubt it's detectable (hashing is often slower than iteration for small length lists)

Python's strings cache their hash value. Moreover, all the strings in model_fields are interned at runtime, meaning their hash values will already have been computed by the time we reach this function.

When we iterate over the list, we still compare strings one by one, and that comparison first checks the strings' hash values. So we do at least as much work when using a list as when using a set.

I wrote a quick script to benchmark finding a list of strings in a larger list.

import random
import string
import timeit
import sys


def build_random_list_of_strings(n):
    return [
        "".join(random.choices(string.ascii_lowercase, k=random.randint(5, 25)))
        for _ in range(n)
    ]


def find_strings(strings, to_find):
    """Find a list of strings in a given collection."""
    return list(filter(strings.__contains__, to_find))


if __name__ == "__main__":
    # Obtain parameters from command line arguments
    if len(sys.argv) > 3:
        num_strings = int(sys.argv[1])
        num_to_find = int(sys.argv[2])
        num_to_not_find = int(sys.argv[3])
    else:
        num_strings = 37
        num_to_find = 6
        num_to_not_find = 1

    # Build a list and set of random strings
    strings_list = build_random_list_of_strings(num_strings)
    strings_set = set(strings_list)
    assert len(strings_list) == len(strings_set)

    # Build a list of strings to find
    to_find = random.choices(strings_list, k=num_to_find)
    to_find += build_random_list_of_strings(num_to_not_find)

    assert find_strings(strings_list, to_find) == find_strings(strings_set, to_find)

    print("List:")
    list_time = timeit.timeit(
        "find_strings(strings_list, to_find)",
        number=10000,
        globals=globals(),
    )
    print(f"Time taken: {list_time}\n")

    print("Set:")
    set_time = timeit.timeit(
        "find_strings(strings_set, to_find)",
        number=10000,
        globals=globals(),
    )
    print(f"Time taken: {set_time}\n")

    faster_structure, time_difference = (
        ("Set", list_time / set_time)
        if list_time > set_time
        else ("List", set_time / list_time)
    )
    print(f"{faster_structure} is {time_difference:.2f} times faster.")

When I run this script with the default values (which correspond to the setup we have for model_fields), using a set is around 4 times faster than using a list.

This feels firmly in the realm of premature optimisation and extra noise for a PR.

I'll be happy to remove it if others share that opinion. To me, however, this is more a case of using the right tool for the job.

Member

@laymonage laymonage Jun 5, 2023

Just so we don't argue over the less important bits of the PR, and in the spirit of making fewer changes that could potentially break user code, how about we just do model_fields = set(self.model_fields) at the beginning of get_queryset(), and use it for the rest of the method (including the get_users_filter_query call)?

That way, we take advantage of set's faster search performance (provided that this still holds if the set is freshly constructed every time), while still retaining the customisability of self.model_fields 🙂
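A sketch of that compromise (the class and field names here are simplified stand-ins, not the actual Wagtail code):

```python
class Index:
    # Stays a list on the class, so subclasses can still extend it
    # with model_fields = Index.model_fields + ["extra_field"].
    model_fields = ["username", "email", "wagtail_userprofile"]

    def get_queryset(self):
        # Convert once per call; every membership check below is
        # then O(1) instead of scanning the list repeatedly.
        model_fields = set(self.model_fields)
        select_profile = "wagtail_userprofile" in model_fields
        # ... build and return the queryset, passing model_fields to
        # helpers such as get_users_filter_query ...
        return select_profile

assert Index().get_queryset() is True
```

The conversion itself is O(n), so this only pays off because the method performs several lookups against the same collection; a single lookup would not justify it.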

Member

@laymonage laymonage left a comment

I'm going to apply my suggestions to settle the set vs. list debate and approve this, as I think this is a very nice improvement even without the set conversion.

@laymonage laymonage force-pushed the feature/users-index-n-plus-one branch from 313ccce to a6c9409 on July 17, 2023 16:23
@laymonage laymonage merged commit a6c9409 into wagtail:main Jul 17, 2023
0 of 3 checks passed
@Tijani-Dia
Collaborator Author

Thanks @laymonage. FWIW, I had some free time and thought it'd be nice to have a series of small PRs addressing the various N+1s and performance optimisations pointed out by dj-tracker when running the test suite. The debate raised here discouraged me from finishing the series, but there is still room for improvement.
