API for performantly bulk-creating pages programmatically #11480
Manage this branch in Squash. Test this branch here: https://nxpy123feature10833-api-bulk-c-y7omf.squash.io
@ababic does slug conflict matter at different depths of the tree? For example, if there's an object with the path |
I understood your comments earlier but I have a few doubts regarding them:
Hi @NXPY123.
wagtail/models/__init__.py
Outdated
if kwargs.get("clean", True):
    page.full_clean()

page_post_details["slug_changed"] = False
bulk_create() should only be used to create new items, so this shouldn't be needed (We shouldn't ever modify supplied slugs either)
I'm so sorry, the CustomManager was something I tested out earlier. It didn't function and I forgot to remove it before committing. Since it was recommended that we don't have to replicate everything, I didn't look further into fixing it. The only changes I intended were bulk_add_children and _check_unique.
wagtail/models/__init__.py
Outdated
page_post_details["slug_changed"] = False
page_post_details["is_new"] = page.id is None

if page_post_details["is_new"]:
Surely this would be all objects? bulk_update() should be used for updates.
Can you please check the |
@NXPY123 I'm not sure this is a direction that we can continue with unfortunately (building on top of |
I've made changes to use |
Hi @NXPY123. Thanks for sticking with it. Yes, it definitely won't work as it is. We need to replicate what |
In the django-treebeard docs they suggest not changing the |
@NXPY123 it's good to be cautious. I expect the reason the docs say that is to keep you using the documented APIs, because what happens underneath is quite complicated. However, I don't see how using those methods can work in a bulk creation context, because each use involves at least one write to the database (one of the key aims of this feature is to avoid that). |
In the django-treebeard code, when the children have a sorted order set in the tree, they use this to find the location to add the child and get the path:

    if self.pos == 'sorted-sibling':
        siblings = self.node.get_sorted_pos_queryset(
            self.node.get_siblings(), newobj)
        try:
            newpos = siblings.all()[0]._get_lastpos_in_path()
        except IndexError:
            newpos = None
        if newpos is None:
            self.pos = 'last-sibling'
    else:
        newpos, siblings = None, []

    _, newpath = self.reorder_nodes_before_add_or_move(
        self.pos, newpos, self.node.depth, self.node, siblings, None,
        False)

I'm kinda stuck trying to figure out how to get newpath for each child when we're bulk adding them. Since we're not committing and updating the queryset each time we add a child, the paths can conflict: the function will return the same newpath for every new child that is determined to have the same position in the queryset. I think converting the queryset to a list would help append the children as we create them but |
Hi @NXPY123. I think we can ignore the 'sorted-sibling' case in your code sample ( |
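To make the unordered case concrete: treebeard's materialised paths are fixed-width base-36 steps appended to the parent's path, so the paths for a whole batch of new children can be worked out in memory before anything is written. Below is a minimal standalone sketch of that idea, not the PR's code. The helper names int_to_step and assign_child_paths are hypothetical, and the alphabet, step length of 4 and 255-character path limit are assumed to match treebeard's defaults.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed: treebeard's default alphabet
STEPLEN = 4  # assumed: characters used per tree level


def int_to_step(num: int) -> str:
    """Encode a non-negative integer as a zero-padded base-36 step."""
    digits = ""
    while num:
        num, rem = divmod(num, len(ALPHABET))
        digits = ALPHABET[rem] + digits
    if len(digits) > STEPLEN:
        raise ValueError("Step does not fit in STEPLEN characters")
    return digits.rjust(STEPLEN, ALPHABET[0])


def assign_child_paths(parent_path, last_child_path, count, max_length=255):
    """Return `count` new child paths without touching the database.

    `last_child_path` is the path of the parent's current last child,
    or None if the parent has no children yet.
    """
    if last_child_path:
        # Continue numbering after the last existing child.
        next_step = int(last_child_path[-STEPLEN:], len(ALPHABET)) + 1
    else:
        next_step = 1
    paths = []
    for offset in range(count):
        path = parent_path + int_to_step(next_step + offset)
        if len(path) > max_length:
            raise ValueError("Path %r exceeds the maximum length" % path)
        paths.append(path)
    return paths


# Three new children under a parent whose last child ends in step 0003.
new_paths = assign_child_paths("00010001", "000100010003", 3)
```

Because each step is derived from a running counter rather than from a fresh query, two children in the same batch can never receive the same path. The sorted-sibling case is deliberately left out.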
Can you please check the changes I've made? I haven't checked the code when the children are ordered, only the case where it's unordered. |
wagtail/models/__init__.py
Outdated
node.path = self.__class__._get_path(self.path, node.depth, step)
step += 1
if len(node.path) > max_length:
    raise ValidationError(
Nice work on the handling of this. This message is nice and helpful!
This is coming along nicely. In addition to the points already raised, it would be good to start thinking about:
- Adding tests
- Moving the logic to the Manager class instead of implementing at the Model level
wagtail/models/__init__.py
Outdated
self.numchild += 1
node._cached_parent_obj = self

self.save()
I think it might be slightly clearer overall to keep this as a 'preparation' step, and leave saving up to a different method. What do you think?
I think that's a great suggestion. It'll help separate the preparation step and the saving step. Since we're loading the children to the db and saving changes to the parent page I think we can keep them in a separate method.
wagtail/models/__init__.py
Outdated
the page to the database using their inbuilt tree functionality.
"""

if self.node_order_by and not self.is_leaf():
We should be able to remove this case for page types, since node_order_by is not set by default, and overriding it isn't supported by Wagtail.
wagtail/models/__init__.py
Outdated
slugs.append(candidate_slug)

if child.locale_id is None:
    child.locale = self.get_default_locale()
Localised content tends to live in its own part of the tree, so inheriting the parent page's locale might be a better option here
wagtail/models/__init__.py
Outdated
bulk_add_children can only be used to add new pages, not to update existing pages"
)
# Get the list of existing children to check for duplicate slugs
existing_pages = self.get_children()
Since this value isn't used for anything else, how about just pulling the slugs out of the database in _check_unique()? Also, if slug values are all we are interested in, we should probably use values_list() with flat=True, so that we're only fetching what we need.
@@ -3468,12 +3627,14 @@ def locked_for_user(self, obj, user):

def user_can_lock(self, obj, user):
    """Returns True if a user who would not normally be able to lock the object should be able to if the object is currently on this task.
    Note that returning False does not remove permissions from users who would otherwise have them."""
    Note that returning False does not remove permissions from users who would otherwise have them.
Let's try to keep unrelated changes out of the PR if possible. It may help to disable any editor-level auto formatting when working on different projects.
I'm sorry about that. I'm not sure why this happened. I only used Ruff and Black for formatting because the checks usually fail when it's not formatted properly. Could it be because of that?
The black formatter is reformatting the file and causing this. But if I don't reformat the file won't the checks fail?
wagtail/models/__init__.py
Outdated
"Duplicate slugs in use within the parent page at '%(parent_url_path)s'"
)
% {
    "parent_url_path": self.path,
Should this be self.url_path?
Yes! Thank you for catching that! I got confused between path and url_path.
wagtail/models/__init__.py
Outdated
Load the pages into the database and save the parent page in the database
"""

pages = Page.objects.bulk_create(children)
Using Page.objects.bulk_create() here suggests we only want to create generic Page objects, but that is not the case, is it? The type of each child should determine which manager is used here.
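One way to read this suggestion is to group the children by their concrete class first and then hand each group to that class's own manager. Here is a minimal sketch of the grouping step only. group_by_type is a hypothetical helper, and BlogPage and EventPage are plain stand-in classes rather than real Wagtail models.

```python
from collections import defaultdict


class BlogPage:
    """Stand-in for a specific page type (not a real model)."""


class EventPage:
    """Stand-in for another specific page type (not a real model)."""


def group_by_type(objects):
    """Group objects by their concrete class, preserving input order."""
    groups = defaultdict(list)
    for obj in objects:
        groups[type(obj)].append(obj)
    return dict(groups)


children = [BlogPage(), EventPage(), BlogPage()]
groups = group_by_type(children)

# With real models, each group could then go to its own manager, e.g.:
#     for model, instances in groups.items():
#         model.objects.bulk_create(instances)
# This assumes bulk_create() works for that model, which is not a given
# for multi-table inherited models.
```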
Hi @NXPY123, I've left a little more feedback. Please continue with moving the changes to the Manager and adding tests. Things tend to change slightly during that process, so it will be better for me to review again once everything is finished. Please also consider marking this PR as a draft until you are finished, as it helps other contributors to know that it's with you, and you aren't waiting on anyone else. |
wagtail/models/__init__.py
Outdated
# Add the slugs of the current children
slugs.extend(self.get_children().values_list("slug", flat=True))

if len(slugs) != len(set(slugs)):
Since we're converting the list to a set here, why don't we save a reference to that set and use it in the code below, instead of the list? Remember: sets are difficult to beat when it comes to membership checks.
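As a rough illustration of the set-based approach, here is a small sketch that keeps a single set of seen slugs and reports any clashes. find_duplicate_slugs is a hypothetical helper, not part of the PR, and the slugs are plain strings standing in for values fetched from the database.

```python
def find_duplicate_slugs(new_slugs, existing_slugs):
    """Return the new slugs that clash with existing ones or with each other."""
    seen = set(existing_slugs)  # membership checks on a set are O(1) on average
    duplicates = set()
    for slug in new_slugs:
        if slug in seen:
            duplicates.add(slug)
        seen.add(slug)
    return duplicates


clashes = find_duplicate_slugs(["about", "news", "about"], ["contact"])
```

Returning the clashing slugs, rather than just comparing lengths, also makes it easy to name them in the error message.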
Co-authored-by: Andy Babic <andyjbabic@gmail.com>
Co-authored-by: Andy Babic <andyjbabic@gmail.com>
I've added tests. When I went through the
Will these cause issues when implementing the functionality? |
@NXPY123 Ahh, that second point sounds like it could be problematic. Honestly, I find that quite surprising with how advanced the API is in other areas. Maybe it just hasn't been in-demand enough. |
Hi @NXPY123. Just to put you on the right track, I was imagining that this would be baked into the existing SimplePage.objects.bulk_create(simplepage_list, parent_page) It sounds like we might be blocked on being able to utilise Django's default implementation, but that doesn't necessarily need to be the end of the story... I would suspect there is a workable solution out there in the wild somewhere. |
I thought about it but wouldn't the |
I found examples for workarounds here: https://stackoverflow.com/questions/49826482/django-valueerror-cant-bulk-create-a-multi-table-inherited-model Can we use any of these? |
Well, yes, but that is really the point! 😀 Bulk creation is typically done via a model's default Manager, so why should this be any different? In my view at least, the bulk creation logic isn't important enough to warrant invention of new concepts. I believe developers would expect it to work as closely to the default
Your first step really needs to be writing tests that fail due to this limitation. If you're not getting a clear error about multi-table inheritance not being supported, then clearly things aren't working the way you're intending. Once you're at the point where you're sure that's where the issue is, you'll be in a good place to experiment and see if you can get past it (we're not quite there yet, though). |
I've put the method in the |
@ababic sorry I couldn't work on this for a while. Are the changes made fine? Is there anything to add to the docs? |
Fixes #10833. Adds an API for bulk-creating children.
make lint
from the Wagtail root.