Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for performantly bulk-creating pages programatically #11480

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from
Draft
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
173 changes: 168 additions & 5 deletions wagtail/models/__init__.py
Expand Up @@ -1299,6 +1299,164 @@
def __str__(self):
return self.title

def _process_sibling(self, nodes, parent):
"""
Create child nodes according to the node_order_by attribute.
"""

newpositions = []

Check warning on line 1307 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1306-L1307

Added lines #L1306 - L1307 were not covered by tests
siblings_so_far = self.get_siblings()
for i, node in enumerate(nodes):
pos = self._prepare_pos_var_for_add_sibling("sorted-sibling")
node.depth = self.depth
siblings = self.get_sorted_pos_queryset(siblings_so_far, node)
try:
newpos = siblings.all()[0]._get_lastpos_in_path()
except IndexError:
newpos = None
pos = "last-sibling"
newpositions.append(newpos)

Check warning on line 1318 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1309-L1318

Added lines #L1309 - L1318 were not covered by tests
_, newpath = self.reorder_nodes_before_add_or_move(
pos, newpos, self.depth, self, siblings, None, False
)

parent.numchild += 1

Check warning on line 1323 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1322-L1323

Added lines #L1322 - L1323 were not covered by tests
node.path = newpath
# Add node to siblings_so_far. Make node a queryset to union with siblings_so_far

Check warning on line 1325 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1325

Added line #L1325 was not covered by tests
siblings_so_far = siblings_so_far.union(siblings)

Check warning on line 1327 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1327

Added line #L1327 was not covered by tests
return nodes

def _process_leaf(self, nodes):
"""
Process children to add to a leaf node
"""
# Get number of children

step = 1

Check warning on line 1336 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1335-L1336

Added lines #L1335 - L1336 were not covered by tests
max_length = self.__class__._meta.get_field("path").max_length
for node in nodes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use enumerate() here to handle the step value for you

node.depth = self.depth + 1
node.path = self.__class__._get_path(self.path, node.depth, step)

Check warning on line 1340 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1338-L1340

Added lines #L1338 - L1340 were not covered by tests
step += 1
if len(node.path) > max_length:

Check warning on line 1342 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1342

Added line #L1342 was not covered by tests
raise ValidationError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on the handling of this. This message is nice and helpful!

"The new node is too deep in the tree, try \
increasing the path.max_length property \
and UPDATE your database"
)
self.numchild += 1

Check warning on line 1348 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1347-L1348

Added lines #L1347 - L1348 were not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a nice touch if in-memory changes were also reverted in the event of an error.

node._cached_parent_obj = self

Check warning on line 1350 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1350

Added line #L1350 was not covered by tests
self.save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be slightly clearer overall to keep this as a 'preparation' step, and leave saving up to a different method. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a great suggestion. It'll help separate the preparation step and the saving step. Since we're loading the children to the db and saving changes to the parent page I think we can keep them in a separate method.


Check warning on line 1352 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1352

Added line #L1352 was not covered by tests
return

def _process_unordered_children(self, nodes):
"""
Create child nodes without any specific order.
"""
last_child = self.get_last_child()

Check warning on line 1359 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1358-L1359

Added lines #L1358 - L1359 were not covered by tests
max_length = self.__class__._meta.get_field("path").max_length
for node in nodes:
node.depth = self.depth + 1
node.path = last_child._inc_path()

Check warning on line 1363 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1361-L1363

Added lines #L1361 - L1363 were not covered by tests
last_child = node
if len(node.path) > max_length:

Check warning on line 1365 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1365

Added line #L1365 was not covered by tests
raise ValidationError(
"The new node is too deep in the tree, try \
increasing the path.max_length property \
and UPDATE your database"
)
self.numchild += 1

Check warning on line 1371 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1370-L1371

Added lines #L1370 - L1371 were not covered by tests
node._cached_parent_obj = self

Check warning on line 1373 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1373

Added line #L1373 was not covered by tests
self.save()

Check warning on line 1375 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1375

Added line #L1375 was not covered by tests
return

def _process_child_nodes(self, nodes):
"""
Process the child nodes of this page, setting their paths and depths
as appropriate and also updating the parent's numchild. This is performed by the treebeard library when saving
the page to the database using their inbuilt tree functionality.
"""

if self.node_order_by and not self.is_leaf():

Check warning on line 1385 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1385

Added line #L1385 was not covered by tests
Copy link
Contributor

@ababic ababic Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to remove this case for page types, since node_order_by is not set by default, and overriding it isn't supported by wagtail

return self.get_last_child()._process_sibling(nodes, self)

if self.is_leaf():

Check warning on line 1388 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1388

Added line #L1388 was not covered by tests
return self._process_leaf(nodes)
Copy link
Contributor

@ababic ababic Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming things is hard, and a lot of these methods could benefit from a bit of a rethink. In the context of these additions, they might seem fine, but when you consider the codebase on the whole, names benefit from more detail to distinguish them from existing (or future) similarly named methods.

See how you go with moving the functionality to the Manager first of all, then we can look at the naming in future.

Tip for future: Discussions about naming can often derail an otherwise non-controversial PR, demotivate everybody involved in the process, and reducing the likelihood of changes being merged. You can side-step such problems early on by keeping your additions together in a single method, and using things like comments to explain better what is happening (it also makes code easier to review). It can always be split out at the end if deemed necessary!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip! I'll keep that in mind 🙌

I didn't know how naming methods would impact the codebase so I didn't think as well as I should've before adding it. Sorry about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! learning is all part of the process :)


Check warning on line 1390 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1390

Added line #L1390 was not covered by tests
return self._process_unordered_children(nodes)

def _check_unique(self, children, existing_pages=None):
if not existing_pages:

Check warning on line 1394 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1394

Added line #L1394 was not covered by tests
existing_pages = []
# Extract slugs from the children
# Children is a list of Page objects
slugs = [
child.slug
for child in children + list(existing_pages)
if hasattr(child, "slug") and child.slug
]
if len(slugs) != len(set(slugs)):

Check warning on line 1403 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1403

Added line #L1403 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're converting the list to a set here, why don't we save a reference to that set and use it in the code below, instead of the list? Remember: Sets are difficult to beat when it comes to membership checks

raise ValidationError(
{
"slug": _(
"Duplicate slugs in use within the parent page at '%(parent_url_path)s'"
)
% {
"parent_url_path": self.path,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be self.url_path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you for catching that! I got confused between path and url_path.

}
}
)
for child in children:
if not hasattr(child, "slug") or not child.slug:
NXPY123 marked this conversation as resolved.
Show resolved Hide resolved
candidate_slug = slugify(child.title, allow_unicode=True)

Check warning on line 1416 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1415-L1416

Added lines #L1415 - L1416 were not covered by tests
suffix = 1
while candidate_slug in slugs:
suffix += 1
candidate_slug = "%s-%d" % (candidate_slug, suffix)
child.slug = candidate_slug

Check warning on line 1421 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1418-L1421

Added lines #L1418 - L1421 were not covered by tests
slugs.append(candidate_slug)

if child.locale_id is None:
child.locale = self.get_default_locale()

Check warning on line 1425 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1424-L1425

Added lines #L1424 - L1425 were not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Localised content tends to live in its own part of the tree, so inheriting the parent page's locale might be a better option here

return

@transaction.atomic
def bulk_add_children(self, children):
"""
Add multiple pages as children of this page.

This calls bulk_create on the Page model, so it bypasses the save method and does not
"""

# Check if the children are in adding state
for child in children:
if not child._state.adding:

Check warning on line 1438 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1438

Added line #L1438 was not covered by tests
raise ValueError(
"Attempted to add a tree node that is \
already in the database.\
bulk_add_children can only be used to add new pages, not to update existing pages"
)
# Get the list of existing children to check for duplicate slugs

Check warning on line 1444 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1444

Added line #L1444 was not covered by tests
existing_pages = self.get_children()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this value isn't used for anything else, how about just pulling the slugs out of the database in _check_unique()? Also, If slug values are all we are interested in, we should probably use values_list() with flat=True, so that we're only fetching what we need.


try:
self._check_unique(children, existing_pages)
except ValidationError as e:

Check warning on line 1449 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1446-L1449

Added lines #L1446 - L1449 were not covered by tests
raise e

# Process the child nodes

Check warning on line 1452 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1452

Added line #L1452 was not covered by tests
self._process_child_nodes(children)

# Load the pages into the database

Check warning on line 1455 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1455

Added line #L1455 was not covered by tests
pages = Page.objects.bulk_create(children)

Check warning on line 1457 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L1457

Added line #L1457 was not covered by tests
return pages

@property
def revisions(self):
# Always use the specific page instance when querying for revisions as
Expand Down Expand Up @@ -3456,7 +3614,8 @@

def user_can_access_editor(self, obj, user):
"""Returns True if a user who would not normally be able to access the editor for the object should be able to if the object is currently on this task.
Note that returning False does not remove permissions from users who would otherwise have them."""
Note that returning False does not remove permissions from users who would otherwise have them.
"""

Check warning on line 3618 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L3618

Added line #L3618 was not covered by tests
return False

def locked_for_user(self, obj, user):
Expand All @@ -3468,12 +3627,14 @@

def user_can_lock(self, obj, user):
"""Returns True if a user who would not normally be able to lock the object should be able to if the object is currently on this task.
Note that returning False does not remove permissions from users who would otherwise have them."""
Note that returning False does not remove permissions from users who would otherwise have them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to keep unrelated changes out of the PR if possible. It may help to disable any editor-level auto formatting when working on different projects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry about that. I'm not sure why this happened. I only used Ruff and Black for formatting because the checks usually fail when it's not formatted properly. Could it be because of that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The black formatter is reformatting the file and causing this. But if I don't reformat the file won't the checks fail?

"""

Check warning on line 3631 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L3631

Added line #L3631 was not covered by tests
return False

def user_can_unlock(self, obj, user):
"""Returns True if a user who would not normally be able to unlock the object should be able to if the object is currently on this task.
Note that returning False does not remove permissions from users who would otherwise have them."""
Note that returning False does not remove permissions from users who would otherwise have them.
"""

Check warning on line 3637 in wagtail/models/__init__.py

View check run for this annotation

Codecov / codecov/patch

wagtail/models/__init__.py#L3637

Added line #L3637 was not covered by tests
return False

def get_actions(self, obj, user):
Expand Down Expand Up @@ -4288,7 +4449,8 @@
@transaction.atomic
def cancel(self, user=None, resume=False, comment=""):
"""Cancel the task state and update the workflow state. If ``resume`` is set to True, then upon update the workflow state
is passed the current task as ``next_task``, causing it to start a new task state on the current task if possible"""
is passed the current task as ``next_task``, causing it to start a new task state on the current task if possible
"""
self.status = self.STATUS_CANCELLED
self.finished_at = timezone.now()
self.comment = comment
Expand All @@ -4305,7 +4467,8 @@

def copy(self, update_attrs=None, exclude_fields=None):
"""Copy this task state, excluding the attributes in the ``exclude_fields`` list and updating any attributes to values
specified in the ``update_attrs`` dictionary of ``attribute``: ``new value`` pairs"""
specified in the ``update_attrs`` dictionary of ``attribute``: ``new value`` pairs
"""
exclude_fields = (
self.default_exclude_fields_in_copy
+ self.exclude_fields_in_copy
Expand Down