Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] New UI for listing object uses, including in RichText and StreamField. #4702

Closed

Conversation

gasman
Copy link
Collaborator

@gasman gasman commented Jul 24, 2018

Rebase of #4481, with the following fixes applied in response to the earlier code review there:

  • Better cleanup of rich text APIs (LinkHandler and EmbedHandler now deal purely with database-representation / front-end-HTML semantics, and hallo.js-specific legacy code remains elsewhere)
  • the 'find all objects' mode of ModelRichTextCollector.find_object is split into a separate find_all_objects method instead of overloading
  • Removed untested/undocumented logic for finding non-model values and renamed methods accordingly, e.g. StreamFieldCollector.find_values -> find_objects
  • Added special cases for MySQL to account for regexp engine differences

Still to do:

@gasman
Copy link
Collaborator Author

gasman commented Jul 25, 2018

Nested StreamBlock bug now fixed in 7ec418a.

@gasman gasman force-pushed the BertrandBordage-new-uses-ui-rebase-2 branch from 7ec418a to 19952a6 Compare July 25, 2018 15:23

if not remaining_path:
# End of path reached; return model instances found within `value`
if isinstance(current_block, RichTextBlock) and value is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt anyone will actually do it, but what if someone writes a block type returning a RichText object? It will not be handled by this line, while checking for the value will handle such cases. This is why I originally checked the data types instead of the block types.

Also, is it possible to end up with a None value in a RichTextBlock, apart from a manual database rewrite?

(Making this review in the context of 19952a6)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We need to check against the block definitions for the non-leaf cases, so I was following the same pattern here for consistency... but checking the value type would indeed be more flexible. Will update...

I guess a RichTextBlock can't (currently) be None, but I think it's good to be cautious here.

@BertrandBordage
Copy link
Member

Looks great to me 😄
Thanks for taking over, Matt, and for the review. I know it was pretty hard to review, and I’m sorry about it, but it will be such an awesome addition to Wagtail 🙂

Later we’ll probably have to add a hook to allow users to add their own introspection rules, but that’s for another day!

else:
use = cls(obj, parent=parent, on_delete=on_delete)
if use in exclude:
if use not in originals:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BertrandBordage If multiple objects are being passed to get_all_uses, it's possible for one object to be in the originals list and also part of a CASCADE chain for another of those objects, isn't it? If so, we can't just skip over this logic for objects in originals.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...in fact, what's the purpose of this line at all? As far as I can see, the only reason why we might want to give originals special treatment is that they exist in the exclude list as plain instances rather than Use records, and it may or may not be a good thing to populate Use.parent with a plain model instance rather than a Use.

Given that the parent attribute of Use doesn't appear to be used, tested or documented anywhere, I'm inclined to remove it completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasman, to answer your questions (sorry in advance 🙁):

  • originals and exclude always contain Use instances, not Model instances.
  • Use.parent is used in Use.__init__. Each Use instance uses the parent attribute of their ancestors to define its depth. I agree that it could be removed and increment parent.depth instead. However, I still think it’s a good thing to keep a parent attribute in a tree structure.
  • That whole if use in exclude: if use not in originals: […] has a single purpose: work around the messy nested structure returned by Django’s NestedObjects. It can return something like this for an object o1: [o1, [o5], o2, [o4], o3, [o1]] (note that I’m simplifying because the example doesn’t contain multitable inheritance, which is correctly handled by this method). In that case:
    • o2, o3 and o5 are directly related to o1. Yes, it doesn’t seem to make sense because o5 is nested while o2 and o3 are at the same level as o1. This is the main cause for the complexity of this method. [o5] is considered as a list of children of o1 because it’s following o1. o2 and o3 are at a different level because they also contain related objects. I know, this is an awful tree structure, but that’s what we have to deal with.
    • o4 is related to o2. We want to list it, but unlike o5 we have to remember that o4 is contained in o2, whereas o5 was considered at the root of our uses tree structure. So this means we had to remember the previous item, o2, as the parent for the list of children, and therefore we store it in main_use. Before that, we search for other_use, the first instanciation of o2 in case there is any other. This is to avoid using extra RAM because millions of uses can be found for a single deletion…
    • o1 is related to o3, but we don’t want to list o1 since we already know we want to delete it. Since o1 is already in exclude from get_all_uses, it will not be shown. To link o3 to o1, we defined the variable main_use containing the use preceding the list [o1].
  • It’s not possible for one object from originals to be in the CASCADE chain of another object. NestedObjects can only return an object once as a non-leaf node. For example, you couldn’t get [o1, [o2, [o1, [o3]]]]. There are two reasons for that, first, obviously to avoid infinite recursion, and second, because of the SQL optimisations done by NestedObjects to avoid triggering too many requests (even though it’s far from being optimised ☹️).

TLDR:
Without if use not in originals: (wrong because it shouldn’t show a chevron because nothing is contained)
image
With if use not in originals: (right solution)
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we can also get this kind of structure as well: [o1, [o2, [o4], o3, [o1], o5]]. It depends on the case and I couldn’t make sense out of it, it’s either a Django bug or an inconsistency due to some internal way of running the requests. From what I remember and what I’ve tested right now, it seems to depend on the relation type: OneToOneField and GenericForeignKey seem to give the first structure while ForeignKey and ManyToManyField give the second. Maybe I misunderstood, but anyway the difference of structure is pointless to us: we want to show all these relations in the same way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, but can this go in code comments please? :-)

Also, could we perhaps remove the Use(foo) == foo behaviour from Use.__eq__? I noticed that the tests are relying on that behaviour, but I have no idea if any other code is - and it's yet another thing that's making this PR extremely difficult to review, because every time I see a comparison between Use objects, I have to stop and ask myself if it's doing the obvious thing or whether there's some secret type-hacking going on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, no problem with removing the ability to compare Use and Model instances.

Well, you see the problem with putting all that as code comments x)
Tricky to do such complex explanations while keeping the code readable. Do you want me to add a docstring+a few inline comments in this method? I can make a commit in my old branch that you could cherry-pick.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please!

if exclude is None:
exclude = set()
main_use = None
for i, obj in enumerate(nested_list):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i isn't being used, so this can be simplified to for obj in nested_list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Obviously some vestigial code.

@gasman
Copy link
Collaborator Author

gasman commented Jul 26, 2018

Going to have to postpone this for another release :-( The logic around cascading deletions, wrangling the output of NestedObjects into Use records and using exclude lists to de-duplicate the output is still too tangled and undocumented, and it looks like that has a bearing on the view-level logic, so getting to the end of collectors.py isn't going to be the "light at the end of the tunnel" I thought it was... unfortunately I can't spend any more time on this without excessively delaying the 2.2 release.

if self.is_root:
return html
return format_html('{}<i class="icon icon-arrow-right"></i> {}',
(' ' * self.depth * 8), html)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this part got lost somehow, maybe during a merge: the space should be a non-breaking space, I’m sure I originally put it. @gasman please include it again, but with an explicit unicode code: '\u00A0'.

EDIT: It’s one of my merges that lost it 😦

@BertrandBordage
Copy link
Member

BertrandBordage commented Aug 7, 2018

@gasman, you can cherry-pick the latest commits I added to this branch: https://github.com/BertrandBordage/wagtail/commits/BertrandBordage-new-uses-ui-rebase-2
i.e. bbbe75d, e8ca0bd, edc723d, 456c82e & 509e574

They contain all the changes we’ve discussed 😉

@gasman gasman force-pushed the BertrandBordage-new-uses-ui-rebase-2 branch from 4b17017 to a80ba37 Compare August 22, 2018 16:43
@gasman
Copy link
Collaborator Author

gasman commented Aug 22, 2018

Rebased with latest fixes cherry-picked.

# When ``use`` is in ``originals``, we don’t want
# to mark it as parent of the next item because we want
# to display the next item as root, otherwise the first
# level of deleted objects would be shown nested.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BertrandBordage OK, let me see if I've got this right: this logic comes into play when nested_list is a structure such as:

[a, [b], b, [c]]

where:

  • b is one of the items from originals
  • b appears in the child list of an earlier item
  • the top-level occurrence of b in the list is immediately followed by a sub-list

In this case, we do not set parent_use to b - it gets left at its existing value, which is a, and so c ends up being assigned as a child of a. Is that what you intended to happen? Is this actually a legitimate way that NestedObjects might represent c being a child of a, and are we relying on the fact that if c was meant to be a child of b, the list would look like [a, [b, [c]], b, [c]] (in which case the second c would be skipped along with the second b)? Ouch...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasman No, c will not be marked as a child of a because a is also in originals. So for c, parent_use is None. This makes sense, because when you want to see what’s related to a and b, you only want c to be listed as a root node. You do not want to display a or b.

In other words, the algorithm answers to:

What is related to a & b?

with

c

c is correctly meant to be a child of b, and this algorithm takes it into account: if b was not in originals, then b would be a root node and c would be a child of b.

In other words, the algorithm answers to:

What is related to a?

with

b
- c

You might want to tell me “I didn’t mention a was in originals”. But that’s necessarily the case: NestedObjects returns a structure where the order is important. The first root nodes are always from originals, that’s why this iteration logic remembering the previous root node works.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BertrandBordage Ah, I missed the detail that exclude always contains the originals whenever we call this - lines 565-566 (which are never used?) led me to understand that the initial state was the empty set. Could we make exclude a required argument, or at least change those lines to

if exclude is None:
    exclude = set(originals)

so that it's not misleading?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can indeed add it. I kept it consistent with from_flat_iterable, but that was probably not the best decision.

# to mark it as parent of the next item because we want
# to display the next item as root, otherwise the first
# level of deleted objects would be shown nested.
if use in originals:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BertrandBordage Would it be valid to set parent_use = None at this point? I think this is what's hurting my head so much about this code 😀 : the fact that parent_use is being left at its previous value in this case, which might be something left over from way earlier in the iteration. I suspect you're relying on an unspoken rule about the behaviour of NestedObjects which ensures that parent_use is only ever None at this point: i.e. it will never give you a situation like

originals = exclude = {a, b}
nested_list = [a, c, b, [d]]

where a non-original (c) appears at the top level before all the originals have been output. If so, setting parent_use = None would make the intention clearer, and save us from having to figure out whether that rule is really true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I’m indeed relying on that order rule, as I mentioned in my previous message.
You’re totally right, we should explicitly add another parent_use = None here to make it extra clear.
I tried too much to do an optimal algorithm…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry btw for the headache ☹️

@gasman gasman force-pushed the BertrandBordage-new-uses-ui-rebase-2 branch from 5c824e4 to bb84a96 Compare June 5, 2019 14:59
@gasman
Copy link
Collaborator Author

gasman commented Jun 5, 2019

@solarissmoke has fixed the query counts in bb84a96, but combining this PR with a couple of other patches (#5266, #5286, #5353, none of which change the test fixtures AFAICS) for a custom build has caused them to change again. Not sure what's causing the variation, but that'll definitely need to be dealt with before this can be merged.

gasman added a commit to nypublicradio/wagtail that referenced this pull request Jun 5, 2019
gasman added a commit to nypublicradio/wagtail that referenced this pull request Jun 5, 2019
See wagtail#4702 (comment) - they can't be relied on to remain stable when combined with other PRs
@BertrandBordage
Copy link
Member

Alright, I will get back to this cursed PR this summer ^^"

@squash-labs
Copy link

squash-labs bot commented Jun 5, 2019

Manage this branch in Squash

Test this branch here: https://bertrandbordage-new-uses-ui-re-g14rq.squash.io

@bruecksen
Copy link
Contributor

@BertrandBordage do you think there will be any progress soon? One of my clients is wanting this feature and I'm wondering if I should invest any time to make it working or wait until this PR is finished and shipped in a future version.

Copy link
Contributor

@sir-sigurd sir-sigurd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be customizable somehow. For example, is_shown_in_uses() and get_edit_url() can't be easily set on third-party models, also at work we'd like to PROTECT instances found in StreamFields and RichTextFields.

I think we could convert get_all_uses() to a class, set ModelRichTextCollector, StreamFieldCollector and Use as its attributes and make this new class overridable via setting.

{% block titletag %}{% trans "Delete image" %}{% endblock %}

{% block content %}
{% trans "Delete image" as del_str %}
{% include "wagtailadmin/shared/header.html" with title=del_str icon="image" %}
{% include "wagtailadmin/shared/header.html" with title=del_str subtitle=image icon="image" %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change should be moved in another PR.

<input type="submit" value="{% trans 'Yes, delete' %}" class="button serious" />
</form>
{% if uses.are_protected %}
<p>{% trans 'Impossible to delete: this object is referenced by other objects through protected relations.' %}</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to move this line into shared template. Also might make sense to make it more prominent with class="help-block help-critical".

Comment on lines +229 to +230
for field in self.fields:
filters |= Q(**{field.attname + '__regex': pattern})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should generate pattern for each field separately considering features they are using, i.e. not search for Document in RichTextField(features=('bold', 'italic'))

{% csrf_token %}
<input type="submit" value="{% trans 'Yes, delete' %}" class="button serious" />
<a href="{% if next %}{{ next }}{% else %}{% url 'wagtailsnippets:list' model_opts.app_label model_opts.model_name %}{% endif %}" class="button button-secondary">{% trans "No, don't delete" %}</a>
</form>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{% include 'wagtailadmin/shared/uses.html' %} should be added here.

@gasman
Copy link
Collaborator Author

gasman commented Jun 17, 2020

Now rebased at #6129 - will keep both PRs open for now so that @sir-sigurd's feedback (thanks!) doesn't end up getting lost in transmission... Copying my comment from there:

Right now I think the main thing standing in the way of merging this is that it really needs refactoring to be less "monolithic". As it stands, the logic in collectors.py is touching on the internal details of many areas of Wagtail (rich text, streamfields, foreign key on_delete behaviour...) which is going to make it hard to maintain - for example, if we extend rich text to support other kinds of object references beyond linktype and embedtype (see #4223) then it's not immediately clear what needs to change in collectors.py.

I think the key to that will be making the concept of "a collector" into a well-defined internal API, so that whenever we have a field type that can contain object references (which currently includes ForeignKey, RichTextField and StreamField), there's a corresponding Collector class that implements a few standard methods for querying those references. After the previous review on #4702 (which teased out a lot of the internal workings of collectors.py and added docstrings) we hopefully have a fairly good idea of what those methods will be - something along the lines of:

  • my_collector = RichTextCollector(model_class, field_name) - constructor for a collector that can find object references in one particular field of one particular model
  • my_collector.find_candidate_references(obj) - return a queryset of objects (of type model_class) that may contain references to the given object, within the field under consideration. This will be done through a fast query (e.g. a regexp match) that may return false positives; these will subsequently be verified individually via get_object_references.
  • my_collector.get_object_references(instance) - given an instance of model_class, return a list of (model_class, id, reference_type) tuples for all object references found in the field under consideration. reference_type will be an identifier similar to the on_delete parameter which indicates what the effect of a deletion will be (cascade, protect, set null, ignore etc).

Given how rich text and streamfields interact in RichTextBlock, it may also be necessary to add some lower-level methods to this API, such as "return a database-specific regexp for matching an object reference".

Hopefully splitting the logic up in this way will also help to make our tests more tightly-scoped, and pin down what's causing the changing query counts.

Base automatically changed from master to main March 3, 2021 17:08
yield obj, found_obj


class StreamFieldCollector:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gasman Do you think we could simplify this if we could switch stream fields to be based on JSONField?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, although then we still have to deal with RichTextBlocks, which means we'll end up mixing two techniques (doing regexp matches on items within a JSON field, rather than just one big regexp).

@gasman
Copy link
Collaborator Author

gasman commented Oct 5, 2022

An alternative mechanism for reference-counting has now been implemented in #9279. Rebuilding the deletion confirmation page from this PR on top of that mechanism should be a relatively small task, at which point this can be considered resolved.

@laymonage
Copy link
Member

Superseded by #10072. Instead of showing the references on the delete confirmation view, we're reusing the existing usage view and adding the on delete information there if you access it through a link on the delete confirmation view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants