Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

innerText issues #1679

Open
11 of 14 tasks
zcorpan opened this issue Aug 16, 2016 · 15 comments
Open
11 of 14 tasks

innerText issues #1679

zcorpan opened this issue Aug 16, 2016 · 15 comments
Labels
compat Standard is not web compatible or proprietary feature needs standardizing

Comments

@zcorpan
Copy link
Member

zcorpan commented Aug 16, 2016

When working on #1678 I found the following issues.

cc @rocallahan

Getter

  • What should non-CSS UAs do? (45253f9)

  • Is "content order" for CSS boxes defined? (Step 2.5.) (Issue moved to [css-text] Define "content order" for innerText w3c/csswg-drafts#421 )

  • The rp check only works for direct children of rp, not descendants (e.g. <ruby><rp><span>(</span></rp>...).
    Do we want that to work? Or change the content model or rp to "text"?
    (Change <rp>'s content model to Text #1690)

  • Issue inline in the spec:

    This algorithm is amenable to being generalized to work on a ranges. Then we can use it as the basis for Selection's stringifier and maybe expose it directly on ranges. See Bugzilla bug 10583.

  • @annevk said:

    The only thing I'd consider changing further is making the recursion less declarative. Have that be some algorithm that is invoked for each child and also put the result in a variable of some kind the rest of the algorithm uses.

    @zcorpan said:

    Yeah, it could use some more cleanup. Possibly also switch to iterative traversal instead of recursive?

    2202c6c

Setter

(Filed chromium bug 639064, webkit bug 160971, edge bug 8536472 for some of the above bullet points)

Other known issues

@domenic
Copy link
Member

domenic commented Aug 16, 2016

Which of these do you think we'd need to resolve before merging #1678? In my opinion the setter issue should block merging but the rest are OK-ish...

@domenic
Copy link
Member

domenic commented Aug 16, 2016

rocallahan/innerText-spec#5 also seems kind of bad

@rocallahan
Copy link

What should non-CSS UAs do?

Just return textContent as if the node was display:none?

Chromium throws for
.innerText = "x" etc, Gecko does not.

That adds complexity and it's hard to imagine it's required for Web compat.

Only update existing text node's data if element has just 1 text node?

Ditto.

rocallahan/innerText-spec#5 also seems kind of bad

Yeah that should be fixed, but you could fix it after merging.

@zcorpan
Copy link
Member Author

zcorpan commented Aug 17, 2016

In my opinion the setter issue should block merging but the rest are OK-ish...

Fixed the setter in 0783a61

@zcorpan
Copy link
Member Author

zcorpan commented Aug 17, 2016

Found a new issue today (updated OP)

If the assigned value starts with a newline, WebKit/Chromium insert an empty text node, Gecko/IE don't. (Current spec matches Gecko/IE.)

@domenic domenic added the compat Standard is not web compatible or proprietary feature needs standardizing label Aug 17, 2016
zcorpan added a commit that referenced this issue Aug 17, 2016
From https://rocallahan.github.io/innerText-spec/
with the following normative changes:

* Defined behavior for non-CSS UAs.
* The setter is better defined.
* Added [CEReactions, TreatNullAs=EmptyString]
  to the IDL.

Fixes #465.

Remaining issues: #1679
@domenic
Copy link
Member

domenic commented Aug 17, 2016

  • I think we forgot to move a mention of innerText as nonstandard in the CEReactions section
  • we should send a PR to the downstream repo with a redirect or other pointer link so as not to cause confusion going forward.

@zcorpan
Copy link
Member Author

zcorpan commented Aug 18, 2016

The rendering section in HTML has this

User agents that do not support correct ruby rendering are expected to render parentheses
around the text of rt elements in the absence of rp elements.

Is this something that people want to do in innerText? (Edit: if we do that, maybe we should also change the algorithm to just ignore rp elements)

zcorpan added a commit to web-platform-tests/wpt that referenced this issue Aug 18, 2016
zcorpan added a commit to web-platform-tests/wpt that referenced this issue Aug 18, 2016
zcorpan added a commit to web-platform-tests/wpt that referenced this issue Aug 18, 2016
zcorpan added a commit to web-platform-tests/wpt that referenced this issue Jan 10, 2017
Ms2ger pushed a commit to web-platform-tests/wpt that referenced this issue Jan 17, 2017
zcorpan added a commit to web-platform-tests/wpt that referenced this issue Mar 14, 2017
annevk pushed a commit to web-platform-tests/wpt that referenced this issue Mar 14, 2017
@domfarolino
Copy link
Member

I'd like to help clean up some of the aforementioned comments on how the algorithm performs recursion if that'd be helpful, as I think it is a little odd. Personally I prefer a recursive definition to an iterative one in this case given it is a classic DFS, but here are some things I noticed:

1.) Cleaning up recursion

Step 2 mentions much of what substep 1 mentions. It like having the following algorithm:

function topProcedure(element) {
  let list = [];
  for (let i = 0; i < element.children.length; ++i) {
    list += recursiveProcedure(element.children[i]);
  }
}

function recursiveProcedure(childElement) {
  let list = [];
  for (let i = 0; i < childElement.children.length; ++i) {
    list += recursiveProcedure(childElement.children[i]);
  }

  // do some parsing/concat with `list`
}

...we unnecessarily repeat ourselves in this when we could just call recursiveProcedure on the original element. With the above being said, could we just make the whole innerText algorithm recursive? Right now it isn't (only substeps are), is this for performance in that steps 3-6 are expensive enough such that we want to run them only once at the end? I also first thought that maybe it isn't recursive so we can prevent checking if a given element is being rendered at each stage (to prevent tons of style recalculations). Then I noticed substep 3 checks to see if a given node has a CSS box, which is pretty much the definition of being rendered and since this algorithm doesn't explicitly define how to act on SVG elements (I think?), could we replace the CSS box check (substep 3) with the full "being rendered" check benefit from that explicitness?

2.) Element (children) vs Node (childNodes) traversal

Step 2 mentions "applying the following recursive procedure to each child node node of...". Later in substep 1, the algorithm receives a given node node and mentions "recursively applying this procedure to each child of node in...". The former indicates that the recursive algorithm will be given all child nodes of the given element (I assume similar to what would be returned with element.childNodes). The latter indicates the substeps are applied only to the HTMLElement children of future nodes (similar to the collection returned from node.children). Does this matter? Is there a reason for this? I'm assuming no, and that substep 1 is just missing a word (thus making it even more like the outer algorithm + making it more enticing to eliminate duplication).

3.) This one gets me. Is there a reason why the two sections in fiddle should have different results? I can't really think of one but it is entirely possible I'm just missing something.

@domenic
Copy link
Member

domenic commented Jul 10, 2017

So @zcorpan is our innerText expert but he's out for a while. Let me see if I can answer these satisfactorily...

Overall it would indeed be great to clean up the recursion to make it more explicit. "Applying the following recursive procedure" is hard to understand; I think we all knew that. So cleaning it up to be more like normal programming would make a lot of sense.

  1. I think you are right that it could be refactored nicely without any normative change.

  2. I'm not sure where you're seeing children. "child" is a synonym for "child node" in the DOM tree context. We would say "element child node" if we meant only elements. It's very important we hit all child nodes, not just element child nodes, as otherwise we'd miss all the text.

  3. This just seems like something where we're codifying majority legacy behavior, without rhyme or reason. It's notable that Edge at least gives the empty string for both, but Chrome and Firefox give the different results like the spec. Not sure about Safari, but assuming they match Chrome and Firefox, it's probably not worth changing everyone.

@domfarolino
Copy link
Member

I think you are right that it could be refactored nicely without any normative change.

Cool I can submit a PR!

It's very important we hit all child nodes, not just element child nodes, as otherwise we'd miss all the text.

Perfect that's what I wasn't sure about. I guess I was referring to how in step 2 we apply the procedure to "each child node node of...." and in substep 1 we apply the procedure to "each child of..." and didn't know if the lack of node in the second example was significant at all.

...different results like the spec. Not sure about Safari, but assuming they match Chrome and Firefox, it's probably not worth changing everyone.

IIRC I believe Safari matches Chrome + Firefox FWIW. Cool thanks!

@domfarolino
Copy link
Member

domfarolino commented Jul 10, 2017

Are Chrome, Firefox, and Safari in violation of the spec at this time?

Consider the following HTML:

<p>
  <span style="display: none">Does this text count as innerText?</span>
</p>

From my understanding the spec indicates that the text "Does this text...." should be returned from p.innerText, though Chrome + Firefox + Safari return nothing.


Explanation:

When calling p.innerText we eventually slip into the substep recursive procedure which first recurses all the way down to the deepest node first (the Text node). Recursion bottoms out and we eventually hit substep 4 since we're at a Text node. We return a list that looks like this: «"Does this text count as innerText?"».

When we're unwinding upward we hit the stack frame that was passed the <span>. In this frame the list is equal to «"Does this text count as innerText?"» since that is what was returned from our previous recursive call. We hit substep 3 since this node does not have any CSS boxes (display: none right?) and return the list. In other words, we don't do anything special or choose to disregard the list though it came from a node who is not "rendered". Eventually this list propagates up to the top and gets returned as the value of p.innerText. Is this right?

Edit

@domenic I assume by

but Chrome and Firefox give the different results like the spec.

You didn't mean they give different values from what the spec says they should (aka they're in violation); but if that's what you meant then sorry this is duplicate info. ...and if they are indeed in violation I assume we should change the spec right?

@rocallahan
Copy link

Recursion bottoms out and we eventually hit substep 4 since we're at a Text node.

No, we don't, because the Text node has no CSS boxes so we bail out at step 3 before reaching step 4.

@domfarolino
Copy link
Member

domfarolino commented Jul 11, 2017

@rocallahan Thank you very much. I had no idea Text node does not have CSS boxes, apparently my understanding of that concept is deficient!

@domfarolino
Copy link
Member

@rocallahan Hm does that mean we'd never meet the condition for substep 4 then? Because if we did, that means we're operating on a Text node, and if we're operating on a Text node we'd always bail out early (substep 3) no?

@rocallahan
Copy link

A display:none Text node has no CSS boxes but a regular display:inline Text node does.

domenic pushed a commit that referenced this issue Jul 14, 2017
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
From https://rocallahan.github.io/innerText-spec/
with the following normative changes:

* Defined behavior for non-CSS UAs.
* The setter is better defined.
* Added [CEReactions, TreatNullAs=EmptyString]
  to the IDL.

Fixes whatwg#465.

Remaining issues: whatwg#1679
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compat Standard is not web compatible or proprietary feature needs standardizing
Development

No branches or pull requests

4 participants