New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec innerText #5

Closed
karlcow opened this Issue Jul 29, 2015 · 46 comments

Comments

@karlcow
Copy link
Collaborator

karlcow commented Jul 29, 2015

The innerText is used in a number of Web sites creating interop issues.

see https://bugzilla.mozilla.org/show_bug.cgi?id=264412

@Ms2ger

This comment has been minimized.

Copy link
Member

Ms2ger commented Jul 29, 2015

When Aryeh wrote a spec, nobody was interested in implementing it; what makes you think this attempt will go any better?

@karlcow

This comment has been minimized.

Copy link
Collaborator Author

karlcow commented Jul 29, 2015

@Ms2ger Yeah one of these. It might not get any better. I usually don't think ;) I'm opening the issues on what is needing for the Web Compatibility. We will figure out if/how to spec it if/when implemented.

Currently it is mostly used as a replacement for textContent in all the bugs we can see.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jul 29, 2015

I wrote some things in a duped bug:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25159

Usage is very high:
http://www.chromestatus.com/metrics/feature/timeline/popularity/213
http://www.chromestatus.com/metrics/feature/timeline/popularity/214

As implemented in Blink these attributes depend on layout, but it's not obvious that this is required for web compat, maybe a "semantic textContent" that does special things for a small number of elements like <br>, <p> and <div>.

However someone attempts to specify it, I'd be happy to investigate the difference between that and what's implemented in Blink.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jul 29, 2015

Next up on the ladder of crazy would be depending on the computed style, which is still better than depending on the layout structure itself, IMHO.

@myakura

This comment has been minimized.

Copy link

myakura commented Jul 31, 2015

fyi Anne said "To do this right, you would first have to standardize selection." http://discourse.wicg.io/t/standardizing-innertext/799/2

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jul 31, 2015

@annevk, what does that mean? Not http://w3c.github.io/selection-api/ I take it?

@annevk

This comment has been minimized.

Copy link
Member

annevk commented Jul 31, 2015

Yes, that. innerText is effectively selection.toString().

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jul 31, 2015

Oh, and that's undefined in the spec with a link to https://www.w3.org/Bugs/Public/show_bug.cgi?id=10583

So in other words, what we need is a common algorithm that takes a range (containing the context object for outerText and the children for innerText) and returns a string?

I guess setting innerText and outerText has no equivalent in the Selection API?

@annevk

This comment has been minimized.

Copy link
Member

annevk commented Jul 31, 2015

Setting operates on the DOM directly, with no regards for layout? Or is setting "magic" too?

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jul 31, 2015

For some odd reason, HTMLElement::setInnerText checks the layout to see if newlines should be preserved, surrounded with FIXMEs, in what should be equivalent to checking the white-space property on the computed style. I guess it's so that innerText works as "expected" on <pre>.

The outerText setter doesn't do anything like that, though.

@annevk

This comment has been minimized.

Copy link
Member

annevk commented Jul 31, 2015

Well if you just set textContent, <pre> works "as expected". It must be something else.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jul 31, 2015

It's weird, in the normal case the text is converted to Text nodes and <br>s, but in the preserveNewline() case newline characters are used instead. Not sure why, the visual result is the same in either case. Test case:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3578

@stuartpb

This comment has been minimized.

Copy link

stuartpb commented Aug 2, 2015

I've drafted a proposal for a robust, CSS-based approach to the element -> text conversion both selection.toString() and innerText would use: http://discourse.wicg.io/t/css-plain-text-conversion/976

@karlcow

This comment has been minimized.

Copy link
Collaborator Author

karlcow commented Aug 2, 2015

Searching innerText in Chromium

String Element::outerText()
{
    // Getting outerText is the same as getting innerText, only
    // setting is different. You would think this should get the plain
    // text for the outer range, but this is wrong, <br> for instance
    // would return different values for inner and outer text by such
    // a rule, but it doesn't in WinIE, and we want to match that.
    return innerText();
}
  • and finally the definition of innerText
String Element::innerText()
{
    // We need to update layout, since plainText uses line boxes in the layout tree.
    document().updateLayoutIgnorePendingStylesheets();

    if (!layoutObject())
        return textContent(true);

    return plainText(EphemeralRange::rangeOfContents(*this), TextIteratorForInnerText);
}
@karlcow

This comment has been minimized.

Copy link
Collaborator Author

karlcow commented Aug 2, 2015

Which is basically how it was defined in WebKit source code. It didn't change.
http://trac.webkit.org/browser/trunk/Source/WebCore/dom/Element.cpp#L2333

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 5, 2015

When Aryeh wrote a spec, nobody was interested in implementing it; what makes you think this attempt will go any better?

We have decided to implement innerText in Gecko, because people keep using it. So we want a spec.

@stuartpb

This comment has been minimized.

Copy link

stuartpb commented Oct 5, 2015

@rocallahan I've started https://github.com/stuartpb/css-plaintext, based on what I've written in http://discourse.wicg.io/t/css-plain-text-conversion/976. It defines a number of new CSS properties, but UAs can choose to treat those properties as having their default values for a "simple" initial implementation.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 5, 2015

Thanks. I don't want to try to describe innerText in terms of CSS, because we won't want to implement it that way. I prefer an algorithm like Aryeh's or Kangax's.

@stuartpb

This comment has been minimized.

Copy link

stuartpb commented Oct 5, 2015

Well, if you look at implementations like Chromium's (as @karlcow posted above), it ends up having to go through the layout engine anyway, since any reliable, useful innerText values have to take CSS properties like display: none and visibility: hidden into account.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 5, 2015

Yes. The algorithm will need to examine the computed style values of DOM nodes.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 5, 2015

But we won't define any new CSS properties.

@stuartpb

This comment has been minimized.

Copy link

stuartpb commented Oct 5, 2015

Well, the CSS properties css-plaintext proposes, which only impact plaintext conversion of elements, are there to ease the rational differences of opinion user agents currently have between each other regarding the processing of said computed styles, which page authors should be able to control (per the Extensible Web Manifesto).

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 5, 2015

EWM isn't an issue here. Web developers can easily polyfill their own version of innerText using existing standard APIs. In fact, since there is no one true way to convert HTML to plain text, innerText should not be in the Web platform at all. But since sites rely on it, we can't get rid of it, and we need to converge on a good, reasonably simple spec for this misfeature.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 5, 2015

And we definitely don't want to extend innerText with additional features such as control over plaintext conversion. If you don't like what innerText does, roll your own from scratch. Many libraries already have.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Oct 9, 2015

I have created a proposed spec plus testsuite here: https://github.com/rocallahan/innerText-spec

@cvrebert

This comment has been minimized.

Copy link
Member

cvrebert commented Nov 3, 2015

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Nov 3, 2015

@rocallahan, I see that some tests in http://rocallahan.github.io/innerText-spec/getter-tests.html fail in Blink, did you reverse engineer Blink and would you happen to already know the main changes that would be needed to match your spec? It would be interesting to see how much work that would be, and if it seems risky or not.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Nov 3, 2015

I did not reverse engineer Blink's implementation. I was aiming for something that was simple and logical while having a high degree of Blink compatibility as observed in my tests.

Looking at the Blink failures, I think most of them are pretty clearly bugs:

  • Tabs not converted to spaces like they are in CSS
  • Trailing whitespace before a hard line break not trimmed like they are in CSS
  • Newline characters in white-space:pre-line not preserved like they are in CSS
  • text-transform doesn't work with ::first-line (known Blink bug)
  • <option> content returned if the option is in <select size='1'> but not if the option is in <select size='2'>
  • <audio> and <video> being treated differently
  • <canvas> contents not being ignored
  • Asymmetric/inconsistent handling of blank lines around <p> elements
  • Whitespace not being trimmed inside display:inline-block block boundaries
  • Failure to handle CSS display table-related values

The main differences that are not clearly bugs are handling of <rp> (Blink has no special handling), overflow:hidden; width/height:0containers (Blink ignores their content), and floats/abs-pos elements (Blink doesn't put line-breaks around them). Those are debatable and changeable in the spec; changing the latter two would add complexity, and removing <rp> handling would simplify the spec a little bit.

Given the massive failure of interoperability between Blink and Edge in most of the edge cases, plus having looked at how innerText is generally being used, I'm confident that the risk of Blink changing to match the spec would be low for Web content.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Nov 3, 2015

So it looks like in Blink, innerText is implemented using a code path that's shared (with behavior flags) with a bunch of other contexts, including Range and Selection.

@hayatoito, @yosin-chromium, I think this would fall under the editing code in Blink, are either of you interested in this? Do you know if yoichio@ has a Github account?

@hayatoito

This comment has been minimized.

Copy link

hayatoito commented Nov 4, 2015

I'm pretty sure that @yosin-chromium is interested in this.

I'm also interested in how innerText would interact with Shadow DOM. We discussed this topic internally, however, since innerText doesn't have a spec, we couldn't define the behavior formally.

@yosinch

This comment has been minimized.

Copy link

yosinch commented Nov 4, 2015

Handling of CSS text-transform is also different among browsers:

  • Firefox: original text
  • Chrome: using transformed text
  • IE: original text

BTW, should we include text contents from :before/:after CSS pseudo element?
All browsers doesn't have quote characters from Q element, which uses
:before/:after.

Sample: https://jsfiddle.net/p5f0t0qp/3/

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Nov 4, 2015

So it looks like in Blink, innerText is implemented using a code path that's shared (with behavior flags) with a bunch of other contexts, including Range and Selection.

My intent is that once innerText is nailed down we spec Selection.toString() to behave the same way. I'm not aware of any existing Web-exposed Range method related to innerText but I've designed the spec (and our implementation) to be smoothly extensible to define Range.innerText and Selection.toString().

I'm also interested in how innerText would interact with Shadow DOM. We discussed this topic internally, however, since innerText doesn't have a spec, we couldn't define the behavior formally.

Currently I'm proposing that innerText stick to the normal document and ignore Shadow DOM. This matches existing innerText behavior for CSS generated content in IE and Chrome (it's ignored).

Handling of CSS text-transform is also different among browsers:
Firefox: original text

The innerText implementation which is in Firefox Nightly, and the spec, apply text-transform.

BTW, should we include text contents from :before/:after CSS pseudo element?

Existing implementations don't do this, so I made the spec not do this.

@hayatoito

This comment has been minimized.

Copy link

hayatoito commented Nov 4, 2015

Currently I'm proposing that innerText stick to the normal document and ignore Shadow DOM. This matches existing innerText behavior for CSS generated content in IE and Chrome (it's ignored).

Yeah, that sounds reasonable as the first step.

@yosin-chromium,
I remember that @yosin-chromium tried to change Blink's innerText so it includes the contents of Shadow DOM. What's the status of it? Have you changed your mind?

@yosinch

This comment has been minimized.

Copy link

yosinch commented Nov 4, 2015

I remember that @yosin-chromium tried to change Blink's innerText so it includes the contents of Shadow DOM. What's the status of it? Have you changed your mind?

We're measuring usage of shadow DOM with innerText, InnerTextWithShadowTree in Blink: https://www.chromestatus.com/metrics/feature/popularity#InnerTextWithShadowTree
Note: we also have counter SelectionToStringWithShadowTree. usage of it is also zero.

It seems there are no usage so far. I think we can omit shadow DOM support from Blink,

@hayatoito

This comment has been minimized.

Copy link

hayatoito commented Nov 4, 2015

I think it needs clarification:

  • Blink's innerText includes the contents of Shadow DOM, as of now, right?
  • However, the usage of innerText (with Shadow DOM) is zero
  • Thus it's okay to change the behavior of Blink so that innerText doesn't include Shadow DOM.

Is my understanding correct?

@yosinch

This comment has been minimized.

Copy link

yosinch commented Nov 4, 2015

@hayatoito
I think it needs clarification:

  • Blink's innerText includes the contents of Shadow DOM, as of now, right?
  • However, the usage of innerText (with Shadow DOM) is zero
  • Thus it's okay to change the behavior of Blink so that innerText doesn't include Shadow DOM.
    Is my understanding correct?

Correct. Thanks for clarification.

@yosinch

This comment has been minimized.

Copy link

yosinch commented Nov 4, 2015

In http://crbug.com/536137, one user wants to have innerText for response of XMLHttpRequest(),
which isn't rendered.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Nov 4, 2015

I'm not aware of any existing Web-exposed Range method related to innerText

Oh, I just found it used in our Range::text(), but it doesn't look like it's used anywhere web-exposed.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Nov 4, 2015

In http://crbug.com/536137, one user wants to have innerText for response of XMLHttpRequest(),
which isn't rendered.

To avoid having to respecify and reimplement CSS features I've made the innerText spec depend heavily on having a CSS layout for the DOM nodes (as it's implemented in Webkit, Blink and now Gecko). I don't think it makes sense to change direction on that. So to apply innerText to the results of an XMLHttpRequest you'll have to put the nodes in a hidden <iframe> or something like that. I don't think that's a major problem.

@kangax

This comment has been minimized.

Copy link

kangax commented Nov 4, 2015

For those that haven't seen yet, here's innerText compat table — http://kangax.github.io/jstests/innerText/

/cc @yosin-chromium

I'll try to add few of Robert's tests as well.

@rocallahan

This comment has been minimized.

Copy link

rocallahan commented Nov 4, 2015

For those that haven't seen yet, here's innerText compat table — http://kangax.github.io/jstests/innerText/

FWIW I tried to incorporate everything in those tests into my tests.

@miketaylr

This comment has been minimized.

Copy link
Collaborator

miketaylr commented Dec 30, 2015

I think we can close this now, @rocallahan has written a spec over at http://rocallahan.github.io/innerText-spec/. Any issues on that can be raised against https://github.com/rocallahan/innerText-spec/issues. 🎈 🍰

@miketaylr miketaylr closed this Dec 30, 2015

@laukstein

This comment has been minimized.

Copy link

laukstein commented Dec 30, 2015

Firefox 46 passes all tests, landed in Firefox 45 https://bugzilla.mozilla.org/show_bug.cgi?id=264412
Chrome 49 has 67 fails https://code.google.com/p/chromium/issues/detail?id=573309

@Ms2ger

This comment has been minimized.

Copy link
Member

Ms2ger commented Jan 4, 2016

@karlcow

This comment has been minimized.

Copy link
Collaborator Author

karlcow commented Jan 6, 2016

Excellent \o/ Thanks 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment