spec innerText #5

karlcow · 2015-07-29T02:29:51Z

innerText is used on a number of Web sites, which creates interop issues.

see https://bugzilla.mozilla.org/show_bug.cgi?id=264412

Ms2ger · 2015-07-29T07:20:02Z

When Aryeh wrote a spec, nobody was interested in implementing it; what makes you think this attempt will go any better?

karlcow · 2015-07-29T07:31:09Z

@Ms2ger Yeah one of these. It might not get any better. I usually don't think ;) I'm opening issues for what is needed for Web compatibility. We will figure out if/how to spec it if/when implemented.

Currently it is mostly used as a replacement for textContent in all the bugs we can see.

karlcow · 2015-07-29T07:35:19Z

Some background:
http://perfectionkills.com/the-poor-misunderstood-innerText/#naive-spec
http://kangax.github.io/jstests/innerText/
https://www.w3.org/Bugs/Public/show_bug.cgi?id=13145
https://lists.w3.org/Archives/Public/public-webapps/2014JulSep/thread.html#msg580
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-February/030179.html

webcompat/web-bugs#633
webcompat/web-bugs#1051
webcompat/web-bugs#1071
https://bugzilla.mozilla.org/show_bug.cgi?id=914252#c16 (Google calendar and Gmail)

foolip · 2015-07-29T09:25:51Z

I wrote some things in a duped bug:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25159

Usage is very high:
http://www.chromestatus.com/metrics/feature/timeline/popularity/213
http://www.chromestatus.com/metrics/feature/timeline/popularity/214

As implemented in Blink these attributes depend on layout, but it's not obvious that this is required for web compat; maybe a "semantic textContent" that does special things for a small number of elements like <br>, <p> and <div> would be enough.

However someone attempts to specify it, I'd be happy to investigate the difference between that and what's implemented in Blink.
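To make the "semantic textContent" idea concrete, here is a minimal sketch. It is not Blink's behavior or any proposed spec text. It assumes a toy node model (`ToyNode`, hypothetical) instead of the real DOM, and the set of specially treated elements (`BLOCK_TAGS`) is an assumption taken from the three elements named above.

```typescript
// Toy stand-in for DOM nodes (hypothetical), so the sketch runs without a browser.
type ToyNode =
  | { kind: "text"; data: string }
  | { kind: "element"; tag: string; children: ToyNode[] };

// Elements that get newlines around them in this sketch (an assumed, minimal set).
const BLOCK_TAGS: string[] = ["p", "div"];

// Like textContent, except that <br> becomes a newline and <p>/<div> are
// surrounded by newlines. No layout or computed style is consulted.
function semanticTextContent(node: ToyNode): string {
  if (node.kind === "text") {
    return node.data;
  }
  if (node.tag === "br") {
    return "\n";
  }
  const inner = node.children
    .map((child) => semanticTextContent(child))
    .join("");
  return BLOCK_TAGS.indexOf(node.tag) >= 0 ? "\n" + inner + "\n" : inner;
}

// Usage: <body><div>a<br>b</div><p>c</p></body>
const exampleTree: ToyNode = {
  kind: "element",
  tag: "body",
  children: [
    {
      kind: "element",
      tag: "div",
      children: [
        { kind: "text", data: "a" },
        { kind: "element", tag: "br", children: [] },
        { kind: "text", data: "b" },
      ],
    },
    { kind: "element", tag: "p", children: [{ kind: "text", data: "c" }] },
  ],
};
const exampleText: string = semanticTextContent(exampleTree); // "\na\nb\n\nc\n"
```

A real definition would also have to decide how to merge or trim the newlines produced by adjacent and nested blocks, which this sketch leaves untouched.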

foolip · 2015-07-29T09:27:21Z

Next up on the ladder of crazy would be depending on the computed style, which is still better than depending on the layout structure itself, IMHO.

myakura · 2015-07-31T00:58:01Z

fyi Anne said "To do this right, you would first have to standardize selection." http://discourse.wicg.io/t/standardizing-innertext/799/2

foolip · 2015-07-31T07:49:34Z

@annevk, what does that mean? Not http://w3c.github.io/selection-api/ I take it?

annevk · 2015-07-31T07:54:31Z

Yes, that. innerText is effectively selection.toString().

foolip · 2015-07-31T08:04:52Z

Oh, and that's undefined in the spec with a link to https://www.w3.org/Bugs/Public/show_bug.cgi?id=10583

So in other words, what we need is a common algorithm that takes a range (containing the context object for outerText and the children for innerText) and returns a string?

I guess setting innerText and outerText has no equivalent in the Selection API?

annevk · 2015-07-31T08:08:26Z

Setting operates on the DOM directly, with no regards for layout? Or is setting "magic" too?

foolip · 2015-07-31T09:50:38Z

For some odd reason, HTMLElement::setInnerText checks the layout to see if newlines should be preserved, surrounded with FIXMEs, in what should be equivalent to checking the white-space property on the computed style. I guess it's so that innerText works as "expected" on <pre>.

The outerText setter doesn't do anything like that, though.

annevk · 2015-07-31T09:59:23Z

Well if you just set textContent, <pre> works "as expected". It must be something else.

foolip · 2015-07-31T11:50:46Z

It's weird, in the normal case the text is converted to Text nodes and <br>s, but in the preserveNewline() case newline characters are used instead. Not sure why, the visual result is the same in either case. Test case:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3578
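The two setter behaviors described above can be sketched as follows. This is an illustration of the description only, not Blink's code: the `SetterNode` type and the `setterChildren` function are hypothetical names, and the handling of `\r` and of empty lines is an assumption.

```typescript
// Hypothetical description of the children the setter would create.
type SetterNode = { kind: "text"; data: string } | { kind: "br" };

// preserveNewlines models the case where white-space keeps newlines (e.g. <pre>).
function setterChildren(value: string, preserveNewlines: boolean): SetterNode[] {
  if (value === "") {
    return [];
  }
  if (preserveNewlines) {
    // One Text node; newline characters are kept as-is.
    return [{ kind: "text", data: value }];
  }
  // Normal case: each newline becomes a <br>, the rest becomes Text nodes.
  const nodes: SetterNode[] = [];
  value.split(/\r\n|\r|\n/).forEach((line, index) => {
    if (index > 0) {
      nodes.push({ kind: "br" });
    }
    if (line !== "") {
      nodes.push({ kind: "text", data: line });
    }
  });
  return nodes;
}

// Usage: the same string yields different children in the two cases.
const brForm: SetterNode[] = setterChildren("a\nb", false); // text "a", br, text "b"
const preForm: SetterNode[] = setterChildren("a\nb", true); // text "a\nb"
```

Both forms render the same way inside an element that preserves newlines, which matches the observation that the visual result does not differ.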

stuartpb · 2015-08-02T09:44:33Z

I've drafted a proposal for a robust, CSS-based approach to the element -> text conversion both selection.toString() and innerText would use: http://discourse.wicg.io/t/css-plain-text-conversion/976

karlcow · 2015-08-02T23:09:10Z

Searching for innerText in Chromium:

innerText is declared as returning a String: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/dom/Element.h&q=innertext&sq=package:chromium&type=cs&l=398
The outerText getter is defined as identical to innerText: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/dom/Element.cpp&sq=package:chromium&type=cs&rcl=1438529374&l=2565

String Element::outerText()
{
    // Getting outerText is the same as getting innerText, only
    // setting is different. You would think this should get the plain
    // text for the outer range, but this is wrong, <br> for instance
    // would return different values for inner and outer text by such
    // a rule, but it doesn't in WinIE, and we want to match that.
    return innerText();
}

and finally the definition of innerText

String Element::innerText()
{
    // We need to update layout, since plainText uses line boxes in the layout tree.
    document().updateLayoutIgnorePendingStylesheets();

    if (!layoutObject())
        return textContent(true);

    return plainText(EphemeralRange::rangeOfContents(*this), TextIteratorForInnerText);
}

karlcow · 2015-08-02T23:14:44Z

This is basically how it is defined in the WebKit source code; it hasn't changed.
http://trac.webkit.org/browser/trunk/Source/WebCore/dom/Element.cpp#L2333

rocallahan · 2015-10-05T01:01:00Z

> When Aryeh wrote a spec, nobody was interested in implementing it; what makes you think this attempt will go any better?

We have decided to implement innerText in Gecko, because people keep using it. So we want a spec.

stuartpb · 2015-10-05T01:12:09Z

@rocallahan I've started https://github.com/stuartpb/css-plaintext, based on what I've written in http://discourse.wicg.io/t/css-plain-text-conversion/976. It defines a number of new CSS properties, but UAs can choose to treat those properties as having their default values for a "simple" initial implementation.

rocallahan · 2015-10-05T01:14:42Z

Thanks. I don't want to try to describe innerText in terms of CSS, because we won't want to implement it that way. I prefer an algorithm like Aryeh's or Kangax's.

stuartpb · 2015-10-05T01:15:58Z

Well, if you look at implementations like Chromium's (as @karlcow posted above), it ends up having to go through the layout engine anyway, since any reliable, useful innerText values have to take CSS properties like display: none and visibility: hidden into account.

rocallahan · 2015-10-05T01:25:57Z

Yes. The algorithm will need to examine the computed style values of DOM nodes.
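As a rough illustration of what "examining computed style" means for the two properties mentioned above, here is a minimal sketch. It is not the algorithm from any spec or browser. It assumes a toy node model (`StyledNode`, hypothetical) in which every element already carries its computed `display` and `visibility`, and it ignores line breaks and white space entirely.

```typescript
// Toy stand-in for DOM nodes with computed style attached (hypothetical).
type StyledNode =
  | { kind: "text"; data: string }
  | {
      kind: "element";
      display: string;
      visibility: string;
      children: StyledNode[];
    };

// Collect text, skipping whatever the two properties hide.
function renderedText(node: StyledNode, parentVisible: boolean = true): string {
  if (node.kind === "text") {
    // A Text node takes its visibility from its parent element.
    return parentVisible ? node.data : "";
  }
  if (node.display === "none") {
    // display: none removes the whole subtree.
    return "";
  }
  // visibility: hidden hides this element's own text, but a descendant
  // whose computed visibility is "visible" still contributes.
  const visible = node.visibility === "visible";
  return node.children.map((child) => renderedText(child, visible)).join("");
}

// Usage: "b" is under display:none, "c" is under visibility:hidden,
// and "d" is re-shown by a nested visibility:visible element.
const styledTree: StyledNode = {
  kind: "element",
  display: "block",
  visibility: "visible",
  children: [
    { kind: "text", data: "a" },
    {
      kind: "element",
      display: "none",
      visibility: "visible",
      children: [{ kind: "text", data: "b" }],
    },
    {
      kind: "element",
      display: "inline",
      visibility: "hidden",
      children: [
        { kind: "text", data: "c" },
        {
          kind: "element",
          display: "inline",
          visibility: "visible",
          children: [{ kind: "text", data: "d" }],
        },
      ],
    },
  ],
};
const styledText: string = renderedText(styledTree); // "ad"
```

The difference between the two properties is the reason a plain "skip hidden elements" rule is not enough: display: none prunes a subtree, while visibility has to be checked per node.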

rocallahan · 2015-10-05T01:26:13Z

But we won't define any new CSS properties.

stuartpb · 2015-10-05T01:28:43Z

Well, the CSS properties css-plaintext proposes, which only impact plaintext conversion of elements, are there to ease the rational differences of opinion user agents currently have between each other regarding the processing of said computed styles, which page authors should be able to control (per the Extensible Web Manifesto).

rocallahan · 2015-10-05T01:51:43Z

EWM isn't an issue here. Web developers can easily polyfill their own version of innerText using existing standard APIs. In fact, since there is no one true way to convert HTML to plain text, innerText should not be in the Web platform at all. But since sites rely on it, we can't get rid of it, and we need to converge on a good, reasonably simple spec for this misfeature.

rocallahan · 2015-10-05T01:52:26Z

And we definitely don't want to extend innerText with additional features such as control over plaintext conversion. If you don't like what innerText does, roll your own from scratch. Many libraries already have.

rocallahan · 2015-10-09T05:55:55Z

I have created a proposed spec plus testsuite here: https://github.com/rocallahan/innerText-spec

cvrebert · 2015-11-03T06:19:35Z

Now implemented in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=264412#c125

foolip · 2015-11-03T09:29:15Z

@rocallahan, I see that some tests in http://rocallahan.github.io/innerText-spec/getter-tests.html fail in Blink. Did you reverse engineer Blink, and would you happen to already know the main changes that would be needed to match your spec? It would be interesting to see how much work that would be, and if it seems risky or not.

rocallahan · 2015-11-03T09:55:01Z

I did not reverse engineer Blink's implementation. I was aiming for something that was simple and logical while having a high degree of Blink compatibility as observed in my tests.

Looking at the Blink failures, I think most of them are pretty clearly bugs:

Tabs not converted to spaces like they are in CSS
Trailing whitespace before a hard line break not trimmed like it is in CSS
Newline characters in white-space:pre-line not preserved like they are in CSS
text-transform doesn't work with ::first-line (known Blink bug)
<option> content returned if the option is in <select size='1'> but not if the option is in <select size='2'>
<audio> and <video> being treated differently
<canvas> contents not being ignored
Asymmetric/inconsistent handling of blank lines around <p> elements
Whitespace not being trimmed inside display:inline-block block boundaries
Failure to handle CSS display table-related values
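The first three items in the list above concern CSS white-space processing. A minimal sketch of the expected results for a single string follows. It is a simplification, not the spec's algorithm: `collapseWhiteSpace` and `WhiteSpaceMode` are hypothetical names, only `normal` and `pre-line` are modelled, and spaces at the very start and end of the string are left alone because their fate depends on the neighboring content.

```typescript
type WhiteSpaceMode = "normal" | "pre-line";

function collapseWhiteSpace(text: string, mode: WhiteSpaceMode): string {
  if (mode === "normal") {
    // Tabs and newlines behave like spaces; each run collapses to one space.
    return text.replace(/[ \t\r\n]+/g, " ");
  }
  // pre-line: newlines survive as hard line breaks.
  return text
    .replace(/\r\n?/g, "\n") // normalize line endings
    .replace(/[ \t]+/g, " ") // tabs become spaces, runs collapse
    .replace(/ ?\n ?/g, "\n"); // trim whitespace around each hard break
}

// Usage: the same input under the two modes.
const collapsedNormal: string = collapseWhiteSpace("a\t\tb  \n  c", "normal"); // "a b c"
const collapsedPreLine: string = collapseWhiteSpace("a\t\tb  \n  c", "pre-line"); // "a b\nc"
```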

The main differences that are not clearly bugs are handling of <rp> (Blink has no special handling), overflow:hidden; width/height:0 containers (Blink ignores their content), and floats/abs-pos elements (Blink doesn't put line-breaks around them). Those are debatable and changeable in the spec; changing the latter two would add complexity, and removing <rp> handling would simplify the spec a little bit.

Given the massive failure of interoperability between Blink and Edge in most of the edge cases, plus having looked at how innerText is generally being used, I'm confident that the risk of Blink changing to match the spec would be low for Web content.

foolip · 2015-11-03T10:55:39Z

So it looks like in Blink, innerText is implemented using a code path that's shared (with behavior flags) with a bunch of other contexts, including Range and Selection.

@hayatoito, @yosin-chromium, I think this would fall under the editing code in Blink, are either of you interested in this? Do you know if yoichio@ has a Github account?

hayatoito · 2015-11-04T03:55:34Z

I'm pretty sure that @yosin-chromium is interested in this.

I'm also interested in how innerText would interact with Shadow DOM. We discussed this topic internally, however, since innerText doesn't have a spec, we couldn't define the behavior formally.

yosinch · 2015-11-04T04:16:01Z

Handling of CSS text-transform is also different among browsers:

Firefox: original text
Chrome: using transformed text
IE: original text

BTW, should we include text content from :before/:after CSS pseudo-elements?
No browser includes the quote characters from the Q element, which uses :before/:after.

Sample: https://jsfiddle.net/p5f0t0qp/3/

rocallahan · 2015-11-04T04:53:27Z

> So it looks like in Blink, innerText is implemented using a code path that's shared (with behavior flags) with a bunch of other contexts, including Range and Selection.

My intent is that once innerText is nailed down we spec Selection.toString() to behave the same way. I'm not aware of any existing Web-exposed Range method related to innerText but I've designed the spec (and our implementation) to be smoothly extensible to define Range.innerText and Selection.toString().

> I'm also interested in how innerText would interact with Shadow DOM. We discussed this topic internally, however, since innerText doesn't have a spec, we couldn't define the behavior formally.

Currently I'm proposing that innerText stick to the normal document and ignore Shadow DOM. This matches existing innerText behavior for CSS generated content in IE and Chrome (it's ignored).

> Handling of CSS text-transform is also different among browsers:
> Firefox: original text

The innerText implementation which is in Firefox Nightly, and the spec, apply text-transform.

> BTW, should we include text contents from :before/:after CSS pseudo element?

Existing implementations don't do this, so I made the spec not do this.

hayatoito · 2015-11-04T05:36:15Z

> Currently I'm proposing that innerText stick to the normal document and ignore Shadow DOM. This matches existing innerText behavior for CSS generated content in IE and Chrome (it's ignored).

Yeah, that sounds reasonable as the first step.

@yosin-chromium,
I remember that @yosin-chromium tried to change Blink's innerText so it includes the contents of Shadow DOM. What's the status of it? Have you changed your mind?

yosinch · 2015-11-04T05:46:40Z

> I remember that @yosin-chromium tried to change Blink's innerText so it includes the contents of Shadow DOM. What's the status of it? Have you changed your mind?

We're measuring usage of innerText with Shadow DOM through the InnerTextWithShadowTree counter in Blink: https://www.chromestatus.com/metrics/feature/popularity#InnerTextWithShadowTree
Note: we also have a SelectionToStringWithShadowTree counter; its usage is also zero.

It seems there is no usage so far. I think we can omit Shadow DOM support from Blink.

hayatoito · 2015-11-04T05:57:34Z

I think it needs clarification:

Blink's innerText includes the contents of Shadow DOM, as of now, right?
However, the usage of innerText (with Shadow DOM) is zero
Thus it's okay to change the behavior of Blink so that innerText doesn't include Shadow DOM.

Is my understanding correct?

yosinch · 2015-11-04T07:37:22Z

@hayatoito
> I think it needs clarification:
>
> Blink's innerText includes the contents of Shadow DOM, as of now, right?
> However, the usage of innerText (with Shadow DOM) is zero
> Thus it's okay to change the behavior of Blink so that innerText doesn't include Shadow DOM.
>
> Is my understanding correct?

Correct. Thanks for the clarification.

yosinch · 2015-11-04T07:58:14Z

In http://crbug.com/536137, one user wants to have innerText for the response of XMLHttpRequest(), which isn't rendered.

foolip · 2015-11-04T08:32:03Z

> I'm not aware of any existing Web-exposed Range method related to innerText

Oh, I just found it used in our Range::text(), but it doesn't look like it's used anywhere web-exposed.

rocallahan · 2015-11-04T09:25:35Z

> In http://crbug.com/536137, one user wants to have innerText for response of XMLHttpRequest(),
> which isn't rendered.

To avoid having to respecify and reimplement CSS features I've made the innerText spec depend heavily on having a CSS layout for the DOM nodes (as it's implemented in Webkit, Blink and now Gecko). I don't think it makes sense to change direction on that. So to apply innerText to the results of an XMLHttpRequest you'll have to put the nodes in a hidden <iframe> or something like that. I don't think that's a major problem.

kangax · 2015-11-04T13:06:58Z

For those who haven't seen it yet, here's the innerText compat table: http://kangax.github.io/jstests/innerText/

/cc @yosin-chromium

I'll try to add a few of Robert's tests as well.

rocallahan · 2015-11-04T20:53:35Z

> For those who haven't seen it yet, here's the innerText compat table: http://kangax.github.io/jstests/innerText/

FWIW I tried to incorporate everything in those tests into my tests.

miketaylr · 2015-12-30T18:28:21Z

I think we can close this now, @rocallahan has written a spec over at http://rocallahan.github.io/innerText-spec/. Any issues on that can be raised against https://github.com/rocallahan/innerText-spec/issues. 🎈 🍰

laukstein · 2015-12-30T21:31:17Z

Firefox 46 passes all tests, landed in Firefox 45 https://bugzilla.mozilla.org/show_bug.cgi?id=264412
Chrome 49 has 67 fails https://code.google.com/p/chromium/issues/detail?id=573309

Ms2ger · 2016-01-04T14:28:39Z

FTR whatwg/html#465

karlcow · 2016-01-06T02:37:40Z

Excellent \o/ Thanks 🚀

karlcow · 2021-05-06T14:03:08Z

Firefox: Add support for element.outerText

foolip mentioned this issue Oct 5, 2015

What to do about features that only have one implementation and likely will forever? whatwg/html#209

Closed

miketaylr mentioned this issue Nov 5, 2015

Node.innerText supported since Firefox 45 Fyrd/caniuse#2065

Closed

miketaylr closed this as completed Dec 30, 2015

cvrebert mentioned this issue Dec 31, 2015

innerText: Cite draft spec instead of MSDN Fyrd/caniuse#2183

Merged

vsemozhetbyt mentioned this issue Jan 25, 2016

innerText implementation jsdom/jsdom#1245

Open
