Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update getNumberOfChars() definition #200

Closed
jarek-foksa opened this issue Jul 12, 2016 · 17 comments
Closed

Update getNumberOfChars() definition #200

jarek-foksa opened this issue Jul 12, 2016 · 17 comments

Comments

@jarek-foksa
Copy link

jarek-foksa commented Jul 12, 2016

The current definition of getNumberOfChars() method is as follows:

The getNumberOfChars method is used to determine the total number of characters available for rendering within the current element. When getNumberOfChars() is called, the following steps are run:

  1. Let text be the result that would be obtained from getting the textContent IDL attribute on the current element.
  2. Return the length of text.

This does not look right to me. ThetextContent includes characters inside never-rendered elements such as <title> and <desc> or <tspan style="display: none">. Those characters should not be counted.

I would first spec SVGElement.innerText in a similar fashion to HTMLElement.innerText and then I would update the definition of getNumberOfChars() to rely on innerText rather than textContent.

@jarek-foksa
Copy link
Author

Chrome and Safari do not seem to follow the current definition, I haven't checked the other user agents.

@AmeliaBR
Copy link
Contributor

Good catch, @jarek-foksa.

We do need to distinguish between title/desc text and actual rendered text content. I'm less convinced whether a JS method like getNumberOfChars should reflect CSS display: none; I can see use-cases for both options.

At this point, we can't rely on innerText for anything normative, since it doesn't have a normative specification.

Also, if we do re-write this method in a way that reflects rendered text as currently styled, I would want to write it in a way that would be inclusive of CSS generated text (as discussed in #125).

@jarek-foksa
Copy link
Author

jarek-foksa commented Jul 12, 2016

@AmeliaBR If the goal is to follow the existing implementations, then I would just make the spec state the only characters to be counted are those that have corresponding glyphs rendered on the screen.

That would exclude any characters belonging to never-rendered elements (<title>, <desc>), renderable elements hidden with CSS (<tspan style="display: none;">) and renderable elements disconnected from the current document / render tree.

@nikosandronikos
Copy link
Member

nikosandronikos commented Jul 13, 2016

Ok, so here's a test:
https://jsfiddle.net/dodgeyhack/Lbhtyh1t/
(I can't guarantee the arabic code points will render and combine correctly on your browser)

Some findings:

  • Code points are counted, not glyphs.
  • textContent.length on HTML and SVG content DOES include display:none elements.
  • getNumberOfChars() does NOT include display:none
  • textContent.length on SVG content DOES include title and desc elements..
  • getNumberOfChars() does NOT include title and desc elements
  • Safari, Chrome, and Firefox on OS X or MacOS or whatever it's called today are interoperable

@jarek-foksa
Copy link
Author

jarek-foksa commented Jul 13, 2016

  • For getNumberOfChars() and textContent.length, title and desc are NOT included

From my testing on Chrome, Safari and Firefox, textContent.length DOES indlude <title>, <desc> and <tspan style="display: none"> while getNumberOfChars() DOES NOT include them: https://jsfiddle.net/Lbhtyh1t/12/

@nikosandronikos
Copy link
Member

@jarek-foksa Yep, you're right. I fixed my test and updated my comment above.

@AmeliaBR
Copy link
Contributor

The spec reference to .textContent is new in SVG 2, so let's get rid of that and focus on the prose (# of renderable characters) and existing implementations.

The SVG 1.1 definition was:

Returns the total number of characters available for rendering within the current element, which includes referenced characters from ‘tref’ reference, regardless of whether they will be rendered. Effectively, this is equivalent to the length of the Node::textContent attribute from DOM Level 3 Core ([DOM3], section 1.4), if that attribute also expanded ‘tref’ elements.

Removing the <tref> mentions and focusing on the normative text instead of the incorrect "Effectively..." sentence, we get:

Returns the total number of characters available for rendering within the current element, regardless of whether they will be rendered.

An algorithmic definition thereof would look something like:

Returns the sum of the renderable text length for all child nodes of this element, where the renderable text length is as follows:

  • For a text node, the length of its text content, after normalizing whitespace according to the value of CSS white-space property on this element;
  • For a text content element, <a> element, or SVGUnknownElement, the result of recursively calling this algorithm on that element;
  • For all other elements or nodes, 0.

However, since you need to compute CSS styles for white-space, there's no additional performance or utility impact of also making the function depend on display: none, to match the current behavior.

That could be introduced into the recursive algorithm by either checking whether this element is rendered (and always returning 0 if it isn't), or by looking explicitly at the display property for each child element. I'd prefer the second approach, so that you can still measure the usable text length of an element even when it's inside a <defs> or a <symbol> or such.

However, because there is a dependency on CSS, this algorithm would not be defined for an element that is currently detached from a document, unless we explicitly defined it to use the initial values of the properties.

Also, on the question of display: none, we probably need more explicit guidance on how character-by-character positioning attributes are assigned from a parent element if a nested element is not displayed. The only guidance from SVG 1.1 was that elements with display: none "do not affect text layout". SVG 2 explicitly defines "addressable characters" to exclude those "discarded during layout", but it doesn't explicitly mention display: none.

E.g., for something like the following JSBin demo:

<text x="10 20 30 40 50">12<tspan style="display: none">34</tspan>5</text>

Chrome & Edge render it to match:

<text x="10 20 30">125</text>

But Firefox renders it to match:

<text x="10 20 50">125</text>

Assuming the Chrome/Edge behavior is deemed correct, so that display: none content is removed from per-character positioning, then it makes sense that those characters are also removed from getNumberOfChars().

_But_ the same behavior would also have to be used for all the other character-count methods. E.g., if getNumberOfChars() returns 3, getExtentOfChar(2) better not return an empty box from one of the non-displayed characters.

That behavior, of only counting, positioning, and querying the final rendered text after CSS processing, is consistent with the approach to whitespace (which is or isn't included in the count, based on current collapsing rules) and keeps us open to allowing ::before/::after content. In fact, if we define all SVG text positioning and interfaces to use the rendered text string, I see no reason not to allow CSS generated content now (as @tabatkins would support and @Tavmjong would disagree with, according to the last discussion in #125).

(Aside, this will get really fun when you throw a text-transform: full-width style on it. That would change the byte count of the rendered string!)

For backwards consistency (and consistency with existing implementations), text length must still be measured in UTF-16 code points, equivalent to String.length in JavaScript. Once ECMAScript support for proper counting of mutli-byte characters is stable, we'll probably want to introduce parallel versions of the SVG methods that use those new rules, and an attribute or property on the elements to switch how x/y/dx/dy/rotate are counted.

@Tavmjong
Copy link
Contributor

General comment about character-by-character positioning attributes: The goal should be to have predictability. The Firefox rendering of the test case meets this goal. You don't want characters to be jumping around if you turn on/off the display of a tspan. The "addressable characters" is meant to describe white space that has been discarded. That could be made more clear in the definition.

@AmeliaBR
Copy link
Contributor

You don't want characters to be jumping around if you turn on/off the display of a tspan.

But that's exactly what happens in normal text layout. You remove some text, and the rest moves over to replace it.

If you don't want the other characters to move, you should be using visibility: hidden, not display: none.

@AmeliaBR
Copy link
Contributor

See also this demo of hacking support for conditional line breaks, which is broken in Firefox because x/y attributes on the un-rendered <tspan> itself nonetheless affect subsequent characters.

@Tavmjong
Copy link
Contributor

@AmeliaBR Your last example could be handled by disabling dx/dy attributes in un-rendered <tspan> elements... but characters inside the <tspan> would still count in assigning the values from a wrapping <text> or <tspan> element:

<text dx="5 3 0 0 5 3">AB<tspan dx="2 2">CD</tspan>EF<text>

This way you get predictability in setting the spacing between E & F without having to wrap them in a new <tspan>.

@Tavmjong
Copy link
Contributor

This issue should be run by @heycam .

@AmeliaBR
Copy link
Contributor

I think the whitespace issue (the number of characters can change based on computed CSS style) makes "predictable" a bit of a lost cause. As mentioned previously, I also would like to keep the option open to have CSS-generated text for auto-numbering tick marks and so on, and I don't see how that's possible if coordinates are assigned to characters before applying CSS modifications.

That said, my top concerns are that (1) all behavior is clearly defined and (2) attributes that are declared directly on a display-none element do not affect layout.

@nikosandronikos
Copy link
Member

@AmeliaBR Could you review those commits please? I'm not so familiar with the text chapter.

AmeliaBR added a commit that referenced this issue Aug 1, 2016
Including some final edits with respect to issues #200 and #210, as well as general typo fixes, formatting, and clarification.

Changes the definition and example of `shape-margin` to reflect the fact that this is a property of the excluded element, not the element from which the exclusion is subtracted.  (attn @Tavmjong)

Adds a note about ::before and ::after being possible in the future (see #125).

Makes ::selection and :focus styles a requirement for interactive user agents, while leaving open how that is implemented.  (Most web browsers use background color as part of a selection style, and we don't have a way of specifying background color on a span of text!)

Not the full edit I'd like to have done, but hopefully caught most normative issues.
@nikosandronikos
Copy link
Member

Thanks!

@nikosandronikos
Copy link
Member

Milestone achieved =)

@AmeliaBR
Copy link
Contributor

AmeliaBR commented Aug 1, 2016

@nikosandronikos

I did an end-to-end review of the chapter, found a couple other places which needed adjustments based on this decision,

For the getNumberOfChars method in particular, I re-factored the algorithm to avoid confusing DOM "text nodes" with "SVG text content elements". I also switched the variable to count instead of length to be consistent with the other algorithms in the section. My version is:

  1. Let node be the element or node upon which this method was called
  2. If node is a DOM text node, return the length of the text content of node, after normalizing whitespace according to the value of the 'white-space' property on its parent element.
  3. If node is an Element:
    • If the element is not rendered (e.g., because the 'display' property has the used value none), then return 0;
    • Otherwise, set count to 0, and for each child of node:
      • Recursively call this algorithm and add the returned value to count.
      Return count.
  4. For all other node types (e.g., DOM comments), return 0.

I was going to suggest that you review & close, but it seems you've already got there!

Now I just have to commit a fix for some markup errors I discovered when previewing this post...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants