Clarify whitespace handling when xml:space="default" #224

Closed
chrisb-bbcrd opened this issue Mar 27, 2017 · 8 comments


@chrisb-bbcrd

A feature of the fillLineGap example file (example 7) in IMSC1.0.1 has raised a question regarding the handling of whitespace, which @nigelmegitt has suggested I raise here to get clarification.

Example 7 has tabs at the end of some of its lines. The section I'm particularly interested in is as follows (with tabs shown as '\t'):

[...]
<span style="spanStyle">jumps over the </span><span style="spanStyleSmall">lazy</span><span style="spanStyle"> dog</span><br/>\t\t\t\t
\t\t\t\t<span style="spanStyle">##Line gaps##</span>
[...]

Between the <br/> and the last span in this section we have an anonymous span with the following text:

"\t\t\t\t
\t\t\t\t"

As I read the specs, the linefeed-treatment and white-space-collapse rules apply as follows:

  1. Replace newline by space:

"\t\t\t\t \t\t\t\t"

  2. Collapse down the whitespace, leaving the initial tab:

"\t"

Then, when it comes to line building, the last line of the block will contain the final span ("##Line gaps##") preceded by the single remaining tab character. According to the suppress-at-line-break="auto" rules, only space (U+0020) characters have a value of 'suppress' applied to them. Thus, the white-space-treatment="ignore-if-surrounding-linefeed" rules won't remove the tab at the start of this final line, and the line is rendered with an indent.
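The line-start step described above can be sketched as follows. This is a minimal illustration of the reading given in this comment, not of any real implementation: it assumes that with suppress-at-line-break="auto" only U+0020 is suppressed at a line boundary, and the function name `suppress_at_line_start` is hypothetical.

```python
SPACE = "\u0020"  # the only character assumed to be suppressed under "auto"


def suppress_at_line_start(line: str) -> str:
    """Drop leading U+0020 characters; every other character is retained."""
    i = 0
    while i < len(line) and line[i] == SPACE:
        i += 1
    return line[i:]


# A leading tab (U+0009) is not U+0020, so it survives and the line is indented.
print(repr(suppress_at_line_start("\t##Line gaps##")))
# A leading space would have been removed.
print(repr(suppress_at_line_start(" ##Line gaps##")))
```

Under this assumption the tab left over from collapsing is still present when the line is built, which is the indent in question.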

If the sequence I've just outlined is the correct interpretation of the specs, then Fig. 1, showing how the lines should be rendered, is wrong, as the last line in each image should be indented.

The question is: Is this the correct interpretation of the specs?

@skynavga
Contributor

skynavga commented Mar 27, 2017 via email

@chrisb-bbcrd
Author

[...]

No. The sequence of tabs and the space from the newline are collapsed to (SPACE), and not (HT). See [1] for details. [1] https://www.w3.org/TR/xsl/#white-space-collapse

That doesn't seem to correspond with the rules in the referenced section:

Specifies, for any character flow object such that:

  • its character is classified as white space in XML, and
  • it is not, however, a U+000A (linefeed) character, and
  • the immediately preceding flow object is a character flow object with a character classified as white space in XML or the immediately following flow object is a linefeed,

that flow object shall not generate an area.

So whitespace characters (other than linefeed) don't generate an area if the immediately preceding character is another whitespace character. In the string above ("\t\t\t\t \t\t\t\t"), that applies to every character except the first tab; therefore what remains after this rule is applied is a single tab, is it not?
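To make that reading concrete, here is a small sketch that applies the three quoted conditions literally, character by character, after first treating the linefeed as a space. It assumes each character is its own flow object and that "white space in XML" means U+0020, U+0009, U+000D and U+000A; the function name `collapse_white_space` is hypothetical.

```python
XML_WHITESPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}
LINEFEED = "\u000A"


def collapse_white_space(chars: str) -> str:
    """Keep only the characters that would still generate an area."""
    kept = []
    for i, ch in enumerate(chars):
        prev = chars[i - 1] if i > 0 else None
        nxt = chars[i + 1] if i + 1 < len(chars) else None
        suppressed = (
            ch in XML_WHITESPACE                                # white space in XML
            and ch != LINEFEED                                  # but not a linefeed
            and (prev in XML_WHITESPACE or nxt == LINEFEED)     # neighbour condition
        )
        if not suppressed:
            kept.append(ch)
    return "".join(kept)


raw = "\t\t\t\t\n\t\t\t\t"
treated = raw.replace(LINEFEED, " ")  # linefeed treated as a space first
print(repr(collapse_white_space(treated)))
```

Read this way, the first tab has no preceding flow object in the string, so none of the neighbour conditions apply to it and it is the one character that remains.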

@skynavga
Copy link
Contributor

skynavga commented Mar 28, 2017 via email

@palemieux
Contributor

Is this the correct interpretation of the specs?

Looks like it. I suggest filing a bug against Example 7 to remove tabs.

@nigelmegitt
Contributor

This is scheduled for discussion in tomorrow's TTWG call. I suspect we probably need a bug against TTML1 and TTML2 which are the same in this respect, if this behaviour is not actually what we want.

Alternatively we may reasonably conclude that what we have is deterministic and that the only improvement needed is some informative explanation to warn people about this particular scenario.

@palemieux
Contributor

palemieux commented Mar 29, 2017 via email

@nigelmegitt
Contributor

Meeting 2017-03-30: All agreed to fix the example to remove the tabs.

There appears to be a discrepancy between the spec detail and what implementations do; it is @skynavga's view that implementations such as Antenna House and FOP would not present anything for the first tab, so if this differs from what the specs say then something needs to be clarified in TTML1 and TTML2. Investigation during the meeting showed that the XSL-FO attributes originated in CSS2, yet they appear to differ from the CSS2 white-space: normal; property.

Aside from fixing the example in IMSC, @skynavga will raise issues on TTML1 and TTML2 and reach out to other experts who may be able to assist in the correct interpretation. It could be that some changes are needed to clarify this in TTML.

@palemieux
Contributor

Filed #225
