Clarify whitespace handling when xml:space="default" #224
Comments
On Mon, Mar 27, 2017 at 8:17 AM, Chris Bass ***@***.***> wrote:
A feature of the fillLineGap example file (example 7) in IMSC1.0.1 has
raised a question regarding the handling of whitespace, which @nigelmegitt
<https://github.com/nigelmegitt> has suggested I raise here to get
clarification.
Example 7 has tabs at the end of some of its lines. The section I'm
particularly interested in is as follows (with tabs shown as '\t'):
[...]
<span style="spanStyle">jumps over the </span><span style="spanStyleSmall">lazy</span><span style="spanStyle"> dog</span><br/>\t\t\t\t
\t\t\t\t<span style="spanStyle">##Line gaps##</span>
[...]
Between the <br/> and the last span in this section we have an anonymous
span with the following text:
"\t\t\t\t
\t\t\t\t"
As I read the specs, the linefeed-treatment and white-space-collapse
rules apply as follows:
1. Replace newline by space:
"\t\t\t\t \t\t\t\t"
2. Collapse down the whitespace, leaving the initial tab:
"\t"
Then, when it comes to line building, the last line of the block will
contain the final span ("##Line gaps##") preceded by the single remaining
tab character. According to the suppress-at-line-break="auto" rules, only
space (U+0020) characters have a value of 'suppress' applied to them. Thus,
the white-space-treatment="ignore-if-surrounding-linefeed" rules won't
remove the tab at the start of this final line, and the line is rendered
with an indent.
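A minimal sketch of the line-start step as read above, assuming that only U+0020 is suppressed at a line break and every other character is retained. The function name is hypothetical and the model is deliberately simplified; it is not taken from any implementation.

```python
SPACE = "\u0020"


def suppress_at_line_start(line: str) -> str:
    """Drop leading U+0020 characters only; tabs and all other characters stay."""
    return line.lstrip(SPACE)


# A leading space is removed, but a leading tab survives and would indent the line.
for line in (" ##Line gaps##", "\t##Line gaps##"):
    print(repr(suppress_at_line_start(line)))
# prints '##Line gaps##' then '\t##Line gaps##'
```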
If the sequence I've just outlined is the correct interpretation of the
specs, then Fig. 1, showing how the lines should be rendered, is wrong, as
the last line in each image should be indented.
The question is: Is this the correct interpretation of the specs?
No. The sequence of tabs and the space from the newline are collapsed to
U+0020 (SPACE), and not U+0009 (HT). See [1] for details.
[1] https://www.w3.org/TR/xsl/#white-space-collapse
On Tue, Mar 28, 2017 at 8:35 AM, Chris Bass ***@***.***> wrote:
[...]
No. The sequence of tabs and the space from the newline are collapsed to
U+0020 (SPACE), and not U+0009 (HT). See [1] for details.
[1] https://www.w3.org/TR/xsl/#white-space-collapse
That doesn't seem to correspond with the rules in the referenced section:
Specifies, for any character flow object such that:
- its character is classified as white space in XML, and
- it is not, however, a U+000A (linefeed) character, and
- the immediately preceding flow object is a character flow object
with a character classified as white space in XML or the immediately
following flow object is a linefeed,
that flow object shall not generate an area.
So whitespace characters (other than linefeed) don't generate an area if
the immediately preceding character is another whitespace character. In the
string above ("\t\t\t\t \t\t\t\t"), that applies to every character
except the first tab; therefore what remains after this rule is applied is
a single tab, is it not?
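The literal reading of the quoted rule can be sketched as follows. This is an illustration under stated assumptions, not spec text: the function names are hypothetical, the linefeed is assumed to have been replaced by a space first, and "immediately preceding flow object" is taken to mean the preceding character in the original sequence, whether or not it generates an area.

```python
XML_WHITESPACE = {" ", "\t", "\r", "\n"}  # characters classified as white space in XML


def treat_linefeed_as_space(text: str) -> str:
    """Step 1: replace each linefeed with a space."""
    return text.replace("\n", " ")


def collapse_literal(text: str) -> str:
    """Step 2: apply the quoted white-space-collapse rule character by character."""
    kept = []
    for i, ch in enumerate(text):
        is_collapsible = ch in XML_WHITESPACE and ch != "\n"
        prev_is_ws = i > 0 and text[i - 1] in XML_WHITESPACE
        next_is_lf = i + 1 < len(text) and text[i + 1] == "\n"
        if is_collapsible and (prev_is_ws or next_is_lf):
            continue  # this character generates no area
        kept.append(ch)
    return "".join(kept)


anonymous_span = "\t\t\t\t\n\t\t\t\t"
result = collapse_literal(treat_linefeed_as_space(anonymous_span))
print(repr(result))  # prints '\t'
```

Under this reading the surviving character is whichever whitespace character comes first in the run, which here is a tab.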
ok, I understand your concern now, so the group will discuss any necessary
follow on action; nonetheless, I can say definitively that the intended
behavior is that the first whitespace character is mapped to SPACE except
in the possible case that xml:space="preserve", but even then, we may need
to define such a mapping (to a single SPACE);
at present, there is no expectation that a TAB in significant whitespace
should cause indentation behavior or that multiple SPACE characters be
substituted for the TAB;
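For comparison, the intended behavior described in this reply could be sketched as below, assuming that every run of XML whitespace is mapped to a single SPACE whatever its first character is. The function name is hypothetical, and the xml:space="preserve" case is left out because the reply says that mapping may still need to be defined.

```python
import re

# Runs of XML whitespace: space, tab, carriage return, linefeed.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse_intended(text: str) -> str:
    """Map each run of whitespace to one U+0020, regardless of its first character."""
    return _WHITESPACE_RUN.sub(" ", text)


anonymous_span = "\t\t\t\t\n\t\t\t\t"
print(repr(collapse_intended(anonymous_span)))  # prints ' '
```

With this mapping the remaining character is a SPACE, so a TAB in significant whitespace would not cause an indent.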
Looks like it. I suggest filing a bug against Example 7 to remove tabs.
This is scheduled for discussion in tomorrow's TTWG call. I suspect we probably need a bug against TTML1 and TTML2, which are the same in this respect, if this behaviour is not actually what we want. Alternatively we may reasonably conclude that what we have is deterministic and that the only improvement needed is some informative explanation to warn people about this particular scenario.
I will not be able to attend tomorrow's call. My take so far is that the
algorithm is deterministic and unambiguous. The bug seems to be with the
example, which introduces tabs.
Meeting 2017-03-30: All agreed to fix the example to remove the tabs. There appears to be a discrepancy between the spec detail and what implementations do; it is @skynavga's view that implementations such as Antenna House and FOP would not present anything for the first tab, so if this differs from what the specs say then something needs to be clarified in TTML1 and TTML2. Having investigated during the meeting, the origin of the XSL-FO attributes was in CSS2, yet they appear to differ from CSS2. Aside from fixing the example in IMSC, @skynavga will raise issues on TTML1 and TTML2 and communicate out to other experts who may be able to assist in the correct interpretation. It could be that some changes are needed to clarify this in TTML.
Filed #225