[css-gcpm][css-content] "string-set" on elements with no boxes: How to determine which page the assignment is on #8404

Open
bernhardf-ro opened this issue Feb 6, 2023 · 4 comments
Labels
css-gcpm-3 Current Work

Comments

@bernhardf-ro

The string-set property applies to "all elements, but not pseudo-elements", which includes elements with no boxes (e.g. via display: none).
Regarding such elements, both specifications say (after https://drafts.csswg.org/css-content-3/#valdef-string-first-except):
"The content values of named strings are assigned at the point when the content box of the element is first created (or would have been created if the element’s display value is none)."

Determining where a box "would have been" is rather complex, as that could be influenced by its height (near a page break), which in turn could be influenced by its content.
Adding that the hypothetical box should be considered to have a block-size of 0 would solve this for simple cases.

However in more complex contexts this would still not be easy to determine, e.g. when the order property is involved.
We would propose to go backwards through the DOM from the element until an element is found that has boxes in the content flow.
The page that the last of those boxes is on is the page of the assignment.
If coordinates on the page are also required they should be at the block-end of that box.
However, we would prefer that inside the page the order of assignments is strictly the DOM order.
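The backward search proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not any engine's implementation: the `Element` record, the `pages` field and the function name are hypothetical, the document is modelled as a flat list in document order that leaves out ancestors of the element in question, and `pages` is assumed to be sorted in ascending order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    # Hypothetical record: one entry per element, in document order.
    name: str
    # Pages holding the element's boxes; empty when it has none (e.g. display: none).
    pages: List[int] = field(default_factory=list)


def page_by_backward_search(doc: List[Element], index: int) -> Optional[int]:
    """Page on which the string-set assignment of doc[index] is considered to happen."""
    el = doc[index]
    if el.pages:
        return el.pages[0]  # the element has boxes: its first box decides
    # Walk backwards through the document until an element with boxes is found.
    for prev in reversed(doc[:index]):
        if prev.pages:
            return prev.pages[-1]  # page of the last of those boxes
    return None  # nothing with boxes precedes the element; not covered by the proposal


doc = [
    Element("p", [1, 2]),   # paragraph fragmented across pages 1 and 2
    Element("hidden-a"),    # display: none
    Element("p", [2]),
    Element("hidden-b"),    # display: none
]
result_a = page_by_backward_search(doc, 1)
result_b = page_by_backward_search(doc, 3)
```

Here `hidden-a` follows a paragraph whose last fragment is on page 2, so its assignment lands on page 2 rather than page 1.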

@faceless2

Agree this is complex, and definitely agree that going back through the DOM makes more sense than trying to guess what might have been based on layout.

I created a test to see what current implementations do:

<!DOCTYPE HTML>
<html>
<head>
<style>
@page {
    margin: 72px;
    @top-center {
        content: "String is "  string(foo, first);
    }
}
.set {
    display: none;
    string-set: foo attr(string);
}
h1 {
    break-after: page;
}
</style>
</head>
<body>
 <h1>Page 1</h1>
 <div class="set" string="string1">string1</div>
 <h1>Page 2</h1>
 <div class="set" string="string2">string2</div>
 <h1>Page 3</h1>
 <div class="set" string="string3">string3</div>
</body>
</html>

If I understand your proposal correctly you'd expect to see "string1, string2, string3" at the top of the three pages, correct? Testing with the engines at https://printcss.live gives:

  • AH Formatter is "string1, string1, string2"
  • Prince is "blank, string1, string1"
  • BFO (our dev build) and Vivliostyle are both "blank, string1, string2"
  • Weasyprint, paged.js and (currently) PDFReactor don't seem to support strings on display:none elements.

Other findings:

  • the results are unchanged if I change h1 to break-before: page
  • if I move the string-set element before the h1 then AH, BFO and Vivliostyle all give "string1, string2, string3". Prince gives "string1, string1, string1"
  • if I change the string-set element to display:block then AH Formatter, BFO, PDFReactor, Prince, Vivliostyle and Weasyprint all give the correct output, which is "blank, string1, string2"

So of the three tools trying to do this properly (Prince seems a bit lost here), the only ambiguity is where the string-set is immediately followed by a page break. In this situation, both Vivliostyle and BFO are moving the string-set to the top of the next page, rather than the bottom of the previous one as in your proposal. break-before and break-after are ignored.

So that's the data. Based on this I think there are four options.

  1. go with your proposal; this means a change for BFO and Vivliostyle
  2. go with the behaviour of BFO and Vivliostyle; I presume this is going to mean a change to your current work.
  3. Try and consider the behaviour of break-before and break-after in this decision - I fear this might be over-complicating things for little benefit.
  4. when string-set falls on a page boundary like this, leave it undefined as to whether it is considered to be on the previous or next page, so long as it follows the DOM order.

I'm obviously leaning towards (2), not just because that's what we've done but because if string-set is used, it seems like this would be done at the start of any section relating to that string: it should stick to content that follows it.

<section>
 <div style="display:none; string-set: chapter 'Widget'"></div>
 ... content relating to widgets ...
</section>

<section>
 ... content relating to widgets ...
 <div style="display:none; string-set: chapter 'Widget'"></div>
</section>

The first of those two seems more likely to me, and the existing css-gcpm test at https://github.com/web-platform-tests/wpt/blob/master/css/css-gcpm/string-set-011.html seems to agree.
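As a rough illustration of option 2, the sketch below attaches each box-less assignment to the next element that has boxes, then evaluates string(foo, first) per page. It is a simplified model under stated assumptions, not BFO's or Vivliostyle's code: the tuple layout and function names are made up for the example, each element contributes at most one page, and an assignment with nothing following it is simply skipped, since that case is left open here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (page of the element's first box, or None for display: none;
#                       value assigned by string-set, or None if the element sets nothing)
Item = Tuple[Optional[int], Optional[str]]


def page_by_forward_search(doc: List[Item], index: int) -> Optional[int]:
    """Page of the first element at or after `index` that has boxes."""
    for page, _ in doc[index:]:
        if page is not None:
            return page
    return None  # nothing with boxes follows; deliberately left undecided


def headers_first(doc: List[Item], page_count: int) -> List[str]:
    """string(foo, first) per page: first assignment on the page, else the entry value."""
    first_on_page: Dict[int, str] = {}
    last_on_page: Dict[int, str] = {}
    for i, (_, value) in enumerate(doc):
        if value is None:
            continue
        page = page_by_forward_search(doc, i)
        if page is None:
            continue
        first_on_page.setdefault(page, value)  # keep the earliest assignment in DOM order
        last_on_page[page] = value
    headers, entry = [], ""
    for page in range(1, page_count + 1):
        headers.append(first_on_page.get(page, entry))
        entry = last_on_page.get(page, entry)  # exit value becomes the next entry value
    return headers


# The test above: each hidden div follows an h1 that is followed by a page break.
after_h1 = [(1, None), (None, "string1"), (2, None),
            (None, "string2"), (3, None), (None, "string3")]
# Variant with each hidden div moved before its h1.
before_h1 = [(None, "string1"), (1, None), (None, "string2"),
             (2, None), (None, "string3"), (3, None)]

print(headers_first(after_h1, 3))   # ['', 'string1', 'string2']
print(headers_first(before_h1, 3))  # ['string1', 'string2', 'string3']
```

Under these assumptions the model reproduces the "blank, string1, string2" result reported for BFO and Vivliostyle, and the "string1, string2, string3" result for the variant with the string-set element before the h1.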

cc @MurakamiShinyu for Vivliostyle's opinion.


Related question: The definition for string(nnn, start) states that "If the element is the first element on the page, the value of the first assignment is used". Can an element that is display:none ever be the first element on the page?

@MurakamiShinyu
Collaborator

I agree with @faceless2 that (2) is preferable. (4) also makes sense: page breaking occurs between elements with boxes, so whether an element with no boxes at the page boundary goes with the previous or the next element would depend on the implementation.

Related question: The definition for string(nnn, start) states that "If the element is the first element on the page, the value of the first assignment is used". Can an element that is display:none ever be the first element on the page?

In HTML documents, the root <html> element is the first element on the first page, and also <body> and the first child element of <body>, and its first child element etc. can be a first element on the first page. The <title> element which is display:none and precedes the <body> element should also be a first element on the first page, for title { string-set: title content() } to work.

@MurakamiShinyu
Collaborator

The wording "If the element is the first element on the page," may be misleading. In my understanding, "the first element on the page" here should not be a single element. It was "if the element begins the page" in the old draft:

CSS GCPM, 24 September 2013 Editor’s Draft:
https://hg.csswg.org/drafts/raw-file/6a5c44d11c2b/css-gcpm/Overview.html#using-named-strings

  • ‘start’: the value of the first assignment on the page is used if the element begins the page or the named string has not been assigned a value. Otherwise, the named string's entry value is used.
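Read literally, the 2013 wording for ‘start’ amounts to a small decision rule, sketched below. The function and parameter names are hypothetical, and `element_begins_page` stands for however an implementation decides that the assigning element "begins the page", which is exactly the point under discussion here.

```python
from typing import List, Optional


def resolve_start(entry_value: Optional[str],
                  assignments_on_page: List[str],
                  element_begins_page: bool) -> Optional[str]:
    """Value of string(name, start) for one page, per the 2013 draft wording.

    entry_value: value carried in from earlier pages, or None if never assigned.
    assignments_on_page: values assigned on this page, in DOM order.
    element_begins_page: whether the assigning element begins the page (assumed input).
    """
    if assignments_on_page and (element_begins_page or entry_value is None):
        return assignments_on_page[0]  # first assignment on the page
    return entry_value  # otherwise the named string's entry value


begins = resolve_start("Chapter 1", ["Chapter 2"], True)
mid_page = resolve_start("Chapter 1", ["Chapter 2"], False)
unset = resolve_start(None, ["Chapter 1"], False)
```

With these inputs `begins` is "Chapter 2", `mid_page` is "Chapter 1" and `unset` is "Chapter 1".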

The wording was rewritten from "begins the page" to "is the first element on the page", but no change in behaviour seems to have been intended. See the "Changes" section of the CSS GCPM:

https://drafts.csswg.org/css-gcpm-3/#changes

Changes since the 24 September 2013 Editor’s Draft:

  • The spec has a new editor.
  • All text and examples rewritten.

Also, the wording about the display: none case was rewritten from

… If the element does not have any content boxes (e.g., if ‘display: none’ is set), the assignment is considered to take place on the page where the first content box would have occured if the element had been in the normal flow.

to

The content values of named strings are assigned at the point when the content box of the element is first created (or would have been created if the element’s display value is none).

I found a related discussion on the www-style ML when the first wording was written:

Does string-set work on elements or on boxes?

It may be helpful to understand the intent of the spec. At that time there was probably no consideration for the case that the element with no boxes falls on a page boundary.

@bernhardf-ro
Author

We agree that option 2 is probably more useful to authors than option 1. It would be the same as our previous suggestion, but searching forward and picking the first box of the found element, right?

We don't consider options 3 or 4 viable. However, we would like to propose a fifth option that would cover both examples by combining options 1 and 2:

  5. First go up the DOM tree until you find an element that has boxes.
    If it has boxes on the earlier of the 2 prospective pages but not the later one, search backwards, like in option 1.
    If it has boxes on the later page but not the earlier one, search forwards, like in option 2.
    Otherwise use the default direction (probably forwards, but this would have to be discussed).

This way well-structured documents should work as expected in the vast majority of cases. It would also implicitly eliminate the edge cases of options 1 and 2, where the assignment could end up before the first or after the last page, assuming that one of the two ifs is automatically satisfied if one of the two pages does not exist.
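The direction choice in this combined proposal could be sketched like this. It is only a sketch: the function name and parameters are hypothetical, the two candidate pages are assumed to have been found already by the backward and forward searches, and the default direction is a parameter because it is still open.

```python
from typing import Optional, Set


def choose_assignment_page(backward_page: Optional[int],
                           forward_page: Optional[int],
                           ancestor_pages: Set[int],
                           default_forward: bool = True) -> Optional[int]:
    """Pick between the page found by searching backwards and the one found forwards.

    ancestor_pages: pages holding boxes of the nearest ancestor that has boxes.
    """
    # If one of the two prospective pages does not exist, the other one wins.
    if backward_page is None:
        return forward_page
    if forward_page is None:
        return backward_page
    on_earlier = backward_page in ancestor_pages
    on_later = forward_page in ancestor_pages
    if on_earlier and not on_later:
        return backward_page  # ancestor ends before the break: behave like option 1
    if on_later and not on_earlier:
        return forward_page   # ancestor starts after the break: behave like option 2
    return forward_page if default_forward else backward_page


# Hidden element at the start of a section that begins on page 2.
at_section_start = choose_assignment_page(1, 2, {2})
# Hidden element at the end of a section that lies entirely on page 1.
at_section_end = choose_assignment_page(1, 2, {1})
# Ancestor spans both pages: fall back to the default direction.
spanning = choose_assignment_page(1, 2, {1, 2})
```

For the two `<section>` examples earlier in the thread this gives page 2 for the element at the start of its section and page 1 for the element at the end of its section.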

On the topic of start: For PDFreactor we consider a box to be the beginning/first of a page when it is the first fragment of its element and is in the first branch of the box tree inside its page's content (plus some details like ignoring siblings that collapse through). That same condition could be applied to the box found by searching as in options 2 or 5.

@fantasai added the css-gcpm-3 Current Work label Jun 1, 2023