Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interoperable handling of invalid markup #136

Closed
fred-wang opened this issue Feb 19, 2019 · 34 comments
Closed

Interoperable handling of invalid markup #136

fred-wang opened this issue Feb 19, 2019 · 34 comments

Comments

@fred-wang
Copy link
Contributor

https://mathml-refresh.github.io/mathml/chapter2.html#interf.error

If a MathML-input-conformant application receives input containing one or more elements with an illegal number or type of attributes or child schemata, it should nonetheless attempt to render all the input in an intelligible way, i.e., to render normally those parts of the input that were valid, and to render error messages (rendered as if enclosed in an merror element) in place of invalid expressions.

Gecko renders (not very useful) "invalid-markup" message
WebKit just layout a 0x0 box for the invalid subtree

I think the latter is more efficient and less code so I'd prefer it. Probably if we want to log errors, it should be in the console instead.

@davidcarlisle
Copy link
Collaborator

I certainly would not want error reporting to be placing any constraints on browsers here, so I'd agree to any wording here that tells authors not to produce bad markup, but basically authorises browsers to do whatever they find convenient if there is bad markup. If making a zero sized subtree works (could be made to work) for all the current implementations, specifying that to get interoperable behaviour would also be Ok I think.

@fred-wang fred-wang changed the title Handling of invalid markup interoperability Interoperable handling of invalid markup Feb 19, 2019
@davidcarlisle davidcarlisle transferred this issue from w3c/mathml-core Feb 20, 2019
@fred-wang
Copy link
Contributor Author

Adding mathml4 label too, as we might want to adjust the behavior.

@fred-wang
Copy link
Contributor Author

cc @rwlbuis

@fred-wang
Copy link
Contributor Author

Just for the record, there is a third option which is probably a bit more complicate to implement/describe but more aligned with to what people expect with CSS / HTML5: Add anonymous <mrow> elements (e.g. one for numerator and one for denominator for mfrac) to ensure that MathML elements have the right number of arguments.

@fred-wang
Copy link
Contributor Author

Quoting @bfgeek

See:
https://www.software.hixie.ch/utilities/js/live-dom-viewer/?saved=6704
... for a simple example.

https://www.software.hixie.ch/utilities/js/live-dom-viewer/?saved=6705
... for a more complex example.

The above example indicates that the implementations are trying to do something funny with various inputs which an implementation thinks is invalid.

I suggest removing these, and instead making sure that all the MathML algorithms can deal with the edge cases which developers might feed them.

(an alternative would be to strictly define, with algorithms, what type of DOM input is required, and the validation steps, however this would be prone to bugs, and compatibility issues).

This the a bit of a meta issue, however would need lots of tests with the "edge" cases for all of the different layout algorithms within MathML.

@fred-wang
Copy link
Contributor Author

I'm not sure there is a canonical way to "fix" invalid markup (Ignore extra children? Add empty anonymous? Wrap a subset of children into anonymous containers?). My preference would be to align on WebKit's behavior as it's the easiest to specify/test/implement. This is already how we (at Igalia) implement it our Chromium branch and we would only have to fix Firefox. Basically, my proposal would be:

  1. Add a section about invalid markup in MathML Core (maybe import from MathML 4 full), define the invalid constructions (essentially elements that don't have the expected number of children). There is probably an exception do make for mtable, until we clarify what the table layout is.
  2. Say that invalid markup has intrinsize size of 0, the layout results as an empty 0x0 box and no painting.
  3. In each section where we define algorithm and intrinsic size (Modify reference point for "shape a stretchy glyph" #54 etc), say that we do an early return with the result specified in (2)
  4. Write testharness tests with all markup of the form <span style="display: inline-block"><math>INVALID MARKUP</math></span> as defined in (1) and ensure that the span has size 0 (intrinsic size) and that the "INVALID MARKUP" subtree has size 0x0. Write reftests checking that they don't paint anything (e.g. for fractions, roots...). Probably more tests involving valid and invalid subtrees. I believe we already have tests for invalid markup in Gecko and WebKit that we could "import". However, they would need to be tweaked a bit probably.

Putting this on the agenda of next meeting.

@bfgeek
Copy link

bfgeek commented Feb 28, 2019

WebKits implementation causes issues in other areas of the browser. For example the "invalid" is still accessible from VoiceOver, etc.

This type of "easy" fixup produces technical debt in the codebase, and complexity later on which I'd be strongly opposed to adding in chromium.

Having said that one guiding principle that CSS usually follows is that it should attempt to display all content. This is probably a good principle for MathML to also follow.

I think that for display types which can only accept up to "N" children, inserting any additional children in an anonymous mrow type would be acceptable.

E.g.

<mfrac><mi>a</mi><mi>b</mi><mi>c</mi></mfrac>

would render as:

<mfrac><mi>a</mi><mrow><mi>b</mi><mi>c</mi></mrow></mfrac>

@emilio @heycam

@bfgeek
Copy link

bfgeek commented Feb 28, 2019

@tabatkins also might have some good thoughts.

@fred-wang
Copy link
Contributor Author

So the alternative proposal would be:

  1. Add a section about invalid markup in MathML Core defining the invalid constructions. Essentially elements they have n children while they expect N != n children. If n < N, they append N - n anonymous empty mrow's. If n > N, they wrap the last N - n + 1 children into an anonymous mrow. An alternative is to always generate N anonymous children but that would lead to bigger render/layout trees and complexity for such elements (even when they are valid) and so is less optimal. Again, there is an exception to make for mtable, were the fixup is instead using anonymous mtr and mtd like in HTML.
  2. In each section where we define elements, mention the expected number of children if appropriate (import from MathML 4 full) and link to the general section about how to fix it.
  3. Write static reftests checking that invalid markup render the same as their "fixed" version. Also write dynamic versions where n varies between 0 and (at least) N + 2.
  4. Implement the behavior for WebKit, Gecko and Chromium.

My main concern (in addition to adding more work) is how dynamic changes will be handled. In the past, WebKit generated some anonymous elements (sometimes two levels of nested anonymous e.g. for scripted elements) in the Renderer/Layout classes and set/remove anonymous style on them: That caused at best inconsistencies after dynamic changes and at worst asserts/crashes. Hopefully, this proposal is a bit safer and we could maybe even handle the Render/Layout tree fixup & reorganization from the DOM classes.

However, I agree that the proposed behavior would be more aligned with HTML5 / CSS (adding the label now btw) and more natural for users, so it's probably worth the added complexity/work.

I don't really like the alternative of "making sure that all the MathML algorithms can deal with the edge cases" as that makes the algorithm more complex and harder to read. For example, the advantage of WebKit's current implementation is that once we know an element is valid we can just call the children and associated variable numerator/denominator/base/superscript etc and that makes the algorithm (which is already relatively complex when we follow all TeX / Open Type MATH metrics) much easier to read and understand.

@fred-wang
Copy link
Contributor Author

An alternative is to always generate N anonymous children but that would lead to bigger render/layout trees and complexity for such elements (even when they are valid) and so is less optimal.

Since VoiceOver was mentioned above, I however have to say that adding such anonymous children could help accessibility ( w3c/mathml#9 ). Typically,

mfrac
  |_mfrac
  |_mfrac

would become

mfrac
  |_mrow
    |__frac
  |_mrow
    |__frac

and then we can add roles numerator/denominator on the anonymous mrow's while still keeping roles fraction on the mfrac. Otherwise, this fixup would have to be done in the a11y tree.

@NSoiffer
Copy link
Contributor

NSoiffer commented Mar 3, 2019

Doesn't HTML5's parsing spec already say what should happen for syntax errors such as missing close tags?

I didn't see in the parsing spec what should happen when the wrong number of arguments are given. Maybe I missed it or maybe because that isn't related to what gets built in the DOM.

HTML5 has some elements that must be used in ordered pairs (i.e., they don't make sense out of context). For example, dl, dd, and dt all work together. I saw the parsing spec has some repair mechanisms related to them for parse errors, but I didn't see anything related to inserting missing tags. Does any one what the HTML5 spec says to do in these cases? If there is a precedent (and maybe existing code), it is something we should follow if it makes sense for MathML.

If there is no precedent, then I like the idea of adding anonymous mrows and attaching some CSS to them to visually and hopefully accessibly indicate an error.

@bfgeek
Copy link

bfgeek commented Mar 4, 2019

@NSoiffer Yes the parsing spec does specify what should happen with missing close tags, but that isn't what is being discussed.

I didn't see in the parsing spec what should happen when the wrong number of arguments are given. Maybe I missed it or maybe because that isn't related to what gets built in the DOM.

The primary issue here is that DOM/CSS tries to display all content on the page, HTML doesn't really have a concept of an "incorrect" number of children for any given element (there may be a counter example here, but I'm not aware of any).

E.g. a <table> element can have multiple <caption>s or <thead> even though it only makes "sense" to have one. The browser will display all of them not to drop content.

@fred-wang
Copy link
Contributor Author

An alternative option could be to say that invalid markup are just laid out as mrow. That way we keep a simple fallback similar to the empty 0x0 but ensure that all the content is displayed.

@davidcarlisle
Copy link
Collaborator

@bfgeek I think the html spec is trying to specify this

https://html.spec.whatwg.org/#mathml

says

User agents must act as if any MathML element whose contents does not match the element's content model was replaced, for the purposes of MathML layout and rendering, by a MathML merror element containing some appropriate error message

which means that

<mfrac><mi>a</mi><mi>b</mi><mi>c</mi></mfrac>

should render as

<merror> ...</merror>

That said, if that proves tricky to specify in an interoperable way, we should change the spec.

@NSoiffer
Copy link
Contributor

NSoiffer commented Mar 4, 2019

At the risk of drifting off-topic, is merror something that should be in core? It is currently listed in the core spec, but I don't think that means much at the moment. If it should remain in core, I think we should follow the HTML spec and use it as @davidcarlisle suggested. However, the HTML5 spec is short on detail about what the content of the merror element should be in this case. The core spec says merror should be treated as mrow, but doesn't say how it handles content errors.

I think the merror description in the spec needs a tweak. I've added w3c/mathml#66 about that. Basically that tweaks gives a way of describing the error without interfering with the display of the content for this use of merror.

With that tweak, @fred-wang's solution of using mrow for display (inside of the merror) would be compatible with the HTML5 spec and (according to @fred-wang) be easy to implement.

@bfgeek
Copy link

bfgeek commented Mar 4, 2019

@davidcarlisle Whatever is decided here, should be specified in the MathML spec, and that removed from the HTML spec.

I've spoken to mozilla engineers about that behaviour and they dislike it, and the technical debt it creates.
cc/ @emilio

@bfgeek
Copy link

bfgeek commented Mar 4, 2019

@davidcarlisle Replacing the content with <merror> is probably going to be a bunch of work to specify and test that it works correctly, just a few questions that you have to answer:

  • When does this happen within the document lifecycle.
  • Does it operate on the DOM tree, or CSS box tree?
  • Does this update the DOM nodes underneath a <math> element, when does this happen?
  • etc.

@davidcarlisle
Copy link
Collaborator

@bfgeek I'm perfectly happy to let implementation issues take the lead here, I just wanted to flag the reference in the HTML5 spec so if we end up specifying something different here that gets pushed back to the w3c and whatwg html editors to update the spec (there are references to the mathml3 W3C CVS draft spec location rather than the currently maintained github drafts which also need updating in the html5 spec).

@fred-wang
Copy link
Contributor Author

@bfgeek I also don't like at all Mozilla / MathML3 / HTMLT5's suggestion to render as if an "merror" was inserted. Firefox actually implements it by displaying a "invalid-markup" message which is not really helpful. The standard way to report an error is via the browser console, which is what Firefox does.

If we want to be consistent with HTML, we should try and render all the content without extra visual indication for error, so so I suggest one of this option:

@fred-wang
Copy link
Contributor Author

Just to add more to the discussions, my colleague @rwlbuis pointed out that SVG has some error rules https://www.w3.org/TR/SVG11/implnote.html#ErrorProcessing that implies not displaying some parts of the content. I'm not sure how it is actually implemented in browsers.

There are also empty elements in MathML like mspace that are not supposed to have any child. I wonder how we should handle them compared to e.g. empty HTML elements. See web-platform-tests/wpt#15722

fred-wang referenced this issue in web-platform-tests/wpt Mar 8, 2019
Also add some edge cases (zero/one/more than two mfrac children).
@fred-wang
Copy link
Contributor Author

Consensus from 2019/09/12 was:

  • Send invalid error to the console so that authors know they have to fix it.
  • Always render something to the user, at minimum mrow-like layout. Experiment more to see if we can do something that does not alter the visual rendering (Rob already did that for mfrac).

@NSoiffer
Copy link
Contributor

From 2020/05/18 meeting:
Rob felt the experiment showed preserving structure and wrapping up the extra elements into the last child was easy to do. The group agreed he should go forward with that and assuming Google is ok with it, expand that "error handling" to the rest of the MathML elements that take a fixed number of arguments as that seems to be in line with the HTML philosophy of trying to do a sensible job of fix up.

There won't be a visual indication of the cleanup, but a warning should be sent to the console.

@fred-wang
Copy link
Contributor Author

Adding the label back because I don't think there is any consensus on the proposed "wrapping up the extra elements" approach and unless I'm missing what @rbuis was referring to, the layout-side experiment he made some months ago (for the mfrac element only) didn't demonstrate at all "it was easy" ; actually it was removed from our branch because it added extra complexity and it is not being upstreamed. There are still open questions like what happens with mmultiscripts, are we ok with "not perfectly equivalent" mentioned above, or if we should consider an anonymous box fixup approach instead (which is probably what is best in the long term) etc

So I don't think the CG is in position to draw any conclusion here without more work on implementations.

On the other hand, the current approach with mrow-like fallback well-described in the text of the spec, implementing it is very easy, chromium reviewers are fine with it and is what the WPT tests assume.

@fred-wang
Copy link
Contributor Author

Adding "need polyfill" as it should be easy to implement in JS what the people have requested to fixup the markup.

@NSoiffer
Copy link
Contributor

NSoiffer commented Aug 3, 2020

w3c/mathml#175 was resolved with 'render as an mrow'.

I'm not sure what is left, but @bkardell pointed out this can't be closed until there is agreement from the other implementations that rendering as an mrow is ok with them -- they do something different now as pointed out in the initial comment.

@NSoiffer
Copy link
Contributor

@fred-wang: I don't think this can be closed unless you heard from other implementers as per @bkardell comment.

@fred-wang
Copy link
Contributor Author

Checking the discussions, it seems Mozilla and Apple are happy with the "render as mrow" approach, but let's see what @bkardell says.

@NSoiffer
Copy link
Contributor

For my benefit and the benefit of others that are interested in what was said by Mozilla and Apple, can you provide a link to the discussions?

@fred-wang
Copy link
Contributor Author

For my benefit and the benefit of others that are interested in what was said by Mozilla and Apple, can you provide a link to the discussions?

This was discussed in an Apple/Igalia private chat, but Brian will be able to explain.

About Mozilla, I don't have a link either but I'm pretty sure @emilio had mentioned several times in the past that having this error message layout was complicating things (I'll ping him again). I actually had written a WIP patch one year ago https://bugzilla.mozilla.org/show_bug.cgi?id=1583037

@fred-wang
Copy link
Contributor Author

Closing this bug:

There is also a WebKit bug to align with the spec: https://bugs.webkit.org/show_bug.cgi?id=123348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants