HTML Parser: confusion about adoption agency algorithm #9559

Open
stasoid opened this issue Jul 25, 2023 · 9 comments
Labels
clarification Standard could be clearer topic: parser

Comments

@stasoid

stasoid commented Jul 25, 2023

https://html.spec.whatwg.org/multipage/parsing.html#adoption-agency-algorithm

What exactly does this text mean:

If there is no such element, then return and instead act as described in the "any other end tag" entry above.

What should the UA do after executing the steps in "any other end tag"?

  1. Should a call site look like this:
if (adoption_agency(token) == run_any_other_end_tag)
    goto any_other_end_tag;
// further steps will NOT be executed
  2. Or like this:
if (adoption_agency(token) == run_any_other_end_tag)
    any_other_end_tag(token);
// further steps will be executed

Ladybird behaves like 1; Chrome and html5lib behave like 2 (they call any_other_end_tag directly from adoption_agency). Firefox, gumbo, and lexbor don't call any_other_end_tag at all (they ignore the return value of adoption_agency).
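
To make the difference between these behaviors concrete, here is a minimal sketch that only records which steps run at the call site. Everything in it is hypothetical: the `aaa_result` type, the stub functions, and the step names are illustrative and not taken from Ladybird, Chrome, html5lib, Firefox, gumbo, or lexbor. The `found` flag stands in for whether the adoption agency algorithm located a formatting element.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type; not taken from any real parser. */
typedef enum { AAA_DONE, AAA_RUN_ANY_OTHER_END_TAG } aaa_result;

/* Appends a step name to a comma-separated trace buffer. */
static void record(char *trace, size_t cap, const char *step) {
    size_t used = strlen(trace);
    snprintf(trace + used, cap - used, "%s%s", used ? "," : "", step);
}

/* Stub: reports "no formatting element found" when found == 0. */
static aaa_result adoption_agency_stub(int found, char *trace, size_t cap) {
    record(trace, cap, "aaa");
    return found ? AAA_DONE : AAA_RUN_ANY_OTHER_END_TAG;
}

/* Reading 1: hand control to "any other end tag" and skip the rest of the call site. */
static void call_site_reading_1(int found, char *trace, size_t cap) {
    trace[0] = '\0';
    if (adoption_agency_stub(found, trace, cap) == AAA_RUN_ANY_OTHER_END_TAG) {
        record(trace, cap, "any_other_end_tag");
        return;
    }
    record(trace, cap, "rest_of_call_site");
}

/* Reading 2: run "any other end tag", then continue with the rest of the call site. */
static void call_site_reading_2(int found, char *trace, size_t cap) {
    trace[0] = '\0';
    if (adoption_agency_stub(found, trace, cap) == AAA_RUN_ANY_OTHER_END_TAG)
        record(trace, cap, "any_other_end_tag");
    record(trace, cap, "rest_of_call_site");
}

/* Third behavior: ignore the result and never run "any other end tag". */
static void call_site_ignore_result(int found, char *trace, size_t cap) {
    trace[0] = '\0';
    (void)adoption_agency_stub(found, trace, cap);
    record(trace, cap, "rest_of_call_site");
}
```

Calling each function with `found` set to 0 and a small `char` buffer shows three different traces; with `found` set to 1 all three produce the same trace, which is why the readings can only diverge when the formatting element is missing.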

@annevk annevk added clarification Standard could be clearer topic: parser labels Jul 26, 2023
@annevk
Member

annevk commented Jul 26, 2023

cc @whatwg/html-parser

@zcorpan
Member

zcorpan commented Aug 14, 2023

Chromium and html5lib both have a return, so further steps in AAA will not be executed. Assuming a return, is there a difference between 1 and 2?

Do you have a test that shows different output between implementations here?

@stasoid
Author

stasoid commented Aug 14, 2023

I mean the further steps at the AAA call site, not in the AAA algorithm itself. See how it is implemented in Ladybird.

No, I don't have a test.

@zcorpan
Member

zcorpan commented Aug 14, 2023

@stasoid
Author

stasoid commented Aug 14, 2023

If Chrome/html5lib behavior is the correct one, I propose to change the wording to:

If there is no such element, then act as described in the "any other end tag" entry above and return.

I.e. swap "return" and "act as described...". This makes it much clearer what an implementation is supposed to do.

@zcorpan
Member

zcorpan commented Aug 31, 2023

This might be a test for this:

<nobr><nobr>

If the remaining steps at the call site at https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody:adoption-agency-algorithm-3 are not run, then the second nobr element would not be inserted. But Chrome/Safari/Firefox do insert it.

@stasoid
Author

stasoid commented Aug 31, 2023

No. To trigger the desired behavior, we need AAA step 4.3 to not find a formatting element with the tag name subject. In this case it finds one. (AAA is called on the second <nobr>, so it finds the first <nobr> in the list of active formatting elements.)
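
The lookup in question can be sketched as a backward search over the list of active formatting elements that stops at the last marker. This is a simplified illustration under stated assumptions: the `afe_entry` type and `find_formatting_element` are hypothetical names, and entries are reduced to a tag name, whereas a real parser stores element references.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list entry: either a marker or an element known by its tag name. */
typedef struct {
    int is_marker;
    const char *tag_name; /* NULL for markers */
} afe_entry;

/*
 * Returns the index of the last entry that sits after the last marker
 * (or anywhere in the list if there is no marker) and whose tag name
 * equals subject. Returns -1 if there is no such entry, which is the
 * case that sends the parser to "any other end tag".
 */
static int find_formatting_element(const afe_entry *list, size_t len,
                                   const char *subject) {
    for (size_t i = len; i-- > 0; ) {
        if (list[i].is_marker)
            break; /* entries before the last marker are not considered */
        if (strcmp(list[i].tag_name, subject) == 0)
            return (int)i;
    }
    return -1;
}
```

For `<nobr><nobr>`, the list holds one `nobr` entry when the second start tag arrives, so the search returns index 0 and the "no such element" branch is not taken. A test for the disputed wording needs an input where this search returns -1 at a start-tag call site.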

@stasoid
Author

stasoid commented Sep 1, 2023

I believe this is the test (found by brute force):

<object><a><p><a>

Update: No, the Ladybird tree is the same as Chrome's:

<html>
    <head>
    <body>
        <object>
            <a>
            <p>
                <a>
                <a>

@stasoid
Author

stasoid commented Sep 3, 2023

If lexbor behavior is changed to match that of Ladybird, it still passes all the html5lib tests.

If lexbor behavior is changed to match that of Chrome, it still passes all the html5lib tests.

The lexbor author says that "for open tags (A and NOBR) there can be no situation when lxb_html_tree_adoption_agency_algorithm() function returns true" (true == run_any_other_end_tag here).

So there might indeed be no difference between Ladybird and Chrome behavior, but this is not obvious at all.
