Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the After attribute value (quoted) state #7430

Closed
wants to merge 2 commits into from

Conversation

untitaker
Copy link

@untitaker untitaker commented Dec 19, 2021

Fix #1323. As pointed out in the issue, the number of branches in this
state can be reduced.


/acknowledgements.html ( diff )
/parsing.html ( diff )

Fix whatwg#1323. As pointed out in the issue, the number of branches in this
state can be reduced.
Copy link
Member

@zcorpan zcorpan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've stepped through each character's state transitions in the spec and I believe this is equivalent.

@untitaker
Copy link
Author

Thanks, Fwiw I ran automated equivalence tests + fuzzing to verify the change here: untitaker/html5gum#23

@zcorpan
Copy link
Member

zcorpan commented Dec 22, 2021

@untitaker ok nice! Does the equivalence test care about parse errors?

cc @whatwg/html-parser for a second review or chance to object. If we hear nothing by the end of January 2022 I think this can be merged.

@untitaker
Copy link
Author

Yes but error positions are not compared

@annevk
Copy link
Member

annevk commented Jan 3, 2022

It's not clear to me what we prefer in the specification. Fewer total steps or fewer branches. @hsivonen thoughts?

@zcorpan
Copy link
Member

zcorpan commented Feb 1, 2022

It's no longer January. @hsivonen WDYT?

@hsivonen
Copy link
Member

hsivonen commented Feb 1, 2022

I don't see anything semantically wrong with this change. However, I think it's in general not particularly beneficial to tweak the states, since it causes review churn to compare implementations with the spec. Also, there is always a chance for bugs with supposed no-op changes.

To the extent implementations don't try to manually inline across these transitions, which would put more distance between what the code looks like and what the spec looks like, it seems to me that this ends up with more branches executed in the common case even if it results in less code. That is, the current character gets checked for being > more times in the case where the tag ends.

I'm not strongly against this change, but I'm also not convinced that it's a useful change.

@untitaker
Copy link
Author

untitaker commented Feb 1, 2022

I agree with @hsivonen, when I wrote this I didn't think that this would not be a clear performance win for every impl (it gets particularly complicated when you ask how expensive reconsume is), and also didn't consider the amount of reviews such a change requires. I'll close this PR

the way the spec is written, and given the sheer complexity of what it tries to define, there unfortunately is a strong motivator to keep implementations line-by-line in sync with the spec for ease of maintenance, and, well, here it goes both ways.

I wish there was a machine-readable version of the spec (essentially a reference implementation running on a custom abstract machine), it would make it significantly easier to do automated tests to check for equivalence of the "optimized state machine" of an implementation vs the spec's state machine. there are too many holes in html5lib-tests atm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

"12.2.4.39 After attribute value (quoted) state ..."
5 participants