Description
What is the issue with the HTML Standard?
Elements that are parsed as RAWTEXT or RCDATA in HTML context but as normal elements in foreign content have been used as mutation XSS (mXSS) vectors. Examples:
- https://research.securitum.com/dompurify-bypass-using-mxss/
- https://x.com/Sonar_Research/status/1866135979880300830
- https://bughunters.google.com/blog/5038742869770240/escaping-and-in-attributes-how-it-helps-protect-against-mutation-xss#does-this-change-prevent-all-mxss-vectors-
Some also used the fact that `<` and `>` were not escaped in attribute values, which was fixed in #6362. But mXSS attacks are likely still possible.
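
As a rough illustration of the attribute-escaping change, here is a minimal Python sketch of attribute-value escaping that includes `<` and `>`. The function name `escape_attribute_value` is hypothetical, and the set of replacements reflects my reading of the serialization algorithm after #6362; treat it as an assumption rather than a restatement of the spec text.

```python
def escape_attribute_value(value: str) -> str:
    """Sketch of attribute-value escaping during serialization.

    Assumption: mirrors the post-#6362 behavior, where '<' and '>' are
    escaped in attribute values in addition to '&', U+00A0 and '"'.
    """
    return (
        value.replace("&", "&amp;")  # first, so later entities are not double-escaped
        .replace("\u00a0", "&nbsp;")
        .replace('"', "&quot;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


# An attribute value that looks like an end tag followed by an element.
payload = "</style><img src=x onerror=alert(1)>"
print(escape_attribute_value(payload))
```

With this escaping, a serialized attribute value can no longer contain a literal `</style>` sequence, which removes one ingredient of the known vectors but not the namespace-dependent parsing itself.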
Elements of interest:
- title
- textarea
- style
- script
- xmp
- iframe
- noembed
- noframes
- noscript
- plaintext
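
The difference for the elements listed above can be summarized as a lookup from element name and namespace to the tokenizer state used for the element's content. This is a simplified sketch based on my reading of the tree construction rules; the function `content_model` and its return strings are hypothetical names, and details such as insertion modes are ignored.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

RCDATA_ELEMENTS = {"title", "textarea"}
RAWTEXT_ELEMENTS = {"style", "xmp", "iframe", "noembed", "noframes"}


def content_model(local_name: str, namespace: str, scripting: bool = True) -> str:
    """Return the tokenizer state used for the content of a start tag.

    `namespace` is the namespace the element is created in. Elements in
    foreign content (SVG, MathML) always get normal "data" tokenization.
    """
    if namespace != HTML_NS:
        return "data"
    if local_name in RCDATA_ELEMENTS:
        return "RCDATA"
    if local_name in RAWTEXT_ELEMENTS:
        return "RAWTEXT"
    if local_name == "noscript":
        # noscript content is only raw text when scripting is enabled.
        return "RAWTEXT" if scripting else "data"
    if local_name == "script":
        return "script data"
    if local_name == "plaintext":
        return "PLAINTEXT"
    return "data"


for name in ("title", "style", "script"):
    print(name, content_model(name, HTML_NS), content_model(name, SVG_NS))
```

Any element for which the two columns differ is a candidate for a mutation: the same serialized markup is interpreted differently depending on which namespace the element lands in when reparsed.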
In https://x.com/zcorpan/status/1339517144053243906 I argued that it would have been better to have consistent parsing of `style` (and others) between HTML and foreign content. Then, a mutation that causes a `style` element to be in a different namespace after parse-serialize-parse wouldn't matter.
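
To make the namespace dependence concrete, here is a toy Python sketch that finds where the content of a `style` element ends under each parsing model. It is not a real tokenizer: `rawtext_end` and `foreign_end` are hypothetical helpers, and the foreign-content scanner is a deliberate simplification that treats every `<` as the start of a tag and skips quoted attribute values.

```python
import re


def end_tag_pattern(tag: str) -> re.Pattern:
    # An end tag name must be followed by whitespace, '/' or '>' to count.
    return re.compile(r"</" + re.escape(tag) + r"(?=[\t\n\f />])", re.IGNORECASE)


def rawtext_end(content: str, tag: str) -> int:
    """HTML-style RAWTEXT: content ends at the first matching end tag,
    even if that end tag appears inside what looks like an attribute."""
    match = end_tag_pattern(tag).search(content)
    return match.start() if match else len(content)


def foreign_end(content: str, tag: str) -> int:
    """Foreign-content style: markup is tokenized normally, so an end tag
    inside a quoted attribute value does not close the element."""
    pattern = end_tag_pattern(tag)
    i, n = 0, len(content)
    while i < n:
        if content[i] != "<":
            i += 1
            continue
        if pattern.match(content, i):
            return i
        # Skip over one tag, honoring quoted attribute values.
        j, quote = i + 1, None
        while j < n:
            ch = content[j]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
            elif ch == ">":
                break
            j += 1
        i = j + 1
    return n


# Content that follows a <style> start tag.
content = '<a id="</style><img src=x onerror=alert(1)>">'
print(repr(content[:rawtext_end(content, "style")]))
print(repr(content[:foreign_end(content, "style")]))
```

Under the RAWTEXT model the element ends inside the attribute value and the `img` that follows becomes live markup, while under the foreign-content model the whole string stays inside `style`. If both namespaces used the same model, the two results would agree and the mutation would be harmless.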
It's not clear if it's still possible to change this, or if there's web content that depends on `style`, `script` or `title` in SVG to have normal parsing. Also, changing this would likely cause some problems until all parsers agree on the new parsing. But in the long term it may be better if we can solve the mXSS vector completely.
(Another feature that is conditionally available based on the current element during parsing is `CDATA` sections. Maybe it's possible to make them work in HTML context also?)
cc @whatwg/html-parser