Fix/escape XML characters in sitemap url #77422
base: canary
 * Others like &copy; or &nbsp; are escaped to prevent invalid XML.
 *
 * - Prevents double-escaping of known entities like &amp;
 */
I added the function to this utils file because the file was already being imported in resolve-route-data.ts.
💡 Note on unrecognized HTML entities
This fix only escapes the five XML special characters (&, <, >, ", ') and prevents double-escaping of valid entities like &amp;.
It does not warn if the input contains unsupported HTML entities like &copy;, &nbsp;, etc. Those will be escaped as &amp;copy; and so on, which renders literally in XML.
If needed, a warning or validation layer could be added in the future to detect and notify about non-XML entities. For now, this is intentionally left silent to avoid unnecessary noise or breaking existing behavior.
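As a rough illustration of what such a validation layer might look like, here is a minimal sketch. The helper name `findNonXmlEntities` and its behavior are assumptions made for this example; nothing like it is part of the PR.

```typescript
// Hypothetical helper, not part of the PR: lists named entities in a string
// that are not among the five entities XML predefines.
const XML_PREDEFINED_ENTITIES = new Set(['amp', 'lt', 'gt', 'quot', 'apos'])

function findNonXmlEntities(value: string): string[] {
  const found: string[] = []
  // Matches named references such as &copy; numeric references such as
  // &#169; contain '#' and are deliberately not matched.
  const pattern = /&([A-Za-z][A-Za-z0-9]*);/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(value)) !== null) {
    if (!XML_PREDEFINED_ENTITIES.has(match[1])) {
      found.push(match[0])
    }
  }
  return found
}

// A caller could log a warning when the returned list is non-empty.
const suspicious = findNonXmlEntities('https://example.com/?a=1&amp;b=&copy;')
if (suspicious.length > 0) {
  console.warn('Non-XML entities in sitemap URL:', suspicious.join(', '))
}
```

Returning the offending entities instead of throwing keeps the check advisory, which matches the intent of not breaking existing behavior.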
@@ -99,6 +99,43 @@ describe('resolveRouteData', () => {
I added the tests to the existing generate sitemap test block.
What?
Fixes an issue where sitemap.[ts/js] generates invalid XML when URLs contain special characters like &, <, >, ", and '. While the source file is JavaScript or TypeScript, the final output is XML, which requires strict character escaping.
See: #77340
Why?
Unescaped characters break XML parsing, causing search engines to reject the sitemap or ignore URLs. While some users manually escape these (& → &amp;), this leads to brittle workarounds and potential double-escaping.
How?
- Adds escapeXmlValue(), which escapes XML special characters without double-escaping existing entities (e.g. &amp;)
- Applies it in resolveSitemap() for <loc> values
This fix maintains backward compatibility for users who have already manually escaped their URLs.
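The escaping behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual code: the names `escapeXmlValueSketch` and `locElement` are hypothetical, and the real escapeXmlValue() may treat some inputs (numeric character references, for instance) differently.

```typescript
// Sketch only; the implementation in the PR may differ.
const XML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
}

function escapeXmlValueSketch(value: string): string {
  // An ampersand is escaped unless it already begins one of the five
  // predefined XML entities; the other four characters are always escaped.
  return value.replace(
    /&(?!(?:amp|lt|gt|quot|apos);)|[<>"']/g,
    (ch) => XML_ESCAPES[ch] ?? ch
  )
}

// Hypothetical wrapper showing where the escaping applies in a sitemap entry.
function locElement(url: string): string {
  return `<loc>${escapeXmlValueSketch(url)}</loc>`
}

// A raw URL and a manually pre-escaped URL produce the same <loc> element.
console.log(locElement('https://example.com/?a=1&b=2'))
console.log(locElement('https://example.com/?a=1&amp;b=2'))
```

The negative lookahead is what keeps already-escaped URLs unchanged, so users who escaped manually do not end up with `&amp;amp;`.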
Fixes #77340