Take encoding into account when parsing link headers & early hints #9715

Open
noamr opened this issue Sep 10, 2023 · 4 comments · May be fixed by #9764

Comments

noamr (Contributor) commented Sep 10, 2023

See #9709 (comment)

Usually we use the document's encoding when parsing URLs in link headers, but for early hints and link headers that encoding isn't available yet, so we need to use something else, probably the charset parameter of the document's Content-Type header. /cc @bashi
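For context, the choice of encoding only changes the result when a URL contains non-ASCII characters, and in the WHATWG URL parser it only affects the query component. A minimal sketch, assuming Python's `urllib.parse.quote` as a stand-in for the URL parser's query percent-encoding (it is not the WHATWG URL parser itself), shows the same query value producing different bytes under UTF-8 and windows-1252. `encode_query_value` is a hypothetical helper, not a spec or browser API.

```python
from urllib.parse import quote


def encode_query_value(value: str, encoding: str) -> str:
    """Percent-encode a query value after converting it to bytes in `encoding`.

    Hypothetical helper: urllib.parse.quote approximates, but is not,
    the WHATWG URL parser's query percent-encoding.
    """
    # Characters the target encoding cannot represent become "&#NNN;" first,
    # which quote() then percent-encodes, similar to the URL Standard's
    # "%26%23NNN%3B" fallback for unmappable code points.
    return quote(value, safe="", encoding=encoding, errors="xmlcharrefreplace")


for enc in ("utf-8", "cp1252"):
    print(enc, encode_query_value("café", enc))
# utf-8 caf%C3%A9
# cp1252 caf%E9
```

A server that emits a Link header with a UTF-8 query would therefore get a different request URL if the browser parsed it with a legacy document encoding, which is the mismatch this issue is about.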

domenic (Member) commented Sep 10, 2023

I think using UTF-8 would be better, ignoring the Content-Type. Especially because Content-Type might not arrive by early hints time, right?

noamr (Contributor, Author) commented Sep 10, 2023

> I think using UTF-8 would be better, ignoring the Content-Type. Especially because Content-Type might not arrive by early hints time, right?

Right. I think it's a matter of calling steps 3-6 of https://html.spec.whatwg.org/#parse-a-url instead of running the whole algorithm.

bashi commented Sep 13, 2023

+1 to using UTF-8. Early hints were introduced recently, so I guess it's not very harmful to assume that servers that speak early hints use UTF-8.

domenic (Member) commented Sep 13, 2023

It would be good to write tests to see what browsers do for non-early Link headers. Do they use Content-Type, or do they always use UTF-8?

I hope that at least some browsers always use UTF-8, and so we can have the simple rule "if it's a Link header, we use UTF-8; if it's <link>, we use the document's encoding".
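The proposed rule is small enough to state as a function. A minimal sketch in Python, where `LinkSource` and `url_encoding_for` are hypothetical names (not spec terms or browser APIs), and `urllib.parse.quote` again stands in for query percent-encoding:

```python
from enum import Enum
from typing import Optional
from urllib.parse import quote


class LinkSource(Enum):
    """Hypothetical labels for where a link to fetch was found."""
    LINK_ELEMENT = "link element"
    LINK_HEADER = "Link header"
    EARLY_HINTS = "103 Early Hints"


def url_encoding_for(source: LinkSource, document_encoding: Optional[str]) -> str:
    """Pick the encoding used to parse a link's URL under the proposed rule."""
    if source is LinkSource.LINK_ELEMENT:
        # A <link> element lives in a document, so the document's encoding is known.
        if document_encoding is None:
            raise ValueError("a <link> element always has a document encoding")
        return document_encoding
    # Link headers and early hints: always UTF-8, ignoring any Content-Type
    # charset, which may not even have arrived yet for early hints.
    return "utf-8"


# The same non-ASCII query value, encoded according to each source.
for source in LinkSource:
    enc = url_encoding_for(source, document_encoding="cp1252")
    print(source.value, quote("café", safe="", encoding=enc))
# link element caf%E9
# Link header caf%C3%A9
# 103 Early Hints caf%C3%A9
```

The appeal of this shape is that header-based links never need to wait for, or sniff, the response's charset: their URLs can be resolved as soon as the header is received.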

noamr added a commit to noamr/html that referenced this issue Sep 20, 2023
- Use the document encoding for link elements
- Always use UTF8 for link headers/early hints

Closes whatwg#9715
noamr linked a pull request Sep 20, 2023 that will close this issue