"UTF-8 percent encode c using the path percent-e..." #296
Comments
None of
Would it be clearer if the step was taken out of the "UTF-8 percent encode" algorithm, and the places that call UTF-8 percent encode first checked whether c is in the relevant encode set?
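A minimal Python sketch of the proposed split, for illustration only. The helper names (`percent_encode_byte`, `utf8_percent_encode_always`, `encode_with_set`) are hypothetical and do not appear in the spec. The encoder always encodes, and the caller does the encode-set check.

```python
def percent_encode_byte(byte: int) -> str:
    # A percent-encoded byte: "%" followed by two uppercase hex digits.
    return f"%{byte:02X}"


def utf8_percent_encode_always(code_point: str) -> str:
    # Hypothetical unconditional variant: UTF-8 encode the code point,
    # then percent-encode every resulting byte.
    return "".join(percent_encode_byte(b) for b in code_point.encode("utf-8"))


def encode_with_set(text: str, encode_set: frozenset) -> str:
    # Hypothetical caller: the encode-set membership check lives here,
    # so the encoder itself never silently returns its input unchanged.
    out = []
    for c in text:
        if c in encode_set:
            out.append(utf8_percent_encode_always(c))
        else:
            out.append(c)
    return "".join(out)


print(encode_with_set("a b", frozenset(" ")))  # a%20b
```

With this shape, the name "percent encode" always means the input is actually encoded, at the cost of repeating the membership check in every caller.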
Oh wow, yeah, that's really unclear. In normal operation, the "UTF-8 percent encode" algorithm generally does not encode? Very confusing. :-)
The "UTF-8 percent encode" algorithm also seems unclear about whether it operates on code points or bytes. What's its return type?
Well, codePoint should self-evidently be a code point. UTF-8 encode converts the code point into a byte sequence, bytes. Percent encode then converts each byte into a scalar value string (specifically a percent-encoded byte, a special kind of string). So UTF-8 percent encode takes a code point and returns a string.
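The type signature described above (code point in, string out) can be sketched in Python as follows. This is a non-authoritative illustration: the encode set is passed as a hypothetical predicate `in_encode_set`, and the output is tracked in an explicit `output` variable rather than as "the results concatenated".

```python
from typing import Callable


def percent_encode(byte: int) -> str:
    # Byte -> percent-encoded byte: "%" plus two uppercase hex digits.
    return f"%{byte:02X}"


def utf8_percent_encode(code_point: str, in_encode_set: Callable[[str], bool]) -> str:
    # Input: exactly one code point. Output: a string.
    if len(code_point) != 1:
        raise ValueError("expected a single code point")
    # If the code point is not in the encode set, it is returned unchanged.
    if not in_encode_set(code_point):
        return code_point
    # Otherwise, UTF-8 encode it into a byte sequence...
    encoded = code_point.encode("utf-8")
    # ...and percent-encode each byte into an explicit output string.
    output = ""
    for byte in encoded:
        output += percent_encode(byte)
    return output


above_tilde = lambda c: ord(c) > 0x7E  # hypothetical example set
print(utf8_percent_encode("€", above_tilde))  # %E2%82%AC
print(utf8_percent_encode("a", above_tilde))  # a
```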
Ah, OK, I follow now. Although codePoint is just the name of the variable, nothing about its type is self-evident beyond human intuition. It should say "code point codePoint". Also, is there a reason these steps don't use an explicit variable to track the output? Implicitly tracking "the results concatenated" seems unnecessarily convoluted.
I think it's fair that some of the names are a bit confusing, but that stems from the concept being known as percent-encoding. Does anyone have suggestions for renaming these while keeping the type signatures, if we did something here? (More elaborate suggestions are welcome if you're interested in taking all callers in URL and HTML into account.)
Oh, I guess the OP is mostly about UTF-8 percent encode not always doing something, not about the type signature. Would "conditionally-UTF-8-percent encode" work?
#503 shows the direction we're taking this in. We'll keep the existing name, but we'll add a table to clarify the operations and use more overloading to reduce the number of high-level operations.
https://url.spec.whatwg.org/commit-snapshots/488c459d9e4245a3f6bf087e7dcd2c7e91487ac5/#url-parsing
It's not at all clear when reading the parser how a path segment consisting of "%62[" should end up. If I'm reading it right, it should end up as "%25%36%32%5B", which doesn't seem right.
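A sketch of how the path case plays out if "UTF-8 percent encode" only encodes code points in the set. The membership test below is an assumption that approximates the path percent-encode set of that era (C0 controls, code points above U+007E, plus space, ", #, <, >, ?, `, { and }); the exact membership has changed between spec revisions. The sketch also ignores the parser's other path-state steps. Because neither "%" nor "[" is in this set, the segment passes through unchanged.

```python
def in_path_encode_set(c: str) -> bool:
    # Approximation of the path percent-encode set (assumption, see above).
    cp = ord(c)
    # C0 controls and everything above U+007E (this includes DEL, U+007F)...
    if cp <= 0x1F or cp > 0x7E:
        return True
    # ...plus space, ", #, <, >, ?, `, { and }.
    return c in ' "#<>?`{}'


def utf8_percent_encode(c: str, in_set) -> str:
    # Conditional encoder: code points outside the set are returned as-is.
    if not in_set(c):
        return c
    return "".join(f"%{b:02X}" for b in c.encode("utf-8"))


def encode_path_segment(segment: str) -> str:
    # Hypothetical helper: apply the conditional encoder to each code point.
    return "".join(utf8_percent_encode(c, in_path_encode_set) for c in segment)


print(encode_path_segment("%62["))  # %62[
```

Under this reading, "%62[" stays "%62[" rather than becoming "%25%36%32%5B".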