-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add string UTF-8 percent encode #503
Conversation
I am fine with existing names. Although "conditionally-UTF-8-percent encode" isn't bad too. |
I've been thinking about this a lot and I think part of the problem is the use of the word "string" to sometimes mean a sequence of bytes, a sequence of code points, a sequence of abstract characters. If you name the levels of encoding, some of the ambiguity drops away. |
If we use string (as defined by https://infra.spec.whatwg.org/#string) to mean bytes anywhere that's a bug. With regards to UTF-8 percent encode there's two confusing things I think:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As both will be used from other specs
I think it would help to have an example table with clear inputs and outputs. Something like: Here is a summary, by way of example, of these various operations:
I think having these examples of inputs and outputs, which use the appropriate typography, will convey more than simply stating the return types. |
That sounds pretty good. Note to self: Fetch and HTML have dependencies on "string percent decode". |
(Warning: This long comment may seem like getting into “how many angels can fit on head of pin” territory but is offered FWIW from my (long-ago) training and experience as a technical writer.) As far as use of hyphens/dashes in the operation names, some options to consider:
Option 2 above is most-clearly not the best choice in this case, because if you apply it, you end up with that “UTF-8-percent-encode-with-userinfo-percent-encode-set” monstrosity. Option 3 is also sub-optimal in this case, since in the same operation name, you end up with both “percent encode” (unhyphenated) and “percent-encode” (hyphenated). That risks making readers stumble — because at first reading, it might seem odd/inconsistent (even though it really isn’t). Option 1 in this case therefore kind of hits the sweet spot (given all of the above). Further rationale While there aren’t set-in-stone rules for when to add hyphens — it’s always partly a judgement call — the main guiding principle (especially in technical docs) is to add hyphens to avoid ambiguity; that is, to avoid having readers mis-read or mis-parse or stumble over a particular phrase. That’s especially important in cases where what you’re hyphenating is a compound adjective — because those are the cases where there’s the greatest risk of ambiguity and misreading. It’s less important in cases where what you’re hyphenating is a noun or verb — which is essentially the case here, because these are operation names. But another principle is consistency: If you know you’re otherwise gonna end up with cases like “UTF-8 percent encode with userinfo percent-encode set” — one operation name that has the same combination of words both unhyphenated (“percent encode”) and hyphenated (“percent-encode”) — then avoid that by just always hyphenating it. And the final principle is: When you’ve ended up with a term that’s itself referencing/qualifying some other multi-word term, use hyphens to make that more clear; so that’s what leads you to “conditionally UTF-8-percent-encode” rather than just “conditionally UTF-8 percent-encode”. |
And use it internally. This is also an initial step for #369.
6932e5b
to
bc67da4
Compare
It's UTF-8 percent-encode (for code points) and percent-decode (for strings) now. See whatwg/url#503.
It's UTF-8 percent-encode (for code points) and percent-decode (for strings) now. See whatwg/url#503.
And use it internally. This is also an initial step for #369.
If reviewers have thoughts on how to resolve #296 that'd be appreciated. Renaming things is in scope for this PR modulo the caveats mentioned there. Resolving to continue using the existing names is also fine with me.
Preview | Diff