Background: I'd like to understand "When to Encode or Decode" from RFC 3986. I came here from google/guava#2078, which provides some hints.
When parsing a Host, the hostname is "percent decoded", but when a host is serialized, there is no percent encoding done. Is serializing not the opposite of parsing?
When parsing a URL, the path is percent-encoded. I wonder why the path is encoded rather than decoded (as the hostname is). During URL serialization, no encoding or decoding is done at all.
Can this difference in the approaches (decoding vs. encoding in parsing) be clarified? Why is no encoding done during serialization?
Non-ASCII characters are not allowed in hosts anyway, so a host written with percent-encoding ends up, after decoding, in one of three forms: plain ASCII, a Punycode-encoded name, or an invalid host. For example, github%2ecom is parsed and stored as github.com, %E4%B8%AD%E6%96%87.com (中文.com) is stored as xn--fiq228c.com, and github%00.com is an invalid host. Because a host is already stored in a form that is guaranteed to contain only printable ASCII characters, the serialization algorithm does not need another step for percent-encoding.
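To make the three host outcomes concrete, here is a minimal Python sketch. It is not the URL Standard's host parser: `parse_host_sketch` and `FORBIDDEN` are hypothetical names, the forbidden set is an abbreviated assumption, and Python's built-in `idna` codec (IDNA 2003) stands in for the domain-to-ASCII step the standard specifies.

```python
from urllib.parse import unquote_to_bytes

# Assumption: an abbreviated stand-in for the standard's forbidden
# host/domain code points, enough for the examples in this thread.
FORBIDDEN = set('\x00\t\n\r #/:<>?@[\\]^|%')


def parse_host_sketch(raw: str) -> str:
    """Percent-decode a host, convert it to ASCII, and validate it."""
    # Step 1: percent-decode, then interpret the bytes as UTF-8.
    decoded = unquote_to_bytes(raw).decode("utf-8")
    # Step 2: convert non-ASCII labels to Punycode (stand-in for domain-to-ASCII).
    ascii_host = decoded.encode("idna").decode("ascii")
    # Step 3: reject hosts that contain a forbidden code point.
    if any(ch in FORBIDDEN for ch in ascii_host):
        raise ValueError(f"invalid host: {raw!r}")
    return ascii_host


def serialize_host_sketch(host: str) -> str:
    """The stored host is already printable ASCII, so return it unchanged."""
    return host


print(parse_host_sketch("github%2ecom"))
print(parse_host_sketch("%E4%B8%AD%E6%96%87.com"))
```

Under these assumptions, parsing is where all the normalization happens, and the serializer has nothing left to encode; calling `parse_host_sketch("github%00.com")` raises `ValueError`.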
Paths are percent-encoded during parsing and stored in that encoded form, so no encoding is needed when serializing either. See path state step 2.3:
UTF-8 percent encode c using the path percent-encode set, and append the result to buffer.
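The quoted step can be sketched in Python as follows. This is an illustration only: `PATH_EXTRA`, `utf8_percent_encode`, and `encode_path_segment_sketch` are hypothetical names, and the contents of the path percent-encode set are transcribed from my reading of the URL Standard, so treat them as an assumption and check the spec for the authoritative list.

```python
# Assumption: the path percent-encode set is C0 controls, everything above
# U+007E, and the characters below. Verify against the URL Standard.
PATH_EXTRA = set(' "#<>?^`{}')


def utf8_percent_encode(c: str) -> str:
    """Percent-encode one code point if it falls in the (assumed) path set."""
    cp = ord(c)
    if cp <= 0x1F or cp > 0x7E or c in PATH_EXTRA:
        # Encode as UTF-8, then write each byte as %XX.
        return "".join(f"%{b:02X}" for b in c.encode("utf-8"))
    return c


def encode_path_segment_sketch(segment: str) -> str:
    """Apply the per-code-point encoding to a whole path segment."""
    return "".join(utf8_percent_encode(c) for c in segment)


print(encode_path_segment_sketch("a b"))
print(encode_path_segment_sketch("中"))
print(encode_path_segment_sketch("a%20b"))
```

Because "%" itself is not in the set, an already encoded segment such as a%20b passes through unchanged. The stored path is therefore stable under repeated encoding, which is why the serializer can emit it as-is.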