Is the first example (an octet 0x7F is percent encoded as "%23") correct? #545

Closed
wlammen opened this issue Sep 27, 2020 · 4 comments · Fixed by #546

Comments

@wlammen

wlammen commented Sep 27, 2020

Hi,

This is my first post to this group, so I apologize for any violation of rules I might have committed.

I refer to the first example in the examples block at the end of section 1.3, Percent-encoded bytes, in the URL Standard, which reads

Percent-encode input | 0x7F | "%23"

I don't understand this example and think it is incorrect. If I follow the links and do what they suggest, I come up with "%7F" instead.
If, for some reason, this particular octet encoding receives special handling, I think this should be pointed out more clearly.
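To illustrate the expected result, here is a minimal sketch of the percent-encode step for a single byte. It assumes the standard's definition (U+0025 (%) followed by two ASCII upper hex digits representing the byte). The function name `percent_encode_byte` is hypothetical and not taken from the standard, and the `urllib.parse.quote_from_bytes` call is only a cross-check against Python's RFC 3986-style quoting, not an implementation of the URL Standard.

```python
from urllib.parse import quote_from_bytes


def percent_encode_byte(byte: int) -> str:
    """Return '%' followed by two upper-case hex digits for one byte.

    Hypothetical helper for illustration; the name is not from the standard.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte in the range 0..255")
    return f"%{byte:02X}"


if __name__ == "__main__":
    print(percent_encode_byte(0x7F))      # %7F
    print(percent_encode_byte(0x23))      # %23, which is the encoding of "#"
    # Cross-check with the standard library (RFC 3986-style quoting).
    print(quote_from_bytes(b"\x7f"))      # %7F
```

Under these assumptions, "%23" is the result for 0x23 (the "#" character), not for 0x7F.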

By the way, what you call a byte is in fact an octet. In its original meaning, a byte denotes the smallest addressable unit of memory, which can consist of more or fewer than 8 bits. I concede that this meaning has been out of use for decades, and most people now identify a byte with an octet, the correct name for an 8-bit unit. Maybe one should hint at this distinction somewhere?

Wolf Lammen

@achristensen07
Collaborator

I agree, 0x7F should not percent encode to %23. That looks like an error that should be fixed.
No strong opinion on byte/octet. It's not confusing to me.

@wlammen
Author

wlammen commented Sep 27, 2020

Octet/Byte

The URL Standard states in section 4.3

A byte is a sequence of eight bits...

So it redefines the meaning of 'byte' to be an octet. Other standards are more aware of (and picky about) the difference. For example, the RFC list of the IETF avoids 'byte', and the MIME type application/octet-stream (RFC-1341) is deliberately not a byte-stream.

My remark is not about confusion, but about consistent wording across standards. As a minimum concession one could rephrase the above definition to something like

Following now-widespread usage, a byte in this standard is always used as a synonym for an octet, a sequence of eight bits,...

I am not going to pursue this issue beyond this post. I just noticed the slightly improper usage of 'byte' in this standard, because I am used to extremely precise wording in documents of this kind.

Wolf Lammen

@annevk
Member

annevk commented Sep 28, 2020

Interesting, 0x23 was suggested at #503 (comment) so I suppose I did some copypasta. I can create a PR.

It's not the URL Standard that defines byte, but the Infra Standard (in that section indeed), and all WHATWG standards (and quite a few W3C standards) are consistent about using that. We chose that because the difference doesn't come up in practice and it's the term everyone is already familiar with. I wouldn't say it's imprecise, though, as it has a rather exact definition that is linked from where it's used.

@annevk
Member

annevk commented Sep 29, 2020

Thanks for reporting this, Wolf!
