ContextJ (RFC 5892) is Security Theater #776

Open
adraffy opened this issue Jun 15, 2023 · 2 comments


@adraffy

adraffy commented Jun 15, 2023

Is CheckJoiners/ContextJ set in stone or can it be debated? If so, I'd like to present some arguments.

@annevk
Member

annevk commented Jun 15, 2023

It depends? 😊 I haven't looked into it, probably depends a lot on what it ends up meaning for ToASCII and what the arguments are.

@adraffy
Author

adraffy commented Jun 15, 2023

  • There are 3K+ RGI emoji and 1/3 of them involve ZWJ sequences. CheckJoiners favors a few exotic characters (which could easily be policed at the registrar level) over 1350 emoji sequences that are used internationally by billions of people.

  • RFC 5892 is both outdated (2010) and misguided. AFAICT it's trying to allow ZW(N)J for typographical reasons yet I don't think there's any ambiguity with or without a joiner.

    • Are there any registrars that allow a name containing a virama to be registered both with and without ZWNJ as separate names? (No.)
    • How many actual domains benefit from this rule?
  • If you look across the internet, there are thousands of developer hours wasted on deciding these choices one way or another, but at the end of the day, CheckJoiners is just a convoluted way to disallow 200C and 200D.
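
For readers who have not looked at the rule itself, here is a minimal sketch of the ZWJ half of ContextJ as I read RFC 5892 Appendix A.2: U+200D is valid only when the code point before it has Canonical_Combining_Class Virama (value 9). The function name is made up for illustration. The ZWNJ rule (A.1) has an extra branch based on Joining_Type, which is left out here because Python's `unicodedata` does not expose that property.

```python
import unicodedata

ZWJ = "\u200d"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"


def zwj_passes_contextj(label: str) -> bool:
    """Return True if every U+200D in `label` directly follows a virama.

    Hypothetical helper: covers only the ZWJ rule, not the ZWNJ rule.
    """
    for i, ch in enumerate(label):
        if ch != ZWJ:
            continue
        # A joiner at the start of the label has no preceding code point.
        if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
            return False
    return True


man_technologist = "\U0001F468\u200D\U0001F4BB"  # 1F468 200D 1F4BB
devanagari = "\u0915\u094D\u200D\u0937"          # KA, VIRAMA, ZWJ, SSA

print(zwj_passes_contextj(man_technologist))  # False
print(zwj_passes_contextj(devanagari))        # True
```

Under this check, the only way a ZWJ survives is the virama case, so every emoji ZWJ sequence is rejected.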


For a concrete example: 1F468 200D 1F4BB

  • This emoji was released in 2016 (7 years ago)
  • Major browsers don't agree on its validity: Compare Chrome/Brave vs Safari/Firefox
  • The punycode of this emoji is xn--1ugz855pfha
  • This emoji is invalid with CheckJoiners.
  • In some browsers, this encodes as xn--qq8hgf, which is wrong: 1F468 1F4BB is not the same as 1F468 200D 1F4BB.
  • Node.js recently switched to Ada, which implements the WHATWG URL standard. This means that even if you correctly punycode the domain, a WHATWG URL implementation will prevent its use, even though the punycode is valid and the domain is DNS compatible.
  • In general, the validity of URLs seems to change randomly between browser releases as libraries are periodically replaced and the standards aren't clear.
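
The two encodings above can be reproduced with Python's built-in `punycode` codec. This is only a sketch of the raw Punycode step: it applies no UTS-46 mapping or validation, and the helper name `to_a_label` is invented for this example.

```python
def to_a_label(code_points: str) -> str:
    """Punycode-encode space-separated hex code points and add the ACE prefix.

    Hypothetical helper: raw Punycode only, no UTS-46 processing.
    """
    text = "".join(chr(int(cp, 16)) for cp in code_points.split())
    return "xn--" + text.encode("punycode").decode("ascii")


with_zwj = to_a_label("1F468 200D 1F4BB")
without_zwj = to_a_label("1F468 1F4BB")

print(with_zwj)     # xn--1ugz855pfha
print(without_zwj)  # xn--qq8hgf
print(with_zwj == without_zwj)  # False
```

Dropping the joiner produces a different label, so a browser that emits xn--qq8hgf is pointing at a different name.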

The simplest solution is that CheckJoiners should be false

  • Any name with a joiner is already punycode.
  • UTS-46 provides poor guidance regarding spoofs and confusables and has forced developers to implement various parts of UAX-39 and their own logic to decide when to display punycode as Unicode.
  • UTS-46 advice about validating punycode is also strange because name validity is a registrar problem, not a resolution problem.
  • This is a disaster for the end-user because the rules are constantly changing, yet at the same time, there are thousands of confusables and mixed-script spoofs that slip right through the implemented standards.

For reference, I recently implemented a normalization standard for the Ethereum Name Service ecosystem. I used a combination of UTS-51 + UTS-46 + a significantly safer character set (banned punctuation, parens, brackets, vocalizations, obsolete, deprecated, ancient, reversed, turned, flipped, many ligatures, etc.) + an intelligent confusable system (that isn't just a warning system: e.g. rn is a footgun confusable). Demo | Github

From my experience with the Unicode and RFC documentation, the primary source of confusion and bugs is the documentation itself. Many of these rules should be deprecated, and the rest should be clarified and modernized.

I think WHATWG made the correct decision with AllowHyphens and finally broke away from archaic DNS rules.

I think they should do the same with CheckJoiners. If the WHATWG really wants to protect end-users, it should recommend UTS-51 RGI pre-processing and outright disallow ZW(N)J outside of emoji.
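
A rough sketch of what "disallow ZW(N)J outside of emoji" could look like. Everything here is illustrative: the two-entry set stands in for the full RGI_Emoji_ZWJ_Sequence data from UTS-51, and the function name is hypothetical. Known sequences are masked in a single pass, and any joiner left over is rejected.

```python
import re

ZWJ, ZWNJ = "\u200d", "\u200c"

# Assumption: a tiny stand-in for the full RGI emoji ZWJ sequence list.
RGI_ZWJ_SEQUENCES = {
    "\U0001F468\u200D\U0001F4BB",  # man technologist
    "\U0001F469\u200D\U0001F4BB",  # woman technologist
}

# Longest sequences first so a longer emoji is never split by a shorter one.
_RGI_PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(RGI_ZWJ_SEQUENCES, key=len, reverse=True))
)


def joiners_only_in_emoji(label: str) -> bool:
    """Return True if every ZWJ/ZWNJ in `label` sits inside a known sequence."""
    # Replace each recognized sequence with a placeholder so that removing it
    # cannot glue its neighbours together into a new false match.
    remainder = _RGI_PATTERN.sub("\x00", label)
    return ZWJ not in remainder and ZWNJ not in remainder


print(joiners_only_in_emoji("\U0001F468\u200D\U0001F4BB"))  # True
print(joiners_only_in_emoji("a\u200Db"))                    # False
print(joiners_only_in_emoji("\u0915\u094D\u200D\u0937"))    # False
```

Note the last case: this is stricter than ContextJ for virama sequences, which is the trade-off being proposed.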
