The domain names with the spacing modifier letters are misrecognized #5585

Closed
2 tasks done
noraworld opened this issue Nov 2, 2017 · 5 comments
Labels
bug Something isn't working status/wontfix This will not be worked on

Comments

@noraworld
Contributor

noraworld commented Nov 2, 2017

For example, the following URL is recognized correctly by browsers, but not by Mastodon.

https://www.ᴳoogle.com

If we copy the above link as a string, paste it into Mastodon, and post the status, the link is converted to Punycode as shown below, and we cannot reach the intended server.

https://www.xn--oogle-r89a.com


  • I searched or browsed the repo’s other issues to ensure this is not a duplicate.
  • This bug happens on a tagged release and not on master (If you're a user, don't worry about this).
@noraworld
Contributor Author

Is this perhaps the same problem as #4837?

@nightpool
Member

Is there a standard that says we're not supposed to recognize this as the literal form? In my view this is working as designed: google.com and ᴳoogle.com are two different URLs.

@noraworld
Contributor Author

Do you mean that the link "ᴳoogle.com" leads you to a server that is different from google.com? In other words, does your browser recognize the following two links as the same URL in the end?

@unarist
Contributor

unarist commented Nov 4, 2017

ᴳ is disallowed in IDNA2008, but mapped to a normal character in UTS#46. The browsers' behavior is probably due to UTS#46.

https://www.unicode.org/cldr/utility/idna.jsp?a=%E1%B4%B3oogle.com
http://unicode.org/faq/idn.html#10

Mastodon uses addressable for Punycode encoding, and addressable uses libidn, which only supports IDNA2003. So using this character should be an error due to unassigned code points, but it's currently allowed by the ALLOW_UNASSIGNED option (cf. #4496).

Options:

  • Don't care: Twitter also encodes it to https://www.xn--oogle-r89a.com.
  • Use UTS#46: libidn2 with the IDN2_TRANSITIONAL option does this, although I don't know of any Ruby bindings for libidn2.
  • Don't convert by ourselves: GitHub handles it this way, and this would also fix #4837 (Don't normalize URIs in Unicode NFKC). I'm not sure about the side effects of this.
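
To illustrate the difference between the first two options, here is a minimal Python sketch. It is not Mastodon's code and does not use addressable, libidn, or libidn2. The function names are hypothetical, and the mapping step uses NFKC plus case folding only as a rough stand-in for the UTS#46 mapping table, which is an assumption that happens to hold for ᴳ but not for every character.

```python
import unicodedata


def to_ascii_raw(label: str) -> str:
    """Punycode-encode a label as-is, with no mapping step (hypothetical helper)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_mapped(label: str) -> str:
    """Map the label first, then encode (hypothetical helper).

    NFKC + casefold is only an approximation of the UTS#46 mapping,
    used here for illustration.
    """
    mapped = unicodedata.normalize("NFKC", label).casefold()
    return to_ascii_raw(mapped)


def host_to_ascii(host: str, convert) -> str:
    """Apply a per-label conversion to every dot-separated label of a host."""
    return ".".join(convert(label) for label in host.split("."))


print(host_to_ascii("www.ᴳoogle.com", to_ascii_raw))     # www.xn--oogle-r89a.com
print(host_to_ascii("www.ᴳoogle.com", to_ascii_mapped))  # www.google.com
```

Without a mapping step the host becomes the Punycode form reported in this issue, while mapping first yields the host that browsers end up visiting.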

@noraworld
Contributor Author

Then, can't Mastodon follow the same behavior as browsers (UTS#46) for now?

@Gargron Gargron added bug Something isn't working status/wontfix This will not be worked on labels Oct 20, 2018
@Gargron Gargron closed this as completed Oct 22, 2018