Broken Conformance: URLs with unicode chars in them #104

Open
yaauie opened this Issue Dec 19, 2013 · 1 comment

yaauie commented Dec 19, 2013

A conformance spec is currently broken on master.

  1) Failure:
test_urls Autolink URLs with unicode chars in them(ConformanceTest) [test/conformance_test.rb:126]:
<"See: <a href=\"http://example.com/tsa-pre✓™\">http://example.com/tsa-pre✓™</a> is a link"> expected but was
<"See: <a href=\"http://example.com/tsa-pre\">http://example.com/tsa-pre</a>✓™ is a link">

In this case, the Unicode characters are not being included in the matched URL when we expect them to be.

I believe @psychs addressed the rationale behind what should and shouldn't match in #91, but I don't think the spec is clear enough about which Unicode code point ranges should be considered part of the URL and which shouldn't.

One solution to this issue would be to fix the spec; alternatively, I can take on the task of fixing the implementation, given documentation on which code point ranges should be considered part of the URL.

jakl commented Dec 20, 2013

Our international team will look into this more deeply after the holidays. Offhand, I'm not sure how to pick the best Unicode ranges without first researching all of the supported languages. A safe immediate fix for this test is to link non-language characters like ✓™.
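
To make the stopgap above concrete, here is a minimal Ruby sketch. It is not the regex that twitter-text actually uses: the constants `ASCII_PATH` and `SYMBOL_PATH` and the helper `first_url` are hypothetical, and reading "non-language characters" as the Unicode general category `So` (Other Symbol, which contains both ✓ and ™) is an assumption rather than anything the spec states. It only shows how an ASCII-only path class reproduces the failure and how admitting symbols, but not letters, would satisfy the test.

```ruby
# encoding: utf-8

# Hypothetical, heavily simplified character classes for a URL path.
# ASCII_PATH mirrors the failing behavior: only ASCII path characters.
ASCII_PATH  = 'a-z0-9\-_.~/'
# SYMBOL_PATH additionally admits "Other Symbol" code points (assumption:
# this is what "non-language characters" means). Letters are still excluded.
SYMBOL_PATH = ASCII_PATH + '\p{So}'

# Returns the first URL-like substring of text, or nil if there is none.
# path_chars is spliced into a character class for the path portion.
def first_url(text, path_chars)
  text[%r{https?://[a-z0-9.\-]+(?:/[#{path_chars}]*)?}i]
end

text = "See: http://example.com/tsa-pre✓™ is a link"

puts first_url(text, ASCII_PATH)   # http://example.com/tsa-pre
puts first_url(text, SYMBOL_PATH)  # http://example.com/tsa-pre✓™
```

Under this sketch, letters outside ASCII still end the match, so the question of which language ranges belong in a URL stays open until the research mentioned above is done.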
