Add test case for 2 letter top level domains #59

chmac opened this Issue Sep 26, 2013 · 2 comments



chmac commented Sep 26, 2013

I've been testing a little by posting to twitter, and domains like and and are not auto linked. However, domains like and are. These differences are not covered in the test cases.

I tried to dig into the JavaScript implementation to see what the actual behaviour is, but I've spent enough time on PHP regexes today. Maybe another day!

I just did another test and a domain is auto linked, even though it's a non existent second level domain. While which is a valid domain is not linked, but is.

I'd hazard a guess and say that any 2 letter top level domain (a.xx) is not linked, while domains like or a.xx.xx are linked. Ironically, is not linked!



jakl commented Sep 27, 2013

Good sleuthing! The signal-to-noise ratio in tweets often leans towards noise, so we only autolink when we're mostly certain the text is a URL; otherwise it could be an emoticon, an internet meme, or a word with meaning in another language. Some domains, like .com, are treated as especially strong signals.

For now, you're right, we should have clear tests. Later we can revisit which domains should link and how changes would affect a sample set of tweets.



chmac commented Sep 27, 2013

A little more sleuthing later, it looks like the following happens on both the API and the JavaScript frontend:

  • Any first level valid CC domain is not linked
  • Any second level valid CC domain is linked
  • Any domain with a nonexistent CC TLD is not linked, for example github.pp, www.github.pp or chmac.github.pp
  • Any first or second level valid global or US domain is linked
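
The observed rules above can be sketched as a small predicate. This is only an illustration of the behaviour described in this comment, not the library's actual implementation: the function name `should_autolink` and the two TLD sets are hypothetical, the sets are tiny samples rather than real lists, "global" domains are approximated by a few generic TLDs, and "second level" is assumed to mean a host with at least three labels (such as www.a.xx or a.xx.xx).

```python
# Hypothetical sample lists; the real lists would come from the library itself.
COUNTRY_CODE_TLDS = {"de", "uk", "fr"}
GENERIC_TLDS = {"com", "org", "net"}


def should_autolink(host: str) -> bool:
    """Return True if a bare host would be linked under the observed rules."""
    labels = host.lower().split(".")
    # A host needs at least a name and a TLD, with no empty labels.
    if len(labels) < 2 or not all(labels):
        return False
    tld = labels[-1]
    if tld in GENERIC_TLDS:
        # Generic domains are linked at any depth.
        return True
    if tld in COUNTRY_CODE_TLDS:
        # a.xx is skipped; www.a.xx or a.xx.xx is linked (assumed reading).
        return len(labels) >= 3
    # Unknown TLDs such as .pp are never linked.
    return False
```

With these sample sets, `should_autolink("github.pp")` and `should_autolink("a.de")` are both false, while `should_autolink("www.a.de")` and `should_autolink("github.com")` are true.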

There's presumably a list of valid TLDs in the js and other codebases. It's probably possible to generate tests from that.
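
As a rough sketch of what generating tests from such a list might look like, the snippet below builds (host, expected) pairs that encode the behaviour observed in this thread. The function name `generate_cases`, the placeholder name `example`, and the input sets are all assumptions for illustration; the real TLD lists and test format would come from the project.

```python
def generate_cases(cc_tlds, generic_tlds, name="example"):
    """Build (host, should_be_linked) pairs from two sets of TLDs."""
    cases = []
    for tld in sorted(generic_tlds):
        # Generic domains were observed to link with or without a subdomain.
        cases.append((f"{name}.{tld}", True))
        cases.append((f"www.{name}.{tld}", True))
    for tld in sorted(cc_tlds):
        # Country-code domains were only observed to link with an extra label.
        cases.append((f"{name}.{tld}", False))
        cases.append((f"www.{name}.{tld}", True))
    return cases
```

Each pair could then be written out in whatever fixture format the test suite uses.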

I'd suggest that domains like probably should be linked automatically, because a non existent domain like github.pp wouldn't be linked anyway, so things like that.ll wouldn't be linked, but that's a decision for somebody else to make.
