Support standard URL spec #91

jakl opened this Issue · 4 comments

2 participants

Twitter, Inc. member

Note that we should have a different spec from the RFC, because we need to recognize URLs in natural languages where we can't assume words are separated by whitespace. The original author took great care over that.

Twitter, Inc. member

Oh you mean a natural language URL might have a natural contextual ending rather than a space?

URL identification has been a painfully recurring problem, and there must be a more standard way to implement this, maybe by tweaking the ending delimiter to support natural languages.

Twitter, Inc. member

We have two points on this.

We treat natural language representations of URLs, not strict RFC3986 URLs

We want to recognize natural language URLs like below.ärz

It doesn't conform to the RFC. But you can see it in the address bar of web browsers like Safari, Chrome and Firefox.

As you know, it's actually encoded into a strict URL internally.

But it's not readable. That's why these browsers decided to show the representation form instead of the strict URL in their address bars.
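To make the "encoded into a strict URL internally" step concrete, here is a minimal Python sketch using only the standard library. The function name `to_strict_url` and the host `bücher.example` are made up for illustration, and the sketch ignores userinfo and many edge cases; it is not how any particular browser or twitter-text does it.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def to_strict_url(display_url: str) -> str:
    """Convert a display-form URL into an ASCII-only URL (illustrative only)."""
    parts = urlsplit(display_url)
    if parts.hostname is None:
        raise ValueError("URL has no host")
    # Non-ASCII host labels become Punycode ("xn--...") via the built-in idna codec.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Non-ASCII path/query/fragment characters become percent-encoded UTF-8.
    # "%" is kept safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(to_strict_url("http://bücher.example/ärz"))
```

The output is correct as a strict URL but much harder to read than the input, which is the point being made here.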

To make it natural for users, we need to treat natural language representations of URLs, instead of strict URLs that conform to the RFC. That's the most important point here.

It means we need to define our own spec for acceptable natural language URLs. As you can see in the above example, it must be different from RFC3986. It depends on our use cases.

Recognize URLs in natural language text

It's difficult to recognize URLs in natural language text.

What do you think about this?


I think most users would expect to be extracted, instead of But RFC3986 allows ')' in path. So if you implement the recognizer by strictly conforming to the RFC, it will extract I think most people don't like the behavior.

But what about this?

I think most users expect The current twitter-text recognizes this correctly.
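As an illustration of the ')' question, one common heuristic is to drop a trailing ')' only when it has no matching '(' inside the candidate URL. The sketch below is an assumption for discussion, not a description of what twitter-text actually does; the function name and the example.com URLs are hypothetical.

```python
def trim_unbalanced_closing_parens(candidate: str) -> str:
    """Strip trailing ')' characters that have no matching '(' in the candidate."""
    while candidate.endswith(")") and candidate.count(")") > candidate.count("("):
        candidate = candidate[:-1]
    return candidate

# A ')' that closes a parenthesis opened in the surrounding text is dropped.
print(trim_unbalanced_closing_parens("http://example.com/path)"))
# A ')' that is balanced inside the URL itself is kept.
print(trim_unbalanced_closing_parens("http://example.com/wiki/Foo_(bar)"))
```

A strict RFC3986 matcher would keep the ')' in both cases, since the character is allowed in a path.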

I think you know some languages like Japanese, Chinese and Thai don't use spaces as word delimiters. In Japanese, many people write like below.だよね?

In this case, most people expect
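A rough sketch of one way to handle text without spaces, assuming the expected result is the URL without the Japanese text that follows it: end the match at the first character from a script that does not use spaces. The pattern, the chosen Unicode ranges, and the example.com URLs are all illustrative assumptions, not the actual twitter-text rules.

```python
import re

# Hypothetical stop set: whitespace, Thai, CJK punctuation, Hiragana,
# Katakana, CJK ideographs, and fullwidth forms end the match.
# Accented Latin letters (e.g. "ä") are still allowed.
_URL = re.compile(
    r"https?://[^\s\u0E00-\u0E7F\u3000-\u30FF\u4E00-\u9FFF\uFF00-\uFFEF]+"
)

def extract_urls(text: str) -> list[str]:
    """Return URL-like substrings, stopping at the stop-set characters above."""
    return _URL.findall(text)

print(extract_urls("http://example.com/pathだよね?"))
print(extract_urls("see http://example.com/ärz now"))
```

This also shows the trade-off: any character added to the stop set can no longer appear in a recognized URL, so the set has to be tuned per use case.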

As you can see in the above examples, it's clearly not easy. There is a trade-off between natural behavior and false positives.

The original author understood the problem and tuned the recognizer carefully, so that we can recognize URLs in a way that feels natural to many people in various languages.

Twitter, Inc. member

It's interesting that this comment system on GitHub does URL auto-linkification. It should be similar to ours. They just have a different design for the last example.
