Allow emoji domain names #420

Open. Wants to merge 2 commits into base: master
11 changes: 9 additions & 2 deletions rb/lib/twitter-text/extractor.rb
@@ -3,7 +3,7 @@
# http://www.apache.org/licenses/LICENSE-2.0

# encoding: utf-8
-require 'idn'
+require 'simpleidn'

class String
# Helper function to count the character length by first converting to an
@@ -57,6 +57,8 @@ module Extractor extend self
# Maximum URL length as defined by Twitter's backend.
MAX_URL_LENGTH = 4096

+MAX_DOMAIN_LABEL_LENGTH = 63
+
# The maximum t.co path length that the Twitter backend supports.
MAX_TCO_SLUG_LENGTH = 40

@@ -373,7 +375,12 @@ def is_valid_domain(url_length, domain, protocol)
begin
raise ArgumentError.new("invalid empty domain") unless domain
original_domain_length = domain.length
-encoded_domain = IDN::Idna.toASCII(domain)
+encoded_domain = SimpleIDN.to_ascii(domain)
+# If the domain starts with xn-- but is not only ASCII characters, it's invalid.
+return false if domain.start_with?("xn--") && !domain.ascii_only?
+labels = encoded_domain.split('.')
+# If any label of the domain is longer than 63 characters, it's invalid.
+return false if labels.any?{|label| label.length > MAX_DOMAIN_LABEL_LENGTH}
updated_domain_length = encoded_domain.length
url_length += (updated_domain_length - original_domain_length) if (updated_domain_length > original_domain_length)
url_length += URL_PROTOCOL_LENGTH unless protocol
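
The two checks added in this hunk can be tried in isolation with a sketch like the one below. It is not the library's code: the method names `malformed_punycode_prefix?`, `labels_within_limit?`, and `plausible_domain?` are hypothetical, and the `encoder` argument is an assumed stand-in for `SimpleIDN.to_ascii` so the sketch runs without the gem. Unlike the diff, the sketch runs the `xn--` check before encoding.

```ruby
# Hypothetical standalone sketch of the checks added in this hunk.
# MAX_DOMAIN_LABEL_LENGTH mirrors the constant in the diff; the method
# names are illustrative and not part of twitter-text.
MAX_DOMAIN_LABEL_LENGTH = 63

# True when a domain claims to be Punycode ("xn--" prefix) but still
# contains non-ASCII characters, which the diff treats as invalid.
def malformed_punycode_prefix?(domain)
  domain.start_with?("xn--") && !domain.ascii_only?
end

# True when every dot-separated label of an already ASCII-encoded
# domain fits within the 63-character label limit.
def labels_within_limit?(encoded_domain)
  encoded_domain.split('.').all? { |label| label.length <= MAX_DOMAIN_LABEL_LENGTH }
end

# Combines both checks. `encoder` is any callable that maps a domain to
# its ASCII form. The identity lambda is an assumption made so the sketch
# has no gem dependency; the PR itself uses SimpleIDN.to_ascii.
def plausible_domain?(domain, encoder = ->(d) { d })
  return false if domain.nil? || domain.empty?
  return false if malformed_punycode_prefix?(domain)
  labels_within_limit?(encoder.call(domain))
end
```

With the default encoder, `plausible_domain?("example.com")` returns `true`, while a domain whose first label is 64 characters long returns `false`. Passing `SimpleIDN.method(:to_ascii)` as the encoder would measure the encoded labels instead, which matters for emoji and other non-ASCII domains because their ASCII form is longer than the original.
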
4 changes: 3 additions & 1 deletion rb/spec/test_urls.rb
@@ -41,7 +41,9 @@ module TestUrls
"http://foobar.中国",
"http://foobar.پاکستان",
"https://www.youtube.com/playlist?list=PL0ZPu8XSRTB7wZzn0mLHMvyzVFeRxbWn-",
-"http://ああ.com"
+"http://ああ.com",
+"twitter.联通",
+"https://🌈🌈🌈.st"
] unless defined?(TestUrls::VALID)

INVALID = [
5 changes: 2 additions & 3 deletions rb/twitter-text.gemspec
@@ -19,14 +19,13 @@ Gem::Specification.new do |s|

s.add_development_dependency "test-unit"
s.add_development_dependency "multi_json", "~> 1.3"
-s.add_development_dependency "nokogiri", "~> 1.10.9"
+s.add_development_dependency "nokogiri", "~> 1.15.3"
s.add_development_dependency "rake"
s.add_development_dependency "rdoc"
s.add_development_dependency "rspec", "~> 3.0"
s.add_development_dependency "simplecov"
s.add_runtime_dependency "unf", "~> 0.1.0"
-# Use of idn-ruby requires libidn to be installed separately
-s.add_runtime_dependency "idn-ruby"
+s.add_runtime_dependency "simpleidn"

s.files = `git ls-files`.split("\n") + ['lib/assets/tld_lib.yml'] + Dir['config/*']
s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")