Skip to content

Improved support for internationalised domains #102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 20, 2018
Merged

Improved support for internationalised domains #102

merged 6 commits into from
Jun 20, 2018

Conversation

snh
Copy link
Member

@snh snh commented Jun 9, 2018

I came across a couple of issues with internationalised domain names today that I hope to solve in this PR.


PublicSuffix doesn't support Punycode encoded internationalised public suffixes, and instead expects them to be Unicode encoded (weppos/publicsuffix-ruby#46). If you feed in a Punycode encoded public suffix, then no result, or a more broader result, may be returned.

Impacts:

  • Domain.valid_domain? can return false, leading to almost everything in Domain failing, including Domain.https_eligible? always returning false.
  • CAA.records isn't able to retrieve the CAA records for the apex domain.

Unfortunately Dnsruby is the reverse way around, and expects the internationalised domain to be Punycode encoded, and exceptions if fed a Unicode encoded internationalised domain (alexdalitz/dnsruby#94).

Impacts:

  • Feeding a Unicode encoded internationalised domain in to most of the GitHubPages::HealthCheck::Domain or GitHubPages::HealthCheck::CAA methods will lead to a Dnsruby exception.

To work around these issues, I am feeding the domain in to Addressable's normalized_host method, which converts the internationalised domain to Punycode encoding, which appears to be the generally more accepted way to work with domains in code. I am then re-encoding in Unicode where required (which is just when using PublicSufix for now).

Addressable's normalized_host method will also take care of removing a trailing dot, making our existing way to doing this redundant.

These encoding transformations cope fine with a non-internationalised domain.

I have added tests, though these actually pass even without these fixes, as to surface this bug you need to use a domain with internationalised characters in the public suffix. I didn't want to pick a random name, since it may be in use, or registered in the future. It might be worth us registering a domain in one of the internationalised country code top-level domain's which appears in the public suffix list.

Glossary:

Public Suffix: As defined at https://publicsuffix.org/, an example is the com portion of pages.github.com.
Apex Domain: As defined at https://help.github.com/articles/about-supported-custom-domains/#apex-domains, an example is the github.com portion of pages.github.com`.

snh added 3 commits June 9, 2018 18:49
PublicSuffix doesn't support Punycode encoded internationalised domain
names, and instead expects them to be unicode encoded. Ref:
weppos/publicsuffix-ruby#46.
Dnsruby expects the string containing the host to be ascii encoded. Ref:
alexdalitz/dnsruby#94.

Addressable's normalized_host method will also take care of removing a
trailing dot.
@snh snh added the bug label Jun 9, 2018
end

context "CNAME record pointed to username" do
let(:cname) { "foobar.github.io" }
before(:each) { allow(subject).to receive(:dns) { [cname_packet] } }
before(:each) { allow(subject.send(:caa)).to receive(:query) { [cname_packet] } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required so that we return the CNAME when no CAA exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain of pages.github.com, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.

end

context "A records pointed to new IPs" do
let(:ip) { "185.199.108.153" }
before(:each) { allow(subject).to receive(:dns) { [a_packet] } }
before(:each) { allow(subject.send(:caa)).to receive(:query) { [a_packet] } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required so that we return the A when no CAA exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain of pages.github.com, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.

end

context "CNAME record pointed elsewhere" do
let(:cname) { "jinglebells.com" }
before(:each) { allow(subject).to receive(:dns) { [cname_packet] } }
before(:each) { allow(subject.send(:caa)).to receive(:query) { [cname_packet] } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required so that we return the CNAME when no CAA exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain of pages.github.com, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.

@@ -836,13 +860,27 @@
context "A records pointed to old IPs" do
let(:ip) { "192.30.252.153" }
before(:each) { allow(subject).to receive(:dns) { [a_packet] } }
before(:each) { allow(subject.send(:caa)).to receive(:query) { [a_packet] } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required so that we return the A when no CAA exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain of pages.github.com, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.

Copy link
Contributor

@benbalter benbalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't do a code review, but 👍 to the approach (and GREAT write up).

@@ -100,7 +100,7 @@ def initialize(host, nameservers: :default)

@host = normalize_host(host)
@nameservers = nameservers
@resolver = GitHubPages::HealthCheck::Resolver.new(host,
@resolver = GitHubPages::HealthCheck::Resolver.new(@host,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this while undertaking some more testing just then when I managed to trigger some further Dnsruby exceptions when using a Unicode encoded IDN. Seems we weren't passing the normalised host through to the Resolver class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host method should have been normalized?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As host exists as a locally scoped variable in the initialize method, my understanding is that we need to explicitly refer to the instance level scoped host where we want to use it, otherwise we will get back the locally scoped one, which is pre normalize_host.

$ cat local-vs-instance.rb
def test(host)
  @host = host.upcase
  puts "host: #{host}, @host: #{@host}"
end

test("github.com")

$ ruby local-vs-instance.rb
host: github.com, @host: GITHUB.COM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally right! The host parameter is first in the order of precedence, then the interpreter will attempt to find a method called host. I'd love to use self.host here so that we're calling the method. I'm worried about reordering these calls and causing @host to be nil by accident. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parkr, I'm a little lost here, as from what I understand we don't have a host method?, would you mind expanding on what you mean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 6, there's an attr_reader for :host which creates a reader for @host! That's where I'm understanding that there is a host method. 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parkr, it finally clicked 😄 I've updated the PR to use self.host.

snh added 2 commits June 10, 2018 18:38
While not necessary at the moment, doing this in case we ever use this class directly elsewhere.
@parkr parkr merged commit 97ce501 into github:master Jun 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants