-
Notifications
You must be signed in to change notification settings - Fork 43
Improved support for internationalised domains #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PublicSuffix doesn't support Punycode encoded internationalised domain names, and instead expects them to be unicode encoded. Ref: weppos/publicsuffix-ruby#46.
Dnsruby expects the string containing the host to be ascii encoded. Ref: alexdalitz/dnsruby#94. Addressable's normalized_host method will also take care of removing a trailing dot.
end | ||
|
||
context "CNAME record pointed to username" do | ||
let(:cname) { "foobar.github.io" } | ||
before(:each) { allow(subject).to receive(:dns) { [cname_packet] } } | ||
before(:each) { allow(subject.send(:caa)).to receive(:query) { [cname_packet] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required so that we return the CNAME
when no CAA
exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain
of pages.github.com
, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.
end | ||
|
||
context "A records pointed to new IPs" do | ||
let(:ip) { "185.199.108.153" } | ||
before(:each) { allow(subject).to receive(:dns) { [a_packet] } } | ||
before(:each) { allow(subject.send(:caa)).to receive(:query) { [a_packet] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required so that we return the A
when no CAA
exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain
of pages.github.com
, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.
end | ||
|
||
context "CNAME record pointed elsewhere" do | ||
let(:cname) { "jinglebells.com" } | ||
before(:each) { allow(subject).to receive(:dns) { [cname_packet] } } | ||
before(:each) { allow(subject.send(:caa)).to receive(:query) { [cname_packet] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required so that we return the CNAME
when no CAA
exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain
of pages.github.com
, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.
@@ -836,13 +860,27 @@ | |||
context "A records pointed to old IPs" do | |||
let(:ip) { "192.30.252.153" } | |||
before(:each) { allow(subject).to receive(:dns) { [a_packet] } } | |||
before(:each) { allow(subject.send(:caa)).to receive(:query) { [a_packet] } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required so that we return the A
when no CAA
exists, imitating the behaviour seen in the real world. This hasn't been a issue until now, as the existing checks use a domain
of pages.github.com
, so we get back entries from the real world here. It is only now that I have started using fake names that this has become a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't do a code review, but 👍 to the approach (and GREAT write up).
@@ -100,7 +100,7 @@ def initialize(host, nameservers: :default) | |||
|
|||
@host = normalize_host(host) | |||
@nameservers = nameservers | |||
@resolver = GitHubPages::HealthCheck::Resolver.new(host, | |||
@resolver = GitHubPages::HealthCheck::Resolver.new(@host, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found this while undertaking some more testing just then when I managed to trigger some further Dnsruby exceptions when using a Unicode encoded IDN. Seems we weren't passing the normalised host through to the Resolver
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The host
method should have been normalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As host
exists as a locally scoped variable in the initialize
method, my understanding is that we need to explicitly refer to the instance level scoped host
where we want to use it, otherwise we will get back the locally scoped one, which is pre normalize_host
.
$ cat local-vs-instance.rb
def test(host)
@host = host.upcase
puts "host: #{host}, @host: #{@host}"
end
test("github.com")
$ ruby local-vs-instance.rb
host: github.com, @host: GITHUB.COM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally right! The host
parameter is first in the order of precedence, then the interpreter will attempt to find a method called host
. I'd love to use self.host
here so that we're calling the method. I'm worried about reordering these calls and causing @host
to be nil
by accident. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parkr, I'm a little lost here, as from what I understand we don't have a host
method?, would you mind expanding on what you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 6, there's an attr_reader
for :host
which creates a reader for @host
! That's where I'm understanding that there is a host method. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parkr, it finally clicked 😄 I've updated the PR to use self.host
.
While not necessary at the moment, doing this in case we ever use this class directly elsewhere.
I came across a couple of issues with internationalised domain names today that I hope to solve in this PR.
PublicSuffix doesn't support Punycode encoded internationalised public suffixes, and instead expects them to be Unicode encoded (weppos/publicsuffix-ruby#46). If you feed in a Punycode encoded public suffix, then no result, or a more broader result, may be returned.
Impacts:
Domain.valid_domain?
can returnfalse
, leading to almost everything inDomain
failing, includingDomain.https_eligible?
always returningfalse
.CAA.records
isn't able to retrieve the CAA records for the apex domain.Unfortunately Dnsruby is the reverse way around, and expects the internationalised domain to be Punycode encoded, and exceptions if fed a Unicode encoded internationalised domain (alexdalitz/dnsruby#94).
Impacts:
GitHubPages::HealthCheck::Domain
orGitHubPages::HealthCheck::CAA
methods will lead to a Dnsruby exception.To work around these issues, I am feeding the domain in to Addressable's
normalized_host
method, which converts the internationalised domain to Punycode encoding, which appears to be the generally more accepted way to work with domains in code. I am then re-encoding in Unicode where required (which is just when using PublicSufix for now).Addressable's
normalized_host
method will also take care of removing a trailing dot, making our existing way to doing this redundant.These encoding transformations cope fine with a non-internationalised domain.
I have added tests, though these actually pass even without these fixes, as to surface this bug you need to use a domain with internationalised characters in the public suffix. I didn't want to pick a random name, since it may be in use, or registered in the future. It might be worth us registering a domain in one of the internationalised country code top-level domain's which appears in the public suffix list.
Glossary:
Public Suffix: As defined at https://publicsuffix.org/, an example is the
com
portion ofpages.github.com
.Apex Domain: As defined at https://help.github.com/articles/about-supported-custom-domains/#apex-domains, an example is the
github.com portion of
pages.github.com`.