Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9496] Names SecondaryAuthority traceback #1212

Merged
merged 19 commits into from
Jan 10, 2020
Merged

[9496] Names SecondaryAuthority traceback #1212

merged 19 commits into from
Jan 10, 2020

Conversation

twm
Copy link
Contributor

@twm twm commented Dec 30, 2019

There is a lot of string type confusion in twisted.names. This is mostly about how domain names (and labels) are represented.

I'll be using the Python 3 names for types here: bytes and str.

The twisted.internet.interfaces.IResolver interface defines a bunch of methods, about one per DNS record type. These all take the same arguments, most importantly name, but the types aren't consistent — the lookupAddress() method defines name as bytes, but every other method (including lookupIPV6Address() documents it as str. Presumably this is a Python 3 porting oversight, but the confusion goes deeper.

Since IResolver has so many methods all of the implementers in Twisted subclass twisted.names.common.ResolverBase, which maps all the methods to a _lookup() bottleneck method that can be easily overridden. This includes lookupAddress(), so clearly its type signature is supposed to align with the rest.

Internally the query is usually translated into a dns.Query instance, which represents the name as a dns.Name instance. dns.Name coerces the name to bytes, encoding Unicode using the idna codec. So most implementations simply deal with bytes.

However, some implementers override a few methods directly. For example, twisted.names.hosts.Resolver only deals in IP addresses, so it overrides lookupAddress() and lookupIPV6Address() directly. This produces inconsistent behavior: the hosts implementation only accepts bytes, not str.

All of this confusion is borne out in the tests: some pass name as str, a few as bytes. There is no systematic coverage of both types.

This is basically what's going on in twisted.names.secondary.SecondaryAuthority in Twisted #9496: it's not coercing the type of its domain parameter to bytes, so it doesn't match the query.

The design intent here seems to be that everything is bytes. However, the reality is that most things support both bytes and str, so code relies on that. I tried rejecting str, but stuff broke (and that wouldn't be backwards compatible anyway. I have therefore:

  1. Updated IResolver to permit "bytes and str" for name parameters. We should not mislead third-party implementers.
  2. Added a public function, twisted.names.dns.domainString(), to do the coercion in a consistent way. It's public so third-party IResolver implementations can use it.
  3. Added coercion to all of the common.BaseResolver wrapper methods, and document _lookup()'s type to be only bytes. This makes it easier for third-party implementers.
  4. Updated miscellaneous IResolver implementers to also do coercion using domainString().
  5. Use the dns.Name.name attribute instead of calling str() to coerce dns.Name to a string, the former being bytes.

Contributor Checklist:

@twm twm marked this pull request as ready for review December 30, 2019 06:24
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I have some minor feedback inline that you can respond to, but please land when you have addressed to your satisfaction. Thanks for fixing it!

@@ -26,6 +26,9 @@ def getSerial(filename='/tmp/twisted-names.serial'):
State is stored in the given file. If it does not exist, it is
created with rw-/---/--- permissions.

This manipulates process-global state by calling C{os.umask()}, so it isn't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this note.

@@ -60,6 +63,9 @@ class FileAuthority(common.ResolverBase):
"""
An Authority that is loaded from a file.

This is an abstract class that implements record search logic. To create
a functional resolver, subclass it and override the L{loadFile()} method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't think the parens are necessary for an epytext reference.)

Load DNS records from a file.

This method populates the I{soa} and I{records} attributes. It must be
overridden in a subclass. It is called once from the initializer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add a raise NotImplementedError just to make this clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but didn't think that would be backward-compatible. Right now a you can subclass FileAuthority, ignore loadFile(), and populate soa and records in your __init__ instead.

Or something even weirder if you like!

class NoFileAuthority(authority.FileAuthority):
def __init__(self, soa, records):
# Yes, skip FileAuthority
common.ResolverBase.__init__(self)
self.soa, self.records = soa, records


@param filename: The I{filename} parameter that was passed to the
initilizer.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an explicit @return might also be nice.

This is an implementation detail of L{ResolverBase.getHostByName}.

@param resolver: The resolver to use for the next query (unless handling
an I{NS} referral).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I{} or C{} ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I{} is used in most of the places record types are given any syntax, so I stuck with that.

@returns: L{bytes} suitable for network transmission.
@rtype: L{bytes}

@since: Twisted NEXT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thank you for remembering the compat marker and the right syntax for it :)

@since: Twisted NEXT
"""
if isinstance(domain, unicode):
domain = domain.encode('idna')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use twisted/internet/_idna.py here rather than a straight .encode. (The python stdlib codec is a much older version of the spec.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

@@ -0,0 +1 @@
twisted.names.secondary.SecondaryAuthority now accepts str for its domain parameter, so twist dns --secondary now functions on Python 3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this explanation.

@returns: A L{Deferred} that fires with L{None} when attempted zone
transfer has completed.
"""
if self.transferring: # <-- never true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File a bug for this, too?

@returns: A L{Deferred} that fires with L{None} when attempted zone
transfer has completed.
"""
if self.transferring: # <-- never true
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would make the docstring a lie, if it actually worked 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that 😬 summarizes this module too well. :(

Copy link
Contributor Author

@twm twm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done what I can for the moment. Filing tickets will have to wait until dornkirk is back up.

Load DNS records from a file.

This method populates the I{soa} and I{records} attributes. It must be
overridden in a subclass. It is called once from the initializer.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but didn't think that would be backward-compatible. Right now a you can subclass FileAuthority, ignore loadFile(), and populate soa and records in your __init__ instead.

Or something even weirder if you like!

class NoFileAuthority(authority.FileAuthority):
def __init__(self, soa, records):
# Yes, skip FileAuthority
common.ResolverBase.__init__(self)
self.soa, self.records = soa, records

This is an implementation detail of L{ResolverBase.getHostByName}.

@param resolver: The resolver to use for the next query (unless handling
an I{NS} referral).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I{} is used in most of the places record types are given any syntax, so I stuck with that.

@since: Twisted NEXT
"""
if isinstance(domain, unicode):
domain = domain.encode('idna')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

@returns: A L{Deferred} that fires with L{None} when attempted zone
transfer has completed.
"""
if self.transferring: # <-- never true
return
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that 😬 summarizes this module too well. :(

@twm
Copy link
Contributor Author

twm commented Jan 7, 2020

Hmm, so switching to _idnaBytes results in test failures. This one in a test I added looks meritorious:

[ERROR]
Traceback (most recent call last):
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/test/test_dns.py", line 70, in test_idna
    self.assertEqual(b'xn--fwg.test', dns.domainString(u'\u203D.test'))
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/dns.py", line 280, in domainString
    domain = _idnaBytes(domain)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/internet/_idna.py", line 30, in _idnaBytes
    return idna.encode(text)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 358, in encode
    s = alabel(label)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 281, in alabel
    check_label(label)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 261, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+203D at position 1 of '‽' not allowed

twisted.names.test.test_dns.DomainStringTests.test_idna

However most of the failures are to do with SRV support:

Traceback (most recent call last):
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/test/test_srvconnect.py", line 118, in test_SRVPresent
    self.connector.connect()
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/srvconnect.py", line 109, in connect
    nativeString(self.domain)),
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/client.py", line 731, in lookupService
    return getResolver().lookupService(name, timeout)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/common.py", line 138, in lookupService
    return self._lookup(dns.domainString(name), dns.IN, dns.SRV, timeout)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/names/dns.py", line 280, in domainString
    domain = _idnaBytes(domain)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/twisted/internet/_idna.py", line 30, in _idnaBytes
    return idna.encode(text)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 358, in encode
    s = alabel(label)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 270, in alabel
    ulabel(label)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 304, in ulabel
    check_label(label)
  File "/home/twm/twisted/build/py36-alldeps-nocov/lib/python3.6/site-packages/idna/core.py", line 261, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+005F at position 1 of '_xmpp-server' not allowed

twisted.names.test.test_srvconnect.SRVConnectorTests.test_SRVPresent

This looks like kjd/idna#17, which indicates that this is a little more complex than it seemed to me at first:

  • IDNA encoding applies only to hostnames
  • Hostnames must not contain underscores
  • DNS names may contain underscores, e.g. as a prefix like in SRV

Sadly, it looks like the name handling behavior we want actually varies by record type. We can't apply the idna package blindly (as dns.Name did with str.encode('idna')), since it actually validates its inputs as hostnames.

I think that I will back out this change and file a ticket once dornkirk returns.

@twm
Copy link
Contributor Author

twm commented Jan 8, 2020

Some quality time with RFC 5890 has convinced me that handling IDNA is entirely inappropriate at this layer. (Indeed, a more technically correct representation of DNS names would be Tuple[bytes], as the dots are a convention of representation, though I doubt the practical utility of that.) I have backed out the use of _idnaBytes. We'll stick with the still incorrect — but compatible — use of IDNA2003. I'll go ahead and merge when the build is green.

@twm
Copy link
Contributor Author

twm commented Jan 8, 2020

The build is broken because of hamcrest/PyHamcrest#131

@twm
Copy link
Contributor Author

twm commented Jan 8, 2020

Now the coverage service is failing. 😠

@twm twm closed this Jan 8, 2020
@twm twm reopened this Jan 8, 2020
@glyph
Copy link
Member

glyph commented Jan 9, 2020

@twm I hopefully have addressed the thing that was breaking coverage and kicked the build via the trunk merge, so hopefully you can land when it completes.

@twm twm closed this Jan 10, 2020
@twm twm reopened this Jan 10, 2020
@twm
Copy link
Contributor Author

twm commented Jan 10, 2020

Kicking build due to https://twistedmatrix.com/trac/ticket/8879#comment:36

@twm
Copy link
Contributor Author

twm commented Jan 10, 2020

Thank you @glyph!

@twm twm merged commit 1d68f66 into trunk Jan 10, 2020
@twm twm deleted the 9496-names-traceback branch January 10, 2020 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants