Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#8912] Safer, better UNIX socket ancillary data decoding #652

Merged
merged 32 commits into from
Feb 16, 2017

Conversation

exvito
Copy link
Contributor

@exvito exvito commented Dec 30, 2016

Addresses https://twistedmatrix.com/trac/ticket/8912

Notes:

  • Improves robustness / correctness in two ways:
    • Processes all received ancillary data messages (previous code would only process the first).
    • Validates level and type before decoding file descriptors from the data (which would otherwise be wrong and prone to all sorts of failures by producing invalid/unpredictable descriptors).
  • Inspired by Tom Prince's sketch in the ticket, this code logs a message if unsupported ancillary data is received.

This is ready as far as I can tell, but I'd like feedback on:

  • The code for ancillary data processing is growing in _SendmsgMixin.doRead. Should it be factored out? If so, name suggestions for such a method?
  • On some platforms -- Mac, for example -- sendmsg fails when called with two CMSGs. Decided to raise SkipTest in those cases: continuing after such failure would be pointless and letting the error through would be misleading because the failure is not related to the code under test; it is preventing it from being tested, instead.
  • General naming / structure of the tests. They do what they need to do, but maybe someone can suggest improvements.

Thanks in advance.

@codecov-io
Copy link

codecov-io commented Dec 30, 2016

Codecov Report

Merging #652 into trunk will increase coverage by <.01%.
The diff coverage is 98.7%.

@@            Coverage Diff             @@
##            trunk     #652      +/-   ##
==========================================
+ Coverage   91.27%   91.28%   +<.01%     
==========================================
  Files         844      844              
  Lines      147472   147532      +60     
  Branches    13047    13055       +8     
==========================================
+ Hits       134612   134669      +57     
- Misses      10623    10626       +3     
  Partials     2237     2237

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78679af...f4dbdbb. Read the comment docs.

glyph
glyph previously requested changes Jan 14, 2017
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the comments in-line here, the other issue is code coverage. Please address codecov/patch by trying to ensure that everything is covered. Luckily the coverage problem is around the SkipTest I think needs to be removed anyway, so you shouldn't have any trouble!

@@ -383,26 +383,34 @@ def test_fileDescriptorOverrun(self):
test_fileDescriptorOverrun.skip = sendmsgSkip


def test_multiFileDescriptorReceivedPerRecvmsg(self):
def _SendmsgMixinFileDescriptorReceivedDriver(self, ancillaryPacker):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This S should be lower-case to adhere to the coding standard.

should return a two-tuple where: the first item is an iterable of
zero or more (cmsg_level, cmsg_type, cmsg_data) tuples for actual
sending via sendmsg; the second item is an integer indicating the
expected number of FDs to be received.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be an @param here describing ancillaryPacker.

# - Call doRead in the FakeReceiver.
# - Verify results on FakeProtocol.

# TODO: replace FakeReceiver test approach with one based in
# IReactorSocket.adoptStreamConnection once AF_UNIX support is
# implemented; see https://twistedmatrix.com/trac/ticket/5573.

from socket import socketpair
import socket
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports should really be moved up to the top level. I realize you're just copying the surrounding pattern, but this is making a bad pattern worse :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, these imports are here because they will otherwise generate import failures on Windows, maybe I'm wrong. All these tests should be skipped on Windows, given that they're testing UNIX sockets, but I'm not sure they are, as of now.

PS: That does not apply to the "import socket" here, which I put in place to handle the socket.error exception in the "dynamic" test skipping code you pointed out below, which I will improve; but it does apply, for example, to the "from socket import socketpair".

# Some platforms fail if ancillary contains more than
# one entry (Mac OS 10.9 and 10.10, for example).
# No point in continuing if sendmsg fails, skip the test.
raise SkipTest('sendmsg failed: %s' % (e,))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an instance of http://xunitpatterns.com/Conditional%20Test%20Logic.html.

Skips should be computed outside of the tests. The failure mode of computing a skip in-line like this is that a buggy implementation looks the same to us as an unsupported platform. Explicitly check if it's a known-to-not-support-multiple-sendmsg()s platform, then skip the test by setting its skip attribute on the outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out. I somehow felt unconfortable raising the skip from within the test code. That helps understand my initial gut feeling... :)

@exvito
Copy link
Contributor Author

exvito commented Jan 14, 2017

@glyph, thanks for your review. The latest commits address the issues you pointed out, including factoring the test skipping decision to "outside" the test code.

Coverage should be 100% once the Mac buildbots run through them.

Copy link
Member

@exarkun exarkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks really good. I made a few minor points about the changes and one slightly more significant one that's actually about unmodified but nearby code.

Please address at least the points marked (required) and then merge (or ping me and I will).

Thanks again!

for cmsgLevel, cmsgType, cmsgData in ancillary:
if (cmsgLevel != socket.SOL_SOCKET or
cmsgType != sendmsg.SCM_RIGHTS):
log.msg(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional)

It seems like it should be the case that new code should use twisted.logger instead of twisted.python.log. I don't know if this is really true. It would probably be a good thing to document somewhere.

That said, this module already uses twisted.python.log everywhere so this message fits in with the existing code fairly well. Unless there's an explicit policy to the contrary, I don't have a problem, with leaving this as-is and doing the logging conversion work at some later point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure such policy would be beneficial in this case: if this PR uses twisted.logger, logging becomes inconsistent throughout the module; making use twisted.logger all around would bring lots of unrelated changes.

I'll go with (optional) not changing the logging approach and I've filed a ticket to convert this module's twisted.python.log to twisted.logger: https://twistedmatrix.com/trac/ticket/9009

)
continue
fdCount = len(cmsgData) // 4
fds = struct.unpack('i'*fdCount, cmsgData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional)

Given the structure of this code, I'm excited to introduce support for receiving other kinds of ancillary data. It looks like it should be quite straightforward to do so. If you wanted, you could take one additional step towards this and improve the way dispatch is done here. For example, the fd handling logic could move into an _ancillary_SOL_SOCKET_SCM_RIGHTS method which is invoked via dict-style lookup. The fall-through case would then be the log-and-drop logic.

However, for the single case that's actually supported now, the code is perfectly fine as-is. Also, I know there's also SCM_CREDENTIALS (and googling turns up some really obscure stuff like SCM_WIFI_STATUS). I don't know if there's anything else useful beyond these two ... If there are only two then dict-based dispatch is not much of a win (certainly not performance; perhaps still readability).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working on this my mind also went there: it should be simple to support other types of ancillary data. That's also one of the reasons I initially asked reviewers about factoring out ancillary message decoding.

Having investigated other types of ancillary data I came down to the same conclusion. Other than SCM_RIGHTS and SCM_CREDENTIALS there are a few other obscure ones which tend to be platform-specific.

With that in mind, I decided to:

  • Keep the dispatching logic within a single if .. else block given that there's no benefit in making it more generic now.
  • Factor out the SOL_SOCKET / SCM_RIGHTS decoding and processing to the method you suggested, _ancillary_SOL_SOCKET_SCM_RIGHTS. This keeps the code cleaner, points out a way of handling other types of ancillary data and simplifies future dispatching improvements, dict-based or otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: Had to rename the _ancillary_SOL_SOCKET_SCM_RIGHTS method to comply with twistedchecker.

sendmsg(sendSocket, dataToSend, ancillary)

receiver.doRead()

# Verify that fileDescriptorReceived was called twice.
self.assertEqual(len(proto.fds), 2)
self.assertEqual(len(proto.fds), expectedCount)

# Verify that received FDs are different from the sent ones.
self.assertFalse(set(fdsToSend).intersection(set(proto.fds)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(required)

Hmm. The existing assertion doesn't verify that the order of file descriptors received is the same as the order of file descriptors sent? So if the implementation mixed them up we wouldn't notice... Is that right?

If so, that seems moderately unfortunate. It's a pre-existing defect so it could be fixed in a follow-up ticket. Please either fix it here or file that follow-up ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. This requires the non-trivial generic challenge of asserting two file descriptors actually "point" to the same underlying "kernel managed object".

The current test implementation only makes this more difficult, I guess, given that it is using the same UNIX socket FDs obtained via socketpair for two purposes:

  • To call sendmsg / doRead() on them.
  • As ancillary data FD payloads (because we need FDs to do that, and those are at hand).

Verifying that two UNIX socket FDs actually "point" to the same underlying socket is something I couldn't figure out how to do. In particular given the fact that these sockets were obtained via socketpair.

Two ideas come to my mind:

  • Trying to bind each of the UNIX sockets to a known address (filename, thus inode) and using os.fstat later to verify which is which. Not sure if socketpair connected sockets can be bound when obtained + how different platforms handle that.
  • Using known temporary files instead of the socket FDs as ancillary data payloads. These can be more simply verified via (filename + inode) and/or file contents.

Maybe there is some better approach at this. Which are your thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a commit with the first approach I commented above:

  • Bind sockets to known addresses.
  • Have FakeProtocol track not only FDs but also fstat's (st_dev, st_ino) tuples.
  • Do sendmsg / doRead.
  • Verify the sent (st_dev, st_ino) tuples against the ones tracked by FakeProtocol.

Feedback welcome.

flags = 0
return sendmsg.RecievedMessage(data, ancillary, flags)

origRecvmsg = sendmsg.recvmsg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(required)

The monkey-patching should be done using self.patch(sendmsg, "recvmsg", fakeRecvmsgUnsupportedAncillary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. Addressed.


def fakeRecvmsgUnsupportedAncillary(skt, *args, **kwargs):
data = b'some data'
ancillary = [(SOL_SOCKET+1, sendmsg.SCM_RIGHTS+1, b'')]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(required)

This is unsupported now but SCM_RIGHTS+1 is SCM_CREDENTIALS. SOL_SOCKET+1 doesn't appear to collide but I may be missing something and some other platform may have different constants. It looks like these are int fields. There's probably room to go far, far, far away from the currently assigned values? Or, alternatively, to pick actual values that are known to be invalid in this context (with a comment). For example ... SOL_TCP? SOL_UDP? Unfortunately Python doesn't expose many SCM_* constants (any, on Python 2...) making this strategy difficult there.

Ah, of course this is all happening at the Python level. So an out-of-bounds-for-int value would probably also work to guarantee non-support in the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I faced this exact challenge when coding this - which values to use such that it consistently represents an unsupported level/type accross different platforms and Python versions.

Given that this code is exercising the not-supported handling, the first thing I did was improving the test such that it verifies the expected message was logged.

The other thing that can be done, since we're working at the Python level like you point out, is using non-int values in fakeRecvmsgUnsupportedAncillary: if we set ancillary = [(None, None, b'')] it will certainly trigger the not-supported case and be even better than an out-of-bounds-for-int which may become within-bounds-for-int in some future.

What do you say?

# it -- we'll temporarily replace recvmsg with a fake one that produces
# a non-supported ancillary message level/type. This being said, from
# the perspective of the ancillaryPacker, all that is required it to
# let the test driver know that 0 file descriptors are expected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional)

The need to monkey-patch here suggests the implementation should probably accept recvmsg (and so probably also sendmsg) as parameters - thus removing the need for monkey-patching.

Making the sendmsg module a class attribute of _SendmsgMixin would probably accomplish this well-enough. That would allow the fake sendmsg to just be set on the FakeReceiver instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll go with (optional) for now and leave the code as is even though such an approach would decouple the code from the underlying I/O implementation which is generally a good idea.

@exvito
Copy link
Contributor Author

exvito commented Jan 19, 2017

@exarkun, thanks for your detailed review and comments.

Most of the aspects you point out crossed my mind when working on this so they don't come up as complete surprises, so to speak. I'll consider each of them and get back here with further comments and/or code.

I'll probably only be able to do it sometime next week but wanted to acknowledge your feedback right away. Again thanks.

@exvito
Copy link
Contributor Author

exvito commented Jan 23, 2017

@exarkun, this is ready for review. The failing codecov test should pass once the Mac buildbot runs. Thanks.

@glyph
Copy link
Member

glyph commented Jan 26, 2017

Looks to me like all the feedback was addressed. I'll merge when the buildbot turns green.

@glyph glyph dismissed their stale review January 26, 2017 22:46

changes were addressed

@exvito
Copy link
Contributor Author

exvito commented Jan 27, 2017

@glyph, thanks for the review.

WIth the trac/github duality, I ended up commenting in the ticket, never being 100% sure where to add a comment :) If possible, see https://twistedmatrix.com/trac/ticket/8912#comment:13.

Feedback is welcome.

@glyph
Copy link
Member

glyph commented Jan 27, 2017

WIth the trac/github duality, I ended up commenting in the ticket, never being 100% sure where to add a comment

Either place is fine :).

@exvito
Copy link
Contributor Author

exvito commented Jan 30, 2017

@glyph, I've addressed the failing Mac OS X test. I believe the buildbots need some signalling to run... If that is the case, please do so when possible.

Thanks.

Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally like what I see here, and I've kicked off a build. Thanks again for following through on these complex, low-level issues with such tenacity! Twisted is far better for your participation.

I may not be able to stick around until the builds turn green this time, so please address the minor nitpicks below if you like (although feel free to leave these for a follow-up PR) and poke another committer to land the changes.


return self._dataReceived(data)


def _ancillaryLevelSOLSOCKETTypeSCMRIGHTS(self, cmsgData):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this newline should be removed; it's unnecessary, per the coding standard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It'll go away. :)

@@ -0,0 +1 @@
twisted.internet.unix.Server.doRead and twisted.internet.unix.Client.doRead now process all messages from recvmsg's ancillary data, while discarding ones of unsupported level or type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a little clearer. Don't get me wrong, it's very precise! Technically, you're adhering to the letter of the law for news fragments, by calling out twisted.internet.unix.Server.doRead by its full name, but doRead is probably not the API of interest to users here; UNIX endpoints (i.e. serverFromString("unix:...") or UNIXServerEndpoint are the things that are going to be invoked.

Generally, as you're writing NEWS fragments, imagine you're a user of Twisted interested in this specific feature, and you're trying to understand what has changed. Reactor internals often have to change to support some other thing, so focus on the other thing.

Also: what's a "supported" level or type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, agreed. A good balance between a meaningful message for users while being technically detailed enough is not always easy - I'll give it a go, though.

PS: What "supported" means is also a very good question. I'll try to reword that, as well.

"descriptor received (from %(peerAddress)r)."),
hostAddress=self.getHost(), peerAddress=self.getPeer(),
protocolName=self._getLogPrefix(self.protocol),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick: I prefer the style where this parenthesis is un-indented, so it's at the same indent level as the line with the open paren on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer that style, I just went with what the existing code had. Changed.

@exvito
Copy link
Contributor Author

exvito commented Jan 31, 2017

This should be nearly ready. Two things may need attention and your input:

  • codecov test is failing, claiming that a particular line isn't covered by the tests; this line hasn't been changed, though. It's being included in the diff -3/+3 line boundaries. If this prevents merging, what would your advice be?
  • I reworded the top file per @glyph's note, trying to make it more accessible to Twisted users while maintaining technical details. I still feel it can be improved and your experience here will be valuable.

Thanks for your continued support and guidance.

@glyph
Copy link
Member

glyph commented Feb 16, 2017

O K A Y

Finally

@glyph glyph merged commit 3544422 into twisted:trunk Feb 16, 2017
@glyph
Copy link
Member

glyph commented Feb 16, 2017

OK, before merging, codecov reported a successful status, then it switched to a failing status, so I think it's just a tooling bug, and I'm not really sure what the issue is.

Hopefully future maintenance will cover those lines.

@exvito exvito deleted the 8912-exvito-safer-unix-do-read-tk2 branch February 16, 2017 11:27
@exvito
Copy link
Contributor Author

exvito commented Feb 16, 2017

@glyph, @exarkun, thanks for your support and guidance throughout this PR/ticket.
Your feedback really helped moving it forward.

@exarkun
Copy link
Member

exarkun commented Feb 16, 2017

My pleasure @exvito ! Thank you very much for the contribution and for persevering through the process. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants