Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abortConnection() method for transports #7153

Closed
twisted-trac opened this issue Jul 15, 2003 · 79 comments
Closed

abortConnection() method for transports #7153

twisted-trac opened this issue Jul 15, 2003 · 79 comments

Comments

@twisted-trac
Copy link

itamarst's avatar @itamarst reported
Trac ID trac#78
Type defect
Created 2003-07-15 00:54:40Z
Branch https://github.com/twisted/twisted/tree/abortConnection-78-10

Attachments:

Searchable metadata
trac-id__78 78
type__defect defect
reporter__itamarst itamarst
priority__high high
milestone__ 
branch__branches_abortConnection_78_10 branches/abortConnection-78-10
branch_author__itamarst__jknight__exarkun__ivank__glyph itamarst, jknight, exarkun, ivank, glyph
status__closed closed
resolution__fixed fixed
component__core core
keywords__ 
time__1058230480000000 1058230480000000
changetime__1488724283180941 1488724283180941
version__ 
owner__itamar itamar
cc__glyph cc__itamarst cc__jknight cc__naked cc__exarkun cc__ivank cc__spiv cc__PenguinOfDoom cc__smira cc__atsuoi cc__kontinuity
@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented
#!html
<pre>
Similar to loseConnection(), but doesn't flush the write
buffers or wait for producer to finish. Useful for when you
know connection should die ASAP, because the other side is
totally screwed up anyway.

</pre>

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented
#!html
<pre>
I hope you don't object.  BUT I DON'T CARE IF YOU DO, HAHA

</pre>

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented
#!html
<pre>
This should probably just be an optional argument to
loseConnection, i.e.

    loseConnection(terminateTimeout=600)

A lot of code uses loseConnection() and expects the socket
to be closed eventually, but it is pretty easy to construct
an attack which will cause Twisted to never ever close a
socket this way.  By adding an optional argument to this
interface, we can enhance the semantics of the existing
method in a mostly-backwards-compatible way.

</pre>

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented
#!html
<pre>
Sounds good. Maybe with a default timeout? Which should
probably be high enough that existing code won't be affected
by slow connections (and for producers this is especially an
issue, though I'm not sure if we have any code that depends
on loseConnection's producer interaction).

</pre>

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented
#!html
<pre>
I bet that our existing code deals with
loseConnection+Producers differently in different reactors.
 That should be on our list of things to clean up when we
revamp the Producer API.

</pre>

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented
#!html
<pre>
Actually it's in tcp.py (or actually maybe abstract.py), so
the reactor shouldn't matter.

</pre>

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented
#!html
<pre>
Hm. Wouldn't loseConnection time out potentially break existing code? Like,
"send really huge file and then loseConnection". Of course, if the default
timeout was large enough it'd mostly be OK...
</pre>

@twisted-trac
Copy link
Author

naked's avatar naked commented
#!html
<pre>
I wrote a patch to this effect, although the method is called abortConnection().

My needs were to explicitly close a connection without waiting for write
availability on the socket/whatever first. Also this means that with SSL, we
don't send or wait for the reply to close notify, which is important.

There is a problem though, that in the normal case, connectionLost() is invoked
from the reactor after doWrite or doRead has returned CONNECTION_LOST or
CONNECTION_DONE. In this case, connectionLost() is called directly as a result
of calling abortConnection(). I do not know enough about twisted internals to
say if this is a serious problem or not, and how it should best be fixed.
</pre>

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented
#!html
<pre>
We do have code that depends on that behavior.  Quite a bit, actually.  I think I prefer 
abortConnection(), since terminateTimeout is a convenience and will have all the issues normally 
associated with callLater.
</pre>

@twisted-trac
Copy link
Author

jyknight's avatar @jyknight commented
#!html
<pre>
I think there's two issues here. One is a API to abort the connection. The other is the timeout 
issue.

For the write timeout issue, I don't think an absolute timeout on loseConnection is the right 
thing. Instead, it should have behavior like: "If there is data in the write buffer, and I have not sent any 
bytes for the last minute, the connection is as good as dead." I believe this behavior can and should 
default to on without compatibility problems.
</pre>

@twisted-trac
Copy link
Author

jyknight's avatar @jyknight commented
#!html
<pre>
Here's a different patch. transport.abortConnection() loses the connection and tcp's _closeSocket is 
updated to do an abrupt TCP disconnect unless the reason is ConnectionDone. Seems like it works but 
mostly untested.
</pre>

@twisted-trac
Copy link
Author

foom's avatar foom commented

(In [17458]) An abortConnection() function for transport, and some tests.

Refs #7153

@twisted-trac
Copy link
Author

jyknight's avatar @jyknight set owner to @jyknight

@twisted-trac
Copy link
Author

kvogt's avatar kvogt commented

This is actually a pretty evil security hole. I logged into a server to see why it was out of memory, and found 1434 open connections with no activity. A malicious user was opening hundreds of connections to a twisted server but not reading from them. Since loseConnection() won't close the sockets unless the write buffer is empty, I have no way to fight this attack at the application level.

We have had a twisted IRC server and an entire twisted live video cluster (93 nodes) get destroyed by this exact exploit once a certain hacking group figured it out. In the mean time I'm patching in abortConnection2.diff, but I thought this real world example might be useful information for people trying to fix this.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

I'm torn between happiness at live video clustering and sadness about the exploit. Upping priority, changing to defect.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

I'm a little concerned about the form of the solution here. It means that every protocol that is not specifically designed to be resistant to this particular exploit (in other words, all the protocols in Twisted) will continue to be a problem. Perhaps a better way to deal with the exploit would be to have a default timeout on loseConnection if no data is being consumed by the client? Or an easy way to set up administrative limits on the number of simultaneous connections from a single IP?

I haven't thought too hard about the solution yet, but I definitely don't think we could consider this form of DoS dealt with if you have to hand-patch some code for every port that you want to open.

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

Mechanism separate from policy.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Obviously we should have abortConnection that is separate from loseConnection. The question is, should loseConnection additionally have some sort of a default timeout case where it calls abortConnection. loseConnection essentially is a policy, since it can be implemented in terms of producer API + abortConnection.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

I was not trying to make a point about the validity of this ticket.

I was responding to the comment by kvogt that "This is actually a pretty evil security hole". This ticket isn't going to fix any security hole (or, more accurately, DoS). It may be a necessary prerequisite, but we should probably have a separate issue to deal with the larger problem.

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

Many tests fail with abortConnection-78 when using iocpreactor. One issue is that it doesn't pass in the 'orderly' argument. I think there are other issues too.

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

The patch causes almost all SSL-related tests to fail on Windows, even with the select reactor. If I discover why, I will post here.

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

By SSL-related, I meant anything that uses SSL. Not SSL tests in particular.

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

Why the tests fail on Windows:

On Windows, twisted.internet.tcp.Connection.connectionLost is incorrectly called with reason ConnectionLost even for "normal" disconnects.

[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.

Sometimes you see ConnectionDone, but most of the time it is the incorrect ConnectionLost. It "should" be called with reason ConnectionDone (the correct behavior, which you see on Linux)

This incorrect reason causes a disorderly _closeSocket on almost every SSL connection, which causes data loss and test failures.

Hopefully someone who knows how PyOpenSSL and OpenSSL work will comment :-)

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Could you open a ticket for the windows connectionlost/connectiondone issue you discovered? With a failing test or minimal example if at all possible. Thanks!

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

Shouldn't we just fix this patch? The ticket is still open, after all...

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Oh, I hadn't realized the bug was in the patch. Good plan!

@twisted-trac
Copy link
Author

truekonrads's avatar truekonrads commented

Lo and behold!
I have the connectionLost example nailed down:

from twisted.internet import win32eventreactor 
win32eventreactor.install()

from twisted.internet import reactor
from twisted.protocols.policies import TrafficLoggingFactory
from twisted.web.client import HTTPClientFactory,_parse
from twisted.python.util import println
from twisted.internet.defer import DeferredList
from OpenSSL import SSL

class NoVerifyClientContextFactory:
    """A context factory for SSL clients."""

    isClient = 1
    method = SSL.SSLv3_METHOD

    def getContext(self):
        def x(*args):
            return True
        ctx=SSL.Context(self.method)
        #print dir(ctx)
        ctx.set_verify(SSL.VERIFY_NONE,x)
        return ctx


url="https://online.lkb.lv/"
httpfactory=HTTPClientFactory(url)
dl=[]
for i in xrange(0,10):
    print i
    factory = TrafficLoggingFactory(httpfactory,"C:\\foo%i" %i)
    dl.append(httpfactory.deferred)
    pscheme,host, port,ppath = _parse(url)
    reactor.connectSSL(host, port, factory,NoVerifyClientContextFactory())
reactor.run()
DeferredList(dl).addCallback(lambda _:reactor.stop())                                     

Requirmentes are win32eventreactor and SSL connections!

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

I've been looking at this again. I noticed that abortConnection in abstract.py does a very wrong thing -- it calls connectionLost synchronously, leading to some really broken behavior where a server calling abortConnection would have connectionLost called twice.

The 20091225 patch makes abortConnection empty the buffers and go through the typical loseConnection codepath, but with changes to return the correct error reason in _postLoseConnection. Hopefully someone knows if these changes are on the right track.

Like the previous patches, this patch patch breaks the IOCP reactor, and still has the OpenSSL problem mentioned above (so it breaks many tests that use SSL on Windows). It also needs a test for a server aborting a client, not just a client aborting. There's also likely an untested codepath in _sendCloseAlert, but it is beyond my abilities to describe.

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

Not below, above.

@twisted-trac
Copy link
Author

ivank's avatar ivank commented

Here is why I suspected there is an untested codepath in _sendCloseAlert: before I changed this in _closeWriteConnection, all of the tests still passed:

-	        if result is main.CONNECTION_DONE: 
+	        if shouldClose:

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst set owner to PenguinOfDoom

Pavel, could you take a look at the IOCP code and see if it makes sense to you? Thanks!

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Ready for review, I guess. Some notes:

  1. I'd like feedback from Pavel about IOCP version.
  2. http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-8 has some remaining mysterious errors which seem like they shouldn't be my fault but maybe they are (py-without-models and the fedora builder - the win32 interface/adapter ones are known issue, I think.) Thoughts?
  3. The change to _oldtls appears necessary for tests to pass, but makes me slightly wary given _oldtls sets disconnected=True even when it's not really disconnected...
  4. Use callLater(0, f) to make protocol's connectionLost call non-reentrant. Sorry, but there's no better API at the moment; we can fix it along with all the other internal uses of this idiom when we do have a better API.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst removed owner

Anyone can review!

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun set owner to @itamarst
  1. The buildbot failures look pretty unrelated to me, but the rhel/fedora failures on gtk may mean that these new tests push us over some limit (remember, gtk leaks file descriptors like crazy, probably at least 1 per reactor).
  2. The howto says that connectionLost will be invoked re-entrantly, but it looks like the implementation goes for non-re-entrant.
  3. in _SocketCloser._closeSocket, I'm not sure how accurate the last bit of the comment is. I think SO_LINGER affects shutdown the same way it affects close. It seems fine to skip the shutdown in the non-orderly case, but if it's not actually required then the comment shouldn't say it is.
  4. Oh no, Connection._aborting. This absolutely must to be the last ad hoc state attribute we add to transports. After this, anything new needs to be added to an explicit state machine. For now, please add documentation for the attribute.
  5. Oh no, hasattr in Connection.connectionLost. Sure would be nice to have an explicit state machine. :)
  6. the abortConnection in twisted/internet/tcp.py is almost identical to twisted/internet/iocpreactor/tcp.py. They probably should be identical, and it would be nice to avoid the duplication.
  7. In twisted/internet/test/test_tcp.py, minimizeTCPBuffers docstring is misformatted. Also, it'd be nice to know why this is a useful thing to do, either with more docstring or comments at the call sites.
  8. Also, it has an XXX in it.
  9. StreamingProducerClient.write seems to provoke pauseProducing after just a single write. Does it need to do 99 more? Could it early out after pauseProducing? And should it be upset if it gets through all the writes without being paused?
  10. Why 0.05 in StreamingProducerClient.pauseProducing?
  11. in twisted/internet/test/test_tls.py, ContextGeneratingMixin should be new-style.
  12. And it looks like AbortSSLConnectionTest.buildReactor could use self.patch to be a little simpler.
  13. The news fragment could talk about TCP and SSL I guess, since SSL seems to support this feature as well. Also that comma in the middle is gross. :) ... abortConnection() which, unlike loseConnection(), always ...
  14. Which test exercises the self.stopWriting() fix in twisted/internet/iocpreactor/abstract.py (line ~365)?
  15. I have some reservations about the ReadAbortServerProtocol-based tests. These are providing a pretty strong (but fuzzy) guarantee - no bytes written in the same reactor iteration as an abortConnection call will be sent. If we wanted to switch back to synchronous write attempts, these tests will become unhappy. I don't know what to do about this, maybe nothing.
  16. The docstrings for the multi-abortConnection-call test methods confused me into thinking that multiple calls to loseConnection would raise an exception.
  17. All the runAbortTest-based tests are great, as far as readability goes. It took a couple minutes to figure out what runAbortTest does, but then the individual test methods were a breeze.
  18. Some of their docstrings could be expanded though. dataReceived calls abortConnection(), and then throws exception. - yea, then what?

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Re #11338... not really sure what you're suggesting. None of the new tests open many connections...

Re #14, quite possibly there is no test covering it. But I've written enough convoluted tests. If you really feel this is a problem, I can revert that fix :)

All other comments have been addressed. I just have to figure out why merging forward breaks a test.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

(In [32604]) Branching to 'abortConnection-78-9'

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Noooooooooooooooooooooooooooooooooo f u cfreactor:

[ERROR]
Traceback (most recent call last):
Failure: exceptions.RuntimeError: BUG: We should be paused a this point.

twisted.internet.test.test_tls.AbortSSLConnectionTest_CFReactor.test_fullWriteBufferAfterByteExchange

Doesn't happen on the other OS X builder, strangely. Can I just ignore that? Please? Or mark it failing? I don't want to have to deal with another crappy broken reactor :(

There's still also random mystery errors elsewhere (item 1) -- suggestions?

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-9

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst removed owner

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

Re #11338... not really sure what you're suggesting. None of the new tests open many connections...

You added new tests though. And each test probably leaks a file descriptor.

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

Replying to exarkun:

Re #11338... not really sure what you're suggesting. None of the new tests open many connections...

You added new tests though. And each test probably leaks a file descriptor.

For those who might see this and not be hip enough to have our entire ticket database memorized, exarkun's talking about: http://twistedmatrix.com/trac/ticket/4482

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

Or... no, that's a different bug? Hrm. I can't find a bug report about FD leaks in gtkreactor. I thought I knew about this though. Anyone have a reference handy?

@twisted-trac
Copy link
Author

glyph's avatar @glyph commented

I assert that GTK does not leak file descriptors when simply running the reactor, and I have attached a file which I believe proves it. Perhaps this isn't true of our rhel6 buildbot, so someone can run this file there and let me know if there's a problem. However, I've run this program on every Ubuntu release back to Hardy and none of them exhibit the described behavior.

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun commented

Not exactly what I thought I was talking about, but it seems likely to be related anyway: https://bugzilla.gnome.org/show_bug.cgi?id=579406. I think you didn't see the problem on Ubuntu because eventually Ubuntu backported the fix, but maybe RHEL did not.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

OK, I think I know what to do now. Taking off from review until I can get buildbot in better shape.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/abortConnection-78-9

We're down to failures that are even more mysterious and not-my-faulty.

Only unaddressed review comment is 14; see comments above. I can, if you want, remove the fix and open new ticket (Jerub claims to have seen bugs caused by it, so possibly writing a test isn't impossible).

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun set owner to @itamarst
  1. The twisted/test/test_tcp_internals.py will fail to set the new rlimit if the number of open files is close to (within 10) the open file limit. It would also be nice if this change were made in a separate ticket/branch.
  2. The twisted/test/test_ssl.py change seems fairly unrelated. It would be nice if it were made in a separate ticket/branch.
  3. The twisted/internet/iocpreactor/abstract.py still needs a test. Maybe if the ratio of test difficulty to "fix" difficulty is so high, the fix can be made before the test - but that still involves understanding the case it fixes and being confident that it really works. Either way, it would be nice if this were done in a separate ticket/branch.
  4. lambda foo: bar(foo) is an antipattern. In twisted/internet/test/test_tls.py, in AbortSSLConnectionTest.buildReactor, just use cooperator.cooperate as the patch value.
  5. In twisted/internet/test/test_tcp.py, in the minimizeTCPBuffers docstring, likely makes me uncomfortable. Are the tests that use this helper non-deterministic? And is the IOCP case going to be more likely to fail (or pass?) spuriously due to its lack of buffer size tweaks?
  6. In twisted/internet/test/test_tcp.py, the trick with Xs and Ys used by ReadAbortServerProtocol and AbortingClient could probably be explained.
  7. Typo in StreamingProducerClient.pauseProducing's docstring - Wwe.
    1. Also, the behavior the callLater(0.05) is working around seems like a bug in Twisted. The reactor should really be moving bytes on to the next level more aggressively (I have some numbers that show this is a performance win too). Separate ticket, of course. And then the comment above the 0.05 can link to that ticket, and we can get rid of the delayed call when that ticket is resolved.
  8. The SSL handling in runAbortTest is sort of inverted. It would be better to let a subclass override something to define any extra exceptions. This can be fixed later though (it's a nice, easy refactoring ticket for some newbie).
  9. in r32606, was it the addition of the loop, or the corrected handling of the paused attribute that fixed the tests with a "modern version of pyOpenSSL"?
  10. TLS test_fullWriteBufferAfterByteExchange is still pretty slow. It does 300 writes now instead of 100, and writes 90MB instead of the 30MB it wrote last time. I really want all of twisted.internet.test.test_tls to take under a second. In my ideal universe, it also wouldn't take 90MB of data to test basic transport features.

Thanks.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Created ticket #5301 for item 1.

Created ticket #5302 for item 2.

Created ticket #5300 for item 3.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

(In [32809]) Address review items 4-7. Refs #7153

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

I got rid of the minimize thing, it's not strictly necessary and the tests are still reasonable without it, and also dealt with 6 and 7.

Re 8, I will file a ticket after this branch is merged.

As far as 9, I think it was the bug fix, not the loop, so I lowered the number down to 100 (30MB write)

As far as 10 goes, the above means it's faster and writes less... but I think there may actually be a bottleneck in TLS. I don't think crypto is so slow that it should add add 2 CPU seconds, and it's much slower if I transfer same number of bytes with a smaller number of (larger) writes. Anyway, it's down to 3 seconds on my computer now. I don't want to lower it much more, because the idea is to fill up buffers.

Since I've addressed all review comments, I'll put this up for review once I can merge forward to include the fix for #5301.

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

(In [32824]) Branching to 'abortConnection-78-10'

@twisted-trac
Copy link
Author

thijstriemstra's avatar @thijstriemstra commented

Can you also add @since for the new public methods and classes?

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Good point, will do. And maybe rename the abort mixin to start with an _ if it doesn't already.

@twisted-trac
Copy link
Author

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun set owner to @exarkun

@twisted-trac
Copy link
Author

exarkun's avatar @exarkun set owner to @itamarst

Okay. I think this is good to merge. Thanks!

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst set status to closed

(In [32851]) Merge abortConnection-78-10.

Author: itamar, jknight, exarkun, ivank, glyph
Review: exarkun, thijs
Fixes: #7153

Add abortConnection() method to TCP and TLS transports.

@twisted-trac
Copy link
Author

koebenhavn's avatar koebenhavn commented

Just for refrence regarding SO_LINGER:

Portability note 1: Some implementations of the BSD socket API do not implement SO_LINGER at all. On such systems, applying SO_LINGER either fails with EINVAL or is (silently) ignored. Having SO_LINGER defined in the headers is no guarantee that SO_LINGER is actually implemented.

Portability note 2: Since the BSD documentation on SO_LINGER is sparse and inadequate, it is not surprising to find the various implementations interpreting the effect of SO_LINGER differently. For instance, the effect of SO_LINGER on non-blocking sockets is not mentioned at all in BSD documentation, and is consequently treated differently on different platforms. Taking case 3 for example: Some implementations behave as described above. With others, a non-blocking socket close() succeed immediately leaving the rest to a background process. Others ignore non-blocking'ness and behave as if the socket were blocking. Yet others behave as if SO_LINGER wasn't in effect [as if the case 1, the default, was in effect], or ignore linger->l_linger [case 3 is treated as case 2]. Given the lack of
adequate documentation, such differences are not (by themselves) indicative of an "incomplete" or "broken" implementation. They are simply different, not incorrect.
Portability note 3: Some implementations of the BSD socket API do not implement SO_LINGER completely. On such systems, the value of linger->l_linger is ignored (always treated as if it were zero).

/Event.

Replying to naked:

I think sending RST is somewhat a separate issue.

The main issue is freeing resources for the userland process. Just calling close() on the socket achieves this well enough (as long as no pending writes are waited and no SSL close is waited). This looks like an orderly shutdown to the remote end (unless there's SSL) in most cases. The exception to this is if the other end sends a data packet after we have called close() - in this case, the kernel will send a RST packet to the connection. (This could be avoided by never calling close(), but by calling shutdown() to close the sending end of the connection only.)

However, if the connections are indeed malicious, it might be that the wish is to free up resources for the kernel as well - atleast as much as is possible. Normally, the kernel keeps sending already written data after close(), and keeps handling the socket shutdown event FIN exchanges. If a socket is closed with SO_LINGER on and it's timeout set to zero, the kernel will discard its write buffers and send a RST to the other end. (I'm not sure if this could end up being an orderly shutdown if buffers are empty.) This way the kernel can free up most of the memory immediately, but it will still keep TIME_WAIT sockets around for quite a while.

But, I must say, I believe Twisted is going to be the only project anywhere that is going to bother with tricking around with SO_LINGER. And the whole thing is actually even somewhat non-portable.

The feature has been missing for 8 years - and for 8 years I've had to add the abortConnection() API to every single server I've written, since every one of them have had a need for it. Perhaps it is time to finally slay the beast?

@twisted-trac
Copy link
Author

itamarst's avatar @itamarst commented

Thanks for the info. So it sounds like there's a problem case where certain BSDs (which?) will block on close() if SO_LINGER is set to 0, even for non-blocking sockets? Or am I misunderstanding?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant