Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#9446] iocp reactor is duplicating/loosing outgoing data #1025

Merged
merged 19 commits into from
Mar 24, 2019

Conversation

asodeur
Copy link
Contributor

@asodeur asodeur commented Jun 6, 2018

iocpreactor.abstract.FileHandle.doWrite() can get called again before ._handleWrite() got the chance to update the bytes already written if .write() gets called while a .writeToHandle() is still in flight.

  • Changed .startWriting() to only schedule a call to ._resumeWriting() when self.writing == False.
  • Madetest_tcp.LargeBufferTests write the last byte in a separate .write().

@codecov
Copy link

codecov bot commented Jun 6, 2018

Codecov Report

Merging #1025 into trunk will decrease coverage by 0.35%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##            trunk    #1025      +/-   ##
==========================================
- Coverage   91.93%   91.57%   -0.36%     
==========================================
  Files         846      846              
  Lines      151297   151301       +4     
  Branches    13196    13196              
==========================================
- Hits       139093   138558     -535     
- Misses      10107    10619     +512     
- Partials     2097     2124      +27

@wiml
Copy link
Contributor

wiml commented Oct 16, 2018

This looks good to me, although I'm not familiar with the IOCP API at all.

To make sure I understand the code, it looks like .writing is supposed to be true if a write is scheduled via _writeScheduled or if a write is in flight via the completion port callback + deferred chain + etc.. It seems to me that there are some paths through doWrite() which could leave writing true even if nothing is in progress, which would cause the file handle to become unusable. Does that seem like the case to you?

(If so, perhaps it's worth fixing it in a separate PR so that this one can be committed…)

@wiml wiml requested review from wiml and exarkun and removed request for wiml October 17, 2018 17:00
@oberstet
Copy link

fwiw, would be cool to land this, as the select reactor is limited in number of connections and the iocp reactor has this issue on high load (eg crossbario/crossbar#1347) - leaving windows/crossbar users (outside of dev/test use) currently sad ..

"client didn't receive all the data it expected "
"(%d != %d)" % (clientF.len, self.datalen))
self.assertTrue(clientF.len <= self.datalen,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this assertion?
Why the previous one was not good? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current IOCP reactor shows rather interesting behaviour. ~10% of runs are ok, ~80% will loose data, and the remaining ~10% return duplicate data. Of course a simple == assertion would do, but the error message should not state data is missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the assertion message can be changed from didn't received all the data to didn't receive the expected data ... and replace with the dedicated self.assertEqual

but this is minor comment ...

what I would like to see, is this tests executed without the patch applied and see that it constantly fails
and then when applying the patch we have a solid green test :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making test fail with probability 1 on the unpatched IOCP reactor will be hard IMHO. The issue is with the interaction of twisted with the Windows network stack. You had to mock that to get deterministic behaviour. You could run the test multiple times which would obviously reduce chances of false passes.

That said, this is the commit that changed the test. Everything else (including the IOCP reactor) is still unchanged. You might give that a spin to get an idea of the frequency of false passes ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A mock that emulates the behavior of the IOCP APIs under their various failure modes would be a welcome addition, especially to help other maintainers who may want to clean things up but don't know as much about IOCP. That said, it sounds like that would be a good followup.

self.transport.loseConnection()
self.transport.write(b'X'*(self.factory.len-1))

def finish(): # write last byte separately to check issue #9446
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that the inline command can be set as a docstring for this function.
Rather than "check issue #9446", maybe this command can describe the targeted behaviour regardless of the ticket number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commited change

@@ -1363,15 +1363,21 @@ def test_buildProtocolClient(self):


class LargeBufferWriterProtocol(protocol.Protocol):

# Win32 sockets cannot handle single huge chunks of bytes. Write one
# massive string to make sure Twisted deals with this fact.

def connectionMade(self):
# write 60MB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still write 60MB? it looks like there are 60MB-1 ... I guess that we can remove this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commited change

@oberstet
Copy link

cool, thanks for merging! also pinged OP to check if the issue is actually gone as it surfaced at the user level (crossbar in this case) and iocp now works also under high load (backpressure)

@glyph
Copy link
Member

glyph commented Mar 24, 2019

It sounds like we've gotten lots of feedback on this and it improves the situation, everybody feels like it's good, but nobody wants to approve it.

Sounds like it's time for somebody to be the hero that Twisted needs.

Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this once CI is happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants