Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#11989 Lots of small writes to the TLS transport use a lot of cpu #11996

Merged

Conversation

itamarst
Copy link
Contributor

@itamarst itamarst commented Sep 19, 2023

Scope and purpose

Fixes #11989

This makes TLS much faster when it involves many small writes (see here for a before/after benchmark - actual results are slightly worse but it's still a big improvement).

As a consequence of relying on the passage of time, some tests needed to be updated to advance time with a Clock.

Contributor Checklist:

This process applies to all pull requests - no matter how small.
Have a look at our developer documentation before submitting your Pull Request.

Below is a non-exhaustive list (as a reminder):

  • The title of the PR should describe the changes and starts with the associated issue number, like “#9782 Remove twisted.news. #1234 Brief description”.
  • A release notes news fragment file was create in src/twisted/newsfragments/ (see: Release notes fragments docs.)
  • The automated tests were updated.
  • Once all checks are green, request a review by leaving a comment that contains exactly the string please review.
    Our bot will trigger the review process, by applying the pending review label
    and requesting a review from the Twisted dev team.

@itamarst itamarst linked an issue Sep 19, 2023 that may be closed by this pull request
@itamarst itamarst requested a review from a team September 19, 2023 20:45
@hacklschorsch
Copy link

🚀!

@itamarst itamarst requested a review from a team September 25, 2023 13:31
Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

I think that this is an important change and would be happy to see this merged.

I left some comments.
I hope that the comment will make it easier for the next person to review this PR.

I am -1 on style in which the tests are written... but if other Twisted devs are happy with the style, I don't block the merge.

Maybe the current tests are ok, as unit tests for AggregateSmallWrites, but I prefer to have AggregateSmallWrites as a private implementation and write unit and integration tests for the new public BufferingTLSTransport

In terms of test cases for BufferingTLSTransport:

  • Receive a new small chunk to write ... see that it is not written right away... it is written at the next iteration... on following iteration, nothing happens
  • Receive a new small chunk and then another small chunk... still not written right way, but sent together on next reactor iteration
  • Receive a new small chunk and then a big chunk, check that it's written right away

I hope it helps. Thanks again for the PR.

src/twisted/protocols/tls.py Outdated Show resolved Hide resolved
src/twisted/test/test_sslverify.py Outdated Show resolved Hide resolved
src/twisted/protocols/tls.py Outdated Show resolved Hide resolved
src/twisted/protocols/tls.py Show resolved Hide resolved
src/twisted/protocols/tls.py Outdated Show resolved Hide resolved
src/twisted/protocols/test/test_tls.py Show resolved Hide resolved
Copy link

@meejah meejah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM (some inline comments).

I am interested in the "downstream" mentions and implications for such projects. I looked quickly in txtorcon (and all my tests pass against this branch) and this leads me to an additional question for downstream:

If one is using TLSMemoryBIOFactory do we see the benefits of this buffering, or not?

src/twisted/protocols/test/test_tls.py Show resolved Hide resolved
src/twisted/protocols/tls.py Outdated Show resolved Hide resolved
@adiroiban
Copy link
Member

If one is using TLSMemoryBIOFactory do we see the benefits of this buffering, or not?

If you use TLSMemoryBIOFactory with the default protocol, you should automatically get these benefits.

The default protocol for TLSMemoryBIOFactory is now the buffering TLS protocol.

@itamarst itamarst requested a review from a team October 10, 2023 19:59
@adiroiban
Copy link
Member

@itamarst I see that you have requested a new review, but I don't see any code changes, other than a merge.

For example, I see that you marked the comment for the class AggregateSmallWrites: # TODO make this private for now? line as resolved, but the TODO comment is still there, and the class is still public.

Is that on purpose ?

THanks

…s-transport-use-a-lot-of-cpu' into 11989-lots-of-small-writes-to-tls-transport-use-a-lot-of-cpu
@itamarst
Copy link
Contributor Author

Conflict meant my push failed 🤦, fixed.

st.lists(
st.one_of(
st.none(),
st.integers(min_value=1, max_value=100_000).map(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a philosophical comment in the context of Twisted dev process.

We don't have much time dedicated to doing code reviews.

I had to read this code a few times to understand what is going on here and how the test is structured.

I think that this can discourage a review.

Someone might not have much to do a review, will read the code and will just stop doing the review.

I think that a lot of the contribution to Twisted are done as a 2nd job, without too much time dedicated to doing code reviews.

Now you can argue that if you don't have time to do top quality code review, is better not do do it ... but then we end up with nobody doing reviews.


I understand that the current test input generate is comprehensive and define in a brief way.
I find it hard to read.

I think is better to have comment close this definition.

For example, I don't understand why we need this whole integer to bytes mapping here.
Why not use st.binary() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do st.binary() because my first though is it means it spends time searching the input space random bytes, and this test isn't about the contents of the bytes, it's about the lengths.

Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good. Many thanks for the updated.

I left a few minor comments.


I also left a few philosophical questions, but they should not block this PR.

I am +1 for using test input generators, instead of static/hardcoded values.

I am using custom generators in my private projects and I think that is good to have a 3rd party library that can be used to share these type of generators.

But I am not sure if st.binary() is good enough and will not lead to flaky tests.

For my bytes generator helper, I try to always generate bytes that will fail UTF-8 or UTF-16 decoding. In this way, it should help identity cases in which binary data is accidentally handled as text.

src/twisted/test/iosim.py Outdated Show resolved Hide resolved
src/twisted/test/test_iosim.py Show resolved Hide resolved
"""
Return the default reactor.

This is a function so it can be monkey-patched in tests; see #5206.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have #11991 merged so that we will not need this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed one part, but this particular one is still necessary for test_agent.py, unfortunately.

@itamarst
Copy link
Contributor Author

I added a minimal end-to-end test of small-write buffering, just to show it's hooked up.

@adiroiban
Copy link
Member

Thanks for the update.

Ohhhh, you just don't want 0.000001. That's easy to fix, it's not actually necessary.

yes. My issue was with 0.00001 vs just 0 .

As long as the time is not changed by iosim, it should be ok.


Thanks for the info about hypothesis decorators. It makes sense.

So maybe we just need a helper like st.binaryLengthVariation(min, max) that will generate bytes of various lengths.

But this should not be a blocker for this PR.


Also thanks for the new integration tests :)

As soon as Github Actions is happy, I think that this should be merged.

Thanks again for the PR.

@itamarst itamarst merged commit 105a9f5 into trunk Oct 12, 2023
22 of 23 checks passed
@itamarst itamarst deleted the 11989-lots-of-small-writes-to-tls-transport-use-a-lot-of-cpu branch October 12, 2023 21:03
@hacklschorsch
Copy link

🎉 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lots of small writes to TLS transport use a lot of CPU
8 participants