Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary parsing of content-type header in ServerInboundHandler #2644

Merged
merged 3 commits into from
Jan 24, 2024

Conversation

kyri-petrou
Copy link
Collaborator

While #2597 improved the performance of parsing the content-type header significantly, it is still a relatively expensive computation (w.r.t everything else) so if we can avoid it, why not? While most applications won't see any measurable difference with this change, I'm certain it'll make a difference on benchmarks that include the content-type header as part of the request.

To demonstrate the difference, I added a benchmark in ServerInboundHandlerBenchmark that includes the content type header:

Before:

[info] Benchmark                                                  Mode  Cnt      Score     Error  Units
[info] ServerInboundHandlerBenchmark.benchmarkSimpleContentType  thrpt   10  30047.175 ± 245.613  ops/s

After:

[info] Benchmark                                                  Mode  Cnt      Score     Error  Units
[info] ServerInboundHandlerBenchmark.benchmarkSimpleContentType  thrpt   10  32728.912 ± 182.722  ops/s

I rerun the benchmark 5-10 times and it seems that avoiding the parsing of the content-type header is consistently 5-10% faster

@kyri-petrou kyri-petrou changed the title Avoid unnecessary parsing of content-type header in ServerInboundHandler Avoid unnecessary parsing of content-type header in ServerInboundHandler Jan 21, 2024

override def contentType(newMediaType: MediaType, newBoundary: Boundary): Body =
copy(mediaType = Some(newMediaType), boundary = boundary.orElse(Some(newBoundary)))
copy(mediaType0 = () => Some(newMediaType), boundary0 = () => boundary.orElse(Some(newBoundary)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, is this the correct behaviour? I'm a bit confused on why we would keep the old boundary if it was defined. Shouldn't this be boundary = Some(newBoundary) instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably so

@codecov-commenter
Copy link

codecov-commenter commented Jan 21, 2024

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (6930757) 65.38% compared to head (6b83fba) 65.34%.

Files Patch % Lines
.../jvm/src/main/scala/zio/http/netty/NettyBody.scala 52.94% 8 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2644      +/-   ##
==========================================
- Coverage   65.38%   65.34%   -0.05%     
==========================================
  Files         144      144              
  Lines        8407     8414       +7     
  Branches     1532     1486      -46     
==========================================
+ Hits         5497     5498       +1     
- Misses       2910     2916       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kyri-petrou
Copy link
Collaborator Author

The test failure is due to a flaky test ScalaJS. I opened an issue for it; I tried rerunning the entire test suite via an empty commit but the test failed again so it's perhaps better to just restart the failed run

contentTypeHeader.map(_.mediaType),
contentTypeHeader.flatMap(_.boundary),
)
contentTypeHeader: () => Option[Header.ContentType] = () => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that allocation of Function0 can be measured in benchmarks.

I would suggest the following:

  1. Statically make Header.ContentType for common content types (e.g. Header.ContentType.application/json`)
  2. Use fast comparison (or maybe hash map lookup) to decide which Header.ContentType to map to which standard Netty header. If Netty reuses the same char sequence, it could be even faster (eq).

If done properly this can as fast as Function0 and will also be less disruptive to the code.

Copy link
Member

@jdegoes jdegoes Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh also we can do:

Header.ContentType.`some-application/json` 

to pre-allocate the option.

asciiString: AsciiString,
override val mediaType: Option[MediaType] = None,
override val boundary: Option[Boundary] = None,
private val mediaType0: () => Option[MediaType] = () => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story here for MediaType. It can also eliminate the Function0 allocation.

override val mediaType: Option[MediaType] = None,
override val boundary: Option[Boundary] = None,
private val mediaType0: () => Option[MediaType] = () => None,
private val boundary0: () => Option[Boundary] = () => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boundary type is tricky because it comes in dynamically. However, it would be nice to eliminate the more complex API here and the Function0 allocation. Did you measure to see if this has any impact?

@kyri-petrou
Copy link
Collaborator Author

@jdegoes thanks for the feedback. While reading though your comments, I realised that we already do cache media types (via a hash map). I reverted all the Function0 changes and utilized MediaType.forContentType to parse the header in ServerInboundHandler (which is 1 or 2 orders of magnitude faster than ContentType.parse, depending whether the content-type header contains parameters (2 orders if there aren't any parameters).

With the changes in the latest commit, the performance is slightly improved (compared to the previous commit), but the biggest benefit is that the API is much cleaner

[info] Benchmark                                                  Mode  Cnt      Score     Error  Units
[info] ServerInboundHandlerBenchmark.benchmarkSimpleContentType  thrpt   10  33409.354 ± 155.748  ops/s

The only "downside" to these changes is the use of .toLowerCase. To make sure there aren't any performance implications associated with it, I added a benchmark to compare whether it makes any difference and it was negllible (~10 nanoseconds slower 🤷‍♂️ )

In addition, I shamelessly broke binary / source compatibility of the NettyBody.fromByteBuf method by changing the signature and making it private. I can revert this change and expose a different method; the main reason I did it was that I'm not sure whether the method is meant to be public at all. If that's not the case, please let me know and I'll revert the change

@kyri-petrou
Copy link
Collaborator Author

PS: Build is failed due to Sonatype 502 error. Should work after a restart

@jdegoes jdegoes closed this Jan 24, 2024
@jdegoes jdegoes reopened this Jan 24, 2024
@jdegoes jdegoes merged commit 351f6ed into zio:main Jan 24, 2024
21 of 27 checks passed
@kyri-petrou kyri-petrou deleted the lazy-content-type-parsing branch January 24, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants