Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Cookie decoder improvements #576

Merged
merged 14 commits into from
Jan 3, 2022

Conversation

jgoday
Copy link
Contributor

@jgoday jgoday commented Nov 26, 2021

See #571.

Reduce Cookie case class allocations in Cookie.decodeResponseCookie.
Right now, it allocates a new Cookie instance with each field (using Cookie.copy through several methods).
For example,
it allocates 3 instances for "SUID=123;httponly=true;expires=2007-12-03T10:15:30.00Z".

This change parses each field separately and then creates one single Cookie instance.

This simple jhm benchmark show the two approaches

package sample

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
import zhttp.http.Cookie

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
class CookieDecode {
  @Benchmark
  def benchmarkDecode(): Unit = {
    val value="SUID=123;httponly=true;expires=2007-12-03T10:15:30.00Z"
    val _ = Cookie.decodeResponseCookie(value)
    ()
  }
}

Parsing and copying Cookie case class


# Warmup Iteration   1: 0,141 ops/us
[info] # Warmup Iteration   2: 0,151 ops/us
[info] # Warmup Iteration   3: 0,206 ops/us
[info] Iteration   1: 0,217 ops/us
[info] Iteration   2: 0,168 ops/us
[info] Iteration   3: 0,147 ops/us
[info] Result "sample.CookieDecode.benchmarkDecode":
[info]   0,177 ┬▒(99.9%) 0,653 ops/us [Average]
[info]   (min, avg, max) = (0,147, 0,177, 0,217), stdev = 0,036
[info]   CI (99.9%): [? 0, 0,831] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:01

Parsing fields with mutable vars

[info] # Warmup Iteration   1: 0,340 ops/us
[info] # Warmup Iteration   2: 0,375 ops/us
[info] # Warmup Iteration   3: 0,354 ops/us
[info] Iteration   1: 0,369 ops/us
[info] Iteration   2: 0,375 ops/us
[info] Iteration   3: 0,349 ops/us
[info] Result "sample.CookieDecode.benchmarkDecode":
[info]   0,364 ┬▒(99.9%) 0,244 ops/us [Average]
[info]   (min, avg, max) = (0,349, 0,364, 0,375), stdev = 0,013
[info]   CI (99.9%): [0,120, 0,608] (assumes normal distribution)
[info] # Run complete. Total time: 00:01:00

@jgoday jgoday force-pushed the Fix/cookie-decoder-allocations branch from 821a10c to 1cd3d23 Compare November 26, 2021 22:08
@codecov-commenter
Copy link

codecov-commenter commented Nov 26, 2021

Codecov Report

Merging #576 (58ee1bf) into main (57d6126) will decrease coverage by 8.02%.
The diff coverage is 81.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #576      +/-   ##
==========================================
- Coverage   57.36%   49.34%   -8.03%     
==========================================
  Files          66       68       +2     
  Lines        1466     2061     +595     
  Branches       55       53       -2     
==========================================
+ Hits          841     1017     +176     
- Misses        625     1044     +419     
Impacted Files Coverage Δ
zio-http/src/main/scala/zhttp/http/Cookie.scala 63.47% <81.94%> (+9.06%) ⬆️
.../scala/zhttp/service/client/ClientSSLHandler.scala 0.00% <0.00%> (-100.00%) ⬇️
...cala/zhttp/service/server/OptionalSSLHandler.scala 0.00% <0.00%> (-78.58%) ⬇️
.../scala/zhttp/service/server/ServerSSLHandler.scala 16.66% <0.00%> (-66.67%) ⬇️
...cala/zhttp/service/server/HttpOnHttpsHandler.scala 0.00% <0.00%> (-61.54%) ⬇️
...ttp/src/main/scala/zhttp/service/HttpRuntime.scala 35.71% <0.00%> (-60.72%) ⬇️
zio-http/src/main/scala/zhttp/http/HttpData.scala 29.41% <0.00%> (-32.13%) ⬇️
...la/zhttp/service/client/ClientInboundHandler.scala 71.42% <0.00%> (-28.58%) ⬇️
...io-http/src/main/scala/zhttp/service/Handler.scala 65.78% <0.00%> (-28.19%) ⬇️
...http/service/client/ClientChannelInitializer.scala 85.71% <0.00%> (-14.29%) ⬇️
... and 57 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57d6126...58ee1bf. Read the comment docs.

),
)
} else {
None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, you have almost doubled the perf!

You can further reduce allocations by —

  1. Get rid of all the Option and use null instead.
  2. Every map operation is creating a new collection (more memory), use a for/while loop if possible
  3. Get rid of interim Function and Tuple also.
  4. Don't allocate the String inside the method, pull them outside so that thy are allocated only once.

NOTE: You can have a method called - unsafeDecodeResponseCookie that returns a null or a Cookie for performance and make it package private. Then also have a decodeResponseCookie which calls unsafeDecodeResponseCookie and wraps it in an Option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some changes to avoid string split and some allocations.

Marked internal functions parseDate and splitNameContent @inline functions, not sure if it could be a problem ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the benchmark numbers as before and after. I think we can make more changes for performance :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using None use null. This can cause null pointer exceptions but that should be fine because we will rename this method to unsafeDecodeResponseCookie
and make it package-private. And, we will have a separate method decodeResponseCookie which calls unsafeDecodeResponseCookie and wraps the output in an Option. And that should be a public method.

val c = if (i <= 0) headerValue.substring(prev) else headerValue.substring(prev, i)
val (n, ct) = splitNameContent(c)

(n.toLowerCase, ct) match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary tuple allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tusharmath I have mode some changes to avoid string and tuple unnecessary allocations, inner loop is not the fanciest but I think that this way you get more performance ... what do you think ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you commit the Benchmark code to the benchmarks project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added CookieDecodeBenchark to zio-http-benchmarks (very naive at the moment)

@amitksingh1490 amitksingh1490 changed the base branch from release/next to main December 14, 2021 09:40
case _ => ("", None)
@inline
private def parseDate(v: String): Try[Instant] =
Try(Instant.parse(v))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try will allocate additional memory. Instead, you could use a try { } catch {} block.
Incase of failure return a null

@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class CookieDecodeBenchmark {
private val cookie = "SUID=123;httponly=true;expires=2007-12-03T10:15:30.00Z"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be generate a larger cookie. With a large payload and more key-value pairs.

} else {
name = headerValue.substring(0, next)
}
} else if (headerValue.regionMatches(true, curr, "expires=", 0, 8)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string literals will also allocate memory every time the method is called. You could move it into a value so that it's allocated only once.

} else if (headerValue.regionMatches(true, curr, "samesite=", 0, 9)) {
val v = headerValue.substring(curr + 9, next).toLowerCase
v match {
case "lax" => sameSite = SameSite.Lax
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure but "lax" AFAIK will also allocate additional memory.

@jgoday
Copy link
Contributor Author

jgoday commented Dec 16, 2021

@tusharmath
Updated with your latest suggestions:

  1. String literals
  2. Update benchmark with a larger payload cookie
OLD CODE:
[info] CookieDecodeBenchmark.benchmarkApp  thrpt    3  387562,885 ± 52538,364  ops/s

NEW CODE:
[info] CookieDecodeBenchmark.benchmarkApp  thrpt    3  679569,604 ± 27421,427  ops/s
  1. Avoid scala.util.Try in parseDate
  2. Added unsafeDecodeResponseCookie: Cookie
  3. Really like the idea of having a cookie with the index-based fields, specially to avoid splitNameContent allocations in 'decodeRequestCookie', but as you said, maybe it's better to do it in another PR ...

@tusharmath tusharmath changed the title #571: Reduce decodeResponseCookie allocations Reduce decodeResponseCookie allocations Dec 21, 2021
Co-authored-by: Tushar Mathur <tusharmath@gmail.com>
try { Instant.parse(headerValue.substring(curr + 8, next)) }
catch { case _: Throwable => null }
} else if (headerValue.regionMatches(true, curr, fieldMaxAge, 0, fieldMaxAge.length)) {
maxAge = Try(headerValue.substring(curr + 8, next).toLong).toOption
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create a global try catch block for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tusharmath , a couple of questions here:

  1. would it be enough to wrap with a try each attempt to decode a field?

  2. What should we do with the exception, discard it completely?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We should have only 1 top level try catch block inside the safe implementation.
  2. Incase of failure we should throw an exception in the unsafe implementation. The safe implementation captures it and then could do either of the two things —
    a. Ignore the error and return a None.
    b. Capture the error and return a Left[Throwable]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed to catch the exception in decodeResponseCookie but keeping the Option[Cookie] signature

@tusharmath
Copy link
Collaborator

Really looking forward to this PR @jgoday :)

@tusharmath tusharmath merged commit 03881d9 into zio:main Jan 3, 2022
@tusharmath tusharmath changed the title Reduce decodeResponseCookie allocations Performance: Cookie decoder improvements Jan 3, 2022
@tusharmath
Copy link
Collaborator

tusharmath commented Jan 3, 2022

@jgoday Thanks! this was pretty amazing :) We make it 1.7x faster. It might not seem much right now, but it all adds up when write a proper e2e application.

@jgoday
Copy link
Contributor Author

jgoday commented Jan 4, 2022

Thank you @tusharmath for all your help and support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants