
Defer creation of Expectations #280

Merged: 6 commits merged into main on Oct 29, 2021

Conversation

rossabaker (Member)

Given a parser whose hot part looks like this:

    // Assumed imports, not shown in the original snippet:
    //   import cats.parse.{Parser => P, Rfc5234}
    //   import org.http4s.Header
    //   import org.http4s.internal.parsing.Rfc7230
    //   import org.typelevel.ci.CIString
    val fieldName = Rfc7230.token.string
    val fieldValue = P.repSep0(Rfc5234.vchar.rep, P.charIn(" \t").rep).string.surroundedBy(Rfc7230.ows)
    val header = ((fieldName <* P.char(':')) ~ fieldValue).map {
      case (k, v) => Header.Raw(CIString(k), v)
    }
    val headers = (header <* P.string("\r\n")).rep0
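
A hypothetical usage of that parser, for orientation (the input string is invented for illustration, and the imports above are assumed):

    // parseAll succeeds only if the entire input is consumed.
    val input = "Host: example.com\r\nAccept: */*\r\n"
    val parsed = headers.parseAll(input)
    // parsed == Right(List(
    //   Header.Raw(CIString("Host"), "example.com"),
    //   Header.Raw(CIString("Accept"), "*/*")))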

CharIn.makeError is dominating the stack traces and the allocation profile. The error is created at the end of every repetition, only to be nulled out and garbage-collected whenever we have made progress. Baseline numbers:

[info] Benchmark                                                         Mode  Cnt       Score       Error   Units
[info] Http1DecoderBench.http1Decoder                                   thrpt    5  194717.904 ± 27142.805   ops/s
[info] Http1DecoderBench.http1Decoder:·gc.alloc.rate                    thrpt    5    2375.264 ±   330.882  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.alloc.rate.norm               thrpt    5   13436.503 ±     0.524    B/op
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Eden_Space           thrpt    5    2378.683 ±   323.143  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Eden_Space.norm      thrpt    5   13456.252 ±   205.497    B/op
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Survivor_Space       thrpt    5       0.008 ±     0.004  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Survivor_Space.norm  thrpt    5       0.044 ±     0.019    B/op
[info] Http1DecoderBench.http1Decoder:·gc.count                         thrpt    5     508.000              counts
[info] Http1DecoderBench.http1Decoder:·gc.time                          thrpt    5     552.000                  ms
[info] Http1DecoderBench.http1Decoder:·stack                            thrpt              NaN                 ---

Here, we wrap every Chain[Expectation] in an Eval so that the expectations can be created lazily. Care must be taken not to close over mutable state, namely state.offset. This significantly increases speed and reduces garbage:

[info] Http1DecoderBench.http1Decoder                                   thrpt    5  353576.391 ± 39077.640   ops/s
[info] Http1DecoderBench.http1Decoder:·gc.alloc.rate                    thrpt    5    2293.983 ±   253.014  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.alloc.rate.norm               thrpt    5    7146.434 ±     0.264    B/op
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Eden_Space           thrpt    5    2294.376 ±   247.756  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Eden_Space.norm      thrpt    5    7147.791 ±    94.745    B/op
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Survivor_Space       thrpt    5       0.007 ±     0.002  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Survivor_Space.norm  thrpt    5       0.023 ±     0.006    B/op
[info] Http1DecoderBench.http1Decoder:·gc.count                         thrpt    5     494.000              counts
[info] Http1DecoderBench.http1Decoder:·gc.time                          thrpt    5     519.000                  ms
[info] Http1DecoderBench.http1Decoder:·stack                            thrpt              NaN                 ---
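
The pattern, sketched with simplified stand-ins for cats-parse's internal State and Expectation types (an illustration of the idea, not the actual diff):

    import cats.Eval
    import cats.data.Chain

    final class State(var offset: Int)                      // simplified stand-in
    final case class Expectation(offset: Int, msg: String)  // simplified stand-in

    // Eager: the error Chain is built at the end of every repetition,
    // then discarded as soon as the parse has made progress.
    def makeErrorEager(state: State): Chain[Expectation] =
      Chain.one(Expectation(state.offset, "char in range"))

    // Deferred: snapshot the mutable offset into a local val before
    // suspending, so the thunk does not close over state.offset; the
    // Chain is only materialized if the error is actually forced.
    def makeErrorDeferred(state: State): Eval[Chain[Expectation]] = {
      val offset = state.offset
      Eval.later(Chain.one(Expectation(offset, "char in range")))
    }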

I'll work on cleaning up and providing a runnable benchmark, but wanted to get early feedback.

We might also make it leaner with simple thunks rather than an Eval that is always an Eval.later.

codecov-commenter commented Oct 15, 2021

Codecov Report

Merging #280 (bbfe4d7) into main (9e67814) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main     #280      +/-   ##
==========================================
+ Coverage   96.32%   96.35%   +0.02%     
==========================================
  Files           8        8              
  Lines         980      988       +8     
  Branches       93       92       -1     
==========================================
+ Hits          944      952       +8     
  Misses         36       36              
Impacted Files Coverage Δ
core/shared/src/main/scala/cats/parse/Parser.scala 96.26% <100.00%> (+0.03%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

johnynek (Collaborator)

The win looks really good, and I'm definitely interested in merging once this is green and we sort out any MiMa issues.

rossabaker (Member, Author)

Drat. I was afraid stack safety would be an issue with the thunks. I'll pause here and see if people agree with the problem before I pour more into the solution.

johnynek (Collaborator)

I don't see why Eval should hurt much, and I imagine flatMap can solve the stack issues, no?

rossabaker (Member, Author)

A thunk is slightly lighter than Eval, but not by much, and Eval does indeed take care of stack safety.

Replacing Eval.later with Eval.always is marginally slower, but eliminates some allocations:

[info] Http1DecoderBench.http1Decoder                                   thrpt    5  337942.154 ± 42747.443   ops/s
[info] Http1DecoderBench.http1Decoder:·gc.alloc.rate                    thrpt    5    2072.398 ±   261.164  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.alloc.rate.norm               thrpt    5    6754.582 ±     0.565    B/op
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Eden_Space           thrpt    5    2075.145 ±   264.493  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Eden_Space.norm      thrpt    5    6763.527 ±   113.341    B/op
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Survivor_Space       thrpt    5       0.008 ±     0.003  MB/sec
[info] Http1DecoderBench.http1Decoder:·gc.churn.G1_Survivor_Space.norm  thrpt    5       0.025 ±     0.009    B/op
[info] Http1DecoderBench.http1Decoder:·gc.count                         thrpt    5     480.000              counts
[info] Http1DecoderBench.http1Decoder:·gc.time                          thrpt    5     520.000                  ms
[info] Http1DecoderBench.http1Decoder:·stack                            thrpt              NaN                 ---
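
For intuition, a toy sketch of the later/always difference (the allocation effect in the numbers above comes from the parser's usage pattern, not from this toy):

    import cats.Eval

    var runs = 0
    val later  = Eval.later  { runs += 1; "computed" }  // memoized: body runs at most once
    val always = Eval.always { runs += 1; "computed" }  // plain thunk: body reruns per .value

    later.value; later.value    // runs == 1: second call returns the cached result
    always.value; always.value  // runs == 3: each call re-executes the body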

rossabaker marked this pull request as ready for review on October 15, 2021.
rossabaker (Member, Author)

Do you want a benchmark added to this project? A dumbed-down HTTP header parser should do it.

If I can get this reasonably competitive with the gross handwritten ones in http4s, a proper one will appear there.

regadas (Collaborator) left a comment

This looks great! Those numbers look good 😄

Comment on lines 2677 to 2681:

    Chain.fromSeq(
      ranges.toList.map { case (s, e) =>
        Expectation.InRange(offset, s, e)
      }
    )
regadas (Collaborator)

@rossabaker, the last time I saw this was in #62. Are you seeing different results? Maybe worth keeping it?

rossabaker (Member, Author)

It won't show up in my benchmark since we now avoid this path instead of calling it a zillion times. It probably doesn't matter in reality because it's now only called once per failed parse. But I guess there's no reason not to keep the optimization and fail as fast as we can. I'll restore it.

Now that I'm aware of #62, I suspect we see some positive effect from this on those benchmarks.

rossabaker (Member, Author)

I need my CPU back, so I can't run a proper number of iterations right now, but it's looking good on the JSON benchmarks:

    bench/jmh:run -i 4 -wi 2 -f 1 -t 1 .*catsParseParse

Before, on 9e67814:

[info] Benchmark                   Mode  Cnt   Score    Error  Units
[info] BarBench.catsParseParse     avgt    4  ≈ 10⁻⁴           ms/op
[info] Bla25Bench.catsParseParse   avgt    4  36.105 ±  4.108  ms/op
[info] Qux2Bench.catsParseParse    avgt    4  10.878 ±  1.018  ms/op
[info] Ugh10kBench.catsParseParse  avgt    4  84.775 ±  4.723  ms/op

After, on 9561249:

[info] Benchmark                   Mode  Cnt   Score    Error  Units
[info] BarBench.catsParseParse     avgt    4  ≈ 10⁻⁴           ms/op
[info] Bla25Bench.catsParseParse   avgt    4  30.502 ±  1.736  ms/op
[info] Qux2Bench.catsParseParse    avgt    4   9.649 ±  1.508  ms/op
[info] Ugh10kBench.catsParseParse  avgt    4  76.212 ±  6.961  ms/op

Both runs fail on the Countries benchmark with this error:

[info] java.lang.RuntimeException: Error(256941,NonEmptyList(InRange(256941,,,,)))
[info] 	at scala.sys.package$.error(package.scala:27)
[info] 	at cats.parse.bench.JmhBenchmarks.catsParseParse(JsonBench.scala:47)
[info] 	at cats.parse.bench.jmh_generated.CountriesBench_catsParseParse_jmhTest.catsParseParse_avgt_jmhStub(CountriesBench_catsParseParse_jmhTest.java:190)
[info] 	at cats.parse.bench.jmh_generated.CountriesBench_catsParseParse_jmhTest.catsParseParse_AverageTime(CountriesBench_catsParseParse_jmhTest.java:152)

johnynek (Collaborator)

The JSON parser got refactored in a way that broke it, I think, and we aren't testing that code in CI, so we didn't notice.

This was fixed for the README here:

#258

Review comment on this diff hunk:

          null.asInstanceOf[A]
        }
      }

      final def oneOf[A](all: Array[Parser0[A]], state: State): A = {
        val offset = state.offset
    -   var errs: Chain[Expectation] = Chain.nil
    +   var errs: Eval[Chain[Expectation]] = Eval.later(Chain.nil)
johnynek (Collaborator) commented Oct 18, 2021

Can we make this Eval.now(Chain.nil)? I think that will allocate less, and every oneOf actually hits this path.

Actually, can we allocate a single Eval[Chain[Expectation]] as a private val evalEmpty: Eval[Chain[Expectation]] = Eval.now(Chain.nil) in Impl, and not have any allocations for this?
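
A sketch of that suggestion, with a placeholder Expectation type standing in for cats-parse's internal one:

    import cats.Eval
    import cats.data.Chain

    sealed trait Expectation // placeholder for cats.parse's internal type

    // One shared, pre-evaluated empty accumulator: Eval.now wraps an
    // already-computed value, so reusing this single val means the common
    // path through oneOf allocates nothing for the empty case.
    val evalEmpty: Eval[Chain[Expectation]] = Eval.now(Chain.nil)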

And on this hunk:

    @@ -2180,7 +2195,7 @@ object Parser {
          // we failed to parse, but didn't consume input
          // is unchanged we continue
          // else we stop
    -     errs = errs ++ err
    +     errs = errs.map(_ ++ err.value)
johnynek (Collaborator)

.value isn't safe here. We should do for { e1 <- errs; e2 <- err } yield (e1 ++ e2) or (errs, err).mapN(_ ++ _), though the latter may be slightly slower due to dispatch via the typeclass.
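
Sketched generically (hypothetical helper names; the point is that flatMap keeps both computations suspended inside Eval's trampoline instead of forcing err eagerly):

    import cats.Eval
    import cats.data.Chain
    import cats.syntax.all._

    // Stack-safe: both Evals stay suspended until the result is forced.
    def combine[E](errs: Eval[Chain[E]], err: Eval[Chain[E]]): Eval[Chain[E]] =
      for { e1 <- errs; e2 <- err } yield e1 ++ e2

    // Equivalent via Apply syntax; possibly slightly slower because it
    // dispatches through the Apply[Eval] typeclass instance.
    def combineMapN[E](errs: Eval[Chain[E]], err: Eval[Chain[E]]): Eval[Chain[E]] =
      (errs, err).mapN(_ ++ _)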

johnynek (Collaborator)

I'll take the changes I requested.

Thanks for sending a PR!

johnynek merged commit eee5032 into typelevel:main on Oct 29, 2021.