Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base-http: -> 2.13 #797

Closed

Conversation

martijnhoekstra
Copy link
Contributor

@martijnhoekstra martijnhoekstra commented Sep 24, 2019

Problem

no 2.13 cross build

Solution

* HeaderMap: The big one. Rework into a trie as not to depend on HashTable's implementation.

  • Other maps: split into 2.12- and 2.13+ versions
  • build: enable 2.13

Result

~Performance changes in 2.12 vs baseline are a mixed bag. ~

* Create: 2.5 times as fast
* Create and add a single element: 1.3 times as fast
* Create and add 24 life-like headers: 34% slower
* Get one out of 20 random headers: 30% slower
* Get one out of 24 life-like headers: 2.98 times as slow (!!)
* Foreach: 2.5 times as fast
* Iterating keys and values separtely is slow. There is likely still a lot to get there.

@martijnhoekstra martijnhoekstra mentioned this pull request Sep 24, 2019
28 tasks
@codecov-io
Copy link

codecov-io commented Sep 24, 2019

Codecov Report

Merging #797 into develop will increase coverage by 3.91%.
The diff coverage is 71.79%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #797      +/-   ##
===========================================
+ Coverage    71.91%   75.82%   +3.91%     
===========================================
  Files          857      789      -68     
  Lines        25042    23740    -1302     
  Branches      1816     1769      -47     
===========================================
- Hits         18009    18002       -7     
+ Misses        7033     5738    -1295
Impacted Files Coverage Δ
.../twitter/finagle/http/headers/EmptyHeaderMap.scala 20% <ø> (ø) ⬆️
...witter/finagle/http/headers/DefaultHeaderMap.scala 0% <ø> (ø)
...com/twitter/finagle/http/headers/HeadersHash.scala 100% <ø> (ø)
...twitter/finagle/http/ParamMapVersionSpecific.scala 0% <0%> (ø)
...witter/finagle/http/CookieMapVersionSpecific.scala 0% <0%> (ø)
...inagle/http/headers/HeaderMapVersionSpecific.scala 0% <0%> (ø)
...finagle/http/headers/JTreeMapBackedHeaderMap.scala 100% <100%> (ø) ⬆️
.../src/main/scala/com/twitter/finagle/http/Uri.scala 94.11% <100%> (+0.36%) ⬆️
...finagle/http/headers/Rfc7230HeaderValidation.scala 100% <100%> (ø) ⬆️
...ain/scala/com/twitter/finagle/http/CookieMap.scala 96.55% <100%> (ø) ⬆️
... and 85 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c699dc2...bfe9a4a. Read the comment docs.

@mosesn
Copy link
Contributor

mosesn commented Sep 25, 2019

Hmm, how would you feel about breaking this up into two commits, one of which does the 2.13 work and one of which depends on the 2.13 one and has the Map changes? In the mean time we can just stub out the Map stuff with trivial implementations (like the Seq one we discussed).

Also, @yufangong tested out removing the extends Map from HeaderMap and it seems that everything still compiles after we just add back in a couple of methods, so we might be able to simplify things by breaking the API a tiny bit.

@martijnhoekstra
Copy link
Contributor Author

The simplest thing that could possibly work for a backing store for Headers would be a plain mutable map over a string wrapper that compares case-insensitively -- easier and probably faster than the Seq one. It'll cost you an allocation per header. I can do that pretty quickly.

Re-evaluating whether something more sophisticated is a net positive can come at a later time.

@martijnhoekstra martijnhoekstra force-pushed the http_base_213 branch 2 times, most recently from e44e738 to e307f53 Compare September 26, 2019 10:58
@martijnhoekstra
Copy link
Contributor Author

@mosesn this version includes both a trie-backed version and (by default) a hashmap backed version. I don't have the resources to benchmark everything.

If you want to split out the trie implementation, you can just leave out **/TrieHeaderMap.scala and **/TrieHeaderMapBase.scala

Copy link
Contributor

@mosesn mosesn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is fine, and it's OK to move forward with it. With that said, my preference would probably be to remove the constraints that HeaderMap and CookieMap actually extend scala Maps, unify our 2.12 and 2.13 implementations, and make this a less invasive change for now. The trie-based approach is definitely interesting though, so I'd like to continue exploring that no matter what (separately from the 2.13 upgrade).

What do you think we should do?

def iterated: Iterator[HeaderMap.NameValue] = {
var cur = this
new AbstractIterator[HeaderMap.NameValue](){
def hasNext = cur != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type annos please

var cur = this
new AbstractIterator[HeaderMap.NameValue](){
def hasNext = cur != null
def next() = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type annos please


// ---- Map/MapLike -----

/*final*/ def get(key: String): Option[String] = underlying.synchronized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a comment about that. It got lost I think, I'll bring it back. scala/bug#11749

@@ -16,12 +16,12 @@ class DefaultHeaderMapTest extends AbstractHeaderMapTest with ScalaCheckDrivenPr
final def newHeaderMap(headers: (String, String)*): HeaderMap = DefaultHeaderMap(headers: _*)

test("apply()") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we push all of these tests down into AbstractHeaderMapTest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really shouldn't read review comments bottom to top

@@ -3,7 +3,7 @@ package com.twitter.finagle.http
import org.scalacheck.Gen
import org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks

class DefaultHeaderMapTest extends AbstractHeaderMapTest with ScalaCheckDrivenPropertyChecks {
class TrieHeaderMapTest extends AbstractHeaderMapTest with ScalaCheckDrivenPropertyChecks {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make sure we're testing the Map-backed HeaderMap too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good. Is there a reason why these tests aren't on AbstractHeaderMapTest? Should I duplicate them or move them there?

def values: Seq[String] =
if (next == null) value :: Nil
else {
val result = new mutable.ListBuffer[String] += value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use List.newBuilder here too?

def names: List[String] =
if (next == null) name :: Nil
else {
val result = new mutable.ListBuffer[String] += name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use List.newBuilder here too?

@martijnhoekstra
Copy link
Contributor Author

I agree that extending Map doesn't bring much in principle. The problem with no longer extending Map is that you may break downstream: Map is part of the pubic signature and is exposed through request/response as Headers.

Is not extending map and possibly breaking downstream worth the benefit of not having to do that weird thing with version-dependent source directories? In the sbt world, I'd cautiously say no: there is very little version-specific code, and it's fairly discoverable. Not sure if the same goes for the pants/monorepo world. It probably also depends on whether any of your internal users use it as a Map.

@mosesn
Copy link
Contributor

mosesn commented Sep 26, 2019

We typically use our internal repository as a model for how we expect that people are using our code in the wild, since it gives us a bunch of examples. @yufangong found that there are a pretty limited number of ways that people use Map-ish methods internally, so I think it should be safe to break.

@yufangong can you elaborate on what you found in your investigation?

I think my bigger concern with the current approach is the sheer number of changes you had to make to get it to work. Right now this is pretty challenging to review–I can't really see what you copied over from before (or changed minimally) and what you changed substantially.

@luciferous
Copy link
Collaborator

Just starting to take a look, seems like a lot of work/thought has been put into this. Thanks for digging in, @martijnhoekstra.

I think I'm agreeing with @mosesn about breaking up the commit. Is there a smaller and simpler (even if not efficient) replacement implementation for HeaderMap that will allow us to break with HashTable? It looks to me that we'd need to spend a lot more time studying this trie-based approach, and maybe we can do that separately.

@martijnhoekstra
Copy link
Contributor Author

I can think of a couple of things to make the diff here smaller and more manageable. I'm going to propose a few preliminary PR's, and we can see for individual changes whether they work out, or want to go a different direction. That's for all or your time and effort y'all!

@martijnhoekstra
Copy link
Contributor Author

Diff-shrinker 1: #798

@martijnhoekstra
Copy link
Contributor Author

Diff-shrinker 2: #799

@martijnhoekstra
Copy link
Contributor Author

Diff-shrinker 3 will then introduce MapBackedHeaderMap on top of diff-shrinker 1 and 2, and then this PR can be rebased on top of that, do the 212/213 compat, and set MapBackedHeaderMap as the default for 2.13 (or for everything).

@yufangong
Copy link
Contributor

We typically use our internal repository as a model for how we expect that people are using our code in the wild, since it gives us a bunch of examples. @yufangong found that there are a pretty limited number of ways that people use Map-ish methods internally, so I think it should be safe to break.

@yufangong can you elaborate on what you found in your investigation?

I think my bigger concern with the current approach is the sheer number of changes you had to make to get it to work. Right now this is pretty challenging to review–I can't really see what you copied over from before (or changed minimally) and what you changed substantially.

Sorry, I missed this. I first try to remove the HashMap Headers extended and leave something as ???, some internal things failed because of that (not compiling errors), so I assume not many models are relying on the HashMap impl. This PR seems now breaking finagle-base-http, so I cannot get an actual result.

@martijnhoekstra
Copy link
Contributor Author

This PR shouldn't be breaking. The failed CI is the finagle-http streaming test that's sometimes failing elsewhere too intermittently, but let's look at that again when the preliminary work is done, and it needs to be looked at all over again.

@martijnhoekstra
Copy link
Contributor Author

Diff-shrinker 3: #805

@martijnhoekstra martijnhoekstra force-pushed the http_base_213 branch 2 times, most recently from fec719f to 71dff8a Compare October 20, 2019 09:23
@martijnhoekstra
Copy link
Contributor Author

martijnhoekstra commented Oct 20, 2019

rebased on top of #805

When that gets in, rebase again on develop, which is equivalent to cherry-picking 92d7d3e

I'll be on holiday for the next week, and won't have my laptop on me (and have different things on my mind other than rebasing pull requests).

@spockz could you maybe see if you can get this moving once #805 is merged?

@martijnhoekstra
Copy link
Contributor Author

rebased. Diff shrunk significantly. Is the builder approach of avoiding Vector and its builders still good?

@@ -22,7 +22,7 @@ class MapParamMapTest extends FunSuite {
}

test("+") {
val paramMap = MapParamMap() + ("a" -> "1") + ("b" -> "2") + ("a" -> "3")
val paramMap = MapParamMap().setParam("a" -> "1").setParam("b" -> "2").setParam("a" -> "3")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really unfortunate, but + is final and returns Map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashMap gets around it somehow. I think it's with StrictOptimizedMapOps, but I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got mixed up with += on mutable.Map -- + is not final at all, specifically to allow overriding with a narrower return type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, phew!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

phew indeed

Copy link

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Interested observer - no input on code, just noticed some whitespace issues (also, almost all new files are missing a final newline)

@martijnhoekstra
Copy link
Contributor Author

Surely, you mean to say that those files don't have spurious final newlines

@h-vetinari
Copy link

Git (and github) are quite clear that the lack of final newlines is not desired (git diff will complain, and the github UI shows it as missing as well). ;-)

If it's an intentional choice then sure - but there is a final newline in DefaultHeaderMap.scala for example, so it's not even consistent...

@martijnhoekstra
Copy link
Contributor Author

Neither Git, nor Github complains to me. Maybe you have different settings, or different interpretations of the same interface.

I added a commit I removing the spurious newline at the end of DefaultHeaderMap for consistencies sake though.

@h-vetinari
Copy link

Neither Git, nor Github complains to me. Maybe you have different settings, or different interpretations of the same interface.

Running
git diff upstream/master -- finagle-base-http/src/main/scala-2.12-/com/twitter/finagle/http/headers/D* on this PR yields

diff --git a/finagle-base-http/src/main/scala-2.12-/com/twitter/finagle/http/headers/DefaultHeaderMap.scala b/finagle-base-http/src/main/scala-2.12-/com/twitter/finagle/http/headers/DefaultHeaderMap.scala
new file mode 100644
index 000000000..ad55b6933
--- /dev/null
+++ b/finagle-base-http/src/main/scala-2.12-/com/twitter/finagle/http/headers/DefaultHeaderMap.scala
@@ -0,0 +1,7 @@
+package com.twitter.finagle.http.headers
+
+import com.twitter.finagle.http.HeaderMap
+
+object DefaultHeaderMap {
+  def apply(headers: (String, String)*): HeaderMap = HashBackedHeaderMap(headers: _*)
+}
\ No newline at end of file

and similarly for all the other files missing a newline. Note the last line - that's git complaining roughly because of the posix standard. You'll also see it in your diff on github as a crossed red circle with a return arrow.

@martijnhoekstra
Copy link
Contributor Author

I see it mentioning the lack of a newline at the end of the file, but not complaining about it. The Scala spec doesn't refer to POSIX with regards to its lines either.

@mosesn
Copy link
Contributor

mosesn commented Nov 7, 2019

fwiw, whether files have newlines at the end of them or not has no bearing on whether they'll be merged in or not.

@martijnhoekstra is this ready to rock? should we review it?

@martijnhoekstra
Copy link
Contributor Author

yeah, this is ready to rock @mosesn

@mosesn
Copy link
Contributor

mosesn commented Nov 7, 2019

awesome! we're heads down on a tech debt sprint this week, but we'll review your patch next week. thanks so much for going through all of this trouble, I'm pretty excited about where we're going to end up.

@martijnhoekstra
Copy link
Contributor Author

@mosesn great to hear! The rest of finagle-http is pretty review-light. I've tested the commits by @spockz at https://github.com/martijnhoekstra/finagle/pull/1/commits , and as far as I'm concerned they look good to me to be cherry-picked in as well. Can we make that happen within the scope of this PR as well? It would be really nice if we can get finagle-http published this year still.

Copy link
Contributor

@bryce-anderson bryce-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive me if my comments don't make a lot of sense: I haven't spent much with the 2.13 yet.

@@ -7,7 +7,7 @@ import com.twitter.finagle.http.HeaderMap
* Mutable, thread-safe [[HeaderMap]] implementation backed by a
* HeaderHash.
*/
private final class HashBackedHeaderMap extends http.HeaderMap {
private[http] final class HashBackedHeaderMap extends http.HeaderMap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the plan is to just get rid of this since the JTree backed header map is a better performer and scala-version agnostic, so hopefully that will cut down some noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other PR was merged keeping the old as the default, so I figured it was a good idea to keep it here too. @mosesn what's your take on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Bryce's proposal makes sense, we're going to swap the default after we finish testing it, after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in caf06dc

Comment on lines 5 to 10
protected abstract class ParamMapVersionSpecific extends immutable.Map[String, String] {
def setParam[B >: String](kv: (String, B)): ParamMap
def updated[V1 >: String](key: String, value: V1): ParamMap = setParam(key, value)

def clearParam(name: String): ParamMap
def removed(name: String): ParamMap = clearParam(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can setParam and clearParam be protected so as not to become part of the public api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made those protected

@@ -128,13 +128,13 @@ class CookieMap private[finagle] (message: Message, cookieCodec: CookieCodec)
this += ((cookie.name, cookie))
}

override def ++=(xs: TraversableOnce[(String, Cookie)]): this.type = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's replaced with addCookies, and thus an API change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in 2.12, ++= comes from the override in CookieMapVersionSpecific
in 2.13, ++= comes from Map where it is an alias for addAll which is overridden in CookieMapVersionSpecific

Comment on lines 5 to 19
protected abstract class CookieMapVersionSpecific(message: Message, cookieCodec: CookieCodec) {
def this(message: Message) = this(message, CookieMap.cookieCodec)

def addCookie(cookie: (String, Cookie)): this.type
def addCookies(cookies: IterableOnce[(String, Cookie)]): this.type
def removeCookie(name: String): this.type
def removeCookies(names: IterableOnce[String]): this.type

def addOne(cookie: (String, Cookie)): this.type = addCookie(cookie)
def subtractOne(name: String): this.type = removeCookie(name)

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can some of these be protected? And I'm not sure where the addCookies and removeCookies are used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, those sound like good plans.

@martijnhoekstra martijnhoekstra force-pushed the http_base_213 branch 2 times, most recently from f9b36e3 to caf06dc Compare November 11, 2019 15:17
@yufangong
Copy link
Contributor

Hey @martijnhoekstra, it looks like there are some conflicts between the branch and develope (DefaultHeaderMap), can you do a rebase? Thank you!

@martijnhoekstra
Copy link
Contributor Author

rebased

@martijnhoekstra
Copy link
Contributor Author

test failures look spurious

@yufangong
Copy link
Contributor

Sorry for the delay, just got back from holiday breaks.
One thing caught our eyes that ArraySeq.newBuilder, we have found out ArraySeq.newBuilder being more expensive than new ArrayBuffer, see 34a2c66.
Is there any reason we want to use ArraySeq here? Otherwise, I will just change this part to use ArrayBuffer on our end. Thanks.

@martijnhoekstra
Copy link
Contributor Author

Calling toSeq on a mutable seq in 2.12 is a noop, but is a conversion in 2.13, usually involving copying the entire structure and should be much slower than the builder/result construct.

The difficulty is balancing the requirements of staying on collection.Seq in both 2.12 and 2.13, not regressing 2.12 performance (too much), keeping the diff as small as possible, and not regressing 2.12 -> 2.13 too much. I thought I found a good balance there with the builder, but as the facts show, the regressions on 2.12 are still significant.

@yufangong
Copy link
Contributor

Oh so good to know, we neglected 2.13 performance before, this is a good sign for us to start building some perf tools running on 2.12 and 2.13.

And yes, the regressions on 2.12 are showing pretty significant perf drops by using builder/result. so we will have to keep the toSeq around for the majority. Maybe worth to file an issue as well to start the discussion on 2.13?

@martijnhoekstra martijnhoekstra mentioned this pull request Dec 4, 2019
@martijnhoekstra
Copy link
Contributor Author

@yufangong I created #818

finaglehelper pushed a commit that referenced this pull request Dec 10, 2019
******************************************************
PR here: #797
******************************************************

Problem

no 2.13 cross build

Solution

~~HeaderMap: The big one. Rework into a trie as not to depend on HashTable's implementation.~~

Other maps: split into 2.12- and 2.13+ versions
build: enable 2.13
Result

Performance changes in 2.12 vs baseline are a mixed bag.

~~* Create: 2.5 times as fast~~
~~* Create and add a single element: 1.3 times as fast~~
~~* Create and add 24 life-like headers: 34% slower~~
~~* Get one out of 20 random headers: 30% slower~~
~~* Get one out of 24 life-like headers: 2.98 times as slow (!!)~~
~~* Foreach: 2.5 times as fast~~
~~* Iterating keys and values separtely is slow. There is likely still a lot to get there.~~

Signed-off-by: Yufan Gong <yufang@twitter.com>

JIRA Issues: CSL-9126

Differential Revision: https://phabricator.twitter.biz/D405663
twitter-service pushed a commit that referenced this pull request Dec 10, 2019
******************************************************
PR here: #797
******************************************************

Problem

no 2.13 cross build

Solution

~~HeaderMap: The big one. Rework into a trie as not to depend on HashTable's implementation.~~

Other maps: split into 2.12- and 2.13+ versions
build: enable 2.13
Result

Performance changes in 2.12 vs baseline are a mixed bag.

~~* Create: 2.5 times as fast~~
~~* Create and add a single element: 1.3 times as fast~~
~~* Create and add 24 life-like headers: 34% slower~~
~~* Get one out of 20 random headers: 30% slower~~
~~* Get one out of 24 life-like headers: 2.98 times as slow (!!)~~
~~* Foreach: 2.5 times as fast~~
~~* Iterating keys and values separtely is slow. There is likely still a lot to get there.~~

Signed-off-by: Yufan Gong <yufang@twitter.com>
GitOrigin-RevId: 4deae32e1f9e73d95ece50a9a9ce760006dc318f
@yufangong
Copy link
Contributor

Merged here: 3007e5d.
Thank you!

@yufangong yufangong closed this Dec 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

7 participants