
Update Kryo to 3.0.1 #230

Closed
wants to merge 8 commits into from

Conversation

figpope
Contributor

@figpope figpope commented Jun 7, 2015

This updates Kryo to the 3.0.1 shaded jar and resolves the deprecations this upgrade causes. Specifically:

  • The jar's organization (group ID) was changed from org.esotericsoftware.kryo to org.esotericsoftware
  • output.total() / input.total() now return long values instead of int
  • ObjectInstantiator is now a generic, in order to preserve type information. To retain the original library behavior, ObjectInstantiator[_] and ObjectInstantiator[AnyRef] were used
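To make the third point concrete, here is a minimal self-contained sketch of the migration pattern. The interface below mirrors the Objenesis `ObjectInstantiator` signature (which became generic in the version bundled with Kryo 3.x), but it is redefined locally so the example compiles without Kryo on the classpath; `forClass` is an illustrative helper, not a library method.

```java
public class InstantiatorSketch {
    // Mirrors org.objenesis.instantiator.ObjectInstantiator, now generic in T.
    interface ObjectInstantiator<T> {
        T newInstance();
    }

    // Before the upgrade, call sites used the raw type; afterwards a binding
    // such as ObjectInstantiator<Object> (ObjectInstantiator[AnyRef] in Scala)
    // preserves the original "untyped" behavior.
    static ObjectInstantiator<Object> forClass(final Class<?> clazz) {
        return new ObjectInstantiator<Object>() {
            public Object newInstance() {
                try {
                    return clazz.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }

    public static void main(String[] args) {
        Object s = forClass(StringBuilder.class).newInstance();
        System.out.println(s.getClass().getSimpleName()); // prints "StringBuilder"
    }
}
```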

All tests pass as far as I can tell. (Is there any way to test other than running sbt +test?) Assuming the tests are reasonably exhaustive, this should also allow Java 8 compatibility, which can be tested with the following patch:

--- a/project/Build.scala
+++ b/project/Build.scala
@@ -28,8 +28,12 @@ object ChillBuild extends Build {
     ScalariformKeys.preferences := formattingPreferences,

     // Twitter Hadoop needs this, sorry 1.7 fans
-    javacOptions ++= Seq("-target", "1.6", "-source", "1.6", "-Xlint:-options"),
-    javacOptions in doc := Seq("-source", "1.6"),
+    javacOptions ++= (
+      if (scalaVersion.value.startsWith("2.11")) Seq("-target", "1.8", "-source", "1.8", "-Xlint:-options")
+      else Seq("-target", "1.7", "-source", "1.7", "-Xlint:-options")),
+    javacOptions in doc := (
+      if (scalaVersion.value.startsWith("2.11")) Seq("-source", "1.8")
+      else Seq("-source", "1.7")),

     resolvers ++= Seq(
       Opts.resolver.sonatypeSnapshots,

This excludes the JavaDoc updates required to move to Java 8.

@figpope figpope mentioned this pull request Jun 7, 2015
@ianoc
Collaborator

ianoc commented Jun 8, 2015

Given this is for Java 8, can you configure Travis in this PR so it's tested against Java 8? -- http://docs.travis-ci.com/user/languages/java/

@@ -47,7 +47,7 @@ public void serialize(Object o) throws IOException {
     try {
       st.writeObject(o);
       // Copy from buffer to output stream.
-      outputStream.writeInt(st.numOfWrittenBytes());
+      outputStream.writeLong(st.numOfWrittenBytes());
Collaborator


this is going to write 4 extra bytes for each record, right? Even 4 bytes is more than we need, since we don't expect 4 GB records.

Why not cast this to an int here and save the space? (Consider serializing Unit with this, which will now take twice as much data as it did before.)
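The overhead is easy to quantify with plain java.io (no Kryo involved): each record's payload is prefixed by its length, and a long prefix costs 4 bytes more per record than an int one. With zero-byte payloads (the Unit case above), the framing literally doubles. `framedSize` is an illustrative helper, not code from this PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthPrefixCost {
    // Frame `records` empty payloads, each prefixed by its length,
    // and return the total number of bytes written.
    static int framedSize(int records, boolean useLong) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < records; i++) {
            if (useLong) out.writeLong(0L); else out.writeInt(0);
            // payload bytes would follow here; zero-length for Unit
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(framedSize(1000, false)); // prints 4000
        System.out.println(framedSize(1000, true));  // prints 8000: 4 extra bytes per record
    }
}
```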

Collaborator


If this is a legit use case, we should probably move this to a varint encoding; that would probably be a win over our usual fixed-width int in some cases.
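A sketch of the suggested varint framing, using the standard unsigned LEB128 scheme (7 payload bits per byte, as used by Kryo's own writeVarInt and by protobuf). This is a self-contained illustration, not the PR's Varint class: small lengths cost one byte instead of four.

```java
import java.io.ByteArrayOutputStream;

public class VarintSketch {
    // Unsigned LEB128: 7 payload bits per byte,
    // high bit set on every byte except the last.
    static byte[] writeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeVarInt(0).length);                 // prints 1
        System.out.println(writeVarInt(300).length);               // prints 2
        System.out.println(writeVarInt(Integer.MAX_VALUE).length); // prints 5 (worst case)
    }
}
```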

@johnynek
Copy link
Collaborator

This makes me a bit nervous unfortunately. Maybe someone should help me through my anxiety. Here is the issue: storm, spark, scalding and heron all put Kryo on the classpath. Upgrading here means there is a real possibility for diamonds and lots of binary pain if you have two of them in scope at the same time. The easiest way (for us) for that to happen is via summingbird since that will have Kryo from chill and from storm/heron.

How do we get out of this mess?

* <p>Signed values are further encoded using so-called zig-zag encoding
* in order to make them "compatible" with variable-length encoding.</p>
*/
final class Varint {
Collaborator


We need tests for this. It's super testable with ScalaCheck.
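The zig-zag mapping mentioned in the doc comment is indeed easy to property-test. A self-contained Java sketch of the standard encode/decode pair (not the PR's code) with a ScalaCheck-style roundtrip loop:

```java
import java.util.Random;

public class ZigZagRoundTrip {
    // Zig-zag maps signed ints onto small unsigned values: 0,-1,1,-2,2 -> 0,1,2,3,4,
    // so small-magnitude negative numbers stay short under varint encoding.
    static int zigZagEncode(int n) { return (n << 1) ^ (n >> 31); }
    static int zigZagDecode(int n) { return (n >>> 1) ^ -(n & 1); }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 10_000; i++) {       // poor man's ScalaCheck forAll
            int n = rng.nextInt();
            if (zigZagDecode(zigZagEncode(n)) != n)
                throw new AssertionError("roundtrip failed for " + n);
        }
        System.out.println("ok"); // prints "ok"
    }
}
```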

@steveloughran

Kryo is one of the JAR version trouble spots: to deal with 2.21 & 2.22 import/link incompatibilities, Spark 1.5 has to ship a custom version of Hive (downgraded against 2.21). Ideally Hive (unshaded) and Spark would use the same release, but that would only work if chill used the same version (2.21+).

Before a big switch to 3.x, would a smaller upgrade to 2.22 be possible? Maybe even 2.24, with some coordination across Hive and Spark for a consistent version.

@ianoc
Collaborator

ianoc commented Sep 15, 2015

The APIs alas changed enough between those versions that the binary incompatibility problems would drive lots of pain everywhere. Going direct to 3 seems better to me than repeating the process multiple times (given the level of coordination pain this will drive).

I actually asked in June about some coordination on the spark dev list but got no response there. There is also the likes of Storm too. Unfortunately this bleeds everywhere pretty quickly.

@ozawa-hi

I'm trying to get Spark to upgrade their Kryo to 3. It seems there's a recent issue at Hive too. Would it be better if I try to create a Spark patch for 3.0.1 or 3.0.3?
https://issues.apache.org/jira/browse/HIVE-12175

@ozawa-hi

It seems there was a Spark issue that upgrading Kryo would have solved, but the upgrade was stopped because of Chill. It would be great if we could coordinate activities.
https://issues.apache.org/jira/browse/SPARK-7708

@steveloughran

@ozawa-hi —we really need to get hive & spark to be in sync; it's the sole reason there's a mutant hive-exec JAR for spark. If Hive and Chill can agree on a version, and it doesn't cause regressions in Spark, then it would sync things up nicely

@ianoc
Collaborator

ianoc commented Nov 11, 2015

I had asked on the spark dev list a few months ago for a point person on that side to coordinate an upgrade with... but got no response. Scalding/Summingbird need to synchronize on the big upgrade too, but I can drive most of that. Do we have a point person that can vet and drive any spark upgrade volunteer?

@steveloughran

...I'm not a person who can help with Spark updates, but could work to keep the Hive code in sync (they've generally been on the cutting edge of Kryo issues)

@ozawa-hi

I'm not a Spark committer, but I'm strongly willing to put in all the effort to get this upgrade through, because I desperately need it.
The funny thing is, there's an issue at Spark saying they can't upgrade Kryo because it's an uphill battle to coordinate the upgrade with Chill. :-)
https://issues.apache.org/jira/browse/SPARK-7708

@ianoc
Collaborator

ianoc commented Nov 11, 2015

I dunno, I've not said no to any of the main Spark committers that I'm aware of. We've generally moved slowly due to the wide web of dependencies around this, especially as the version of Kryo here now is right before some major binary version breakage (the versions after it, a long time ago now, had some bugs, so we had to go back). But if someone wants to take over this PR (it's got conflicts now) and bring it up to the latest released version of Kryo, I think we can get the ball rolling and try to get an updated release of chill out soon.

@steveloughran

One q: how much coordinated update is needed? If a version of chill ships with Kryo 3, Spark doesn't have to update immediately until they want to. Same for Hive. One thing that will be trouble is if the Kryo APIs change significantly and Hive moves to Kryo 3.x, as then the ugly hacks needed to get Spark and Hive consistent (rebuilt Hive with a different Kryo version, a re-group-ID'd JAR without shading) won't work: upgrading is the only option.

That would be an issue independent of Chill; a new version of chill would simply make it possible to get Hive and Spark in sync.

@steveloughran

FWIW, I have a branch of chill using Kryo 2.22; this is the version Hive updated to for bug fixes, and they didn't find any problems worth rolling back for.

https://github.com/steveloughran/chill/tree/feature/support-kryo-2.22

@ianoc
Collaborator

ianoc commented Nov 11, 2015

So if the versions are not binary compatible, then code cannot depend on the new chill. Or, say, versions of Algebird serializers can become a problem. And for us, we have a monorepo, so taking one version of Kryo via our chill dep, another via our Spark deps, and another via our Scalding ones is a mess, alas.

@ozawa-hi

So, it may be better to upgrade Kryo in Chill first, then Hive, then Spark.

@steveloughran

Hive will update on its own schedule, with its own needs; it looks like Hive 2 already has the update in.

Hive 1.2.1 is already ahead of Chill & Spark: for Spark 1.5.1 we had to patch it to downgrade (POM file and a couple of source files with changed import statements), and to not shade Kryo.

It's Spark that needs to pick up a changed Kryo via chill; then the resyncing process can begin again.

I've just pushed up patch #244, which patches chill for Kryo 2.22 instead, making it consistent with Hive's current release. This is a lower-risk update, which means it's more likely to be picked up by Spark in the 1.x line after a chill update; it's clear that a Kryo 3.x update won't happen until Spark 2.

(as a colleague who works on Hive once said to me, "Kryo is the guava of serialization")

@ianoc
Collaborator

ianoc commented Nov 13, 2015

Upgrading just to support Hive isn't a big win for most people and is a huge cost in effort. We should just be trying to get onto the latest stable Kryo. We've tried a Kryo 2.22 update; it was lots and lots of pain for little benefit. There was a chill release with it, long since reverted.

@steveloughran

I didn't know about that update; I'll cancel the pull req. If even a minor release hurts that much, you may as well go to the latest version you are happy with, skipping the intermediates.

@ozawa-hi

We've tried Kryo 2.22 but it really didn't solve the issues. We're currently using Kryo 3.0 in our projects and it seems to be usable. Java is the default serde for Spark and I don't think too many people are switching to Kryo as it is; we've found 2.22 to be insufficient in resolving the issues and not worth the pain to maintain.

Kryo 3.x won't happen before Spark 2.x, but as you've already mentioned, these upgrades are a lot of work and it's probably better to start early. I'm not suggesting an upgrade just because there's a newer version out there: I really need to use Kryo because we need performance, and Kryo 2.x just isn't usable.

@ianoc ianoc mentioned this pull request Nov 16, 2015
@figpope figpope closed this Dec 1, 2015