OrderedSerialization support for faster key sorting #1208
@ianoc @isnotinvain please carefully review this code. I do want to merge this.

def getRequireOrderedSerialization: Boolean =
  get(ScaldingRequireOrderedSerialization)
    .map(java.lang.Boolean.parseBoolean)
random: I think there's also .map(_.toBoolean), which throws for invalid boolean strings, compared to:
scala> java.lang.Boolean.parseBoolean("heyyy")
res2: Boolean = false
good call.
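To make the difference concrete, here is a small sketch comparing the two parsers. `ParseFlagDemo` and its method names are made up for illustration; only `java.lang.Boolean.parseBoolean` and Scala's `toBoolean` on strings come from the comments above.

```scala
import scala.util.Try

// Hypothetical helper, not part of this PR.
object ParseFlagDemo {
  // Lenient: any string other than "true" (ignoring case) becomes false.
  def lenient(s: String): Boolean = java.lang.Boolean.parseBoolean(s)

  // Strict: toBoolean throws IllegalArgumentException on bad input,
  // captured here in a Try so the caller can decide how to react.
  def strict(s: String): Try[Boolean] = Try(s.toBoolean)
}
```

With the strict variant a typo in the config value surfaces as a failure instead of silently turning the flag off.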
  Serialization.successUnit
} catch { case NonFatal(e) => Failure(e) }

override def compareBinary(lhs: InputStream, rhs: InputStream) = try {
what does try with no catch do?
looks like a bug.
odd that it compiles
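For what it's worth, Scala does accept a `try` with neither `catch` nor `finally`. As far as I know it behaves like a plain block, and newer compilers warn about it. A minimal sketch, with hypothetical object and method names:

```scala
import scala.util.{Failure, Success, Try}
import scala.util.control.NonFatal

// Hypothetical demo, not code from this PR.
object BareTryDemo {
  // No catch and no finally: the exception simply propagates to the caller.
  def bare(x: Int): Int = try { 10 / x }

  // With a handler, the failure is captured as a value instead.
  def guarded(x: Int): Try[Int] =
    try Success(10 / x)
    catch { case NonFatal(e) => Failure(e) }
}
```

So the bare form is legal but handles nothing, which is presumably why it reads as a bug here.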
@avibryant @colinmarc @snoble if y'all have time to take a look, that would be great. The basic approach is the OrderedSerialization typeclass; also see how we wire it into cascading (it is a bit painful, but that is life with cascading).
env: BUILD="base" TEST_TARGET="scalding-jdbc scalding-json"
script: "scripts/run_test.sh"

# not committed yet
Let's just move this to one of the other PRs and remove it from here? If it's not in those PRs we might forget to uncomment it, and if it's in them it doesn't need to be here.
I think I uncommented it in the follow-up. I'm just as worried we won't remember to add it (since that happens too). I'll do what you prefer.
I made some small comments, mostly on TODOs and comments. LGTM, really, though I was involved in enough of it to be suitably biased ;) Given that we are doing this as a smaller PR, we should consider whether a follow-up this week or next, splitting the ordered serialization code that doesn't depend on scalding into its own package, is a good idea or should be left for future work. (If it survives past a release together, we will need to change the classpath of the classes.)
@isnotinvain @ianoc addressed most of the comments. Want to take another look?
    case f @ OrderedSerialization.CompareFailure(_) => f
    case _ => cA // the first is not equal
  }
}
Actually, staticSize and dynamicSize need to be added. I wonder if they should not have default implementations. It's too easy to forget about them.
Seems reasonable for them not to have any defaults; that makes someone writing one manually think about the cost involved if they don't supply them.
Actually, there were several bugs with combinators as a result of the default None in the size methods. I think making those required is a good call for performance-critical code such as this.
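A rough sketch of why dropping the defaults helps. The `Sized` trait below is a hypothetical, simplified stand-in and not the actual Scalding definition; it assumes a pair's size is the sum of its components' sizes. With both methods abstract, a combinator that forgets them does not compile, whereas with a default of `None` it would silently lose the size information.

```scala
// Hypothetical, simplified trait: not the actual Scalding definition.
trait Sized[T] {
  // Size in bytes if every value of T serializes to the same length.
  def staticSize: Option[Int]
  // Size in bytes of this particular value, if it can be computed cheaply.
  def dynamicSize(t: T): Option[Int]
}

object Sized {
  // Assumed fixed 4-byte encoding for Int, for illustration only.
  val int: Sized[Int] = new Sized[Int] {
    val staticSize: Option[Int] = Some(4)
    def dynamicSize(t: Int): Option[Int] = staticSize
  }

  // The pair's sizes are the sums of its parts. Because the trait has no
  // defaults, the compiler rejects a combinator that omits either method.
  def pair[A, B](a: Sized[A], b: Sized[B]): Sized[(A, B)] = new Sized[(A, B)] {
    val staticSize: Option[Int] =
      for { x <- a.staticSize; y <- b.staticSize } yield x + y
    def dynamicSize(t: (A, B)): Option[Int] =
      for { x <- a.dynamicSize(t._1); y <- b.dynamicSize(t._2) } yield x + y
  }
}
```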
@isnotinvain any more concerns on this base PR?
@johnynek Sorry, I'm still reading your comments. I will try to finish tonight.
@isnotinvain changed the Iterable to a Map. Makes the code clearer and possibly faster. Anything else that needs addressing in this?
Do we have Laws for Orderings? I understand that we now have laws for total orderings applied to serializable orderings, but I think there's value in checking the other things that we expect to be true, like
The ignored exception in Serialization2.scala would be great to log or compose with the other exception, maybe like this:
I have not had a chance to look up the contract around
Meant to say, +1 LGTM, the above questions aren't blockers.
@isnotinvain two issues:
I'd rather merge this now and move on to the macros branch. Is that okay?
Given Alex's +1 above, merging.
OrderedSerialization support for faster key sorting
For 1), I guess I meant that for any Orderings we or our users have written by hand, it'd be nice to assert those properties. No need to assert them on ones that come built into the language. Similar to how we provide MonoidLaws etc. For 2) I must be going crazy, I thought stackTraceString was an implicit enrichment on exception, but now I can't find it. Must have been local to some other project. I agree, if we surface the first exception that's fine since often we wouldn't even try the second stream.
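As a sketch of what asserting those properties on a hand-written Ordering could look like: the `OrderingLawsSketch` object below is hypothetical and checks a finite sample only, using nothing beyond the standard library. A real laws helper would more likely be property-based.

```scala
// Hypothetical helper, not an existing Scalding or Algebird API.
object OrderingLawsSketch {
  // compare(a, b) and compare(b, a) must have opposite signs.
  def antisymmetric[T](xs: Seq[T])(implicit ord: Ordering[T]): Boolean =
    xs.forall { a =>
      xs.forall { b =>
        math.signum(ord.compare(a, b)) == -math.signum(ord.compare(b, a))
      }
    }

  // a <= b and b <= c must imply a <= c.
  def transitive[T](xs: Seq[T])(implicit ord: Ordering[T]): Boolean =
    xs.forall { a =>
      xs.forall { b =>
        xs.forall { c =>
          !(ord.lteq(a, b) && ord.lteq(b, c)) || ord.lteq(a, c)
        }
      }
    }
}
```

An Ordering whose `compare` always returns 1, for example, fails the antisymmetry check on any sample with at least one element.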
this does not include the macros. Want to stabilize this first.