Sampling Operators #735
Closed
In many cases, the updaters for user-defined aggregates share code. E.g., for an argmax, you may do something like this:

    def higher(max1, max2, val1, val2): case when max1 > max2 then val1 else val2 end;

And then if I have a table Student(name, gpa), I may define argmax using this updater to pick the {name, gpa} of the student with the highest gpa.

    update = [higher(s.gpa, state.gpa, s.name, state.name), higher(s.gpa, state.gpa, s.gpa, state.gpa)]

Currently, we compile each of the update expressions individually and then execute them in series. Unless Java's JIT is really awesome, this likely leads to redundant execution. (Performance results indicate that the JIT does not optimize this redundancy away.)

Instead, we should generate the entire updater script as a single block of code, and compile it as a single method. The execution code gets simpler and we expose more optimization opportunities to the compiler. In my experiments, the time of UDA execution decreases by ~20% or better.

- Add a new ScriptEvalInterface for compiled script objects, and clean up the name of the old EvalInterface -> ExpressionEvalInterface
- Rename Evaluator.getJavaExpression() to reflect the fact that it always includes code to append to an input column.
- Refactor UserDefinedAggregator (and associated Factory) to use the new script interface.
- Add the AppendableTable interface to Tuple so that we can use it with the new ScriptEvalInterface.
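The redundancy in the argmax updater can be illustrated with a small Python sketch. It is not Myria's Java code and says nothing about the measured speedup; it only counts how often the gpa comparison runs when each state column is updated by its own expression versus when the whole updater is one block. The names `is_higher`, `update_per_expression`, `update_as_script`, and `run`, and the sample students, are made up for the illustration.

```python
from functools import reduce

comparisons = {"count": 0}  # how many times the gpa comparison runs


def is_higher(gpa1, gpa2):
    comparisons["count"] += 1
    return gpa1 > gpa2


def higher(max1, max2, val1, val2):
    # Same logic as the `higher` helper in the description.
    return val1 if is_higher(max1, max2) else val2


def update_per_expression(state, s):
    # One expression per state column, evaluated in series:
    # the comparison runs once for name and once more for gpa.
    name = higher(s["gpa"], state["gpa"], s["name"], state["name"])
    gpa = higher(s["gpa"], state["gpa"], s["gpa"], state["gpa"])
    return {"name": name, "gpa": gpa}


def update_as_script(state, s):
    # The whole updater as one block: compare once, then set both columns.
    return dict(s) if is_higher(s["gpa"], state["gpa"]) else state


def run(update, students):
    # Fold the updater over the input, starting from the first student,
    # and report the final state plus the number of comparisons made.
    comparisons["count"] = 0
    result = reduce(update, students[1:], students[0])
    return result, comparisons["count"]


students = [
    {"name": "Ann", "gpa": 3.4},
    {"name": "Bob", "gpa": 3.9},
    {"name": "Cy", "gpa": 3.1},
]

print(run(update_per_expression, students))
print(run(update_as_script, students))
```

Both variants end with the same state, but the per-expression version performs one comparison per state column per input tuple, while the single-block version performs one per input tuple.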
…determine how to decode it. To decode a message, the code checks whether the physical channel is linked to a logical StreamInputChannel and uses this information to determine the type of the message. However, the link may no longer exist after a failure or query kill, while a message that should not be used anymore is still in the physical channel; it is then decoded incorrectly, causing errors. To fix this, simply use the header of the message to detect its type and ignore it when needed.
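A minimal sketch of the header-based approach, in Python rather than the project's Java. The one-byte type tags, the `decode` function, and the rule that only data messages are dropped when no logical channel is linked are all assumptions made for illustration; the actual wire format and message types are not shown here.

```python
from typing import Optional

# Hypothetical one-byte type tags; the real wire format is not shown in the source.
CONTROL = 0x01
DATA = 0x02


def decode(message: bytes, linked_channel: Optional[str]):
    """Classify a message by its own header rather than by channel state.

    Returns (kind, payload), or None when the message should be ignored.
    """
    if not message:
        return None
    tag, payload = message[0], message[1:]
    if tag == CONTROL:
        return ("control", payload)
    if tag == DATA:
        # A data message left over after a failure or query kill has no
        # logical channel to deliver to, so drop it instead of misdecoding it.
        if linked_channel is None:
            return None
        return ("data", payload)
    raise ValueError(f"unknown message tag: {tag:#x}")
```

The point of the design is that the message type never depends on whether the channel link still exists; the link is consulted only to decide whether a correctly typed message is still wanted.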
The channel may no longer exist after a failure, in which case resumeRead() leads to an error.
Code/test changes to enable modulo operator.
a dataset. Add a partition function UnknownPartitionFunction to deal with old "unknown" values and also cases when it's indeed unknown, e.g. importDataset.
using SingleFieldHashPartitionFunction.
Fix ipc channel failure
UserDefinedAggregate: combine expressions into a script
updating all json queries, mostly changing opid and aggregators
Related to ongoing work for uwescience/myria-web#265
MasterCatalog: populate language in querySimpleStatusHelper
Support http and https schemes in UriSource
Add partition function to ingest
Several extensions
I think I messed up a rebase here - going to start a new PR for my sampling operators
Unit tests forthcoming.