Sampling Operators #735

Closed
wants to merge 103 commits into from

radion commented Apr 13, 2015

Unit tests forthcoming.

dhalperi and others added 30 commits March 5, 2015 16:04
In many cases, the updaters for user-defined aggregates share code. E.g., for
an argmax, you may do something like this:

def higher(max1, max2, val1, val2):
   case when max1 > max2 then val1 else val2 end;

And then if I have a table Student(name, gpa), I may define argmax using this
updater to pick the {name,gpa} of the student with the highest gpa.

update = [higher(s.gpa, state.gpa, s.name, state.name),
          higher(s.gpa, state.gpa, s.gpa, state.gpa)]

Currently, we compile each of the update expressions individually and then
execute them in series. Unless Java's JIT is really awesome, this likely leads
to redundant execution. (Performance results indicate that the JIT does not
optimize this redundancy away.)

Instead, we should generate the entire updater script as a single block of
code and compile it as a single method. The execution code gets simpler, and we
expose more optimization opportunities to the compiler. In my experiments, UDA
execution time decreases by ~20% or more.

- Add a new ScriptEvalInterface for compiled script objects, and rename the
  old EvalInterface to ExpressionEvalInterface.
- Rename Evaluator.getJavaExpression() to reflect the fact that it always
  includes code to append to an input column.
- Refactor UserDefinedAggregator (and associated Factory) to use the new script
  interface.
- Add the AppendableTable interface to Tuple so that we can use it with the new
  ScriptEvalInterface.
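
The redundancy described above can be illustrated with a minimal sketch in plain Java. Everything in it is hypothetical: the class, the `State` holder, and the `comparisons` counter are invented for illustration and are not part of the Myria code base or of the generated code. It only contrasts evaluating each update expression on its own (the comparison runs once per output column) with a single combined updater method (the comparison runs once per input tuple).

```java
// Hypothetical sketch; none of these names come from the actual code base.
final class ArgmaxScriptSketch {
  /** Aggregate state for an argmax over Student(name, gpa). */
  static final class State {
    String name = null;
    double gpa = Double.NEGATIVE_INFINITY;
  }

  /** Counts comparisons so that the redundant work is observable. */
  static int comparisons = 0;

  static boolean greater(double a, double b) {
    comparisons++;
    return a > b;
  }

  /** Per-expression style: mirrors the two higher(...) calls, one per column. */
  static void updatePerExpression(State state, String name, double gpa) {
    // Both expressions read the old state, so compute both before writing.
    String newName = greater(gpa, state.gpa) ? name : state.name;
    double newGpa = greater(gpa, state.gpa) ? gpa : state.gpa;
    state.name = newName;
    state.gpa = newGpa;
  }

  /** Script style: the whole updater is one method, so the test runs once. */
  static void updateAsScript(State state, String name, double gpa) {
    if (greater(gpa, state.gpa)) {
      state.name = name;
      state.gpa = gpa;
    }
  }

  public static void main(String[] args) {
    String[] names = {"ann", "bob", "cat"};
    double[] gpas = {3.1, 3.9, 3.5};

    State perExpr = new State();
    comparisons = 0;
    for (int i = 0; i < names.length; i++) {
      updatePerExpression(perExpr, names[i], gpas[i]);
    }
    int perExprComparisons = comparisons;

    State script = new State();
    comparisons = 0;
    for (int i = 0; i < names.length; i++) {
      updateAsScript(script, names[i], gpas[i]);
    }
    int scriptComparisons = comparisons;

    System.out.println("per-expression: " + perExpr.name + ", comparisons=" + perExprComparisons);
    System.out.println("single script:  " + script.name + ", comparisons=" + scriptComparisons);
  }
}
```

In this sketch the saving comes from writing the shared test once by hand; in the change described above, the hope is that compiling the updater as one method lets the compiler find such sharing itself.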
…determine how to decode it.

To decode a message, the code checks whether the physical channel is linked to a logical StreamInputChannel and uses that link to determine the message type. However, the link may no longer exist after a failure or a query kill, while a stale message (which should no longer be used) is still in the physical channel; it is then decoded incorrectly, causing errors. The fix is to use the message header to detect its type and to ignore the message when needed.
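
A minimal sketch of the header-based approach, under stated assumptions: the one-byte tag, the `CONTROL_TAG`/`DATA_TAG` values, and the `decode` method are hypothetical and do not reflect the actual wire format or classes. The point it illustrates is that the message type is read from the message itself, and a data message arriving on a channel that is no longer linked is dropped rather than misdecoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch; the real message format and class names are not shown in the source.
final class HeaderDecodeSketch {
  /** Assumed one-byte header tags. */
  static final byte CONTROL_TAG = 0;
  static final byte DATA_TAG = 1;

  /**
   * Decodes a message using only its header to determine the type.
   *
   * @param message one header byte followed by a UTF-8 payload (assumed layout)
   * @param channelLinked whether the physical channel still has a logical input channel
   * @return the decoded text, or empty if the message is stale and must be ignored
   */
  static Optional<String> decode(byte[] message, boolean channelLinked) {
    if (message.length == 0) {
      throw new IllegalArgumentException("empty message");
    }
    String payload = new String(message, 1, message.length - 1, StandardCharsets.UTF_8);
    switch (message[0]) {
      case CONTROL_TAG:
        // Assumption: control messages do not depend on the logical link.
        return Optional.of("control:" + payload);
      case DATA_TAG:
        if (!channelLinked) {
          // Stale data left over after a failure or query kill: ignore it.
          return Optional.empty();
        }
        return Optional.of("data:" + payload);
      default:
        throw new IllegalArgumentException("unknown header tag " + message[0]);
    }
  }

  public static void main(String[] args) {
    byte[] data = {DATA_TAG, 'h', 'i'};
    System.out.println(decode(data, true));
    System.out.println(decode(data, false));
  }
}
```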
The channel may no longer exist after a failure, in which case resumeRead() leads to an error.
Code/test changes to enable modulo operator.
a dataset.

Add a partition function, UnknownPartitionFunction, to handle old "unknown"
values as well as cases where the partitioning really is unknown, e.g.
importDataset.
UserDefinedAggregate: combine expressions into a script
updating all json queries, mostly changing opid and aggregators
MasterCatalog: populate language in querySimpleStatusHelper
Support http and https schemes in UriSource
@coveralls

Coverage decreased (-0.07%) to 59.34% when pulling 645e14d on radion:master into 54d54b9 on uwescience:master.

radion commented May 22, 2015

I think I messed up a rebase here - going to start a new PR for my sampling operators

radion closed this May 22, 2015