Sampling Operators #735

radion · 2015-04-13T04:54:00Z

Unit tests forthcoming.

In many cases, the updaters for user-defined aggregates share code. For example, for an argmax, you may do something like this:

    def higher(max1, max2, val1, val2): case when max1 > max2 then val1 else val2 end;

Then, if I have a table Student(name, gpa), I may define argmax using this updater to pick the {name, gpa} of the student with the highest gpa:

    update = [higher(s.gpa, state.gpa, s.name, state.name), higher(s.gpa, state.gpa, s.gpa, state.gpa)]

Currently, we compile each of the update expressions individually and then execute them in series. Unless Java's JIT is really awesome, this likely leads to redundant execution. (Performance results indicate that the JIT does not optimize this redundancy away.)

Instead, we should generate the entire updater script as a single block of code and compile it as a single method. The execution code gets simpler, and we expose more optimization opportunities to the compiler. In my experiments, the time of UDA execution decreases by ~20% or better.

- Add a new ScriptEvalInterface for compiled script objects, and rename the old EvalInterface to ExpressionEvalInterface.
- Rename Evaluator.getJavaExpression() to reflect the fact that it always includes code to append to an input column.
- Refactor UserDefinedAggregator (and the associated Factory) to use the new script interface.
- Add the AppendableTable interface to Tuple so that we can use it with the new ScriptEvalInterface.
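To make the redundancy concrete, here is a hand-written Java sketch of the argmax updater in both styles. It is not the code Myria generates, and every name in it (ArgmaxSketch, State, updateSeparately, updateAsScript) is made up for illustration. It only assumes the Student(name, gpa) example above.

```java
/** Illustrative only: hypothetical names, not Myria's generated code. */
final class ArgmaxSketch {
  /** Aggregate state for argmax over Student(name, gpa). */
  static final class State {
    final String name;
    final double gpa;

    State(String name, double gpa) {
      this.name = name;
      this.gpa = gpa;
    }
  }

  /** higher(max1, max2, val1, val2) for string-valued columns. */
  static String higher(double max1, double max2, String val1, String val2) {
    return max1 > max2 ? val1 : val2;
  }

  /** higher(max1, max2, val1, val2) for numeric columns. */
  static double higher(double max1, double max2, double val1, double val2) {
    return max1 > max2 ? val1 : val2;
  }

  /** One expression per state column: the gpa comparison runs once per column. */
  static State updateSeparately(State state, String name, double gpa) {
    String newName = higher(gpa, state.gpa, name, state.name);
    double newGpa = higher(gpa, state.gpa, gpa, state.gpa);
    return new State(newName, newGpa);
  }

  /** Whole updater as one method: the comparison is evaluated once and reused. */
  static State updateAsScript(State state, String name, double gpa) {
    boolean replace = gpa > state.gpa;
    return new State(replace ? name : state.name, replace ? gpa : state.gpa);
  }
}
```

Both methods return the same state for the same input. The second form shows the kind of sharing that becomes possible once the whole updater lives in a single method, whether it is written that way by the generator or left for the compiler to find.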

…determine how to decode it. To decode a message, the code checks whether the physical channel is linked to a logical StreamInputChannel and uses this information to determine the type of the message. However, the link may no longer exist after a failure or a query kill, while the message (which should not be used anymore) is still in the physical channel; it is then decoded incorrectly, causing errors. To fix this, simply use the header of the message to detect its type, and ignore the message when needed.

coveralls · 2015-05-17T11:04:23Z

Coverage decreased (-0.07%) to 59.34% when pulling 645e14d on radion:master into 54d54b9 on uwescience:master.

radion · 2015-05-22T16:42:12Z

I think I messed up a rebase here - going to start a new PR for my sampling operators

dhalperi and others added 30 commits March 5, 2015 16:04

updating all json queries, mostly changing opid and aggregators

d2d62c7

EOI should be treated the same as data.

14e508d

check if the channel is connected before resumeRead().

6cf0ade

The channel may not exist anymore due to failure, then resumeRead() leads to an error.

Code/test changes to enable modulo operator.

53bb688

Merge pull request #719 from uwescience/mod_op

90d85cd

Code/test changes to enable modulo operator.

write the partition function into catalog stored_relation when ingesting

1ec89c7

a dataset. Add a partition function UnknownPartitionFunction to deal with old "unknown" values and also cases when it's indeed unknown, e.g. importDataset.

extend DatasetStatus to return howPartitioned, add a test to ingest data

b820339

using SingleFieldHashPartitionFunction.

remove logger mode checks

a1627db

Merge pull request #717 from uwescience/fix-ipc-channel-failure

0f96752

Fix ipc channel failure

New string split operator.

c5dff8d

Fixing javadoc.

b8f2e98

Don't discard trailing empty split segments.

801bea5

Merge pull request #713 from uwescience/uda-rewrite-script-eval

350a68c

UserDefinedAggregate: combine expressions into a script

Merge pull request #716 from uwescience/fix-old-jsons

9756b46

updating all json queries, mostly changing opid and aggregators

small edit on getting_started examples

212a7e3

Code review feedback.

a746137

Fixing Eclipse breakage.

c2dd4f6

More code review feedback.

3042d12

Still more code review feedback.

c3ff69e

Adding JSON encoding for Split operator.

64ec4a7

MasterCatalog: populate language in querySimpleStatusHelper

7813702

Related to ongoing work for uwescience/myria-web#265

Merge pull request #724 from uwescience/fix-query-summary

40e2799

MasterCatalog: populate language in querySimpleStatusHelper

Adding test for Split operator JSON encoding.

c5dcfef

Use java.net.URL for http scheme in UriSource

9ef2352

Grr Java

a9fca68

Reverting formatting changes.

a12a54d

Explicit operator precedence on ternary operator

127d2bd

Merge pull request #727 from uwescience/urisource-http

abd7832

Support http and https schemes in UriSource

fixed up stream sampling

645e14d

jingjingwang and others added 26 commits May 19, 2015 13:09

remove unused method

b71fe6b

change HowPartitioned.workers as an ImmutableSet

f62dedf

use null as the default value since null means workers are not specified

0ee6255

Merge pull request #720 from uwescience/add-partition-function-to-ingest

94158b5

Add partition function to ingest

Merge pull request #718 from uwescience/several-extensions

0f813fd

Several extensions

added IntelliJ setup files to .gitignore

449b6a6

added a partition function that decides based on a raw integer value

298907f

RawValuePartitionFunction update

fd436a8

encodings for sampling operators

2b3cdfc

sample operators

8e95211

minor fix...

77b057b

example json queries for sampling

816dbff

removed println from operator

88824bf

extra precondition checks for SamplingDistribution

b6603d4

can specify a seed value for Sample operators

428befc

make seeded SamplingDistribution actually deterministic

4a767db

updated sampling encodings to support randomSeed

fc436f0

tests for sampling operators

2ab4d47

fixed off-by-one issue with SampleWR

19a43b0

update sampling operators

d834d05

improved sampling WoR

c3f4008

refactored sampling operators

5e9ce68

update tests and documentation

503ca23

fixed up stream sampling

6df66dc

Merge branch 'master' of github.com:radion/myria

ee51b9e

code cleanup for sampling operators

a453770

radion closed this May 22, 2015
