Fix state col offset #891

jingjingwang · 2017-05-04T19:58:26Z

This PR fixes two problems:
1. When STATE is used as input to an aggregate with GROUP BY, the StateExpression needs to be aware of its column index offset.
2. It adds a special check so that count(*) on an empty table returns 0, but returns nothing if there are other aggregates.
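A minimal sketch of the first fix. It assumes (our reading, not stated in the PR) that with a GROUP BY, each per-group row stores the group-key columns first and the STATE columns after them, so a STATE reference must skip that many leading columns. `StateColumnSketch`, `readState`, and `stateColOffset` are hypothetical names, not Myria's API.

```java
/** Hypothetical illustration: reading a STATE column stored behind group-key columns. */
final class StateColumnSketch {
  private StateColumnSketch() {}

  /**
   * Reads state column {@code stateIdx} from a per-group row.
   *
   * @param row per-group row laid out as [group keys..., state values...] (assumed layout)
   * @param stateIdx index of the column within the state, as the expression refers to it
   * @param stateColOffset number of leading group-key columns to skip
   * @return the requested state value
   */
  static long readState(long[] row, int stateIdx, int stateColOffset) {
    return row[stateColOffset + stateIdx];
  }

  public static void main(String[] args) {
    // One group key (7) followed by two state columns (count = 3, sum = 42).
    long[] row = {7L, 3L, 42L};
    int numGroupKeys = 1;
    System.out.println(readState(row, 0, 0)); // 7: without the offset, the group key is read
    System.out.println(readState(row, 0, numGroupKeys)); // 3: the intended state value
  }
}
```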

Travis will be fixed by uwescience/raco#558.

senderista

LGTM, just a few nitpicks

senderista · 2017-05-04T22:05:47Z

src/edu/washington/escience/myria/expression/evaluate/PythonUDFEvaluator.java

@@ -115,6 +115,7 @@ public PythonUDFEvaluator(
    pyWorker = new PythonWorker();
    pyWorker.sendCodePickle(fs.getBinary(), columnIdxs.length, outputType, isMultiValued);
    buffer = new TupleBuffer(stateSchema);
+    groups = new IntObjectHashMap<IntArrayList>();


Was this not properly initialized before?

No, we missed it (@parmitam also found it).

senderista · 2017-05-04T22:11:41Z

src/edu/washington/escience/myria/storage/TupleBuffer.java

@@ -361,6 +361,5 @@ public final void put(
      final int destColumn, final ReadableColumn sourceColumn, final int sourceRow) {
    checkPutIndex(destColumn);
    TupleUtils.copyValue(sourceColumn, sourceRow, this, destColumn);
-    columnPut(destColumn);


I don't immediately see the redundant call to columnPut(); where was it?

It's in TupleUtils.copyValue: that function calls the put*() methods, which already call columnPut().
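To illustrate why the extra call is a bug, here is a simplified, hypothetical buffer in which each typed put method does its own `columnPut()` bookkeeping, mirroring the call chain described in the reply. `ColumnPutSketch` and its methods are illustrative only, not Myria's `TupleBuffer` API. Calling `columnPut()` again after the copy records two values for a single write.

```java
/** Hypothetical, simplified column buffer showing why an extra columnPut() call is a bug. */
final class ColumnPutSketch {
  /** Number of values recorded per column; a correct buffer records one per write. */
  private final int[] valuesPerColumn;

  ColumnPutSketch(int numColumns) {
    this.valuesPerColumn = new int[numColumns];
  }

  /** Bookkeeping: one more value has been written to {@code column}. */
  private void columnPut(int column) {
    valuesPerColumn[column]++;
  }

  /** Typed put: would store the value (storage omitted here) and does its own bookkeeping. */
  void putLong(int column, long value) {
    columnPut(column);
  }

  /** Copy helper, analogous in spirit to TupleUtils.copyValue: delegates to a typed put. */
  void copyValue(int column, long value) {
    putLong(column, value);
  }

  /** Buggy generic put: calls columnPut() again after the copy. */
  void putBuggy(int column, long value) {
    copyValue(column, value);
    columnPut(column); // redundant: copyValue() already recorded this write
  }

  /** Fixed generic put: relies on the typed put for bookkeeping. */
  void putFixed(int column, long value) {
    copyValue(column, value);
  }

  int valuesIn(int column) {
    return valuesPerColumn[column];
  }

  public static void main(String[] args) {
    ColumnPutSketch buggy = new ColumnPutSketch(1);
    buggy.putBuggy(0, 5L);
    ColumnPutSketch fixed = new ColumnPutSketch(1);
    fixed.putFixed(0, 5L);
    System.out.println(buggy.valuesIn(0) + " vs " + fixed.valuesIn(0)); // 2 vs 1
  }
}
```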

senderista · 2017-05-04T22:14:34Z

src/edu/washington/escience/myria/operator/agg/Aggregate.java

@@ -46,6 +47,8 @@
  protected final int[] gfields;
  /** Buffer for restoring results. */
  protected TupleBatchBuffer resultBuffer;
+  /** If we have outputted 0 as the count for count(*) on an empty relation. */
+  private boolean COUNTALL_ON_EMPTY;


This is not a constant, so shouldn't be named like one.

senderista · 2017-05-04T22:16:51Z

src/edu/washington/escience/myria/operator/agg/Aggregate.java

@@ -102,6 +105,16 @@ protected TupleBatch fetchNextReady() throws DbException {
      tb = child.nextReady();
    }
    if (child.eos()) {
+      if (!COUNTALL_ON_EMPTY


Would it be more broadly useful to have a boolean getter on Operator indicating whether an operator had ever returned any results (i.e. nextReady() != null)? If that could be reused elsewhere, it would be better than an ad-hoc flag.

Great idea, I switched to use Operator.numOutputTuples instead.
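A minimal sketch of the reusable-counter idea, assuming the operator base class counts tuples as they pass through its public `nextReady()` and exposes the total through a getter. `CountingOperatorSketch`, `QueueOperatorSketch`, and the `List<Integer>` batch type are hypothetical stand-ins for Myria's `Operator` and `TupleBatch`. With such a counter, an aggregate can test `getNumOutputTuples() == 0` at end of stream instead of maintaining its own flag.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical operator base class that counts every tuple it emits. */
abstract class CountingOperatorSketch {
  private long numOutputTuples = 0;

  /** Produces the next batch, or null if nothing is ready. */
  protected abstract List<Integer> fetchNextReady();

  /** Public entry point: delegates to fetchNextReady() and records the batch size. */
  final List<Integer> nextReady() {
    List<Integer> batch = fetchNextReady();
    if (batch != null) {
      numOutputTuples += batch.size();
    }
    return batch;
  }

  /** Total tuples emitted so far; 0 means the operator has never produced output. */
  final long getNumOutputTuples() {
    return numOutputTuples;
  }
}

/** Hypothetical child operator that replays pre-built batches. */
final class QueueOperatorSketch extends CountingOperatorSketch {
  private final Deque<List<Integer>> batches = new ArrayDeque<>();

  QueueOperatorSketch(List<List<Integer>> input) {
    batches.addAll(input);
  }

  @Override
  protected List<Integer> fetchNextReady() {
    return batches.poll(); // null once every batch has been returned
  }

  public static void main(String[] args) {
    QueueOperatorSketch op =
        new QueueOperatorSketch(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)));
    while (op.nextReady() != null) {
      // drain the operator
    }
    System.out.println(op.getNumOutputTuples()); // 3
  }
}
```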

senderista · 2017-05-04T22:16:58Z

src/edu/washington/escience/myria/operator/agg/Aggregate.java

@@ -102,6 +105,16 @@ protected TupleBatch fetchNextReady() throws DbException {
      tb = child.nextReady();
    }
    if (child.eos()) {
+      if (!COUNTALL_ON_EMPTY
+          && gfields.length == 0


This check might be a little clearer if it were encapsulated in a helper.

senderista · 2017-05-05T00:06:56Z

This looks good, but to be super-nitpicky (and this is purely a matter of taste), I think the logic would be clearer if the helper only tested whether CountAll was the sole aggregate, so the check for emitting a 0 with no results would look like:

private boolean isCountAllOnlyAggregate() {
  return gfields.length == 0
      && internalAggs.size() == 1
      && internalAggs.get(0) instanceof PrimitiveAggregator
      && ((PrimitiveAggregator) (internalAggs.get(0))).aggOp == AggregationOp.COUNT;
}

if (getNumOutputTuples() == 0 && groupStates.numTuples() == 0 && isCountAllOnlyAggregate()) {
...
}
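For reference, a self-contained version of the suggested check that compiles on its own. The nested `AggregationOp` and `PrimitiveAggregator` types are minimal stand-ins for Myria's classes, and `zeroRowOnEos` is a hypothetical name; the output-tuple and group counts are passed in as parameters rather than read from `getNumOutputTuples()` and `groupStates.numTuples()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

/**
 * Hypothetical, stripped-down model of the end-of-stream check from the review.
 * AggregationOp and PrimitiveAggregator are minimal stand-ins, not Myria's classes.
 */
final class CountAllOnEmptySketch {
  enum AggregationOp { COUNT, SUM, MIN, MAX }

  /** Stand-in for an aggregator that applies one primitive operation. */
  static final class PrimitiveAggregator {
    final AggregationOp aggOp;

    PrimitiveAggregator(AggregationOp aggOp) {
      this.aggOp = aggOp;
    }
  }

  private final int[] gfields;
  private final List<?> internalAggs;

  CountAllOnEmptySketch(int[] gfields, List<?> internalAggs) {
    this.gfields = gfields;
    this.internalAggs = internalAggs;
  }

  /** True iff there is no GROUP BY and the only aggregate is a COUNT. */
  boolean isCountAllOnlyAggregate() {
    return gfields.length == 0
        && internalAggs.size() == 1
        && internalAggs.get(0) instanceof PrimitiveAggregator
        && ((PrimitiveAggregator) internalAggs.get(0)).aggOp == AggregationOp.COUNT;
  }

  /** At end of stream: a single 0 for COUNT(*) over empty input, otherwise nothing extra. */
  OptionalLong zeroRowOnEos(long numOutputTuples, long numGroups) {
    if (numOutputTuples == 0 && numGroups == 0 && isCountAllOnlyAggregate()) {
      return OptionalLong.of(0L);
    }
    return OptionalLong.empty();
  }

  public static void main(String[] args) {
    PrimitiveAggregator count = new PrimitiveAggregator(AggregationOp.COUNT);
    PrimitiveAggregator sum = new PrimitiveAggregator(AggregationOp.SUM);
    // COUNT(*) alone over empty input: prints OptionalLong[0]
    System.out.println(new CountAllOnEmptySketch(new int[0], Arrays.asList(count)).zeroRowOnEos(0, 0));
    // COUNT(*) together with SUM: prints OptionalLong.empty
    System.out.println(
        new CountAllOnEmptySketch(new int[0], Arrays.asList(count, sum)).zeroRowOnEos(0, 0));
  }
}
```

Returning `OptionalLong` keeps "emit a 0" distinct from "emit nothing"; a real operator would instead append a row to its result buffer. Keeping the emptiness tests outside the helper, as suggested, lets `isCountAllOnlyAggregate()` describe only the query shape.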

parmitam · 2017-05-05T18:00:39Z

This has broken stateful aggregation for Python UDFs. It should be sending a list of tuples, but it is sending one tuple at a time again...

coveralls · 2017-05-05T18:05:08Z

Changes Unknown when pulling c5804f6 on fix_state_col_offset into master.

jingjingwang · 2017-05-05T18:09:57Z

From the code at https://github.com/uwescience/myria/blob/fix_state_col_offset/src/edu/washington/escience/myria/expression/evaluate/PythonUDFEvaluator.java#L181, it seems we still send a list of tuples. Is there an error log I can look at?

parmitam · 2017-05-05T18:17:28Z

There is no failure; it just sends one tuple at a time. Also, it sends only the tuple, not the state. This is a big deal: Python can't update the state if the state isn't sent every time. Sending one tuple at a time would be fine, just slow, but not sending the state breaks UDA functionality, because it cannot compute an aggregate.
(I am printing what is being sent to the Python process, so I'm not sure what else I can provide to help.)
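The Python worker code is not shown in this thread, so the following Java model is only a sketch of the contract described above, assuming a UDA is an `init`/`update` pair where every call receives the previous state together with the new tuples. `UdaStateSketch`, `StatefulUda`, `CounterUda`, and `run` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a stateful user-defined aggregate (UDA). */
final class UdaStateSketch {
  private UdaStateSketch() {}

  /** Update contract: fold a batch of tuples into the previous state. */
  interface StatefulUda<S, T> {
    S init();

    S update(S state, List<T> tuples);
  }

  /** A counter UDA: the state is the number of tuples seen so far. */
  static final class CounterUda implements StatefulUda<Long, Object> {
    @Override
    public Long init() {
      return 0L;
    }

    @Override
    public Long update(Long state, List<Object> tuples) {
      return state + tuples.size();
    }
  }

  /** Drives the UDA, passing the current state along with every batch. */
  static <S, T> S run(StatefulUda<S, T> uda, List<List<T>> batches) {
    S state = uda.init();
    for (List<T> batch : batches) {
      state = uda.update(state, batch);
    }
    return state;
  }

  public static void main(String[] args) {
    List<List<Object>> oneAtATime =
        Arrays.asList(Arrays.<Object>asList("a"), Arrays.<Object>asList("b"), Arrays.<Object>asList("c"));
    System.out.println(run(new CounterUda(), oneAtATime)); // 3
  }
}
```

Because the state is threaded through every call, batches of size one give the same count as one large batch (only slower). If `update` were instead called with a fresh `init()` state each time, the counter could never exceed the size of a single batch.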

senderista · 2017-05-05T18:17:41Z

Looks good from my end, I guess we'll have to wait for the Python issue to be resolved...

parmitam · 2017-05-05T19:32:57Z

Python issue resolved :)
There were some changes in semantics that were throwing things off. Everything works!
Looks great to me!

coveralls · 2017-05-05T19:47:33Z

Changes Unknown when pulling b5d8681 on fix_state_col_offset into master.

jingjingwang added 3 commits May 3, 2017 16:39

make StateExpression aware of the column offset

d0b5b48

fix multiple calls to columnPut() and groups init

8d32c6a

add special check for count(*) on empty relation

3e9f105

jingjingwang requested a review from senderista May 4, 2017 19:59

add a test for UDA counter

c8f3b5b

senderista reviewed May 4, 2017

jingjingwang added 2 commits May 4, 2017 16:02

use numOutputTuples instead of a flag

a788436

put the check in a method

393f35a

move some conditions from the helper method

c5804f6

changing python agg functions to work with new tuple schema

b5d8681

senderista merged commit 5b992a0 into master May 5, 2017

senderista deleted the fix_state_col_offset branch May 5, 2017 21:53
