-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix state col offset #891
Fix state col offset #891
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nitpicks
@@ -115,6 +115,7 @@ public PythonUDFEvaluator( | |||
pyWorker = new PythonWorker(); | |||
pyWorker.sendCodePickle(fs.getBinary(), columnIdxs.length, outputType, isMultiValued); | |||
buffer = new TupleBuffer(stateSchema); | |||
groups = new IntObjectHashMap<IntArrayList>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this not properly initialized before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we missed it. ( @parmitam also found it)
@@ -361,6 +361,5 @@ public final void put( | |||
final int destColumn, final ReadableColumn sourceColumn, final int sourceRow) { | |||
checkPutIndex(destColumn); | |||
TupleUtils.copyValue(sourceColumn, sourceRow, this, destColumn); | |||
columnPut(destColumn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't immediately see the redundant call to columnPut()
; where was it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in TupleUtils.copyValue
, this function calls putXXX()
s which call columnPut()
@@ -46,6 +47,8 @@ | |||
protected final int[] gfields; | |||
/** Buffer for restoring results. */ | |||
protected TupleBatchBuffer resultBuffer; | |||
/** If we have outputted 0 as the count for count(*) on an empty relation. */ | |||
private boolean COUNTALL_ON_EMPTY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a constant, so shouldn't be named like one.
@@ -102,6 +105,16 @@ protected TupleBatch fetchNextReady() throws DbException { | |||
tb = child.nextReady(); | |||
} | |||
if (child.eos()) { | |||
if (!COUNTALL_ON_EMPTY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more broadly useful to have a boolean getter on Operator
indicating whether an operator had ever returned any results (i.e. nextReady() != null
)? If that could be reused elsewhere, it would be better than an ad-hoc flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, I switched to use Operator.numOutputTuples
instead.
@@ -102,6 +105,16 @@ protected TupleBatch fetchNextReady() throws DbException { | |||
tb = child.nextReady(); | |||
} | |||
if (child.eos()) { | |||
if (!COUNTALL_ON_EMPTY | |||
&& gfields.length == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check might be a little clearer if it were encapsulated in a helper.
This looks good, but to be super-nitpicky (and this is purely a matter of taste), I think the logic would be clearer if the helper only tested whether
|
This has broken stateful agg for pythonUDFs. it should be sending a tuple list, but it is sending a tuple at a time again... |
Changes Unknown when pulling c5804f6 on fix_state_col_offset into ** on master**. |
From the code https://github.com/uwescience/myria/blob/fix_state_col_offset/src/edu/washington/escience/myria/expression/evaluate/PythonUDFEvaluator.java#L181 it seems we still send a list of tuples. Is there a quick ref for the error log? |
There is no failure, it just sends one tuple at a time. Also, it just send the tuple not the state -- this is a big deal, as I python can't update the state, if it is not sent everytime. Sending a tuple at a time woudl be fine just slow, but not sending the state is breaking the uda functionality-- it cannot calc an aggregate. |
Looks good from my end, I guess we'll have to wait for the Python issue to be resolved... |
Python issue resolved :) |
Changes Unknown when pulling b5d8681 on fix_state_col_offset into ** on master**. |
This PR fixes two problems: 1. when
STATE
is used as input for an aggregate with group by, theStateExpression
needs to be aware of its column index offset. 2. add a special check forcount(*)
on an empty table to return 0, but do not return anything if there are more aggregates.Travis will be fixed by uwescience/raco#558.