sequences, using subqueries #523

dhalperi · 2014-05-21T08:27:09Z

Enable the client to submit a sequence of queries and have them be executed in series.

fix the future -- right now it's a future for the first subquery rather than the whole query
enable and test sequences through REST api
rename a bunch of things to have a reasonable naming scheme
test with failures in later tasks
disable profiling for queries that may have multiple subqueries.

There will be a subsequent commit to make the server and worker control structures handle this well.

Switch from long queryID to QueryTaskId taskId, as prep work for subqueries. This commit also refactors some logging statements to not construct strings, but rather use the printf-style formatting that is more efficient. The LOGGER.ifBlahEnabled() checks are also removed when they are not more efficient (i.e., the logging command does no complex operations and hence no work). Also remove QUERY_PAUSE and QUERY_REMOVE from Myria, as these are never called. In particular, the IPC utility to generate a pause message from the server is never called.

This enables operators to advertise that they read or write relations.

always require a QueryStatusEncoding; make users construct one if they want hand coded strings.

If not started, return null for elapsed. If started and not finished, return elapsed since start.

It actually does the right thing with enums.

Option 1: plans and options come in at the same time (QueryEncoding) Option 2: plans and options come in separately (Fragments, then options)

So we can turn at JSON string into a DatasetStatus if need be.

…ponse And then use this to extract query id rather than hard coding it

It's not used at all by the OperationFuture classes, and it was implemented wrong by the AttachmentableAdaptor.

- refactor the way the server manages queries into QueryState, QueryTask - add MetaTask operators for Fragment, Sequence, JsonFragment - make the server issue multiple tasks one after the other. That is, it submits a task and then gets the next one. If null, success, otherwise it submits the next one and repeats. - refactor a bunch of work around how queries are submitted or exposed to the Catalog

This is a refactoring from the old OperationFuture to Google's ListenableFuture. THe goal is to have simpler code and not reinvent the wheel. I believe, at least for this particular task, this future is fully functional with very little code.

In reality, this is dead code though, I think.

Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>

dhalperi · 2014-05-28T05:06:27Z

merging master. You know, since github doesn't handle that well.

dhalperi · 2014-05-28T05:06:51Z

willing to try a rebase on master if that will help review.

domoritz · 2014-05-28T05:08:37Z

I will use git log -p ..origin/master then ;-)

No need to rebase.

dhalperi · 2014-05-28T05:10:16Z

Nice option. But maybe you meant git log -p origin/master.. --reverse ?

domoritz · 2014-05-28T05:12:53Z

Nice option. But maybe you meant git log -p origin/master.. --reverse ?

Yes, you are right.

domoritz · 2014-05-28T05:16:12Z

I should clearly go home. Can't even type a git command without an error any more. Also, for evening entertainment: http://git-man-page-generator.lokaltog.net/

stechu · 2014-05-28T05:40:47Z

I feel that I should look at changes and ask dumb questions as well :)

First two dumb questions about the overall design before going into details:

what is the semantic of the failure of a task? For example, if the second sub-query of a task fails. Will the output of the first sub-query be deleted? In another words, do we recover the system to the status before the task or allow partial success/failure.
We probably should use @slxu's json builder client to simplify many tests. Here is an example. https://github.com/uwescience/myria/blob/master/jsonQueries/multiIDB_jwang/joinCircle.scala. And update the json generation code with the new encoding. I can work on that part, too.

dhalperi · 2014-05-28T05:41:55Z

If a subquery fails, the query fails. However, queries are not transactional. Completed subqueries stay completed.

dhalperi · 2014-05-28T05:43:16Z

We could imagine adding restrictions to Myria to make queries more transactional. E.g., you may never append to a relation. (Then we can just tag each relation with which query it was created by, and we get automatic versioning too!)

dhalperi · 2014-05-28T05:45:06Z

I'd really like to just rewrite the operators so that the operators themselves are able to be JSON-ed directly. Would simplify our lives.

When I looked at the JsonQueryBuilder there were several things I really didn't like. E.g., it's written in frickin' Scala, so it's a pain to maintain. Also, it's not very fleshed out.

stechu · 2014-05-28T06:14:02Z

There is nothing wrong with current semantic. In the long term, considering the following program (I tried very hard to not use the word query)

A = sub-query1;
store(A, public:adhoc:A);
B = sub-query2 (A was used);
store(B, public:adhoc:B );

If we consider MyriaL as a normal programming language. We even should keep B there if sub-query1 is succeed. However, for the following program.

A = sub-query1;
B = sub-query2 (A was used);
store(B, public:adhoc:B );

We should delete the intermediate result. (I just tried this on demo, raco is very smart and will combine these two queries into one. But let's say this program indeed translates to a sequence with 2 sub-queries.)

So in myria side, I guess some design that distinguishes materalized intermediate result and the actual result would be great. We can surely do this in another PR or when we feel it is needed though.

It's thoroughly unused.

billhowe · 2014-05-28T06:21:42Z

If the same user executes the same query twice, can we not just run it from
scratch, overwriting intermediate results as we go?

Fault tolerance features to save work is a good idea, but that's a
performance issue only. We should focus on correctness first, I figure.

(Plus, Dominik's plans for a materialization-aware optimizer may handle
this case automatically. The optimizer will see that the intermediate
result is already computed and just use it.)

On Tue, May 27, 2014 at 11:14 PM, Shumo Chu notifications@github.comwrote:

There is nothing wrong with current semantic. In the long term,
considering the following program (I tried very hard to not use the
word query)

A = sub-query1;
store(A, public:adhoc:A);
B = sub-query2 (A was used);
store(B, public:adhoc:B );

If we consider MyriaL as a normal programming language. We even should
keep B there if sub-query1 is succeed. However, for the following program.

A = sub-query1;
B = sub-query2 (A was used);
store(B, public:adhoc:B );

We should delete the intermediate result. (I just tried this on demo, raco
is very smart and will combine these two queries into one. But let's say
this program indeed translates to a sequence with 2 sub-queries.)

So in myria side, I guess some design that distinguishes materalized
intermediate result and the actual result would be great. We can surely do
this in another PR or when we feel it is needed though.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/523#issuecomment-44369280
.

It already is a SubQuery

They are literally only used in outdated tests. We have the ability to submit multiple queries and will soon have sequence operators; we don't need these any more. Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>

coveralls · 2014-05-28T23:19:19Z

Coverage increased (+0.41%) when pulling d0d65ed on issue_query_cleanups into e0159cb on master.

dhalperi · 2014-06-01T23:40:21Z

systemtest/edu/washington/escience/myria/systemtest/SequenceTest.java

+public class SequenceTest extends SystemTestBase {
+
+  @Test
+  public void testSequence() throws Exception {


@stechu this is the test I was alluding to. Below, a similar test but submitted as JSON.

stechu · 2014-06-02T21:41:11Z

@dhalperi another dumb question, is there any design for nested sequence? From what I have understood so far, it is not supported yet?

If it is, we should add tests.
If it is not, we should explicitly forbid that.

dhalperi · 2014-06-02T21:56:48Z

@stechu I believe a sequence inside a sequence should be fine. I will add a test.

domoritz · 2014-06-02T21:57:52Z

@stechu Looks like we asked the same question.

sequences, using subqueries

coveralls · 2014-06-02T22:41:14Z

Coverage increased (+0.36%) when pulling 562826f on issue_query_cleanups into e0159cb on master.

stechu · 2014-06-02T23:28:34Z

yeah!

On Monday, June 2, 2014, Coveralls notifications@github.com wrote:

[image: Coverage Status] https://coveralls.io/builds/827984

Coverage increased (+0.36%) when pulling 562826f
562826f
on issue_query_cleanups into e0159cb
e0159cb
on master.

—
Reply to this email directly or view it on GitHub
#523 (comment).

Sent from Gmail Mobile

dhalperi · 2014-06-02T23:55:44Z

fyi, I found a small race condition and pushed the fix directly to master. Yay testing! Somehow it seems to have been expose and/or triggered when updating the null analysis branch.

dhalperi added 15 commits May 20, 2014 23:33

Server: another safeguard against zombie queries

666366c

protobuf: add subquery ID to control messages

937dc96

There will be a subsequent commit to make the server and worker control structures handle this well.

operator: add DbReader and DbWriter interfaces

79aa04f

This enables operators to advertise that they read or write relations.

Catalog: simpler interface to set query status

d0ca89f

always require a QueryStatusEncoding; make users construct one if they want hand coded strings.

QueryExecutionStatistics: return null if not started, elapsed after

c1a7015

If not started, return null for elapsed. If started and not finished, return elapsed since start.

build.gradle: upgrade guava

bbdc9a1

QueryEncoding: use FTMODE, not string

55e6faa

It actually does the right thing with enums.

QueryConstruct: refactor slightly

7aea05b

Option 1: plans and options come in at the same time (QueryEncoding) Option 2: plans and options come in separately (Fragments, then options)

DatasetStatus: add deserialization annotations

bdbaad3

So we can turn at JSON string into a DatasetStatus if need be.

SystemTestBase: helper functions to get dataset/query status from res…

011ec1f

…ponse And then use this to extract query id rather than hard coding it

QueryConstruct: use LOGGER, not syserr

a6488b3

myria: cleanup use of Attachmentable

eec1585

It's not used at all by the OperationFuture classes, and it was implemented wrong by the AttachmentableAdaptor.

SequenceTest: test submitting two queries in a sequence

3fd182a

dhalperi added Enhancement and removed RFC labels May 21, 2014

dhalperi added 9 commits May 21, 2014 01:31

OperationFuture: cleanup comment

74e5ed2

sp2bench.Main: reimplement slightly for new interface

2fcd3cb

In reality, this is dead code though, I think.

QueryEncoding: remove any reference to expected_result

3a40bb4

api: add meta task encodings

2efdba1

api: validate MetaTaskEncoding subclasses

6ac8884

api: update MetaTaskEncoding

cc02d6d

QueryConstruct: remove unused code

8c3fd27

Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>

api: more work on the meta encoding

2217d43

OperationFutureProgressListener: finish removal

9bfb932

It's thoroughly unused.

SubQuery: don't create a new copy of yourself when instantiating

159ba19

It already is a SubQuery

dhalperi mentioned this pull request May 28, 2014

Null analysis #535

Merged

remove Blocking{SQLite,JDBC}Receiver operators

d0d65ed

They are literally only used in outdated tests. We have the ability to submit multiple queries and will soon have sequence operators; we don't need these any more. Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>

dhalperi reviewed Jun 1, 2014
View reviewed changes

dhalperi mentioned this pull request Jun 2, 2014

improve viz api #529

Closed

2 tasks

SequenceTest: test nested Sequence

562826f

domoritz added a commit that referenced this pull request Jun 2, 2014

Merge pull request #523 from uwescience/issue_query_cleanups

b0ed5ec

sequences, using subqueries

domoritz merged commit b0ed5ec into master Jun 2, 2014

dhalperi deleted the issue_query_cleanups branch June 2, 2014 23:23

dhalperi mentioned this pull request Jun 3, 2014

Add myria sequence uwescience/raco#222

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sequences, using subqueries #523

sequences, using subqueries #523

dhalperi commented May 21, 2014

dhalperi commented May 28, 2014

dhalperi commented May 28, 2014

domoritz commented May 28, 2014

dhalperi commented May 28, 2014

domoritz commented May 28, 2014

domoritz commented May 28, 2014

stechu commented May 28, 2014

dhalperi commented May 28, 2014

dhalperi commented May 28, 2014

dhalperi commented May 28, 2014

stechu commented May 28, 2014

billhowe commented May 28, 2014

coveralls commented May 28, 2014

dhalperi Jun 1, 2014

stechu commented Jun 2, 2014

dhalperi commented Jun 2, 2014

domoritz commented Jun 2, 2014

coveralls commented Jun 2, 2014

stechu commented Jun 2, 2014

dhalperi commented Jun 2, 2014

sequences, using subqueries #523

sequences, using subqueries #523

Conversation

dhalperi commented May 21, 2014

dhalperi commented May 28, 2014

dhalperi commented May 28, 2014

domoritz commented May 28, 2014

dhalperi commented May 28, 2014

domoritz commented May 28, 2014

domoritz commented May 28, 2014

stechu commented May 28, 2014

dhalperi commented May 28, 2014

dhalperi commented May 28, 2014

dhalperi commented May 28, 2014

stechu commented May 28, 2014

billhowe commented May 28, 2014

coveralls commented May 28, 2014

dhalperi Jun 1, 2014

Choose a reason for hiding this comment

stechu commented Jun 2, 2014

dhalperi commented Jun 2, 2014

domoritz commented Jun 2, 2014

coveralls commented Jun 2, 2014

stechu commented Jun 2, 2014

dhalperi commented Jun 2, 2014