Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sequences, using subqueries #523

Merged
merged 54 commits into from
Jun 2, 2014
Merged

sequences, using subqueries #523

merged 54 commits into from
Jun 2, 2014

Conversation

dhalperi
Copy link
Member

Enable the client to submit a sequence of queries and have them be executed in series.

  • fix the future -- right now it's a future for the first subquery rather than the whole query
  • enable and test sequences through REST api
  • rename a bunch of things to have a reasonable naming scheme
  • test with failures in later tasks
  • disable profiling for queries that may have multiple subqueries.

There will be a subsequent commit to make the server and worker control
structures handle this well.
Switch from long queryID to QueryTaskId taskId, as prep work for
subqueries.

This commit also refactors some logging statements to not construct
strings, but rather use the printf-style formatting that is more
efficient. The LOGGER.ifBlahEnabled() checks are also removed when they
are not more efficient (i.e., the logging command does no complex
operations and hence no work).

Also remove QUERY_PAUSE and QUERY_REMOVE from Myria, as these are never
called. In particular, the IPC utility to generate a pause message from
the server is never called.
This enables operators to advertise that they read or write relations.
always require a QueryStatusEncoding; make users construct one if they
want hand coded strings.
If not started, return null for elapsed.

If started and not finished, return elapsed since start.
It actually does the right thing with enums.
Option 1: plans and options come in at the same time (QueryEncoding)

Option 2: plans and options come in separately (Fragments, then options)
So we can turn at JSON string into a DatasetStatus if need be.
…ponse

And then use this to extract query id rather than hard coding it
It's not used at all by the OperationFuture classes, and it was
implemented wrong by the AttachmentableAdaptor.
- refactor the way the server manages queries into QueryState, QueryTask
- add MetaTask operators for Fragment, Sequence, JsonFragment
- make the server issue multiple tasks one after the other. That is,
  it submits a task and then gets the next one. If null, success,
  otherwise it submits the next one and repeats.
- refactor a bunch of work around how queries are submitted or exposed
  to the Catalog
@dhalperi dhalperi added Enhancement and removed RFC labels May 21, 2014
This is a refactoring from the old OperationFuture to Google's
ListenableFuture. THe goal is to have simpler code and not reinvent the
wheel. I believe, at least for this particular task, this future is
fully functional with very little code.
In reality, this is dead code though, I think.
Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>
@dhalperi
Copy link
Member Author

merging master. You know, since github doesn't handle that well.

@dhalperi
Copy link
Member Author

willing to try a rebase on master if that will help review.

@domoritz
Copy link
Member

I will use git log -p ..origin/master then ;-)

No need to rebase.

@dhalperi
Copy link
Member Author

Nice option. But maybe you meant git log -p origin/master.. --reverse ?

@domoritz
Copy link
Member

Nice option. But maybe you meant git log -p origin/master.. --reverse ?

Yes, you are right.

@domoritz
Copy link
Member

I should clearly go home. Can't even type a git command without an error any more. Also, for evening entertainment: http://git-man-page-generator.lokaltog.net/

@stechu
Copy link
Contributor

stechu commented May 28, 2014

I feel that I should look at changes and ask dumb questions as well :)

First two dumb questions about the overall design before going into details:

  1. what is the semantic of the failure of a task? For example, if the second sub-query of a task fails. Will the output of the first sub-query be deleted? In another words, do we recover the system to the status before the task or allow partial success/failure.
  2. We probably should use @slxu's json builder client to simplify many tests. Here is an example. https://github.com/uwescience/myria/blob/master/jsonQueries/multiIDB_jwang/joinCircle.scala. And update the json generation code with the new encoding. I can work on that part, too.

@dhalperi
Copy link
Member Author

If a subquery fails, the query fails. However, queries are not transactional. Completed subqueries stay completed.

@dhalperi
Copy link
Member Author

We could imagine adding restrictions to Myria to make queries more transactional. E.g., you may never append to a relation. (Then we can just tag each relation with which query it was created by, and we get automatic versioning too!)

@dhalperi
Copy link
Member Author

I'd really like to just rewrite the operators so that the operators themselves are able to be JSON-ed directly. Would simplify our lives.

When I looked at the JsonQueryBuilder there were several things I really didn't like. E.g., it's written in frickin' Scala, so it's a pain to maintain. Also, it's not very fleshed out.

@stechu
Copy link
Contributor

stechu commented May 28, 2014

There is nothing wrong with current semantic. In the long term, considering the following program (I tried very hard to not use the word query)

A = sub-query1;
store(A, public:adhoc:A);
B = sub-query2 (A was used);
store(B, public:adhoc:B );

If we consider MyriaL as a normal programming language. We even should keep B there if sub-query1 is succeed. However, for the following program.

A = sub-query1;
B = sub-query2 (A was used);
store(B, public:adhoc:B );

We should delete the intermediate result. (I just tried this on demo, raco is very smart and will combine these two queries into one. But let's say this program indeed translates to a sequence with 2 sub-queries.)

So in myria side, I guess some design that distinguishes materalized intermediate result and the actual result would be great. We can surely do this in another PR or when we feel it is needed though.

@billhowe
Copy link

If the same user executes the same query twice, can we not just run it from
scratch, overwriting intermediate results as we go?

Fault tolerance features to save work is a good idea, but that's a
performance issue only. We should focus on correctness first, I figure.

(Plus, Dominik's plans for a materialization-aware optimizer may handle
this case automatically. The optimizer will see that the intermediate
result is already computed and just use it.)

On Tue, May 27, 2014 at 11:14 PM, Shumo Chu notifications@github.comwrote:

There is nothing wrong with current semantic. In the long term,
considering the following program (I tried very hard to not use the
word query)

A = sub-query1;
store(A, public:adhoc:A);
B = sub-query2 (A was used);
store(B, public:adhoc:B );

If we consider MyriaL as a normal programming language. We even should
keep B there if sub-query1 is succeed. However, for the following program.

A = sub-query1;
B = sub-query2 (A was used);
store(B, public:adhoc:B );

We should delete the intermediate result. (I just tried this on demo, raco
is very smart and will combine these two queries into one. But let's say
this program indeed translates to a sequence with 2 sub-queries.)

So in myria side, I guess some design that distinguishes materalized
intermediate result and the actual result would be great. We can surely do
this in another PR or when we feel it is needed though.


Reply to this email directly or view it on GitHubhttps://github.com//pull/523#issuecomment-44369280
.

@dhalperi dhalperi mentioned this pull request May 28, 2014
They are literally only used in outdated tests. We have the ability to
submit multiple queries and will soon have sequence operators; we don't
need these any more.

Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>
@coveralls
Copy link

Coverage Status

Coverage increased (+0.41%) when pulling d0d65ed on issue_query_cleanups into e0159cb on master.

public class SequenceTest extends SystemTestBase {

@Test
public void testSequence() throws Exception {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stechu this is the test I was alluding to. Below, a similar test but submitted as JSON.

@dhalperi dhalperi mentioned this pull request Jun 2, 2014
2 tasks
@stechu
Copy link
Contributor

stechu commented Jun 2, 2014

@dhalperi another dumb question, is there any design for nested sequence? From what I have understood so far, it is not supported yet?

If it is, we should add tests.
If it is not, we should explicitly forbid that.

@dhalperi
Copy link
Member Author

dhalperi commented Jun 2, 2014

@stechu I believe a sequence inside a sequence should be fine. I will add a test.

@domoritz
Copy link
Member

domoritz commented Jun 2, 2014

@stechu Looks like we asked the same question.

domoritz added a commit that referenced this pull request Jun 2, 2014
@domoritz domoritz merged commit b0ed5ec into master Jun 2, 2014
@coveralls
Copy link

Coverage Status

Coverage increased (+0.36%) when pulling 562826f on issue_query_cleanups into e0159cb on master.

@dhalperi dhalperi deleted the issue_query_cleanups branch June 2, 2014 23:23
@stechu
Copy link
Contributor

stechu commented Jun 2, 2014

yeah!

On Monday, June 2, 2014, Coveralls notifications@github.com wrote:

[image: Coverage Status] https://coveralls.io/builds/827984

Coverage increased (+0.36%) when pulling 562826f
562826f
on issue_query_cleanups
into e0159cb
e0159cb
on master
.


Reply to this email directly or view it on GitHub
#523 (comment).

Sent from Gmail Mobile

@dhalperi
Copy link
Member Author

dhalperi commented Jun 2, 2014

fyi, I found a small race condition and pushed the fix directly to master. Yay testing! Somehow it seems to have been expose and/or triggered when updating the null analysis branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants