Setup for single node environment / TPC-H benchmark #917

Open
RonyMin opened this issue Sep 21, 2017 · 7 comments

@RonyMin

RonyMin commented Sep 21, 2017

Hi, I tried to set up Myria on a server equipped with the following hardware:

Intel Xeon E5-2690 v4 x 2 (10 x 2 cores)
512GB RAM
5TB PCIe SSD

I believe my server can run a single master and a total of 20 workers.
In this case, what is the best way to set up Myria (i.e., deployment.cfg.local) to achieve the best performance?

In addition, can MyriaX run the TPC-H benchmark without any critical issues?
If so, where can I find documentation about running the TPC-H benchmark on the MyriaX query engine?

Thanks,
Yoon-Min

@RonyMin RonyMin changed the title from "Setup for single node environments" to "Setup for single node environments / TPC-H benchmark" on Sep 21, 2017
@senderista
Contributor

I assume you're following the instructions for Myria local installation at http://myria.cs.washington.edu/docs/myriax/. It should be enough to just modify deployment.cfg.local to have 20 worker entries, each with distinct ports.
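
As a rough illustration of the "20 worker entries, each with distinct ports" advice, the sketch below generates the master and worker entries programmatically. It assumes the file is INI-style with `[master]` and `[workers]` sections whose entries look like `id = host:port`; the section names, the port numbers, and the helper names `build_worker_sections` and `render` are assumptions for illustration, so check them against the deployment.cfg.local template shipped with Myria before using the output.

```python
import configparser
import io


def build_worker_sections(num_workers=20, host="localhost",
                          master_port=8001, first_worker_port=9001):
    """Build [master] and [workers] sections with one distinct port per worker.

    ASSUMPTION: the section names and the `id = host:port` layout are
    guesses at the deployment.cfg.local format, not taken from this thread.
    """
    cfg = configparser.ConfigParser()
    cfg["master"] = {"0": f"{host}:{master_port}"}
    cfg["workers"] = {
        str(worker_id): f"{host}:{first_worker_port + worker_id - 1}"
        for worker_id in range(1, num_workers + 1)
    }
    return cfg


def render(cfg):
    """Serialize the config to INI text."""
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Paste the printed sections into the config file, leaving the
    # remaining sections of the shipped template unchanged.
    print(render(build_worker_sections()))
```

Generating the entries avoids accidentally reusing a port, which is the main thing to get right when all workers share one host.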

We have run TPC-H queries on Myria and can provide input data on request. I would recommend contacting @jingjingwang for details.

@senderista
Contributor

The Myria query plan generator for TPC-H queries is here: https://github.com/uwescience/tpch-myrial. Note that we have added ORDER BY support after this was written. We could also discuss extending datetime support if that's a problem. A few of the TPC-H queries were omitted from the templates; I assume those involved NULL. We can help you ingest the TPC-H data into Myria from the S3 folder I mentioned before (s3://uwdb/tpch).

@RonyMin
Author

RonyMin commented Sep 27, 2017

@senderista That's wonderful! Actually, I am very interested in running the TPC-DS benchmark on Myria. I worked around the null-value issue so that I could load the TPC-DS database into Myria.
However, two issues remain.
The first is writing correct MyriaL scripts for the TPC-DS queries.
The second, which is the bigger issue for me, is an error from the query compiler when I turn on the option that prefers multiway joins using hypercube partitioning over a series of binary joins.
Related to the second issue, the Myria demo web interface shows the same problem: the triangle query runs correctly only with a query plan that uses binary joins.
Please review my comment and let me know how to handle these issues.
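
For readers unfamiliar with the technique mentioned above, here is a minimal sketch of hypercube partitioning for the triangle query R(a,b), S(b,c), T(c,a). It illustrates the general idea only and is not Myria's implementation: workers are arranged in a pa x pb x pc grid, each tuple is hashed on the attributes it has and replicated along the dimension it lacks, and every worker then joins its fragments locally. The function names and the 2x2x2 grid are assumptions.

```python
from collections import defaultdict


def hypercube_shuffle(R, S, T, dims=(2, 2, 2)):
    """Assign tuples of R(a,b), S(b,c), T(c,a) to cells of a pa x pb x pc grid."""
    pa, pb, pc = dims
    cells = defaultdict(lambda: {"R": [], "S": [], "T": []})
    for a, b in R:                      # R lacks c: replicate along the c axis
        for k in range(pc):
            cells[(hash(a) % pa, hash(b) % pb, k)]["R"].append((a, b))
    for b, c in S:                      # S lacks a: replicate along the a axis
        for i in range(pa):
            cells[(i, hash(b) % pb, hash(c) % pc)]["S"].append((b, c))
    for c, a in T:                      # T lacks b: replicate along the b axis
        for j in range(pb):
            cells[(hash(a) % pa, j, hash(c) % pc)]["T"].append((c, a))
    return cells


def local_triangles(frag):
    """Join the three fragments held by one cell."""
    s_by_b = defaultdict(list)
    for b, c in frag["S"]:
        s_by_b[b].append(c)
    t_set = set(frag["T"])
    return [(a, b, c)
            for a, b in frag["R"]
            for c in s_by_b.get(b, ())
            if (c, a) in t_set]


def triangles(R, S, T, dims=(2, 2, 2)):
    """Run the shuffle, then union the per-cell results."""
    result = []
    for frag in hypercube_shuffle(R, S, T, dims).values():
        result.extend(local_triangles(frag))
    return sorted(result)
```

Each triangle (a, b, c) is produced in exactly one cell, the one addressed by the hashes of a, b, and c, so the union needs no deduplication. A plan built from binary joins would instead shuffle the intermediate result of the first join a second time.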

@senderista
Contributor

I'm unclear if you're interested in running TPC-H or TPC-DS queries on Myria (or both?)...

Could you paste a screenshot or the text from the error you see when trying to enable multiway joins?

@RonyMin
Author

RonyMin commented Oct 3, 2017

[Image: screen shot 2017-10-03 at 12 08 15 pm]

I got exactly the same error when I tried to process TPC-DS queries in Myria with the multiway join option turned on.
How can I avoid this error during compilation?
Any suggestions?

@bmyerz
Member

bmyerz commented Oct 5, 2017

@senderista

> Note that we have added ORDER BY support after this was written

Is uwescience/raco#174 now out of date so that it can be closed, or is the ORDER BY support on the MyriaX side only?

@senderista
Contributor

@bmyerz we haven't quite implemented everything in that issue. All we have now is translation of ORDER BY + LIMIT in MyriaL to the corresponding logical/physical operators, but we don't push them into SQL, nor do we have a global Merge exchange operator (I suspect we can live without the latter for the simple top-K scenarios we want to support, but we really should be able to push ORDER BY/LIMIT into DbQueryScan). We also still need to propagate ordering as a RepresentationProperty so we can implement optimizations like forcing merge join or merge aggregation for ordered inputs.
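
To make the top-K remark concrete, here is a minimal sketch of why a simple ORDER BY + LIMIT query can do without a global merge exchange: each worker keeps only its local top K, and a single collector picks the final K from at most K rows per worker. This shows the general idea rather than Myria's operators; `local_top_k` and `global_top_k` are hypothetical names.

```python
import heapq


def local_top_k(rows, k, key=None):
    """What each worker would compute: ORDER BY key ASC LIMIT k on its partition."""
    return heapq.nsmallest(k, rows, key=key)


def global_top_k(partitions, k, key=None):
    """Collect every worker's local top k, then apply the same top k once more."""
    candidates = [row
                  for part in partitions
                  for row in local_top_k(part, k, key)]
    return heapq.nsmallest(k, candidates, key=key)


if __name__ == "__main__":
    partitions = [[5, 1, 9], [3, 7], [2, 8, 4]]
    print(global_top_k(partitions, 3))
```

The collector sees at most K rows per worker, so for small K a plain sort at a single node is cheap. An order-preserving merge only becomes important when the full sorted output is needed.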

@RonyMin RonyMin changed the title from "Setup for single node environments / TPC-H benchmark" to "Setup for single node environment / TPC-H benchmark" on Oct 6, 2017