
How to cache scramble tables in Spark? #362

Open
hychen20 opened this issue Apr 3, 2019 · 2 comments

hychen20 commented Apr 3, 2019

No description provided.

pyongjoo (Member) commented Apr 3, 2019

The standard caching statement [1] should work when prefixed with bypass. For example:

verdict.sql('bypass cache table schema.scramble_table')

Disclaimer: We have not tested this yet, so I am not 100% certain.

[1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/cache-table.html
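For reference, the bypass keyword hands the statement directly to the underlying engine, so the call above should be equivalent to issuing CACHE TABLE through the Spark session itself. A minimal sketch (untested; assumes an existing SparkSession named spark and that the scramble lives at schema.scramble_table):

```scala
// Sketch only: cache the scramble table via plain Spark SQL instead of
// going through VerdictDB's bypass. CACHE TABLE is eager by default,
// so the table is materialized when this statement runs.
spark.sql("CACHE TABLE schema.scramble_table")

// Check from the API (rather than the Spark UI) that the table is cached.
println(spark.catalog.isCached("schema.scramble_table"))
```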

hychen20 (Author) commented Apr 7, 2019

Sorry, it does not seem to work. I cached the scramble of the lineitem table as well as the verdictdbmeta table. I can see from the Spark UI that the tables are cached; however, TPC-H Q1 still takes the same amount of time as when the tables are not cached.

Here's my code:

  verdict.setDefaultSchema(schema) // tpch1g
  verdict.sql("bypass cache table lineitem")
  verdict.sql("bypass cache table orders")
  verdict.sql("bypass cache table verdictdbmeta.verdictdbmeta")
  verdict.sql("bypass cache table lineitem_scramble")
  verdict.sql("bypass cache table orders_scramble")
  val q_verdict = spark.sparkContext.getConf.get("spark.verdictdb.query") // Q1, Q6, or Q14
  val rs_verdict = verdict.sql(q_verdict)
  rs_verdict.print()
