Support syntax and AST building for Materialized View Commands ... #3283

Merged
merged 1 commit into from Sep 1, 2020

Conversation

anjalinorwood
Member

... like CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW
and DROP MATERIALIZED VIEW.

Much like a logical view, a materialized view has a SQL query associated with it.
Unlike a logical view, it stores the data corresponding to the SQL query.

This commit adds support for parsing the materialized view commands and
building an AST for them. It does not include the connector-side
implementation of materialized views.
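
As a rough illustration of what "building an AST" for these commands means, the three statements could be modeled as small immutable node classes. This is only a sketch: the class names, fields, and the SQL shapes rendered by `toSql()` are assumptions made for this example and are not the node classes added by this PR.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, simplified AST sketch. These are NOT the node classes added by
// this PR; names, fields and the rendered SQL shapes are assumptions.
final class MaterializedViewAst {
    private MaterializedViewAst() {}

    /** Common supertype for the three materialized-view statements. */
    interface Statement {
        String toSql();
    }

    /** CREATE MATERIALIZED VIEW: a name, a defining query and extra properties. */
    static final class CreateMaterializedView implements Statement {
        final String name;
        final String query; // kept as raw SQL text in this sketch
        final Map<String, String> properties; // e.g. partitioning

        CreateMaterializedView(String name, String query, Map<String, String> properties) {
            this.name = name;
            this.query = query;
            this.properties = Collections.unmodifiableMap(properties);
        }

        // Properties are intentionally not rendered; their syntax is out of scope here.
        @Override
        public String toSql() {
            return "CREATE MATERIALIZED VIEW " + name + " AS " + query;
        }
    }

    /** REFRESH MATERIALIZED VIEW: only needs the view name. */
    static final class RefreshMaterializedView implements Statement {
        final String name;

        RefreshMaterializedView(String name) {
            this.name = name;
        }

        @Override
        public String toSql() {
            return "REFRESH MATERIALIZED VIEW " + name;
        }
    }

    /** DROP MATERIALIZED VIEW: the view name plus an optional IF EXISTS flag. */
    static final class DropMaterializedView implements Statement {
        final String name;
        final boolean ifExists;

        DropMaterializedView(String name, boolean ifExists) {
            this.name = name;
            this.ifExists = ifExists;
        }

        @Override
        public String toSql() {
            return "DROP MATERIALIZED VIEW " + (ifExists ? "IF EXISTS " : "") + name;
        }
    }
}
```

A parser would produce one of these nodes per statement; analysis and execution would then dispatch on the node type.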

Materialized views are modeled as an extension of logical views with additional
properties such as partitioning.

Given that a materialized view can be seen as a combination of a view and a table,
access control for a CREATE MATERIALIZED VIEW command is a combination of the access
checks for the CREATE TABLE and CREATE VIEW commands.
Similarly, a REFRESH MATERIALIZED VIEW command is a combination of DELETE and INSERT
operations, so the access checks for this command are a combination of the access checks
for DELETE and INSERT.
Lastly, the access check for DROP MATERIALIZED VIEW is a combination of the checks for the
DROP TABLE and DROP VIEW commands.
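
The mapping described above can be sketched as a lookup from each command to the pair of privileges it requires. The enum and method names below are hypothetical and exist only for this example; they are not the engine's access control API.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the combined access checks; not the engine's AccessControl API.
final class MaterializedViewAccessChecks {
    private MaterializedViewAccessChecks() {}

    enum Privilege { CREATE_TABLE, CREATE_VIEW, DELETE, INSERT, DROP_TABLE, DROP_VIEW }

    enum Command { CREATE_MATERIALIZED_VIEW, REFRESH_MATERIALIZED_VIEW, DROP_MATERIALIZED_VIEW }

    /** Returns the privileges that must all be held to run the given command. */
    static Set<Privilege> requiredPrivileges(Command command) {
        switch (command) {
            case CREATE_MATERIALIZED_VIEW:
                return EnumSet.of(Privilege.CREATE_TABLE, Privilege.CREATE_VIEW);
            case REFRESH_MATERIALIZED_VIEW:
                return EnumSet.of(Privilege.DELETE, Privilege.INSERT);
            case DROP_MATERIALIZED_VIEW:
                return EnumSet.of(Privilege.DROP_TABLE, Privilege.DROP_VIEW);
            default:
                throw new IllegalArgumentException("Unknown command: " + command);
        }
    }

    /** A command is allowed only if every required privilege has been granted. */
    static boolean isAllowed(Command command, Set<Privilege> granted) {
        return granted.containsAll(requiredPrivileges(command));
    }
}
```

With this shape, a user holding only CREATE TABLE would be denied CREATE MATERIALIZED VIEW, because the CREATE VIEW check would fail.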

@findepi
Member

findepi commented Mar 30, 2020

@anjalinorwood how does this relate to Hive 3's materialized views?

@anjalinorwood
Member Author

> @anjalinorwood how does this relate to Hive 3's materialized views?

Not related to Hive 3's materialized views. Details here: https://docs.google.com/document/d/1MOtYt7BFNFoDBc7SwCStHzXy6cyzOvAKURH3hbnk8VU/edit?usp=sharing

I have been talking to @martint about this feature. Would you like to be added to that email conversation?

@findepi
Member

findepi commented Mar 30, 2020

@anjalinorwood yes, thanks

@kokosing
Member

@anjalinorwood How is it going to be modeled in connectors, and how is refresh going to be implemented in the execution engine? The doc you mentioned is only about the syntax.

@anjalinorwood
Member Author

> @anjalinorwood How is it going to be modeled in connectors, and how is refresh going to be implemented in the execution engine? The doc you mentioned is only about the syntax.

There is an open question around how to implement refresh in the engine. A proposal here:
presto-main/src/main/java/io/prestosql/execution/RefreshMaterializedViewTask.java
https://docs.google.com/document/d/1GYIyEhJQ3ngvOPmJYWU-8zwDQRCMXkBDZaSn3c3MF0k/edit?usp=sharing

As for the connector-side implementation, here at Netflix we will start with Iceberg. Some details here:
https://docs.google.com/document/d/1GYIyEhJQ3ngvOPmJYWU-8zwDQRCMXkBDZaSn3c3MF0k/edit?usp=sharing

PR for API is here: #3061

The idea is to nail down the syntax for creating, refreshing, and dropping materialized views, along with the connector API, so that the community can start on materialized view implementations for their favorite connectors.

In this first version, we are not proposing automatic rewrite / query routing to materialized views. The user query will be written against the materialized view. Refresh provides a convenient way to keep materialized views fresh. (Incremental refresh can be implemented by the connector.)

@shlomialfasi
Contributor

@anjalinorwood can you also add me to the email conversation that you mentioned?
We at Varada.io have our own implementation of materialized views, which has a lot in common with your proposal. We would be happy to share our insights and discuss the details.

@anjalinorwood
Member Author

> @anjalinorwood can you also add me to the email conversation that you mentioned?
> We at Varada.io have our own implementation of materialized views, which has a lot in common with your proposal. We would be happy to share our insights and discuss the details.

Given the community interest in this feature, it is a good idea to keep the discussion on GitHub/Google Drive. It turns out there are no additional details in the email. Relevant links are in my comment here: #3283 (comment)

I looked at the Varada syntax; it looks similar to the proposal above. :-)

@anjalinorwood
Member Author

The final proposal for the syntax of create/refresh/drop materialized view is here: https://docs.google.com/document/d/10jPGw3t-Tu8OgWo5oC9d-O8d1PVdnAbEhyfvnJA0T8U/edit

@anjalinorwood anjalinorwood force-pushed the oss_syntax branch 2 times, most recently from 58a9c0b to ec96f88 Compare June 5, 2020 21:51
@anjalinorwood anjalinorwood marked this pull request as draft June 8, 2020 14:28
@anjalinorwood anjalinorwood force-pushed the oss_syntax branch 5 times, most recently from 4bf8803 to 59c0d85 Compare June 10, 2020 21:51
@anjalinorwood anjalinorwood force-pushed the oss_syntax branch 3 times, most recently from ac9b6f4 to 4e5e73f Compare June 18, 2020 18:37
@anjalinorwood anjalinorwood force-pushed the oss_syntax branch 2 times, most recently from 2d7d100 to b6ec49e Compare August 31, 2020 17:57
@anjalinorwood anjalinorwood marked this pull request as ready for review August 31, 2020 17:59
This commit adds support for materialized views in the Presto engine.
Much like a logical view, a materialized view has a SQL query associated with it.
Unlike a logical view, it stores the data corresponding to the SQL query.

The commit adds support for commands such as CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW,
SHOW CREATE MATERIALIZED VIEW and DROP MATERIALIZED VIEW.
It also adds support for reading data from a materialized view when it is fresh with
respect to its underlying base tables. When a materialized view is stale with respect to its
base tables, it is resolved to the base tables using its associated definition.
As a result, querying the materialized view always returns current data, irrespective of the state
of the materialized view.
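
A minimal sketch of the fresh/stale resolution described above follows. The freshness rule used here (compare the base-table tokens recorded at the last refresh with the current tokens) is an assumption made for illustration, as are all class and method names; the commit message does not specify how freshness is computed.

```java
import java.util.Map;

// Hypothetical sketch of how a materialized view reference could be resolved.
// The token comparison is an ASSUMED freshness rule, not the engine's logic.
final class MaterializedViewResolution {
    private MaterializedViewResolution() {}

    /**
     * Assumed rule: the view is fresh when it has been refreshed at least once
     * and every base table still carries the token recorded at that refresh.
     */
    static boolean isFresh(Map<String, String> tokensAtRefresh, Map<String, String> currentTokens) {
        return !tokensAtRefresh.isEmpty() && tokensAtRefresh.equals(currentTokens);
    }

    /**
     * Fresh: read the stored data from the storage table.
     * Stale: expand the view definition, exactly as for a logical view.
     */
    static String resolve(boolean fresh, String storageTable, String definition) {
        return fresh ? "SELECT * FROM " + storageTable : definition;
    }
}
```

Either branch yields the same logical result; the storage table is only a faster path when it is known to be up to date.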

A materialized view is modeled as a combination of a SQL definition and a storage table
that holds the data.

REFRESH MATERIALIZED VIEW implementation:
+ Refresh materialized view operation is implemented as a table writer that drops partitions from,
  deletes data from and inserts data into the storage table as needed. The source of the data is
  the query associated with the materialized view.
+ A new type of TableWriterOperator, ‘RefreshMaterializedViewTarget’ is implemented. This translates
  into two connector API calls ‘beginRefreshMaterializedView’ and ‘finishRefreshMaterializedView’.
+ StatementAnalyzer determines if the materialized view is fresh and sets the flag in Analysis.
  If the materialized view is fresh, logical planner plans the refresh operation as a no-op.
+ The ‘beginRefreshMaterializedView’ implementation for a connector is expected to do the following:
  + Start a transaction
  + Drop specified partitions of the storage table based on input parameters (applicable only
    for incremental refresh of the materialized view)
  + Delete data from specified partitions of the storage table or all of the data from the
    storage table based on input parameters (applicable for incremental refresh and full refresh
    respectively)
  + Return a ConnectorInsertTableHandle
+ The ‘finishRefreshMaterializedView’ implementation for a connector is expected to do the following:
  + Insert data into the storage table based on parameters
  + Store the table tokens for the base tables in the storage table
  + Commit the transaction.
+ Note that the refresh materialized view operation is performed in the scope of a single
  transaction in the connector.
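
The begin/finish flow above can be sketched with a toy in-memory connector. Everything here is a simplified assumption for illustration: the method signatures do not match the real connector API, only a full refresh is modeled (no partition handling), and `InsertHandle` merely stands in for `ConnectorInsertTableHandle`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a full refresh; signatures are assumptions
// and do not match the real connector API.
final class RefreshSketch {
    private RefreshSketch() {}

    /** Stand-in for ConnectorInsertTableHandle. */
    static final class InsertHandle {
        final String storageTable;

        InsertHandle(String storageTable) {
            this.storageTable = storageTable;
        }
    }

    static final class InMemoryConnector {
        final List<String> storageRows = new ArrayList<>();       // committed data
        final Map<String, String> storedTokens = new HashMap<>(); // base table tokens
        private List<String> pending;                              // transaction-local rows

        /** Starts the "transaction"; a full refresh discards all existing rows on commit. */
        InsertHandle beginRefreshMaterializedView(String storageTable) {
            pending = new ArrayList<>();
            return new InsertHandle(storageTable);
        }

        /** Inserts the new rows, records the base table tokens, and commits. */
        void finishRefreshMaterializedView(InsertHandle handle, List<String> rows, Map<String, String> baseTableTokens) {
            if (pending == null) {
                throw new IllegalStateException("refresh was not started for " + handle.storageTable);
            }
            pending.addAll(rows);
            storageRows.clear();
            storageRows.addAll(pending);
            storedTokens.clear();
            storedTokens.putAll(baseTableTokens);
            pending = null; // commit
        }
    }

    /** Plans the refresh as a no-op when the view is already fresh; returns true if it ran. */
    static boolean refresh(InMemoryConnector connector, boolean fresh, List<String> queryRows, Map<String, String> baseTableTokens) {
        if (fresh) {
            return false;
        }
        InsertHandle handle = connector.beginRefreshMaterializedView("mv_storage");
        connector.finishRefreshMaterializedView(handle, queryRows, baseTableTokens);
        return true;
    }
}
```

Because the old rows are replaced and the tokens are stored in the same commit step, a reader never observes new data paired with outdated tokens in this sketch.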

Access control:
Given that a materialized view can be seen as a combination of a view and a table,
access control for a CREATE MATERIALIZED VIEW command is a combination of the access
checks for the CREATE TABLE and CREATE VIEW commands.
Similarly, a REFRESH MATERIALIZED VIEW command is a combination of DELETE and INSERT
operations, so the access checks for this command are a combination of the access checks
for DELETE and INSERT.
Lastly, the access check for DROP MATERIALIZED VIEW is a combination of the checks for the
DROP TABLE and DROP VIEW commands.
@martint martint merged commit 88116a4 into trinodb:master Sep 1, 2020
@martint martint added this to the 341 milestone Sep 1, 2020
@kokosing
Member

kokosing commented Sep 1, 2020

@anjalinorwood @martint We need access control checks for this. See #5041


5 participants