Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Iceberg materialized views to federate #15108

Merged
merged 4 commits into from
Nov 25, 2022

Conversation

findepi
Copy link
Member

@findepi findepi commented Nov 18, 2022

Probably fixes #13131

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I leave the approval to other reviewers who're familiar with materialized views.

@findepi
Copy link
Member Author

findepi commented Nov 21, 2022

thank you @ebyhr for your review. AC. PTAL

raunaqmorarka
raunaqmorarka previously approved these changes Nov 21, 2022
@findepi findepi force-pushed the findepi/iceberg-mv-federate branch 5 times, most recently from e9f0619 to 1ba2559 Compare November 22, 2022 10:50
@findepi
Copy link
Member Author

findepi commented Nov 22, 2022

@raunaqmorarka added tests reproducing the problem and addressed

We have effectively three different states an MV can be in

  • "fresh" -- can be queried, and REFRESH is a no-op
  • "stale" -- cannot be queried (needs to be inlined), and REFRESH should refresh it
  • "in between" -- e.g. federated mviews -- can be queried, but REFRESH should refresh it

To support this, i changed MaterializedViewFreshness. It's boolean field could not convey those different meanings, so replaced it with an enum.

cc @losipiuk @sopel39 @martint @alexjo2144 @romanvainb

@findinpath
Copy link
Contributor

findinpath commented Nov 22, 2022

testing/bin/ptl env up --environment 'singlenode'  --config 'config-default' --without-trino
create table iceberg.default.itable1 (x integer);

insert into iceberg.default.itable1 values (1),(2),(3);

create materialized view iceberg.default.mvtest1
as select nationkey, name
from iceberg.default.itable1 itable1 
inner join tpch.sf1.nation tnation on itable1.x = tnation.nationkey;


refresh maerialized view  iceberg.default.mvtest1;

select * from  iceberg.default.mvtest1 ;

insert into iceberg.default.itable1 values (4);

select * from  iceberg.default.mvtest1 ;

I see the country corresponding to the nationkey 4 when selecting from iceberg.default.mvtest1 even though the materialized view was not refreshed. Is this supposed to be happening?

SELECT * FROM system.metadata.materialized_views;

The freshness for mvtest1 is always STALE even right after refreshing the materialized view. Wasn't is supposed to be unknown due to the fact that it is "federated" with a tpch table?

@findepi
Copy link
Member Author

findepi commented Nov 24, 2022

(just rebased)

@findepi
Copy link
Member Author

findepi commented Nov 24, 2022

I see the country corresponding to the nationkey 4 when selecting from iceberg.default.mvtest1 even though the materialized view was not refreshed. Is this supposed to be happening?

Yes.

In this case, there are two base tables

  1. iceberg table
  2. non-iceberg table (tpch)

When you have a modification in the iceberg table, the MV is known to be stale, so the freshness is STALE.
A STALE MV gets inlined by the engine. This is the same behavior that existed before my changes.

@findepi
Copy link
Member Author

findepi commented Nov 24, 2022

The freshness for mvtest1 is always STALE even right after refreshing the materialized view. Wasn't is supposed to be unknown due to the fact that it is "federated" with a tpch table?

can this be related to a typo in the command:

refresh maerialized view  iceberg.default.mvtest1;

?

otherwise I couldn't reproduce this.

@findepi
Copy link
Member Author

findepi commented Nov 24, 2022

AC, thank you @findinpath for thorough review.

The `listTables` callback takes schema name, so it should return table
names without qualification. Allowing it to return `SchemaTableName`
objects allows an implementation to incorrectly return tables with wrong
schema name.

For example, `TestEventListenerBasic.createQueryRunner` would create
such bogus `listTables` implementation.
Require going through `anyTree()` by making `matchToAnyNodeTree` impl
detail.
@raunaqmorarka
Copy link
Member

We should probably update iceberg or MV docs to explain what REFRESH MV does when all the source tables are iceberg tables vs when there is a non-iceberg table involved.

To a somewhat technical person, the previous wording would trigger
questions that docs didn't not answer explicitly.
@findepi
Copy link
Member Author

findepi commented Nov 25, 2022

@raunaqmorarka good point, updated the docs now.

@mosabua i am going to merge this when build passes. if you have comments, i will happily apply them, but it's also ok to improve the wording as a follow-up.

@findepi findepi merged commit 241f928 into trinodb:master Nov 25, 2022
@findepi findepi deleted the findepi/iceberg-mv-federate branch November 25, 2022 13:58
@github-actions github-actions bot added this to the 404 milestone Nov 25, 2022
@findepi findepi mentioned this pull request Nov 25, 2022
@mosabua
Copy link
Member

mosabua commented Nov 25, 2022

So AWESOME @findepi

@mosabua
Copy link
Member

mosabua commented Nov 25, 2022

Filed #15199 to follow up on the docs with some more clarifications.

@findinpath
Copy link
Contributor

can this be related to a typo in the command:

refresh maerialized view iceberg.default.mvtest1;

Indeed. It was apparently a typo.
I've retested the scenario now and see that the freshness is set to "UNKNOWN" (due to the dependency on the non versioned tpch table).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed docs enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Support cross-catalog materialized views in Iceberg
5 participants