Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iceberg View Rest Catalog support #19818

Merged
merged 1 commit into from
May 22, 2024

Conversation

amogh-jahagirdar
Copy link
Contributor

Description

This change integrates Iceberg View support https://iceberg.apache.org/view-spec/ in the REST catalog.

Additional context and related issues

See #14120

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(X ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

Comment on lines 486 to 488
// Set a best effort view definition
viewDefinition = Optional.of(new ConnectorViewDefinition(
sql, defaultCatalog, defaultNamespace, viewColumns, comment, Optional.empty(), false, null));
Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be a key point of discussion but yes best effort means "If we didn't find a Trino dialect, let's use an arbitrary SQL dialect". So here are my thoughts:

  1. Most views are trivial (such as simple projections, aggregations and joins) which share the same semantics across engines.
  2. One of the benefits of having a shared view metadata is being able to access information in other dialects.
  3. Considering 1 and 2, it then seems to be reasonable to simplify the user experience for Trino users, to be able to create views in other systems and then query those same views in Trino without having to do any recreation.

Now, in the current implementation this best effort behavior happens by default. To avoid correctness issues what we can do is make this behavior opt in via a connector property (defaulting it to false to avoid the best effort behavior). That way users who know their domain of queries can take advantage of this benefit of Iceberg view metadata.

On Hive views, those seem to leverage Coral, which I don't think we want to do here. Instead of an IR layer that can be represented as a separate dialect in the future in Iceberg views

@amogh-jahagirdar amogh-jahagirdar marked this pull request as ready for review November 28, 2023 20:33
@amogh-jahagirdar
Copy link
Contributor Author

amogh-jahagirdar commented Nov 28, 2023

Ok so there's still some things I need to look into, here's my list:

1.) Dialect resolution will be a key point of discussion #19818 (comment) -> refer here for the details.

2.) Updating the comment on a column on view should be implemented.

3.) Testing. Right now Trino REST smoke tests use JDBC catalog which doesn't implement ViewCatalog. We pass through some Hadoop HDFS configuration through to the JDBC catalog. For testing views, what I have done currently is a catalog implementation which uses both the JDBC catalog and InMemoryCatalog for views. It's a bit messy, I'll look into options here. Ideally we could just use the InMemoryCatalog but I don't think there's a way to pass in the Hadoop config details we would want.

@findepi
Copy link
Member

findepi commented Nov 29, 2023

1.) Dialect resolution will be a key point of discussion #19818 (comment) -> refer here for the details.

please see #19818 (comment) for how to unblock this PR.

@findepi
Copy link
Member

findepi commented Nov 29, 2023

cc @alexjo2144 @findinpath

@Override
public void createNamespace(Namespace namespace, Map<String, String> metadata)
{
super.createNamespace(namespace, metadata);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These duplicated calls feel a bit strange.
If the first succeeds and the second call fails , it would be rather misleading from a user perspective to actually see the namespace created for tables, even though the call failed.

@@ -26,7 +26,7 @@
<air.test.parallel>instances</air.test.parallel>
<dep.keycloak.version>21.1.2</dep.keycloak.version>
<!-- Nessie version (matching to Iceberg release) must be bumped along with Iceberg version bump to avoid compatibility issues -->
<dep.nessie.version>0.71.0</dep.nessie.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate commit -> relates probably to iceberg 1.5.x bump (first commit in the PR)

Copy link
Contributor Author

@amogh-jahagirdar amogh-jahagirdar Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.5.0 release should be sometime soon (hopefully this month but of course depends). I've been using the snapshot release for development of this feature. I think what most likely would happen is when 1.5.0 is released we'd have a separate PR out for the dependency upgrade and I'd rebase this change on top of that after it gets merged. Let me know what you think @findepi @findinpath , alternatively I could include it as separate commit in this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this PR can't be merged until 1.5.0 is released right? (It has dependency on some view APIs?)

I sent a mail yesterday in Iceberg mailing list about 1.5.0 release. Lets make the release happen within this month. Then a separate PR to bump Iceberg version and this PR can be merged after that.

But for development we can depend on snapshot version and keep the PR as WIP or Draft.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajantha-bhat Yes that's what I mean. After 1.5, we'd most likely have a PR just for the upgrade and then just rebase this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi @findinpath Just a heads up, all the dependency updates you see in this PR such as the Iceberg 1.5 update and the Nessie update will be in a separate PR, along with any of the relevant changes that come up as part of the upgrade. https://github.com/trinodb/trino/pull/20308/files is the PR.

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me overall. I agree with @dain's design comments. Once those and the other comments are addressed, this is good to merge. Thanks for all your work and patience on this.

public static final String ICEBERG_VIEW_RUN_AS_INVOKER = "trino_run_as_invoker";
public static final String ICEBERG_VIEW_OWNER = "trino_owner";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. We also only need the single field trino_run_as_owner that contains the owner. If this is present, then it's a definer view with that owner. If empty, it's an invoker view.

ConnectorViewDefinition has both a boolean and an optional owner for legacy reasons. In the engine, ViewDefinition has a single Optional<Identity> runAsIdentity field.

@amogh-jahagirdar amogh-jahagirdar force-pushed the iceberg-view-support branch 4 times, most recently from 8771019 to eaa61be Compare May 21, 2024 19:59
Comment on lines +1053 to +1031
default boolean isView(ConnectorSession session, SchemaTableName viewName)
{
return getView(session, viewName).isPresent();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default implementation of connector metadata#getView will preserve the existing behavior for other connectors by just doing getView().isPresent. IcebergMetadata overrides this implementation so we can still return true in the case of multiple dialects.

@amogh-jahagirdar amogh-jahagirdar force-pushed the iceberg-view-support branch 2 times, most recently from ab7b28f to a7faa6f Compare May 21, 2024 20:20
Comment on lines +2521 to +2517
if (e.getErrorCode() == ICEBERG_UNSUPPORTED_VIEW_DIALECT.toErrorCode()) {
return true;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the Iceberg specific error code is so that we can handle multiple dialects properly in the isView logic and distinguish it from other errors. We can just catch this particular code and return true, and other catalogs in the future can just do what the rest catalog does and it should just work. Another benefit of this is that we don't need to plumb isView all the way down into TrinoCatalog and add to another interface.

public static final String ICEBERG_VIEW_RUN_AS_INVOKER = "trino_run_as_invoker";
public static final String ICEBERG_VIEW_OWNER = "trino_owner";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, we only really need trino_run_as_owner with the actual owner value. Everything else can be inferred from the existence of that field. I've updated.

@amogh-jahagirdar
Copy link
Contributor Author

My latest update addressed all the comments @electrum , thanks for the review!

@@ -41,6 +41,7 @@ public enum IcebergErrorCode
ICEBERG_WRITER_CLOSE_ERROR(14, EXTERNAL),
ICEBERG_MISSING_METADATA(15, EXTERNAL),
ICEBERG_WRITER_DATA_ERROR(16, EXTERNAL),
ICEBERG_UNSUPPORTED_VIEW_DIALECT(17, EXTERNAL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like making this Iceberg specific

@@ -120,6 +120,9 @@ public abstract class AbstractTrinoCatalog
implements TrinoCatalog
{
public static final String TRINO_CREATED_BY_VALUE = "Trino Iceberg connector";
public static final String ICEBERG_VIEW_SQL_DIALECT = "trino";
public static final String ICEBERG_VIEW_RUN_AS_OWNER = "trino_run_as_owner";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name the property trino.run_as_owner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be better since typically we namespace properties and separate those with a period. Right now, this is the only property, but down the line there can be more in Trino and we'd want to follow the established pattern.

I think we also want to remove the snake case in favor of dashes. That is the pattern that iceberg typically follows for property names. see https://iceberg.apache.org/docs/1.5.0/view-configuration/?h=replace.dro#view-properties

So the final property name would be:

trino.run-as-owner

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

@amogh-jahagirdar amogh-jahagirdar force-pushed the iceberg-view-support branch 3 times, most recently from b1efa04 to 9fceece Compare May 22, 2024 01:21
@github-actions github-actions bot added the jdbc Relates to Trino JDBC driver label May 22, 2024
@electrum electrum merged commit 8d57334 into trinodb:master May 22, 2024
93 of 94 checks passed
@electrum
Copy link
Member

Thanks!

@github-actions github-actions bot added this to the 449 milestone May 22, 2024
@mosabua
Copy link
Member

mosabua commented May 23, 2024

Dont we need to document this somehow? Also .. need release notes entry.

@amogh-jahagirdar
Copy link
Contributor Author

Good point @mosabua we certainly should, I'm happy to raise another PR to document this feature.

@mosabua
Copy link
Member

mosabua commented May 24, 2024

Great @amogh-jahagirdar - pull me in as reviewer and I can help with merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector docs hive Hive connector iceberg Iceberg connector jdbc Relates to Trino JDBC driver
Development

Successfully merging this pull request may close these issues.

None yet