Iceberg: refactor IcebergMetadata to support multiple catalogs #6977
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.
```java
    return catalogImpl;
}

@Config("iceberg.catalog-impl")
```
Maybe a simpler way is to make this a metadataType, for users to choose among options like GENERIC, HIVE, etc. There are quite a few places that would require a switch statement and can take advantage of an enum.
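A hedged sketch of that idea (the property and enum names are illustrative, not from this PR); airlift's config binding can parse enum values from the property string directly:

```java
import io.airlift.configuration.Config;

public class IcebergConfig
{
    public enum MetadataType
    {
        GENERIC, HIVE
    }

    private MetadataType metadataType = MetadataType.HIVE;

    public MetadataType getMetadataType()
    {
        return metadataType;
    }

    @Config("iceberg.metadata-type")
    public IcebergConfig setMetadataType(MetadataType metadataType)
    {
        this.metadataType = metadataType;
        return this;
    }
}
```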
```java
@Inject
public IcebergResourceFactory(
        StaticMetastoreConfig metastoreConfig,
```
Currently I still need to set a dummy value for hive.metastore.uri due to the check in StaticMetastoreConfig; otherwise connector initialization fails with an NPE. It would be great if anyone knows how to avoid initializing that based on the config flag.
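One possible direction, sketched under assumptions: airlift's ConditionalModule (used elsewhere in Trino plugins) can gate the metastore bindings on the catalog choice, so StaticMetastoreConfig is only instantiated when actually needed. IcebergConfig.getMetadataType and HiveMetastoreModule are illustrative names, not this PR's API:

```java
import com.google.inject.Binder;
import io.airlift.configuration.AbstractConfigurationAwareModule;

import static io.airlift.configuration.ConditionalModule.installModuleIf;

public class IcebergCatalogModule
        extends AbstractConfigurationAwareModule
{
    @Override
    protected void setup(Binder binder)
    {
        // Install the metastore bindings (and the hive.metastore.uri check)
        // only when the Hive catalog is selected, so StaticMetastoreConfig
        // is never instantiated for other catalog types.
        install(installModuleIf(
                IcebergConfig.class,
                config -> config.getMetadataType() == IcebergConfig.MetadataType.HIVE,
                new HiveMetastoreModule())); // hypothetical metastore module
    }
}
```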
```java
private static Configuration getHadoopConfiguration()
```
This is for the draft only; we need a better way to get the latest Hadoop configuration to initialize Iceberg resources. Currently the connector mostly relies on hdfsEnvironment.getConfiguration(hdfsContext, path), but here we need a way to get a configuration without a path. I am not very familiar with this, because for Glue we can just pass in null. Currently I just borrowed some code from Hive's ConfigurationUtils to make this work.
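For comparison, a minimal sketch of building a Configuration without a table path, assuming the connector already knows its Hadoop resource files from config (roughly what the borrowed ConfigurationUtils-style code does):

```java
import java.io.File;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public final class HadoopConfigurations
{
    private HadoopConfigurations() {}

    public static Configuration createConfiguration(List<File> resourceFiles)
    {
        // Start from an empty configuration rather than the JVM-wide defaults
        Configuration configuration = new Configuration(false);
        for (File file : resourceFiles) {
            // Merge each configured resource (e.g. core-site.xml)
            configuration.addResource(new Path(file.getAbsolutePath()));
        }
        return configuration;
    }
}
```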
```java
private static Map<String, String> getCatalogProperties(StaticMetastoreConfig metastoreConfig, IcebergConfig icebergConfig)
```
In our internal Iceberg connector, we have a well-defined set of catalog properties. However, in open source, users should be able to input arbitrary catalog properties. This is easy to do in Spark and Flink, but harder in Trino due to its use of a POJO to parse and hold the config map. I could use a string config additional-config and convert it to a map, but that sounds like a hack. Any better suggestions?
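For reference, the string-to-map conversion itself is small with Guava's MapSplitter (a sketch; the additional-config property and its comma-separated key=value format are the hypothetical ones mentioned above):

```java
import java.util.Map;

import com.google.common.base.Splitter;

public final class CatalogPropertiesParser
{
    private CatalogPropertiesParser() {}

    // e.g. "warehouse=s3://bucket/path,lock-impl=com.example.MyLock"
    public static Map<String, String> parse(String additionalConfig)
    {
        return Splitter.on(',')
                .trimResults()
                .omitEmptyStrings()
                .withKeyValueSeparator('=')
                .split(additionalConfig);
    }
}
```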
```java
private String metadataImpl = IcebergMetadata.class.getName();
private String catalogImpl = HiveCatalog.class.getName();
private String catalogName = "trino";
private String ioImpl = HadoopFileIO.class.getName();
```
We can define enums for the metastore/file system type like we do for the Hive connector.
@cla-bot check

The cla-bot has been summoned, and re-checked this pull request!
Thanks for sending this draft. Supporting alternate catalogs is definitely a welcome feature and something I've wanted for a while. It might be helpful to have a call about this, but I'll start with some initial comments here:
I am also interested in alternative Iceberg catalogs. @electrum and @jackye1995, what do you think the next steps are here, and can this be broken down into a few smaller chunks? I would love to help but am not sure where I can best do that.
@rymurr Hi Ryan, thanks for bringing this up; too many things are going on and I almost forgot about this PR. @electrum thanks for the review; on your comments:

I think I am mostly interested in

Understandable, but as part of the effort to move our systems to AWS SDK v2, we find it difficult to migrate all the Hadoop dependencies. The only difference between the two approaches is that the current code path runs through the Hive connector and is hard-coded to use

That would make it basically identical to the existing
Trino has its own S3 filesystem implementation. We'd need a way to test it at scale with realistic workloads and verify that there aren't any regressions in performance, memory, etc. Given that the S3 support in Trino is stable and has been working well for years, and that v1 seems to be actively maintained, I think there'd need to be significant benefits to upgrading to be worth the potential risk. We don't want to have a separate S3 implementation for Iceberg, since that would mean duplicating all of the config options that we already have for S3: https://trino.io/docs/current/connector/hive-s3.html

I think we can separate FileIO from the Catalog change, since they seem independent.
Yes, it will be a large refactoring, but it's what needs to happen to allow multiple implementations. It will make the code cleaner as well, since all the metastore-related code will be behind an interface.
That would be me and @phd3.
@jackye1995 shout if I can help w/ the catalog refactor.
Force-pushed from b5e949d to e74a4a1
@electrum @rymurr @findepi I have separated out the Glue part of the work and made this PR only about the refactoring, so that the changes are clearer. Glue and other catalog support can be added in separate PRs once we agree upon the interface. This also allows us to work on adding new catalogs in parallel. The approach is mostly the one described in the Slack room. I expect other Iceberg native catalogs to be plugged in through something like:

```java
public class TrinoGlueCatalog
        extends GlueCatalog
        implements TrinoIcebergCatalog
{
    ...
}
```

For each new session, a new catalog instance is created. The current caching logic in IcebergMetadata is refactored into the catalog implementation.

@rdblue you might also be interested in this, since there are view related APIs in the new catalog interface.
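To make the per-session lifecycle concrete, a rough sketch of the implied wiring (TrinoCatalogFactory does appear later in this PR, but this method shape and the TrinoHiveCatalog constructor arguments are my guesses):

```java
import io.trino.spi.connector.ConnectorSession;

public class TrinoCatalogFactory
{
    private final IcebergConfig config;

    public TrinoCatalogFactory(IcebergConfig config)
    {
        this.config = config;
    }

    // A fresh catalog per session, so session identity and credentials can
    // be applied when the underlying Iceberg catalog is initialized
    public TrinoCatalog create(ConnectorSession session)
    {
        // dispatch on the configured catalog type (illustrative)
        return new TrinoHiveCatalog(session, config);
    }
}
```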
@jackye1995 thanks. I skimmed through the PR and I am not convinced that we need the pattern where new

You mentioned that "session should be used to initialize a new catalog to ensure the access control is set correctly for each session". That may block us from making

The approach I propose simplifies cache management, as the cache then becomes the sole responsibility of
Thanks @jackye1995 for continuing this effort! +1 to @losipiuk's comment. It'd also make it easier for the different implementations to evolve separately in terms of using different Iceberg APIs. SupportsNamespaces and Catalog look like good references for deciding the common interface methods (even though not all might be strictly required in Trino atm).

I think we can still see whether it makes sense to keep the cache in IcebergMetadata, and ask the catalog for relevant information only when required. It looks like the things we want to keep are generic Iceberg objects, so we can keep the cache invisible to catalogs. That way the caching logic (and invalidation logic, if any in future) doesn't need to be replicated across catalogs.
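A minimal sketch of that shape, assuming a TrinoCatalog#loadTable method: the cache lives in the metadata class and the catalog implementations stay cache-free.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.SchemaTableName;
import org.apache.iceberg.Table;

public class IcebergMetadata
{
    private final TrinoCatalog catalog;

    // Owned by the per-transaction metadata instance; invisible to catalogs
    private final Map<SchemaTableName, Table> tableCache = new ConcurrentHashMap<>();

    public IcebergMetadata(TrinoCatalog catalog)
    {
        this.catalog = catalog;
    }

    private Table getTable(ConnectorSession session, SchemaTableName name)
    {
        return tableCache.computeIfAbsent(name, key -> catalog.loadTable(session, key));
    }
}
```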
@losipiuk @phd3 thanks for the comments!

Because Iceberg's catalog is basically designed to be single-tenant, for each session a new catalog is required to ensure security. So if we want to let the methods of the catalog take a session, I think directly implementing the

The original implementation was written with this principle, to have basically two

I am fine with either approach, we just need to make a decision here, with:

Please let me know which way you guys think is better to proceed, and I can make changes accordingly.
I think having two separate

From those two I would prefer to go with (2) (even if it means some more boilerplate). I am not a big fan of basing

As for cache integration, I think what @phd3 proposes (to try to keep the cache at the IcebergMetadata level) makes sense.

@electrum please also chime in, as you were involved in the initial discussions regarding this PR.
Yes, caching can always be added later for whichever approach we go with, so I am not super concerned about that.

I do not fully get how approach 2 would solve the problem compared to having two connector metadata implementations that just inherit the same base class to avoid duplicated code. Let's take one metadata method as an example. The Hive version would be:

with metadata being a singleton. The Iceberg version would need to create a new catalog from the session, and then:

And most of the metadata operations follow a similar pattern (see the reconstruction sketch below). If there is an intermediate interface to adopt both approaches, I think we are just moving the complexity one level down without solving the actual issue of code duplication, and we now have to maintain one more interface in the process. @losipiuk am I misunderstanding your proposal, or missing anything?
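A hedged reconstruction of the two code shapes being contrasted above (the original snippets were elided; class, field, and method names are illustrative):

```java
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.SchemaTableName;
import org.apache.iceberg.Table;

// Hive-style: metadata is a singleton over a metastore fixed at startup
class HiveStyleMetadata
{
    private final HiveMetastore metastore;

    HiveStyleMetadata(HiveMetastore metastore)
    {
        this.metastore = metastore;
    }

    Object getTableHandle(ConnectorSession session, SchemaTableName name)
    {
        // same metastore instance for every session (lookup signature elided)
        return metastore.getTable(name.getSchemaName(), name.getTableName());
    }
}

// Iceberg-style: a catalog must first be derived from the session
class IcebergStyleMetadata
{
    private final TrinoCatalogFactory catalogFactory;

    IcebergStyleMetadata(TrinoCatalogFactory catalogFactory)
    {
        this.catalogFactory = catalogFactory;
    }

    Table getTableHandle(ConnectorSession session, SchemaTableName name)
    {
        TrinoCatalog catalog = catalogFactory.create(session);
        return catalog.loadTable(session, name);
    }
}
```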
Yeah, with the base class you inherit from it would work; I missed that part. But I think it is slightly cleaner to separate the low-level catalog access interface, with methods like:

from the higher-level methods.

Technically you can expose the first set of methods as
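A plausible sketch of the split being described, since the method lists above were elided (these are illustrative signatures, not the PR's actual interface):

```java
import java.util.List;
import java.util.Optional;

import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.SchemaTableName;
import org.apache.iceberg.Table;

// Low-level catalog access: thin, per-call wrappers over the backing catalog
public interface TrinoCatalog
{
    List<String> listNamespaces(ConnectorSession session);

    List<SchemaTableName> listTables(ConnectorSession session, Optional<String> namespace);

    Table loadTable(ConnectorSession session, SchemaTableName table);
}
// Higher-level concerns (snapshot pinning, column mapping, transaction
// state) would remain in ConnectorMetadata, built on top of this interface.
```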
@losipiuk cool, I think we are actually on the same page then. I had a discussion with @rdblue about the authZ difference between Iceberg and Trino, and we came to the conclusion that Iceberg's design did not give much consideration to dynamic authZ across different calls to the same metastore; that is probably worth adding directly in Iceberg core if possible.

I think there can be a middle ground to go with approach 2 by building an authZ layer

Trino will use

Then there is a single

And I would hope to have that

Some details around the view interface might need to be discussed in more detail as Iceberg is adding ViewCatalog, but apart from that I think this should be a good plan that fits all the requirements discussed. I will start to update the code based on this plan; please let me know if you have any additional suggestions.
Force-pushed from 6d74ca9 to 21dcca0

Force-pushed from aae6ebb to 160b136
Resolved review comment on plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/CatalogType.java
Force-pushed from 160b136 to d6b881c
The new interface looks great! Thanks for your work on this PR. Added some comments, mostly nits.
Resolved review comments on: IcebergSplitManager.java, HiveTableOperations.java, IcebergUtil.java, TrinoHiveCatalog.java, TrinoCatalog.java, IcebergMetadata.java, IcebergTableProperties.java
This looks good to me. Please address the comments from @phd3 and then we can merge this. Thanks for all the work on this. I know that having to keep rebasing a big refactor like this is a pain.
Resolved review comment on plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoCatalogFactory.java
```java
private final String trinoVersion;
private final boolean useUniqueTableLocation;

private final Map<SchemaTableName, TableMetadata> tableMetadataCache = new ConcurrentHashMap<>();
```
This actually brings up the question of the scope of TrinoCatalog. A Trino ConnectorMetadata instance (i.e. IcebergMetadata) is created per SQL transaction via Connector.getMetadata(). Thus, we store the snapshot isolation info in the IcebergMetadata instance.

The snapshot isolation is primarily handled by the snapshotIds map in IcebergMetadata, to ensure we use the same table snapshots for the entire transaction. I believe that tableMetadataCache was added for performance. However, since Iceberg (currently?) does not support transactional (historical) metadata, I think the metadata cache is also required for snapshot isolation.

I would prefer to keep both snapshotIds and tableMetadataCache in IcebergMetadata, since that's the canonical place to store transaction information, and it keeps the catalog implementation simpler.

Going back to the TrinoCatalog scope question: what is expected? Do they need to be thread-safe? Could we turn TrinoCatalogFactory into a Guice Provider and inject TrinoCatalog as a singleton?
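For reference, the singleton wiring being asked about could look roughly like this (a sketch; it assumes TrinoCatalogFactory implements Provider&lt;TrinoCatalog&gt; and that catalogs are made thread-safe):

```java
import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Scopes;

public class IcebergCatalogBindingModule
        implements Module
{
    @Override
    public void configure(Binder binder)
    {
        // TrinoCatalogFactory would implement Provider<TrinoCatalog>
        binder.bind(TrinoCatalog.class)
                .toProvider(TrinoCatalogFactory.class)
                .in(Scopes.SINGLETON);
    }
}
```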
Resolved review comment on plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Force-pushed from d6b881c to ecca5e0
@phd3 @electrum thanks for the review; I have applied all the code style suggestions.

Regarding the scope of TrinoCatalog:

Regarding why the metadata cache is in
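For context, a minimal sketch of the snapshot pinning that keeps transaction state in IcebergMetadata (the snapshotIds field name comes from the discussion above; the rest is illustrative):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import io.trino.spi.connector.SchemaTableName;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

public class SnapshotPinningSketch
{
    // One entry per table touched in this transaction
    private final Map<SchemaTableName, Optional<Long>> snapshotIds = new ConcurrentHashMap<>();

    // Pin the current snapshot the first time a table is used, so every
    // later read in the same transaction sees the same table state
    Optional<Long> getSnapshotId(SchemaTableName name, Table table)
    {
        return snapshotIds.computeIfAbsent(name,
                key -> Optional.ofNullable(table.currentSnapshot()).map(Snapshot::snapshotId));
    }
}
```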
Force-pushed from ecca5e0 to 8dc1ca6
@phd3 sorry, some of your comments were hidden; I updated based on those.
Relevant failure; looks like an exception message needs updating:
Force-pushed from 8dc1ca6 to 5f29bb3

Force-pushed from 5f29bb3 to a110d09
Thanks!
Does this already allow Iceberg to work with Glue?
Hello @jackye1995
@jackye1995 I'm wondering the same thing as @findepi: does this contain the work for Glue, or was it refactoring to enable that?
@findepi @ckdarby @dungdm93 not yet for Glue; good timing to ask, I am writing the PR for that and will publish tomorrow and ping you guys. For tables created with the Hive storage handler, it would most likely work with Trino's implementation, but there might be some edge cases I need to check, because they were created by a different implementation after all. Also, Trino only supports a small subset of Iceberg table properties, so if you set other properties they are not likely to be respected by Trino.
@jackye1995 Do you still also plan to add support for Hadoop catalogs? I know it was never your primary use case, but I am very interested in getting that added to Trino.
Hi, this is Jack from AWS Athena. As Iceberg recently released 0.11.0, users can now dynamically load Catalog and FileIO implementations. This feature is already supported in Spark and Flink, but Trino's Iceberg connector is strongly coupled with Hive and cannot easily integrate with it. We believe that doing this integration can enable a much broader set of use cases for Iceberg through Trino, as Iceberg now supports more and more new catalogs such as GlueCatalog, NessieCatalog, and JdbcCatalog. It will also provide a generic solution for PRs like #6528. There are two major blockers we see for the integration:

1. the connector uses HiveMetastore rather than Iceberg's Catalog
2. the connector relies on HdfsEnvironment to leverage the existing source and sink in the Hive connector

In this PR, we propose the following approach to update the Iceberg connector to complete the integration. The approach is a modified version of our internal Iceberg connector implementation, updated for the current Trino codebase:

- IcebergConfig is treated as the place to load Iceberg catalog properties.
- IcebergResourceFactory is introduced to load Catalog and FileIO based on session and config information.
- IcebergGenericMetadata is introduced to perform all the metadata operations instead of IcebergMetadata.
- IcebergMetadata is kept. We think it can be renamed to IcebergHiveMetadata (I did not do it in this PR to avoid too many file changes) and extend the generic metadata. The key reason to continue using it would be to leverage Hive's native security features. For example, some APIs such as setSchemaAuthorization are not supported natively in Iceberg.
- IcebergMetadataFactory is used to load the correct metadata class based on a flag in the Iceberg config.
- Instead of HdfsParquetPageSource, this PR only shows an example Parquet read code path, which introduces IcebergParquetPageSource, built upon Iceberg's native SeekableInputStream.

So after all these changes, users can even run this connector without a Hadoop or Hive environment. For example, I am able to start the connector and run a read query against my existing Iceberg warehouse on AWS Glue and S3 with the following configurations:
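A hypothetical reconstruction of such a configuration, assembled from the property names visible in this PR (iceberg.catalog-impl, ioImpl, catalogName) and Iceberg 0.11's AWS class names, not the author's exact original:

```properties
connector.name=iceberg
iceberg.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
iceberg.io-impl=org.apache.iceberg.aws.s3.S3FileIO
iceberg.catalog-name=trino
# dummy value, still required by the StaticMetastoreConfig check noted above
hive.metastore.uri=thrift://localhost:9083
```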
Please note that this is a draft for discussion; I did not take time to fix unit tests and only tested directly through SQL in the CLI. If we can agree on the general approach after discussion, I will start to split this draft into smaller PRs for contribution. I will also post some discussion points as comments in the code.