New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iceberg reuse information from table handle #14079
Iceberg reuse information from table handle #14079
Conversation
11e63d1
to
a31ce06
Compare
Iceberg tests are failing |
a31ce06
to
b08b9da
Compare
Related discussion: |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
42d1bdd
to
ad2752e
Compare
@@ -695,8 +695,11 @@ public Optional<ConnectorOutputMetadata> finishCreateTable(ConnectorSession sess | |||
public Optional<ConnectorTableLayout> getInsertLayout(ConnectorSession session, ConnectorTableHandle tableHandle) | |||
{ | |||
IcebergTableHandle table = (IcebergTableHandle) tableHandle; | |||
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName()); | |||
return getWriteLayout(icebergTable.schema(), icebergTable.spec(), false); | |||
Schema schema = SchemaParser.fromJson(table.getTableSchemaJson()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SchemaParser.fromJson
isn't very expensive because it's cached.
If it was not cached, it could be expensive.
catalog.loadTable
isn't very expensive because it's cached.
if it was not cached, we could have query consistency issues (like one query accessing same table twice, and reading different versions)
how do we assess which one is actually better?
it sounds that in any case we rely on some (hidden) caching taking place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I find SchemaParser.fromJson(tableHandle.getSchemaJson())
less hidden than the loadTable
option. The fact that it's fast is hidden but the fact that it's the correct schema is obvious. To me the loadTable option needs an extra step to reason that it is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but don't we rely on caching catalog.loadTable
anyway?
e.g. we want to read consistent table version in self-join query case.
So this is not only about schemas (which we can and do carry in the table handle), but also about other state.
I think i tend to agree that -- if we carry info in a table handle, this is "the version of information" to be used
Description
Instead of loading the table from the catalog, reuse as much as possible the information already packed into the
IcebergTableHandle
Non-technical explanation
Internal optimizations
Release notes
(x) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: