Iceberg reuse information from table handle #14079

findinpath · 2022-09-09T15:02:05Z

Description

Instead of loading the table from the catalog, reuse as much as possible of the information already packed into the IcebergTableHandle.

Non-technical explanation

Internal optimizations

Release notes

(x) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

findepi · 2022-09-12T08:07:53Z

Iceberg tests are failing

findinpath · 2022-09-12T15:40:53Z

Related discussion:

#14076 (comment)

findepi · 2022-09-16T20:59:19Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

@@ -695,8 +695,11 @@ public Optional<ConnectorOutputMetadata> finishCreateTable(ConnectorSession sess
    public Optional<ConnectorTableLayout> getInsertLayout(ConnectorSession session, ConnectorTableHandle tableHandle)
    {
        IcebergTableHandle table = (IcebergTableHandle) tableHandle;
-        Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());
-        return getWriteLayout(icebergTable.schema(), icebergTable.spec(), false);
+        Schema schema = SchemaParser.fromJson(table.getTableSchemaJson());


SchemaParser.fromJson isn't very expensive because it's cached.
If it were not cached, it could be expensive.

catalog.loadTable isn't very expensive because it's cached.
If it were not cached, we could have query consistency issues (for example, one query accessing the same table twice and reading different versions).

How do we assess which one is actually better?
It sounds like in either case we rely on some (hidden) caching taking place.
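
To make the "hidden caching" point concrete, here is a minimal sketch of a memoizing parser. It is not Iceberg's actual SchemaParser code; the class name MemoizingParser and its parseCount counter are hypothetical and exist only to illustrate that repeated calls with the same JSON string pay the parsing cost once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical memoizing wrapper; an illustration, not Iceberg's SchemaParser.
final class MemoizingParser<T>
{
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final AtomicInteger parseCount = new AtomicInteger();
    private final Function<String, T> parser;

    MemoizingParser(Function<String, T> parser)
    {
        this.parser = parser;
    }

    T fromJson(String json)
    {
        // The (potentially expensive) parse runs only on the first request for a given JSON string.
        return cache.computeIfAbsent(json, key -> {
            parseCount.incrementAndGet();
            return parser.apply(key);
        });
    }

    // Number of times the underlying parser was actually invoked.
    int parseCount()
    {
        return parseCount.get();
    }
}
```

A caller cannot tell from the call site whether the second fromJson call is cheap; that depends entirely on the cache behind it, which is the concern raised above.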

Personally, I find SchemaParser.fromJson(table.getTableSchemaJson()) less hidden than the loadTable option. The fact that it's fast is hidden, but the fact that it's the correct schema is obvious. To me, the loadTable option needs an extra step of reasoning to show that it is correct.

Okay, but don't we rely on caching catalog.loadTable anyway?
E.g. we want to read a consistent table version in the self-join query case.
So this is not only about schemas (which we can and do carry in the table handle), but also about other state.

I think I tend to agree that if we carry info in a table handle, this is "the version of information" to be used.
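
The consistency argument can be sketched as follows. This is an illustration under stated assumptions, not Trino code: FakeCatalog and HandleSketch are hypothetical stand-ins for the catalog and for IcebergTableHandle, and the schema is reduced to an opaque string. The handle captures the schema once when it is resolved, so later commits to the catalog do not change what the query sees through it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a catalog whose tables can change while a query is running.
final class FakeCatalog
{
    private final Map<String, String> schemaJsonByTable = new ConcurrentHashMap<>();

    void commit(String tableName, String schemaJson)
    {
        schemaJsonByTable.put(tableName, schemaJson);
    }

    // Without caching, every call observes the latest committed schema.
    String loadSchemaJson(String tableName)
    {
        return schemaJsonByTable.get(tableName);
    }
}

// Hypothetical stand-in for IcebergTableHandle: immutable, captures the schema once.
final class HandleSketch
{
    private final String tableName;
    private final String tableSchemaJson;

    private HandleSketch(String tableName, String tableSchemaJson)
    {
        this.tableName = tableName;
        this.tableSchemaJson = tableSchemaJson;
    }

    // Resolves the handle by reading the catalog exactly once.
    static HandleSketch resolve(FakeCatalog catalog, String tableName)
    {
        return new HandleSketch(tableName, catalog.loadSchemaJson(tableName));
    }

    String getTableName()
    {
        return tableName;
    }

    String getTableSchemaJson()
    {
        return tableSchemaJson;
    }
}
```

In this sketch, a commit that lands after resolve is visible through FakeCatalog.loadSchemaJson but not through the handle, which is why the handle's copy can be treated as the version to use for the rest of the query.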

cla-bot bot added the cla-signed label Sep 9, 2022

findinpath requested review from alexjo2144 and findepi September 9, 2022 15:02

findinpath mentioned this pull request Sep 9, 2022

Use table schema from the table handle #14076

Merged

findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from 11e63d1 to a31ce06 Compare September 9, 2022 20:10

Reuse already computed schema

a41da26

findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from a31ce06 to b08b9da Compare September 12, 2022 11:10

findepi reviewed Sep 14, 2022

findinpath added 2 commits September 14, 2022 13:18

Avoid reloading the Iceberg table

5afc63f

Avoid reloading the Iceberg table for optimize

ad2752e

findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from 42d1bdd to ad2752e Compare September 14, 2022 11:51

empty

139e649

findepi approved these changes Sep 16, 2022

findepi reviewed Sep 16, 2022

alexjo2144 approved these changes Sep 19, 2022

findepi merged commit 6acfd82 into trinodb:master Sep 19, 2022

github-actions bot added this to the 397 milestone Sep 19, 2022

findinpath self-assigned this Sep 20, 2022

colebow mentioned this pull request Sep 20, 2022

Add Trino 397 release notes #14194

Merged
