Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iceberg reuse information from table handle #14079

Merged

Conversation

findinpath
Copy link
Contributor

Description

Instead of loading the table from the catalog, reuse as much as possible the information already packed into the IcebergTableHandle

Non-technical explanation

Internal optimizations

Release notes

(x) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@findepi
Copy link
Member

findepi commented Sep 12, 2022

Iceberg tests are failing

@findinpath findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from a31ce06 to b08b9da Compare September 12, 2022 11:10
@findinpath
Copy link
Contributor Author

Related discussion:

#14076 (comment)

@findinpath findinpath force-pushed the iceberg-reuse-information-from-table-handle branch from 42d1bdd to ad2752e Compare September 14, 2022 11:51
@@ -695,8 +695,11 @@ public Optional<ConnectorOutputMetadata> finishCreateTable(ConnectorSession sess
public Optional<ConnectorTableLayout> getInsertLayout(ConnectorSession session, ConnectorTableHandle tableHandle)
{
IcebergTableHandle table = (IcebergTableHandle) tableHandle;
Table icebergTable = catalog.loadTable(session, table.getSchemaTableName());
return getWriteLayout(icebergTable.schema(), icebergTable.spec(), false);
Schema schema = SchemaParser.fromJson(table.getTableSchemaJson());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaParser.fromJson isn't very expensive because it's cached.
If it was not cached, it could be expensive.

catalog.loadTable isn't very expensive because it's cached.
if it was not cached, we could have query consistency issues (like one query accessing same table twice, and reading different versions)

how do we assess which one is actually better?
it sounds that in any case we rely on some (hidden) caching taking place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find SchemaParser.fromJson(tableHandle.getSchemaJson()) less hidden than the loadTable option. The fact that it's fast is hidden but the fact that it's the correct schema is obvious. To me the loadTable option needs an extra step to reason that it is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but don't we rely on caching catalog.loadTable anyway?
e.g. we want to read consistent table version in self-join query case.
So this is not only about schemas (which we can and do carry in the table handle), but also about other state.

I think i tend to agree that -- if we carry info in a table handle, this is "the version of information" to be used

@findepi findepi merged commit 6acfd82 into trinodb:master Sep 19, 2022
@github-actions github-actions bot added this to the 397 milestone Sep 19, 2022
@findinpath findinpath self-assigned this Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants