Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reading NDV using Iceberg metadata on table with NULL partitions #21844

Merged

Conversation

jinyangli34
Copy link
Contributor

@jinyangli34 jinyangli34 commented May 7, 2024

Description

Use NullableValue.asNull for null values.

Additional context and related issues

Issue can be reproduced when query count distinct on partition with NULL value and optimize_metadata_queries session enabled.

Query 20240507_185319_10808_yp4pm failed: value is null
java.lang.NullPointerException: value is null
	at java.base/java.util.Objects.requireNonNull(Objects.java:259)
	at io.trino.spi.predicate.NullableValue.of(NullableValue.java:68)
	at io.trino.plugin.iceberg.IcebergMetadata.lambda$getTableProperties$8(IcebergMetadata.java:671)
	at com.google.common.collect.CollectCollectors.lambda$toImmutableMap$7(CollectCollectors.java:196)

Release notes

(x) Release notes are required, with the following suggested text:

# Iceberg
* Fix failure when reading tables with `null` on partition columns and 
  `optimize_metadata_queries` session property is enabled. ({issue}`21844`)

@cla-bot cla-bot bot added the cla-signed label May 7, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label May 7, 2024
@jinyangli34 jinyangli34 changed the title Fix reading DNV using Iceberg metadata on table with NULL partitions Fix reading NDV using Iceberg metadata on table with NULL partitions May 8, 2024
@raunaqmorarka raunaqmorarka added the bug Something isn't working label May 8, 2024
@ebyhr

This comment was marked as resolved.

@ebyhr ebyhr force-pushed the jinyang-fix_null_ndv_with_metadata branch from c16bc98 to 20d7a4d Compare May 8, 2024 05:15

return NullableValue.of(column.getType(), prestoValue);
return partitionColumnValueStrings.get(columnId)
.map(value -> deserializePartitionValue(column.getType(), value, column.getName()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add @Nullable to deserializePartitionValue method in a separated commit?

partitionColumnValueStrings.get(columnId).orElse(null),
column.getName());

return NullableValue.of(column.getType(), prestoValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider calling the NullableValue constructor instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don't have a preference. I don't quite understand why it has both constructor and static methods.
Which way do you think is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have different semantics -- NullableValue.of doesn't allow null value

(i with the difference was more obvious from the NullableValue class's api)

Copy link

github-actions bot commented Jun 3, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jun 3, 2024
@mosabua
Copy link
Member

mosabua commented Jun 3, 2024

I assume you are all still collaborating on this.

@github-actions github-actions bot removed the stale label Jun 4, 2024
@jinyangli34 jinyangli34 force-pushed the jinyang-fix_null_ndv_with_metadata branch from 20d7a4d to dcc4645 Compare June 7, 2024 21:47
@jinyangli34
Copy link
Contributor Author

@ebyhr PTAL

testTable));

Session session = Session.builder(getPlanTester().getDefaultSession())
.setSystemProperty("optimize_metadata_queries", "true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raunaqmorarka we've been speaking already on optimize_metadata_queries session property for Iceberg/ Delta Lake offline.
We should figure out a way where it is not necessary for the user to specify explicitly this low level session property to make use of https://trino.io/docs/current/admin/properties-optimizer.html#optimizer-optimize-metadata-queries

@jinyangli34 jinyangli34 force-pushed the jinyang-fix_null_ndv_with_metadata branch from dcc4645 to 94632c3 Compare June 13, 2024 00:11
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix reading DNV using Iceberg metadata on table with NULL partitions

Please change DNV to NDV in the commit title.

@jinyangli34 jinyangli34 force-pushed the jinyang-fix_null_ndv_with_metadata branch from b7715b0 to f2bb093 Compare June 13, 2024 05:18
Copy link

github-actions bot commented Jul 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jul 4, 2024
@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jul 4, 2024
@mosabua
Copy link
Member

mosabua commented Jul 4, 2024

Is this ready to merge @ebyhr @findepi @findinpath .. also cc @cwsteinbach and @alexjo2144

@ebyhr ebyhr merged commit 83183b9 into trinodb:master Jul 5, 2024
43 checks passed
@github-actions github-actions bot added this to the 452 milestone Jul 5, 2024
@mosabua
Copy link
Member

mosabua commented Jul 5, 2024

Thanks @ebyhr !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla-signed iceberg Iceberg connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

6 participants