Store NDV stats in Iceberg Puffin statistics file #15400

Merged 4 commits on Dec 16, 2022

Member

@findepi findepi commented Dec 14, 2022

Iceberg 1.1.0 brought a standard way of storing and sharing statistics
information. This commit makes use of it, deprecating statistics stored
in Trino-specific table properties.

Contributor

yittg commented Dec 15, 2022

Hi @findepi, I have a question: is there a follow-up change planned for computing an incremental NDV sketch and merging it with the previously stored sketch? Thanks.

Member Author

findepi commented Dec 15, 2022

Yes, @yittg, we will open a PR with incremental sketch updates soonish.
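
For readers unfamiliar with why an NDV sketch can be updated incrementally, here is a minimal illustration of the mergeability property the question relies on. It is a toy K-Minimum-Values sketch written for this note, not the Apache DataSketches Theta sketch that the Puffin spec names for NDV blobs, and not code from this PR; the class name `KmvSketch` and all of its methods are hypothetical.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Toy K-Minimum-Values (KMV) distinct-count sketch, for illustration only.
// Assumption: this is a stand-in for a real NDV sketch, not Trino's implementation.
final class KmvSketch {
    private static final Comparator<Long> UNSIGNED = Long::compareUnsigned;

    private final int k;
    // The k smallest hash values seen so far, ordered as unsigned 64-bit integers.
    private final TreeSet<Long> smallest = new TreeSet<>(UNSIGNED);

    KmvSketch(int k) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2");
        }
        this.k = k;
    }

    // SplitMix64 finalizer: a bijective mix that spreads inputs over 64 bits.
    private static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        offer(mix(value));
    }

    private void offer(long hash) {
        smallest.add(hash);
        if (smallest.size() > k) {
            smallest.pollLast(); // drop the largest hash, keep the k smallest
        }
    }

    // Merging keeps the k smallest hashes of the union, so a sketch of new data
    // can be combined with a stored sketch without rescanning the old data.
    KmvSketch merge(KmvSketch other) {
        KmvSketch merged = new KmvSketch(Math.min(k, other.k));
        smallest.forEach(merged::offer);
        other.smallest.forEach(merged::offer);
        return merged;
    }

    double estimate() {
        if (smallest.size() < k) {
            return smallest.size(); // below capacity the count is exact
        }
        // Map the k-th smallest hash into (0, 1] and apply the KMV estimator.
        double kth = ((smallest.last() >>> 11) + 1) * 0x1.0p-53;
        return (k - 1) / kth;
    }
}
```

Merging a sketch of rows 0..4999 with a sketch of rows 2500..9999 yields the same state as sketching rows 0..9999 in one pass, which is the property an incremental ANALYZE would depend on.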

@findepi findepi force-pushed the findepi/iceberg-analyze-puffin-v3 branch from ea68ce7 to 5ab0204 Compare December 16, 2022 11:54
Member Author

findepi commented Dec 16, 2022

Thanks for the review!

AC

@findepi findepi merged commit 2f0b1d2 into trinodb:master Dec 16, 2022
@findepi findepi deleted the findepi/iceberg-analyze-puffin-v3 branch December 16, 2022 22:59
@findepi findepi added the no-release-notes This pull request does not require release notes entry label Dec 16, 2022
@github-actions github-actions bot added this to the 404 milestone Dec 16, 2022

// TODO (https://github.com/trinodb/trino/issues/15397): remove support for Trino-specific statistics properties
// Drop all stats. Empty table needs none
UpdateProperties updateProperties = transaction.updateProperties();
Member

Looks like we have two blocks to remove the old properties if they exist. Would be better to have one at the top before the empty snapshotId check. But if we're going to remove this soon anyway, probably doesn't matter

Member Author

> But if we're going to remove this soon anyway, probably doesn't matter

exactly

{
if (!isExtendedStatisticsEnabled(session)) {
return ImmutableMap.of();
}

ImmutableMap.Builder<Integer, Long> ndvByColumnId = ImmutableMap.builder();
icebergTable.properties().forEach((key, value) -> {
Set<Integer> remainingColumnIds = new HashSet<>(columnIds);
Member

If you're reading a column that's a projected dereference, will it show up here? Might need to filter those out if they don't have NDV stats, otherwise they'll always stick around in the remaining set.

Member Author

> If you're reading a column that's a projected dereference, will it show up here?

I think this looks only at top-level primitive fields; at least that was the intention.

Member Author

Also, note that:

ImmutableSet.of(), // projectedColumns don't affect stats

Member

Thanks. Looking a few layers further, these are definitely only base columns. I don't think they're necessarily primitives, though; they could include struct/map/array types. Do we collect NDVs for those during analyze?
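
To make the concern in this thread concrete, the following is a rough sketch of the "remaining column IDs" pattern under discussion: prefer NDVs from the statistics file, then fall back to legacy table properties for whatever is still unresolved. It is an assumption about the shape of the logic, not the PR's actual code; `NdvLookupSketch`, `readNdvs`, and the `example.stats.ndv.` property prefix are all hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Illustrative only: names and the legacy property format are hypothetical.
final class NdvLookupSketch {
    // Hypothetical legacy key format: "<prefix><columnId>".
    static final String LEGACY_NDV_PREFIX = "example.stats.ndv.";

    static Map<Integer, Long> readNdvs(
            Set<Integer> columnIds,
            Map<Integer, Long> puffinNdvs,
            Map<String, String> tableProperties)
    {
        Map<Integer, Long> ndvByColumnId = new HashMap<>();
        Set<Integer> remainingColumnIds = new HashSet<>(columnIds);

        // Pass 1: take NDVs found in the statistics file and mark those columns resolved.
        for (Iterator<Integer> ids = remainingColumnIds.iterator(); ids.hasNext(); ) {
            Integer id = ids.next();
            Long ndv = puffinNdvs.get(id);
            if (ndv != null) {
                ndvByColumnId.put(id, ndv);
                ids.remove();
            }
        }

        // Pass 2: consult legacy properties only for columns that are still unresolved.
        for (Integer id : remainingColumnIds) {
            String value = tableProperties.get(LEGACY_NDV_PREFIX + id);
            if (value != null) {
                ndvByColumnId.put(id, Long.parseLong(value));
            }
        }

        // Columns with no NDV anywhere (for example, types that ANALYZE does not
        // collect NDVs for) are simply absent from the result.
        return ndvByColumnId;
    }
}
```

In this shape, a column that never has NDV stats stays in the remaining set through both passes, which is harmless as long as the set is only used to drive the fallback lookup.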

Labels
cla-signed docs no-release-notes This pull request does not require release notes entry

6 participants