Add support for partition evolution in Iceberg. #7580

Closed
bitsondatadev opened this issue Apr 13, 2021 · 6 comments · Fixed by #12259

@bitsondatadev
Member

Now that we've updated to Iceberg 0.11.0, we should consider adding support for partition evolution. I spoke with @phd3, who said there may already be some support for reading partition-evolved tables, but we currently don't support the DDL for updating table properties after creation.

Docs: https://iceberg.apache.org/evolution/#partition-evolution

@bitsondatadev
Member Author

bitsondatadev commented Apr 13, 2021

One thing we'll need to discuss is adding syntax for ALTER TABLE.

Looking at the PostgreSQL syntax for CREATE TABLE and ALTER TABLE, we would likely want to add SET ( storage_parameter [= value] [, ... ] ) and RESET ( storage_parameter [, ... ] ).

For a table created like this:

CREATE TABLE distributors (
    did     integer,
    name    varchar(40),
    UNIQUE(name) WITH (fillfactor=70)
)
WITH (
 location='s3a://a/b/c/d',
 partitioning = ARRAY['identity(name)']
);

We could update with

ALTER TABLE distributors SET ( partitioning = ARRAY['day(name)', 'bucket(did, 64)']);
ALTER TABLE distributors RESET (location);

@lxynov lxynov mentioned this issue Apr 14, 2021
93 tasks
@tuppimax

@bitsondatadev @phd3 when you say "there may be some support for reading partition evolved tables", are there any known issues with reading partition-evolved tables? I would be happy to help out.

We use Trino only for reads, while all the writes happen through Spark. I'm trying to understand the gaps that affect our use case.

Thanks in advance

@phd3
Member

phd3 commented May 22, 2021

@tuppimax Currently, there isn't test coverage for tables with partition spec evolution, so adding tests would give us confidence in the read support. However, Trino uses the Iceberg library itself to generate splits based on the provided filters, so split generation should already be partition-spec aware. In addition, the partition pruning optimization and the delete operation require that filters match entire partitions; that code is also aware of partition spec evolution. We still need tests to make sure everything works.

@phd3
Member

phd3 commented May 22, 2021

The tests would need to use Spark to evolve the spec (see io.trino.tests.iceberg.TestSparkCompatibility), since Trino doesn't support that right now. IIRC the spec evolution syntax requires iceberg-spark3-runtime version 0.11.0; if so, we need another release of trinodb/docker-images that includes it.
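
As a rough illustration of what the Spark side of such a test might run, here is a minimal sketch. It assumes Iceberg's Spark SQL extensions are enabled in the Spark session; the catalog and schema names (`iceberg_catalog.test_schema`) are placeholders, and the table and columns are borrowed from the `distributors` example earlier in this thread.

```sql
-- Run in Spark (assumes the Iceberg SQL extensions are enabled).
-- Catalog/schema names are hypothetical placeholders.

-- Add a bucket transform on `did` to the partition spec.
ALTER TABLE iceberg_catalog.test_schema.distributors
  ADD PARTITION FIELD bucket(64, did);

-- Drop the original identity partition field on `name`.
ALTER TABLE iceberg_catalog.test_schema.distributors
  DROP PARTITION FIELD name;
```

Data files written before these statements keep their old partition spec, so a table that receives inserts both before and after the change is the case the Trino read tests would need to cover.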

@Kyo91

Kyo91 commented Jun 15, 2021

@tuppimax See #8284 for an example of an issue with reading partition evolved tables.

@alexjo2144
Member

Sorry, I didn't see this older thread, so I opened a duplicate: #12174. Updating the partitioning of a table using ALTER TABLE ... SET PROPERTIES is implemented in this PR: #12259
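
For readers landing here, a minimal sketch of that form, using Trino's general `ALTER TABLE ... SET PROPERTIES` statement. The table and column names are borrowed from the `distributors` example earlier in this thread and are illustrative only; the exact behavior is defined by #12259, not by this snippet.

```sql
-- Run in Trino against an Iceberg catalog.
-- Table and column names are illustrative, taken from the example above.
ALTER TABLE distributors
  SET PROPERTIES partitioning = ARRAY['bucket(did, 64)'];
```

Note that this differs from the PostgreSQL-style `SET ( ... )` / `RESET ( ... )` syntax proposed earlier in the thread.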

@findepi findepi added this to the 382 milestone May 23, 2022