
Handle empty Iceberg tables while executing procedures #13582

Merged
findepi merged 1 commit into trinodb:master on Aug 11, 2022


alexjo2144 (Member) commented Aug 9, 2022

Description

If a table was just created, it may not contain any snapshots. Procedures run on tables that do not contain any snapshots can safely do nothing.
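A minimal sketch of that guard, in Java to match the connector code in this PR. `Table` and `Snapshot` are hypothetical stand-ins that only mimic Iceberg's `Table.currentSnapshot()`, which returns `null` when a table has no snapshots; `EmptyTableGuard` and `executeProcedure` are illustrative names, not the connector's actual methods.

```java
/**
 * Sketch of an early-return guard for table procedures such as optimize,
 * expire_snapshots, and remove_orphan_files. Table and Snapshot are
 * hypothetical stand-ins, not the Iceberg API types.
 */
public class EmptyTableGuard {
    // Hypothetical stand-in for an Iceberg snapshot.
    record Snapshot(long snapshotId) {}

    // Hypothetical stand-in for an Iceberg table.
    interface Table {
        Snapshot currentSnapshot(); // null when the table was just created
    }

    /**
     * Runs the procedure body only if the table has a current snapshot.
     * Returns true if the body ran, false if it was skipped.
     */
    static boolean executeProcedure(Table table, Runnable procedureBody) {
        if (table.currentSnapshot() == null) {
            // No snapshots means no history or data to act on: safely do nothing.
            return false;
        }
        procedureBody.run();
        return true;
    }

    public static void main(String[] args) {
        Table empty = () -> null;
        Table populated = () -> new Snapshot(1L);
        boolean ranOnEmpty = executeProcedure(empty, () -> System.out.println("running on empty table"));
        boolean ranOnPopulated = executeProcedure(populated, () -> System.out.println("running on populated table"));
        System.out.println(ranOnEmpty + " " + ranOnPopulated);
    }
}
```

Returning a flag rather than throwing keeps the "nothing to do" case indistinguishable from a successful no-op for the caller, which is the behavior the description asks for.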

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Prevent query failure in the edge case where a table is empty and has no history.

Related issues, pull requests, and links

Related to: #13576

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Fix execution of `optimize`, `expire_snapshots`, and `remove_orphan_files` procedures against tables which have empty snapshot histories.

cla-bot (bot) added the cla-signed label Aug 9, 2022
alexjo2144 requested a review from homar August 9, 2022 22:10
alexjo2144 self-assigned this Aug 9, 2022
alexjo2144 added the bug (Something isn't working) label Aug 9, 2022
@@ -1194,6 +1194,10 @@ public void executeRemoveOrphanFiles(ConnectorSession session, IcebergTableExecu
IcebergConfig.REMOVE_ORPHAN_FILES_MIN_RETENTION,
IcebergSessionProperties.REMOVE_ORPHAN_FILES_MIN_RETENTION);

if (table.currentSnapshot() == null) {
Member

The `remove_orphan_files` procedure still fails on the file metastore due to the missing /data directory. Could you leave a TODO comment?

Member

Why would it be specific to the TESTING_FILE_METASTORE?

Member Author

Do you have a stack trace?

Member

The query and stacktrace:

trino:tpch> CREATE TABLE test (c1 int);
trino:tpch> ALTER TABLE test EXECUTE remove_orphan_files(retention_threshold => '7d');

Query 20220810_221426_00026_hjz7i failed: Failed accessing data for table: tpch.test
io.trino.spi.TrinoException: Failed accessing data for table: tpch.test
	at io.trino.plugin.iceberg.IcebergMetadata.scanAndDeleteInvalidFiles(IcebergMetadata.java:1255)
	at io.trino.plugin.iceberg.IcebergMetadata.removeOrphanFiles(IcebergMetadata.java:1215)
	at io.trino.plugin.iceberg.IcebergMetadata.executeRemoveOrphanFiles(IcebergMetadata.java:1198)
	at io.trino.plugin.iceberg.IcebergMetadata.executeTableExecute(IcebergMetadata.java:1116)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.executeTableExecute(ClassLoaderSafeConnectorMetadata.java:216)
	at io.trino.metadata.MetadataManager.executeTableExecute(MetadataManager.java:345)
	at io.trino.operator.SimpleTableExecuteOperator.getOutput(SimpleTableExecuteOperator.java:128)
	at io.trino.operator.Driver.processInternal(Driver.java:410)
	at io.trino.operator.Driver.lambda$process$10(Driver.java:313)
	at io.trino.operator.Driver.tryWithLock(Driver.java:698)
	at io.trino.operator.Driver.process(Driver.java:305)
	at io.trino.operator.Driver.processForDuration(Driver.java:276)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:740)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:490)
	at io.trino.$gen.Trino_testversion____20220810_221353_71.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.FileNotFoundException: File /var/folders/8s/dkvf18z55lj_9yxhy1n54sph0000gn/T/TrinoTest1711210759979949419/iceberg_data/tpch/test-b3dc0ba83a6542229b672271f21d09eb/data does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:489)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
	at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2072)
	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2071)
	at org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:700)
	at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:2183)
	at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2180)
	at io.trino.plugin.hive.fs.TrinoFileSystemCache$FileSystemWrapper.listFiles(TrinoFileSystemCache.java:386)
	at io.trino.plugin.iceberg.IcebergMetadata.scanAndDeleteInvalidFiles(IcebergMetadata.java:1239)
	... 18 more
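The `Caused by` line shows the orphan-file scan listing the table's `data` directory, which a freshly created table may not have. One way to tolerate that is to treat a missing directory as containing no files. This sketch uses `java.nio.file` as a stand-in for the Hadoop `FileSystem.listFiles` call in the trace; it is not the fix merged in this PR, and `DataDirectoryLister` and `listDataFiles` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Sketch: list files under a table's data directory, returning an empty
 * list when that directory has not been created yet.
 */
public class DataDirectoryLister {
    static List<Path> listDataFiles(Path tableLocation) throws IOException {
        Path dataDir = tableLocation.resolve("data");
        if (!Files.isDirectory(dataDir)) {
            // A table with no snapshots may never have written a data directory.
            return List.of();
        }
        // Files.walk includes the directory itself, so keep regular files only.
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("test-table");
        System.out.println("files before first write: " + listDataFiles(table).size());
        Path dataDir = Files.createDirectories(table.resolve("data"));
        Files.createFile(dataDir.resolve("part-0.parquet"));
        System.out.println("files after first write: " + listDataFiles(table).size());
    }
}
```

Checking for the directory up front leaves genuine I/O failures to surface as before; an alternative is to catch `java.nio.file.NoSuchFileException`, which `Files.walk` throws for a missing start path.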

alexjo2144 force-pushed the iceberg/optimize-empty-tables branch from 7594677 to 9dee7a0 on August 10, 2022 16:21

alexjo2144 (Member Author)

All set, thanks for the reviews

findepi (Member) commented Aug 10, 2022

@alexjo2144 can you please suggest release notes wording?

alexjo2144 (Member Author)

CI hit #13556

alexjo2144 (Member Author)

@findepi added a suggestion to the PR description

findepi merged commit bfb1c63 into trinodb:master Aug 11, 2022
github-actions (bot) added this to the 393 milestone Aug 11, 2022
findepi mentioned this pull request Aug 11, 2022
Labels: bug (Something isn't working), cla-signed

4 participants