New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for time travel with version in Delta Lake #21052
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting changes! I have some concerns about the performance of finding the last checkpoint. For object storage and large tables, single threaded listing of everything within /_delta_log
can be very time consuming. I threw together a prototype for time travel using the "list-after" feature of object stores a while ago. It works by starting at the end version, listing the first batch of files after endVersion-500 and less than end version, if there is a checkpoint stop, if not, pick the next batch at endVersion-1000, and so on. The prototype is here: https://github.com/trinodb/trino/compare/master...jkylling:trino:basic-time-travel-delta?expand=1. This performs well on tables with large Delta logs, but require modifying the file system interfaces.
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just skimmed.
We should focus on TransactionLogAccess
to work with only the necessary amount of read operations.
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeBasic.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
...sts/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeTimeTravelCompatibility.java
Show resolved
Hide resolved
...in/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeFileOperations.java
Outdated
Show resolved
Hide resolved
@@ -191,6 +205,72 @@ public TableSnapshot loadSnapshot(ConnectorSession session, SchemaTableName tabl | |||
return snapshot; | |||
} | |||
|
|||
private TableSnapshot loadSnapshotForTimeTravel(TrinoFileSystem fileSystem, SchemaTableName table, String tableLocation, long endVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should actually get the endVersion first, then the previous version , ..., until we find the checkpoint .
I'm thinking about something similar to io.trino.plugin.deltalake.transactionlog.checkpoint.TransactionLogTail#loadNewTail
but backwards.
Doing this would get us to retrieve exactly what we need and not do any unnecessary file listings in the transaction log directory.
24.log
23.log
22.log
21.log
20.log
20.checkpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested logic works for classic checkpoint, but it might be difficult in case of multipart checkpoints and UUID-named checkpoints. We can't estimate the file name in my understanding. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ye, you are right.
It was a naive thought.
We don't know for which key to look for :(
It is worth exploring the startAfter
listing idea for s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually nevermind about startAfter - let's not overengineer yet this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, we can use different logic based on delta.checkpointPolicy
table property in the future.
https://github.com/delta-io/delta/blob/537ed8ee0be983579873851855ba3e96b20004bd/docs/source/table-properties.md?plain=1#L159-L170
b345429
to
99b46c1
Compare
Pls rebase on |
d16def0
to
1f8c458
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is ready to be reviewed (definitely not anymore a draft) from my perspective.
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakePropertiesTable.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
.add(new FileOperation(TRANSACTION_LOG_JSON, "00000000000000000008.json", "InputFile.exists")) | ||
.addCopies(new FileOperation(DATA, "no partition", "InputFile.newInput"), 8) | ||
.build()); | ||
assertFileSystemAccesses("SELECT * FROM " + tableName + " FOR VERSION AS OF 13", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here would be nice to see that there is though a list
operation performed to figure out the checkpoint
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Show resolved
Hide resolved
...ta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAlluxioCacheFileOperations.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
4c0711a
to
8a1bdd1
Compare
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
Pls rebase (and squash) to resolve the conflicts. |
9bc6a46
to
2ba73c9
Compare
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
2ba73c9
to
7238857
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. some comments.
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Outdated
Show resolved
Hide resolved
...in/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeFileOperations.java
Outdated
Show resolved
Hide resolved
...in/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeFileOperations.java
Show resolved
Hide resolved
...ta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAlluxioCacheFileOperations.java
Show resolved
Hide resolved
...ta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAlluxioCacheFileOperations.java
Show resolved
Hide resolved
72f2705
to
a4bbba5
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Outdated
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Outdated
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nits.
The one relevant comment is about versioned table check in the reconciliation logic.
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeConnectorTest.java
Outdated
Show resolved
Hide resolved
a4bbba5
to
dfa774e
Compare
Addressed comments. |
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Outdated
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestTransactionLogAccess.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
dfa774e
to
7c1f565
Compare
...lta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
7c1f565
to
4928540
Compare
...lta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
4928540
to
5a06a1b
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeTableHandle.java
Show resolved
Hide resolved
...delta-lake/src/main/java/io/trino/plugin/deltalake/procedure/DropExtendedStatsProcedure.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Outdated
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogAccess.java
Show resolved
Hide resolved
5a06a1b
to
95bfba7
Compare
@wendigo Thanks for your review. Addressed comments. |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Show resolved
Hide resolved
LGTM % single nit. Please squash fixups before merging @ebyhr |
Co-authored-by: Vikash Kumar <vksh1612@gmail.com>
95bfba7
to
837f174
Compare
Description
Support version as time travel in Delta
Fixes #15894
Release notes
(x) Release notes are required, with the following suggested text: