Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for time travel with version in Delta Lake #21052

Merged
merged 2 commits into from Apr 9, 2024

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Mar 13, 2024

Description

Support version as time travel in Delta
Fixes #15894

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Add support for time travel with versions. ({issue}`issuenumber`)

@ebyhr ebyhr self-assigned this Mar 13, 2024
@cla-bot cla-bot bot added the cla-signed label Mar 13, 2024
@github-actions github-actions bot added docs tests:hive delta-lake Delta Lake connector labels Mar 13, 2024
@findinpath findinpath requested a review from pajaks March 13, 2024 19:08
Copy link
Contributor

@jkylling jkylling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting changes! I have some concerns about the performance of finding the last checkpoint. For object storage and large tables, single threaded listing of everything within /_delta_log can be very time consuming. I threw together a prototype for time travel using the "list-after" feature of object stores a while ago. It works by starting at the end version, listing the first batch of files after endVersion-500 and less than end version, if there is a checkpoint stop, if not, pick the next batch at endVersion-1000, and so on. The prototype is here: https://github.com/trinodb/trino/compare/master...jkylling:trino:basic-time-travel-delta?expand=1. This performs well on tables with large Delta logs, but require modifying the file system interfaces.

Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed.
We should focus on TransactionLogAccess to work with only the necessary amount of read operations.

@@ -191,6 +205,72 @@ public TableSnapshot loadSnapshot(ConnectorSession session, SchemaTableName tabl
return snapshot;
}

private TableSnapshot loadSnapshotForTimeTravel(TrinoFileSystem fileSystem, SchemaTableName table, String tableLocation, long endVersion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually get the endVersion first, then the previous version , ..., until we find the checkpoint .

I'm thinking about something similar to io.trino.plugin.deltalake.transactionlog.checkpoint.TransactionLogTail#loadNewTail but backwards.

Doing this would get us to retrieve exactly what we need and not do any unnecessary file listings in the transaction log directory.

24.log
23.log
22.log
21.log
20.log
20.checkpoint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested logic works for classic checkpoint, but it might be difficult in case of multipart checkpoints and UUID-named checkpoints. We can't estimate the file name in my understanding. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ye, you are right.
It was a naive thought.
We don't know for which key to look for :(

It is worth exploring the startAfter listing idea for s3

https://github.com/delta-io/delta/blob/fe3c3c05bf2656ab45b0a4e66fc010df13f5ed92/spark/src/main/scala/org/apache/spark/sql/delta/Checkpoints.scala#L426-L434

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually nevermind about startAfter - let's not overengineer yet this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we can use different logic based on delta.checkpointPolicy table property in the future.
https://github.com/delta-io/delta/blob/537ed8ee0be983579873851855ba3e96b20004bd/docs/source/table-properties.md?plain=1#L159-L170

@findinpath
Copy link
Contributor

Pls rebase on master to address the conflicts.

@ebyhr ebyhr force-pushed the ebi/delta-timetravel branch 2 times, most recently from d16def0 to 1f8c458 Compare March 16, 2024 00:25
Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is ready to be reviewed (definitely not anymore a draft) from my perspective.

.add(new FileOperation(TRANSACTION_LOG_JSON, "00000000000000000008.json", "InputFile.exists"))
.addCopies(new FileOperation(DATA, "no partition", "InputFile.newInput"), 8)
.build());
assertFileSystemAccesses("SELECT * FROM " + tableName + " FOR VERSION AS OF 13",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here would be nice to see that there is though a list operation performed to figure out the checkpoint

@ebyhr ebyhr force-pushed the ebi/delta-timetravel branch 2 times, most recently from 4c0711a to 8a1bdd1 Compare March 22, 2024 06:17
@ebyhr ebyhr marked this pull request as ready for review March 22, 2024 09:24
@findinpath
Copy link
Contributor

Pls rebase (and squash) to resolve the conflicts.

@ebyhr ebyhr force-pushed the ebi/delta-timetravel branch 2 times, most recently from 9bc6a46 to 2ba73c9 Compare March 25, 2024 08:14
Copy link
Contributor

@krvikash krvikash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. some comments.

Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits.

The one relevant comment is about versioned table check in the reconciliation logic.

@ebyhr
Copy link
Member Author

ebyhr commented Apr 3, 2024

Addressed comments.

@ebyhr
Copy link
Member Author

ebyhr commented Apr 8, 2024

@wendigo Thanks for your review. Addressed comments.

@ebyhr ebyhr requested a review from wendigo April 9, 2024 09:35
@wendigo
Copy link
Contributor

wendigo commented Apr 9, 2024

LGTM % single nit. Please squash fixups before merging @ebyhr

Co-authored-by: Vikash Kumar <vksh1612@gmail.com>
@ebyhr ebyhr merged commit 760e55d into trinodb:master Apr 9, 2024
4 of 15 checks passed
@ebyhr ebyhr deleted the ebi/delta-timetravel branch April 9, 2024 10:08
@github-actions github-actions bot added this to the 445 milestone Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Support time travel in Delta Lake connector
6 participants