-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix failure when reading deep or shallow cloned Delta Lake tables #27098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix failure when reading deep or shallow cloned Delta Lake tables #27098
Conversation
Reviewer's GuideThis PR adjusts the checkpoint metadata validation to accept version 0 on Delta Lake table clones and supplements it with comprehensive unit tests covering valid, zero, and negative version scenarios as well as JSON serialization. Class diagram for updated CheckpointMetadataEntry validationclassDiagram
class CheckpointMetadataEntry {
+long version
+Optional<Map<String, String>> tags
+CheckpointMetadataEntry(long version, Optional<Map<String, String>> tags)
}
CheckpointMetadataEntry : version >= 0 validation
CheckpointMetadataEntry : tags are copied as ImmutableMap
Class diagram for new TestCheckpointMetadataEntry unit testsclassDiagram
class TestCheckpointMetadataEntry {
+testValidVersion()
+testZeroVersion()
+testNegativeVersionThrows()
+testJsonSerialization()
}
TestCheckpointMetadataEntry --> CheckpointMetadataEntry
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/transactionlog/TestCheckpointMetadataEntry.java:55-64` </location>
<code_context>
+ }
+
+ @Test
+ void testInvalidCheckpointMetadataEntry()
+ {
+ @Language("JSON")
+ String jsonWithNegativeVersion = "{\"version\":-1,\"tags\":{\"sidecarNumActions\":\"1\",\"sidecarSizeInBytes\":\"20965\",\"numOfAddFiles\":\"1\",\"sidecarFileSchema\":\"\"}}";
+ assertThatThrownBy(() -> codec.fromJson(jsonWithNegativeVersion))
+ .isInstanceOf(IllegalArgumentException.class)
+ .hasMessageContaining("Invalid JSON string for");
+
+ @Language("JSON")
+ String jsonWithoutTags = "{\"version\":-1}";
+ assertThatThrownBy(() -> codec.fromJson(jsonWithoutTags))
+ .isInstanceOf(IllegalArgumentException.class)
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test for valid CheckpointMetadataEntry with absent 'tags' field.
Please add a test for deserializing a valid CheckpointMetadataEntry with a non-negative version and no 'tags' field to confirm correct handling of this case.
</issue_to_address>
### Comment 2
<location> `plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/transactionlog/TestCheckpointMetadataEntry.java:70-89` </location>
<code_context>
+ }
+
+ @Test
+ void testCheckpointMetadataEntryToJson()
+ {
+ assertThat(codec.toJson(new CheckpointMetadataEntry(
+ 100,
+ Optional.of(ImmutableMap.of(
+ "sidecarNumActions", "1",
+ "sidecarSizeInBytes", "20965",
+ "numOfAddFiles", "1",
+ "sidecarFileSchema", "")))))
+ .isEqualTo("{\n" +
+ " \"version\" : 100,\n" +
+ " \"tags\" : {\n" +
+ " \"sidecarNumActions\" : \"1\",\n" +
+ " \"sidecarSizeInBytes\" : \"20965\",\n" +
+ " \"numOfAddFiles\" : \"1\",\n" +
+ " \"sidecarFileSchema\" : \"\"\n" +
+ " }\n" +
+ "}");
+ }
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for serialization with version 0 and absent 'tags'.
Please add a test case for serializing a CheckpointMetadataEntry with version 0 and no 'tags', and verify the resulting JSON structure.
```suggestion
@Test
void testCheckpointMetadataEntryToJson()
{
assertThat(codec.toJson(new CheckpointMetadataEntry(
100,
Optional.of(ImmutableMap.of(
"sidecarNumActions", "1",
"sidecarSizeInBytes", "20965",
"numOfAddFiles", "1",
"sidecarFileSchema", "")))))
.isEqualTo("{\n" +
" \"version\" : 100,\n" +
" \"tags\" : {\n" +
" \"sidecarNumActions\" : \"1\",\n" +
" \"sidecarSizeInBytes\" : \"20965\",\n" +
" \"numOfAddFiles\" : \"1\",\n" +
" \"sidecarFileSchema\" : \"\"\n" +
" }\n" +
"}");
}
@Test
void testCheckpointMetadataEntryToJsonWithVersionZeroAndNoTags()
{
assertThat(codec.toJson(new CheckpointMetadataEntry(
0,
Optional.empty())))
.isEqualTo("{\n" +
" \"version\" : 0\n" +
"}");
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...lake/src/test/java/io/trino/plugin/deltalake/transactionlog/TestCheckpointMetadataEntry.java
Show resolved
Hide resolved
| @@ -0,0 +1,90 @@ | |||
| /* | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer query based tests to unit tests in this repository.
Please update existing integration tests or product tests instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, integration tests updated
https://trino.io/development/process#pull-request-and-commit-guidelines
|
14dd843 to
d1593a9
Compare
Fixed |
| throws Exception | ||
| { | ||
| testClonedTableWithCheckpointVersionZero("databricks154/clone_checkpoint_version_zero/checkpoint_v1/clone_source"); | ||
| testClonedTableWithCheckpointVersionZero("databricks154/clone_checkpoint_version_zero/checkpoint_v2/cloned_table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, should be databricks154/clone_checkpoint_version_zero/checkpoint_v1/cloned_table ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks! fixed
| { | ||
| testClonedTableWithCheckpointVersionZero("databricks154/clone_checkpoint_version_zero/checkpoint_v1/clone_source"); | ||
| testClonedTableWithCheckpointVersionZero("databricks154/clone_checkpoint_version_zero/checkpoint_v2/cloned_table"); | ||
| testClonedTableWithCheckpointVersionZero("databricks154/clone_checkpoint_version_zero/checkpoint_v2/clone_source"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could remove source table, test cloned source is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just test on my local, seems v1 test case not exercise the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that make sense, as the v1 checkpoint file is in Parquet format, so it won’t go through this JSON deserialization logic. I was just testing v1 incidentally to see if there were any issues. If it’s not needed, I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may just keep it, there’s no harm in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no necessary to add v1 case here, since CheckpointMetadata only allowed in v2 spec https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoint-metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, updated
199baac to
3be6244
Compare
reminder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
| copyDirectoryContents(new File(Resources.getResource(resourceName).toURI()).toPath(), tableLocation); | ||
| assertUpdate("CALL system.register_table(CURRENT_SCHEMA, '%s', '%s')".formatted(tableName, tableLocation.toUri())); | ||
|
|
||
| assertThat(query("SELECT * FROM " + tableName + " ORDER BY id")).matches("VALUES " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move .matches to new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| testClonedTableWithCheckpointVersionZero("databricks154/clone_checkpoint_version_zero/checkpoint_v2/cloned_table"); | ||
| } | ||
|
|
||
| private void testClonedTableWithCheckpointVersionZero(String resourceName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now seems the the method is redundant, we could remove it, put code under testClonedTableWithCheckpointVersionZero()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
3be6244 to
78997c3
Compare
| Data generated using Databricks 15.4: | ||
|
|
||
| ```sql | ||
| CREATE TABLE cloned_table DEEP CLONE source_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a case for shallow clone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe no need for that? as the only difference is the flag isShallow for true or false in the transaction log.
Description
For cloned Delta Lake tables (either deep or shallow clones), the checkpoint version may start at
0.The previous validation in the
CheckpointMetadataEntryconstructor required the version to be positive,which caused the following exception:
Root cause is:
Additional context and related issues
Fixes #27097
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:
Summary by Sourcery
Allow checkpoint metadata version zero for Delta Lake clones and add comprehensive tests for CheckpointMetadataEntry.
Bug Fixes:
Enhancements:
Tests: