Test Databricks 17.3 LTS in Delta Lake connector #27100
base: master
Conversation
`.github/workflows/ci.yml`

```yaml
DATABRICKS_143_JDBC_URL: ${{ vars.DATABRICKS_143_JDBC_URL }}
DATABRICKS_154_JDBC_URL: ${{ vars.DATABRICKS_154_JDBC_URL }}
DATABRICKS_164_JDBC_URL: ${{ vars.DATABRICKS_164_JDBC_URL }}
DATABRICKS_173_JDBC_URL: jdbc:databricks://dbc-88a53597-967a.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/336535122231371/1025-002500-3q7gmg8y;AuthMech=3;EnableArrow=0
```
TODO: Replace with ${{ vars.DATABRICKS_173_JDBC_URL }} once the var is added.
Hey there - I've reviewed your changes - here's some feedback:
- The extensive duplication of version constants in @Test group annotations suggests centralizing supported Databricks versions into a shared list or helper to simplify future additions or removals.
- SuiteDeltaLakeDatabricks173 currently uses the generic DELTA_LAKE_DATABRICKS group—consider switching to a version‐specific group (DELTA_LAKE_DATABRICKS_173) to clearly scope this suite.
- After renaming the exclusion group from DELTA_LAKE_EXCLUDE_164 to DELTA_LAKE_EXCLUDE_173, double-check that the 16.4 suite still excludes the intended tests and wasn’t inadvertently affected.
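The centralization suggested above could look roughly like the following. This is a hypothetical helper, not existing Trino code, and the names (`DeltaLakeGroups`, `groupsWith`, the lowercase group strings) are illustrative. One caveat: TestNG `@Test(groups = ...)` values must be compile-time constants, so a shared runtime list can only help where group names are assembled programmatically, e.g. in suite definitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep the supported Databricks LTS versions in one place
// so adding 17.3 (or retiring an old LTS) becomes a one-line change.
public class DeltaLakeGroups {
    // Assumed group names mirroring the TestGroups constants in the PR
    static final List<String> DATABRICKS_VERSION_GROUPS = List.of(
            "delta-lake-databricks-133",
            "delta-lake-databricks-143",
            "delta-lake-databricks-154",
            "delta-lake-databricks-164",
            "delta-lake-databricks-173");

    // Suite definitions could combine the shared version list with
    // suite-specific groups instead of repeating every version constant.
    static List<String> groupsWith(String... extra) {
        List<String> groups = new ArrayList<>(DATABRICKS_VERSION_GROUPS);
        List.of(extra).forEach(groups::add);
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupsWith("delta-lake-oss", "profile-specific-tests"));
    }
}
```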
## Individual Comments
### Comment 1
<location> `plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java:255` </location>
<code_context>
{
Map<String, Optional<Object>> allStats = stats.entrySet().stream()
- .filter(entry -> entry.getValue() != null && entry.getValue().isPresent() && !entry.getValue().get().isEmpty())
+ .filter(entry -> entry.getValue() != null && entry.getValue().isPresent() && !entry.getValue().get().isEmpty() && typeForColumn.containsKey(entry.getKey()))
.collect(toImmutableMap(Map.Entry::getKey, entry -> accessor.apply(typeForColumn.get(entry.getKey()), entry.getValue().get())));
</code_context>
<issue_to_address>
**suggestion:** Consider logging or handling missing type mappings for columns.
Silently dropping columns without type mappings may hinder debugging. Logging a warning or explicitly handling these cases will improve maintainability.
</issue_to_address>
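A minimal sketch of that suggestion, using simplified string-valued maps rather than the actual `DeltaLakeParquetStatisticsUtils` types; `filterStats` and its signature are illustrative, not the real method:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

// Sketch: instead of silently dropping stats for columns without a type
// mapping, log a warning so unexpected entries are visible when debugging.
public class StatsFilterSketch {
    private static final Logger LOG = Logger.getLogger(StatsFilterSketch.class.getName());

    static Map<String, String> filterStats(Map<String, Optional<String>> stats, Map<String, String> typeForColumn) {
        Map<String, String> result = new HashMap<>();
        stats.forEach((column, value) -> {
            if (value == null || value.isEmpty() || value.get().isEmpty()) {
                return; // no usable stats value for this column
            }
            if (!typeForColumn.containsKey(column)) {
                // Surface the skip instead of dropping the column silently
                LOG.warning("Skipping stats for column without type mapping: " + column);
                return;
            }
            result.put(column, value.get());
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> stats = new HashMap<>();
        stats.put("lower", Optional.of("1"));
        stats.put("downpart", Optional.of("0")); // partition column, no type mapping
        System.out.println(filterStats(stats, Map.of("lower", "integer")));
    }
}
```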
### Comment 2
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java:57-59` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testNonLowercaseFieldNames()
</code_context>
<issue_to_address>
**question (testing):** Question about missing coverage for Databricks 17.3 LTS in test groups.
Please verify that DELTA_LAKE_DATABRICKS_173 is included in the test groups to cover Databricks 17.3 LTS.
</issue_to_address>
### Comment 3
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeColumnMappingMode.java:244-245` </location>
<code_context>
+ // When setting the table property `delta.columnMapping.mode` on Databricks >= 16.x, it will enable the `delta.feature.generatedColumns`
// feature, which is not supported by Trino.
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_EXCLUDE_164, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS}, dataProvider = "columnMappingDataProvider")
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_EXCLUDE_173, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS}, dataProvider = "columnMappingDataProvider")
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testDeltaColumnMappingMode(String mode)
{
</code_context>
<issue_to_address>
**suggestion:** Suggestion to clarify exclusion rationale for DELTA_LAKE_EXCLUDE_173.
Please add a brief comment describing the reason for excluding DELTA_LAKE_EXCLUDE_173, such as any specific incompatibility or feature issue in 17.3.
```suggestion
// When setting the table property `delta.columnMapping.mode` on Databricks >= 16.x, it will enable the `delta.feature.generatedColumns`
// feature, which is not supported by Trino.
// DELTA_LAKE_EXCLUDE_173 is excluded due to incompatibility with the `delta.feature.generatedColumns` feature in Databricks 17.3,
// which is not supported by Trino.
```
</issue_to_address>
### Comment 4
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeInsertCompatibility.java:48-50` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testNonLowercaseFieldNames()
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in insert compatibility tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS.
Suggested implementation:
```java
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_143;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_154;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_164;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_173;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_OSS;
import static io.trino.tests.product.TestGroups.PROFILE_SPECIFIC_TESTS;
```
```java
@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_DATABRICKS_173, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
```
```java
@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_DATABRICKS_173, PROFILE_SPECIFIC_TESTS})
```
</issue_to_address>
### Comment 5
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeInsertCompatibility.java:84` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testDatabricksUsesCheckpointInterval()
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in partitioned insert compatibility tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS.
```suggestion
@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_DATABRICKS_173, PROFILE_SPECIFIC_TESTS})
```
</issue_to_address>
### Comment 6
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeCaseInsensitiveMapping.java:77-79` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testNonLowercaseFieldNames()
{
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in case-insensitive mapping tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS.
Suggested implementation:
```java
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_143;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_154;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_164;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_173;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_OSS;
import static io.trino.tests.product.TestGroups.PROFILE_SPECIFIC_TESTS;
```
```java
@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_DATABRICKS_173, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
```
</issue_to_address>
### Comment 7
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeChangeDataFeedCompatibility.java:543-545` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testNonLowercaseFieldNames()
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in change data feed compatibility tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS compatibility.
</issue_to_address>
### Comment 8
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeCheckpointsCompatibility.java:280-282` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testDatabricksUsesCheckpointInterval()
{
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in checkpoint compatibility tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS checkpoint compatibility.
Suggested implementation:
```java
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_143;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_154;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_164;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_DATABRICKS_173;
import static io.trino.tests.product.TestGroups.DELTA_LAKE_OSS;
import static io.trino.tests.product.TestGroups.PROFILE_SPECIFIC_TESTS;
```
```java
@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_DATABRICKS_173, PROFILE_SPECIFIC_TESTS})
```
</issue_to_address>
### Comment 9
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeCreateTableAsSelectCompatibility.java:46-48` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testDatabricksUsesCheckpointInterval()
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in CTAS compatibility tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS in CTAS compatibility tests.
</issue_to_address>
### Comment 10
<location> `testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeSelectCompatibility.java:44-46` </location>
<code_context>
}
- @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
+ @Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testNonLowercaseFieldNames()
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test group for Databricks 17.3 LTS in select compatibility tests.
Add DELTA_LAKE_DATABRICKS_173 to the test groups to cover Databricks 17.3 LTS compatibility.
Suggested implementation:
```java
@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_DATABRICKS_133, DELTA_LAKE_DATABRICKS_143, DELTA_LAKE_DATABRICKS_154, DELTA_LAKE_DATABRICKS_164, DELTA_LAKE_DATABRICKS_173, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
```
If there are other select compatibility test methods in this file (e.g., `testNonLowercaseFieldNames` or similar), you should also add `DELTA_LAKE_DATABRICKS_173` to their `@Test(groups = {...})` annotations for full coverage.
</issue_to_address>
```diff
 {
     Map<String, Optional<Object>> allStats = stats.entrySet().stream()
-            .filter(entry -> entry.getValue() != null && entry.getValue().isPresent() && !entry.getValue().get().isEmpty())
+            .filter(entry -> entry.getValue() != null && entry.getValue().isPresent() && !entry.getValue().get().isEmpty() && typeForColumn.containsKey(entry.getKey()))
```
What extra stats deltas are written, are they for complex types?
No, it was an int column.
```sql
CREATE TABLE default.test_173 (lower int, UPPER int, downpart int) USING DELTA PARTITIONED BY (downpart) LOCATION '...';
INSERT INTO default.test_173 VALUES (1, 1, 0), (2, 2, 0), (3, 3, 1);
```

```
$ parquet-tools cat downpart=0/part-00000-31146b9a-9daf-4e82-b055-4dce0911af57.c000.snappy.parquet
[{"Downpart":0,"Lower":1,"UPPER":1},{"Downpart":0,"Lower":2,"UPPER":2}]

$ parquet-tools cat downpart=1/part-00000-7e668519-eb3a-4322-a1e4-8f6ba7b54a7b.c000.snappy.parquet
[{"Downpart":1,"Lower":3,"UPPER":3}]

$ cat 00000000000000000001.json | jq '.add.stats'
null
"{\"numRecords\":2,\"minValues\":{\"lower\":1,\"UPPER\":1},\"maxValues\":{\"lower\":2,\"UPPER\":2},\"nullCount\":{\"lower\":0,\"UPPER\":0},\"tightBounds\":true}"
"{\"numRecords\":1,\"minValues\":{\"lower\":3,\"UPPER\":3},\"maxValues\":{\"lower\":3,\"UPPER\":3},\"nullCount\":{\"lower\":0,\"UPPER\":0},\"tightBounds\":true}"
```

The add entry doesn't contain stats for the partition column, so just skipping should be fine.
Description
Test Databricks 17.3 LTS in Delta Lake connector
Release notes
Summary by Sourcery
Add support for testing Delta Lake connector against Databricks Runtime 17.3 LTS by introducing a new test suite and environment, updating CI and test configurations, enhancing test group definitions, extending product tests, updating documentation, and fixing a Parquet statistics filtering issue to prevent write failures.