Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode paths as URIs in delta-lake connector #15201

Merged
merged 1 commit into from
Dec 13, 2022

Conversation

jkylling
Copy link
Contributor

@jkylling jkylling commented Nov 25, 2022

Description

Fixes #15183

The sample uri delta table was obtained by copying databricks/bar and changing the name of the parquet files and modifying the log entries.

Additional context and related issues

Fixes #15183

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Fix failure when the path contains special characters. ({issue}`15183`)

@cla-bot
Copy link

cla-bot bot commented Nov 25, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

1 similar comment
@cla-bot
Copy link

cla-bot bot commented Nov 25, 2022

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

public void testPathUriDecoding()
{
String tableName = "test_uri_table_" + randomNameSuffix();
registerTableFromResources(tableName, "databricks/uri", getQueryRunner());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test doesn't prove the Delta change is correct impl of Delta spec, since it can be both our writer and our reader are wrong.

@ebyhr do you know which test class would be the best for this type of a test?
we have a few TestDeltaLakeDatabricks*Compatibility, but none seem applicable to me

Copy link
Member

@ebyhr ebyhr Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a good existing class either. Adding a new product test class looks good to me.

@jkylling Can you try adding a product test? Let us know if you need help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of test coverage of this (as witnessed by the test failures in 1ca31c1). The new test added only uses the reader, and would fail with the old implementation.

I've tried adding a product test in 7a51388

@ebyhr
Copy link
Member

ebyhr commented Dec 9, 2022

Rebased on upstream since check-commits-dispatcher has been failing.

@ebyhr
Copy link
Member

ebyhr commented Dec 9, 2022

/test-with-secrets sha=df6138fba5b3544dbeb092752c05c6cde457798f

@github-actions
Copy link

github-actions bot commented Dec 9, 2022

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/3653870944

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for comments. Please squash commits into one.

@ebyhr
Copy link
Member

ebyhr commented Dec 10, 2022

The new test is failing on OSS Delta Lake.

2022-12-09T21:16:22.1791177Z tests               | 2022-12-10 03:01:22 INFO: FAILURE     /    io.trino.tests.product.deltalake.TestDeltaLakeDatabricksSelectCompatibility.testPartitionedSelectSpecialCharacters (Groups: profile_specific_tests, delta-lake-databricks, delta-lake-oss) took 9.0 seconds
2022-12-09T21:16:22.1792228Z tests               | 2022-12-10 03:01:22 SEVERE: Failure cause:
2022-12-09T21:16:22.1792737Z tests               | java.lang.AssertionError: Expected row count to be <1>, but was <0>; rows=[]
2022-12-09T21:16:22.1794078Z tests               | 	at io.trino.tests.product.deltalake.TestDeltaLakeDatabricksSelectCompatibility.testPartitionedSelectSpecialCharacters(TestDeltaLakeDatabricksSelectCompatibility.java:69)
2022-12-09T21:16:22.1795317Z tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-12-09T21:16:22.1796063Z tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2022-12-09T21:16:22.1796969Z tests               | 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-12-09T21:16:22.1797550Z tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2022-12-09T21:16:22.1798212Z tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2022-12-09T21:16:22.1798875Z tests               | 	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
2022-12-09T21:16:22.1799471Z tests               | 	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
2022-12-09T21:16:22.1800058Z tests               | 	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
2022-12-09T21:16:22.1800617Z tests               | 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
2022-12-09T21:16:22.1801281Z tests               | 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
2022-12-09T21:16:22.1804805Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2022-12-09T21:16:22.1805688Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2022-12-09T21:16:22.1806301Z tests               | 	at java.base/java.lang.Thread.run(Thread.java:833)

@trinodb trinodb deleted a comment from cla-bot bot Dec 10, 2022
@trinodb trinodb deleted a comment from cla-bot bot Dec 10, 2022
@trinodb trinodb deleted a comment from cla-bot bot Dec 10, 2022
@jkylling
Copy link
Contributor Author

jkylling commented Dec 10, 2022

The new test is failing on OSS Delta Lake.

2022-12-09T21:16:22.1791177Z tests               | 2022-12-10 03:01:22 INFO: FAILURE     /    io.trino.tests.product.deltalake.TestDeltaLakeDatabricksSelectCompatibility.testPartitionedSelectSpecialCharacters (Groups: profile_specific_tests, delta-lake-databricks, delta-lake-oss) took 9.0 seconds
2022-12-09T21:16:22.1792228Z tests               | 2022-12-10 03:01:22 SEVERE: Failure cause:
2022-12-09T21:16:22.1792737Z tests               | java.lang.AssertionError: Expected row count to be <1>, but was <0>; rows=[]
2022-12-09T21:16:22.1794078Z tests               | 	at io.trino.tests.product.deltalake.TestDeltaLakeDatabricksSelectCompatibility.testPartitionedSelectSpecialCharacters(TestDeltaLakeDatabricksSelectCompatibility.java:69)
2022-12-09T21:16:22.1795317Z tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-12-09T21:16:22.1796063Z tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2022-12-09T21:16:22.1796969Z tests               | 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-12-09T21:16:22.1797550Z tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2022-12-09T21:16:22.1798212Z tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2022-12-09T21:16:22.1798875Z tests               | 	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
2022-12-09T21:16:22.1799471Z tests               | 	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
2022-12-09T21:16:22.1800058Z tests               | 	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
2022-12-09T21:16:22.1800617Z tests               | 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
2022-12-09T21:16:22.1801281Z tests               | 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
2022-12-09T21:16:22.1804805Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
2022-12-09T21:16:22.1805688Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
2022-12-09T21:16:22.1806301Z tests               | 	at java.base/java.lang.Thread.run(Thread.java:833)

The $path syntax is not supported on Spark, and it was even escaped in ", so we got the string literal '$path'. I've changed it to use input_file_name. However, I tested this on Databricks, and it looks like input_file_name returns the URI encoded path (created delta-io/delta#1517 for this), so the test will likely continue to fail. We can probably skip comparing with the filename obtained from Spark here, and only check that Trino paths are internally compatible?

Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove .crc files.

@ebyhr
Copy link
Member

ebyhr commented Dec 12, 2022

/test-with-secrets sha=b63a9780e3174e4de3a96bdd1f5b48c76d4acc19

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/3671461022

@ebyhr
Copy link
Member

ebyhr commented Dec 12, 2022

I will investigate TestDeltaLakeAdlsConnectorSmokeTest.testPathUriDecoding failure.

@ebyhr
Copy link
Member

ebyhr commented Dec 13, 2022

/test-with-secrets sha=edb11bea6c15a548e80a85dd5dedb4c176e80067
https://github.com/trinodb/trino/actions/runs/3681717402

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/3681717402

@ebyhr ebyhr merged commit 58bd159 into trinodb:master Dec 13, 2022
@ebyhr ebyhr mentioned this pull request Dec 13, 2022
@github-actions github-actions bot added this to the 404 milestone Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Delta-lake connector assumes paths to be URL encoded, but should be using URI encoding
3 participants