Flaky "Communication link failure" in Databricks tests #14391

Closed
ebyhr opened this issue Sep 30, 2022 · 7 comments

@ebyhr
Member

ebyhr commented Sep 30, 2022

tests               | 2022-09-30 10:07:20 INFO: FAILURE     /    io.trino.tests.product.deltalake.TestHiveAndDeltaLakeRedirect.testViewReferencingHiveAndDeltaTable [false] (Groups: profile_specific_tests, delta-lake-databricks, delta-lake-oss) took 22.1 seconds
tests               | 2022-09-30 10:07:20 SEVERE: Failure cause:
tests               | io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$3.lambda$executeQuery$0(QueryExecutors.java:141)
tests               | 	at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
tests               | 	at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
tests               | 	at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
tests               | 	at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
tests               | 	at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$3.executeQuery(QueryExecutors.java:141)
tests               | 	at io.trino.tests.product.deltalake.TestHiveAndDeltaLakeRedirect.testViewReferencingHiveAndDeltaTable(TestHiveAndDeltaLakeRedirect.java:842)
tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
tests               | 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
tests               | 	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
tests               | 	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
tests               | 	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
tests               | 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
tests               | 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:833)
tests               | Caused by: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
tests               | 	at com.databricks.client.hivecommon.api.HS2Client.handleTTransportException(Unknown Source)
tests               | 	at com.databricks.client.spark.jdbc.DowloadableFetchClient.handleTTransportException(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.api.HS2Client.executeStatement(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeRowCountQueryHelper(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
tests               | 	at com.databricks.client.jdbc.common.SStatement.executeNoParams(Unknown Source)
tests               | 	at com.databricks.client.jdbc.common.BaseStatement.execute(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.jdbc42.Hive42Statement.execute(Unknown Source)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:128)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
tests               | 	... 22 more
tests               | 	Suppressed: java.lang.Exception: Query: DROP TABLE IF EXISTS test_view_delta_region_table_1lfsdt7kbxy8
tests               | 		at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
tests               | 		... 23 more
tests               | Caused by: com.databricks.client.support.exceptions.ErrorException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown.
tests               | 	... 33 more
tests               | Caused by: com.databricks.client.jdbc42.internal.apache.thrift.transport.TTransportException: HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503, Error message: Unknown
tests               | 	at com.databricks.client.hivecommon.HttpRetrySettings.shouldRetry(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.api.HS2ClientWrapper.shouldReexecuteRequest(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.api.HS2ClientWrapper.executeWithRetry(Unknown Source)
tests               | 	at com.databricks.client.hivecommon.api.HS2ClientWrapper.ExecuteStatement(Unknown Source)
tests               | 	... 31 more

The Databricks JDBC driver has TemporarilyUnavailableRetry and TemporarilyUnavailableRetryTimeout properties, but it seems the server returned the 503 response without a Retry-After header.
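
For reference, a minimal sketch of how these two properties could be set on the connection URL. The host, the httpPath value, the 600-second timeout, and the class and method names are placeholders invented for illustration; they are not taken from the Trino test setup. Authentication properties are omitted. The `jdbc:databricks://` URL shape and the 0/1 value for TemporarilyUnavailableRetry are my assumptions about the driver, so check them against the driver documentation. As the comment above suggests, these settings may not take effect when the 503 arrives without a Retry-After header.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: builds a Databricks JDBC URL string with the retry properties set.
// Host and httpPath are placeholders; no connection is opened here.
final class DatabricksJdbcUrlSketch
{
    private DatabricksJdbcUrlSketch() {}

    static String buildUrl(String host, String httpPath, boolean retryOn503, int retryTimeoutSeconds)
    {
        // LinkedHashMap keeps the properties in insertion order
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("httpPath", httpPath);
        properties.put("TemporarilyUnavailableRetry", retryOn503 ? "1" : "0");
        properties.put("TemporarilyUnavailableRetryTimeout", String.valueOf(retryTimeoutSeconds));

        String suffix = properties.entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining(";"));
        return "jdbc:databricks://" + host + ":443;" + suffix;
    }

    public static void main(String[] args)
    {
        System.out.println(buildUrl("example.cloud.databricks.com", "/sql/placeholder", true, 600));
    }
}
```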

@ebyhr
Member Author

ebyhr commented Oct 3, 2022

@ebyhr
Member Author

ebyhr commented Oct 6, 2022

@findinpath
Contributor

A bit of additional context:

The Databricks server intermittently replies with HTTP response code 503 for both SELECT and DML statements. For DML statements, getting this error code does not mean that the operation failed on the server: we can get a 503 and still see the outcome of the INSERT statement we just submitted become visible on the server side later. This is why retrying the operation is not necessarily an option.
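
One way to act on this distinction, sketched under assumptions: retry a statement on the 503 failure only when it is read-only, and let DML fail on the first attempt so that it is never submitted twice. The class name, the helper names, and the prefix-based read-only check are all hypothetical. This is not the retry policy used in the Trino product tests, and it assumes failures surface as unchecked exceptions.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Sketch only: retries read-only statements on the transient 503 failure,
// but never re-submits DML, because the first attempt may have succeeded server-side.
final class CautiousRetrySketch
{
    private CautiousRetrySketch() {}

    // Naive classification by statement prefix; an assumption made for illustration
    static boolean isReadOnly(String sql)
    {
        String normalized = sql.stripLeading().toUpperCase(Locale.ROOT);
        return normalized.startsWith("SELECT") || normalized.startsWith("SHOW") || normalized.startsWith("DESCRIBE");
    }

    // Matches on the message text seen in the stack trace above
    static boolean isTransient503(Throwable failure)
    {
        String message = failure.getMessage();
        return message != null
                && message.contains("Communication link failure")
                && message.contains("HTTP Response code: 503");
    }

    static <T> T execute(String sql, int maxAttempts, Supplier<T> action)
    {
        // DML gets exactly one attempt regardless of maxAttempts
        int allowedAttempts = isReadOnly(sql) ? maxAttempts : 1;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            }
            catch (RuntimeException e) {
                if (attempt >= allowedAttempts || !isTransient503(e)) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args)
    {
        System.out.println(execute("SELECT 1", 3, () -> "ok"));
    }
}
```

For DML, a safer follow-up than a retry would be to check whether the statement's effect is already visible before deciding to re-submit, which this sketch does not attempt.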

@findepi
Member

findepi commented Nov 9, 2022

Let's hope #14899 fixes the problem.

@ebyhr
Member Author

ebyhr commented Jul 31, 2023

We may want to adjust the message pattern, since the failure now shows up with a different reason:

tests               | 2023-07-24 15:43:41 SEVERE: Failure cause:
tests               | io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: dbc-88a53597-967a.cloud.databricks.com:443 failed to respond.
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)

https://github.com/trinodb/trino/actions/runs/5642386366/job/15283091948?pr=18381
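
A sketch of what a widened pattern might look like, covering both the original 503 reason and the NoHttpResponseException reason quoted above. The pattern actually used in the Trino tests is not shown in this issue, so the class name, the constant name, and the regular expression itself are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Sketch only: one pattern that recognizes both variants of the
// "Communication link failure" message reported in this issue.
final class CommunicationLinkFailureSketch
{
    private CommunicationLinkFailureSketch() {}

    static final Pattern TRANSIENT_FAILURE = Pattern.compile(
            // \Q...\E quotes the fixed prefix, which contains regex metacharacters
            "\\Q[Databricks][DatabricksJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: \\E"
                    + "(HTTP retry after response received with no Retry-After header, error: HTTP Response code: 503"
                    + "|.*NoHttpResponseException: .* failed to respond)");

    static boolean matches(String message)
    {
        // find() rather than matches(), because the message is wrapped with an exception class prefix
        return message != null && TRANSIENT_FAILURE.matcher(message).find();
    }

    public static void main(String[] args)
    {
        System.out.println(matches("java.sql.SQLException: [Databricks][DatabricksJDBCDriver](500593) Communication link failure. "
                + "Failed to connect to server. Reason: com.databricks.client.jdbc42.internal.apache.http.NoHttpResponseException: "
                + "dbc-88a53597-967a.cloud.databricks.com:443 failed to respond."));
    }
}
```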
