Description
Is your feature request related to a problem? If so, please give a short summary of the problem and how the feature would resolve it
We are using mssql-jdbc driver version 12.4.2.jre11 with mssql-spark-connector version 3.1.

While loading data from a CSV file into the target database, if some of the data is invalid (i.e. the length of a value in the CSV file exceeds the column length in the target database's table), the load fails with the generic error message "Bulk load data was expected but not sent. The batch will be terminated."

With only this message it is almost impossible to diagnose the root cause of the failure. Please refer to the sample stack trace from such a failure (a minimal reproduction sketch follows it):
08-08-2024 05:46:40.107 [task-result-getter-1] WARN o.a.spark.scheduler.TaskSetManager.logWarning - Lost task 0.0 in stage 189.0 (TID 189) (af5307e43913 executor driver): com.microsoft.sqlserver.jdbc.SQLServerException: Bulk load data was expected but not sent. The batch will be terminated.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:261)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:316)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:137)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:31)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:4533)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7748)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4410)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:4541)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:4747)
at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.savePartition(BulkCopyUtils.scala:68)
at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2(BestEffortSingleInstanceStrategy.scala:43)
at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2$adapted(BestEffortSingleInstanceStrategy.scala:42)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1009)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1009)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
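For reference, a minimal Spark write along these lines reproduces the scenario; the CSV path, connection URL, credentials, and table name below are placeholders, not taken from our actual job:

```scala
import org.apache.spark.sql.SparkSession

object BulkLoadRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bulk-load-repro").getOrCreate()

    // CSV containing a value longer than the target column allows,
    // e.g. "long_value" against a CHAR(2) column.
    val df = spark.read
      .option("header", "true")
      .csv("/data/input.csv") // placeholder path

    // Writing through the Spark connector currently fails with only the generic
    // "Bulk load data was expected but not sent." message when a value is too long.
    df.write
      .format("com.microsoft.sqlserver.jdbc.spark")
      .mode("append")
      .option("url", "jdbc:sqlserver://localhost:1433;databaseName=SourceDB2019") // placeholder
      .option("dbtable", "dbo.test_table_target")
      .option("user", "sa")       // placeholder credentials
      .option("password", "***")
      .save()

    spark.stop()
  }
}
```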
Describe the preferred solution
The error should clearly indicate which table and column have the problem, along with the offending value. For example:
String or binary data would be truncated in table 'SourceDB2019.dbo.test_table_target', column 'char_col_with_2_length'. Truncated value: 'lo'
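For illustration only, a caller-side sketch along these lines can recover similar detail today by comparing the staged values against the target table's declared column sizes via standard JDBC metadata. The object name, method signature, and the `rows` collection are illustrative assumptions, not part of the driver or connector:

```scala
import java.sql.DriverManager
import scala.collection.mutable
import scala.util.Using

object TruncationDiagnostics {
  // Returns a message for every staged value that exceeds its target column's declared size.
  def findTruncatedValues(jdbcUrl: String,
                          schema: String,
                          table: String,
                          rows: Seq[Map[String, String]]): Seq[String] =
    Using.resource(DriverManager.getConnection(jdbcUrl)) { conn =>
      // Column sizes from standard JDBC metadata (COLUMN_SIZE is the declared length).
      val columnSizes = mutable.Map.empty[String, Int]
      val rs = conn.getMetaData.getColumns(null, schema, table, null)
      while (rs.next()) {
        columnSizes(rs.getString("COLUMN_NAME")) = rs.getInt("COLUMN_SIZE")
      }

      for {
        row          <- rows
        (col, value) <- row.toSeq
        size         <- columnSizes.get(col).toSeq
        if value != null && value.length > size
      } yield s"String or binary data would be truncated in '$schema.$table', column '$col' (max $size). Value: '$value'"
    }
}
```

Having the driver (or connector) report this directly would remove the need for such caller-side guesswork.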
Describe alternatives you've considered
Additional context
Reference Documentations/Specifications
Reference Implementation