
[FEATURE REQUEST] Descriptive exception message for invalid data in BULK INSERT #2610

Open
@atul-delphix

Description


Is your feature request related to a problem? If so, please give a short summary of the problem and how the feature would resolve it

We are using mssql-jdbc driver version 12.4.2.jre11 with mssql-spark-connector version 3.1.

When we load data from a CSV file into the target database and some of the data is invalid (i.e., a value in the CSV file is longer than the corresponding column in the target table), the load fails with the generic error message "Bulk load data was expected but not sent. The batch will be terminated."
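As a possible client-side workaround until the driver reports this itself, oversized values can be detected before the bulk insert runs. This is a minimal sketch under stated assumptions: the class OversizedValueCheck, the row representation (a map from column name to string value) and the maxLengths map are hypothetical and are not part of mssql-jdbc or mssql-spark-connector. The limits could be read from the COLUMN_SIZE column returned by java.sql.DatabaseMetaData.getColumns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-check run before handing rows to a bulk insert.
 * It flags string values longer than the target column's maximum length,
 * so the offending row, column and value can be reported explicitly.
 */
final class OversizedValueCheck {

    /** One offending cell: row index, column name, value and the column's limit. */
    static final class Violation {
        final int row;
        final String column;
        final String value;
        final int maxLength;

        Violation(int row, String column, String value, int maxLength) {
            this.row = row;
            this.column = column;
            this.value = value;
            this.maxLength = maxLength;
        }

        @Override
        public String toString() {
            return "row " + row + ", column '" + column + "': length " + value.length()
                    + " exceeds " + maxLength + " (value '" + value + "')";
        }
    }

    /**
     * @param rows       each row maps a column name to its (possibly null) string value
     * @param maxLengths assumed per-column limits, e.g. COLUMN_SIZE from DatabaseMetaData.getColumns
     * @return every cell whose value is longer than its column's limit
     */
    static List<Violation> findOversizedValues(List<Map<String, String>> rows,
                                               Map<String, Integer> maxLengths) {
        List<Violation> violations = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            for (Map.Entry<String, String> cell : rows.get(i).entrySet()) {
                Integer limit = maxLengths.get(cell.getKey());
                String value = cell.getValue();
                // Skip columns without a known limit and NULL values.
                if (limit != null && value != null && value.length() > limit) {
                    violations.add(new Violation(i, cell.getKey(), value, limit));
                }
            }
        }
        return violations;
    }
}
```

String.length() counts UTF-16 code units, so this only approximates SQL Server's length rules (for example, byte-based lengths of char/varchar columns under multi-byte collations); treat it as a first-pass filter rather than an exact replica of the server-side check.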

This message makes it almost impossible to diagnose the root cause (RCA) of the failure. Here is a sample stack trace from such a failure:

08-08-2024 05:46:40.107 [task-result-getter-1] WARN  o.a.spark.scheduler.TaskSetManager.logWarning - Lost task 0.0 in stage 189.0 (TID 189) (af5307e43913 executor driver): com.microsoft.sqlserver.jdbc.SQLServerException: Bulk load data was expected but not sent. The batch will be terminated.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:261)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:316)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:137)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:31)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:4533)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7748)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4410)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:4541)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.rollback(SQLServerConnection.java:4747)
	at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.savePartition(BulkCopyUtils.scala:68)
	at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2(BestEffortSingleInstanceStrategy.scala:43)
	at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2$adapted(BestEffortSingleInstanceStrategy.scala:42)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1009)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1009)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
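The trace shows the exception being raised from SQLServerConnection.rollback, called from BulkCopyUtils.savePartition, so the reported message is possibly a consequence of an earlier failure rather than its original cause. A caller that wants to see everything attached to such an exception could walk the whole chain. This is a hypothetical sketch (ExceptionChain and allMessages are illustrative names); whether the original error is attached at all depends on how the connector handles it, which is an assumption here.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that flattens an exception graph into a list of messages. */
final class ExceptionChain {

    /**
     * Collects "SimpleClassName: message" for the root, then recursively for its
     * suppressed exceptions, its cause and (for SQLException) getNextException().
     */
    static List<String> allMessages(Throwable root) {
        List<String> messages = new ArrayList<>();
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        collect(root, messages, seen);
        return messages;
    }

    private static void collect(Throwable t, List<String> out, Set<Throwable> seen) {
        if (t == null || !seen.add(t)) {
            return; // stop at the end of a chain or when a cycle is detected
        }
        out.add(t.getClass().getSimpleName() + ": " + t.getMessage());
        for (Throwable suppressed : t.getSuppressed()) {
            collect(suppressed, out, seen);
        }
        collect(t.getCause(), out, seen);
        if (t instanceof SQLException) {
            collect(((SQLException) t).getNextException(), out, seen);
        }
    }
}
```

Logging the result of allMessages(e) in the catch block around the write would show any chained SQL Server error alongside the generic one, if the driver or connector preserved it.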

Describe the preferred solution

The error should state which table and column contain the invalid data and which value caused it. For example:

 String or binary data would be truncated in table 'SourceDB2019.dbo.test_table_target', column 'char_col_with_2_length'. Truncated value: 'lo'
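To make the requested format concrete, here is a hypothetical helper that renders a message in the same shape as the example above. The class and method names are illustrative only, not an existing driver API. The sketch assumes the reported truncated value is the prefix of the input that fits in the column, which is consistent with the 2-character value 'lo' shown for a column named char_col_with_2_length.

```java
/**
 * Hypothetical helper that renders a truncation error in the same shape as the
 * SQL Server message quoted in the request. Assumes the "truncated value" is the
 * prefix of the input that fits in the column.
 */
final class TruncationMessage {

    static String describeTruncation(String table, String column, String value, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be non-negative");
        }
        // Keep only the part of the value that fits in the column.
        String truncated = value.length() > maxLength ? value.substring(0, maxLength) : value;
        return "String or binary data would be truncated in table '" + table
                + "', column '" + column + "'. Truncated value: '" + truncated + "'";
    }
}
```

For instance, with a hypothetical input value "lorem", describeTruncation("SourceDB2019.dbo.test_table_target", "char_col_with_2_length", "lorem", 2) returns the message shown in the example.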

Describe alternatives you've considered

Additional context

Reference Documentations/Specifications

Reference Implementation

Metadata

Assignees

Labels

Waiting for Response (Waiting for a reply from the original poster, or affiliated party)

Type

No type

Projects

Status

Waiting for Customer

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests