
Deal with partial BigQuery failures more elegantly #53

Closed
mtagle opened this issue Oct 27, 2016 · 2 comments
mtagle commented Oct 27, 2016

BigQuery write requests can partially fail: some rows in a request may be written successfully, while others are not. If this is the case, the whole flush will be considered a failure and we'll end up with a stack trace like this:

Caused by: java.util.concurrent.ExecutionException: com.wepay.kafka.connect.bigquery.exception.BigQueryConnectException: table insertion failed for the following rows:
    [row index 3000]: backendError: null
    [row index 3001]: backendError: null
    [row index 3002]: backendError: null

Since the whole flush is considered a failure, Kafka Connect will rebalance and end up re-writing all the rows to BQ, which results in duplicated rows.

This is not a huge issue (BQ views can be written to deduplicate the rows), but, if possible, it would be nice to take advantage of the fact that some rows were successfully written and only attempt to write the unsuccessful rows.
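For context, with the google-cloud-bigquery Java client (which this connector uses), a partial failure doesn't throw: it is reported as per-row errors on the response, so failing the whole batch is a choice made by the caller. A minimal sketch, assuming that client (the method and exception here are illustrative, not the connector's actual code):

```java
import java.util.List;
import java.util.Map;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;

// Sketch: the client reports partial failures on the response object instead of
// throwing, so treating any per-row error as a total failure happens here.
static void flush(BigQuery bigquery, InsertAllRequest request) {
  InsertAllResponse response = bigquery.insertAll(request);
  if (response.hasErrors()) {
    // Keyed by failed row index; rows absent from this map were written
    // successfully, but this code discards that fact and fails the whole flush.
    Map<Long, List<BigQueryError>> errors = response.getInsertErrors();
    throw new RuntimeException("table insertion failed for the following rows: " + errors);
  }
}
```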

mtagle changed the title from "Deal with partial BigQuery Failures more elegantly" to "Deal with partial BigQuery failures more elegantly" on Oct 27, 2016
whynick1 commented:
According to https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll:

[screenshot of the insertAll response documentation]

When there is a partial BQ write failure, a list of insertErrors is returned, which contains the indexes of the failed rows (I think "index" here means the index within the inserted row list, rather than a row number in the BQ table). So we can always filter out the rows that succeeded in our retry logic to eliminate duplication.
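A sketch of what that retry logic could look like with the google-cloud-bigquery Java client; the helper name, retry bound, and error handling are illustrative, not the connector's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;

// Hypothetical helper: re-send only the rows that failed on the previous attempt.
static void insertWithPartialRetry(BigQuery bigquery, TableId table,
                                   List<RowToInsert> rows, int maxAttempts) {
  List<RowToInsert> remaining = rows;
  for (int attempt = 0; attempt < maxAttempts && !remaining.isEmpty(); attempt++) {
    InsertAllResponse response =
        bigquery.insertAll(InsertAllRequest.newBuilder(table).setRows(remaining).build());
    if (!response.hasErrors()) {
      return; // every remaining row was written
    }
    // The keys of getInsertErrors() are indexes into *this* request's row list,
    // not positions in the BigQuery table.
    Map<Long, List<BigQueryError>> errors = response.getInsertErrors();
    List<RowToInsert> failed = new ArrayList<>();
    for (Long index : errors.keySet()) {
      failed.add(remaining.get(index.intValue()));
    }
    remaining = failed; // rows that succeeded are never re-sent
  }
  if (!remaining.isEmpty()) {
    throw new RuntimeException(remaining.size() + " rows still failing after " + maxAttempts + " attempts");
  }
}
```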

whynick1 commented:
Another interesting source, provided by @criccomini, about how BQ internally handles deduplication:
https://cloud.google.com/blog/big-data/2017/06/life-of-a-bigquery-streaming-insert

[screenshot from the "Life of a BigQuery streaming insert" blog post]
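The post describes best-effort deduplication keyed on each row's insertId: if a retried row arrives with the same insertId within a short window, the streaming backend drops the duplicate. With the Java client an insertId can be attached per row; a minimal sketch (the UUID scheme and field content are illustrative):

```java
import java.util.Map;
import java.util.UUID;

import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;

// Attach an insertId to each row. Retrying the same rows with the same ids lets
// BigQuery's streaming backend drop the duplicates (best-effort, short window).
Map<String, Object> content = Map.of("field", "value");
RowToInsert row = RowToInsert.of(UUID.randomUUID().toString(), content);
```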
