BigQuery write requests can partially fail: some rows in a request may be written successfully while others are not. When this happens, the whole flush is considered a failure, and we end up with a stack trace like this:
Caused by: java.util.concurrent.ExecutionException: com.wepay.kafka.connect.bigquery.exception.BigQueryConnectException: table insertion failed for the following rows:
[row index 3000]: backendError: null
[row index 3001]: backendError: null
[row index 3002]: backendError: null
Since the whole flush is considered a failure, Kafka Connect will rebalance and end up re-writing all the rows to BQ. This results in duplicated rows.
This is not a huge issue (BQ views can be written to deduplicate the rows), but, if possible, it would be nice to take advantage of the fact that some rows were successfully written and only retry the unsuccessful rows.
mtagle changed the title from "Deal with partial BigQuery Failures more elegantly" to "Deal with partial BigQuery failures more elegantly" on Oct 27, 2016.
According to https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
When there is a partial BQ write failure, a list of insertErrors is returned, which contains the index of each failed row (I think "index" here means the index in the inserted row list, rather than the row number in the BQ table). So we can always filter out the successful rows in our retry logic to eliminate duplication.
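As a rough sketch of that retry logic: the Java BigQuery client surfaces the per-row errors from insertAll as a map keyed by row index (InsertAllResponse.getInsertErrors() returns a Map of row index to errors). The helper below is hypothetical and uses plain strings in place of the client's row and error types, but it shows the idea of rebuilding a retry batch from only the failed indexes:

```java
import java.util.*;

public class PartialFailureRetry {

    // Hypothetical helper: given the original batch and the per-row error map
    // (row index -> errors) from an insert response, keep only the rows that
    // failed, so a retry does not re-insert the rows that already succeeded.
    static <T> List<T> failedRowsOnly(List<T> batch, Map<Long, List<String>> insertErrors) {
        List<T> retryBatch = new ArrayList<>();
        for (Long index : new TreeSet<>(insertErrors.keySet())) {
            retryBatch.add(batch.get(index.intValue()));
        }
        return retryBatch;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("row-0", "row-1", "row-2", "row-3");

        // Simulate a partial failure: rows 1 and 3 hit a backendError.
        Map<Long, List<String>> errors = new HashMap<>();
        errors.put(1L, Collections.singletonList("backendError"));
        errors.put(3L, Collections.singletonList("backendError"));

        System.out.println(failedRowsOnly(batch, errors)); // prints [row-1, row-3]
    }
}
```

A retry loop built on this would re-submit only the shrinking failed subset, instead of triggering a rebalance that replays the entire flush.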