KCBQ currently waits until flush() is called before it starts sending messages to BigQuery. This happens every 30s by default. The flush() method then sends the messages to BigQuery across a series of threads and waits for all of them to respond. This sometimes takes 30s (or longer) when there are a large number of messages in the buffer.
I believe that adding logic to the put() method to begin sending messages to BigQuery BEFORE flush() has been called could significantly speed up writes, since we wouldn't sit idle for 30s before doing the write. If we have a 30s flush interval and it takes 30s to write messages to BigQuery, sending during put() could increase performance by as much as 2x, since the two 30s intervals would overlap.
flush() should then just send any outstanding data left in the buffer and sync on all futures (including those started during put() calls).
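To make the proposal concrete, here is a minimal sketch of the idea, not KCBQ's actual code. The class name `EagerWriter`, the fixed `batchSize` threshold, and the `sender` callback (a stand-in for the real BigQuery write) are all hypothetical, and adaptive batch sizing is left out entirely. It assumes put() and flush() are called from the same task thread, so the buffer itself is not synchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical sketch: start writes during put(), sync on them in flush(). */
class EagerWriter {
    private final int batchSize;                   // assumed fixed threshold (no adaptive sizing)
    private final Consumer<List<String>> sender;   // stand-in for the BigQuery write call
    private final ExecutorService executor;
    private final List<String> buffer = new ArrayList<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    EagerWriter(int batchSize, int threads, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
        this.executor = Executors.newFixedThreadPool(threads);
    }

    /** Buffers records and submits a batch as soon as the threshold is reached. */
    void put(List<String> records) {
        for (String record : records) {
            buffer.add(record);
            if (buffer.size() >= batchSize) {
                submitBuffer();
            }
        }
    }

    /** Sends whatever is left, then waits on every future, including those started in put(). */
    void flush() {
        if (!buffer.isEmpty()) {
            submitBuffer();
        }
        try {
            // join() rethrows (wrapped in CompletionException) if any batch failed.
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        } finally {
            inFlight.clear();
        }
    }

    /** Copies the buffer into a batch and hands it to the thread pool. */
    private void submitBuffer() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(batch), executor));
    }

    void close() {
        executor.shutdown();
    }
}
```

With this shape, most batches are already in flight (or done) by the time flush() runs, so flush() mainly waits on the tail end of the work. A failed batch surfaces as an exception from flush(), which is where the framework would expect to learn that a commit is not safe.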
Some thought should be put into how this will impact the adaptive batch sizes.