This repository has been archived by the owner on Apr 7, 2022. It is now read-only.
We have implemented the Hosebird client (hbc-core-1.4.0.jar) to consume the Twitter firehose stream. Everything works great 👍, except the backfill param "count", which doesn't always work. We set the backfill on startup, and we can see that it works on the first connection. However, the "count" param is no longer present whenever the client reconnects after an error during consumption.
We went through HBC's source code a few times, and it looks like backfill should always work, using the rate retrieved from the rate tracker (we tried both the default rate tracker in ClientBuilder and our own rate tracker instance).
Here are some logs around the first connection, where we can see the backfill param "count":
2013-09-20 08:59:05,725 INFO TwitterStreamReader com.twitter.hbc.httpclient.BasicClient - New connection executed: HosebirdClient, endpoint: /1.1/statuses/firehose.json?count=150000&allow_restricted=true&delimited=length&partitions=0%2C1%2C2%2C3&stall_warnings=true
2013-09-20 08:59:06,080 INFO hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Establishing a connection
2013-09-20 08:59:06,085 INFO TwitterEventQueue TwitterEventQueue - [CONNECTION_ATTEMPT] - GET https://stream.twitter.com/1.1/statuses/firehose.json?count=150000&allow_restricted=true&delimited=length&partitions=0%2C1%2C2%2C3&stall_warnings=true HTTP/1.1
2013-09-20 08:59:06,741 DEBUG hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Connection successfully established
2013-09-20 08:59:06,742 INFO TwitterEventQueue TwitterEventQueue - [CONNECTED] - HTTP/1.1 200 OK
2013-09-20 08:59:06,742 INFO hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Processing connection data
2013-09-20 08:59:06,742 INFO TwitterEventQueue TwitterEventQueue - [PROCESSING] - Processing messages
Here are some logs around the reconnection, where the backfill param "count" is missing:
2013-09-20 11:28:27,083 INFO hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Disconnected during processing - will reconnect
2013-09-20 11:28:27,083 INFO hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Done processing, preparing to close connection
2013-09-20 11:28:27,083 INFO TwitterEventQueue TwitterEventQueue - [DISCONNECTED] - Read timed out
2013-09-20 11:28:27,093 INFO hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Establishing a connection
2013-09-20 11:28:27,093 INFO TwitterEventQueue TwitterEventQueue - [CONNECTION_ATTEMPT] - GET https://stream.twitter.com/1.1/statuses/firehose.json?allow_restricted=true&delimited=length&partitions=0%2C1%2C2%2C3&stall_warnings=true HTTP/1.1
2013-09-20 11:28:27,684 DEBUG hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Connection successfully established
2013-09-20 11:28:27,684 INFO TwitterEventQueue TwitterEventQueue - [CONNECTED] - HTTP/1.1 200 OK
2013-09-20 11:28:27,684 INFO hosebird-client-io-thread-0 com.twitter.hbc.httpclient.ClientBase - HosebirdClient Processing connection data
2013-09-20 11:28:27,685 INFO TwitterEventQueue TwitterEventQueue - [PROCESSING] - Processing messages
From the source code, we can see that an IOException caused the disconnection:
catch (IOException ex) {
  // connection issue? whatever. let's try connecting again
  // we can't really diagnosis the actual disconnection reason without parsing (looking at disconnect message)
  // but we can make a good guess at when we're stalling. TODO
  logger.info("{} Disconnected during processing - will reconnect", name);
  statsReporter.incrNumDisconnects();
  addEvent(new Event(EventType.DISCONNECTED, ex));
}
According to a comment on the ClientBase.run() method:
"if IOException, time to restart the connection: handle http connection cleanup, do some backoff, set backfill"
So we expected the backfill param to be set again before the next connection, but it was not.
Any help or direction will be much appreciated!
Thanks in advance!
Jack
I noticed the same issue during Firehose testing. It appears that the issue is due to backoffMillis in BasicReconnectionManager being 0 for the first reconnection. It is only if the reconnect fails that backoffMillis is increased and estimateBackfill(tps) will result in a count > 0. This means that if a read times out or we get a gzip/ssl error (not uncommon), we'd be missing a few ms worth of tweets.
My solution to this was to write a custom ReconnectionManager that listens for DISCONNECTED events. It then calculates the backfill as tps * the amount of time disconnected (i.e., currentTime - disconnectTime).
I'm still in the process of testing this, but so far so good.
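To make the difference between the two estimates concrete, here is a minimal Java sketch. It is not HBC code: BackfillEstimator, fromBackoff and fromElapsed are hypothetical names, the 5000 tweets-per-second rate is an assumed figure, and the backoff-based formula (tps * backoff seconds) is only what the description above suggests, not a copy of the library's implementation. The 10 ms gap is taken from the logs in the original report (disconnected at 11:28:27,083, reconnect attempt at 11:28:27,093).

```java
/** Hypothetical helper, not part of HBC: two ways to size the backfill "count". */
final class BackfillEstimator {
    private BackfillEstimator() {}

    /** Backoff-based estimate (assumed formula): tps times the backoff delay. */
    static int fromBackoff(double tps, long backoffMillis) {
        return toCount(tps, backoffMillis);
    }

    /** Elapsed-time estimate: tps times the time actually spent disconnected. */
    static int fromElapsed(double tps, long disconnectTimeMillis, long currentTimeMillis) {
        // Clamp to zero in case the clock moved backwards between the two readings.
        long elapsedMillis = Math.max(0L, currentTimeMillis - disconnectTimeMillis);
        return toCount(tps, elapsedMillis);
    }

    /** Rounds up so that a short gap still requests at least one message. */
    private static int toCount(double tps, long millis) {
        if (tps <= 0.0 || millis <= 0L) {
            return 0;
        }
        double estimate = Math.ceil(tps * millis / 1000.0);
        return (int) Math.min(estimate, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        double tps = 5000.0; // assumed rate, for illustration only

        // First reconnect: backoff is 0 ms, so the backoff-based count is 0
        // and a "count" of zero would not be worth sending.
        System.out.println(fromBackoff(tps, 0L));

        // Same reconnect measured by elapsed time: 10 ms between the
        // DISCONNECTED event and the next connection attempt.
        System.out.println(fromElapsed(tps, 27_083L, 27_093L));
    }
}
```

In a custom ReconnectionManager, the disconnect timestamp would presumably be recorded when the DISCONNECTED event arrives and the estimate computed just before the next connection attempt. How the resulting count gets attached to the endpoint is left out here because it depends on the HBC version in use.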