Streaming.py Crash on Incomplete Read Error when tweets are very high... #448
Comments
Yes, I'm facing this error only during high-volume streams of tweets. I suspect that Twitter is closing the connection because my program is not consuming data as fast as it is produced. I can monitor how many tweets my program processes per minute and the date (created_at) of the last processed tweet. Sometimes the last processed tweet is 5 to 8 minutes behind the current time. My program is running in a VM hosted in the US, and the database server (MongoDB) is running in Brazil. The next things I'll check are network latency and database throughput. I never had this problem when both the program and the database were running on the same network, even with high-volume streams.
@waltersf the connection is not closed by Twitter. It's a bug in how httplib handles things. I patched streaming.py to suppress the error and continue streaming, because my application is completely dependent on real-time Twitter data and can't afford to stop for any reason. But I will only know whether my patch works tonight, during the match. You can look at the code
Did this end up working for you? I just started receiving this error. Using v2.3.0.
I suppressed the error. It's all working fine now. Regards, Febin John James
@jamesfebin: Would you mind opening a PR making that change? It looks great!
Sorry all, but I don't think suppressing an exception is a good plan. If this is a bug in httplib, it should be fixed there. Does anyone have more info on what exactly is happening here? Is there a reason why
Reading through the code, it seems httplib.read() raises IncompleteRead when socket.read() returns 0 bytes. Unfortunately, it can't distinguish between being disconnected and an interrupt causing the read call to return with no data. In practice, I only see this when I'm falling behind, meaning that Twitter cuts me off. It's easy to replicate: listen for a popular keyword, then sleep for a bit in the on_status callback to guarantee you fall behind. I guess you could try to distinguish between those two cases by keeping track of the delay between the tweet being posted and being received. If it's greater than a couple of seconds, you're probably falling behind and it's a disconnect. If it's real time, you're probably OK and should just continue. Alternatively, try reading again, and disconnect/reconnect if it happens again or you get some other error (socket closed?). Any better solutions?
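For illustration, the delay-tracking idea above could look roughly like the sketch below. It is not part of Tweepy: the function names and the 5-second threshold are hypothetical, and it assumes the created_at string uses Twitter's usual format (for example "Tue Jul 08 03:07:00 +0000 2014").

```python
from datetime import datetime, timezone

# Hypothetical threshold; the comment above suggests "a couple of seconds".
MAX_LAG_SECONDS = 5.0


def parse_created_at(created_at):
    """Parse a created_at string such as 'Tue Jul 08 03:07:00 +0000 2014'."""
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")


def stream_lag_seconds(created_at, now=None):
    """Return how many seconds passed between posting and receiving a tweet."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parse_created_at(created_at)).total_seconds()


def is_falling_behind(created_at, now=None, max_lag=MAX_LAG_SECONDS):
    """Guess whether the consumer lags the live stream by more than max_lag."""
    return stream_lag_seconds(created_at, now) > max_lag
```

A caller could check is_falling_behind() on the last tweet seen before an IncompleteRead to decide whether to treat the error as a disconnect. Clock skew between the local machine and Twitter would distort the measurement, so the threshold should leave some margin.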
I'm quite confident that you (@tewalds) are correct. I've just done multiple tests and noticed the following. I ran Tweepy, streaming the sample API, 4 times, and every time I received a Tweet I displayed the time it was received and the created_at value provided by the Twitter API, which indicates when the Tweet was posted. I came back with the following results:
As you can see, with each test I'm falling quite a bit behind the live stream. Evidently, either my script or my connection is having trouble keeping up with the data stream provided by Twitter. What is also interesting is that the crash almost always happens at the 5-minute mark. This seems quite consistent with the rate at which 'stall_warnings' are sent out. It appears the sample API supports stall_warnings, but Tweepy doesn't have stall_warnings implemented in the sample call. Is there a particular reason for this? That said, even after adding the stall_warnings parameter to the sample function, I'm not receiving these warnings when appropriate. Unfortunately, I don't have a solution to the disconnecting problem, mostly because if @tewalds is right, there probably is no decent solution for this.
Well, the solution is easy: stop falling behind. This can be achieved either by listening to fewer or less popular words, or by processing them faster. You can probably achieve the latter by pushing the tweets to a different thread for the real processing. In my case, the problem was that the streaming API made many socket.read() calls per tweet, and on App Engine every socket.read() is an API call over the network, so it was very slow. I fixed that in #496 by doing at most 2 per read and allowing buffering to read many at a time.
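As a rough sketch of the "push the tweets to a different thread" suggestion, using only the standard library: the helper names start_worker and enqueue and the queue size are made up for this example, and this is not how Tweepy itself is implemented. The idea is that the stream callback only enqueues, so the read loop is never blocked by slow processing such as a remote database write.

```python
import logging
import queue
import threading


def start_worker(process, maxsize=10000):
    """Start a daemon thread that calls process(item) for every queued item."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        while True:
            item = q.get()
            try:
                if item is None:  # sentinel: stop the worker
                    return
                process(item)
            except Exception:
                # Log and keep going so one bad item does not kill the worker.
                logging.exception("processing failed")
            finally:
                q.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return q, thread


def enqueue(q, item):
    """Hand an item to the worker without blocking; return False if dropped."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False  # dropping is a design choice; it keeps the reader fast
```

Inside an on_status-style callback you would call enqueue(q, status) and return immediately. Dropping items when the queue is full trades completeness for staying connected; a blocking put() would preserve every tweet but bring back the original problem.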
This should handle both cases of incomplete read, whether caught by requests or caught by tweepy. This resolves tweepy#237, resolves tweepy#448, resolves tweepy#536, resolves tweepy#650, resolves tweepy#691, resolves tweepy#798. Similar to tweepy#498.
Marking as a duplicate of #237, but keeping this open until it's resolved, due to the relevant conversation in this thread.
This should now be resolved with 68e19cc by simply handling it as a connection error and attempting to reconnect. It might be worth mentioning this behavior (of Tweepy automatically attempting to reconnect when this connection error occurs due to Twitter's API disconnecting the stream for falling too far behind) in the documentation for streaming. I plan on improving the documentation for streaming at some point in the future, so if it hasn't been added to the documentation by then, I'll probably mention that Tweepy automatically attempts to reconnect when connection errors occur and note the @jamesfebin For code block usage, see https://docs.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks. @rthijssen
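For readers who want the general shape of "handle it as a connection error and reconnect", here is a minimal sketch. It is not the code from the commit mentioned above: run_with_reconnect, connect_and_read, and the backoff numbers are all assumptions made for this example. It uses Python 3, where IncompleteRead lives in http.client rather than httplib.

```python
import time
from http.client import IncompleteRead


def run_with_reconnect(connect_and_read, max_retries=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call connect_and_read() and retry with backoff on dropped connections.

    connect_and_read is a hypothetical callable that opens the stream and
    reads until it ends; it stands in for whatever client you actually use.
    """
    attempt = 0
    while True:
        try:
            return connect_and_read()
        except (IncompleteRead, ConnectionError):
            attempt += 1
            if attempt > max_retries:
                raise  # give up and surface the error to the caller
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```

A long-running consumer would probably also reset the attempt counter after a connection has stayed healthy for a while, which this sketch leaves out for brevity.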
This error occurs when there is a high volume of tweets at a particular time. For example, try streaming the World Cup hashtag during a game. The problem looks similar to this: https://dev.twitter.com/discussions/9554 Can anyone help in fixing this?
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/streaming.py", line 173, in _run
    self._read_loop(resp)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/streaming.py", line 220, in _read_loop
    d = resp.read(1)
  File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 586, in _read_chunked
    raise IncompleteRead(''.join(value))