Python and Javascript stream examples break after 5 minutes #34

Closed
wiertz opened this issue May 10, 2021 · 25 comments
Labels
bug Something isn't working

Comments

@wiertz

wiertz commented May 10, 2021

Describe the bug
Both the Python and JavaScript streaming examples break after 5 minutes of retrieving tweets. The JavaScript example just hangs or ends without a message; the Python example throws the following error:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 543, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 302, in _error_catcher
    yield
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 598, in read_chunked
    self._update_chunk_length()
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 547, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/.local/lib/python3.6/site-packages/requests/models.py", line 753, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 432, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 626, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 320, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "twitter-sample-python.py", line 93, in <module>
    main()
  File "twitter-sample-python.py", line 89, in main
    get_stream(headers, set, bearer_token)
  File "twitter-sample-python.py", line 76, in get_stream
    for response_line in response.iter_lines():
  File "/home/xxx/.local/lib/python3.6/site-packages/requests/models.py", line 797, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/home/xxx/.local/lib/python3.6/site-packages/requests/models.py", line 756, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

To Reproduce
Download the example code, set the bearer token, and run it. Each time the code is executed, it stops receiving tweets after the same amount of time, roughly 5 minutes. I could reproduce this on two different machines, operating systems (Ubuntu and macOS), and Twitter accounts, and on three different internet connections.

Expected behavior
Stream should not stop.

@wiertz wiertz added the bug Something isn't working label May 10, 2021
@Da1ne

Da1ne commented May 17, 2021

I'm also experiencing this consistently at the 5-minute mark. However, I noticed that I don't receive a 429 after these disconnects, so I wrapped a try/except around iter_lines() and kept going with a while loop.

try:
    for response_line in response.iter_lines():
        if response_line:
            json_response = json.loads(response_line)
            print(json_response)
except requests.exceptions.ChunkedEncodingError:
    encoding_count += 1

A crude patch until Twitter Devs figure out what's happening on their end?

@wiertz
Author

wiertz commented May 17, 2021

Glad someone else can confirm this. In production I reconnect, but that seems an imperfect solution since it likely still means I lose data. Furthermore, Node.js behaves awkwardly here: in v12 the stream ends and I can reconnect, but in v14 and v16 Node.js just silently stops executing code, with no error to catch and no loop to continue.

@mustafayd

mustafayd commented May 23, 2021

I am encountering a similar issue with "filtered_stream.py", which ran for hours without any problem a month ago.

@jahvi

jahvi commented May 24, 2021

Does anyone have a workaround for the NodeJS version? I've attached listeners to every possible event and the stream just stops without triggering any of my logs.

@wiertz
Author

wiertz commented May 24, 2021

The only workaround I found is to reconnect immediately. This only works properly with Node v12.x, since in later versions the stream does not stop immediately but hangs for a considerable amount of time before it disconnects. Another option is to use Python, which at least throws an error that you can handle (and then reconnect).

@mustafayd

Thanks for the reply. I am working with Python, but I did not fully understand the structure, so I want to ask if you can help. In the example Python code (https://github.com/twitterdev/Twitter-API-v2-sample-code/blob/master/Filtered-Stream/filtered_stream.py), where should I put the try/except block? And what does reconnecting mean in terms of the code: should I call the get_stream function in the exception handler?
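
One way to read the question above: the try/except goes around the loop that iterates over the response, and "reconnecting" means issuing the request again. Below is a minimal sketch of that idea, not the sample repository's code. The names stream_forever, connect, and retry_on are hypothetical; with requests, connect could be something like `lambda: requests.get(url, headers=headers, stream=True).iter_lines()` and retry_on could be `(requests.exceptions.ChunkedEncodingError,)`.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, Type


def stream_forever(
    connect: Callable[[], Iterable[bytes]],
    retry_on: Tuple[Type[BaseException], ...],
    max_reconnects: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield non-empty lines from connect(), reconnecting when the stream breaks.

    All names here are illustrative assumptions, not part of the sample code.
    """
    consecutive_failures = 0
    while True:
        try:
            for line in connect():
                if not line:
                    continue  # blank keep-alive line
                consecutive_failures = 0  # data arrived, so the connection works
                yield line
        except retry_on:
            consecutive_failures += 1
            if consecutive_failures > max_reconnects:
                raise  # give up and let the caller decide
            sleep(delay_seconds)
        else:
            return  # the iterable ended without an error
```

In the sample script this would wrap the body of get_stream, and each yielded line can be passed to json.loads as before. Tweets sent while the client is disconnected are still missed; the loop only restores the connection.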

@jahvi

jahvi commented May 24, 2021

I ended up switching to the https module and that works fine, so it must be an issue with needle.

@wiertz
Author

wiertz commented May 24, 2021

I can reproduce the problem with needle, fetch, Python, and also https. Using https in the following function as a replacement for the example code's streamConnect(retryAttempt) function still breaks after five minutes:

function streamConnect(retryAttempt) {
    https.get(streamURL, {
        headers: {
            "User-Agent": "v2FilterStreamJS",
            "Authorization": `Bearer ${token}`
        },
        timeout: 20000
    }, (stream) => {
        console.log(stream.statusCode)
        stream.on('data', data => {
            try {
                const json = JSON.parse(data);
                console.log(json);
                // A successful connection resets retry count.
                retryAttempt = 0;
            } catch (e) {
                if (data.detail === "This stream is currently at the maximum allowed connection limit.") {
                    console.log(data.detail)
                    process.exit(1)
                } else {
                    // Keep alive signal received. Do nothing.
                }
            }
        }).on('error', error => {
            if (error.code !== 'ECONNRESET') {
                console.log(error.code);
                process.exit(1);
            } else {
                // This reconnection logic will attempt to reconnect when a disconnection is detected.
                // To avoid rate limits, this logic implements exponential backoff, so the wait time
                // will increase if the client cannot reconnect to the stream. 
                setTimeout(() => {
                    console.warn("A connection error occurred. Reconnecting...")
                    streamConnect(++retryAttempt);
                }, 2 ** retryAttempt)
            }
        });
        return stream;
    });
}

Do you mind sharing your code, @jahvi (and the node version you are on since this seems to have some impact on the error?)
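
The comment in the function above mentions exponential backoff. Note that setTimeout takes milliseconds, so 2 ** retryAttempt gives 1 ms, 2 ms, 4 ms, and so on, and reaches about one second only when retryAttempt is 10. A rough sketch of a backoff schedule in seconds follows, written in Python. backoff_delay and its base, cap, and rng parameters are hypothetical names, and the default values are assumptions, not documented limits.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 320.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before reconnect number `attempt` (0-based).

    The delay doubles with each attempt and never exceeds `cap`.
    """
    delay = min(cap, base * (2 ** attempt))
    if rng is not None:
        # Jitter: pick a random point in [0, delay] so that many clients
        # do not all reconnect at the same instant.
        return rng.uniform(0.0, delay)
    return delay


# Delays for the first few attempts without jitter.
schedule = [backoff_delay(n) for n in range(5)]
```

Passing a random.Random instance as rng enables jitter; leaving it out keeps the schedule deterministic, which is easier to log and test.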

@jahvi

jahvi commented May 25, 2021

@wiertz Actually, I just noticed it does stop after ~5 minutes as well. The only reason I thought it didn't is that it doesn't exit like it does with needle, but it still stops nonetheless.

I'll try a few more things, but I may end up having to switch to the Python version.

@jahvi

jahvi commented May 25, 2021

I think I managed to fix it now by adding the reconnection logic to the end event as well (still using https, I tried the same thing with needle and it didn't work).

It's been running for 20+ minutes now and it still stops but it reconnects automatically:

stream.on('end', () => {
    setTimeout(() => {
        console.warn(
            'A connection error occurred. Reconnecting...'
        );
        streamConnect(++retryAttempt);
    }, 2 ** retryAttempt);
});

EDIT: Ignore this, it somehow doesn't work anymore... I switched to Python plus @Da1ne's fix above, as it's more consistent.

@sergis4ake

I have the same issue with Python. Is there any solution?

I also get this error with code 104:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/urllib3/response.py", line 437, in _error_catcher
    yield
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/urllib3/response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/http/client.py", line 461, in read
    n = self.readinto(b)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/http/client.py", line 495, in readinto
    return self._readinto_chunked(b)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/http/client.py", line 590, in _readinto_chunked
    chunk_left = self._get_chunk_left()
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/http/client.py", line 558, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/http/client.py", line 518, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/TwitterAPI/TwitterAPI.py", line 373, in _iter_stream
    buf += self.stream.read(1)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/urllib3/response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/urllib3/response.py", line 455, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/repos/2themoon/news-classifier/newsclassifier/websource/api/impl/twitter/v2.py", line 276, in producer_streaming
    for line in stream:
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/TwitterAPI/TwitterAPI.py", line 409, in __iter__
    for item in self._iter_stream():
  File "/home/xxx/anaconda3/envs/ai/lib/python3.7/site-packages/TwitterAPI/TwitterAPI.py", line 399, in _iter_stream
    raise TwitterConnectionError(e)
TwitterAPI.TwitterError.TwitterConnectionError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

Thanks in advance!
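
The three tracebacks above are one failure seen at three layers: Python chains each new exception to the one that was being handled when it was raised. A small sketch of walking that chain to find the innermost error (here a connection reset) follows. root_cause and simulate_wrapped_reset are hypothetical helpers for illustration; the real libraries raise their own exception types.

```python
import errno


def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ / __context__ links to the innermost exception."""
    seen = {id(exc)}
    while True:
        inner = exc.__cause__ or exc.__context__
        if inner is None or id(inner) in seen:
            return exc
        seen.add(id(inner))
        exc = inner


def simulate_wrapped_reset() -> BaseException:
    """Build a chain similar in shape to the traceback: reset, then a wrapper."""
    try:
        try:
            raise ConnectionResetError(errno.ECONNRESET, "Connection reset by peer")
        except ConnectionResetError:
            # Raising inside an except block sets __context__ implicitly.
            raise RuntimeError("Connection broken")
    except RuntimeError as wrapped:
        return wrapped


cause = root_cause(simulate_wrapped_reset())
is_reset = isinstance(cause, ConnectionResetError) and cause.errno == errno.ECONNRESET
```

Errno 104 is ECONNRESET on Linux; the numeric value differs on other platforms, which is why the sketch compares against errno.ECONNRESET rather than the number.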

@pranftw

pranftw commented Jun 5, 2021

Hey everyone, this problem arises with some defective HTTP servers whose responses can't be handled by httplib (http.client in Python 3, which requests relies on through urllib3). So I came up with the following method to tackle it.

while True:
    try:
        response = requests.get(
            "https://api.twitter.com/2/tweets/search/stream?expansions=author_id",
            headers=self.headers,
            stream=True,
        )
        if response.status_code != 200:
            logger.error(f"SearchStream {response.json()}")
            break
        self.on_stream_trigger(response)
    except KeyboardInterrupt as e:
        print("\nSTREAM CLOSED!")
        raise SystemExit(e)
    except Exception as e:
        # Log the failure, then reconnect on the next loop iteration.
        logger.exception(e)
        continue

You can use loggers to keep track of all the exceptions that are occurring. I know that this is an ugly fix but hey, it works!

@ShivamJoker

Guys, is there any workaround for Node.js?

@ShivamJoker

It seems we can use PM2 to keep the server running if it crashes

@arekgotfryd

arekgotfryd commented Jun 6, 2021

@twitterdev is anyone going to look into this? I have seen this problem described across Node, Ruby, and Python, so I don't think it's a problem with the client implementations consuming the stream API. The first reports date back to October 2020:
https://twittercommunity.com/t/rate-limit-on-tweets-stream-api/144389

@ShivamJoker

If this has been an issue for so long, how are people making bots with it? @arekgotfryd

@arekgotfryd

@ShivamJoker I would love to know. Do you happen to know?

@andypiper
Contributor

Thanks for all the information here and the workarounds to reconnect. We'll see what we can do to integrate those changes into future versions of the sample code as needed.

@pranftw

pranftw commented Jun 7, 2021

@andypiper I've submitted a pull request for the Python one at #36

@ctilly

ctilly commented Jun 10, 2021

Hi @Da1ne,
I appreciate the work-around for catching the error. What are you doing with the error though once you've caught it, and what is the "encoding_count" used for? How are you restarting the script once it bonks? Do you have this inside a while loop that can keep the script running after each exception?

@wiertz
Author

wiertz commented Jun 10, 2021

Thank you @andypiper for taking a look at it. Nevertheless, I think it has become pretty clear by now that this is not a client-side issue. All the solutions I have seen so far just catch the error and reconnect. As far as I can see, this is a backend issue, and it would be awesome if it got some attention there.

To be really honest I never expected a solution here, but was trying to raise attention and see if others have this issue. So feel free to close this if it is more appropriately addressed elsewhere.

@Da1ne

Da1ne commented Jun 11, 2021

Hi ctilly,

Really depends what you're doing with your implementation. For me:

import time

if __name__ == "__main__":
    encoding_count = 0
    while encoding_count < 5:
        try:
            main()
        except Exception as e:
            if encoding_count == 4:
                # Fifth failure: do something here (I send an alert), then stop.
                raise SystemExit
            else:
                print(e)  # typically 429
                # Wait before retrying. A fixed sleep is enough for me, but a
                # backoff method (see forums) is a better implementation.
                time.sleep(1000)
                encoding_count += 1
Hope this helps!
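
The "backoff method" mentioned in the comment above could look roughly like this. It is a sketch under assumptions: run_with_retries, on_give_up, and the delay values are hypothetical, and main stands for whatever function opens and reads the stream.

```python
import time
from typing import Callable, Optional


def run_with_retries(
    main: Callable[[], None],
    max_failures: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    on_give_up: Optional[Callable[[BaseException], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call main() until it returns; give up after max_failures exceptions.

    Returns the number of failures seen before main() returned normally.
    """
    failures = 0
    while True:
        try:
            main()
            return failures
        except Exception as exc:
            failures += 1
            if failures >= max_failures:
                if on_give_up is not None:
                    on_give_up(exc)  # e.g. send an alert
                raise
            # Wait 1x, 2x, 4x, ... base_delay, but never more than max_delay.
            sleep(min(max_delay, base_delay * 2 ** (failures - 1)))
```

Because only Exception is caught, Ctrl-C (KeyboardInterrupt) still stops the script immediately instead of triggering a retry.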

@ctilly

ctilly commented Jun 12, 2021

Ah! I didn't think of putting the error trap in the "if __name__" block. I'm trapping the error and then resetting the connection within the main function, and that worked really well today. However, I like this idea of trapping the error outside main() because I can reset the whole script if I need to. Thanks for the tip!

@Da1ne

Da1ne commented Jul 14, 2021

For anyone still following/escaping this error, it would appear the error was on the server side, and was patched yesterday:
https://twittercommunity.com/t/filtered-stream-request-breaks-in-5-min-intervals/153926
Good to know we weren't all going mad!

@ShivamJoker

Okay, I tested this with JavaScript and it seems to be fixed.
