Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle RemoteDisconnected exception from http.client and retry request. #1911

Closed
wants to merge 3 commits into from

Conversation

keuko
Copy link

@keuko keuko commented Jul 23, 2020

RemoteDisconnected is raised when the attempt to read the
response results in no data read from the connection, indicating
that the remote end has closed the connection.

As server did not receive the request and closed connection,
so it should be safe to retry.

https://bugs.python.org/issue3566
https://hg.python.org/cpython/rev/eba80326ba53

@codecov
Copy link

codecov bot commented Jul 23, 2020

Codecov Report

Merging #1911 into master will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #1911      +/-   ##
===========================================
+ Coverage   99.95%   100.00%   +0.04%     
===========================================
  Files          23        23              
  Lines        2055      2056       +1     
===========================================
+ Hits         2054      2056       +2     
+ Misses          1         0       -1     
Flag Coverage Δ
#unittests 99.51% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/urllib3/connectionpool.py 100.00% <100.00%> (ø)
src/urllib3/exceptions.py 100.00% <100.00%> (ø)
src/urllib3/util/retry.py 100.00% <100.00%> (ø)
src/urllib3/connection.py 100.00% <0.00%> (ø)
src/urllib3/util/ssl_.py 100.00% <0.00%> (+1.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bd33f6...354df6b. Read the comment docs.

@gtherond
Copy link

Hey folks, is there anyone from urllib3 able to comment or validate on this issue? It seems to be pretty common and weirdly not implemented even if a proposal from 5 years ago is available.

Copy link
Member

@hodbn hodbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python2.7 does not have RemoteDisconnected. Treating all BadStatusLines as RemoteDisconnected will resend requests that have a bad status line.

try:
from http.client import RemoteDisconnected
except ImportError:
from httplib import BadStatusLine as RemoteDisconnected
Copy link
Member

@hodbn hodbn Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will catch other BadStatusLine exceptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hodbn in https://hg.python.org/cpython/rev/eba80326ba53 they noted this for exception exception:: RemoteDisconnected -> " Previously, :exc:BadStatusLine\ ('')
was raised. "

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only until that commit 😞

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, If I understand correctly, I have to distinguish if BadStatusLine("No status line received - the server has closed the connection") or BadStatusLine(something) for py2. Do you have some recommendation how I should do it ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a way that does not involve handling all three version ranges differently and peeking inside BadStatusLine for the newly introduced message 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: since the old message contained the actual line could we see if BadStatusLine.args[0] == ""?

Copy link
Member

@hodbn hodbn Aug 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but that will only work for < 2.7.16.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems silly but could we do BadStatusLine.args[0] in ("", "No status line received - the server has closed the connection")? The worst that'd happen is we'd retry on a "legit" bad status error because some host decided to add that as its actual HTTP status.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems silly but could we do BadStatusLine.args[0] in ("", "No status line received - the server has closed the connection")? The worst that'd happen is we'd retry on a "legit" bad status error because some host decided to add that as its actual HTTP status.

Hmm, but it looks like everytime when BadStatusLine is raised urllib should retry request. And my patch is doing this ...

def test_retry_weird_http_version(self):

@keuko keuko force-pushed the fix-remote-disconnected branch 6 times, most recently from 524cdeb to e6c4bd8 Compare August 19, 2020 18:11
@keuko
Copy link
Author

keuko commented Aug 19, 2020

Looks like commit 3671b807e83e28971cfe2ae75de7a9fb2742cee3 is somehow breaking tests for ssl in travis :(

@keuko keuko force-pushed the fix-remote-disconnected branch 3 times, most recently from 51c234c to 143b32a Compare August 20, 2020 09:51
@pquentin
Copy link
Member

@keuko The currently failing tests don't appear to be SSL-related and are known to be flaky, this isn't related to 3671b80. Sorry about the state of our test suite. Please ignore those failures for now

@keuko
Copy link
Author

keuko commented Aug 20, 2020 via email

@keuko keuko force-pushed the fix-remote-disconnected branch 2 times, most recently from 9675fdd to 87c535b Compare August 21, 2020 12:29
@sethmlarson sethmlarson added this to the v1.26 milestone Aug 27, 2020
@sethmlarson
Copy link
Member

I pulled these changes locally but I think my Python 2.7 version is too recent to even test this. Will try to get to it :)

@sethmlarson
Copy link
Member

sethmlarson commented Aug 30, 2020

@keuko I tried running the test case provided without the fix on Python 2.7.9, 2.7.15 and 2.7.16 and couldn't get the test to fail, are we sure this test case reproduces the issue or am I doing something wrong here?

@keuko
Copy link
Author

keuko commented Aug 31, 2020 via email

@keuko
Copy link
Author

keuko commented Aug 31, 2020 via email

@keuko
Copy link
Author

keuko commented Aug 31, 2020

@keuko I tried running the test case provided without the fix on Python 2.7.9, 2.7.15 and 2.7.16 and couldn't get the test to fail, are we sure this test case reproduces the issue or am I doing something wrong here?

Hi @sethmlarson ,

I've already force-pushed edited unit test, check it with current code and with this patch.

@keuko
Copy link
Author

keuko commented Aug 31, 2020

One more thing @sethmlarson, please read my original bug report in cpython https://bugs.python.org/issue41345 << .

I used apache with default keepalive timeout to 5 sec and sleep 5 sec in script.
On apache's server side I had simple script which just wrote received data to file to BE SURE that POST request never arrived to server when exception was thrown.

Sometimes it took long time to hit the bug, so be patient.
I've used iptables command below to decrease time needed to replicate the bug :

iptables -A INPUT -p tcp --sport 80 -m statistic --mode random --probability 0.5 -j DROP

Michal Arbet ( kevko )

@keuko keuko force-pushed the fix-remote-disconnected branch 2 times, most recently from f665eed to 4d436f5 Compare August 31, 2020 17:01
@keuko
Copy link
Author

keuko commented Aug 31, 2020

Hmm, don't know why it is failing on CI , but locally I am passing :(

(urllib3-python3) michalarbet@pixla:~/work/git/urllib3$ coverage run --parallel-mode -m pytest -r a --tb=native --no-success-flaky-report test/with_dummyserver/test_socketlevel.py::TestSocketClosing::test_server_closed_connection_right_before_request
=================================================================================================================================== test session starts ===================================================================================================================================
platform linux -- Python 3.8.2, pytest-4.6.9, py-1.9.0, pluggy-0.13.1
rootdir: /home/michalarbet/work/git/urllib3, inifile: setup.cfg
plugins: timeout-1.3.4, flaky-3.6.1
collected 1 item

test/with_dummyserver/test_socketlevel.py . [100%]

==================================================================================================================================== warnings summary =====================================================================================================================================
test/with_dummyserver/test_socketlevel.py::TestSocketClosing::test_server_closed_connection_right_before_request
/home/michalarbet/work/git/urllib3/dummyserver/server.py:118: NoIPv6Warning: No IPv6 support. Falling back to IPv4.
warnings.warn("No IPv6 support. Falling back to IPv4.", NoIPv6Warning)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
========================================================================================================================== 1 passed, 1 warnings in 0.07 seconds ===========================================================================================================================

RemoteDisconnected is raised when the attempt to read the
response results in no data read from the connection, indicating
that the remote end has closed the connection.

As server did not receive the request and closed connection,
so it should be safe to retry.

https://bugs.python.org/issue3566
https://hg.python.org/cpython/rev/eba80326ba53
@Dobatymo
Copy link

Do I understand this PR correctly that the RemoteDisconnected exception is proposed to be treated as a connection error which can always retried safely regardless of the HTTP method?

If so, that is a very bad idea. It might fix your usecase, however other servers can close the connection after receiving the request successfully and before sending a response, which triggers the RemoteDisconnected error as well.

@keuko
Copy link
Author

keuko commented Sep 29, 2020

Do I understand this PR correctly that the RemoteDisconnected exception is proposed to be treated as a connection error which can always retried safely regardless of the HTTP method?
Well, yes

If so, that is a very bad idea. It might fix your usecase, however other servers can close the connection after receiving the request successfully and before sending a response, which triggers the RemoteDisconnected error as well.

No, it will not trigger RemoteDisconnected, or let's say this situation can't happen in real world.

Why ? Because there is no reason server will not send response after receive data. If exist some reason when client will not receive response... it is connection error - Timeout Error, not RemoteDisconnected.

RemoteDisconnected was implemented for this specific case , it is wery well documented in Cpython.
Please check discussion where RemoteDisconnected was introduced in Cpython - https://hg.python.org/cpython/rev/eba80326ba53.

And also code in cpython.

This bug is very annoying not only for me, google it and you will see this is long-term bug/problem

@Dobatymo
Copy link

Dobatymo commented Sep 29, 2020

I am not sure what you are talking about

import threading, sys

sys.path.insert(0, "src")
from dummyserver.server import SocketServerThread

from urllib3 import HTTPConnectionPool

THE_DATA = b"asdqwe"

def socket_handler(listener):
	sock = listener.accept()[0]

	buf = b""
	while not buf.endswith(THE_DATA):
		buf += sock.recv(65536)

	print(buf)

	sock.close()

ready_event = threading.Event()
server_thread = SocketServerThread(
	socket_handler=socket_handler, ready_event=ready_event, host="localhost"
)
server_thread.start()
ready_event.wait(5)
if not ready_event.is_set():
	raise Exception("most likely failed to start server")

with HTTPConnectionPool(server_thread.host, server_thread.port) as pool:
	r = pool.request("POST", "/", retries=0, body=THE_DATA)

will raise a RemoteDisconnect error and clearly print all the posted data. And it's happening in the real world. See my comment here: jjjake/internetarchive#341 (comment)

Of course the server SHOULD reply something, but there can be many reasons why it's not replying. In any case it's not safe to retry non-idempotent methods like this. There is a reason the retry logic in you first BPO link was not implemented.

Or am I missing something?

@keuko
Copy link
Author

keuko commented Sep 29, 2020

I am not sure what you are talking about

import threading, sys

sys.path.insert(0, "src")
from dummyserver.server import SocketServerThread

from urllib3 import HTTPConnectionPool

THE_DATA = b"asdqwe"

def socket_handler(listener):
	sock = listener.accept()[0]

	buf = b""
	while not buf.endswith(THE_DATA):
		buf += sock.recv(65536)

	print(buf)

	sock.close()

ready_event = threading.Event()
server_thread = SocketServerThread(
	socket_handler=socket_handler, ready_event=ready_event, host="localhost"
)
server_thread.start()
ready_event.wait(5)
if not ready_event.is_set():
	raise Exception("most likely failed to start server")

with HTTPConnectionPool(server_thread.host, server_thread.port) as pool:
	r = pool.request("POST", "/", retries=0, body=THE_DATA)

will raise a RemoteDisconnect error and clearly print all the posted data. And it's happening in the real world. See my comment here: jjjake/internetarchive#341 (comment)

Of course the server SHOULD reply something, but there can be many reasons why it's not replying. In any case it's not safe to retry non-idempotent methods like this. There is a reason the retry logic in you first BPO link was not implemented.

Or am I missing something?

Ok, if I got your point .. it means that RemoteDisconnected can be catched ..but moreover it has to be tested if it is non-idempotent metod or NOT ..and raise Error which will not be retried ..but in other case has to be retried ..

As RFC say :

A client, server, or proxy MAY close the transport connection at any time. For example, a client might have started to send a new request at the same time that the server has decided to close the "idle" connection. From the server's point of view, the connection is being closed while it was idle, but from the client's point of view, a request is in progress.

This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails.

@keuko
Copy link
Author

keuko commented Sep 29, 2020

In current situation both types of request is failing , non-idempotent and also idempotent.

@Dobatymo
Copy link

Dobatymo commented Sep 30, 2020

Yes it should be fine to retry idempotent requests.

I think it's a good idea to reraise it as a proper exception (and not just as some argument in ProtocolError) so it can be handled easier from the user/application.

In current situation both types of request is failing , non-idempotent and also idempotent.

Currently RemoteDisconnected is already handled as a read error (and thus retried automatically) for idempotent requests.

Test for yourself:

import threading, sys

sys.path.insert(0, "src")
from dummyserver.server import SocketServerThread

from urllib3 import HTTPConnectionPool

THE_DATA = b"\r\n"

def socket_handler(listener):
	while True:
		sock = listener.accept()[0]

		buf = b""
		while not buf.endswith(THE_DATA):
			buf += sock.recv(65536)

		print(buf)

		sock.close()

ready_event = threading.Event()
server_thread = SocketServerThread(
	socket_handler=socket_handler, ready_event=ready_event, host="localhost"
)
server_thread.start()
ready_event.wait(5)
if not ready_event.is_set():
	raise Exception("most likely failed to start server")

with HTTPConnectionPool(server_thread.host, server_thread.port) as pool:
	r = pool.request("GET", "/", retries=1)

buf is printed 2 times and then a MaxRetryError is raised. To get the retry behavior for POST, simply use r = pool.request("POST", "/", retries=Retry(total=1, allowed_methods=["POST"]), body=THE_DATA)

So imo urllib3 is already doing everything perfectly.

@pquentin
Copy link
Member

pquentin commented Oct 1, 2020

@keuko Since I caused conflicts, I merged master into your branch, please tell me if you need help with git.

@sethmlarson
Copy link
Member

It sounds like there are some questions as to whether this is needed. Just an FYI I'd like to land this in 1.26 but I'm not going to mark it as a blocker for 1.26.0.

@sethmlarson sethmlarson removed this from the v1.26 milestone Oct 30, 2020
@keuko
Copy link
Author

keuko commented Oct 30, 2020

It sounds like there are some questions as to whether this is needed. Just an FYI I'd like to land this in 1.26 but I'm not going to mark it as a blocker for 1.26.0.

Now I am not sure if my patch is working perfectly as @Dobatymo noted above , please test yourself .. @Dobatymo has a good point wit POST and SocketServerThread ... Please discuss it in urrlib ..maybe it can't be fixed :(

Base automatically changed from master to main January 16, 2021 20:06
@sethmlarson
Copy link
Member

I think given the state of this PR and that we'd have to now rebase this on main instead of targetting 1.26 I'm going to close this. I'm still unsure if this is really needed, especially in the context of Python 3.x only for urllib3 v2.0. Would appreciate anyone looking into that further.

Thanks all for the work and weighing in on this PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants