New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delayed response from particular host (only Windows, no timeout) #1879
Comments
It works virtually immediately on my machine (Windows, urllib3=1.24.3). |
Yeah, same effect with HTTP. But when I tested a few minutes ago, I also got immediate responses, and now it's back to 42s (on both HTTP and HTTPS), so I guess there's some kind of rate limiting going on. I just don't understand how Chrome is immune from it, but apparently there's got to be a small difference in how I make my request. Do you have any idea why setting a timeout might 'fix' the issue? Is the protocol-level request that gets sent in any way different if a timeout is enabled? |
User-Agent strings probably, but it wouldn't explain why the connect timeout makes a difference. My best guess for why the There are multiple DNS entries and browsers cache the working IP Addresses that they've looked up and used previously. Urllib3 looks up the DNS each time and the ordering of the DNS entries is non-deterministic so we get "bad" addresses for that URL more often than not. The connect timeout means we don't spend too much time on those bad hosts. |
I did sent Chrome's User-Agent string, that's why I initially dropped down to the level of urllib3. It didn't seem to make a difference, but there might be some finer details I'm missing — and I then got distracted by the timeout 'fix'.
Interesting theory. Can you think of any experiments I should run to validate this, or break it down further? I guess that would explain why Chrome isn't affected here, because IIRC it is hardcoded into using Google DNS as opposed to the system-configured DNS (currently 1.1.1.1). |
One more data point I didn't think about earlier, that I just tested: The response time is not correlated to the length of the timeout set! If I set a connection timeout of 0.2s, I get a sub-second total response. But if I set a connection timeout of 10s, I still get sub-second total response times. My interpretation is that the timeout doesn't cause any internal process to actually time out (e.g. "spending time on a bad host in DNS rotation") and then causing the system to recover from it, but rather the very act of setting a timeout seems to affect the behavior of the system.
|
Can you post a packet capture of the two cases? |
That's a good idea, and I'm curious too. I'll look into it. Give me a few days, please. |
I have Wireshark set up for capturing now, but cannot get into the (presumed) rate limited state anymore. I've downloaded about 10 GiB from this host, but that didn't trigger it. If anyone has an idea or better reproduction steps, please let me know. I'll also try some more the next days (far fetched, but maybe there's a day of the week-component?). Otherwise, I'll close this issue in a couple of days. |
Thanks to everyone for your suggestions! I'm really curious about this one and would like to capture and investigate further. Unfortunately, I'm currently unable to get the system into this supposed 'rate limited' state. Therefore, I'm closing this issue for now. Should I ever manage to reproduce, I'll reopen. |
I'm investigating a stalling issue I have with some downstream libraries of urllib3 around a very particular host: https://amsmediusw-ak.studios.ms/
Requesting a response with urllib3 1.25.9 on Windows takes over 40s, but then ultimately gives the expected result of a 200 and a GUID in the response body:
The oddity is that various other means of requesting a response from this host are returning virtually immediately, here's a subset of what I've tested:
fetch
in the dev tools: response in under 50msHere's the kicker, the response is also immediate if I set a connection timeout! — even though no timeout is actually triggered, or at least no exception is being raised.
(Setting a read timeout but no connection timeout has no effect (40+s).)
Reusing the pool manager also means future responses will be immediate. This seems like a major clue to help you understand what's going on here, but I cannot tell.
The timing appears to be very stable at between 42.1 and 42.2s, with minor fluctuations making it further unlikely that any routing issues are going on:
Making a request via PowerShell (
Invoke-WebRequest https://amsmediusw-ak.studios.ms/ -UseBasicParsing
) is also slow. Indicating that this is rooted somewhere in the Windows sockets implementation, possibly some kind of default connection timeout?The text was updated successfully, but these errors were encountered: