
Delayed response from particular host (only Windows, no timeout) #1879

Closed · hheimbuerger opened this issue May 25, 2020 · 9 comments

Comments

@hheimbuerger

hheimbuerger commented May 25, 2020

I'm investigating a stalling issue I have with some downstream libraries of urllib3 around a very particular host: https://amsmediusw-ak.studios.ms/
Requesting a response with urllib3 1.25.9 on Windows takes over 40s, but then ultimately gives the expected result of a 200 and a GUID in the response body:

urllib3.PoolManager().request('GET', 'https://amsmediusw-ak.studios.ms/').data

The oddity is that various other means of requesting a response from this host return virtually immediately. Here's a subset of what I've tested:

  • Chrome on Windows (both as a regular page request and with fetch in the dev tools): response in under 50ms
  • cURL on Linux (via WSL1): fast
  • urllib3 on Linux (via Glitch): fast

Here's the kicker: the response is also immediate if I set a connection timeout, even though no timeout is actually triggered (at least, no exception is raised).

>>> timeit.timeit("urllib3.PoolManager().request('GET', 'https://amsmediusw-ak.studios.ms/',
                           timeout=urllib3.Timeout(connect=0.1)).data",
                  setup="import urllib3", number=1)
0.377045900000212
>>> timeit.timeit("urllib3.PoolManager().request('GET', 'https://amsmediusw-ak.studios.ms/').data",
                  setup="import urllib3", number=1)
42.16889579999997

(Setting a read timeout but no connection timeout has no effect (40+s).)

Reusing the pool manager also means subsequent responses are immediate. This seems like a major clue to what's going on here, but I can't tell what it means.
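Concretely, what I mean by reuse (a sketch of the measurement, not verified output): with a single shared PoolManager, only the first request pays the ~42s cost; subsequent requests ride the kept-alive pooled connection:

import time
import urllib3

pm = urllib3.PoolManager()
for i in range(3):
    start = time.perf_counter()
    pm.request('GET', 'https://amsmediusw-ak.studios.ms/').data
    # first iteration: ~42s; later iterations reuse the pooled connection
    print(i, time.perf_counter() - start)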

The timing appears very stable, between 42.1 and 42.2s with only minor fluctuations, which makes routing issues even less likely:

>>> timeit.repeat("urllib3.PoolManager().request('GET', 'https://amsmediusw-ak.studios.ms/').data", setup="import urllib3", number=1, repeat=5)
[42.18481859999997, 42.1579299, 42.24814900000001, 42.16824580000002, 42.18917329999999]

Making a request via PowerShell (Invoke-WebRequest https://amsmediusw-ak.studios.ms/ -UseBasicParsing) is also slow, indicating that this is rooted somewhere in the Windows sockets implementation, possibly some kind of default connection timeout?

@hodbn
Member

hodbn commented May 25, 2020

It works virtually immediately on my machine (Windows, urllib3 1.24.3).
Does this happen with HTTP? Can you post a packet capture with HTTP?

@hheimbuerger
Author

Yeah, same effect with HTTP.

But when I tested a few minutes ago, I also got immediate responses, and now it's back to 42s (on both HTTP and HTTPS), so I guess there's some kind of rate limiting going on. I just don't understand how Chrome is immune to it; apparently there's got to be a small difference in how I make my request.

Do you have any idea why setting a timeout might 'fix' the issue? Is the protocol-level request that gets sent in any way different if a timeout is enabled?

@sigmavirus24
Contributor

But when I tested a few minutes ago, I also got immediate responses, and now it's back to 42s (on both HTTP and HTTPS), so I guess there's some kind of rate limiting going on. I just don't understand how Chrome is immune to it; apparently there's got to be a small difference in how I make my request.

User-Agent strings probably, but it wouldn't explain why the connect timeout makes a difference. My best guess for why the connect timeout helps is:

There are multiple DNS entries, and browsers cache the working IP addresses that they've looked up and used previously. urllib3 looks up DNS each time, and the ordering of the DNS entries is non-deterministic, so we get "bad" addresses for that URL more often than not. The connect timeout means we don't spend too much time on those bad hosts.
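For context, urllib3's connection setup boils down to logic like the standard library's socket.create_connection: resolve the name, then try each returned address in order, with the connect timeout bounding each individual attempt. A simplified sketch (not urllib3's actual code):

import socket

def connect_first_working(host, port, timeout=None):
    # getaddrinfo may return several addresses (IPv4/IPv6, multiple A records)
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            if timeout is not None:
                sock.settimeout(timeout)  # bounds this one connect attempt
            sock.connect(sockaddr)
            return sock  # first address that answers wins
        except OSError as err:
            last_err = err
            sock.close()
    raise last_err if last_err else OSError('getaddrinfo returned no results')

Under that theory, a blocking connect to a "bad" address early in the list stalls until the OS gives up, while a short connect timeout moves on to a working address almost immediately.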

@hheimbuerger
Author

User-Agent strings probably

I did send Chrome's User-Agent string; that's why I initially dropped down to the level of urllib3. It didn't seem to make a difference, but there might be some finer details I'm missing, and I then got distracted by the timeout 'fix'.

There are multiple DNS entries, and browsers cache the working IP addresses that they've looked up and used previously. urllib3 looks up DNS each time, and the ordering of the DNS entries is non-deterministic, so we get "bad" addresses for that URL more often than not. The connect timeout means we don't spend too much time on those bad hosts.

Interesting theory. Can you think of any experiments I should run to validate this, or break it down further?

I guess that would explain why Chrome isn't affected here, because IIRC it's hardcoded to use Google DNS rather than the system-configured DNS (currently 1.1.1.1).
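One experiment I could run to test that directly (a sketch, untested): resolve the name myself and time a bare TCP connect to each address the resolver returns. If the theory holds, some addresses should be consistently slow or unreachable:

import socket
import time

HOST, PORT = 'amsmediusw-ak.studios.ms', 443

for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        HOST, PORT, 0, socket.SOCK_STREAM):
    sock = socket.socket(family, socktype, proto)
    start = time.perf_counter()
    try:
        sock.connect(sockaddr)
        outcome = 'connected'
    except OSError as err:
        outcome = f'failed ({err})'
    finally:
        sock.close()
    print(f'{sockaddr[0]}: {outcome} after {time.perf_counter() - start:.3f}s')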

@hheimbuerger
Author

hheimbuerger commented May 27, 2020

One more data point I didn't think about earlier, that I just tested:

The response time is not correlated to the length of the timeout set!

If I set a connection timeout of 0.2s, I get a sub-second total response. But if I set a connection timeout of 10s, I still get sub-second total response times.

My interpretation is that the timeout doesn't cause any internal process to actually time out (e.g. "spending time on a bad host in DNS rotation") and then let the system recover from it; rather, the very act of setting a timeout seems to affect the behavior of the system.

>>> def test(timeout): return timeit.repeat(f"urllib3.PoolManager().request('GET', 'http://amsmediusw-ak.studios.ms/', timeout=urllib3.Timeout(connect={timeout})).data", setup="import urllib3", number=1, repeat=5)
...
>>> test(timeout=0.2)
[0.05902539999806322, 0.055709000007482246, 0.07046360001550056, 0.07770219998201355, 0.06996609998168424]
>>> test(timeout=10)
[0.07713540000258945, 0.06717739999294281, 0.06995289999758825, 0.07092060000286438, 0.07774379997863434]
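That would be consistent with the timeout changing the code path rather than the deadline: as far as I can tell, CPython's settimeout() puts the socket into internal non-blocking mode (connect is then driven by select/poll), whereas the default of no timeout does a plain blocking connect, so the OS-level calls genuinely differ. A raw-socket version of the comparison would also show whether this is urllib3-specific at all (a sketch, untested):

import socket
import time

def timed_connect(host, port, timeout=None):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if timeout is not None:
        # switches CPython's socket into its non-blocking/poll code path
        sock.settimeout(timeout)
    start = time.perf_counter()
    try:
        sock.connect((host, port))
    finally:
        sock.close()
    return time.perf_counter() - start

print('blocking connect:    ', timed_connect('amsmediusw-ak.studios.ms', 80))
print('connect with timeout:', timed_connect('amsmediusw-ak.studios.ms', 80, timeout=10))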

@hodbn
Member

hodbn commented Jun 5, 2020

Can you post a packet capture of the two cases?

@hheimbuerger
Author

hheimbuerger commented Jun 6, 2020

Can you post a packet capture of the two cases?

That's a good idea, and I'm curious too. I'll look into it. Give me a few days, please.

@hheimbuerger
Author

hheimbuerger commented Jun 13, 2020

I have Wireshark set up for capturing now, but I cannot get into the (presumed) rate-limited state anymore. I've downloaded about 10 GiB from this host, but that didn't trigger it.

If anyone has an idea or better reproduction steps, please let me know. I'll also keep trying over the next few days (far-fetched, but maybe there's a day-of-the-week component?).

Otherwise, I'll close this issue in a couple of days.

@hheimbuerger
Author

Thanks to everyone for your suggestions! I'm really curious about this one and would like to capture and investigate further. Unfortunately, I'm currently unable to get the system into this supposed 'rate limited' state.

Therefore, I'm closing this issue for now. Should I ever manage to reproduce it, I'll reopen.
