Reducing performance impact of _socket.getaddrinfo #4070
Comments
Modified yt-dlp:
I have attached the output as well. See in particular column 2, which has raw time taken excluding any subcalls to other functions. |
How many times did you do this? I can't reproduce it even after 1M calls.
Testing code:
import socket, timeit
N = 1000; LIM = 1_000_000
for i in range(LIM//N):
    print(f'{(i + 1) * N:>10d}', timeit.timeit(
        'socket.getaddrinfo("youtube.com", 443)',
        globals=globals(), number=N))
At the time, roughly every sixth call took 5 seconds to run. Here are a few timings, in seconds: |
I have found that it is probably my ISP that is throttling these requests, as this happens to all hosts... Should I close the issue? |
Windows:
>>> import socket, timeit
>>> N = 1; LIM = 20
>>>
>>> for i in range(LIM//N):
...     print(f'{(i + 1) * N:>10d}', timeit.timeit(
...         'socket.getaddrinfo("youtube.com", 443)',
...         globals=globals(), number=N))
...
1 0.030426299897953868
2 0.0006706998683512211
3 0.0003217000048607588
4 0.00029360014013946056
5 0.0002962998114526272
6 0.00029530003666877747
7 0.0002888999879360199
8 0.0002865998540073633
9 0.000282099936157465
10 0.0003408000338822603
11 0.00032909982837736607
12 0.0002951999194920063
13 0.0003035000991076231
14 0.0002933000214397907
15 0.00038430001586675644
16 0.0003557000309228897
17 0.0004267999902367592
18 0.001039199996739626
19 0.0005477000959217548
20 0.0003512999974191189
Linux:
>>> import socket, timeit
>>> N = 1; LIM = 20
>>>
>>> for i in range(LIM//N):
...     print(f'{(i + 1) * N:>10d}', timeit.timeit(
...         'socket.getaddrinfo("youtube.com", 443)',
...         globals=globals(), number=N))
...
1 0.019489400001475587
2 0.0036687999963760376
3 0.0029771999979857355
4 0.003344099997775629
5 0.002450100000714883
6 0.016608899997663684
7 0.0029057999927317724
8 0.004744399993796833
9 0.003114899998763576
10 0.0027027999894926324
11 0.004003299996838905
12 0.014604500000132248
13 0.004866500006755814
14 0.003497900004731491
15 0.0024579000019002706
16 0.0023951000039232895
17 0.0036401999968802556
18 0.0034352999937254936
19 0.0025858000008156523
20 0.002880100000766106
On Linux, I too get a slight bump in the timing for some of the requests. I suspect that both OSes are already caching the result, but Linux invalidates the cache after every few requests. Still, I don't get the 100x slowdown that you do. My hypothesis is that either your network or your OS has some issue that is slowing it down. Can you test it on another device/network? |
In that case, there is really nothing we can do. In theory, we could cache it like you suggest, but you can't expect every program you use to implement DNS caching separately. You should instead try to figure out what exactly is causing the slowdown |
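For anyone who wants to experiment with the caching idea locally, here is a minimal sketch of a time-limited cache around a resolver function. It is not something yt-dlp does; `make_cached_resolver`, the 60-second default lifetime, and the injectable `clock` are all assumptions made for illustration. Real DNS records carry their own TTL, which `socket.getaddrinfo` does not expose, so a fixed lifetime is only an approximation.

```python
import socket
import time


def make_cached_resolver(resolve=socket.getaddrinfo, ttl=60.0, clock=time.monotonic):
    """Wrap a getaddrinfo-style callable with a simple in-process cache.

    Hypothetical helper: results are reused for `ttl` seconds (an assumed,
    fixed lifetime), after which the wrapped resolver is called again.
    """
    cache = {}  # (args, sorted kwargs) -> (expiry time, result)

    def cached(*args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        now = clock()
        entry = cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the real lookup
        result = resolve(*args, **kwargs)
        cache[key] = (now + ttl, result)
        return result

    return cached
```

To try it in a single process, one could assign `socket.getaddrinfo = make_cached_resolver()` before making any requests. This only masks a slow resolver for repeated lookups of the same host; the first lookup is as slow as before.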
Found it, it was my custom public DNS resolver. Thank you! |
Possibly related: ytdl-org/youtube-dl#13734 yt-dl/p opens a new connection for each request which probably explains why there are so many DNS requests. #3668 should improve this by introducing persistent connections. |
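To illustrate why persistent connections reduce DNS lookups, here is a rough sketch using the standard library's `http.client`. It is not the implementation from #3668; `fetch_statuses` is a hypothetical helper. The point is that the hostname is resolved when the connection is opened, so sending several requests over one connection object avoids repeating the lookup as long as the server keeps the connection alive.

```python
import http.client


def fetch_statuses(conn, paths):
    """Send one GET per path over a single, already-constructed connection.

    `conn` is expected to behave like http.client.HTTPSConnection. The
    connection is opened lazily on the first request, and later requests
    reuse the same socket if the server keeps it alive.
    """
    statuses = []
    for path in paths:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the socket can be reused
        statuses.append(response.status)
    return statuses
```

A call might look like `fetch_statuses(http.client.HTTPSConnection("example.com", timeout=10), ["/", "/"])`, where the host is only a placeholder. Creating a new `HTTPSConnection` for each path instead would trigger a fresh `getaddrinfo` call every time.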
Adds support for HTTPS proxies and persistent connections (keep-alive) Closes #1890 Resolves #4070 Resolves ytdl-org/youtube-dl#32549 Resolves ytdl-org/youtube-dl#14523 Resolves ytdl-org/youtube-dl#13734 Authored by: coletdjnz, Grub4K, bashonly
Checklist
Description
I was downloading a video off YouTube with cProfile turned on (profile attached) and I have found that half the time for downloading was being spent in _socket.getaddrinfo, which was being called 25 times. I have tested running socket.getaddrinfo("youtube.com", 443) repeatedly and it does get rate limited after a while. Would there be a practical way to, for example, cache these outputs?
For context, I have attached the modified /bin/yt-dlp script and the full cProfile output. The key is the second column, tottime, which measures, in seconds, the amount of time spent in a specific function, without any subcalls to other functions affecting the results.
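A profile of this kind can be reproduced with the standard `cProfile` and `pstats` modules. The sketch below is not the attached modified script; `profile_by_tottime` is a hypothetical helper that runs one callable under the profiler and returns a report sorted by `tottime`, so that functions such as `_socket.getaddrinfo` would appear near the top if they dominate.

```python
import cProfile
import io
import pstats


def profile_by_tottime(func, *args, top=10, **kwargs):
    """Run func under cProfile and return a text report sorted by tottime.

    tottime is the time spent inside a function itself, excluding
    time spent in the functions it calls.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(top)  # show only the `top` rows
    return out.getvalue()
```

Printing the returned string shows the usual `ncalls  tottime  percall  cumtime  percall` table, limited to the `top` most expensive entries.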
Verbose log