Multi-threading slowdown for YouTube #30641

noembryo · 2022-02-15T19:35:08Z

Checklist

I'm reporting a broken site support
I've verified that I'm running youtube-dl version 2021.12.17
I've checked that all provided URLs are alive and playable in a browser
I've checked that all URLs and arguments with special characters are properly quoted or escaped
I've searched the bugtracker for similar issues including closed ones

Description

Using youtube_dl with multiple threads to get information about multiple videos is a lot slower since the last breakage.
The code below takes ~70 sec for 42 videos.
With yt-dlp, the same videos take ~30 sec.
Before the last code change (I'm using the current git code), youtube_dl was faster than yt-dlp.
Changing THREAD_NR didn't change the difference..

The processing load is also much higher than before.
Because I use a similar strategy in an app of mine, on my system (with an older i7) it got noticeably worse than before.
Trying it on an older laptop made the app totally unusable..

The ids are some random links, you can use whatever you like.

# coding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2
from threading import Thread, Event
import youtube_dl
# import yt_dlp as youtube_dl

ydl_opts = {"quiet": True, "no_warnings": True}
link_ids = ["4jduuQh-Uho", "9GNpv7QDvMY", "MbEOR2Flc-4", "ZKUzNF21n9w", "y-JqH1M4Ya8",
            "pUqfaiUb3l4", "bL5eqSOXMtE", "HyMm4rJemtI", "BU4kGkrrJEw", "wA1v207xlOw",
            "pFS4zYWxzNA", "aF6hDcAbSoE", "G1ckKDRc69w", "o9_jzBtdMZ4", "AGoQZx8Mn0g",
            "6W-pHCD6Tow", "kszLwBaC4Sw", "mwTd_PzGY-c", "iqLTYD_nhsU", "X335gdcPE7A",
            "z_54vDk8lWw", "8a82arE0JSQ", "tJmzQHWl9kc", "8jPQjjsBbIc", "ENJUB5thpB4",
            "dEhUMvjFuQY", "D6XyJh1tsGI", "tFCfb-Qqdz0", "UkafA6r1caQ", "OO8HtAXnRqQ",
            "--da0m2K4I4", "EOlI0UtLDk4", "r7tQbxTImKw", "s_YLPcW4Tu8", "9wIbhES2UkA",
            "YkX9X4td7j8", "14cHz4ebonY", "saVUUZE50Co", "N1K4NYHqMx4", "iCBL33NKvPA",
            "QPTNS3llm2E", "pFS4zYWxzNA"]
THREAD_NR = 8
infos = []


class Base(object):
    def __init__(self, **kwargs):
        super(Base, self).__init__(**kwargs)
        self.feed_q = Queue()
        self.threads = []
        with youtube_dl.YoutubeDL({}) as ydl:
            ydl.cache.remove()
        for i in range(THREAD_NR):
            thread = Worker(self.feed_q)
            thread.daemon = True
            self.threads.append(thread)
            thread.start()


class Node(object):
    pass


class Worker(Thread):
    def __init__(self, feed_q):
        super(Worker, self).__init__()
        self.node = None
        self.feed_q = feed_q
        self.stop = False

    def run(self):
        while not self.stop:
            self.node = self.feed_q.get()
            url = "https://www.youtube.com/watch?v=" + self.node.id
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                try:
                    ydl_info = ydl.extract_info(url, download=False)
                except Exception as e:
                    error_text = "A {0} occurred for {2}. Arguments:\n{1!r}"
                    print(error_text.format(type(e).__name__, e.args, url))
                    self.feed_q.task_done()
                    continue
                infos.append(ydl_info)
                print("Got info from {}".format(url))
                self.node.updated.set()
            while True:
                if self.node.updated.wait(timeout=2):
                    break
            self.feed_q.task_done()


if __name__ == "__main__":
    base = Base()
    print("Getting info for {} YouTube videos".format(len(link_ids)))
    for tube_id in link_ids:
        node = Node()
        node.id = tube_id
        node.updated = Event()
        base.feed_q.put(node)

    from timeit import default_timer as timer
    start = timer()
    base.feed_q.join()
    print("Finished in {} seconds".format(round(timer() - start)))

dirkf · 2022-02-15T21:12:59Z

So, on a dumb T7700 laptop:

2.7: 146s (from Queue import Queue)
3.5: 110s
3.9: 70s

Might the difference between yt-dlp and yt-dl be related to the Python version?

Also, https://lwn.net/Articles/872869/.

noembryo · 2022-02-15T21:17:00Z

Might the difference between yt-dlp and yt-dl be related to the Python version?

No, I don't think so.
I tested both in 3.7.
Also, yt-dl has almost the same performance in 2.7 and in 3.7..

noembryo · 2022-02-15T23:50:47Z

Also, https://lwn.net/Articles/872869/.

The code I use is not optimized, of course, but memory usage (for the non-released objects) was not my concern.
The thing we should remember is that it used to work much better before the current change, and yt-dlp still does.
With the same example code..

dirkf · 2022-02-16T17:23:07Z

The point of the linked article was that the opportunities for multi-threading speed-up in current Python environments are limited. Although that doesn't directly address why two versions of extractor/youtube.py might perform differently, one might conclude that fewer Python operations would reduce the chance of threads blocking on the interpreter lock.

The main change in extractor/youtube.py in the master branch is responding to the n parameter challenge using the "toy" JS interpreter as enhanced for yt-dlp by @pukkandan. Obviously not doing this would make normal extraction for YT much slower since the faster download massively outweighs the time taken to calculate the n response. In OP's application, this time may be wasted if the media URLs are not being used (or if the response expires before then -- does it expire?).

There are easy optimisation opportunities in yt-dl's jsinterp.py. I set up a profiled version of OP's test program with one thread and 3 videos. By pulling out some constant regex expressions and changing a loop over assignment operators into a regex alternative match, the execution time (Py 2.7) went from 22s to 18s. The number of function calls was down by almost 1M, nearly 25%. For a single thread with 42 videos (Py 3.9) the time for yt-dl (60s) was almost the same as for yt-dlp (65s), though a bit less. The regex optimisations apply to expressions that used f-strings in the yt-dlp version, which are said to be faster than formatting operations available to yt-dl; however, the loop optimisation, and one already made in the back-port could equally apply to the yt-dlp version.

With the optimised code:

Threads Time-2.7    Time-3.9
   1      87        60
   2      70        42
   4      90        48
   8      100       54

It's a slowish unparallel machine so the speed goes down for more threads.

Here's the patch:

--- old/youtube_dl/jsinterp.py
+++ new/youtube_dl/jsinterp.py
@@ -33,6 +33,7 @@ _OPERATORS = [
 ]
 _ASSIGN_OPERATORS = [(op + '=', opfunc) for op, opfunc in _OPERATORS]
 _ASSIGN_OPERATORS.append(('=', (lambda cur, right: right)))
+_ASSIGN_OPERATORS = dict(_ASSIGN_OPERATORS)
 
 _NAME_RE = r'[a-zA-Z_$][a-zA-Z_$0-9]*'
 
@@ -84,6 +85,28 @@ class LocalNameSpace(MutableMapping):
 
 
 class JSInterpreter(object):
+    _EXPR_SPLIT_RE = (
+        r'''(?x)
+            (?P<pre_sign>\+\+|--)(?P<var1>%(_NAME_RE)s)|
+            (?P<var2>%(_NAME_RE)s)(?P<post_sign>\+\+|--)'''
+            % {'_NAME_RE': _NAME_RE, })
+    _VARNAME_RE = r'(?!if|return|true|false|null)(?P<name>%s)$' % _NAME_RE
+    _ARRAY_REF_RE = r'(?P<in>%s)\[(?P<idx>.+)\]$' % _NAME_RE
+    _FN_CALL_RE = r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]*)\)$' % _NAME_RE
+    _MEMBER_REF_RE = (
+        r'(?P<var>%s)(?:\.(?P<member>[^(]+)|\[(?P<member2>[^]]+)\])\s*'
+        % _NAME_RE)
+    _FN_NAME_RE = r'''(?:[a-zA-Z$0-9]+|"[a-zA-Z$0-9]+"|'[a-zA-Z$0-9]+')'''
+    _FN_DEF_RE = (
+        r'(?P<key>%s)\s*:\s*function\s*\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}'
+        % _FN_NAME_RE)
+    _ASSIGN_EXPR_RE = (
+        r'''(?x)
+            (?P<out>%s)(?:\[(?P<index>[^\]]+?)\])?
+            \s*(?P<op>%s)
+            (?P<expr>.*)$'''
+            % (_NAME_RE, '|'.join(re.escape(op) for op in _ASSIGN_OPERATORS.keys())))
+
     def __init__(self, code, objects=None):
         if objects is None:
             objects = {}
@@ -269,9 +292,7 @@ class JSInterpreter(object):
         for sub_expr in sub_expressions:
             self.interpret_expression(sub_expr, local_vars, allow_recursion)
 
-        for m in re.finditer(r'''(?x)
-                (?P<pre_sign>\+\+|--)(?P<var1>%(_NAME_RE)s)|
-                (?P<var2>%(_NAME_RE)s)(?P<post_sign>\+\+|--)''' % globals(), expr):
+        for m in re.finditer(self._EXPR_SPLIT_RE, expr):
             var = m.group('var1') or m.group('var2')
             start, end = m.span()
             sign = m.group('pre_sign') or m.group('post_sign')
@@ -281,13 +302,10 @@ class JSInterpreter(object):
                 ret = local_vars[var]
             expr = expr[:start] + json.dumps(ret) + expr[end:]
 
-        for op, opfunc in _ASSIGN_OPERATORS:
-            m = re.match(r'''(?x)
-                (?P<out>%s)(?:\[(?P<index>[^\]]+?)\])?
-                \s*%s
-                (?P<expr>.*)$''' % (_NAME_RE, re.escape(op)), expr)
-            if not m:
-                continue
+        m = re.match(self._ASSIGN_EXPR_RE, expr)
+        if m:
+            op = m.group('op')
+            opfunc = _ASSIGN_OPERATORS[op]
             right_val = self.interpret_expression(m.group('expr'), local_vars, allow_recursion)
 
             if m.groupdict().get('index'):
@@ -313,9 +331,7 @@ class JSInterpreter(object):
         elif expr == 'continue':
             raise JS_Continue()
 
-        var_m = re.match(
-            r'(?!if|return|true|false|null)(?P<name>%s)$' % _NAME_RE,
-            expr)
+        var_m = re.match(self._VARNAME_RE, expr)
         if var_m:
             return local_vars[var_m.group('name')]
 
@@ -324,8 +340,7 @@ class JSInterpreter(object):
         except ValueError:
             pass
 
-        m = re.match(
-            r'(?P<in>%s)\[(?P<idx>.+)\]$' % _NAME_RE, expr)
+        m = re.match(self._ARRAY_REF_RE, expr)
         if m:
             val = local_vars[m.group('in')]
             idx = self.interpret_expression(m.group('idx'), local_vars, allow_recursion)
@@ -350,9 +365,7 @@ class JSInterpreter(object):
                 raise_expr_error('right-side', op, expr)
             return opfunc(left_val or 0, right_val)
 
-        m = re.match(
-            r'(?P<var>%s)(?:\.(?P<member>[^(]+)|\[(?P<member2>[^]]+)\])\s*' % _NAME_RE,
-            expr)
+        m = re.match(self._MEMBER_REF_RE, expr)
         if m:
             variable = m.group('var')
             nl = Nonlocal()
@@ -469,7 +482,7 @@ class JSInterpreter(object):
             else:
                 return eval_method()
 
-        m = re.match(r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]*)\)$' % _NAME_RE, expr)
+        m = re.match(self._FN_CALL_RE, expr)
         if m:
             fname = m.group('func')
             argvals = tuple([
@@ -485,22 +498,17 @@ class JSInterpreter(object):
             raise ExtractorError('Unsupported JS expression %r' % expr)
 
     def extract_object(self, objname):
-        _FUNC_NAME_RE = r'''(?:[a-zA-Z$0-9]+|"[a-zA-Z$0-9]+"|'[a-zA-Z$0-9]+')'''
         obj = {}
         obj_m = re.search(
             r'''(?x)
                 (?<!this\.)%s\s*=\s*{\s*
                     (?P<fields>(%s\s*:\s*function\s*\(.*?\)\s*{.*?}(?:,\s*)?)*)
                 }\s*;
-            ''' % (re.escape(objname), _FUNC_NAME_RE),
+            ''' % (re.escape(objname), self._FN_NAME_RE),
             self.code)
         fields = obj_m.group('fields')
         # Currently, it only supports function definitions
-        fields_m = re.finditer(
-            r'''(?x)
-                (?P<key>%s)\s*:\s*function\s*\((?P<args>[a-z,]+)\){(?P<code>[^}]+)}
-            ''' % _FUNC_NAME_RE,
-            fields)
+        fields_m = re.finditer(self._FN_DEF_RE, fields)
         for f in fields_m:
             argnames = f.group('args').split(',')
             obj[remove_quotes(f.group('key'))] = self.build_function(argnames, f.group('code'))
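
The two optimisations in the patch above (constant patterns built once, and a single alternation instead of a loop over assignment operators) can be illustrated in isolation. This is only a sketch, not the jsinterp.py code: the operator table is shortened, and `ASSIGN_RE`, `match_assignment_loop` and `match_assignment_single` are names made up for the example. It also precompiles the pattern with `re.compile`, whereas the patch keeps pattern strings as class attributes.

```python
import operator
import re

NAME_RE = r'[a-zA-Z_$][a-zA-Z_$0-9]*'

# Hypothetical, shortened operator table (the real one is larger).
OPERATORS = [('|', operator.or_), ('+', operator.add), ('-', operator.sub)]
ASSIGN_OPERATORS = [(op + '=', fn) for op, fn in OPERATORS]
ASSIGN_OPERATORS.append(('=', lambda cur, right: right))
ASSIGN_OPERATORS_MAP = dict(ASSIGN_OPERATORS)


def match_assignment_loop(expr):
    """Original style: build and try one regex per operator."""
    for op, _ in ASSIGN_OPERATORS:
        m = re.match(
            r'(?P<out>%s)\s*%s(?P<expr>.*)$' % (NAME_RE, re.escape(op)), expr)
        if m:
            return m.group('out'), op, m.group('expr')
    return None


# Optimised style: one constant pattern, built once at import time.
# Longer operators are listed first so '+=' is tried before plain '='.
ASSIGN_RE = re.compile(
    r'(?P<out>%s)\s*(?P<op>%s)(?P<expr>.*)$' % (
        NAME_RE,
        '|'.join(re.escape(op) for op in
                 sorted(ASSIGN_OPERATORS_MAP, key=len, reverse=True))))


def match_assignment_single(expr):
    """One match attempt; the operator is read back from the 'op' group."""
    m = ASSIGN_RE.match(expr)
    if m:
        return m.group('out'), m.group('op'), m.group('expr')
    return None
```

Both functions return the same `(name, operator, right-hand side)` tuple for an assignment, but the second makes one match attempt per expression instead of up to one per operator.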

noembryo · 2022-02-16T17:51:50Z

In OP's application, this time may be wasted if the media URLs are not being used (or if the response expires before then -- does it expire?).

I'm not sure of the meaning here.
The media URLs are used to download the video/audio streams later.
The YouTube response usually is active for about 6 hours.
Is that what you are asking?

Single threading results are OK, but I'm interested in the multi-threading usage.
The app is a player (among other things), and up until now it was very fast when it loaded a playlist with, let's say, 50 tracks.
Even now, it works OK with yt-dlp (there are other problems there, because of PySide2).
But with the current yt-dl code, I have to wait 2 minutes on an i7 and forever on any lesser machine.

I don't think that it's a restriction of Python that is suddenly causing this.
If some thread calculates the challenge, why do all the rest need to recalculate it again?

What I need is a way to know that the calculation is not done yet, so the other threads will wait until the first calculation and then use it (from somewhere that it gets stored) for themselves.
Unfortunately, the way I've done it in previous versions (using the yt-dl builtin cache mechanism) does not work anymore, and the code is really over my head, so I can't do anything else.. 😢
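
A minimal sketch of the "compute once, let the other threads wait" idea described above, using only the standard library. It is not youtube-dl's cache mechanism and does not hook into the extractor; `OnceCache`, `get_or_compute` and the stand-in `solve_challenge` function are hypothetical names for illustration.

```python
import threading


class OnceCache(object):
    """Compute each key once; concurrent callers for the same key wait."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [done_event, value]

    def get_or_compute(self, key, compute):
        with self._lock:
            entry = self._entries.get(key)
            owner = entry is None
            if owner:
                # First caller for this key: register a pending entry.
                entry = self._entries[key] = [threading.Event(), None]
        if owner:
            try:
                entry[1] = compute(key)
            finally:
                # Wake the waiters even if compute() raised; in that case
                # they see None and the exception propagates to the owner.
                entry[0].set()
        else:
            entry[0].wait()
        return entry[1]


calls = []


def solve_challenge(player_id):
    # Stand-in for the expensive JS interpretation step.
    calls.append(player_id)
    return player_id.upper()


cache = OnceCache()
results = []
threads = [
    threading.Thread(
        target=lambda: results.append(
            cache.get_or_compute('player-1', solve_challenge)))
    for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('computed %d time(s) for %d callers' % (len(calls), len(results)))
```

The lock is held only while looking up or registering an entry, so threads asking for different keys do not block each other while a computation is running.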

Is there a youtube.py extractor that is using your updated code?

pukkandan · 2022-02-16T18:10:40Z

Using the code below gives me ~70 sec for 42 videos.
With the yt-dlp the time it takes for the same videos is ~30 sec.

This is very surprising if true. yt-dlp's youtube extractor is much more complex than youtube-dl's and is known to be slower (at the benefit of more robustness/features).

Verify that you are downloading the same format with both. The default format sorting of yt-dlp is different. So this is my primary suspicion

The thing we should remember is that it used to work much better before the current change, and yt-dlp still does.

Which change do you mean? You could try and bisect the commit history to find the problematic commit

There are easy optimisation opportunities in yt-dl's jsinterp.py

I doubt the culprit is jsinterp. Last I checked, the time taken for n-sig decryption is insignificant when compared to the total run-time of the extractor. Hence why I never bothered with trying to optimize it

noembryo · 2022-02-16T18:26:16Z

This is very surprising if true. yt-dlp's youtube extractor is much more complex than youtube-dl's and is known to be slower (at the benefit of more robustness/features).

Verify that you are downloading the same format with both. The default format sorting of yt-dlp is different. So this is my primary suspicion

You can try the simple code at the OP yourself.
I use the same code for all tests.
I don't download anything, just ask for the videos' info (stream urls, title, etc.)

Which change do you mean? You could try and bisect the commit history to find the problematic commit

Last OK fix was created by @dirkf ~23 Nov 21 as a youtube.py in his PR.
I used it and it worked fine.
After that, there was another small fix for the regex it used ~15 Dec 21.
This extractor (with a cache mechanism I added) was working up until the beginning of Feb 22, and it was faster than yt-dlp's one.
Then YouTube made another change, that was fixed here with this delayed version of the extractor..

pukkandan · 2022-02-16T18:32:17Z

I don't download anything, just ask for the videos' info (stream urls, title, etc.)

Sorry, didn't notice download=False. I can't test your code at the moment. Will do when I am able to

dirkf · 2022-02-16T19:49:40Z

@89z

Code below. I know its apples and oranges, but I would think the times would be similar.

Is the code solving the n parameter challenge? As I recall, the Android client doesn't have to do that but it also doesn't get all the formats that users expect. Also, I guess the Go code is compiled down to machine code whereas we're running Python byte code.

@noembryo, @pukkandan

The media URLs are used to download the video/audio streams later.

OK, so the unthrottling isn't wasted. And users don't get 404s or throttled bandwidth using the media links some time after unthrottling ?

If some thread calculates the challenge, why all the rest need to recalculate it again?

So far we haven't managed to work out how to share the player cache among embedded yt-dl instances, which TBH is too specialised to merit a lot of effort at the moment. My attempt (make the player cache a module-level var and protect accesses to it with a lock) crashes with 8 threads in Py 2.7 (probably doesn't like running the descrambling function created from the player JS simultaneously in multiple threads) but makes no significant difference for either 2 threads in Py 2.7 or 8 threads in Py 3.9, neither of which crash. Of course, I may have done it wrong.

I doubt the culprit is jsinterp. Last I checked, the time taken for n-sig decryption is insignificant when compared to the total run-time of the extractor. Hence why I never bothered with trying to optimize it.

Definitely true if the download is included, but the patch posted above reduces the number of function calls per extraction from more than 70k (that's with one optimisation from the back-port already) to less than 50k, so I think there's a worthwhile saving. As to whether the constant regex expressions used for parsing should be class or global vars, I have no firm idea. In profiling output, the source files with more than 1 item in the top 20 are jsinterp.py, ssl.py, re.py and sre_parse.py. interpret_statement() gets called on average 2.5 times per extraction and then is called recursively some 312 times.
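
For readers who want to collect similar numbers, here is a minimal Python 3 sketch of profiling with the standard library's `cProfile` and `pstats`. The `fake_extraction` function is a made-up stand-in for a call such as `ydl.extract_info(url, download=False)`; this is not the profiling setup used in this thread.

```python
import cProfile
import io
import pstats
import re


def fake_extraction(n=2000):
    # Stand-in workload: many small regex matches, as an interpreter might do.
    return sum(
        1 for i in range(n)
        if re.match(r'[a-z_$][a-z_$0-9]*$', 'v%d' % i))


profiler = cProfile.Profile()
profiler.enable()
fake_extraction()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# Sort by cumulative time and keep the top 20 entries.
stats.sort_stats('cumulative').print_stats(20)
report = out.getvalue()
total_calls = stats.total_calls  # overall number of function calls
print(report)
```

The report lists each function with its call count and the file it lives in, which is how per-file observations like the ones above can be read off.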

Single threading results are OK, but I'm interested in the multi-threading usage.

My suggestion is that the optimised jsinterp.py, from the patch posted above, will also improve your experience in the multithreaded environment, simply by reducing the number of Python operations. You only need to apply that patch: the YT extractor will automatically use the modified code.

Also, PyPy 2.7 (pypy-7.3.6) runs the 8x42 test in 44s vs 100s for CPython.

noembryo · 2022-02-16T20:04:33Z

And users don't get 404s or throttled bandwidth using the media links some time after unthrottling ?

As I said, not until they expire after ~6 hours, which gives a 403.

So far we haven't managed to work out how to share the player cache among embedded yt-dl instances, which TBH is too specialised to merit a lot of effort at the moment.

As I said,
"What I need is a way to know that the calculation is not done yet, so the other threads will wait until the first calculation and then use it (from somewhere that it gets stored) for themselves."
If you understand the builtin cache structure, we could use that.
Since I don't understand much of the code, can you point me to the result of the JS interpreter and how the extractor uses it?

You only need to apply that patch: the YT extractor will automatically use the modified code.

So, if I just copy the jsinterp.py from here, it will work?

dirkf · 2022-02-16T20:20:54Z

So, if I just copy the jsinterp.py from here, it will work?

Apply the patch to the installed master version.

If someone has a good reason why its needed to [respond to the n challenge]

From what I've seen users are very disappointed if all the formats from the web player aren't found by yt-dl. Currently, we don't use the Android client: it may soon be added for age-gate bypassing, as better than nothing. yt-dlp lets the user who cares select from a set of players IIRC.

In any case the n response overhead is typically unimportant when compared to the download time, even when unthrottled. OP's use case is somewhat specialised.

noembryo · 2022-02-16T21:00:01Z

Apply the patch to the installed master version.

Can you tell me how to do it?
Sorry for that, but I don't use git that much, so I'm ignorant of the commands.
I'll have to manually change the lines, so, maybe I'll have to wait until you merge the patch to the current repository..

dirkf · 2022-02-16T23:28:52Z

man patch, or that in a web search.

But, as I'll put this in anyway, the patched source file here.

And PR #30643.

noembryo · 2022-02-17T01:11:47Z

But, as I'll put this in anyway, the patched source file here.

Thank you, I tested it and it's faster than the current version.
In 2.7 it gets me ~65-70 sec instead of the ~100 that I was getting in all today's tests.
In 3.7 it gets me ~40-50 sec instead of the ~65-75 I was getting.
But yt-dlp 2022.2.4 that I downloaded today gives me ~15-17 which is still a lot less..

pukkandan · 2022-02-17T07:37:56Z

The issue is in fact that jsinterp is slow. But this does not affect yt-dlp much in normal use because of the android fallback

Tested with 10 videos on py3.10 with lazy extractors disabled

Program	Threads	Client	n-sig	time	note
yt-dlp	1	web	❌	10
yt-dlp	1	web	✔️	20
yt-dlp	4	web	❌	6
yt-dlp	4	web	✔️	16
yt-dlp	1	android	N/A	10
yt-dlp	4	android	N/A	5
yt-dlp	1	android+web (default)	❌	12
yt-dlp	1	android+web (default)	✔️	14
yt-dlp	4	android+web (default)	❌	5
yt-dlp	4	android+web (default)	✔️	5	what OP is testing
youtube-dl	1	web (only choice)	❌	9
youtube-dl	1	web (only choice)	✔️	16
youtube-dl	4	web (only choice)	❌	5
youtube-dl	4	web (only choice)	✔️	13	what OP is testing
youtube-dl	1	web (only choice)	✔️	14	with dirkf's patch
youtube-dl	4	web (only choice)	✔️	8	with dirkf's patch

dirkf · 2022-02-17T07:54:01Z

So in the default case, is yt-dlp only descrambling for formats that aren't available from the Android client?

And for the n-sig comparison you have 10 videos without a challenge, and 10 with?

pukkandan · 2022-02-17T09:36:13Z

So in the default case, is yt-dlp only descrambling for formats that aren't available from the Android client?

Yes

And for the n-sig comparison you have 10 videos without a challenge, and 10 with?

No, non-fragmented formats have the n-sig challenge for all videos. I manually disabled the descrambling code so that we can identify how much time is being spent on it.

pukkandan · 2022-06-21T18:03:07Z

I did some optimizations to yt-dlp (yt-dlp/yt-dlp@230d5c8) based on dirkf's above patch and here are the new timings:

Program	Threads	Client	n-sig	time	note
yt-dlp	1	web	✔️	17
yt-dlp	4	web	✔️	13
yt-dlp	1	web		11	Download player JSON, but don't process
yt-dlp	4	web		7	Download player JSON, but don't process

(The times are scaled so that the unpatched version matches up with what I posted before)

This cuts the time taken by just jsinterp by 33%. I expect backporting this patch will give similar numbers for youtube-dl

PS: It appears that (due to GIL?) the time taken by jsinterp is independent of the number of threads, and so will disproportionately affect multithreaded use-case.
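
The effect described in the PS can be tried with a small experiment: on CPython builds with the GIL, pure-Python CPU-bound work generally does not finish faster when spread over more threads. The `busy_work` function below is a made-up stand-in for the JS interpreter; timings depend on the machine and no particular numbers are implied.

```python
import threading
from timeit import default_timer as timer


def busy_work(n):
    # Pure-Python loop: the running thread holds the GIL while computing.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(n_threads, jobs, n=200000):
    """Run `jobs` calls of busy_work over n_threads; return (seconds, results)."""
    results = []
    pending = list(range(jobs))
    lock = threading.Lock()

    def worker():
        while True:
            with lock:
                if not pending:
                    return
                pending.pop()
            results.append(busy_work(n))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = timer()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timer() - start, results


for n_threads in (1, 4):
    seconds, results = run_in_threads(n_threads, jobs=8)
    print('%d thread(s): %.2fs for %d jobs' % (n_threads, seconds, len(results)))
```

Network waits release the GIL, so the download part of an extraction can overlap across threads while the interpreter part cannot, which would be consistent with the tables above.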

noembryo · 2022-08-21T12:30:12Z

OK, bad news.
After the recent update things got far worse!
From the ~60/40 sec it got up to 180.
It is getting unusable when it comes to playlists.. 😢

dirkf · 2022-08-21T12:49:24Z

I rather expected that it would be slower.

Feel free to run some tests as Pukkandan did and report back.

noembryo · 2022-08-21T14:58:02Z

Program	Python	Client	Threads	time
`youtube_dl`	2.7x86	web	1	6/7 sec
`youtube_dl`	3.7x86	web	1	6 sec
`youtube_dl`	3.8x64	web	1	4 sec
`yt-dlp`	3.7x86	web	1	5 sec
`yt-dlp`	3.8x64	web	1	4 sec
`yt-dlp`	3.7x86	android	1	2 sec
`yt-dlp`	3.8x64	android	1	2 sec
`youtube_dl`	2.7x86	web	4	26/28 sec
`youtube_dl`	3.7x86	web	4	19/20 sec
`youtube_dl`	3.8x64	web	4	13/14 sec
`yt-dlp`	3.7x86	web	4	17 sec
`yt-dlp`	3.8x64	web	4	11/12 sec
`yt-dlp`	3.7x86	android	4	3/4 sec
`yt-dlp`	3.8x64	android	4	2/3 sec

noembryo · 2022-08-21T15:53:23Z

Also, the throttling is back.. 😠
I couldn't find a new issue about it.
Should I open a new one?

P.S. A throttled link for test

dirkf · 2022-08-21T17:17:22Z

https://github.com/yt-dlp/yt-dlp#user-content-youtube: use --extractor-args 'youtube:player-client=web' to make yt-dlp use the same client as yt-dl.
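
When yt-dlp is embedded, as in the OP script, rather than run from the command line, the same client selection appears to be passed through the `extractor_args` option. The sketch below is an assumption to be checked against the yt-dlp documentation for the installed version; `make_opts` and `get_info` are hypothetical helper names.

```python
def make_opts(client):
    """Build YoutubeDL options that request a single YouTube player client."""
    return {
        'quiet': True,
        'no_warnings': True,
        # Assumed embedding equivalent of:
        #   --extractor-args 'youtube:player-client=<client>'
        'extractor_args': {'youtube': {'player_client': [client]}},
    }


def get_info(url, client='web'):
    # yt_dlp is imported here so make_opts() works without it installed.
    import yt_dlp
    with yt_dlp.YoutubeDL(make_opts(client)) as ydl:
        return ydl.extract_info(url, download=False)
```

Passing `make_opts('web')` in place of `ydl_opts` in the OP script should make the yt-dlp timings comparable with yt-dl's web-only extraction.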

At any rate, though yt-dlp still comes out faster for me, I can't see obvious optimisations in the JS processing. We have to evaluate, depending on the challenge, 250000 JS expressions and the main time hogs in the processing are just the routines doing that.

dirkf · 2022-08-21T17:20:47Z

... the throttling is back.

Your test URL works fine in the latest git master, Py2.7, 3.9.

noembryo · 2022-08-21T18:34:01Z

@dirkf Thanks, I updated the stats..

Your test URL works fine in the latest git master, Py2.7, 3.9.

I reinstalled yt-dl and there is no throttling now. You were right.
Unfortunately though, now it is even slower than before.. 😞

From the tests I see that the yt-dlp android client is much faster.
I seem to remember a previous comment of yours, about a potential support for the android client in youtube_dl, but didn't follow the news. 😄
Was it real, or is my mind playing tricks on me?

dirkf · 2022-08-21T19:46:58Z

The TVembedded client is used in PR #31043 to do what the Android client would have done. Otherwise no. We'd have to invent some way of configuring the client like yt-dlp has and I'm not convinced that there's enough demand for this feature, especially given that yt-dlp covers most of the need.

noembryo · 2022-08-21T19:49:54Z

The TVembedded client is used in PR #31043 to do what the Android client would have done.

So, after the merge, how can somebody invoke that mode?

dirkf · 2022-08-21T20:12:01Z

Use an age-gated video! Or modify the code so that yt-dl always thinks the video is age-gated ...

noembryo · 2022-08-21T20:14:20Z

Hmm, the latter might serve as a mode changer.
Waiting for the merge..
Thanks 👍

noembryo · 2023-02-02T17:49:33Z

Bad news.
Using today's fix, the initial gathering of YouTube videos' info using the OP script went from ~70sec to 210sec!! 😮
This is unusable.
I really must work on porting my app to Python3..😠

dirkf · 2023-02-12T12:51:16Z

PR #31043 has been merged.

noembryo · 2023-02-12T12:59:36Z

Is there an easy way to use the TVembedded mode?

dirkf · 2023-02-12T13:09:52Z

No, unless patching the YT extractor like this is easy:

-        if (is_agegated(playability_status)
-                and int_or_none(self._downloader.params.get('age_limit'), default=18) >= 18):
+        if True and ((is_agegated(playability_status)
+                and int_or_none(self._downloader.params.get('age_limit'), default=18) >= 18)):

noembryo · 2023-02-12T15:10:49Z

Well, it was easy, but the results did not change a bit (using the OP script).
Both ways I get ~200sec..

[YouTube] [core] Improve platform debug log, based on yt-dlp ytdl-org/youtube-dl@d1c6c5c Except: * 6ed34338285f722d0da312ce0af3a15a077a3e2a [jsinterp] Add short-cut evaluation for common expression * There was no performance improvement when tested with ytdl-org/youtube-dl#30641 * e8de54bce50f6f77a4d7e8e80675f7003d5bf630 [core] Handle `/../` sequences in HTTP URLs * We plan to implement this differently

noembryo mentioned this issue Feb 15, 2022

Youtube video download slow #30583

Open

6 tasks

dirkf mentioned this issue Feb 17, 2022

[JSInterpreter] Improve performance #30643

Closed

11 tasks

pukkandan added a commit to yt-dlp/yt-dlp that referenced this issue Jun 21, 2022

[jsinterp] Some optimizations and refactoring

230d5c8

Motivated by: ytdl-org/youtube-dl#30641 (comment) Authored by: dirkf, pukkandan

pukkandan mentioned this issue Aug 5, 2022

Bad performance (requires over 5 seconds for simple --dump-json) yt-dlp/yt-dlp#4558

Closed

8 tasks

noembryo mentioned this issue Feb 2, 2023

Unable to decode n-parameter: download likely to be throttled #31509

Closed

6 tasks

dirkf added the question label Jun 28, 2023