

[YouTube] Unthrottle downloads by responding to the "n" parameter challenge #30184

Closed
wants to merge 20 commits

Conversation

dirkf
Contributor

@dirkf dirkf commented Nov 1, 2021

Please follow the guide below


Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Since summer 2021, YouTube has started serving media URLs with a query parameter such as ...&n=SXiXBH-xzrjeioPN&.... This now appears to be the default behaviour. Unless the value of this parameter is transformed according to an algorithm delivered in the site player JS, the download speed for the URL is throttled to ~50kB/s.
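
For orientation (not code from this PR), here is a minimal sketch, using only the Python standard library, of how the n value can be read out of such a media URL; the URL below is a shortened, hypothetical example:

```python
# Sketch only: inspect the "n" query parameter of a googlevideo media URL.
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2

def get_n_param(media_url):
    """Return the value of the n parameter, or None if it is absent."""
    qs = parse_qs(urlparse(media_url).query)
    return qs.get('n', [None])[0]

print(get_n_param('https://example.googlevideo.com/videoplayback?itag=22&n=SXiXBH-xzrjeioPN'))
# -> 'SXiXBH-xzrjeioPN'
```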

Solutions for this include:

  • implementing a more complete JS interpreter, e.g. based on PR [jsinterp] Actual JS interpreter #11272;
  • using an external JS interpreter (PhantomJS, now unmaintained, is used by some extractors);
  • spoofing the Android or iOS client to acquire unthrottled links, as successfully implemented in yt-dlp.

This PR now uses the first approach, taking the enhanced JSInterpreter module from yt-dlp PR #1437 as back-ported in PR #30188.
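
For illustration, a minimal sketch of the general pattern this approach follows, assuming jscode already holds the downloaded player JS and funcname the name of the challenge function (locating that name in the player JS is elided here); this is not the PR's exact code:

```python
# Sketch only: run the n-challenge function through youtube-dl's JS interpreter.
from youtube_dl.jsinterp import JSInterpreter

def make_n_transformer(jscode, funcname):
    # jscode: full player JS source; funcname: name of the challenge function.
    jsi = JSInterpreter(jscode)
    func = jsi.extract_function(funcname)
    # The interpreted function takes the throttled n value and returns the
    # transformed value to be substituted back into the media URL.
    return lambda n: func([n])
```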

The PR originally took a different approach, derived from the successful solution used in VLC's youtube.lua (also implemented differently in pytube), relying on the fact that the challenge algorithm is served in a mini-language within the minified player JS, so that the specific algorithm could be extracted and executed by interpreting the mini-language without actually running the JS itself. This version had the benefit of offering a single-file update, but was unstable against regular site changes and raised licensing issues as discussed below.
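
To illustrate the mini-language idea described above (the operation names and the example plan below are invented; the real challenge is extracted from the player JS and uses more operation types):

```python
# Illustrative sketch of the "interpret the mini-language" approach; not the
# real YouTube challenge, whose operations are extracted from the player JS.
def op_reverse(chars, _):
    chars.reverse()

def op_splice(chars, n):
    del chars[:n]

def op_swap(chars, n):
    chars[0], chars[n % len(chars)] = chars[n % len(chars)], chars[0]

OPS = {'reverse': op_reverse, 'splice': op_splice, 'swap': op_swap}

def run_plan(n_value, plan):
    # plan: a list of (operation, argument) pairs recovered from the player JS.
    chars = list(n_value)
    for op, arg in plan:
        OPS[op](chars, arg)
    return ''.join(chars)

print(run_plan('SXiXBH-xzrjeioPN', [('swap', 3), ('reverse', 0), ('splice', 2)]))
```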

The PR also includes fixes for test/test_youtube_lists.py: that download test failed in Python 2 and contained tests that failed or were redundant because of obsolete assumptions about the youtube.py extractor or YouTube services.

Anyone who wants a single-file update to fix the 2021.06.06 (or 2021.12.17) release pending merge/release of the PR should use this version of extractor/youtube.py, which is a drop-in replacement. This also includes the URL signature fix from #30366 and 2021.12.17.

Resolves #29326 (original analysis of the "n" parameter challenge)
Resolves #29790
Resolves #30004
Resolves #30024
Resolves #30052
Resolves #30088
Resolves #30097 (original reference to n_descramble())
Resolves #30102
Resolves #30109
Resolves #30119
Resolves #30125
Resolves #30128
Resolves #30162
Resolves #30173
Resolves #30186
Resolves #30192
Resolves #30221
Resolves #30239
Resolves #30539
Resolves #30552

@dirkf dirkf force-pushed the df-youtube-unthrottle-patch branch from b17a74f to 20fc434 on November 1, 2021 at 06:38
@gaming-hacker

Thanks, I cherry-picked this into my fork and it works for single video downloads, but if I download a playlist I get SSL errors.

For example, downloading this playlist from YouTube:

PLUJAYadtuizA5JblMkTYUuOhUqiQco0pu

[download]  33.3% of ~12.54MiB at  5.91MiB/s ETA 00:04ERROR: unable to download video data: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2633)

@dirkf
Contributor Author

dirkf commented Nov 1, 2021

I doubt that this is related. This issue has been seen before and it has been put down to server-side weirdness. You could iterate the playlist download, perhaps using --download-archive, if this is a regular problem.
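
For example, a sketch of the equivalent call through the Python API (download_archive and ignoreerrors are real YoutubeDL options; the archive filename is arbitrary):

```python
# Sketch: re-run a flaky playlist download, skipping items already completed.
import youtube_dl

opts = {
    'download_archive': 'archive.txt',  # records finished items so reruns skip them
    'ignoreerrors': True,               # continue past items that fail
}
with youtube_dl.YoutubeDL(opts) as ydl:
    ydl.download(['https://www.youtube.com/playlist?list=PLUJAYadtuizA5JblMkTYUuOhUqiQco0pu'])
```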

Just now I tried for the mentioned playlist and got items 1-4 of 17 with no error before I gave up.

@draekko

draekko commented Nov 2, 2021

Thanks, this totally fixed my youtube download issues here. Hopefully this makes it into the official code and the next release.

@linkfanel

I am the original author of this code and I am willing to release it under Unlicense

Not sure that you can assert that: this is arguably a derivative work of VLC's youtube.lua GPL code. This is great for providing a drop-in replacement for people who want to fix their installs! But I think that as is, it would remain problematic to merge into and redistribute as part of an Unlicense project.

@linkfanel

Thanks, this totally fixed my youtube download issues here. Hopefully this makes it into the official code and the next release.

In my opinion, #30188 is a better approach: youtube-dl already has a javascript interpreter framework, which is a technically superior solution, and from a project direction point of view, it doesn't make much sense to start doing ad hoc emulation in individual extractors instead. I appreciate the work, though :)

@dirkf
Contributor Author

dirkf commented Nov 3, 2021

Yes, in the long term, should merging ever happen, it would be better to use the solution based on JSInterp, but that needs further work that is not needed for this PR.

Not sure that you can assert that: this is arguably a derivative work of VLC's youtube.lua GPL code.

In case that should not come to pass: I agree that, while I assert authorship in the PR boilerplate, the implementation of n_descramble() is deliberately a derivative work, since a re-implementation would have been more time-consuming and the result less useful. However, the extent to which a manually transformed version of a source code is covered by the copyright license of the original work appears to be controversial. To avoid such discussions, and as a specific exception based on your original comment, I would hope that it could be agreed to release this single component derived from the VLC youtube.lua work in parallel under Unlicense, with attribution as at present in the PR code, or without it if preferred.

While there is no copyright in ideas, a study of the scrambling code was outlined here that could certainly have been helpful, although I don't know whether it was used in the n_descramble() implementation.

@pukkandan
Contributor

pukkandan commented Nov 3, 2021

This exact same thing had already been implemented by pytube (Unlicense) in pytube/pytube@79befd6, long before the VLC implementation, so I am not sure how much this licensing argument holds.

@pukkandan

This comment was marked as off-topic.

@dirkf
Contributor Author

dirkf commented Nov 3, 2021

Now, now.

Generally I would prefer open source projects to use copyleft. The permissive licences offer too much leeway to commercial interests. Further, yt-dl's Unlicense is something of a chocolate teapot among licences: it would be no problem if (say) VLC wished to adopt (say, again) the recently enhanced JSInterp in a reverse derivation, but then it is unclear whether yt-dl could make use of enhancements made to the adopted version without seeking permission each time.

In this case we are talking about a small script component adapted into the PR that was, as I understood, proposed to yt-dl for the use that I made of it. Since the nature of yt-dl (and other interpreted language projects) is that the source code is delivered with the product, as well as being available from the project repository, I find it hard to see why the author of the component would not be happy to see his work used in this way. Certainly it is possible that this Python version of n_descramble() might be distributed contrary to the terms or even the spirit of VLC's GPL, but no-one is heading off to create a proprietary closed-source version of VLC (who knows which "Smart" TV manufacturers or mobile app devs have already done that?). All in all, it's a whole different case from, say, someone making a new version of one of @89z's OSL3 Go projects without making the source available, etc.

As the PR text says and as further explained in my previous comment, the implementation was deliberately arranged to stay close to the original for ease of implementation (oh, and a Lua refresh since my last encounter with it in Wireshark 10 years ago) and for mutual benefit in case either version should be enhanced or fixed. Otherwise, to the extent that the program logic is determined by the YT challenge mini-language (as evidenced by the pytube implementation) copyright protection would not apply despite it being a derived work. Deliberate reinvention of the wheel to comply with, or work around, the letter of a software licence benefits no-one and makes open source look bad.

That said, I am no lawyer and I assume neither are you - I have no idea how either side of these arguments would hold up in an actual court. So it would be best if @linkfanel is willing to dual-licence his implementation under Unlicense.

That would be fine; or it would be enough just to confirm, as was my initial understanding of @linkfanel's original intervention, that a Python version of n_descramble() can be distributed under Unlicense, though I apologise if I read too much into it. Obviously we have no interest in distributing the Lua code, unless requested to do so.

@0xallie
Contributor

0xallie commented Nov 3, 2021

That's a rationalization. You are saying "well, I only broke copyright on a small amount of code, so it's OK".

No, that's called fair use under copyright law.

@dirkf
Contributor Author

dirkf commented Nov 3, 2021

Indeed, I recommend study of the discussion of derived works that I linked in my first comment, in particular:

In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.

However, in this case English or French copyright law would probably apply.

But the extent to which a routine whose logic is constrained to meet a pre-existing 'method of operation', and which has been translated into a different language and run-time API, might be considered to infringe the copyright of the original is not, should not and need not be the issue.

Although we appear to like the same sort of licences, @89z has completely failed to grasp the situation, which is that the PR was prepared on the understanding that the VLC n_descramble() function had been offered as a model for yt-dl to use. While youtube.lua is offered for distribution under VLC's GPL licence, that offer is not exclusive: the author remains free to make the code available under other terms.

There is no point trying to infer what my intentions are regarding this since I have none until I see @linkfanel's response to this discussion.

@gaming-hacker

I doubt that this is related. This issue has been seen before and it has been put down to server-side weirdness. You could iterate the playlist download, perhaps using --download-archive, if this is a regular problem.

Just now I tried for the mentioned playlist and got items 1-4 of 17 with no error before I gave up.

Hmm, interesting. I compile youtube-dl with Python 3.9+ and had to set

export PYTHONHTTPSVERIFY=0

and it seems to work.

@linkfanel

Geez, people, please calm down: you're arguing the mergeability of a merge request to an unmaintained project that is probably not going to get merged anyway.

Copyright isn't rocket science, nor the exclusive prerogative of lawyers (don't play into the game of those who'd like to make you believe so because it's their vested interest). Very little of this is unclear or subject to interpretation. It's true that ideas or algorithms aren't copyrightable: copyright is about protecting the particular creative expression of such ideas. When you read this merge request, you see and recognize VLC's youtube.lua's code line for line, just translated into Python: it is a derivative work of that copyrighted creative expression, of the particular way that code was made and written. When you read pytube's cipher.py, you don't see or recognize youtube.lua's code: the code looks completely different, it's written and organized differently, it even makes different algorithmic choices; and in all likelihood it wasn't derived from the way youtube.lua's code was written (especially if written long before it), and thus is not a derivative work.

These differences also illustrate just how the writing of this kind of code is not constrained to the given descrambling algorithm, and where the part of copyrightable creative expression lies. Now it's true that these two pieces of code are similar insofar as they do the same job, but as we said neither that nor the fact that they would be derived from the same analysis makes this fall under the purview of copyright protection; you'd rather be looking at something like patents in this case.

(And the mere validity of software patents is a controversial topic; it is simply rejected in some jurisdictions. And then anyway, they have a different nature: copyright is an intellectual property right, so a more natural and inalienable kind of right, and its protection is automatic; whereas patents are a willful and deliberate contract registered with the state, that you may or may not want to enter into, for you get granted protection and remuneration for your work only in exchange for publishing it and making it available to the public for the benefit and advancement of society. IIRC Coca-Cola's coke recipe is not patented: it's kept a trade secret instead.)

But back to copyright, neither youtube.lua, this merge request nor pytube would be a derivative work of Google's javascript. First, copyright laws usually include provisions asserting rights to reverse-engineering and interoperability. Then, YouTube's piece of javascript contains little creative expression: it's minified output, of presumably randomly-generated code, and whatever creative choice it made is to feature redundant and obfuscated code for simple array transformations. youtube.lua doesn't follow those choices or derive itself from that expression, since on the contrary it identifies and implements the transformations in their simplest form. Last but not least, YouTube's javascript and third-party code emulating it don't even do the same job; they don't work on the same level, which is one of the criteria for the fair use doctrine (just to mention one relevant factor under one jurisdiction).

I guess you could assert fair use of youtube.lua for this merge request as well... In my opinion it doesn't apply. And that's not really one of the two options in the mandatory merge request checklist - although technically I think you could claim fair use and then license it as your own work under Unlicense. But the fact that this is an item on a mandatory checklist would indicate that this project does not consider licensing something not-so-significant, probably okay, that can be brushed off.

Generally I would prefer open source projects to use copyleft. The permissive licences offer too much leeway to commercial interests. Further, yt-dl's Unlicense is something of a chocolate teapot among licences: it would be no problem if (say) VLC wished to adopt (say, again) the recently enhanced JSInterp in a reverse derivation, but then it is unclear whether yt-dl could make use of enhancements made to the adopted version without seeking permission each time.

It's not unclear: they could not. This is a well-known issue with code reuse between software projects with differing licenses.

In my original intervention I meant to put forward an alternative design, and gave the example of VLC's youtube.lua to illustrate and show it was a proven concept in a notable software application. I did not mean to encourage copying that very code, or reusing it without proper regard for licensing matters. There is also the perhaps unlikely concern that I haven't been the only contributor to youtube.lua, and that my new code could be a derivative work of other people's previous and/or collective GPL code, in which case I can't unilaterally relicense it.

I actually do believe in the principles of the GPL and copyleft, and I'm not keen to relicense. But I think this is all moot anyway. If and when youtube-dl gets maintained again, and they don't fancy writing their own fix themselves and are happy to merge an existing merge request, and for some reason they choose not to merge or build on the javascript interpreter improvements, and in the meantime there has been no other merge request with an actually compatible license such as one based on pytube's code, or another original implementation of the concept that linking youtube.lua was meant to prove, and my code is still needed - then we can cross that bridge when we get to it.

@0xallie
Contributor

0xallie commented Nov 3, 2021

It's not unclear: they could not. This is a well-known issue with code reuse between software projects with differing licenses.

This is not entirely true. AFAIK, youtube-dl could say "the code is dual-licensed under Unlicense and GPLv3, but the nsig decryption code is only available under GPLv3". But of course that would go against the spirit of the project and could cause headaches.

@rautamiekka
Contributor

Geez, people, please calm down: you're arguing the mergeability of a merge request to an unmaintained project that is probably not going to get merged anyway.

[...]

I think that just proved it ain't so simple.

@dirkf
Contributor Author

dirkf commented Nov 5, 2021

Regarding @linkfanel's response, I'm grateful for the clarification, though @rautamiekka does make a good point about it. No doubt an acceptable replacement for the problem code will be offered for merge if @linkfanel doesn't wish to bless the existing version. Even if I might be more of an mpv user myself, it has always been a great boon to be able to upgrade a Mac's QuickTime player to VLC and I'm sure we all fully respect the project's contribution to media and software freedom.

It was never intended to copy the work youtube.lua, only a specific functional component, which would be distributed on terms that informally meet the goals of copyleft, since the source of any modified version would generally be distributed with the whole product (though it could be possible to make a byte-code distribution that would bypass that). Had I expected that this would be a problem, I would obviously have made a different implementation of the function (even if the translated code might have been used as a starting point) that did not deliberately reflect the style of the original. In the end we are just extracting pieces of player JS and interpreting them using the challenge mini-language according to the analysis provided here. I reiterate my comment about the bad look that deliberate reinvention of the wheel to comply with, or work around, the letter of an open source software licence gives, but it is the copyright holder's choice. Equally, the Unlicense was not my choice and the PR template probably ought to ask for a more nuanced affirmation than "I am the original author of this code and I am willing to release it under Unlicense".

@nyuszika7h said:

This is not entirely true. AFAIK, youtube-dl could say "the code is dual-licensed under Unlicense and GPLv3, but the nsig decryption code is only available under GPLv3". But of course that would go against the spirit of the project and could cause headaches.

Indeed, GPL has a concept of a "compatible" licence: "subordinate" would probably be a better term. This is a licence that allows its protected work to be distributed under GPL, and the Unlicense is an example. This so-called compatibility is very definitely not a symmetric relation, with GPL licenses strictly dominating all others.

I would probably consider doing that if yt-dl were my project and for some reason I couldn't arrange for the JSInterp solution, but that would have to be some big reason. It might also be possible to distribute the n_descramble() code alone under GPL. Then the level of headache is section 2(b) of GPLv2 which is amplified thus:

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

This section of GPLv2 addresses the case of a work that is substantially derived from a protected Program, with some additional software provided by the licensee, and requires the whole thing to be distributed under GPLv2; but it makes no exception allowing the reuse of a small part of a protected Program in the context of a much larger work not distributed under GPLv2, even if that should happen to be free software and delivered as source code. The Program's owner must give permission (section 10), or the reuse must count as fair use in the applicable jurisdiction (England and Wales? France? the Earth?).

Excepting fair use, the n_descramble() code would have, at a minimum, to be its own module, separately distributed and installed, and the yt-dl side would have to test for the module and punt if it was not found.
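
(Purely hypothetically, the "test for the module and punt" arrangement might look like this; the module name is invented:)

```python
# Hypothetical sketch: the GPL-only descrambler as a separately installed module.
try:
    from ytdl_nsig import n_descramble  # invented name; would be distributed separately
except ImportError:
    n_descramble = None

def maybe_unthrottle(n_value):
    if n_descramble is None:
        return n_value  # punt: leave n unchanged, the download stays throttled
    return n_descramble(n_value)
```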

To understand "at a minimum", one has to rely on the guidance around GPLv2, which like the licence itself focuses on traditional compile-link development, a bit surprising considering how important elisp was/is to the originators.

A Python module comprising a single exported function might fall into the case where a "main program dynamically links plug-ins, but the communication between them is limited to invoking the ‘main’ function of the plug-in with some options and waiting for it to return", said to be a "borderline case" for applying GPLv2 to distribution of the main program as well; or it might match the case of a Perl module (after all, Python is the read-write equivalent of Perl ...), where "you must release the [calling] program in a GPL-compatible way". None of this is actually stated in the licence text itself ... hello, counsel.

@ShadowJonathan

This comment was marked as off-topic.

@0xallie
Contributor

0xallie commented Nov 5, 2021

This is why MPL is better than GPL: it would only ever apply to the single file that contains the nsig code and not infect others. So even commercial reuses of youtube-dl would only have to distribute any modifications they've made to nsig.py or whatever it's called. Though I suppose dual-licensing the rest of the code kind of achieves the same.

@ilovecomputers

I just wanted to quickly download one video, but I didn't want to mess with my local environment, so I used pipx to temporarily run this PR: pipx run --spec git+https://github.com/ytdl-org/youtube-dl@refs/pull/30184/merge youtube-dl "URL".

Got around the throttle 😁

@HerrCraziDev

Any progress on this PR? It would be great to have unthrottled downloads in stable youtube-dl, eh.

@dirkf
Contributor Author

dirkf commented Jan 18, 2022

Don't hold your breath, but there is reason to expect a new release sooner rather than never, and this will be at the top of the list, I guess.

@dirkf
Contributor Author

dirkf commented Jan 30, 2022

Closed with merge of 57044ea..af9e725.

@etale-cohomology

etale-cohomology commented Jun 9, 2022

So assuming you can compute the n value, what do you do with it then? Do you replace the n parameter of the video URL and then call that URL?

E.g. here's one such video (or audio) URL together with its n argument, which you get by sending a GET request to youtube.com/watch?v={videoId}:

https://rr2---sn-uqx2-aphk.googlevideo.com/videoplayback?expire=1654808853&ei=tQyiYr6uEc3mwASSy4ioAQ&ip=2001%3A1388%3A80d%3A9aeb%3Adacb%3A8aff%3Afe57%3A16d2&id=o-AJKmS9rCP7FbTDAte1Kh1Zqi8MMOEcGbPLHSSGI-eiO3&itag=249&source=youtube&requiressl=yes&mh=Zk&mm=31%2C26&mn=sn-uqx2-aphk%2Csn-bg0eznek&ms=au%2Conr&mv=m&mvi=2&pl=49&initcwndbps=1011250&spc=4ocVC_oxfF0aw7kBJLmsIzyX-NxGxJY&vprv=1&mime=audio%2Fwebm&ns=mZ7SFi2MAZ5oNr82ldqZI3AG&gir=yes&clen=9646870&dur=1529.081&lmt=1617191571853584&mt=1654786866&fvip=2&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=5411222&n=XqgVawS3rFACEwYS9&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRAIgbxE0G4z9Pbf8Mg3P74YtzdDNiLGf2tquuBFi--Za3zgCIGlyTBUa26xIoZTwExzIawUo_p2AoRtAqZq795VwiCBf&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAPUNhr_33-0mD4YLlLvW-_pgbpBxKg4XRUrPrN506_rDAiEAmrpgKRZONvD_7iOBDUy8qVG5ESGqQRC7i_mIQEeecbg%3D

@dirkf
Contributor Author

dirkf commented Jun 9, 2022

We can, and we do. Read the code of the PR.
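
Purely as an illustration (not the PR's actual code), the replacement step can be sketched with the standard library:

```python
# Sketch: substitute the transformed n value back into the media URL before fetching it.
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def replace_n(media_url, transformed_n):
    parts = urlparse(media_url)
    qs = parse_qs(parts.query, keep_blank_values=True)
    qs['n'] = [transformed_n]
    return urlunparse(parts._replace(query=urlencode(qs, doseq=True)))
```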

Also, will algebraic cohomology be able to supply enough sheaves to relieve the wheat shortage, if topological cohomology can't do the job?
