cda.pl Broken Extractor #24458
Comments
|
Experiencing the same problem |
|
Same thing here for URL: https://www.cda.pl/video/83970839 [debug] System config: [] |
|
Same problem on Windows 10: `youtube-dl https://www.cda.pl/video/250195141/vfilm -v` fails with `ValueError: unknown url type: 'GH2Hdah%5D452%5DA%3D%5E%263b%23%29%3BE38FgE%3F6gv%3Fsu%2B3p%5E%60dgd%60fcd%60d%5EG%3D3a%606g4cb3f4_f3b_7af4hg33afccff4_'`, also after `C:\Users\XxxXx>youtube-dl -U` |
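The quoted value is not a URL as it stands, which is why the `ValueError` is raised. As a hedged sketch only (nothing in this thread names the encoding, so the scheme below is an assumption): percent-decoding the string and then applying ROT47 produces text that begins with a cda.pl host name. The helper name `decode_candidate` is made up for illustration and is not youtube-dl code.

```python
from urllib.parse import unquote

# Value copied from the ValueError quoted above.
RAW = (
    "GH2Hdah%5D452%5DA%3D%5E%263b%23%29%3BE38FgE%3F6gv%3Fsu%2B3p%5E"
    "%60dgd%60fcd%60d%5EG%3D3a%606g4cb3f4_f3b_7af4hg33afccff4_"
)


def rot47(text):
    """Rotate printable ASCII (codes 33..126) by 47 places; leave anything else alone."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)
    return "".join(out)


def decode_candidate(raw):
    # Assumption: percent-decode first, then ROT47 (hypothetical helper).
    return rot47(unquote(raw))


print(decode_candidate(RAW))  # begins with "vwaw529.cda.pl/"
```

The decoded text carries no `https://` prefix, so a real extractor would still have to turn it into a full URL; that step is not shown here.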
|
Nope, no encoded path to the file, I believe. It shouldn't be complicated to fix |
|
I have the same problem:
|
|
I've managed to decrypt the file from player_data, see my work here: |
|
Implemented into extractor, see PR #24518 |
|
In line: |
|
@cb1986ster I already noticed their ridiculous, cringy way of breaking the extractor again and checked the new code. Really funny, although it took me 15 minutes to break down. Unfortunately, it looks like the ytdl-org collaborators are not interested in accepting or reviewing the PR for now, as they locked the conversation against further comments on my work. |
|
@dstftw can we get an update on what is going on with the PR? If you do not intend to merge the fix, then at least be transparent about why you locked the PR. And I can see that while we're waiting here, other fixes are being reviewed and merged. |
|
CDA unfortunately keeps breaking the extractor by adding more obfuscation crap, it's now a cat and mouse game until CDA gives up and starts using DRM on non-premium videos. |
|
I think they may be tracking changes in this repository. @divadsn, since comments on the PR are locked:
|
|
@bato3 yes I am aware of the unsolved changes in the PR. But the goal would be to scrape the player.js on the go and extract the current "joke" list before extracting the video, kinda like the extractor for soundcloud does to get the client id. |
|
From a quick look, the soundcloud extractor uses an API to update the key. (Although I may have missed something.) Pulling it out of player.js is like shooting a mosquito with a cannon. This is not some dynamically generated key. All they need to do is change the file structure a bit and everything falls apart... We either risk a regular expression, or we apply the openload domain approach: update after they change. |
|
The soundcloud extractor pulls the client id using a regular expression from any possible JavaScript file, which can be seen here: https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/soundcloud.py#L277 We have two options: implement a similar regex to grab the list from player.js, or have a file containing all the "jokes" hosted on GitHub Pages that gets updated after they change. |
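The first option might look roughly like this. It is a minimal sketch, assuming the list shows up in player.js as a plain JSON-style array assigned to a variable; the variable name `jokes`, the sample script, and the placeholder words are all made up, and the real file is obfuscated and may look nothing like this.

```python
import json
import re

# Hypothetical pattern: a variable named `jokes` assigned a flat array literal.
WORD_LIST_RE = re.compile(r'\bjokes\s*=\s*(\[[^\]]*\])')


def extract_word_list(player_js):
    """Return the list of strings found in player_js, or None if nothing matches."""
    match = WORD_LIST_RE.search(player_js)
    if not match:
        return None
    try:
        # Only works if the array uses double-quoted strings (valid JSON).
        words = json.loads(match.group(1))
    except ValueError:
        return None
    return [w for w in words if isinstance(w, str)]


# Made-up stand-in for the downloaded script.
sample_js = 'var a=1;var jokes = ["_AAA", "_BBB"];function f(){}'
print(extract_word_list(sample_js))
```

Returning `None` on any mismatch lets the caller fall back to a bundled list instead of crashing when the script layout changes.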
|
I think that unless the changes to player.js look automated, there is no point in extracting this with regular expressions. Especially since we are not looking for a textual constant and its value, but operating on obfuscated code. === Edit And before we start writing more advanced code, we must check whether CDA is watching changes in the repo. I'm afraid you will need to run some URL decoding service. Or some small autoupdate Python module. |
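A small autoupdate helper of that kind could be sketched as follows, assuming the list is published as a plain text file with one word per line. The `fetch` callable, `BUNDLED_WORDS`, and the placeholder words are hypothetical; nothing here is part of youtube-dl.

```python
# Hypothetical fallback shipped with the release; placeholder values only.
BUNDLED_WORDS = ["_AAA", "_BBB"]


def parse_word_list(text):
    """Parse one word per line, skipping blank lines and '#' comments."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.append(line)
    return words


def load_word_list(fetch, fallback=BUNDLED_WORDS):
    """Call fetch() to get the remote text; use the fallback if that fails or is empty."""
    try:
        words = parse_word_list(fetch())
    except (OSError, ValueError):
        # Network errors raised by urllib are OSError subclasses.
        return list(fallback)
    return words or list(fallback)


def broken_fetch():
    raise OSError("network unreachable")


print(load_word_list(lambda: "# updated list\n_CCC\n\n_DDD\n"))
print(load_word_list(broken_fetch))
```

Passing the download step in as a callable keeps the parsing testable offline; in practice `fetch` would wrap whatever HTTP call retrieves the hosted file.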
|
If you need any hosting for a decoding service or an autoupdate service, let me know; I'll provide it for free. |
|
Well it ain't abandoned, I still keep a maintained list of words in my own
demo project, I plan to potentially change the PR so the list will be
pulled from a text file independently of youtube-dl releases.
On Sun, 2 Aug 2020 at 18:27, Cezary Drożak <notifications@github.com> wrote:
… Since the previous PR has been abandoned, I have created a new PR for
this. Hopefully they won't change their script so fast this time.
As for observing changes in the repo, I don't think that we can do much
more than play cat and mouse game.
|
|
It's on my GitHub, so it's free for everyone to use.
On Mon, 3 Aug 2020 at 14:34, Cezary Drożak <notifications@github.com> wrote:
… GitHub shows origin branch as unknown repository in the pull request, so
I thought that the branch has been removed.
I think that it would be better to merge the PR, either yours or mine, for
an ad-hoc solution.
If you maintain a project with a list of words, then I don't understand
why you won't share it, so someone else could open a PR adding support for
pulling word list from your repository.
|
|
Fixed on my GitHub: https://github.com/dkja/youtube-dl ;) I'm not a Python programmer, but... |
Description
My guess is that they have encoded the displayed URLs with base64, but I could be wrong.