cda.pl Broken Extractor #24458

Open
MaybeYesMaybeNot opened this issue Mar 24, 2020 · 20 comments

Comments

@MaybeYesMaybeNot MaybeYesMaybeNot commented Mar 24, 2020

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2020.03.24
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.cda.pl/video/14954728b', u'--verbose']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 2.7.5 (CPython) - Linux-3.10.0-1062.12.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core
[debug] exe versions: none
[debug] Proxy map: {}
[CDA] 14954728b: Downloading webpage
[CDA] 14954728b: Downloading 360p version information
[CDA] 14954728b: Downloading 720p version information
[debug] Default format spec: best/bestvideo+bestaudio
[debug] Invoking downloader on u'G86C_bg%5D452%5DA%3D%5EE8%3A%23Hp08hAy8gtczBCurF8%5E%60dgd_hffec%5ED557a6_afae4%605g572gggbc56gdd2e3%6024'
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 474, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 464, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2019, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 808, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 863, in process_ie_result
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1644, in process_video_result
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1926, in process_info
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1865, in dl
  File "/usr/local/bin/youtube-dl/youtube_dl/downloader/common.py", line 366, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/downloader/http.py", line 341, in real_download
  File "/usr/local/bin/youtube-dl/youtube_dl/downloader/http.py", line 109, in establish_connection
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2238, in urlopen
  File "/usr/lib64/python2.7/urllib2.py", line 423, in open
    protocol = req.get_type()
  File "/usr/lib64/python2.7/urllib2.py", line 285, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: G86C_bg%5D452%5DA%3D%5EE8%3A%23Hp08hAy8gtczBCurF8%5E%60dgd_hffec%5ED557a6_afae4%605g572gggbc56gdd2e3%6024

Description

My guess is that they now encode the displayed URLs (possibly with base64), but I could be wrong.

@fraunos fraunos commented Mar 24, 2020

Experiencing the same problem

@someziggyman someziggyman commented Mar 25, 2020

Same thing here for the URL https://www.cda.pl/video/83970839:
an "unknown url type" error in the log.

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-f', u'best', u'https://www.cda.pl/video/83970839']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 2.7.16 (CPython) - Darwin-18.7.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[CDA] 83970839: Downloading webpage
[CDA] 83970839: Downloading 360p version information
[CDA] 83970839: Downloading 720p version information
[CDA] 83970839: Downloading 1080p version information
[debug] Invoking downloader on u'GH2H_gf%5D452%5DA%3D%5Eg_d6%21xrsGH5%3C0d%7D_u%3Bgu58%5E%60dgd%60ed_bc%5E95baah3hc3fhdg457h7he43ff2cfe_gdhf'
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 474, in main
File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 464, in _real_main
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2019, in download
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 808, in extract_info
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 863, in process_ie_result
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1644, in process_video_result
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1926, in process_info
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1865, in dl
File "/usr/local/bin/youtube-dl/youtube_dl/downloader/common.py", line 366, in download
File "/usr/local/bin/youtube-dl/youtube_dl/downloader/http.py", line 341, in real_download
File "/usr/local/bin/youtube-dl/youtube_dl/downloader/http.py", line 109, in establish_connection
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2238, in urlopen
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 421, in open
protocol = req.get_type()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 283, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: GH2H_gf%5D452%5DA%3D%5Eg_d6%21xrsGH5%3C0d%7D_u%3Bgu58%5E%60dgd%60ed_bc%5E95baah3hc3fhdg457h7he43ff2cfe_gdhf

@Kondzio18 Kondzio18 commented Mar 25, 2020

Same problem - win10

youtube-dl https://www.cda.pl/video/250195141/vfilm -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.cda.pl/video/250195141/vfilm', '-v', '-F']
[debug] Encodings: locale cp1250, fs mbcs, out cp852, pref cp1250
[debug] youtube-dl version 2020.03.24
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.18362
[debug] exe versions: ffmpeg 4.1, ffprobe 3.2
[debug] Proxy map: {}
[CDA] 250195141: Downloading webpage
[CDA] 250195141: Downloading 480p version information
WARNING: Unable to download webpage: HTTP Error 404:
WARNING: [CDA] Unable to download 480p version information
[CDA] 250195141: Downloading 720p version information
WARNING: Unable to download webpage: HTTP Error 404:
WARNING: [CDA] Unable to download 720p version information
[CDA] 250195141: Downloading 1080p version information
WARNING: Unable to download webpage: HTTP Error 404:
WARNING: [CDA] Unable to download 1080p version information
Traceback (most recent call last):
File "__main__.py", line 19, in <module>
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\__init__.py", line 474, in main
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\__init__.py", line 464, in _real_main
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 2019, in download
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 808, in extract_info
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 863, in process_ie_result
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 1582, in process_video_result
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 1396, in _calc_headers
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 1408, in _calc_cookies
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\utils.py", line 2149, in sanitized_Request
File "C:\Python\Python34\lib\urllib\request.py", line 267, in __init__
origin_req_host = request_host(self)
File "C:\Python\Python34\lib\urllib\request.py", line 293, in full_url
def data(self):
File "C:\Python\Python34\lib\urllib\request.py", line 322, in _parse

ValueError: unknown url type: 'GH2Hdah%5D452%5DA%3D%5E%263b%23%29%3BE38FgE%3F6gv%3Fsu%2B3p%5E%60dgd%60fcd%60d%5EG%3D3a%606g4cb3f4_f3b_7af4hg33afccff4_'

C:\Users\XxxXx>youtube-dl -U
youtube-dl is up-to-date (2020.03.24)

@stachuman stachuman commented Mar 26, 2020

No, the path to the file is not encoded.
Searching the page source for pb-video-player-content turns up a video element with that class whose src is the full path to the mp4 (not encoded).

I believe this shouldn't be complicated to fix.
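
To illustrate the kind of lookup described in this comment, here is a minimal Python sketch. It assumes the page really does contain a `<video>` element carrying the `pb-video-player-content` class with a plain `src` attribute; the sample HTML and the names `VideoSrcFinder` and `find_video_sources` are made up for the example and are not part of youtube-dl.

```python
from html.parser import HTMLParser


class VideoSrcFinder(HTMLParser):
    """Collect src attributes of <video> tags that carry a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "video":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if self.class_name in classes and attr_map.get("src"):
            self.sources.append(attr_map["src"])


def find_video_sources(html, class_name="pb-video-player-content"):
    finder = VideoSrcFinder(class_name)
    finder.feed(html)
    return finder.sources


# Hypothetical markup, only to exercise the function.
SAMPLE = (
    '<div><video class="pb-video-player-content" '
    'src="https://example.invalid/clip.mp4"></video></div>'
)
```

If the element is only created by JavaScript after the page loads, it will not be present in the downloaded HTML and this lookup returns an empty list.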

@Mikolaj98p Mikolaj98p commented Mar 28, 2020

I have the same problem:

./youtube-dl -f hd https://www.cda.pl/video/294230158 -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-f', u'hd', u'https://www.cda.pl/video/294230158', u'-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 2.7.16 (CPython) - Darwin-19.4.0-x86_64-i386-64bit
[debug] exe versions: rtmpdump 2.4
[debug] Proxy map: {}
[CDA] 294230158: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "./youtube-dl/youtube_dl/extractor/common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 2238, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

@divadsn divadsn commented Mar 28, 2020

I've managed to decrypt the file value from player_data; see my work here:
https://gist.github.com/divadsn/e1e7691b0bc6bb88a0d3680912bb0842
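
The gist is not reproduced in this thread. As an independent observation on the strings in the logs above: percent-decoding one of them and then applying ROT47 yields text that starts with a cda.pl hostname, which suggests (but does not prove) how the value is obfuscated. The sketch below shows only that check. `rot47` and `decode_candidate` are hypothetical names, and this is not claimed to be the code in the gist or the PR.

```python
from urllib.parse import unquote


def rot47(text):
    """Rotate printable ASCII (codes 33..126) by 47 positions."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)
    return "".join(out)


def decode_candidate(value):
    # Assumption: the value is percent-encoded first, then ROT47-shifted.
    return rot47(unquote(value))


# The string passed to the downloader in the first verbose log.
ENCODED = (
    "G86C_bg%5D452%5DA%3D%5EE8%3A%23Hp08hAy8gtczBCurF8%5E%60dgd_hffec"
    "%5ED557a6_afae4%605g572gggbc56gdd2e3%6024"
)

print(decode_candidate(ENCODED)[:15])  # prints vger038.cda.pl/
```

The decoded text has no scheme or file extension, so an extractor would still need to turn it into a complete URL; how the PR handles that is not shown here.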

@divadsn divadsn commented Mar 28, 2020

Implemented in the extractor; see PR #24518.

@cb1986ster cb1986ster commented Apr 4, 2020

At this line:
https://github.com/ytdl-org/youtube-dl/pull/24518/files#diff-ef1160d73154dc227734eb1d150ff7e9R150
you could also check video['file'].endswith('_XDDD'); that should be fine :)

@divadsn divadsn commented Apr 4, 2020

@cb1986ster I already noticed their ridiculous, cringy way of breaking the extractor again and went through the new code. Really funny, although it took me 15 minutes to break down.

Unfortunately, it looks like the ytdl-org collaborators are not interested in accepting or reviewing the PR for now, as they locked the conversation to further comments on my work.

@divadsn divadsn commented Apr 7, 2020

@dstftw can we get an update on what is going on with the PR? If you do not intend to merge the fix, then at least be transparent about why you locked the PR. I can also see that, while we're waiting here, other fixes are being reviewed and merged.

@divadsn divadsn commented Apr 12, 2020

CDA unfortunately keeps breaking the extractor by adding more obfuscation crap. It's now a cat-and-mouse game until CDA gives up and starts using DRM on non-premium videos.

@ytdl-org ytdl-org deleted a comment from olesio Apr 14, 2020
Contributor

@bato3 bato3 commented Apr 15, 2020

I think they may be tracking changes in this repository.

@divadsn Since comments on the PR are locked, I'm posting here:

  • change the quotes from " to ' throughout the file (and check it with flake8)
  • they added more suffixes, which is why I'm thinking about declaring a joke list at the beginning of the file, or even using a regular expression: a = re.sub(r'_[a-zA-Z]+$', '', a)

["_XDDD", "_CDA", "_ADC", "_CXD", "_QWE", "_Q5"]
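
The two approaches mentioned here (a fixed suffix list versus a regular expression) can be compared with a small sketch. The function names `strip_known_suffix` and `strip_suffix_regex` are hypothetical, and the sketch assumes the suffix sits at the very end of the value, as the `endswith` check suggested earlier implies.

```python
import re

# Suffixes quoted in this thread; the site may have added others since.
KNOWN_SUFFIXES = ["_XDDD", "_CDA", "_ADC", "_CXD", "_QWE", "_Q5"]


def strip_known_suffix(value, suffixes=KNOWN_SUFFIXES):
    """Remove the first matching trailing suffix from the list, if any."""
    for suffix in suffixes:
        if value.endswith(suffix):
            return value[:-len(suffix)]
    return value


def strip_suffix_regex(value):
    """Remove a trailing underscore followed by letters only."""
    return re.sub(r'_[a-zA-Z]+$', '', value)
```

One difference worth noting: the regular expression as written matches letters only, so it leaves a value ending in `_Q5` untouched, while the explicit list handles it. Conversely, the regular expression would also strip a trailing `_word` that happens to be part of the real value.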

@divadsn divadsn commented Apr 15, 2020

@bato3 Yes, I am aware of the unresolved changes in the PR.

But the goal would be to scrape player.js on the fly and extract the current "joke" list before extracting the video, much as the soundcloud extractor does to get the client id.

Contributor

@bato3 bato3 commented Apr 16, 2020

From a quick look, the soundcloud extractor uses an API to update the key. (Although I may have missed something.) Pulling it out of player.js is like shooting a mosquito with a cannon.

This is not some dynamically generated key. All they need to do is change the file structure a bit and everything falls apart...

Either we risk a regular expression, or we take the approach used for the openload domains: update after they change.

@divadsn divadsn commented Apr 16, 2020

The soundcloud extractor pulls the client id with a regular expression from any possible JavaScript file, which can be seen here: https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/soundcloud.py#L277

We have two options: implement a similar regex to grab the list from player.js, or host a file containing all the "jokes" on GitHub Pages and update it whenever they change.
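
A rough sketch of the first option, under heavy assumptions: the real player.js is not shown anywhere in this thread, so the JavaScript sample below is invented, and the pattern only works if the suffixes appear as plain quoted string literals. `SUFFIX_LITERAL` and `find_suffix_candidates` are hypothetical names.

```python
import re

# Matches quoted literals such as "_XDDD" or '_Q5' (underscore, then
# 2 to 8 uppercase letters or digits). The length bounds are a guess.
SUFFIX_LITERAL = re.compile(r'''["'](_[A-Z0-9]{2,8})["']''')


def find_suffix_candidates(js_source):
    """Return suffix-like string literals in order of first appearance."""
    seen = []
    for match in SUFFIX_LITERAL.finditer(js_source):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Invented JavaScript, only to exercise the function.
SAMPLE_JS = 'a = a.replace("_XDDD", ""); a = a.replace("_CDA", "");'

# The same idea written with an escape sequence defeats the pattern.
OBFUSCATED_JS = 'var x = "\\x5fXDDD";'
```

With `SAMPLE_JS` this finds both suffixes; with `OBFUSCATED_JS` it finds nothing, which shows how little obfuscation is needed to break this kind of scraping.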

Contributor

@bato3 bato3 commented Apr 16, 2020

I think that until the changes in player.js look automated, there is no point in trying to extract this with regular expressions.

Especially since we are not looking for a textual constant and its value, but working with obfuscated code.

=== Edit

And before we start writing more advanced code, we must check whether CDA is watching changes in the repo.

I'm afraid you will need to run some URL-decoding service, or a small auto-updating Python module.

@Sylvvvia Sylvvvia commented Apr 20, 2020

If you need hosting for a decoding service or an auto-update service, let me know; I'll provide it for free.

@divadsn divadsn commented Aug 3, 2020

@selfisekai selfisekai mentioned this issue Aug 4, 2020
1 of 1 task complete

@divadsn divadsn commented Aug 4, 2020

@dkja dkja commented Aug 19, 2020

Fixed on my GitHub: https://github.com/dkja/youtube-dl ;) I'm not a Python programmer, but...

@dkja dkja mentioned this issue Aug 19, 2020
5 of 9 tasks complete