Unsupported url: canvas.be #7079

Closed
timsomers opened this issue Oct 6, 2015 · 23 comments

Comments

@timsomers timsomers commented Oct 6, 2015

$ youtube-dl http://www.canvas.be/video/radio-gaga/najaar-2015/internaat-turnhout --verbose
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['http://www.canvas.be/video/radio-gaga/najaar-2015/internaat-turnhout', '--verbose']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2014.08.05
[debug] Python version 2.7.9 - Linux-3.16.0-4-amd64-x86_64-with-debian-8.2
[debug] Proxy map: {}
[generic] internaat-turnhout: Requesting header
WARNING: Falling back on generic information extractor.
[generic] internaat-turnhout: Downloading webpage
[generic] internaat-turnhout: Extracting information
ERROR: Unsupported URL: http://www.canvas.be/video/radio-gaga/najaar-2015/internaat-turnhout; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 457, in _real_extract
    doc = parse_xml(webpage)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/utils.py", line 1417, in parse_xml
    return xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 8, column 373
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 516, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 170, in extract
    return self._real_extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 752, in _real_extract
    raise ExtractorError('Unsupported URL: %s' % url)
ExtractorError: Unsupported URL: http://www.canvas.be/video/radio-gaga/najaar-2015/internaat-turnhout; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.

@lodev lodev commented Oct 8, 2015

No idea how to program this, but I know how to get to the files...

The pages contain an ID called "data-video". For the example of the URL mentioned above (http://www.canvas.be/video/radio-gaga/najaar-2015/internaat-turnhout), this ID is:

mz-ast-6c57420b-c6cf-42e3-8188-541a57adabab

Calling https://mediazone.vrt.be/api/v1/canvas/assets/[id], e.g. https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-6c57420b-c6cf-42e3-8188-541a57adabab, returns a JSON file which contains multiple targetURLs. In some cases, like the one in this submission, this contains a direct link to an mp4 file:
{"type":"PROGRESSIVE_DOWNLOAD","url":"http://download.stream.vrt.be/mediazone_canvas/2015/09/mz-ast-3f110b7b-8125-4a6e-a961-bb7b17dd4cde-1/video_1296.mp4"}
The JSON response also contains a title that might be suitable as a base for a filename.

Not all JSONs contain this progressive download link. For example https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-2581fa4d-ce85-4833-a6ca-2acc51b742a8 does not, but it does contain a playlist.m3u8 link that youtube-dl can handle.

So, for files on canvas.be, youtube-dl should:

I guess this should be easy enough for python wizards... if no one is willing/able to work on this I'll try to learn how to do it myself.
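
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the extractor that was eventually merged: the function names are hypothetical, the JSON key name `targetURLs` is taken from the wording of this comment and may be spelled differently in the real response, and the m3u8 entry is recognised only by its file extension because its `type` value is not shown here.

```python
import re

# API endpoint quoted in the comment above; %s is the data-video ID.
API_TEMPLATE = "https://mediazone.vrt.be/api/v1/canvas/assets/%s"


def find_asset_id(webpage):
    """Return the value of the first data-video attribute, or None."""
    match = re.search(r'''data-video=(["'])(?P<id>.+?)\1''', webpage)
    return match.group("id") if match else None


def asset_api_url(asset_id):
    """Build the JSON URL for a given asset ID."""
    return API_TEMPLATE % asset_id


def pick_media_url(asset, key="targetURLs"):
    """Prefer a progressive mp4 download, else fall back to an m3u8 playlist.

    `key` is an assumption about the JSON layout; adjust it to the real response.
    """
    targets = asset.get(key) or []
    for target in targets:
        if target.get("type") == "PROGRESSIVE_DOWNLOAD":
            return target.get("url")
    for target in targets:
        if target.get("url", "").endswith(".m3u8"):
            return target["url"]
    return None
```

The JSON itself would be fetched from `asset_api_url(...)` and parsed with `json.loads` before being handed to `pick_media_url`.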

Contributor

@TomGijselinck TomGijselinck commented Oct 9, 2015

I was actually looking into this issue, but this info helps a lot. I only found the Wowza stream, which I couldn't handle with youtube-dl. With the JSON file it's indeed easy enough. I'll try to implement it further tomorrow. Thanks for that extra info!

@lodev lodev commented Oct 9, 2015

No problem. Charles debugging proxy was a great help :)

Contributor

@midas02 midas02 commented Nov 26, 2015

Should be easy enough to implement as this site is using the same streaming engine as deredactie.be and sporza.be, which have already been implemented. So the site name can be added to the existing extractor.

Contributor

@TomGijselinck TomGijselinck commented Nov 27, 2015

I opened pull request #7145, which works fine, but they won't merge it.

@lodev lodev commented Nov 27, 2015

Can confirm @TomGijselinck's patch works. Apparently the CI tests fail. (I see a lot of errors, but for other extractors.)

One issue I have is that when downloading RTMP streams, the resulting file is missing some headers. It plays fine in VLC, but not in Quicktime. Pushing it through Handbrake helps. This may be beyond the scope of this patch though.

@lodev lodev commented Jan 4, 2016

@TomGijselinck New year, new resolutions... Let's try to get this accepted again :)

One issue I've noticed with your patch is that the downloaded files don't play correctly as-is; once I run the result through Handbrake, the video does work. When I manually download the videos from the m3u8 URL found through the API, this step is not necessary.

Contributor

@TomGijselinck TomGijselinck commented Jan 4, 2016

@lodev I would love to see that happen! 😄

That's strange. When I use it to download any video on canvas.be I can play the video without any problem or intermediary steps.

But do you think that could be the reason it doesn't get accepted?

Contributor

@midas02 midas02 commented Jan 4, 2016

Ah, finally. I was already wondering when you would start showing some teeth. As for whether video files will play or not: through experience I've become a bit of an expert in ripped video. Depending on the video player (software/OS/hardware), there can be various reasons why something won't play. Some players don't like certain headers, others do. You can always contact me privately if you have problems with a file.

Contributor

@midas02 midas02 commented Jan 4, 2016

And in English then. To all, as per my previous comments: I would strongly suggest adapting the existing extractor for VRT and NOT creating a new one. VRT, the Flemish broadcaster, operates three brands/websites (deredactie.be, sporza.be and canvas.be, plus the defunct cobra.be). All of these use the same technology and streaming engine, and have been enabled in a single extractor, vrt.py:
_VALID_URL = r'https?://(?:deredactie|sporza|cobra).be/cm/(?:[^/]+/)+(?P<id>[^/]+)/*'

So there is no point in creating another one. Not only are you reinventing the wheel, it will also cause additional maintenance work in the future when fixes are needed. Please have a look at the existing extractor, and see what needs modifying. Shouldn't be too hard.
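
For reference, here is a small check of what the quoted pattern accepts. It assumes the named group is called `id`, the usual youtube-dl convention; the deredactie.be URL is a made-up example of the `/cm/...` shape, and `match_id` is a hypothetical helper, not part of vrt.py.

```python
import re

# Pattern as quoted from vrt.py in this thread, with the named group written as `id`.
VALID_URL = r'https?://(?:deredactie|sporza|cobra).be/cm/(?:[^/]+/)+(?P<id>[^/]+)/*'


def match_id(url):
    """Return the captured display ID, or None when the URL is not matched."""
    match = re.match(VALID_URL, url)
    return match.group('id') if match else None


# Hypothetical deredactie.be URL following the /cm/<section>/.../<id> layout.
print(match_id('http://deredactie.be/cm/vrtnieuws/videozone/example-clip'))

# The canvas.be URL from this issue: different host prefix and no /cm/ path.
print(match_id('http://www.canvas.be/video/radio-gaga/najaar-2015/internaat-turnhout'))
```

The first call yields the last path component and the second yields `None`, so supporting canvas.be in the same extractor would mean changing the path part of the pattern as well as the host list.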

@lodev lodev commented Jan 4, 2016

@midas02 I agree this could be added to the existing extractor. However, the method to find the video ID on canvas.be is totally different, see my comment on Oct. 9, so creating a separate one for canvas will only help with future debugging.

Regarding the files: Grabbing the playlist.m3u8 and passing it as an argument to the existing youtube-dl gives a fully compliant mp4 file that plays fine in Quicktime. This patch results in files that miss some headers.

Contributor

@midas02 midas02 commented Jan 4, 2016

Most people I know have problems with ripped video and Quicktime. Typically a QT issue. VLC shouldn't have a problem playing it. Using ffmpeg to convert it from mp4 to mkv usually helps to get the headers right.
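
A remux of that kind could look like the sketch below. The exact ffmpeg options are not given in this thread; `-c copy` (copy the streams into the new container without re-encoding) is an assumption, and the helper names are hypothetical.

```python
import subprocess


def remux_command(src, dst):
    """Build an ffmpeg command that rewrites the container without re-encoding.

    `-c copy` is an assumed choice; the thread only says "convert from mp4 to mkv".
    """
    return ["ffmpeg", "-i", src, "-c", "copy", dst]


def remux(src, dst):
    """Run the remux; requires ffmpeg on PATH and raises if it fails."""
    subprocess.run(remux_command(src, dst), check=True)
```

For example, `remux("download.mp4", "download.mkv")` would write a new Matroska file next to the original.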

@lodev lodev commented Jan 4, 2016

@midas02 the valid files are on the server, and I've shown a way to get them. No reason why youtube-dl should download broken ones.

Contributor

@midas02 midas02 commented Jan 4, 2016

Almost correct. You're not really downloading the source file; you're raking it in through an adaptive protocol in ten-second chunks. It is then up to the software to merge them again and attach the right headers. I understand the issue, but that's where QT seems to be much pickier than other software. I'm having similar issues with my LG TV, but QT is by far the more annoying to deal with.
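
To illustrate the chunking, an m3u8 media playlist is a text file that lists the segments in order, each preceded by an `#EXTINF` line giving its duration. The sketch below lists those segments; `list_segments` is a hypothetical helper and handles only this simple case, not master playlists that point to variant streams.

```python
from urllib.parse import urljoin


def list_segments(playlist_text, playlist_url):
    """Return (absolute_url, duration_in_seconds) for each segment in a media playlist."""
    segments = []
    duration = None
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#"):
            # Any non-comment line is a segment URI, possibly relative to the playlist.
            segments.append((urljoin(playlist_url, line), duration))
            duration = None
    return segments
```

A downloader fetches these segments one by one and writes them into a single output file, which is the step where the container headers are produced.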

@lodev lodev commented Jan 4, 2016

Indeed, I stand corrected. Still, youtube-dl has the ability to deliver proper files, users don't expect the need for another step :)

Contributor

@TomGijselinck TomGijselinck commented Jan 4, 2016

@lodev Which exact url do you provide to youtube_dl to get a fully compliant mp4 file? I'll try to include it when I fix those other issues tomorrow. 😉

@xplorr xplorr commented Jan 12, 2016

EDIT: offtopic

@xplorr xplorr commented Jan 12, 2016

I tried to answer lodev's original request in this thread ("if no one is willing/able to work on this I'll try to learn how to do it myself."), but a moderator seems to be erasing my comments. I have no idea why.
I just want to say that I scripted this for Windows. If anyone is interested in seeing my script, drop me a message.

Collaborator

@dstftw dstftw commented Jan 12, 2016

This issue tracker is only supposed to be used for youtube-dl related problems and discussions, not for closed-source third-party software.

@xplorr xplorr commented Jan 12, 2016

Strange: the comment "lodev commented on Oct 9, 2015" and the comments that follow are all about canvas.be streams, not youtube-dl. I was just trying to answer the question.

@lodev lodev commented Jan 12, 2016

@TomGijselinck sorry, I missed this notification. For instance, when I give youtube-dl http://vod.stream.vrt.be/mediazone_canvas_geo/_definst_/smil:2015/12/mz-ast-63ce26bd-85cb-469c-9c56-65b4cd793a7e-1/video.smil/playlist.m3u8 I get an mp4 that works out of the box in Quicktime. When I try the page URL it comes from (http://www.canvas.be/video/david-bowie-five-years/david-bowie-five-years), I get a file that I first have to pass through Handbrake.

@lodev lodev commented Jan 12, 2016

@xplorr all of those comments are in the context of this youtube-dl extractor though.

Collaborator

@dstftw dstftw commented Jan 14, 2016

canvas will be supported in the next version.
