[Clarification-needed] Brightcove code for Kijk.nl. #6243
Comments
Since the URL is in an HTML file, it may contain escape sequences like
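As an illustration: in HTML source, the & between query parameters is normally written as the entity &amp;, and unescaping turns it back into a plain &. Below is a minimal sketch using Python 3's standard html.unescape as a stand-in for youtube-dl's own unescapeHTML helper. The URL is invented for the example.

```python
import html

# Hypothetical attribute value as it might appear inside an HTML page;
# the "&" separating the query parameters is written as the entity "&amp;".
raw = "http://example.invalid/viewer?playerKey=abc&amp;videoId=123"

# Convert HTML entities back to the characters they stand for.
url = html.unescape(raw)
print(url)  # http://example.invalid/viewer?playerKey=abc&videoId=123
```

Without this step the query string would still contain the literal text "&amp;", so a request built from it would not carry the intended parameters.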
It's only searched if the url in
See its definition: https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py#L1129-L1143. It just embeds extra info in a URL so the extractor can use it.
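To make the idea concrete, here is a minimal Python 3 sketch of smuggling extra data into a URL fragment and reading it back. It is not youtube-dl's actual code: the names smuggle, unsmuggle and SMUGGLE_KEY, as well as the Brightcove URL, are made up for illustration. See the linked utils.py for the real definition.

```python
import json
from urllib.parse import parse_qs, urlencode

SMUGGLE_KEY = "__smuggled"  # hypothetical marker name, for illustration only


def smuggle(url, data):
    """Append `data` to `url` as a URL-encoded JSON payload in the fragment."""
    return url + "#" + urlencode({SMUGGLE_KEY: json.dumps(data)})


def unsmuggle(smuggled_url, default=None):
    """Split a smuggled URL back into (plain_url, data)."""
    if "#" + SMUGGLE_KEY not in smuggled_url:
        # Nothing was smuggled: hand the URL back unchanged.
        return smuggled_url, default
    url, _, payload = smuggled_url.rpartition("#")
    data = json.loads(parse_qs(payload)[SMUGGLE_KEY][0])
    return url, data


page_url = "http://www.kijk.nl/sbs6/wegmisbruikers/videos/ouhfqhGaSYBE/aflevering-304"
bc_url = "http://example.invalid/brightcove?videoId=123"  # placeholder, not a real embed URL

smuggled = smuggle(bc_url, {"Referer": page_url})
plain, extra = unsmuggle(smuggled)
print(plain)             # the original bc_url
print(extra["Referer"])  # the page URL that was carried along
```

The point of the round trip is that the generic extractor can pass the page URL along to the Brightcove extractor as a Referer without changing the function signatures in between.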
I guess I can skip to
A good friend saw my posts here and pointed me to another project's source code. That code was more straightforward, and in the end I could recover the 'manifest-playlist' URL with just one Xidel command line:
I guess the Brightcove code has become so complex because of all the added extractors and development in general. But if I can recover the m3u8 with just one command line, it makes me wonder whether the Brightcove code isn't too complex?
With this code you've handled just a subset of possible Brightcove embeds. If this happens to be used on kijk, that doesn't mean different embeds can't be used somewhere else.
Alright. Understood. Thank you, dstftw |
Although I'm creating an "Issue" here, it's actually some specific Python code explanation I'm looking for.
I'm creating a good old batchscript to download videos / extract video-urls from npo.nl, rtlxl.nl and kijk.nl. For the first 2 it's already working, but I'm having a hard time understanding the code for kijk.nl:
https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/generic.py#L216
https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/generic.py#L1152
https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/brightcove.py#L194
Let's consider http://www.kijk.nl/sbs6/wegmisbruikers/videos/ouhfqhGaSYBE/aflevering-304 as input.
First of all, what does 'url': smuggle_url(bc_url, {'Referer': url}) (#L1158 of generic.py), and smuggle_url in particular, do?
With Xidel I can use XQuery in my batchscript and with its help I managed to extract the url from <meta property="og:video" content="{.}" /> (#L197 of brightcove.py: url_m = re.search(...,webpage)):
But since I have very little experience with Python, I just don't understand what happens then.
It obviously checks whether the url contains "playerKey", or "videoId", but before that, what does url = unescapeHTML(url_m.group(1)) actually do?
And then matches = re.findall(...,webpage)... I guess it's searching for the xml string <object class="BrightcoveExperience">{params}</object>, but this string doesn't appear at all in the html-code of http://www.kijk.nl/sbs6/wegmisbruikers/videos/ouhfqhGaSYBE/aflevering-304.
If anyone can explain the url-extraction to me, I'd be very grateful, because as you can see, I need some clarification. :)
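The flow asked about above (search for the og:video meta tag, unescape its content, check for playerKey or videoId, otherwise look for BrightcoveExperience objects) can be sketched roughly as follows in Python 3. This is a simplified illustration, not the code from brightcove.py: the function name find_brightcove_urls, both regular expressions and the sample HTML are assumptions made for the example, and the fallback simply returns the raw object markup.

```python
import html
import re


def find_brightcove_urls(webpage):
    """Return candidate Brightcove embeds found in `webpage` (simplified sketch)."""
    # Step 1: look for <meta property="og:video" content="...">.
    meta = re.search(
        r'<meta\s+property=["\']og:video["\']\s+content=["\']([^"\']+)["\']',
        webpage)
    if meta:
        # Step 2: turn entities such as &amp; back into plain characters.
        url = html.unescape(meta.group(1))
        # Step 3: only accept the URL if it names a player or a video.
        if 'playerKey' in url or 'videoId' in url:
            return [url]
    # Step 4: otherwise fall back to <object class="...BrightcoveExperience..."> embeds.
    return re.findall(
        r'(?s)<object[^>]+class=["\'][^"\']*BrightcoveExperience[^"\']*["\'].*?</object>',
        webpage)


# Made-up page fragment, only to exercise the function.
sample = ('<meta property="og:video" '
          'content="http://example.invalid/viewer?playerKey=abc&amp;videoId=123" />')
print(find_brightcove_urls(sample))
# ['http://example.invalid/viewer?playerKey=abc&videoId=123']
```

In this sketch the object search only runs when the meta tag is missing or its URL has neither playerKey nor videoId, which would explain why a page can be handled even though it contains no BrightcoveExperience object at all.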