KUL university website Toledo (using Kaltura) #25236

Open
wvhulle opened this issue May 11, 2020 · 5 comments
@wvhulle wvhulle commented May 11, 2020

  • I'm reporting a new site support request
  • I've verified that I'm running youtube-dl version 2020.05.08
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that none of provided URLs violate any copyrights
  • I've searched the bugtracker for similar site support requests including closed ones

Example URLs

Description

The website requires account credentials that are issued to every student. I cannot share mine because they are private, but the username has the form "r(some number)" and the password is chosen by the user. I authenticate with a Netscape cookie file.
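
For reference, a Netscape-format cookie file can be sanity-checked with Python's standard library before it is passed to youtube-dl with --cookies. This is only a minimal sketch: the file contents, the cookie name and the domain below are made up for illustration and are not taken from Toledo.

```python
import http.cookiejar
import os
import tempfile

# Hypothetical cookie file contents; real values come from a browser export.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".kuleuven.be\tTRUE\t/\tTRUE\t2147483647\tsession_id\tPLACEHOLDER\n"
)


def load_cookie_names(path):
    """Return the names of all cookies stored in a Netscape-format file."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session and expired cookies so the listing shows everything in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(cookie.name for cookie in jar)


# Write the sample to a temporary file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(SAMPLE)
    cookie_path = handle.name

names = load_cookie_names(cookie_path)
os.remove(cookie_path)
print(names)
```

If the file fails to load here, it is likely malformed (for example, spaces instead of tabs between fields), which would also explain authentication problems later on.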

I tried to write an extractor and wrote a regex to match the site URL, r"https?://(?:.+?\.)?kuleuven\.be/.*", but got stuck writing the actual extraction function. While reading the page source I saw that Kaltura is used, so I tried to use the Kaltura functions, but they do not seem to apply to the format of this page.
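
The pattern quoted above can be tried in isolation with the re module. A small sketch follows; the helper name is_kuleuven_url and the two example.com URLs are made up for illustration. It suggests the pattern is very broad: because ".+?" also matches slashes, a URL on another host can match as long as ".kuleuven.be/" appears somewhere in its path.

```python
import re

# Pattern quoted in the comment above: any kuleuven.be host, any path.
VALID_URL = r"https?://(?:.+?\.)?kuleuven\.be/.*"


def is_kuleuven_url(url):
    """True if the URL matches the broad kuleuven.be pattern from its start."""
    return re.match(VALID_URL, url) is not None


# A Toledo listing page: matches.
print(is_kuleuven_url(
    "https://p.cygnus.cc.kuleuven.be/webapps/blackboard/content/listContent.jsp"))
# Foreign host, no dot directly before "kuleuven": does not match.
print(is_kuleuven_url("https://example.com/kuleuven.be/"))
# Foreign host, but ".kuleuven.be/" occurs in the path: matches anyway.
print(is_kuleuven_url("https://example.com/a.kuleuven.be/"))
```

Replacing ".+?" with "[^/]+" in the host part would be one way to keep the match on the hostname only.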

One of the other issues mentions that you can reference a Kaltura ID directly. Where can I find this ID?

@wvhulle wvhulle commented May 11, 2020

I figured out how to download single videos. The link for each video contains an ID and some kind of password. The bash command youtube-dl -f '[protocol=m3u8_native]' kaltura:2375821:1_nqkn36ju works. My next question: how do I write an extraction function that does this automatically for the playlists?
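
As far as I can tell, in youtube-dl's Kaltura extractor the two fields of the kaltura: form are the partner ID and the entry ID. Under that reading, the command above can be assembled programmatically. This is a sketch only; the helper names kaltura_url and download_command are made up, and nothing is executed.

```python
import shlex


def kaltura_url(partner_id, entry_id):
    """Build the kaltura:<partner_id>:<entry_id> pseudo-URL."""
    return "kaltura:{}:{}".format(partner_id, entry_id)


def download_command(partner_id, entry_id):
    """Return the youtube-dl argument list for one entry (not executed here)."""
    return [
        "youtube-dl",
        "-f", "[protocol=m3u8_native]",
        kaltura_url(partner_id, entry_id),
    ]


cmd = download_command(2375821, "1_nqkn36ju")
# Print the command with shell quoting applied where needed.
print(" ".join(shlex.quote(part) for part in cmd))
```

The argument list could be handed to subprocess.run once the entry IDs of a playlist are known.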

@wvhulle wvhulle commented May 11, 2020

The playlists are on pages like this:

To get the individual videos, for example from the URL
https://p.cygnus.cc.kuleuven.be/webapps/blackboard/content/listContent.jsp?course_id=_890561_1&content_id=_26919476_1
you have to click on the buttons.

From that point I could use the Kaltura extractor to extract each individual file. But how do I loop over all the URLs?
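
One possible way to loop is to collect every entry ID that appears in the page source and turn each into a kaltura: pseudo-URL. The sketch below assumes the IDs show up in Kaltura-style thumbnail URLs ("/entry_id/<id>/"). That markup, the host name and the second entry ID are hypothetical; the real Toledo pages may look different.

```python
import re

PARTNER_ID = "2375821"  # taken from the working command earlier in this thread

# Hypothetical page fragment; the real Toledo markup may differ.
SAMPLE_HTML = """
<img src="https://cdn.example.invalid/p/2375821/sp/237582100/thumbnail/entry_id/1_nqkn36ju/width/120">
<img src="https://cdn.example.invalid/p/2375821/sp/237582100/thumbnail/entry_id/1_abcd1234/width/120">
<img src="https://cdn.example.invalid/p/2375821/sp/237582100/thumbnail/entry_id/1_nqkn36ju/width/480">
"""


def kaltura_urls(html, partner_id=PARTNER_ID):
    """Collect unique entry IDs in page order and build kaltura: pseudo-URLs."""
    seen = []
    for entry_id in re.findall(r"/entry_id/([0-9a-z_]+)", html):
        if entry_id not in seen:  # skip repeats of the same video
            seen.append(entry_id)
    return ["kaltura:{}:{}".format(partner_id, entry_id) for entry_id in seen]


for url in kaltura_urls(SAMPLE_HTML):
    print(url)
```

Inside an extractor, each of these strings would be wrapped with self.url_result and the list returned through self.playlist_result.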

@wvhulle wvhulle commented May 11, 2020

Based on the nzz.py extractor and KUL's apparent Kaltura ID, 2375821, I have written:

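from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import extract_attributes


class ToledoIE(InfoExtractor):
    # This pattern does not match the Toledo pages yet: the literal "/" between
    # the two optional groups requires a double slash ("//") in the path.
    _VALID_URL = r"https?://(?:.+?\.)?kuleuven\.be/(?:[^/]+/)*/(?:[^/]+/)*sp/(?P<id>\d+)/thumbnail.*"

    def _real_extract(self, url):
        page_id = self._match_id(url)
        webpage = self._download_webpage(url, page_id)

        entries = []
        # Collect every Kaltura player tag on the page (markup borrowed from nzz.py).
        for player_element in re.findall(
                r'(<[^>]+class="kalturaPlayer[^"]*"[^>]*>)', webpage):
            player_params = extract_attributes(player_element)
            if player_params.get('data-type') not in ('kaltura_singleArticle',):
                self.report_warning('Unsupported player type')
                continue
            entry_id = player_params['data-id']
            # 2375821 is the Kaltura partner ID observed for KUL.
            entries.append(self.url_result(
                'kaltura:2375821:' + entry_id, 'Kaltura', entry_id))

        return self.playlist_result(entries, page_id)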

It seems that only the URL regex is not right yet.
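
One candidate for the URL regex is a pattern aimed at the listing page itself rather than at thumbnail URLs. The sketch below is an assumption built from the single listContent.jsp URL posted earlier in this thread. The name LIST_URL, the helper match_listing and the choice of content_id as the page ID are all hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical pattern for Blackboard content-listing pages on kuleuven.be hosts.
LIST_URL = (
    r"https?://(?:[^/]+\.)?kuleuven\.be/webapps/blackboard/content/"
    r"listContent\.jsp\?(?:[^#]*&)?content_id=(?P<id>_\d+_\d+)"
)


def match_listing(url):
    """Return (content_id, course_id) for a listing URL, or None if it does not match."""
    match = re.match(LIST_URL, url)
    if match is None:
        return None
    # course_id is read from the query string; it may be absent.
    query = parse_qs(urlparse(url).query)
    return match.group("id"), query.get("course_id", [None])[0]


result = match_listing(
    "https://p.cygnus.cc.kuleuven.be/webapps/blackboard/content/"
    "listContent.jsp?course_id=_890561_1&content_id=_26919476_1")
print(result)
```

Using "[^/]+" for the host part keeps the match on the hostname, and the named group "id" is what self._match_id(url) would return if this pattern were used as _VALID_URL.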

@tobiascornille tobiascornille commented Oct 7, 2020

@wvhulle Any progress on this by any chance?

@wvhulle wvhulle commented Oct 7, 2020

My regular expression skills were not good enough to extract the info properly. I would also still have to make authentication with the university website work. In the end I resorted to the 'VideoDownloadHelper' plugin.
