[Panopto] Add Panopto extractors #13449
No test cases are included, as I am not aware of any publicly available Panopto recordings that this extractor will work with. It supports downloading individual recordings or entire folders recursively. Folders are separated with ' -- ' in the playlist title. Cookies are likely required to use this extractor, specifically Panopto's .ASPXAUTH cookie, which can be obtained from your browser after logging in. --write-all-thumbnails can be used to download PowerPoint slides if they are not included as a video stream. The suggested output format is 'out/%(playlist)s/%(title)s.%(ext)s'.
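For reference, a minimal sketch of handing the .ASPXAUTH cookie and the suggested output template to youtube-dl from Python. It assumes a Netscape-format cookies file (the format `--cookies` reads), written here with the standard library's `http.cookiejar`; the helper name `write_aspxauth_cookie_file` is hypothetical, and the cookie value is a placeholder you would copy from your browser.

```python
import http.cookiejar
import os
import tempfile
import time


def write_aspxauth_cookie_file(path, domain, value):
    """Hypothetical helper: write a cookies.txt holding one .ASPXAUTH cookie."""
    jar = http.cookiejar.MozillaCookieJar(path)
    cookie = http.cookiejar.Cookie(
        version=0, name='.ASPXAUTH', value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=True, expires=int(time.time()) + 3600, discard=False,
        comment=None, comment_url=None, rest={})
    jar.set_cookie(cookie)
    jar.save()  # writes the "# Netscape HTTP Cookie File" header plus the cookie
    return path


cookie_path = write_aspxauth_cookie_file(
    os.path.join(tempfile.mkdtemp(), 'cookies.txt'),
    'demo.hosted.panopto.com',  # the demo host used in this PR's tests
    'PLACEHOLDER_VALUE')        # placeholder, not a real session value

ydl_opts = {
    'cookiefile': cookie_path,                        # same as --cookies
    'outtmpl': 'out/%(playlist)s/%(title)s.%(ext)s',  # template suggested above
    'write_all_thumbnails': True,                     # same as --write-all-thumbnails
}
```

The options dict could then be passed to `youtube_dl.YoutubeDL(ydl_opts)`; the command-line equivalent is `--cookies cookies.txt -o 'out/%(playlist)s/%(title)s.%(ext)s' --write-all-thumbnails`.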
and fix all issues.
Testing may be impossible for the Folder extractor, or I'm just doing it wrong. With the current test we hit a catch-22: the test claims we need an 'ext' entry to continue, but after adding one it claims it expected None.
|
I've fixed all the issues noted by QuantifiedCode. I've also added tests for the Panopto extractor; both pass on my machine but fail on Travis, and I have no idea why. Please let me know if there are any other places where conventions aren't being followed. As noted in one of my commits:
I've still included the faulty PanoptoFolder test for review and reproduction of this issue. Thanks!
youtube_dl/extractor/panopto.py (outdated)
```python
class PanoptoIE(PanoptoBaseIE):
    """Extracts a single Panopto video including all available streams."""

    _VALID_URL = r'^https?:\/\/(?P<org>[a-z0-9]+)\.hosted\.panopto.com\/Panopto\/Pages\/Viewer\.aspx\?id=(?P<id>[a-f0-9-]+)'
```
No need to escape `/`.
youtube_dl/extractor/panopto.py (outdated)
```python
for c in contribs:
    display_name = c.get('DisplayName')
    if display_name is not None:
        s += '{}, '.format(display_name)
```
`{}` does not work in Python 2.6.
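Python 2.6's `str.format` only accepts explicitly numbered fields such as `{0}`; the auto-numbered `{}` form arrived in Python 2.7 and 3.1. A minimal sketch that sidesteps `format` entirely, assuming the goal is a comma-separated list of display names (`format_contributors` is a hypothetical name, not the PR's code):

```python
def format_contributors(contribs):
    """Join contributor display names, skipping entries without one.

    ', '.join() works on every Python version youtube-dl supports and,
    unlike the loop above, leaves no trailing ', ' to strip afterwards.
    """
    names = [c.get('DisplayName') for c in contribs or []]
    return ', '.join(name for name in names if name is not None)
```

If `format` is preferred, `'{0}, '.format(display_name)` or `'%s, ' % display_name` are the 2.6-compatible spellings.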
```python
result['entries'] = streams

# We already know Delivery exists since we need it for stream extraction
contributors = delivery_info['Delivery'].get('Contributors')
```
Read coding conventions and fix all optional fields.
I'm not sure how this is at odds with the coding conventions. Are you referring to the use of ['Delivery']? The Delivery key must exist whenever this code runs, because it is needed earlier to retrieve non-optional fields; without it, extraction would already have failed at line 120:

```python
for this_stream in delivery_info['Delivery']['Streams']:
```

We need ['Delivery'] (and more specifically ['Delivery']['Streams']) just to find the non-optional information.
@dstftw Is this still an issue?
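One way to read the mandatory/optional distinction discussed in this thread, as a minimal sketch rather than the PR's code (`split_delivery_fields` is a hypothetical helper): keys the extractor cannot work without are indexed directly, so a missing key fails loudly, while optional metadata is fetched with `.get()` and type-checked, so a missing or null value never aborts extraction.

```python
def split_delivery_fields(delivery_info):
    """Hypothetical helper separating mandatory from optional fields."""
    delivery = delivery_info['Delivery']   # mandatory: KeyError if absent
    streams = delivery['Streams']          # mandatory: needed for formats
    # Optional: tolerate a missing key, a JSON null, or an unexpected type.
    contributors = delivery.get('Contributors')
    if not isinstance(contributors, list):
        contributors = []
    return streams, contributors
```

In youtube-dl itself, a missing mandatory key would normally be reported as an `ExtractorError` rather than a bare `KeyError`.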
|
The Travis errors appear to be from ffmpeg not being available:

Once ffmpeg was installed, that fixed the above error, but the tests still failed with MD5 mismatches. It looks like the

```shell
$ python2.7 -m youtube_dl --test https://demo.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=26b3ae9e-4a48-4dcc-96ba-0befba08a0fb
$ wc -c Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
13663 Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
$ md5sum Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
76d6bc8e500b1e47b53541514e3d1ea6 Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
```

or macOS:

```shell
$ python2.7 -m youtube_dl --test https://demo.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=26b3ae9e-4a48-4dcc-96ba-0befba08a0fb
$ gwc -c Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
13679 Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
$ gmd5sum Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
06fb292a3510aa5bc5f0c950fe58c9f7 Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
```

Different lengths mean the two platforms will generate different MD5 sums during testing. If I truncate both files to 10241 bytes I get the same MD5, but neither platform will test correctly against that MD5. It's worth adding that if I select
|
Looks like the issue is with youtube-dl assuming that the
So we have two different versions of FFmpeg on two different platforms with FFmpeg's
Edit: Even when I brought the FFmpeg versions closer together (roughly 19 days apart), the MD5s were still different. I brought both up to
This should prevent issues like that in #13449 where the FFmpeg downloader was not reliably trimming to the correct length.
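A minimal sketch of the idea behind that change, not the commit itself: rather than trusting the downloader (here FFmpeg) to stop at exactly the test size, trim the file afterwards and hash only what remains. `truncate_and_md5` is a hypothetical helper, 10241 is the byte count mentioned in this thread, and the demo assumes, as observed above, that both platforms agree on the leading bytes and only differ in how far past the limit they write.

```python
import hashlib
import os
import tempfile

TEST_FILE_SIZE = 10241  # byte count mentioned in this thread


def truncate_and_md5(path, size=TEST_FILE_SIZE):
    """Hypothetical helper: trim a test download to `size` bytes, return its MD5."""
    if os.path.getsize(path) > size:
        # Only shrink: truncate() with a larger size would extend the file.
        with open(path, 'r+b') as f:
            f.truncate(size)
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


# Simulate the two platforms: identical leading bytes, different overshoot
# (13663 vs 13679 bytes, matching the wc -c output above).
tmp_dir = tempfile.mkdtemp()
prefix = os.urandom(TEST_FILE_SIZE)
linux_file = os.path.join(tmp_dir, 'linux.mp4')
macos_file = os.path.join(tmp_dir, 'macos.mp4')
with open(linux_file, 'wb') as f:
    f.write(prefix + b'\x00' * (13663 - TEST_FILE_SIZE))
with open(macos_file, 'wb') as f:
    f.write(prefix + b'\xff' * (13679 - TEST_FILE_SIZE))

linux_md5 = truncate_and_md5(linux_file)
macos_md5 = truncate_and_md5(macos_file)
```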
|
Hello, any update on this? Thank you for working on it.
|
Hey Manuel, I haven't touched this or tried to use it since 2017. I imagine it may need to be updated if Panopto has changed their (semi-private) API. This was waiting on another round of reviews; I believe I had addressed all of the outstanding comments. As I mentioned earlier, there seemed to be a bug in the way youtube-dl was doing its tests against FFmpeg, relying on the

If there are users willing to test this patch and perhaps even work on it, I would certainly appreciate it, and I'm sure the maintainers would too.
```python
_TESTS = [
    {
        'url': 'https://demo.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=26b3ae9e-4a48-4dcc-96ba-0befba08a0fb',
        'md5': 'e8e6ef6b0572dd5985f5f8c3e096f717',
```
```diff
-        'md5': 'e8e6ef6b0572dd5985f5f8c3e096f717',
+        'md5': '048dab5c2f8ab97f2bd75ab4cf3f463a',
```
Maybe this video was changed; this now seems to be the correct hash.
```python
    'title': this_stream['Tag'],
    'formats': [],
}
if 'StreamHttpUrl' in this_stream:
```
```diff
-if 'StreamHttpUrl' in this_stream:
+if 'StreamHttpUrl' in this_stream and this_stream['StreamHttpUrl'] is not None:
```
It looks like this can now be null/None.
```python
    new_stream['formats'].append({
        'url': this_stream['StreamHttpUrl'],
    })
if 'StreamUrl' in this_stream:
```
```diff
-if 'StreamUrl' in this_stream:
+if 'StreamUrl' in this_stream and this_stream['StreamUrl'] is not None:
```
Maybe the same change as for StreamHttpUrl should also be done for StreamUrl? I haven't seen any cases where this is None/null, but it seems like a good idea.
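Both suggested conditions can be collapsed: `dict.get()` returns None when a key is missing and also when the JSON value was null, so one check covers both cases for both keys. A minimal sketch under that assumption (`add_stream_formats` is a hypothetical name, and the m3u8/mp4 handling is simplified to a plain URL entry):

```python
def add_stream_formats(this_stream, formats):
    """Hypothetical helper: append a format for each URL present in a stream."""
    http_url = this_stream.get('StreamHttpUrl')
    if http_url:  # skips a missing key, a null value, and an empty string
        formats.append({'url': http_url})
    stream_url = this_stream.get('StreamUrl')
    if stream_url:
        # In the PR, this is where the m3u8 vs. mp4 handling would go.
        formats.append({'url': stream_url})
    return formats
```

Note that the truthiness check also skips an empty string, which the `is not None` form would let through.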
|
I tested this PR and I was able to get things working (just test/test_download.py and an example private video) after a few changes (see review).
```python
    m3u8_formats = self._extract_m3u8_formats(this_stream['StreamUrl'], video_id, 'mp4')
    self._sort_formats(m3u8_formats)
    new_stream['formats'].extend(m3u8_formats)
```
```diff
-    m3u8_formats = self._extract_m3u8_formats(this_stream['StreamUrl'], video_id, 'mp4')
-    self._sort_formats(m3u8_formats)
-    new_stream['formats'].extend(m3u8_formats)
+    get_stream_url_matcher = lambda mid : re.compile(r'^https?://.*?/.*?/.*?' + mid + r'\?InvocationID.*')
+    if '_M3U8_MATCHER' not in self.__dict__:
+        self._M3U8_MATCHER = get_stream_url_matcher(r'\.hls/master\.m3u8')
+    if '_MP4_MATCHER' not in self.__dict__:
+        self._MP4_MATCHER = get_stream_url_matcher(r'\.mp4')
+    if self._M3U8_MATCHER.match(this_stream['StreamUrl']):
+        m3u8_formats = self._extract_m3u8_formats(this_stream['StreamUrl'], video_id, 'mp4')
+        self._sort_formats(m3u8_formats)
+        new_stream['formats'].extend(m3u8_formats)
+    elif self._MP4_MATCHER.match(this_stream['StreamUrl']):
+        new_stream['formats'].append({'url': this_stream['StreamUrl']})
+    else:
+        raise ExtractorError('Unexpected StreamUrl format')
```
In some cases, it looks like StreamUrl is a single mp4 file instead of m3u8 (when there is only a single video I assume?). I don't know how fragile this regex is, but it passes all of my tests.
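The suggested regexes can be exercised on their own. A standalone sketch that compiles the two matchers once at module level instead of caching them through `self.__dict__`; the helper names and example URLs are made up, and an unrecognized URL returns None here rather than raising `ExtractorError` as the suggestion does.

```python
import re


def _stream_url_matcher(suffix):
    # Same shape as the suggested pattern: scheme, at least three path
    # segments, the given suffix, then an InvocationID query parameter.
    return re.compile(r'^https?://.*?/.*?/.*?' + suffix + r'\?InvocationID.*')


M3U8_RE = _stream_url_matcher(r'\.hls/master\.m3u8')
MP4_RE = _stream_url_matcher(r'\.mp4')


def classify_stream_url(url):
    """Return 'm3u8', 'mp4', or None for an unrecognized StreamUrl."""
    if M3U8_RE.match(url):
        return 'm3u8'
    if MP4_RE.match(url):
        return 'mp4'
    return None
```

Compiling at import time keeps the extractor class free of per-instance caching, at the cost of doing the work even when no Panopto URL is processed; for two small patterns that cost is negligible.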
|
I just found https://github.com/jstrieb/panopto-download, which might help. I looked briefly through the PR and saw that the links appeared to be hardcoded. They are currently available through the RSS feed of each folder.

Interestingly, the links in the RSS feed don't require any authentication, so it should be possible to convert a folder URL to its RSS feed URL and download all the videos without cookies. Unfortunately, I don't think it's possible to get the RSS feed URL directly from a viewer video URL, so cookies are still needed when downloading a single video. For my personal use cases, using the RSS feed directly is perfect: I set up a daemon that periodically runs youtube-dl on the RSS feed and sends notifications when a new video is downloaded. See this script from my dotfiles. This doesn't require this PR at all. For downloading a single video, I think using the RSS feed would increase complexity. For folders, I think using the RSS feed would be considerably cleaner, but I don't know how that would work for recursive folder downloads. Also, I haven't tested recursive folder downloads, so it is possible that they are broken right now. I probably won't write this change for a few reasons, but I think it would be reasonably easy to do (aside from subfolders, perhaps).
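A sketch of the RSS approach, assuming Panopto folder feeds are ordinary RSS 2.0 documents with one `<enclosure url="...">` per video (not verified here). `rss_enclosure_urls` is a hypothetical name, and the sample feed in the test is invented.

```python
import xml.etree.ElementTree as ET


def rss_enclosure_urls(rss_xml):
    """Return (title, media URL) pairs for every <item> that has an enclosure."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter('item'):
        enclosure = item.find('enclosure')
        if enclosure is None or not enclosure.get('url'):
            continue  # items without downloadable media are skipped
        entries.append((item.findtext('title', default=''), enclosure.get('url')))
    return entries
```

Each pair could become a playlist entry; the daemon described above simply points youtube-dl at the feed URL instead.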
|
Reviving this in yt-dlp: yt-dlp/yt-dlp#2908
Based on ytdl-org/youtube-dl#13449
Closes #1946
Authored by: coletdjnz, kmark
|
@dirkf Hi! Any chance to get this merged? Thanks!
|
It's merged in yt-dlp, btw.
|
We should back-port the yt-dlp extractor unless it does something weird wrt yt-dl.
|
IIRC there were some bugs in core that had to be fixed for Panopto to work completely, so those will need to be backported too.
|
At the risk of sounding like a broken record: if you compare youtube-dl's commits, bugfixes, and new features with yt-dlp's, youtube-dl, as (mostly) controlled by @dirkf, will always be behind in a project that has to keep up with constantly changing unofficial APIs.
|
@lebdron I digress (rant): from what I can tell, unnecessary energy is spent backporting, but that is the reality of having a better-maintained fork compete with the formerly undisputed project and fracturing the community. This is especially true when the project insists on supporting Python 2 and the 0.1% of people who choose to run youtube-dl on an ancient embedded device that somehow doesn't have Python 3, while those people put the burden on contributors.