Skip to content

Conversation

@kmark
Copy link

@kmark kmark commented Jun 21, 2017

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

No test cases are included as I am not aware of any publicly
available Panopto recordings that this extractor will work with.

Supports downloading individual recordings or entire folders
recursively. Folders are seperated with a ' -- ' in the playlist
title.

Cookies are likely required to use this extractor specifically their
.ASPXAUTH cookie which can be obtained from your browser after
logging in.

--write-all-thumbnails can be used to download PowerPoint slides if
they are not included as a video stream.

Suggested output format is 'out/%(playlist)s/%(title)s.%(ext)s'

No test cases are included as I am not aware of any publicly
available Panopto recordings that this extractor will work with.

Supports downloading individual recordings or entire folders
recursively. Folders are seperated with a ' -- ' in the playlist
title.

Cookies are likely required to use this extractor specifically their
.ASPXAUTH cookie which can be obtained from your browser after
logging in.

--write-all-thumbnails can be used to download PowerPoint slides if
they are not included as a video stream.

Suggested output format is 'out/%(playlist)s/%(title)s.%(ext)s'
@dstftw
Copy link
Collaborator

dstftw commented Jun 21, 2017

  1. Read

youtube-dl coding conventions

and fix all issues.
2. You have to provide account credentials and concrete example URL for testing.

kmark added 7 commits July 6, 2017 20:11
Testing may be impossible for the Folder extractor, or I'm just
doing it wrong. With the current test we enter a catch-22 where it
claims we need an 'ext' entry to continue testing but upon adding
that it claims it expected None.
@kmark
Copy link
Author

kmark commented Jul 7, 2017

I've fixed all the issues noted by QuantifiedCode. I've also added tests for the Panopto extractor that both work on my machine but fail the Travis test. I have no idea why this is. Please let me know if there are any further places where conventions aren't being followed.

As noted in one of my commits:

Testing may be impossible for the Folder extractor, or I'm just
doing it wrong. With the current test we enter a catch-22 where it
claims we need an 'ext' entry to continue testing but upon adding
that it claims it expected None.

I've still included the faulty PanoptoFolder test for review and reproduction of this issue.

Thanks!

class PanoptoIE(PanoptoBaseIE):
"""Extracts a single Panopto video including all available streams."""

_VALID_URL = r'^https?:\/\/(?P<org>[a-z0-9]+)\.hosted\.panopto.com\/Panopto\/Pages\/Viewer\.aspx\?id=(?P<id>[a-f0-9-]+)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to escape \.

for c in contribs:
display_name = c.get('DisplayName')
if display_name is not None:
s += '{}, '.format(display_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{} does not work in python 2.6.

result['entries'] = streams

# We already know Delivery exists since we need it for stream extraction
contributors = delivery_info['Delivery'].get('Contributors')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read coding conventions and fix all optional fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this is at odds with the coding conventions. Are you referring to the use of ['Delivery']? The Delivery key must exist assuming that code is ever executed because the key is needed to retrieve non-optional fields earlier. It would fail at line 120:

for this_stream in delivery_info['Delivery']['Streams']:

Because we need ['Delivery'] (and more specifically ['Delivery']['Streams']) to even find the non-optional information.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dstftw Is this still an issue?

@kmark
Copy link
Author

kmark commented Jul 8, 2017

The Travis errors appear to be from ffmpeg not being available:

DownloadError: ERROR: m3u8 download detected but ffmpeg or avconv could not be found. Please install one.

Once ffmpeg is installed it fixed the above error but the tests still failed with MD5 mismatches. It looks like the --test option isn't working properly on Linux:

$ python2.7 -m youtube_dl --test https://demo.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=26b3ae9e-4a48-4dcc-96ba-0befba08a0fb
$ wc -c Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
13663 Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
$ md5sum Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
76d6bc8e500b1e47b53541514e3d1ea6  Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4

or macOS:

$ python2.7 -m youtube_dl --test https://demo.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=26b3ae9e-4a48-4dcc-96ba-0befba08a0fb
$ gwc -c Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
13679 Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
$ gmd5sum Panopto\ for\ Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4
06fb292a3510aa5bc5f0c950fe58c9f7  Panopto for Business-26b3ae9e-4a48-4dcc-96ba-0befba08a0fb.mp4

Different lengths means both platforms will generate different MD5 sums during testing. If I truncate both to 10241 bytes I get the same MD5 but neither platform will test correctly against that MD5.

It's worth adding that if I select -f 0 to avoid M3U8/ffmpeg stuff then both platforms properly truncate to 10241 bytes.

@kmark
Copy link
Author

kmark commented Jul 8, 2017

Looks like the issue is with youtube-dl assuming that the -fs option of FFmpeg will limit the file size to 10241 bytes in a deterministic fashion. According to the documentation it doesn't guarantee a final file size and even claims the download will be larger by some undefined degree:

Set the file size limit, expressed in bytes. No further chunk of bytes is written after the limit is exceeded. The size of the output file is slightly more than the requested file size.

So we have two different versions of FFmpeg on two different platforms with FFmpeg's -fs option producing two different final file sizes. When I bring the FFmpeg versions closer together on both platforms (latest or near-latest from master) the file sizes are equal again. Unless I'm misunderstanding something about youtube-dl, I would put this in the bug category.

Edit: Even when I brought the FFmpeg versions closer together (roughly 19 days apart) the MD5s were still different. I brought both up to N-86744-gfe92422 and only then was I able to get identical MD5s.

kmark referenced this pull request in kmark/youtube-dl Jul 8, 2017
This should prevent issues like that in #13449 where the FFmpeg
downloader was not reliably trimming to the correct length.
@manuel-uberti
Copy link

Hello, any update on this? Thank you for working on it.

@kmark
Copy link
Author

kmark commented Aug 22, 2020

Hey Manuel, I haven't touched this or tried to use it since 2017. I imagine that it may need to be updated if Panopto has changed their (semi-private) API. This was waiting on another round of reviews, I believe I had addressed all of the outstanding comments.

As I mentioned earlier there seemed to be a bug in the way youtube-dl was doing its tests against FFmpeg, relying on the -fs option to behave deterministically across versions, which it does not. I opened up a PR to try and address this but did not receive any feedback on the second iteration of the patch. I am not sure if this is still an outstanding issue.

If there are users willing to test this patch and perhaps even work on it I would certainly appreciate it, and I'm sure the maintainers would too.

_TESTS = [
{
'url': 'https://demo.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=26b3ae9e-4a48-4dcc-96ba-0befba08a0fb',
'md5': 'e8e6ef6b0572dd5985f5f8c3e096f717',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'md5': 'e8e6ef6b0572dd5985f5f8c3e096f717',
'md5': '048dab5c2f8ab97f2bd75ab4cf3f463a',

Maybe this video was changed, this now seems to be the correct hash.

'title': this_stream['Tag'],
'formats': [],
}
if 'StreamHttpUrl' in this_stream:
Copy link

@rgreenblatt rgreenblatt Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if 'StreamHttpUrl' in this_stream:
if 'StreamHttpUrl' in this_stream and this_stream['StreamHttpUrl'] is not None:

It looks like this now can be null/none

new_stream['formats'].append({
'url': this_stream['StreamHttpUrl'],
})
if 'StreamUrl' in this_stream:
Copy link

@rgreenblatt rgreenblatt Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if 'StreamUrl' in this_stream:
if 'StreamUrl' in this_stream and this_stream['StreamUrl'] is not None:

Maybe the same change as for StreamHttpUrl should also be done for StreamUrl? I haven't seen any cases where this is None/null, but it seems like a good idea.

@rgreenblatt
Copy link

I tested this PR and I was able to get things working (just test/test_download.py and an example private video) after a few changes (see review).

Comment on lines +131 to +133
m3u8_formats = self._extract_m3u8_formats(this_stream['StreamUrl'], video_id, 'mp4')
self._sort_formats(m3u8_formats)
new_stream['formats'].extend(m3u8_formats)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
m3u8_formats = self._extract_m3u8_formats(this_stream['StreamUrl'], video_id, 'mp4')
self._sort_formats(m3u8_formats)
new_stream['formats'].extend(m3u8_formats)
get_stream_url_matcher = lambda mid : re.compile(r'^https?://.*?/.*?/.*?' + mid + r'\?InvocationID.*')
if '_M3U8_MATCHER' not in self.__dict__:
self._M3U8_MATCHER = get_stream_url_matcher(r'\.hls/master\.m3u8')
if '_MP4_MATCHER' not in self.__dict__:
self._MP4_MATCHER = get_stream_url_matcher(r'\.mp4')
if self._M3U8_MATCHER.match(this_stream['StreamUrl']):
m3u8_formats = self._extract_m3u8_formats(this_stream['StreamUrl'], video_id, 'mp4')
self._sort_formats(m3u8_formats)
new_stream['formats'].extend(m3u8_formats)
elif self._MP4_MATCHER.match(this_stream['StreamUrl']):
new_stream['formats'].append({'url': this_stream['StreamUrl']})
else:
raise ExtractorError('Unexpected StreamUrl format')

In some cases, it looks like StreamUrl is a single mp4 file instead of m3u8 (when there is only a single video I assume?). I don't know how fragile this regex is, but it passes all of my tests.

@jxu
Copy link
Contributor

jxu commented Sep 30, 2020

https://github.com/jstrieb/panopto-download I just found about this that might help. I looked briefly through the PR and saw the links looked like they were hardcoded. They are available through the RSS feed of each folder currently.

@rgreenblatt
Copy link

https://github.com/jstrieb/panopto-download I just found about this that might help. I looked briefly through the PR and saw the links looked like they were hardcoded. They are available through the RSS feed of each folder currently.

Interestingly, the links in the RSS feed don't require any authentication, so it should be possible to convert from the folder URL to the RSS feed url and download all the videos without requiring cookies. Unfortunately, I don't think its possible to get the RSS feed url directly from a viewer video url, so cookies are needed when downloading a single video.

For my personal use cases, using the RSS feed directly is perfect. I set up a daemon which periodically runs youtube-dl on the RSS feed and sends notifications when a new video is downloaded. See this script from my dotfiles. This doesn't require this PR at all.

For the use case of downloading a single video, I think that using the RSS feed would increase complexity.

For the folder use case, I think that using the RSS should be considerably cleaner, but I don't know how that would work for recursive folder downloads. Also, I haven't tested recursive folder downloads, so it is possible that they are broken right now. I probably won't write this change for a few reasons, but I think this would be reasonably easy to do (aside from subfolders perhaps).

@coletdjnz
Copy link
Contributor

Reviving this in yt-dlp: yt-dlp/yt-dlp#2908

pukkandan pushed a commit to yt-dlp/yt-dlp that referenced this pull request Mar 8, 2022
Based on ytdl-org/youtube-dl#13449
Closes #1946
Authored by: coletdjnz, kmark
@lebdron
Copy link
Contributor

lebdron commented Jul 19, 2022

@dirkf Hi! Any chance to get this merged? Thanks!

@jxu
Copy link
Contributor

jxu commented Jul 19, 2022

it's merged in yt-dlp btw

@dirkf
Copy link
Contributor

dirkf commented Jul 19, 2022

We should back-port the yt-dlp extractor unless it does something weird wrt yt-dl.

@lebdron lebdron mentioned this pull request Jul 19, 2022
11 tasks
@coletdjnz
Copy link
Contributor

iirc there were some bugs in core that had to be fixed for panopto to work completely, so those will need to be backported too

@jxu
Copy link
Contributor

jxu commented Jul 31, 2022

at the risk of sounding like a broken record, if you compare youtube-dl commits and bufixes/new features to yt-dlp, youtube-dl as (mostly) controlled by @dirkf will always be behind in a project that has to keep up with constantly changing unofficial APIs

@lebdron
Copy link
Contributor

lebdron commented Jul 31, 2022

@jxu no worries, I am working on a backport #31097 so that there will be at least some implementation

@jxu
Copy link
Contributor

jxu commented Jul 31, 2022

@lebdron I digress (rant): from what I can tell, unnecessary energy is spent backporting, but that is the reality of having a better maintained fork compete with the formerly undisputed project and fracturing the community. Especially when the project insists on supporting python2 and the 0.1% of people who choose run youtube-dl on an ancient embedded device that somehow doesn't have python3 but those people put the burden on contributors.

@dirkf
Copy link
Contributor

dirkf commented Jul 31, 2022

@jxu can feel free not to spend any effort back-porting code to yt-dl, and should probably ignore yt-dl altogether.

Meanwhile thx @lebdron: please include "closes # 13449" in your PR text (but without the space after the #).

@dirkf dirkf closed this Aug 1, 2023
@dirkf dirkf added the defunct PR source branch is not accessible label Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

defunct PR source branch is not accessible pending-fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants