Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downloader for slide streams #343

Closed
wants to merge 13 commits into from
Closed

Conversation

fstirlitz
Copy link
Contributor

@fstirlitz fstirlitz commented May 23, 2021

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Improvement
  • New feature

This implements a downloader for slideshow-like streams and uses it to download slides from Mediasite presentations and also thumbnail streams found in DASH manifests.

The hardest part was probably picking the format for the downloaded slides. I needed a format that is:

  • self-contained (in a single file)
  • easy to generate
  • can losslessly accommodate stand-alone images in disparate formats in a single file
  • can be readily opened by widely available software

Given these constraints, my only option was… MHTML. This is admittedly an unorthodox choice for a media format, and it should be noted that basically the only major software that supports it now are Blink-based browsers. Nevertheless, thanks to the HTML stub included in the file, the slideshow in the file can be readily viewed in those. I would have even included an interactive player, if it weren’t for the fact that Chromium doesn’t run scripts in MHTML files, so it would have been a considerable effort to implement something that wouldn’t even be usable; thus, I limited myself to including some basic CSS to make the slideshow passively viewable in a browser. Should the user want to extract the images, they will have no problem finding software libraries to process MIME multipart messages.

This addresses:

@pukkandan
Copy link
Member

For the time-being, please add a from __future__ import unicode_literals to the new file so that the rest of the tests can run. That line can be removed later once I remove that test

@pukkandan
Copy link
Member

pukkandan commented May 23, 2021

I would have even included an interactive player, if it weren’t for the fact that Chromium doesn’t run scripts in MHTML files, so it would have been a considerable effort to implement something that wouldn’t even be usable;

It can be done with just css animations by setting the animation times using duration. Obviously, I'm not saying you should add it


Given these constraints, my only option was… MHTML

MHTML is fine imo. The other options would be:

  1. mjpeg: But I dont think there is any motion-png. So we'll need to convert everything to jpeg which is not the greatest idea
  2. Normal HTML. It would be possible to base64 encode all the images. The only advantage over MHTML would be better compatibility. (Why can't firefox just support mhtml 😢)
  3. zip: Many image viewers, pdf viewers and video players support zip/tar files that contain images. One such relatively popular format is cbz. EPUB is also this idea, but using html instead of just images. But these wont give us the flexibility to add animation in the future
  4. PPTS: OpenXML Presentation can also be used in theory, though it is quite complex to generate

I'm not very familiar with mhtml specs, so expect a full review to take a while

@fstirlitz
Copy link
Contributor Author

mjpeg: But I dont think there is any motion-png. So we'll need to convert everything to jpeg which is not the greatest idea

There is, it’s called MNG. It even has support for JPEG compression too. But it would require re-framing JPEG files into JNG chunks, would be considerably hard to generate and it’s not very widely supported.

Before settling on MHTML, I considered putting JPEGs into an MPEG-4 container (complicated to generate), an AVI container (generating RIFF chunks requires knowing the length beforehand, so it would require a second pass over the file), multipart/x-mixed-replace (doesn’t allow specifying durations, limited to MJPEG in practice) and using the concat FFmpeg muxer to generate the file (would only support a single format).

HTML with base64-encoded images might have worked just as well or even better (at least we’d have scripting back), but the ⁴⁄₃ overhead…

@Zirro
Copy link
Contributor

Zirro commented May 24, 2021

There is, it’s called MNG. It even has support for JPEG compression too. But it would require re-framing JPEG files into JNG chunks, would be considerably hard to generate and it’s not very widely supported.

There's also APNG - an extension to the PNG format proposed as an alternative to MNG - which has (after way too many years) managed to achieve widespread support in modern browsers and FFmpeg. It sounds like it might be just what you're looking for.

@fstirlitz
Copy link
Contributor Author

APNG would require not only re-framing, but also re-encoding, which I specifically wanted to avoid.

@Zirro
Copy link
Contributor

Zirro commented May 24, 2021

From pukkandan's earlier comment - "But I dont think there is any motion-png. So we'll need to convert everything to jpeg which is not the greatest idea" - I was under the impression that ideal format would be lossless (in addition to the other requirements). Whether it makes the conversion more challenging to perform on the technical level, I would've expected FFmpeg to be able to handle it. However, I recognise that there likely are intricacies to this which are unfamiliar to me.

Copy link
Member

@pukkandan pukkandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns regarding implementing slides as a "video format" (ie. acodec=none, vcodec=jpeg)

  1. This interferes with the format selection. If the user gives -f wv (or even worse -f wv+wa), they are definitely not requesting the slides. Similar issue exists for using -S
  2. Merger should not try to merge slides with video/audio. Similarly, other postprocessors should also recognize this as not being a standard video/audio format and not try to add metadata, recode, remux etc.

I can think of 2 solutions this to this:

  1. Add a new key in the infodict for this, similar to thumbnails/subtitles and corresponding cli --write option. This is the easiest solution since we don't need to modify any of the existing code
  2. Add a new category slides(?) similar to video/audio. This would however require changes to the format selector code, and most (if not all) postprocessors

Whichever solution we decide to implement, it would be helpful if it is easily expandable in the future to allow features like ytdl-org/youtube-dl#14951 without adding yet another type of data


I would also like the user to be able to easily download the images separately without them being embedded into a single mhtml file. This is already somewhat accomplished by using --keep-fragments, but without the correct file extension. This feature is not necessary to be implemented within this PR though.

yt_dlp/downloader/mhtml.py Outdated Show resolved Hide resolved
yt_dlp/downloader/mhtml.py Outdated Show resolved Hide resolved
yt_dlp/downloader/mhtml.py Outdated Show resolved Hide resolved
yt_dlp/downloader/mhtml.py Outdated Show resolved Hide resolved
yt_dlp/downloader/mhtml.py Outdated Show resolved Hide resolved
yt_dlp/downloader/mhtml.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/mediasite.py Outdated Show resolved Hide resolved
yt_dlp/extractor/mediasite.py Outdated Show resolved Hide resolved
@fstirlitz
Copy link
Contributor Author

fstirlitz commented Jun 3, 2021

Doesn’t a negative preference/quality value mitigate problems with format selection? This is actually part of a more general problem, which Mediasite already hits with some regularity: we don’t handle sites which offer multiple independent, but synchronised streams very well. If I understand correctly, #347 likewise ultimately boils down to this.

Add a new key in the infodict for this, similar to thumbnails/subtitles and corresponding cli --write option. This is the easiest solution since we don't need to modify any of the existing code

I actually thought about doing the opposite and integrating subtitles into the formats list. These are after all synchronised with the other media streams and can be considered a first-class alternative medium of presentation. It’s exactly how FFmpeg treats them, too. (Poster thumbnails, on the other hand, don’t fit into this; they are more akin to metadata tags.)

With regard to postprocessor support, this would not have been a problem had I generated an MJPEG stream encapsulated in a Matroska or MP4 container, as I originally planned. Which I think only emphasises that slide sequences should be considered first-class media streams like the others.

@fstirlitz fstirlitz force-pushed the slideshows branch 2 times, most recently from 6639111 to 387feda Compare June 3, 2021 08:46
This downloader is intended to be used for streams that consist of a
timed sequence of stand-alone images, such as slideshows or thumbnail
streams.
@pukkandan
Copy link
Member

Doesn’t a negative preference/quality value mitigate problems with format selection?

When using best, yes. But not when using worst

This is actually part of a more general problem, which Mediasite already hits with some regularity: we don’t handle sites which offer multiple independent, but synchronised streams very well. If I understand correctly, #347 likewise ultimately boils down to this.

#347 is a different issue. There, the extractor simply ignores /sub or /dub part of the URL and extracts everything. I have a pretty good idea how it can be fixed, but don't have any means of testing it

Back to our issue; As I understand, multi_video is meant precisely to handle completely seperate content that is served together

Add a new key in the infodict for this, similar to thumbnails/subtitles and corresponding cli --write option. This is the easiest solution since we don't need to modify any of the existing code

I actually thought about doing the opposite and integrating subtitles into the formats list. These are after all synchronised with the other media streams and can be considered a first-class alternative medium of presentation. It’s exactly how FFmpeg treats them, too. (Poster thumbnails, on the other hand, don’t fit into this; they are more akin to metadata tags.)

If we were to consider subtitles/slides as "formats", we are effectively expanding the scope of yt-dlp to allow downloading of subs and slides without any associated video/audio. Not that this is necessarily a bad thing, but is something to think about when deciding on which method to use.

Also, if we were to treat slides as normal formats, the mediastream implementation is technically incorrect since the slides and the video are infact totally seperate content (unlike the images from mpd) and should therefore be treated as a seperate video (using multi_video)

I am still leaning towards treating slides seperately than formats if only just for the fact that it is easier to implement and since I don't see any practical benefit to making them first-class citizens.


Btw, I've directly pushed some small changes since I thought it would be easier than reviewing. If you dislike any of these changes, let me know

@pukkandan
Copy link
Member

Also, I already told you this in another PR, but please try to avoid force-pushing once the review process has started. It makes it harder for me to keep track of the changes you are making

@fstirlitz
Copy link
Contributor Author

From the documentation:

   _type "multi_video" indicates that there are multiple videos that
   form a single show, for examples multiple acts of an opera or TV episode.
   It must have an entries key like a playlist and contain all the keys
   required for a video at the same time.

This implies that multi_video is for streams meant to be played in sequence (i.e. concatenated). Mediasite, however, serves multiple streams and plays them simultaneously in sync with each other (i.e. multiplexed). As such, it is meaningful (note: this is different from ‘technically possible’) to want to merge them into a single file with the + operator, unlike with the use case mentioned above for multi_video. It’s not much different from offering a single video with multiple alternative audio tracks. In fact, this was the initial use case for ytdl-org/youtube-dl#6454. That site has since switched video hosts, and there is no longer a separate slide or alternative audio stream; but when it did, there was a video-only slides stream and an audio-only translation stream in addition to the main video showing the speaker speaking their mother tongue. Merging them allowed the user to switch video and audio streams in the player on the fly. Had I chosen a different container format, it would have been possible here, too.

@Lesmiscore
Copy link
Contributor

Lesmiscore commented Jun 7, 2021

@pukkandan

Add a new key in the infodict for this, similar to thumbnails/subtitles and corresponding cli --write option. This is the easiest solution since we don't need to modify any of the existing code

I agree to this idea. I think there should be a mode for writing out slides/storyboards, since they are absolutely different.

Edit: probably not absolutely, but different though

@fstirlitz
Copy link
Contributor Author

Another problem with the ‘separate key’ approach is that since storyboards are often embedded in DASH streaming manifests (and I guess they may also appear in other streaming manifests), supporting them uniformly everywhere might require the same kind of API churn that #247 had to involve, with all the _extract_*_formats methods renamed to _extract_*_formats_and_subtitles and warnings when the old API had to drop streams.

@pukkandan
Copy link
Member

Mediasite, however, serves multiple streams and plays them simultaneously in sync with each other (i.e. multiplexed). As such, it is meaningful (note: this is different from ‘technically possible’) to want to merge them into a single file with the + operator, unlike with the use case mentioned above for multi_video

This is a good point. But I am not sure how to improve this situation (without massive changes ofc). If you have any suggestions, we can discuss it (in a new issue preferably)

Another problem with the ‘separate key’ approach is that since storyboards are often embedded in DASH streaming manifests (and I guess they may also appear in other streaming manifests), supporting them uniformly everywhere might require the same kind of API churn that #247 had to involve, with all the extract_formats methods renamed to extract_formats_and_subtitles and warnings when the old API had to drop streams.

This can be worked around by letting the extractor add the storyboards as just another format and then plucking it out into a new key. I had the same idea for subtitles, but I never mentioned it since you had already replaced the functions

But in that case, it may be cleaner to just directly treat it as a format in that case, although it is a lot more work. Assuming we are going by that approach, how would we handle an option to download the images seperately? With the other approach, I was thinking something like --write-slides --slide-format [mhtml|image].

With the "formats method", we would need 2 separate formats and separate selectors (eg: bestslides, bestimages) for this, no? Or, perhaps we should fold this feature into --keep-fragments and just ensure the fragments have correct thumbnail?

pukkandan added a commit that referenced this pull request Jun 12, 2021
Necessary for #343.

* They are identified by `vcodec=acodec='none'`
* These formats show as the worst in `-F`
* Any postprocessor that expects audio/video will be skipped
* `b*` and all related selectors will skip such formats
* This commit also does not add any selector for downloading such formats. They have to be explicitly requested by the `format_id`. Implementation of a selector is left for when #389 is resolved
@pukkandan
Copy link
Member

I have added minimum support for images format in 8326b00. With this, formats with acodec and vcodec = 'none' will be treated as images. The selectors best, worst, best* etc will never select such a format. The user will have to explicitly specify it with the format id. We can make proper selectors for this when we address #389

With that, this should be now ready for merge. I'll wait for your reply however in case you want to make any other changes

@fstirlitz
Copy link
Contributor Author

A priori, acodec and vcodec both being none doesn’t necessarily have to mean an image stream, so that representation is not ideal. And I would still rather think of them as extremely bare-bones video streams. The preprocessor troubles may have been addressed simply by declaring mhtml is not a format supported by FFmpeg, and the selectors could be made ignore negative-preference streams (or maybe some other sentinel value). But if it’s a temporary hack anyway… (famous last words)

@pukkandan
Copy link
Member

A priori, acodec and vcodec both being none doesn’t necessarily have to mean an image stream

How so?

And I would still rather think of them as extremely bare-bones video streams.

How is it a barebones video stream? It is not a video format and no video player / video processing tool will recognize it as such

@fstirlitz
Copy link
Contributor Author

MHTML is not a video format. But the image stream itself is not too hard to squeeze into one, as mentioned before. The DASH storyboards should be easy to mux into an MJPEG stream in an MP4 container using FFmpeg’s concat pseudo-demuxer; perhaps we may add that some day. (Mediasite slideshows too, but at the cost of some lossiness.) It would be strange to consider storyboards both a video and a non-video stream, depending on the format it’s downloaded as. (And FFmpeg considers even still images to be ‘video’ formats.)

Subtitles were mentioned before as another timed media stream that is neither audio or video. FFmpeg also supports abstract timed data streams (TTML subtitles are currently only parsed as those). Though subtitles are already special-cased and I doubt we might ever deal with more abstract data, the inference ‘if it’s neither audio nor video, then it’s an image sequence’ does not seem particularly, ahem, sound.

@pukkandan
Copy link
Member

the inference ‘if it’s neither audio nor video, then it’s an image sequence’ does not seem particularly, ahem, sound.

fair enough, but a more generalized implementation will need to be rewritten again when implementing #389. Hence why I don't want to put any more effort into it

@fstirlitz
Copy link
Contributor Author

Okay then. Like I said, it’s not that bad, especially for a stop-gap measure.

pukkandan pushed a commit that referenced this pull request Jun 13, 2021
This downloader is intended to be used for streams that consist of a
timed sequence of stand-alone images, such as slideshows or thumbnail
streams

This can be used for implementing:

ytdl-org/youtube-dl#4974 (comment)
ytdl-org/youtube-dl#4540 (comment)
ytdl-org/youtube-dl#11185 (comment)

ytdl-org/youtube-dl#9868
ytdl-org/youtube-dl#14951


Authored by: fstirlitz
@pukkandan
Copy link
Member

I have merged this as 2 commits - one for the downloader, and another for mediasite

@pukkandan pukkandan closed this Jun 13, 2021
@fstirlitz
Copy link
Contributor Author

Postponing DASH then?

@pukkandan
Copy link
Member

No, it is in cdb19aa

I only split out mediasite coz it is extractor specific and not really part of the core feature

@pukkandan
Copy link
Member

I should have mentioned the DASH storyboards in the commit message...

I tried adding a git-notes to it, but github doesn't seem to support displaying it anywhere

@fstirlitz fstirlitz deleted the slideshows branch July 11, 2021 14:06
nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
Necessary for yt-dlp#343.

* They are identified by `vcodec=acodec='none'`
* These formats show as the worst in `-F`
* Any postprocessor that expects audio/video will be skipped
* `b*` and all related selectors will skip such formats
* This commit also does not add any selector for downloading such formats. They have to be explicitly requested by the `format_id`. Implementation of a selector is left for when yt-dlp#389 is resolved
nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
This downloader is intended to be used for streams that consist of a
timed sequence of stand-alone images, such as slideshows or thumbnail
streams

This can be used for implementing:

ytdl-org/youtube-dl#4974 (comment)
ytdl-org/youtube-dl#4540 (comment)
ytdl-org/youtube-dl#11185 (comment)

ytdl-org/youtube-dl#9868
ytdl-org/youtube-dl#14951


Authored by: fstirlitz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants