Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extractor/goplay] Fix GoPlay downloads #6654

Merged
merged 20 commits into from Feb 19, 2024
Merged

Conversation

alard
Copy link
Contributor

@alard alard commented Mar 27, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This PR fixes two issues with the GoPlay extractor for recent broadcasts:

  1. Adds support for the Google streaming platform (dai.google.com with server-side ad insertion, SSAI);
  2. Fixes the download of multi-period MPD streams by merging the periods.

Point 1 is fairly straightforward: the metadata JSON provides a new ssai field that points to the manifest.mpd. See also this recent change in the Retrospect Kodi addon.

Point 2 is more difficult. The GoPlay manifests split programmes in multiple periods (e.g., three parts with ads in between). The current MPD implementation ignores the period division and lists all streams in a single list of formats. As a result, there are usually three audio and three video streams with the highest quality. By default, yt-dlp downloads only one audio and one video stream: this is only one period of the programme, and sometimes the audio does not match the video.

This PR fixes this issue at the GoPlay extractor level, by merging the fragments lists of the same format in different periods. The end result is a formats list with only one stream for each format that combines all periods.

It might be better to solve this at the MPD download level instead of in the GoPlay extractor. I assume that this is a general problem for any MPD manifest with multiple periods. But that is a separate issue and might require larger changes in how yt-dlp works.

Fixes #6235.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

@KevinDenys
Copy link

I tested it using: https://www.goplay.be/video/assisen/assisen-s1/assisen-s1-aflevering-6#autoplay

image

python3 -m yt_dlp https://www.goplay.be/video/assisen/assisen-s1/assisen-s1-aflevering-6\#autoplay --username removed --password removed -f MERGED-093786da-a9ba-4c80-ac74-ff043139a312

It only downloaded 22min 44seconds (full episode is 43:44)

@pukkandan
Copy link
Member

pukkandan commented Mar 27, 2023

But that is a separate issue and might require larger changes in how yt-dlp works.

As far as I can tell, you just need to move _merge_mpd_periods into _parse_mpd_formats_and_subtitles. Am I missing something?

@KevinDenys
Copy link

But that is a separate issue and might require larger changes in how yt-dlp works.

As far as I can tell, you just need to move _merge_mpd_periods into _parse_mpd_formats_and_subtitles. Am I missing something?

More info about the 'part' splitting here #6235 (comment)

@alard
Copy link
Contributor Author

alard commented Mar 27, 2023

But that is a separate issue and might require larger changes in how yt-dlp works.

As far as I can tell, you just need to move _merge_mpd_periods into _parse_mpd_formats_and_subtitles. Am I missing something?

@pukkandan Yes, that would work in this case, but this would also change it for all other extractors that use that code. Is merging periods something that should happen everywhere? It might make sense, but I don't know if there are cases where you'd like to have the periods separately. I also don't know how this would work for livestreams.

In short: yes, it would work for GoPlay and other cases, but I don't know enough about MPD manifests or yt-dlp to say if you're missing something. But I'd be happy to move it to _parse_mpd_formats_and_subtitles.

@alard
Copy link
Contributor Author

alard commented Mar 27, 2023

I tested it using: https://www.goplay.be/video/assisen/assisen-s1/assisen-s1-aflevering-6#autoplay

python3 -m yt_dlp https://www.goplay.be/video/assisen/assisen-s1/assisen-s1-aflevering-6\#autoplay --username removed --password removed -f MERGED-093786da-a9ba-4c80-ac74-ff043139a312

It only downloaded 22min 44seconds (full episode is 43:44)

Strange. Can you see how many fragments it downloads? 22:44 is the length of the first period, according to the manifest.

For me it shows [dashsegments] Total fragments: 1292, which is the sum of the three periods, and gives me a 43:44 file. Does it work better if you also download the audio? (e.g., -f wa+wv?)

@KevinDenys
Copy link

[download] 1.9% of ~ 672.59MiB at 1.38MiB/s ETA 08:47 (frag 24/1292)

Looks like its also 1292 fragments

Then I checked the file using VLC

VLC is indeed showing the full 44min when scrolling to the end. But Mac details + Mac internal video player only shows 22min(see SS)

image

@alard
Copy link
Contributor Author

alard commented Mar 27, 2023

I see. Maybe there is something strange when you concatenate these streams. I only have VLC, for which it works, and ffmpeg also doesn't show any issues with the file.

I've now changed the code to skip the init.mp4 initialization file in the follow-up periods. It's always at the start of the fragments of a period, so maybe your player runs into issues if that appears in the middle of the combined file. Does it work better now?

@KevinDenys
Copy link

I see. Maybe there is something strange when you concatenate these streams. I only have VLC, for which it works, and ffmpeg also doesn't show any issues with the file.

I've now changed the code to skip the init.mp4 initialization file in the follow-up periods. It's always at the start of the fragments of a period, so maybe your player runs into issues if that appears in the middle of the combined file. Does it work better now?

[GoPlay] Logging in
[GoPlay] Authenticating username
[GoPlay] Authenticating password
[GoPlay] Extracting URL: https://www.goplay.be/video/assisen/assisen-s1/assisen-s1-aflevering-6
[GoPlay] assisen-s1-aflevering-6: Downloading webpage
[GoPlay] 0622232b-19ac-4efc-9134-db561771ef62: Downloading JSON metadata
[GoPlay] 0622232b-19ac-4efc-9134-db561771ef62: Downloading JSON metadata
[GoPlay] 0622232b-19ac-4efc-9134-db561771ef62: Downloading MPD manifest
ERROR: An extractor error has occurred. (caused by KeyError('url')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
File "D:\yt\yt-dlp\yt_dlp\extractor\common.py", line 694, in extract
ie_result = self._real_extract(url)
File "D:\yt\yt-dlp\yt_dlp\extractor\goplay.py", line 116, in _real_extract
formats = self._merge_mpd_periods(formats)
File "D:\yt\yt-dlp\yt_dlp\extractor\goplay.py", line 152, in _merge_mpd_periods
if len(mf['fragments']) == 0 or 'init.mp4' not in ff['url']:
KeyError: 'url'

@alard
Copy link
Contributor Author

alard commented Mar 28, 2023

That might be ad-related. With my Dutch IP address I don't get any ads. With a Belgian IP the manifest did include some ads, which are served differently and do not have the url in the fragments. I also saw a much longer list of formats.

The new version should skip any fragment that doesn't have a url key, so you should only get the main DASH stream.

@KevinDenys
Copy link

It fixed the download issue, but VLC is now showing 00:00 as video length. While scrolling trough VLC it suddenly shows 41:09 but there are a ton of artifcats

image

@KevinDenys
Copy link

I ran the file trough avidemux and put the correct end time and saved it. Now opening the file in VLC is showing 41:09 and no more artificats. Is there a way to give it the correct time length in yt-dlp? I guess its some sort of metadata?

@KevinDenys
Copy link

@pukkandan pointed me to the YoutubeDL.py file
Changed
ffmpeg_fixup(info_dict.get('is_live') and downloader == 'DashSegmentsFD', 'Possible duplicate MOOV atoms', FFmpegFixupDuplicateMoovPP)
To
ffmpeg_fixup(downloader == 'dashsegments', 'Possible duplicate MOOV atoms', FFmpegFixupDuplicateMoovPP)

And after downloadin the file it did the fixup and now the correct timestamp is shown also the artifacts are gone

@bashonly
Copy link
Member

bashonly commented Mar 28, 2023

So instead of omitting the duplicate init fragments, we can just run the FixupDuplicateMoov postprocessor. The above patch was for testing purposes though, and is not viable since it would impact all DASH downloads (e.g. regular youtube videos, etc). Maybe there could be a new format field added for YoutubeDL.process_info to check? multi_period_dash or something?

I think the problem may have been caused by the dash init fragments containing duration metadata and other information that interferes with playback. Ideally the DASH periods would be downloaded as separate videos and then properly concatenated with ffmpeg. But if joining all the periods' fragments into one video and then fixing up with ffmpeg afterward works, then that'll do.

@alard
Copy link
Contributor Author

alard commented Mar 28, 2023

OK, new approach, based on @bashonly's comment: using _extract_mpd_formats_and_subtitles with a new multi_period option to return the periods as entries in a multi_video item. For the GoPlay stream this downloads each period separately and then merges them afterwards using the existing post-processing.

@alard alard force-pushed the f-goplay branch 4 times, most recently from 9a94da0 to fbd8374 Compare March 28, 2023 22:26
@alard
Copy link
Contributor Author

alard commented Mar 29, 2023

I'm not on Discord so I haven't followed the discussion there. I'm happy to switch to the first solution if that's the preferred option. However, to me the second option feels cleaner (apart from the method names and actual implementation).

Concatenating the fragments list based on the format is messy. It's a hack that works for GoPlay, but it is far from perfect:

  • It assumes that the format is the same for all periods. That probably works in most cases, and it does here, but I'm not sure this is necessarily true for all MPD manifests.
  • It assumes that the main stream always has the best format. If a pre-roll ad has a higher resolution than the main video, the ad will be downloaded because it is "the best format".
  • There's no way to remove ads: if they have the same format as the main video they are automatically included.
  • There are apparently some technical issues with concatenating the video fragments in this way, which requires post-processing with ffmpeg.
  • If you merge the periods in common.py, it will change the behavior for all extractors. Because the period structure of the manifest is removed, it is very difficult to separate the parts later.

So if this is the preferred option, I think it would be best to do the merging of the fragments in the GoPlay extractor so it doesn't affect anything else.

On the contrary, the second option has several advantages:

  • It keeps the period information, so it is a much cleaner representation of the structure of the manifest.
  • It works for any combination of formats and periods.
  • It is easy to select or ignore specific periods (such as ads) in the extractor.
  • Implementing this as a multi_video (a single show with multiple parts, according to the documentation) seems logical and reuses the post-processing code that already exists: there's even an option to download separate parts or have them concatenated.
  • You can do this in a backwards-compatible way that doesn't require changes in existing extractors, but does give you the flexibility for extractors that can use it.

I'll change the method name so we can see how that looks. If that doesn't help, we can always go back to the first option.

@KevinDenys
Copy link

I pulled your version and tested it

To be honest, now it gives back a playlist of 3 parts with each its own 1000k,1500k,2300k,4300k output... and is a hassle to download a certain format.

image

I prefered the old method where it merged the fragments. The only thing that needed to happen was run the 'ffmpeg' fix. (see #6654 (comment)).

Not sure what the correct aproach is here..

@KevinDenys
Copy link

That was quick! Pulled it and tested it.

Merges are working great, but I don't see the ffmpeg fix getting triggered. I checked the downloader and its 'dashsegments' and not 'DashSegmentsFD' so I guess that's why it isn't triggered

image

@alard
Copy link
Contributor Author

alard commented Mar 31, 2023

For the formats, you could use something like --format 'bv[width=1024]+ba', that would work for all periods.

But sure, it might be useful to merge the formats in this case. Perhaps something like this? First parse the MPD manifest with the correct period structure, then make an extractor-specific decision on whether and how to merge the formats.

@KevinDenys
Copy link

dash.py: FD_NAME = 'dashsegments' so I guess the check should be

(info_dict.get('is_live') or info_dict.get('is_dash_periods')) and downloader == 'dashsegments'

@alard
Copy link
Contributor Author

alard commented Mar 31, 2023

Yes, I suppose so. But then it also didn't run for is_live streams.

@KevinDenys
Copy link

KevinDenys commented Mar 31, 2023

Yes, I suppose so. But then it also didn't run for is_live streams.

Yea I am wondering, I can't find any reference to DashSegmentsFD (except the class name) so I guess that issue was already in the code

@KevinDenys
Copy link

Pull the latest code and did a quick test

image

fixupduplicatemoov is getting triggered now 🥇

@Millio345
Copy link

Thank you for all the effort; Tried with a few different videos and seems to be working well

@alard alard marked this pull request as ready for review April 4, 2023 06:56
@pukkandan pukkandan mentioned this pull request Jul 20, 2023
10 tasks
@bashonly bashonly mentioned this pull request Aug 6, 2023
11 tasks
@alard
Copy link
Contributor Author

alard commented Aug 30, 2023

@pukkandan If there's anything I can do to make this easier to review, please let me know!

@bashonly bashonly requested review from bashonly and removed request for KevinDenys October 17, 2023 20:39
@Tuurkevg

This comment was marked as duplicate.

@Heidyiam

This comment was marked as duplicate.

@pukkandan pukkandan removed their request for review December 30, 2023 07:27
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/goplay.py Outdated Show resolved Hide resolved
yt_dlp/extractor/goplay.py Outdated Show resolved Hide resolved
yt_dlp/extractor/goplay.py Show resolved Hide resolved
yt_dlp/extractor/goplay.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
@alard
Copy link
Contributor Author

alard commented Feb 4, 2024

@pukkandan Thanks for the review. I've made the changes you suggested.

An open issue is the support for subtitles in multi-period streams. There was a warning for this (triggered on streams with more than one period and at least one subtitle), which I have now removed. But maybe it should be put back.

The reason this warning was here is that the combination of multi-period streams with subtitles might not yet work. For the audio/video streams, the fragments lists are concatenated to download one big stream. For subtitles, there is currently no such thing: they will show up as separate files.

I have not yet found a streaming URL with multi-period subtitles where I can test this, so I have no idea what will happen. Some MPD manifests seem to list the same subtitle file in each period and rely on the time codes to work over multiple periods. In the current implementation, I think the single subtitle file might work with the concatenated video. But I am not entirely sure, hence the warning.

If the subtitles are split over multiple files, then I think you'd have to download all of them, and I suspect you may have to correct the timestamps as well. This is speculation, I have no examples. At least the subtitles will now all show up in the list.

Maybe it would be good to keep the warning that multi-period streams with subtitles are untested, so people can submit bug reports and examples to find these edge cases in the wild?

@alard alard requested a review from pukkandan February 4, 2024 02:42
Copy link
Member

@pukkandan pukkandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls avoid force-pushing

yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
yt_dlp/extractor/common.py Outdated Show resolved Hide resolved
@alard
Copy link
Contributor Author

alard commented Feb 8, 2024

A temporary failure in one of the tests, but otherwise it should be fine.

@alard
Copy link
Contributor Author

alard commented Feb 19, 2024

@pukkandan Is there anything else that needs to be done before this can be merged?

pukkandan pushed a commit that referenced this pull request Feb 19, 2024
@pukkandan pukkandan merged commit 7e90e34 into yt-dlp:master Feb 19, 2024
6 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
site-bug Issue with a specific website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

goplay extractor broken
8 participants