Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBCPlayer #7484

Merged
merged 37 commits into from Aug 12, 2023
Merged

CBCPlayer #7484

merged 37 commits into from Aug 12, 2023

Conversation

trainman261
Copy link
Contributor

@trainman261 trainman261 commented Jul 1, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Main Goal

Add subtitle and additional format support.

Details of the changes

Currently, the ThePlatformIE just tries to extract the video directly, as well as the JSON information. This sometimes misses certain formats and subtitles: since often the direct video is the maximum quality, it doesn't currently allow downloading lower quality formats in some cases. The information about subtitles also sometimes doesn't exist here.
As a solution to this, the m3u8 playlist file is also analyzed. This can help find additional formats as well as subtitles.

Related issues that came out of this improvement

For one, there's the issue of changing ThePlatformIE, as it's an IE that other IEs rely on. There is no overly difficult way to systematically ensure that these changes don't break other IE's. I've done my best to make sure that the changes don't affect the functionality, but it's hard to be sure. See #7483 .
Then there the issue of checking m3u8 files. In this PR, it is handled by the ThePlatformIE, but it seems to me like this should be in the parent InfoExtractor class. Mind you, here, I'm doing it by checking the file extension, but for the parent class, it would make sense to check the first line of the m3u8 file. See #7482 .

Fixes #

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 9f5d60f

Summary

🧪🎬🚫

Improved extraction of CBC videos and added subtitles and chapters support. Added m3u8 and subtitles support for ThePlatform videos. Updated some test cases in cbc.py and theplatform.py.

Sing, O Muse, of the skillful coder who refined
The CBCPlayerIE extractor, and with cunning mind
Added subtitles and chapters, as the heroes of old
Carved their glorious deeds on tablets of gold.

Walkthrough

  • Add thumbnail, chapters, and duration fields to a test case and override format sorting for CBCPlayerIE (link, link)
  • Add a new test case for subtitles and chapters extraction for CBCPlayerIE (link)
  • Update skip messages for outdated or unavailable videos in CBCPlayerIE and ThePlatformIE tests (link, link, link, link)
  • Download and parse m3u8 file for additional formats and subtitles in ThePlatformIE._real_extract (link)

…at finding the direct mp4 file, but as far as I can tell, there is no direct VTT file accessible on theplatform. As an alternative, this uses the subtitles from the m3u8 file while still keeping the direct mp4 file instead of stitching together the bits declared in the m3u8 file for the video.
… but also further formats available via m3u8. Also added a few comments to clarify things.
…umentation and don't work anymore. Someone who knows their way around these sites that use theplatform should fix this, so that proper regression testing can be done for theplatform.
…ound it works much better, and as far as I can tell there are no regressions anymore.
yt_dlp/extractor/theplatform.py Outdated Show resolved Hide resolved
yt_dlp/extractor/theplatform.py Outdated Show resolved Hide resolved
yt_dlp/extractor/theplatform.py Outdated Show resolved Hide resolved
@bashonly bashonly added site-enhancement Feature request for some website pending-fixes PR has had changes requested labels Jul 1, 2023
@trainman261
Copy link
Contributor Author

Updated some things.
Current status as to requested changes:

  1. As mentioned, I fully agree that truncating the query string is, as @bashonly mentioned, blunt and inexact. I haven't found a better method though. I remain open to suggestions.
  2. It now sends a HEADRequest and checks to make sure it actually serves an m3u8 file. If it doesn't, it doesn't download and carries on. As a side effect, in that case, it also doesn't show the "downloading m3u8 information" message, which makes more sense as well.
  3. It now properly checks the file extension. It also now uses single quotes.

I've also done some regression tests - at least as far as I could without messing with other IEs. As tested, the following tests worked previously:

  • AENetworks
  • CBCGem
  • CBCGemLive
  • NBCSportsStream
  • NBCStations
  • NBCSportsVPlayer (only working with CPython 3.8 as well as pypy 3.7, not working with CPython 3.7)

They all still work with the changes in this PR. Mind you, many of these tests are skipped or not very in-depth. Also, this is 6 tests out of a total of 36, as far as I can tell, so not great coverage. I've noticed a number of the failed tests are just because some fields are missing in the test data. So... should I fix these other tests before it gets merged?

@pukkandan
Copy link
Member

should I fix these other tests before it gets merged?

It'd be great to fix them, but you don't have to.

@trainman261
Copy link
Contributor Author

OK, I'll try to see what I can do.

@trainman261
Copy link
Contributor Author

Should I maybe make a seperate PR for fixing up the tests on other sites? They are really unrelated to this PR other than that I'm trying to get it fixed to improve regression testing for this PR.

@trainman261
Copy link
Contributor Author

Stupid question, I'm trying to get the generic extractor tests up and running. It gives me the error AssertionError: {'cookies'} is not false : Missing keys in test definition: cookies . Problem is, the cookie is different every time. How would one go about resolving this?

@bashonly
Copy link
Member

It's a core bug. Apply this patch to the run tests, but don't commit it

diff --git a/yt_dlp/YoutubeDL.py b/yt_dlp/YoutubeDL.py
index 1a2f42fe9..f4a80ea11 100644
--- a/yt_dlp/YoutubeDL.py
+++ b/yt_dlp/YoutubeDL.py
@@ -572,7 +572,7 @@ class YoutubeDL:
         'width', 'height', 'aspect_ratio', 'resolution', 'dynamic_range', 'tbr', 'abr', 'acodec', 'asr', 'audio_channels',
         'vbr', 'fps', 'vcodec', 'container', 'filesize', 'filesize_approx', 'rows', 'columns',
         'player_url', 'protocol', 'fragment_base_url', 'fragments', 'is_from_start',
-        'preference', 'language', 'language_preference', 'quality', 'source_preference',
+        'preference', 'language', 'language_preference', 'quality', 'source_preference', 'cookies',
         'http_headers', 'stretched_ratio', 'no_resume', 'has_drm', 'extra_param_to_segment_url', 'hls_aes', 'downloader_options',
         'page_url', 'app', 'play_path', 'tc_url', 'flash_version', 'rtmp_live', 'rtmp_conn', 'rtmp_protocol', 'rtmp_real_time'
     }

@pukkandan
Copy link
Member

Should I maybe make a seperate PR for fixing up the tests on other sites? They are really unrelated to this PR other than that I'm trying to get it fixed to improve regression testing for this PR.

Let it be in this. I can split it later if needed.

@trainman261
Copy link
Contributor Author

trainman261 commented Jul 20, 2023

Aha OK, thanks!
And OK, will put it into this PR. To make things easier on my end, I've pushed the individual fixes for IE tests to different branches. I'll then merge everything together into this PR.

@pukkandan
Copy link
Member

pukkandan commented Jul 20, 2023

To make things easier on my end, I've pushed the individual fixes for IE tests to different branches.

I replied under the assumption it is easier for you to have everything in one PR. But if the other way is easier for you, I don't mind that either - you can open another PR with all the test fixes if you want to.

That aside, is there a reason you haven't addressed the requested changes for this PR yet? A Ive aid before, you don't need to necessarily fix the other tests to get this merged.

@trainman261
Copy link
Contributor Author

I thought I caught all the points in https://github.com/yt-dlp/yt-dlp/pull/7484#issuecomment-1634724162? Or did I miss something?
Currently working on fixing up the other tests (quite a few of them, and I didn't have too much time over the last year), so I can do somewhat more comprehensive regression testing with my changes. Once I've done that and I've ensured there are no regressions, I'll update this PR and hopefully it will be merge ready.

…tion, since a 404 otherwise would cause a crash. If no errors ensue checking the extension, it proceeds processing the data."

This reverts commit 36253f2.
@bashonly bashonly removed the pending-fixes PR has had changes requested label Jul 29, 2023
trainman261 and others added 2 commits July 29, 2023 21:30
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
@bashonly
Copy link
Member

you can ignore the failing linter, doesn't have anything to do with this PR

@trainman261
Copy link
Contributor Author

OK. Doing a final regression test. On CPython 3.8 everything worked, currently testing CPython 3.7 then PyPy and a final flake8 check.

@trainman261
Copy link
Contributor Author

trainman261 commented Jul 29, 2023

OK, all tests are through, including with f82aa89, no regressions anymore, with CPython 3.8, CPython 3.7 and PyPy 3.7. Flake8 also doesn't flag anything anymore again.
As to https://github.com/yt-dlp/yt-dlp/pull/7484#discussion_r1278355607, does my change meet expectations or is some more slimming down required?
Anything else I've missed or is that it for this PR?
Signing off for tonight.

@bashonly
Copy link
Member

As to #7484 (comment), does my change meet expectations or is some more slimming down required?

imo it's a given that HLS bitrate info is unreliable and comment could be shortened to what was suggested

the rest LGTM

@bashonly bashonly added the pending-review PR needs a review label Jul 29, 2023
@trainman261
Copy link
Contributor Author

OK. Let me know if you need anything else.

@trainman261
Copy link
Contributor Author

Here's a stupid question: in commit 9e34e40 you replaced the truncated query with a more exact query. How did you figure out that mbr=true&manifest=m3u works? Was it just trial and error? I had tried a number of things and nothing I tried worked.
Reason why I'm asking (other than pure curiosity): maybe it's possible to extract the subtitles file in one piece instead of via HLS streaming. It seems kind of ridiculous that the video gets downloaded as a whole, but the relatively tiny subtitle file gets downloaded in a couple hundred pieces. I had also tried a number of things, but nothing I tried worked.

@bashonly
Copy link
Member

bashonly commented Aug 3, 2023

How did you figure out that mbr=true&manifest=m3u works? Was it just trial and error?

@trainman261 It's how the NBC sites get direct m3u8 responses instead of SMIL/etc. But it doesn't work for all sites, hence the error/warning with the ScrippsNetworks test and why we needed to make sure it is completely non-fatal (and not requested unnecessarily)

Reason why I'm asking (other than pure curiosity): maybe it's possible to extract the subtitles file in one piece instead of via HLS streaming. It seems kind of ridiculous that the video gets downloaded as a whole, but the relatively tiny subtitle file gets downloaded in a couple hundred pieces. I had also tried a number of things, but nothing I tried worked.

I don't think ThePlatform provides a direct subtitles link. It's either via m3u8, mpd or SMIL

@pukkandan pukkandan assigned coletdjnz and bashonly and unassigned coletdjnz Aug 6, 2023
@dirkf
Copy link
Contributor

dirkf commented Aug 7, 2023

BTW check test/helper.py for test syntax to catch values that may change, either by just requiring a type, or by specifying a regexp, a partial match, or a min/max value. Upstream commits 2ced5a79128f53faad94dc494d05925eb957c414 (lambda conditions) and e67e52a8f8fd7e76253e416da76570af8da200d0 (variable IDs) may also be helpful.

@trainman261
Copy link
Contributor Author

@dirkf Thanks for answering a question I was too embarrassed to ask 😄 Isn't necessary for this PR, but for other things I was trying to figure out how to do just that.

@trainman261
Copy link
Contributor Author

I noticed the pending-review tag. Is there something that still needs to be done?

@bashonly bashonly removed the pending-review PR needs a review label Aug 9, 2023
@bashonly bashonly merged commit 339c339 into yt-dlp:master Aug 12, 2023
13 checks passed
@trainman261 trainman261 deleted the cbcplayer branch August 14, 2023 20:39
@trainman261 trainman261 restored the cbcplayer branch September 5, 2023 19:28
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
site-enhancement Feature request for some website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants