[extractor/sbs] Overhaul extractor for new APIs #6839

Merged: bashonly merged 9 commits into yt-dlp:master on Apr 18, 2023

vidiot720 (Contributor):

Description of your pull request and other information

From dirkf's PR: Australian provider SBS has changed its hosting arrangements and APIs, breaking the existing extractor.

The PR is intended to deal with the new APIs.

Some specialisations of extractor methods are included:

  • _download_webpage_handle() detects geo-restriction
  • _extract_m3u8_formats() defaults to the native downloader.
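The second specialisation amounts to an override that changes only a keyword default. A minimal sketch, assuming a hypothetical stand-in base class (`BaseExtractorStub`, not yt-dlp's `InfoExtractor`) whose `_extract_m3u8_formats()` defaults `entry_protocol` to `'m3u8'` (the ffmpeg-based HLS downloader), while the subclass defaults to `'m3u8_native'`:

```python
class BaseExtractorStub:
    """Hypothetical stand-in for an extractor base class; NOT yt-dlp's InfoExtractor."""

    def _extract_m3u8_formats(self, m3u8_url, video_id, entry_protocol='m3u8', **kwargs):
        # Pretend to build one format entry; a real extractor parses the playlist.
        return [{'url': m3u8_url, 'format_id': f'{video_id}-hls', 'protocol': entry_protocol}]


class NativeHLSExtractor(BaseExtractorStub):
    """Hypothetical subclass that prefers the native HLS downloader by default."""

    def _extract_m3u8_formats(self, m3u8_url, video_id, entry_protocol='m3u8_native', **kwargs):
        # Only the default changes; an explicit entry_protocol from the caller still wins.
        return super()._extract_m3u8_formats(
            m3u8_url, video_id, entry_protocol=entry_protocol, **kwargs)
```

With this shape, every call site inside the subclass gets the native downloader without repeating the argument, and a caller that needs the other protocol can still pass it explicitly.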

That PR was developed for upstream (youtube-dl), so this version has been adjusted to drop deprecated calls and other legacy-compatibility code. It also fixes extracting DFXP subtitles and converting them to other formats: SBS delivers UTF-8-encoded files that wrongly declare encoding='UTF-16'. The workaround is in utils.py. Note that it is not triggered when the downloaded file really is UTF-16 encoded.
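The reason the workaround cannot misfire on genuine UTF-16 files is that a mislabelled file carries its XML declaration as plain ASCII bytes, whereas in a real UTF-16 file every character of the declaration is two bytes wide, so a byte-level match on the ASCII text never hits it. A minimal sketch of that idea, assuming the wrong label appears as `encoding='UTF-16'` or `encoding="UTF-16"`; `fix_mislabelled_utf16` is a hypothetical name, not the function in utils.py:

```python
import xml.etree.ElementTree as ET

# ASCII byte patterns of the wrong label, paired with their corrections.
# In a file that really is UTF-16, each character of the XML declaration is
# two bytes wide, so these plain-ASCII sequences do not occur and the data
# is returned unchanged.
_MISLABELS = (
    (b"encoding='UTF-16'", b"encoding='UTF-8'"),
    (b'encoding="UTF-16"', b'encoding="UTF-8"'),
)


def fix_mislabelled_utf16(data: bytes) -> bytes:
    """Relabel UTF-8 XML bytes that wrongly declare encoding='UTF-16'."""
    for wrong, right in _MISLABELS:
        data = data.replace(wrong, right)
    return data


if __name__ == '__main__':
    doc = "<?xml version='1.0' encoding='UTF-16'?><tt>café</tt>"
    mislabelled = doc.encode('utf-8')  # UTF-8 bytes carrying a UTF-16 label
    print(ET.fromstring(fix_mislabelled_utf16(mislabelled)).text)  # café
    genuine = doc.encode('utf-16')  # a file that really is UTF-16
    print(fix_mislabelled_utf16(genuine) == genuine)  # True
```

A plain byte replacement is used here rather than decoding first, because decoding is exactly the step the wrong label breaks.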

Fixes #6543. Adapts dirkf's PR ytdl-org/youtube-dl#31880, with these adjustments:

  • Use media['name'] for the title;
  • Support URLs including 'tv-program', and add a test URL;
  • Remove the deprecated sort_formats() call;
  • Capture and return subtitles, with a workaround for the UTF-16 encoding issue when converting DFXP subtitles to SRT.
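Supporting 'tv-program' URLs comes down to widening the URL pattern. A simplified sketch, assuming only the `movie` and `tv-program` URL shapes; `SBS_URL_RE` and `match_id()` are hypothetical names, and the extractor's real `_VALID_URL` accepts more forms than this. The movie URL is the one from a later comment in this thread; the tv-program URL is a made-up example:

```python
import re

# Hypothetical, simplified pattern for illustration only.
SBS_URL_RE = re.compile(r'''(?x)
    https?://(?:www\.)?sbs\.com\.au/ondemand/
    (?:movie|tv-program)/[^/]+/      # content type, then a slug
    (?P<id>[0-9]+)                   # numeric media ID
''')


def match_id(url):
    """Return the numeric media ID, or None if the URL does not match."""
    m = SBS_URL_RE.match(url)
    return m.group('id') if m else None


print(match_id('https://www.sbs.com.au/ondemand/movie/the-lost-city-of-melbourne/2264088643618'))
print(match_id('https://www.sbs.com.au/ondemand/tv-program/example-show/1234567890'))
```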

See also ytdl-org/youtube-dl#31841.

There are a few TODOs, such as handling metadata for episode titles more nicely; also, downloaded subtitles that are not converted will still declare the wrong encoding. However, these can be worked around and don't cause exceptions during yt-dlp processing, so I have raised this PR to restore core extractor functionality ASAP.

Boilerplate: bug fix, derived code, own tweaks.


In order to be accepted and merged into yt-dlp, each piece of code must be in the public domain or released under the Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under the Unlicense
  • I am not the original author of this code but it is in the public domain or released under the Unlicense (see dirkf's release in ytdl-org/youtube-dl#31880)

bashonly added the site-bug (Issue with a specific website) and pending-fixes (PR has had changes requested) labels on Apr 17, 2023
…rought over from upstream PR:

* Inline single-use helper functions and remove unneeded temporary objects
* Leverage more powerful traverse_obj() features and eliminate extra helper functions
* Streamline and improve geo-blocking handling
* Correctly set 'episode' for named episodes, instead of 'Episode 1', where available.
* Re-lint and clean up unneeded imports.
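The metadata changes here (named episodes instead of a generic label, and the simplified season handling in the next commit) follow a "try several paths, take the first hit" pattern. A minimal plain-Python sketch, assuming a hypothetical `media` dict: `name` is the title field this PR uses, but `episodeTitle`, `episode.name`, `seasonNumber` and `season.number` are invented keys, and `get_path()` is a much-simplified stand-in for the idea behind yt-dlp's `traverse_obj()`, not its implementation:

```python
def get_path(obj, *paths):
    """Return the value at the first key path that resolves to a non-None value.

    Hypothetical helper: only walks nested dicts, unlike yt-dlp's traverse_obj().
    """
    for path in paths:
        cur = obj
        for key in path:
            if not isinstance(cur, dict):
                cur = None
                break
            cur = cur.get(key)
        if cur is not None:
            return cur
    return None


def build_metadata(media):
    # 'name' is the title field this PR switches to; every other key is hypothetical.
    return {
        'title': media.get('name'),
        # Use a real episode name when one exists; otherwise leave it unset
        # instead of falling back to a generic label such as 'Episode 1'.
        'episode': get_path(media, ('episodeTitle',), ('episode', 'name')),
        'season_number': get_path(media, ('seasonNumber',), ('season', 'number')),
    }


print(build_metadata({'name': 'Example Show S1 Ep1', 'episode': {'name': 'Pilot'}, 'seasonNumber': 1}))
```

Listing alternative paths in one call keeps the fallback order visible in a single place instead of spreading it across nested `if` blocks.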
vidiot720 requested a review from bashonly on April 17, 2023 at 12:36
* More inlining of single items, and extra blank line removed
* Simplify episode setting code
* Simplify season number setting; removed extraneous partOfSeries path
pukkandan removed the pending-fixes (PR has had changes requested) label on Apr 17, 2023
vidiot720 and others added 2 commits on April 17, 2023 at 23:14
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Updates from bashonly:
* sort class above function
* for livestreams, `livestream` is `True` in catalogue

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
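The livestream note amounts to reading a boolean flag from the catalogue object. A minimal sketch, assuming the catalogue has already been parsed into a dict; treating anything other than a literal `True` as on-demand content is a choice made for this sketch, not necessarily what the extractor does:

```python
def is_live(catalogue: dict) -> bool:
    # Only an explicit boolean True marks a livestream; a missing key or any
    # other value (None, False, the string 'true') is treated as on-demand.
    return catalogue.get('livestream') is True
```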
vidiot720 mentioned this pull request on Apr 17, 2023
bashonly (Member) left a comment:

I've tested my fixes to the geo-bypass/geo-restricted code and everything's working now.

bashonly merged commit 6a765f1 into yt-dlp:master on Apr 18, 2023 (11 checks passed)
vidiot720 deleted the sbs-2023-changes branch on April 19, 2023 at 00:23
bossanova808:

Thank you very much for this; things are working very well again.

VampiricAlien:

@vidiot720 or @bashonly: I am trying to download audio-described (AD) versions, but since this update I have been unable to find or download anything with more than one audio track. Could this update be the cause?

I tried `-f all[vcodec=none]` with `--audio-multistream` and `--audio-multistreams`, and also `--audio-multistreams -f bestvideo+mergeall[vcodec=none]`; all of them give "Requested format is not available. Use --list-formats for a list of available formats".

The program page shows the AD symbol (another show that is supposed to support it is Home Is Where The Art Is Series 1, Ep. 11), but --list-formats doesn't show any other audio tracks.

    [SBS] Extracting URL: https://www.sbs.com.au/ondemand/movie/the-lost-city-of-melbourne/2264088643618
    [info] Available formats for 2264088643618:
    ID       EXT RESOLUTION FPS │   FILESIZE   TBR PROTO │ VCODEC      ACODEC
    ──────────────────────────────────────────────────────────────────────────
    hls-439  mp4 398x224     25 │ ~272.42MiB  439k m3u8  │ avc1.4D401E mp4a.40.2
    hls-870  mp4 640x360     25 │ ~540.06MiB  871k m3u8  │ avc1.4D401E mp4a.40.2
    hls-1419 mp4 1024x576    25 │ ~880.17MiB 1419k m3u8  │ avc1.4D4029 mp4a.40.2
    hls-1981 mp4 1280x720    25 │ ~  1.20GiB 1981k m3u8  │ avc1.4D4029 mp4a.40.2

vidiot720 (Contributor, Author):

> The program page shows the AD symbol (another show that is supposed to support it is Home Is Where The Art Is Series 1, Ep. 11), but --list-formats doesn't show any other audio tracks.

The AD symbol on the On Demand program pages is a bit misleading. Per https://www.sbs.com.au/aboutus/audio-description-services/#faqs-about-audio-description:

> Is audio description available on SBS On Demand and SBS’s online services?
>
> Audio description is not currently available on SBS On Demand or SBS’s online and social media services.
>
> Currently, audio description is only available via SBS’s broadcast television. If you are watching SBS programs on SBS On Demand, including streaming SBS channels live, audio description will not be available. We suggest that you switch to broadcast television, if possible, in order to hear audio description.

I'd keep an eye on the FAQ to see whether AD becomes available on On Demand. It's probably non-trivial for SBS to support multiple audio streams, since they would need to add an interface for stream selection. There may be some hope given the use of "Currently, ...".

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Labels: site-bug (Issue with a specific website)
Issues this pull request may close: All downloads from SBS broken
Participants: 5