[BBC Sounds] Tracklist Extraction #7788

Merged
merged 4 commits into from Sep 16, 2023

Conversation

garret1317
Collaborator

@garret1317 garret1317 commented Aug 7, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This PR adds tracklist extraction for BBC Sounds.
In doing so, it moves the BBC Sounds extraction from BBCCoUkIE to BBCIE, changing the metadata extracted slightly.

I've greatly enjoyed having tracklists extracted as chapters for another radio extractor I've written: it lets me see what's playing without having to use Shazam etc., and I can skip past songs I don't like. It also works quite well for skipping news breaks. The BBC Sounds website and app have tracklists available, with full start and end times, but these aren't currently being extracted.

Looking at the site's HTML, I saw a big JSON blob in window.__PRELOADED_STATE__ with all the programme's metadata, including the tracklist.
Ctrl-F'ing for __PRELOADED_STATE__ in the extractor, I saw that BBCIE had a section that handled this, but it didn't seem to be using it for Sounds. Instead, it was handled by something in BBCCoUkIE.
I didn't see the metadata that I needed available in BBCCoUkIE's network requests, so I removed the sounds/play/ from BBCCoUkIE's _VALID_URL regex.
Now BBCIE handles it with the existing preloaded state code + new tracklist code.
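
The approach described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: the sample HTML and the key names inside the state (`tracklist`, `tracks`, `titles`, `offset`) are assumptions, as are the helper names `extract_preloaded_state` and `tracklist_to_chapters`. The output shape (a list of dicts with `title`, `start_time` and `end_time`) follows the chapter format yt-dlp uses.

```python
import json
import re

# Hypothetical page snippet. The real state blob is much larger, and the
# key names used here are assumed for illustration only.
SAMPLE_HTML = '''
<script>window.__PRELOADED_STATE__ = {"tracklist": {"tracks": [
  {"titles": {"primary": "Artist A", "secondary": "Song One"},
   "offset": {"start": 0, "end": 215}},
  {"titles": {"primary": "Artist B", "secondary": "Song Two"},
   "offset": {"start": 215, "end": 430}}
]}};</script>
'''


def extract_preloaded_state(html):
    """Return the JSON object assigned to window.__PRELOADED_STATE__, or None."""
    match = re.search(r'window\.__PRELOADED_STATE__\s*=\s*', html)
    if not match:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (here, ";</script>").
    state, _ = json.JSONDecoder().raw_decode(html, match.end())
    return state


def tracklist_to_chapters(state):
    """Map the (assumed) tracklist structure to a list of chapter dicts."""
    tracks = ((state or {}).get('tracklist') or {}).get('tracks') or []
    chapters = []
    for track in tracks:
        titles = track.get('titles') or {}
        offset = track.get('offset') or {}
        if offset.get('start') is None:
            continue  # a chapter without a start time is not usable
        chapters.append({
            # Join whichever title parts are present
            'title': ' - '.join(filter(None, (titles.get('primary'), titles.get('secondary')))),
            'start_time': offset.get('start'),
            'end_time': offset.get('end'),
        })
    return chapters


chapters = tracklist_to_chapters(extract_preloaded_state(SAMPLE_HTML))
for chapter in chapters:
    print(chapter)
```

Both helpers return an empty result rather than raising when the state or tracklist is missing, which matters here because not every programme has a tracklist.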

I don't think this breaks any other pages, but I can't be sure as a lot of the URLs in the tests are long gone.
The tests that weren't for dead links didn't seem to be broken by this, at least. (mostly title changes)

The metadata that's extracted has changed. The main changes I noticed were:

  • the title format changes from "Programme, Episode" to "Programme - Episode"
  • the description is now the long, full-length version
  • the station name is extracted as the uploader
  • there are a few more formats

So this will doubtless break someone's workflow.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

馃 Generated by Copilot at f007598

Summary

🎧🔧📜

Improved BBC Sounds extraction and cleaned up BBC code. Added support for BBCIE to handle BBC Sounds URLs and audio chapters, and removed unused code and tests from BBCCoUkIE.

To handle the BBC Sounds links
The BBCIE extractor thinks
It parses the tracklist
For audio chapters to list
And imports some utils for kinks

Walkthrough

  • Import join_nonempty and traverse_obj functions from utils.py for string concatenation and nested value access
  • Remove sounds/play/ pattern from BBCCoUkIE extractor and its test case, since it is now handled by BBCIE extractor
  • Add a new test case for sounds/play/ URL to BBCIE extractor, demonstrating audio extraction and metadata
  • Parse tracklist information from preload_state variable in BBCIE extractor, and assign it as chapters to the audio
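
To illustrate why removing the sounds/play/ pattern hands those URLs to the other extractor, here is a simplified sketch. The three regexes, the example URLs and the `pick_extractor` helper are hypothetical stand-ins: the real `_VALID_URL` values in yt-dlp are far more detailed, and its actual extractor selection logic is not reproduced here.

```python
import re

# Hypothetical, heavily simplified patterns (assumptions for illustration).
SPECIFIC_BEFORE = r'https?://(?:www\.)?bbc\.co\.uk/(?:iplayer/episode|sounds/play)/(?P<id>[\da-z]{8})'
SPECIFIC_AFTER = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episode/(?P<id>[\da-z]{8})'
GENERIC = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'


def pick_extractor(url, specific_pattern):
    """Return which stand-in extractor would claim the URL, or None."""
    # The more specific pattern is tried first; the generic one is the fallback.
    if re.match(specific_pattern, url):
        return 'BBCCoUkIE'
    if re.match(GENERIC, url):
        return 'BBCIE'
    return None


# Made-up programme IDs, not real episodes.
sounds_url = 'https://www.bbc.co.uk/sounds/play/m0000000'
iplayer_url = 'https://www.bbc.co.uk/iplayer/episode/b0000000'

print(pick_extractor(sounds_url, SPECIFIC_BEFORE))   # claimed by the specific pattern
print(pick_extractor(sounds_url, SPECIFIC_AFTER))    # falls through to the generic one
print(pick_extractor(iplayer_url, SPECIFIC_AFTER))   # other URL types are unaffected
```

In this sketch only the sounds/play/ URL changes hands once the alternative is dropped from the specific pattern, which mirrors the intent of the change described in the walkthrough.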

@bashonly bashonly added the site-enhancement (Feature request for some website) label Aug 22, 2023
@bashonly bashonly added the pending-fixes (PR has had changes requested) label Sep 14, 2023
garret1317 and others added 2 commits September 15, 2023 01:42
Co-Authored-By: bashonly <88596187+bashonly@users.noreply.github.com>
old one expired
it's annoying because the ones that last the longest (World Service mostly)
don't have tracklists
@bashonly bashonly added the pending-review (PR needs a review) label and removed the pending-fixes (PR has had changes requested) and pending-review labels Sep 15, 2023
@bashonly bashonly merged commit eda0e41 into yt-dlp:master Sep 16, 2023
13 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024