[BBC Sounds] Tracklist Extraction #7788
Merged
IMPORTANT: PRs without the template will be CLOSED
Description of your pull request and other information
This PR adds tracklist extraction for BBC Sounds.
In doing so, it moves the BBC Sounds extraction from `BBCCoUkIE` to `BBCIE`, changing the metadata extracted slightly.
I've greatly enjoyed having tracklists extracted as chapters for another radio extractor I've written - it lets me see what's playing without having to use shazam etc, and I can skip past songs I don't like. It also works quite well for skipping news breaks. The BBC Sounds website and app have tracklists available, with full start and end times - but this isn't being extracted.
Looking at the site's HTML, I saw a big JSON blob in `window.__PRELOADED_STATE__` with all the programme's metadata, including the tracklist.
Ctrl-F'ing for `__PRELOADED_STATE__` in the extractor, I saw that `BBCIE` had a section that handled this, but it didn't seem to be using it for Sounds. Instead, it was handled by something in `BBCCoUkIE`.
I didn't see the metadata that I needed available in `BBCCoUkIE`'s network requests, so I removed the `sounds/play/` from `BBCCoUkIE`'s `_VALID_URL` regex.
Now `BBCIE` handles it with the existing preloaded state code + new tracklist code.
I don't think this breaks any other pages, but I can't be sure as a lot of the URLs in the tests are long gone.
The tests that weren't for dead links didn't seem to be broken by this, at least. (mostly title changes)
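As a rough illustration of the preloaded-state approach described above, here is a minimal standalone sketch of pulling a `window.__PRELOADED_STATE__` JSON blob out of a page's HTML. It is not the extractor's actual code: `extract_preloaded_state` and `SAMPLE_HTML` are hypothetical names, and the JSON layout in the sample is invented for the example.

```python
import json
import re

# Hypothetical page snippet; the real BBC Sounds JSON layout is not shown in this PR.
SAMPLE_HTML = '''
<script>window.__PRELOADED_STATE__ = {"programmes": {"current": {"title": "Example Show"}}};</script>
'''


def extract_preloaded_state(webpage):
    """Return the parsed window.__PRELOADED_STATE__ object, or None if it is absent."""
    match = re.search(r'window\.__PRELOADED_STATE__\s*=\s*', webpage)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (here, ";</script>").
        state, _ = json.JSONDecoder().raw_decode(webpage, match.end())
    except json.JSONDecodeError:
        return None
    return state


state = extract_preloaded_state(SAMPLE_HTML)
print(state['programmes']['current']['title'])
```

Using `raw_decode` avoids having to guess where the JSON object ends with a regex, which tends to be fragile for large nested blobs.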
The metadata that's extracted has changed. The main changes I noticed were:
- `Programme, Episode` is now `Programme - Episode`
So this will doubtless break someone's workflow.
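To make the title change concrete, here is a small sketch of how a `Programme - Episode` style title could be built by joining whichever parts are present. This assumes the new title is a simple join with `' - '`; `join_title` is a hypothetical stand-in written for this example, not a function from yt-dlp.

```python
def join_title(*parts, delim=' - '):
    """Join the non-empty parts with delim (hypothetical helper for illustration)."""
    return delim.join(str(part) for part in parts if part)


print(join_title('Programme', 'Episode'))
print(join_title('Programme', None))
```

Anyone whose output templates or scripts split the title on `', '` would need to split on `' - '` instead.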
Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?
Copilot Summary
🤖 Generated by Copilot at f007598
Summary
🎧🔧📜
Improved BBC Sounds extraction and cleaned up BBC code. Added support for `BBCIE` to handle BBC Sounds URLs and audio chapters, and removed unused code and tests from `BBCCoUkIE`.
Walkthrough
- Import `join_nonempty` and `traverse_obj` functions from `utils.py` for string concatenation and nested value access
- Remove `sounds/play/` pattern from `BBCCoUkIE` extractor and its test case, since it is now handled by `BBCIE` extractor
- Add test case for `sounds/play/` URL to `BBCIE` extractor, demonstrating audio extraction and metadata
- Extract tracklist from `preload_state` variable in `BBCIE` extractor, and assign it as chapters to the audio
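The tracklist-to-chapters step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the input field names (`artist`, `title`, `start`, `end`) and the function `tracklist_to_chapters` are hypothetical, while the output keys `start_time`, `end_time` and `title` follow the chapter format used in yt-dlp info dicts.

```python
# Hypothetical tracklist entries; the real field names in the preloaded state may differ.
SAMPLE_TRACKS = [
    {'artist': 'Artist B', 'title': 'Song B', 'start': 300, 'end': 480},
    {'artist': 'Artist A', 'title': 'Song A', 'start': 60, 'end': 240},
    {'artist': None, 'title': 'Untimed', 'start': None, 'end': None},
]


def tracklist_to_chapters(tracks):
    """Convert track entries into chapter dicts, skipping tracks without a start time."""
    chapters = []
    for track in tracks:
        start = track.get('start')
        if start is None:
            continue  # a chapter without a start time cannot be placed on the timeline
        title = ' - '.join(filter(None, (track.get('artist'), track.get('title'))))
        chapters.append({
            'start_time': start,
            'end_time': track.get('end'),
            'title': title,
        })
    # Sort by start time so the chapters are in playback order
    return sorted(chapters, key=lambda chapter: chapter['start_time'])


for chapter in tracklist_to_chapters(SAMPLE_TRACKS):
    print(chapter)
```

Gaps between tracks (for example news breaks) are simply left without a chapter in this sketch.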