[BBC Sounds] Tracklist Extraction #7788

garret1317 · 2023-08-07T23:32:45Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This PR adds tracklist extraction for BBC Sounds.
In doing so, it moves the BBC Sounds extraction from BBCCoUkIE to BBCIE, changing the metadata extracted slightly.

I've greatly enjoyed having tracklists extracted as chapters for another radio extractor I've written - it lets me see what's playing without having to use Shazam etc., and I can skip past songs I don't like. It also works quite well for skipping news breaks. The BBC Sounds website and app have tracklists available, with full start and end times - but these aren't being extracted.

Looking at the site's HTML, I saw a big JSON blob in window.__PRELOADED_STATE__ with all the programme's metadata, including the tracklist.
Ctrl-F'ing for __PRELOADED_STATE__ in the extractor, I saw that BBCIE had a section that handled this, but it didn't seem to be using it for Sounds. Instead, it was handled by something in BBCCoUkIE.
I didn't see the metadata that I needed available in BBCCoUkIE's network requests, so I removed `sounds/play/` from BBCCoUkIE's `_VALID_URL` regex.
Now BBCIE handles it with the existing preloaded state code + new tracklist code.
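
For readers unfamiliar with the approach, here is a minimal standalone sketch of the idea: pull the JSON assigned to `window.__PRELOADED_STATE__` out of the page HTML and turn a tracklist into chapter entries. It is not the code from this PR. The function names are hypothetical, and the key names (`tracklist`, `tracks`, `titles`, `offset`, `start`, `end`) are assumptions about the shape of the blob, used here only for illustration.

```python
import json
import re


def extract_preloaded_state(html):
    """Return the JSON value assigned to window.__PRELOADED_STATE__, or None."""
    match = re.search(r'window\.__PRELOADED_STATE__\s*=\s*', html)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (e.g. the trailing ';').
        state, _ = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return state


def tracklist_to_chapters(state):
    """Map tracks to chapter dicts. Key names are assumed, not confirmed."""
    tracks = ((state or {}).get('tracklist') or {}).get('tracks') or []
    chapters = []
    for track in tracks:
        offset = track.get('offset') or {}
        start, end = offset.get('start'), offset.get('end')
        if not isinstance(start, (int, float)):
            continue  # a chapter without a start time is not usable
        titles = track.get('titles') or {}
        parts = [titles.get('primary'), titles.get('secondary')]
        chapters.append({
            'title': ' - '.join(p for p in parts if p),
            'start_time': float(start),
            'end_time': float(end) if isinstance(end, (int, float)) else None,
        })
    return chapters
```

Tracks with no usable start offset are skipped rather than raising, so a missing or malformed tracklist simply yields no chapters.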

I don't think this breaks any other pages, but I can't be sure, as a lot of the URLs in the tests are long gone.
The tests that weren't for dead links didn't seem to be broken by this, at least (mostly just title changes).

The metadata that's extracted has changed. The main changes I noticed were:

the title format `Programme, Episode` is now `Programme - Episode`
the description is now the long, full-length version
the station name is extracted as the uploader
there are a few more formats

So this will doubtless break someone's workflow.

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at f007598`

Summary

🎧🔧📜

Improved BBC Sounds extraction and cleaned up BBC code. Added support for BBCIE to handle BBC Sounds URLs and audio chapters, and removed unused code and tests from BBCCoUkIE.

To handle the BBC Sounds links
The BBCIE extractor thinks
It parses the tracklist
For audio chapters to list
And imports some utils for kinks

Walkthrough

Import join_nonempty and traverse_obj functions from utils.py for string concatenation and nested value access
Remove sounds/play/ pattern from BBCCoUkIE extractor and its test case, since it is now handled by BBCIE extractor
Add a new test case for sounds/play/ URL to BBCIE extractor, demonstrating audio extraction and metadata
Parse tracklist information from preload_state variable in BBCIE extractor, and assign it as chapters to the audio
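
To illustrate what "string concatenation and nested value access" means in the walkthrough above, here is a rough sketch using simplified stand-ins. These are not yt-dlp's `join_nonempty` or `traverse_obj` (the real helpers do considerably more). The names `join_nonempty_sketch` and `get_nested` are hypothetical, and the nested keys in the usage example are made up.

```python
def join_nonempty_sketch(*values, delim='-'):
    """Join the truthy values with delim, skipping None and empty strings."""
    return delim.join(str(v) for v in values if v)


def get_nested(obj, *path, default=None):
    """Follow path through nested dicts and lists; return default on any miss."""
    for key in path:
        if isinstance(obj, dict):
            obj = obj.get(key)
        elif (isinstance(obj, list) and isinstance(key, int)
              and -len(obj) <= key < len(obj)):
            obj = obj[key]
        else:
            return default
        if obj is None:
            return default
    return obj


# Hypothetical data shape, for illustration only.
state = {'tracks': [{'titles': {'primary': 'Programme', 'secondary': 'Episode'}}]}
title = join_nonempty_sketch(
    get_nested(state, 'tracks', 0, 'titles', 'primary'),
    get_nested(state, 'tracks', 0, 'titles', 'tertiary'),  # missing, so skipped
    get_nested(state, 'tracks', 0, 'titles', 'secondary'),
    delim=' - ')
print(title)  # Programme - Episode
```

The point of both helpers is that a missing key produces `None` or a shorter string instead of an exception, which suits metadata that is only sometimes present.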

garret1317 added 2 commits August 7, 2023 11:10

[BBC] extract tracklist as chapters

4627841

add BBC Sounds test to BBCIE, remove from BBCCoUkIE

f007598

bashonly added the site-enhancement label Aug 22, 2023

bashonly requested changes Sep 14, 2023

yt_dlp/extractor/bbc.py (outdated, resolved)

bashonly added the pending-fixes label Sep 14, 2023

garret1317 and others added 2 commits September 15, 2023 01:42

bbc sounds tracklists: inline non-fatal version

e746628

Co-Authored-By: bashonly <88596187+bashonly@users.noreply.github.com>

bbc sounds tracklists: update test

b648691

Old one expired. It's annoying because the ones that last the longest (World Service mostly) don't have tracklists.

bashonly approved these changes Sep 15, 2023

bashonly added the pending-review label and removed the pending-fixes and pending-review labels Sep 15, 2023

coletdjnz approved these changes Sep 16, 2023

bashonly merged commit eda0e41 into yt-dlp:master Sep 16, 2023
13 checks passed

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/bbc] Extract tracklist as chapters (yt-dlp#7788)

5c75abe

Authored by: garret1317
