GlobalPlayer #31688

Closed
5 tasks done
Treadspear opened this issue Feb 25, 2023 · 6 comments · Fixed by #32138
Labels
site-support-request Add extractor(s) for a new domain

Comments

@Treadspear

Treadspear commented Feb 25, 2023

Checklist

  • I'm reporting a new site support request
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that none of provided URLs violate any copyrights
  • I've searched the bugtracker for similar site support requests including closed ones

Example URLs

Description

Paywall: NO
Region Blocked: UNKNOWN
Region: GREAT BRITAIN
Credentials Required: YES
Username: no-reply00456temp@treadspear.com
Password: no-reply00456temp@treadspear.com
Registration: https://www.globalplayer.com/register/

I am new to the youtube-dl project and I am very impressed so far. I would like to write a script to archive some podcasts from Global Player. Does anyone have recommendations for how to do this?

Thank you,

@Treadspear Treadspear added the site-support-request Add extractor(s) for a new domain label Feb 25, 2023
@Vangelis66

Their site requires an account/log-in for audio and/or video content to start playing... That doesn't mean the same is true for youtube-dl 😉 ...
Since this is a UK-based service, some content might be geo-fenced from overseas...

I find that the generic extractor can handle the "single video" (actually, it's audio-only) URI the OP provided:

yt-dl -v -c --no-part --write-description --write-thumbnail --add-metadata --embed-thumbnail "https://www.globalplayer.com/podcasts/episodes/7Drf561/" => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '-v', '-c', '--no-part', '--write-description', '--write-thumbnail', '--add-metadata', '--embed-thumbnail', 'https://www.globalplayer.com/podcasts/episodes/7Drf561/']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.02.24.43044
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 7Drf561: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 7Drf561: Downloading webpage
[generic] 7Drf561: Extracting information
[debug] Default format spec: bestvideo+bestaudio/best
WARNING: There's no description to write.
[debug] Invoking downloader on 'https://dax.captivate.fm/637d3b2b-41eb-440d-ab2c-76d95c427ee2/LBC-UK-08022023-035818-08022023-043316.mp3?aw_0_1st.showid=4fb1b093-d258-4b00-ab19-4ffa3debb6cc\\u0026aw_0_1st.episodeid=185ae231-a952-41e9-9d29-eafe15192903'
[download] Destination: Steve Allen - The Whole Show - Podcast _ Global Player-u0026aw_0_1st.mp3
[download] 100% of 126.09MiB in 02:35
[ffmpeg] Adding metadata to 'Steve Allen - The Whole Show - Podcast _ Global Player-u0026aw_0_1st.mp3'
[debug] ffmpeg command line: ".\FFmpeg\ffmpeg" -y -loglevel "repeat+info" -i "file:Steve Allen - The Whole Show - Podcast _ Global Player-u0026aw_0_1st.mp3" -c copy -metadata "purl=https://www.globalplayer.com/podcasts/episodes/7Drf561/" -metadata "title=Steve Allen - The Whole Show - Podcast | Global Player" -metadata "artist=www.globalplayer.com" "file:Steve Allen - The Whole Show - Podcast _ Global Player-u0026aw_0_1st.temp.mp3"
[embedthumbnail] There aren't any thumbnails to embed

Overseas location, thus not being geo-fenced for that particular item...
FWIW, a proper IE has to be authored if one wants:
a) more meaningful/correct filenames (e.g. including radio station name, date of first broadcast, etc.)
b) complete description/file metadata
c) cover/thumbnail...
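
For scripting, the command line used above maps roughly onto youtube-dl's Python API. The following is a minimal sketch, not a tested recipe: the function names `build_options` and `archive` are made up for illustration, and the `download_archive` option is an addition that is not part of the command above (it records finished downloads so that repeated runs skip them).

```python
def build_options(archive_file="downloaded.txt"):
    """Return YoutubeDL options mirroring the CLI flags used above."""
    return {
        "continuedl": True,        # -c
        "nopart": True,            # --no-part
        "writedescription": True,  # --write-description
        "writethumbnail": True,    # --write-thumbnail
        # Assumption: not in the original command; added for archiving.
        "download_archive": archive_file,
        "postprocessors": [
            {"key": "FFmpegMetadata"},  # --add-metadata
            {"key": "EmbedThumbnail"},  # --embed-thumbnail
        ],
    }


def archive(urls, archive_file="downloaded.txt"):
    """Download every URL in `urls`; needs youtube-dl installed and network access."""
    import youtube_dl  # imported lazily so build_options() works without it

    with youtube_dl.YoutubeDL(build_options(archive_file)) as ydl:
        return ydl.download(list(urls))


# Example call (performs a real download):
# archive(["https://www.globalplayer.com/podcasts/episodes/7Drf561/"])
```

As the log shows, the description and thumbnail options have nothing to work with while the generic extractor is handling these pages.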

@dirkf
Contributor

dirkf commented Feb 25, 2023

The page tested above has:

  1. ld+json that we ought to be extracting once the common code recognises a PodcastEpisode (also a RadioEpisode) as a sort of Episode
  2. almost the same metadata in og:... <meta> tags
  3. Next.js hydration JSON that could be extracted by a dedicated extractor.
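
As a rough illustration of the three sources listed above, the sketch below pulls each of them out of an HTML page using only the standard library. It is not the youtube-dl implementation: `extract_page_metadata` and the sample page are hypothetical, and the fields inside the sample JSON are invented for the example.

```python
import json
import re


def extract_page_metadata(html):
    """Collect ld+json blocks, og: meta tags and Next.js hydration JSON."""
    result = {"ld_json": [], "og": {}, "next_data": None}

    # 1. <script type="application/ld+json">...</script>
    for m in re.finditer(
            r'(?is)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(.*?)</script>',
            html):
        try:
            result["ld_json"].append(json.loads(m.group(2)))
        except ValueError:
            pass  # ignore malformed blocks

    # 2. <meta property="og:..." content="...">
    for m in re.finditer(
            r'(?is)<meta[^>]+property=(["\'])og:([^"\']+)\1[^>]+content=(["\'])(.*?)\3',
            html):
        result["og"][m.group(2)] = m.group(4)

    # 3. <script id="__NEXT_DATA__" type="application/json">...</script>
    m = re.search(
        r'(?is)<script[^>]+id=(["\'])__NEXT_DATA__\1[^>]*>(.*?)</script>', html)
    if m:
        try:
            result["next_data"] = json.loads(m.group(2))
        except ValueError:
            pass
    return result


# Hypothetical page; real Global Player markup may differ.
SAMPLE = '''<html><head>
<meta property="og:title" content="Example Show - Podcast">
<script type="application/ld+json">{"@type": "PodcastEpisode", "name": "Example Episode"}</script>
</head><body>
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {"title": "Example Episode"}}}</script>
</body></html>'''

meta = extract_page_metadata(SAMPLE)
print(meta["ld_json"][0]["@type"], meta["og"]["title"])
```

A dedicated extractor would presumably prefer the hydration JSON and fall back to the other two sources when it is missing.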

Currently, I expect (without actually testing) that this catch-all in the generic extractor is finding the media link in the hydration JSON:

            found = filter_video(re.findall(
                r'[^A-Za-z0-9]?(?:file|video_url)["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage))
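
To see how this catch-all behaves, here is a small sketch that runs the same pattern over a made-up fragment of hydration JSON. The sample string and the helper names are assumptions. Because the regex reads raw page text rather than parsed JSON, an escaped ampersand (`\u0026`) stays in the captured URL, which would be consistent with the `\\u0026` in the debug log and the `-u0026aw_0_1st` suffix in the filenames above.

```python
import json
import re

# The catch-all pattern quoted above, copied verbatim.
CATCH_ALL = r'[^A-Za-z0-9]?(?:file|video_url)["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']'

# Hypothetical hydration JSON fragment; the real page content may differ.
SAMPLE_PAGE = r'{"file":"https://media.example.invalid/ep.mp3?show=abc\u0026episode=xyz"}'


def find_media_urls(webpage):
    """Return candidate media URLs exactly as they appear in the page text."""
    return re.findall(CATCH_ALL, webpage)


def decode_json_escapes(raw_url):
    """Decode JSON string escapes such as \\u0026 in a URL taken from raw JSON text."""
    return json.loads('"%s"' % raw_url)


raw = find_media_urls(SAMPLE_PAGE)[0]
print(raw)                       # still contains the literal \u0026
print(decode_json_escapes(raw))  # the escape is decoded to &
```

Parsing the hydration JSON properly, as a dedicated extractor would, avoids the problem because the JSON decoder handles the escapes.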

@Vangelis66

... It's not only MP3 podcasts that are offered via GlobalPlayer, it's a full-fledged catch-up service for (most) radio programmes broadcast over the Global Group (commercial) radio network in the UK 😉 ...

AOD links, e.g. for CapitalFM, are like below:

https://www.globalplayer.com/catchup/capital/uk/episodes/2zGrsPHALvZrD1F7DB837dwSFp/

You'll notice that the URI structure differs from the "podcast one" found in the OP. Catch-up content is only available for 7 days after first broadcast, and the audio is served as a lowly HE-AACv1 encode at 48 kbps inside the MP4 container (with Apple's .m4a file extension for audio):

yt-dl "https://www.globalplayer.com/catchup/capital/uk/episodes/2zGrsPHALvZrD1F7DB837dwSFp/" => 

[generic] 2zGrsPHALvZrD1F7DB837dwSFp: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 2zGrsPHALvZrD1F7DB837dwSFp: Downloading webpage
[generic] 2zGrsPHALvZrD1F7DB837dwSFp: Extracting information
[download] Destination: The Sky VIP Official Big Top 40 _ Global Player-u0026aw_0_1st.m4a
[download] 100% of 61.13MiB in 02:02

Hopefully, support for non-podcast AOD can be added, too... 😄
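
The two URL shapes seen in this thread could be told apart along the lines below. This is only a sketch: the patterns are derived from the two example URLs alone, the names `PODCAST_RE`, `CATCHUP_RE` and `classify` are hypothetical, and treating the `uk` path segment as a region is a guess.

```python
import re

# Assumption: patterns inferred from the two URLs in this thread only.
PODCAST_RE = re.compile(
    r'https?://www\.globalplayer\.com/podcasts/episodes/(?P<id>\w+)/?$')
CATCHUP_RE = re.compile(
    r'https?://www\.globalplayer\.com/catchup/(?P<station>[^/]+)/(?P<region>[^/]+)'
    r'/episodes/(?P<id>\w+)/?$')


def classify(url):
    """Return a dict describing the URL kind, or None if it matches neither shape."""
    m = PODCAST_RE.match(url)
    if m:
        return {"kind": "podcast", "id": m.group("id")}
    m = CATCHUP_RE.match(url)
    if m:
        return {
            "kind": "catchup",
            "station": m.group("station"),
            "region": m.group("region"),  # guess: "uk" looks like a region code
            "id": m.group("id"),
        }
    return None


print(classify("https://www.globalplayer.com/podcasts/episodes/7Drf561/"))
print(classify("https://www.globalplayer.com/catchup/capital/uk/episodes/2zGrsPHALvZrD1F7DB837dwSFp/"))
```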

@dirkf
Contributor

dirkf commented Feb 25, 2023

The AOD examples lack the ld+json but the page structure is otherwise similar.

@Vangelis66

FWIW, a proper IE has to be authored if one wants:
a) more meaningful/correct filenames (e.g. including radio station name, date of first broadcast, etc.)
b) complete description/file metadata
c) cover/thumbnail...

Recent "downstream" implementation:

yt-dlp/yt-dlp#6903

merged as yt-dlp/yt-dlp@3064766 😉 ...

@dirkf
Contributor

dirkf commented Apr 28, 2023

Not a tricky back-port, but needs some core back-ports to be merged first:

$ pytest -k GlobalPlayer
============================= test session starts ==============================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.11.0, pluggy-0.13.1
rootdir: /home/df/Documents/src/youtube-dl
collected 2621 items / 2613 deselected / 8 selected                            

test/test_download.py ........                                           [100%]

================== 8 passed, 2613 deselected in 6.22 seconds ===================
$
