BBC Bitesize links #19122

Closed
danielatlarge opened this issue Feb 3, 2019 · 4 comments

@danielatlarge danielatlarge commented Feb 3, 2019

I'm trying to download videos from BBC Bitesize. An example URL would be https://www.bbc.com/bitesize/articles/zqghtyc
An actual PID for a video would be... p05f425d
I have tried youtube-dl -v "https://www.bbc.com/bitesize/articles/zqghtyc"
and
https://www.bbc.co.uk/programmes/p05f425d
and
https://www.bbc.co.uk/programmes/zqghtyc

None of which work. Would love to get some help! Thank you so much.

youtube-dl -v "https://www.bbc.com/bitesize/articles/zqghtyc"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.bbc.com/bitesize/articles/zqghtyc']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.30.1
[debug] Python version 2.7.15rc1 (CPython) - Linux-4.15.0-45-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.4, ffprobe 3.4.4
[debug] Proxy map: {}
[bbc] zqghtyc: Downloading webpage
ERROR: no suitable InfoExtractor for URL https://www.bbc.co.uk/programmes/None
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
youtube_dl.main()
File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 472, in main
_real_main(argv)
File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 462, in _real_main
retcode = ydl.download(all_urls)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2005, in download
url, force_generic_extractor=self.params.get('force_generic_extractor', False))
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 804, in extract_info
return self.process_ie_result(ie_result, download, extra_info)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 865, in process_ie_result
extra_info=extra_info)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 827, in extract_info
self.report_error('no suitable InfoExtractor for URL %s' % url)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 621, in report_error
self.trouble(error_message, tb)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 583, in trouble
tb_data = traceback.format_list(traceback.extract_stack())

youtube-dl -v "https://www.bbc.co.uk/programmes/p05f47t4"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.bbc.co.uk/programmes/p05f47t4']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.30.1
[debug] Python version 2.7.15rc1 (CPython) - Linux-4.15.0-45-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.4, ffprobe 3.4.4
[debug] Proxy map: {}
[bbc.co.uk] p05f47t4: Downloading video page
[bbc.co.uk] p05f47t4: Downloading playlist JSON
[bbc.co.uk] p05f47t4: Downloading legacy playlist XML
ERROR: Unable to download XML: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 605, in _request_webpage
return self._downloader.urlopen(url_or_request)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2215, in urlopen
return self._opener.open(req, timeout=self._socket_timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)


  • [ x] I've verified and I assure that I'm running youtube-dl 2019.01.30.1


@Vangelis66 Vangelis66 commented Feb 7, 2019

@danielatlarge

At https://www.bbc.com/bitesize/articles/zqghtyc exists only one video clip (my method gets trickier when multiple clips reside in the same web page).

  1. First load Page Source for the bitesize page in question (how you do that depends on the browser used, usually a context menu (right-click) entry is available in most browsers...).
  2. Search for the string "pid":; its second instance is found inside a JSON block like below:
"headers":{"content-type":"application\/json"}},"body":{"type":"video-block","id":"zkf8mfr","title":"","caption":"","pid":"p03q2xx3","transcript":"","video":{"duration":"PT41S","holdingImage":"https:\/\/ichef.bbci.co.uk\/images\/ic\/$recipe\/p03q2xwb.jpg","mediaType":"video","title":"How to use the suffix -ly","vpid":"p05f425d"}
  3. The value of pid (Programme ID, PID) is p03q2xx3; THIS IS THE PID OF THE VIDEO YOU NEED! The vpid in the same block (p05f425d) is the version PID, which yt-dl resolves by itself, as the logs below show.
    You can view clip details at https://www.bbc.co.uk/programmes/p03q2xx3.json
  4. You must feed yt-dl the found PID string via the programmes template, i.e.
    https://www.bbc.co.uk/programmes/p03q2xx3. This video clip is geo-fenced, accessible only from whitelisted UK IPs; I am located overseas, so

youtube-dl -F https://www.bbc.co.uk/programmes/p03q2xx3 =>

[bbc.co.uk] p03q2xx3: Downloading video page
[bbc.co.uk] p03q2xx3: Downloading playlist JSON
[bbc.co.uk] p05f425d: Downloading media selection XML
[bbc.co.uk] p05f425d: Downloading media selection XML
ERROR: bbc.co.uk returned error: geolocation

... but with a whitelisted UK HTTP proxy, things are much better 😜 :

youtube-dl --proxy="http://proxyhost:proxyport" --console-title --hls-prefer-native -c --no-part -f "stream-uk-iptv_streaming_concrete_combined_sd_mf_akamai_uk_hls-1836" "https://www.bbc.co.uk/programmes/p03q2xx3" -o "How to use the suffix -ly[p03q2xx3].mp4" --write-sub --convert-subs=srt --embed-subs --write-thumbnail --embed-thumbnail --add-metadata =>

[bbc.co.uk] p03q2xx3: Downloading video page
[bbc.co.uk] p03q2xx3: Downloading playlist JSON
[bbc.co.uk] p05f425d: Downloading media selection XML
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading MPD manifest
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading m3u8 information
[bbc.co.uk] p05f425d: Downloading captions
[bbc.co.uk] p05f425d: Downloading captions
[bbc.co.uk] p05f425d: Downloading captions
[info] Writing video subtitles to: How to use the suffix -ly[p03q2xx3].en.ttml
[bbc.co.uk] p05f425d: Downloading thumbnail ...
[bbc.co.uk] p05f425d: Writing thumbnail to: How to use the suffix -ly[p03q2xx3].jpg
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 6
[download] Destination: How to use the suffix -ly[p03q2xx3].mp4
[download] 100% of 8.17MiB in 00:15
[ffmpeg] Fixing malformed AAC bitstream in "How to use the suffix -ly[p03q2xx3].mp4"
[ffmpeg] Adding metadata to 'How to use the suffix -ly[p03q2xx3].mp4'
[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Deleting original file How to use the suffix -ly[p03q2xx3].en.ttml (pass -k to keep)
[ffmpeg] Embedding subtitles in 'How to use the suffix -ly[p03q2xx3].mp4'
Deleting original file How to use the suffix -ly[p03q2xx3].en.srt (pass -k to keep)
[atomicparsley] Adding thumbnail to "How to use the suffix -ly[p03q2xx3].mp4"
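
The manual steps above (search the page source for the video block's pid, then build the programmes URL) could be sketched in Python roughly as follows. This is only a sketch: the helper names `find_video_pids` and `programme_url` are made up for illustration, the regular expression assumes the page embeds a "video-block" object laid out like the fragment quoted above, and PIDs are assumed to be 8 lowercase alphanumeric characters. The sample string is an abridged copy of that fragment, so nothing is fetched from the network.

```python
import re

# Assumption: each clip appears as a "video-block" JSON object whose "pid"
# field follows the block type, as in the fragment quoted in this comment.
# PIDs are assumed to be 8 lowercase letters/digits.
VIDEO_PID_RE = re.compile(
    r'"type":"video-block".*?"pid":"([0-9a-z]{8})"', re.DOTALL
)


def find_video_pids(page_source):
    """Return the PIDs of all video blocks, in page order, without repeats."""
    pids = []
    for pid in VIDEO_PID_RE.findall(page_source):
        if pid not in pids:
            pids.append(pid)
    return pids


def programme_url(pid):
    """Build the programmes URL that youtube-dl is fed in step 4."""
    return "https://www.bbc.co.uk/programmes/" + pid


# Abridged copy of the JSON fragment from the page source (image URL omitted).
sample = (
    '"body":{"type":"video-block","id":"zkf8mfr","title":"","caption":"",'
    '"pid":"p03q2xx3","transcript":"","video":{"duration":"PT41S",'
    '"mediaType":"video","title":"How to use the suffix -ly",'
    '"vpid":"p05f425d"}}'
)

for pid in find_video_pids(sample):
    print(programme_url(pid))
```

Because the pattern is anchored on the video block rather than on a fixed position, it should also pick up several clips on one page, though that has not been checked against a real multi-clip page here.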

Teach well! 😃


@danielatlarge danielatlarge commented Feb 10, 2019

Wow. Thanks a mil, Vangelis66! I had very little hope that someone would write. So glad you did. Now that I've got the video downloaded, I hope my students will be the better for it too!

I've got a vpn that I use but would much rather use an http proxy since it's cumbersome to find a server that isn't blocked. Will look for a whitelisted http proxy instead of having to manually connect to random servers all the time.

Teaching is brutal. I hope you're not in the profession. Thanks again, mate! ;-)


@Vangelis66 Vangelis66 commented Feb 10, 2019

Thanks a mil, Vangelis66! I had very little hope that someone would write. So glad you did

Err, you're quite welcome! I had received much help from strangers back in the day (mid 2000s) when I was clueless, but at a time when things on the internet were more civil and altruistic; so I still like to give back to others... Sadly, now everything's monetised and everyone likes to keep things for themselves (not without good reason, in some cases...).

I've got a vpn that I use but would much rather use an http proxy
since it's cumbersome to find a server that isn't blocked.
Will look for a whitelisted http proxy instead of having to
manually connect to random servers all the time.

The beeb have been relentless over, at least, the past two years at hunting down and blocking all commercial and free geo-location circumvention methods 😞 ; free and paid-for UK proxies are in the same boat as VPNs and SmartDNS services, i.e. being constantly blacklisted...

Teaching is brutal. I hope you're not in the profession.

... Sort of, but in the past; had been practising private Chemistry tutoring for Uni students in my late-20s - mid-30s, so to adults, not toddlers...

Returning to the topic, ideally a BBC Bitesize plugin could be created that would scrape the clip PIDs from the page and then reuse the bbc plugin's logic to save the clips to disk, but the devs are swamped with so many support requests that I won't hold my breath for such a plugin anytime soon... In all honesty, I think you had better close this issue...


@cesarandreslopez cesarandreslopez commented Sep 8, 2019

@danielatlarge @Vangelis66 here is some code that fetches the corresponding PIDs and then runs youtube-dl on each of them.

It's Python based and might be helpful for someone else.

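import subprocess

import requests

# Bitesize pages to scan for embedded clips
sites = [
    'https://www.bbc.co.uk/bitesize/guides/zws8h39/video'
]

# The PID is located by fixed offsets from this marker, so the script
# breaks as soon as the page markup changes
marker = '"chapterData"'
point_of_pid = 22  # characters from the start of the marker to the PID
end_of_pid = 8     # length of a PID
base = "https://www.bbc.co.uk/programmes/"

for url in sites:
    r = requests.get(url)
    r.raise_for_status()  # stop early on HTTP errors
    print("\nURL:", url)
    print("--------------------------------------")
    for row in r.text.split('\n'):
        if marker in row:
            entry = row.find(marker)
            begin = entry + point_of_pid
            end = begin + end_of_pid
            pid = row[begin:end]
            print(pid)
            # Plain programmes URL, as used earlier in this thread
            # (the ".json" form only shows clip details)
            programme_url = base + pid
            print(" Downloading from " + programme_url)
            # An argument list keeps page-derived text away from the shell
            subprocess.call(["youtube-dl", programme_url])
    print("--------------------------------------")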
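
For geo-fenced clips like the one discussed earlier in the thread, the youtube-dl call in a script of this kind could be assembled as an argument list, with `--proxy` and `-o` added only when needed. The following is a rough sketch rather than a tested downloader: `build_command` and `download` are hypothetical helper names, and only options that already appear in this thread (`--proxy`, `-o`) are used. The proxy address is the same placeholder as above.

```python
import subprocess

BASE = "https://www.bbc.co.uk/programmes/"


def build_command(pid, proxy=None, output=None):
    """Return the youtube-dl argument list for one programme PID.

    The helper and its parameter names are illustrative only; they are
    not part of youtube-dl.
    """
    cmd = ["youtube-dl"]
    if proxy:
        # e.g. a whitelisted UK HTTP proxy
        cmd += ["--proxy", proxy]
    if output:
        # output file name or template
        cmd += ["-o", output]
    cmd.append(BASE + pid)
    return cmd


def download(pid, proxy=None, output=None):
    """Run youtube-dl and return its exit code (youtube-dl must be on PATH)."""
    return subprocess.call(build_command(pid, proxy, output))


# Only prints the command; call download(...) to actually run it.
print(build_command("p03q2xx3", proxy="http://proxyhost:proxyport"))
```

Passing a list to `subprocess.call` avoids building a shell string, so file names with spaces or brackets (as in the output name used earlier) need no extra quoting.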