
Unable to download JSON metadata on Raywenderlich.com #24027

Open
Stunner opened this issue Feb 12, 2020 · 1 comment

@Stunner

Stunner commented Feb 12, 2020

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2020.01.24
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

$ youtube-dl --verbose https://www.raywenderlich.com/4743-beginning-rxswift
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--verbose', u'https://www.raywenderlich.com/4743-beginning-rxswift']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.01.24
[debug] Python version 2.7.16 (CPython) - Darwin-19.2.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2, rtmpdump 2.4
[debug] Proxy map: {}
[RayWenderlichCourse] 4743-beginning-rxswift: Downloading webpage
[download] Downloading playlist: Beginning RxSwift
[RayWenderlichCourse] playlist Beginning RxSwift: Collected 39 video ids (downloading 39 of them)
[download] Downloading video 1 of 39
[RayWenderlich] 4743-beginning-rxswift/1: Downloading webpage
[RayWenderlich] 4743-beginning-rxswift/1: Downloading JSON metadata
[vimeo] 266135871: Downloading webpage
[vimeo] 266135871: Extracting information
[vimeo] 266135871: Downloading JSON metadata
WARNING: Unable to download JSON metadata: HTTP Error 404: Not Found
[vimeo] 266135871: Downloading akfire_interconnect_quic m3u8 information
[vimeo] 266135871: Downloading fastly_skyfire m3u8 information
[vimeo] 266135871: Downloading akfire_interconnect_quic MPD information
[vimeo] 266135871: Downloading akfire_interconnect_quic MPD information
[vimeo] 266135871: Downloading fastly_skyfire MPD information
[vimeo] 266135871: Downloading fastly_skyfire MPD information
[debug] Default format spec: bestvideo+bestaudio/best
[download] Introduction-266135871.mp4 has already been downloaded and merged
[download] Downloading video 2 of 39
[RayWenderlich] 4743-beginning-rxswift/2: Downloading webpage
[RayWenderlich] 4743-beginning-rxswift/2: Downloading JSON metadata
[vimeo] 266136175: Downloading webpage
[vimeo] 266136175: Extracting information
[vimeo] 266136175: Downloading JSON metadata
WARNING: Unable to download JSON metadata: HTTP Error 404: Not Found
[vimeo] 266136175: Downloading akfire_interconnect_quic m3u8 information
[vimeo] 266136175: Downloading fastly_skyfire m3u8 information
[vimeo] 266136175: Downloading akfire_interconnect_quic MPD information
[vimeo] 266136175: Downloading akfire_interconnect_quic MPD information
[vimeo] 266136175: Downloading fastly_skyfire MPD information
[vimeo] 266136175: Downloading fastly_skyfire MPD information
[debug] Default format spec: bestvideo+bestaudio/best
[download] Hello RxSwift-266136175.mp4 has already been downloaded and merged
[download] Downloading video 3 of 39
[RayWenderlich] 4743-beginning-rxswift/3: Downloading webpage
[RayWenderlich] 4743-beginning-rxswift/3: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2237, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

$

Description

This site requires account credentials, but I am unable to provide them because the account is shared and I am not its primary owner.

@anhdle14

After debugging, I found several issues with the current raywenderlich.py extractor:

  1. Currently, RWL (short for RayWenderLich) uses stateful cookie sessions to populate USER_TOKEN in the webpage:
    <script>
//<![CDATA[

      window.CAROLUS_ENV = {
        KERCHING_BASE_URL: "https://store.raywenderlich.com/",
        BETAMAX_BASE_URL: "https://videos.raywenderlich.com/api/v1",
        GUARDPOST_BASE_URL: "https://accounts.raywenderlich.com/v2",
        CONTENT_PERMISSIONS_REQUIRED_COOKIE_DOMAIN: ".raywenderlich.com",
        USER_TOKEN: "*"
      };
//]]>
</script>
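As a rough sketch, assuming the token always appears as a double-quoted string literal inside that inline script, its value could be pulled out with a regular expression. The helper name `extract_user_token` is hypothetical; inside an actual extractor, `self._search_regex` would likely be the more idiomatic way to do the same thing.

```python
import re

# Assumption: the page embeds the token as USER_TOKEN: "<value>" inside
# window.CAROLUS_ENV, exactly as in the snippet above.
_USER_TOKEN_RE = re.compile(r'USER_TOKEN\s*:\s*"([^"]*)"')


def extract_user_token(webpage):
    """Return the USER_TOKEN string from the page HTML, or None if absent."""
    match = _USER_TOKEN_RE.search(webpage)
    return match.group(1) if match else None
```

If the session cookies are missing or expired, the page may not contain a usable token, so a `None` result should be treated as "not logged in".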
  2. The 403 JSON error comes from lines 106 and 116 of raywenderlich.py. The correct way to request the JSON is:
GET /api/v1/videos/3712.json
Accept: application/json, text/javascript, */*; q=0.01
Authorization: Token $USER_TOKEN
Origin: https://www.raywenderlich.com
Referer: https://www.raywenderlich.com/
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.2 Safari/605.1.15
X-Requested-With: XMLHttpRequest
X-CSRF-Token: *

You can work around this by passing the USER_TOKEN through a command-line parameter (e.g. --video-password) and reading that value in raywenderlich.py.
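A minimal Python 3 sketch of building that request, assuming the JSON is served from the BETAMAX_BASE_URL host shown in window.CAROLUS_ENV (the request above only shows the path). The function name `build_video_json_request` is hypothetical, and whether the X-CSRF-Token header is actually required is unverified, so it is left optional. The request is only constructed here, not sent.

```python
import urllib.request

# Assumption: /api/v1/videos/<id>.json is relative to BETAMAX_BASE_URL.
BETAMAX_BASE_URL = "https://videos.raywenderlich.com/api/v1"


def build_video_json_request(video_id, user_token, csrf_token=None):
    """Build (but do not send) the authenticated GET for a video's JSON."""
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Authorization": "Token %s" % user_token,
        "Origin": "https://www.raywenderlich.com",
        "Referer": "https://www.raywenderlich.com/",
        "X-Requested-With": "XMLHttpRequest",
    }
    if csrf_token is not None:
        # Unverified whether the API rejects requests without this header.
        headers["X-CSRF-Token"] = csrf_token
    url = "%s/videos/%s.json" % (BETAMAX_BASE_URL, video_id)
    return urllib.request.Request(url, headers=headers)
```

The request could then be sent with `urllib.request.urlopen(req)`. Within youtube-dl, the value given to --video-password should be readable as `self._downloader.params.get('videopassword')`, and the same headers could be passed to `self._download_json(url, video_id, headers=headers)`.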

  3. The current implementation uses the thumbnailUrl from the HTML meta tags to get the lessonId. Apparently, there is no other way to obtain that ID (3712 in the example above) except from the thumbnail, and some videos don't have a thumbnail.
<meta property="og:image" content="https://files.betamax.raywenderlich.com/attachments/videos/3712/f0a9b08b-3919-4b5a-aad7-40676ce0fa1f.png">
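The thumbnail-based lookup can be sketched as follows, assuming the og:image tag has the property attribute before content and the URL follows the `.../attachments/videos/<id>/...` pattern shown above. Both regular expressions and the helper name `video_id_from_og_image` are illustrative, not taken from raywenderlich.py.

```python
import re

# Assumption: attribute order and URL layout match the meta tag above.
_OG_IMAGE_RE = re.compile(r'<meta[^>]+property="og:image"[^>]+content="([^"]+)"')
_VIDEO_ID_RE = re.compile(r'/attachments/videos/(\d+)/')


def video_id_from_og_image(webpage):
    """Return the numeric video ID from the og:image URL, or None if unavailable."""
    og_match = _OG_IMAGE_RE.search(webpage)
    if og_match is None:
        # Videos without a thumbnail have no og:image, so no ID can be recovered.
        return None
    id_match = _VIDEO_ID_RE.search(og_match.group(1))
    return id_match.group(1) if id_match else None
```

This only restates the current approach; for thumbnail-less videos, the ID would still have to come from some other source.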
