[YouTube] "Unable to extract yt inital data"-error with specific URL #28871

Open
5 tasks done
Zirro opened this issue Apr 26, 2021 · 3 comments

Zirro commented Apr 26, 2021

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.04.26
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl 'https://www.youtube.com/c/OkonomiyakiVtuberSUBEspañol/videos' --verbose --write-pages --print-traffic
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.youtube.com/c/OkonomiyakiVtuberSUBEspañol/videos', '--verbose', '--write-pages', '--print-traffic']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.04.26
[debug] Python version 3.9.4 (CPython) - macOS-10.15.7-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube:tab] OkonomiyakiVtuberSUBEspañol: Downloading webpage
send: b'GET /c/OkonomiyakiVtuberSUBEspa%C3%B1ol/videos HTTP/1.1\r\nHost: www.youtube.com\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3684.1 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 302 Found\r\n'
header: Content-Type: application/binary
header: X-Content-Type-Options: nosniff
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Pragma: no-cache
header: Expires: Mon, 01 Jan 1990 00:00:00 GMT
header: Date: Mon, 26 Apr 2021 01:03:07 GMT
header: Location: https://consent.youtube.com/m?continue=https%3A%2F%2Fwww.youtube.com%2Fc%2FOkonomiyakiVtuberSUBEspa%25C3%25B1ol%2Fvideos&gl=RO&m=0&pc=yt&uxe=23983172&hl=en&src=1
header: X-Frame-Options: SAMEORIGIN
header: Strict-Transport-Security: max-age=31536000
header: permissions-policy: ch-ua-full-version=*, ch-ua-platform=*, ch-ua-platform-version=*, ch-ua-arch=*, ch-ua-model=*
header: P3P: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=en for more info."
header: Server: ESF
header: Content-Length: 0
header: X-XSS-Protection: 0
header: Set-Cookie: YSC=q49Md1VQC-c; Domain=.youtube.com; Path=/; Secure; HttpOnly; SameSite=none
header: Set-Cookie: CONSENT=PENDING+156; expires=Fri, 01-Jan-2038 00:00:00 GMT; path=/; domain=.youtube.com
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Connection: close
send: b'GET /m?continue=https%3A%2F%2Fwww.youtube.com%2Fc%2FOkonomiyakiVtuberSUBEspa%25C3%25B1ol%2Fvideos&gl=RO&m=0&pc=yt&uxe=23983172&hl=en&src=1 HTTP/1.1\r\nHost: consent.youtube.com\r\nCookie: CONSENT=PENDING+156; YSC=q49Md1VQC-c\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3684.1 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: text/html; charset=utf-8
header: Vary: Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site
header: x-ua-compatible: IE=edge
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Pragma: no-cache
header: Expires: Mon, 01 Jan 1990 00:00:00 GMT
header: Date: Mon, 26 Apr 2021 01:03:07 GMT
header: Content-Security-Policy: script-src 'report-sample' 'nonce-7J+R2Ji3qrmJQPONoJvzMQ' 'unsafe-inline';object-src 'none';base-uri 'self';report-uri /_/ConsentUi/cspreport;worker-src 'self'
header: Content-Security-Policy: script-src 'nonce-7J+R2Ji3qrmJQPONoJvzMQ' 'self' https://apis.google.com https://ssl.gstatic.com https://www.google.com https://www.gstatic.com https://www.google-analytics.com;report-uri /_/ConsentUi/cspreport
header: Cross-Origin-Resource-Policy: same-site
header: Content-Encoding: gzip
header: Server: ESF
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Connection: close
header: Transfer-Encoding: chunked
[youtube:tab] Saving request to OkonomiyakiVtuberSUBEspanol_https_-_consent.youtube.com_mcontinue=https%3A%2F%2Fwww.youtube.com%2Fc%2FOkonomiyakiVtuberSUBEspa%25C3%25B1ol%2Fvideos_gl=RO_m=0_pc=yt_uxe=23983172_hl=en_src=1.dump
ERROR: Unable to extract yt initial data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/HEAD-9452056/libexec/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 806, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/Cellar/youtube-dl/HEAD-9452056/libexec/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 827, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/Cellar/youtube-dl/HEAD-9452056/libexec/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/Cellar/youtube-dl/HEAD-9452056/libexec/lib/python3.9/site-packages/youtube_dl/extractor/youtube.py", line 2822, in _real_extract
    data = self._extract_yt_initial_data(item_id, webpage)
  File "/usr/local/Cellar/youtube-dl/HEAD-9452056/libexec/lib/python3.9/site-packages/youtube_dl/extractor/youtube.py", line 299, in _extract_yt_initial_data
    self._search_regex(
  File "/usr/local/Cellar/youtube-dl/HEAD-9452056/libexec/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 1012, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract yt initial data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

The URL https://www.youtube.com/c/OkonomiyakiVtuberSUBEspañol/videos results in the error "Unable to extract yt initial data". youtube-dl seems to get stuck on the EU privacy consent page instead of following the redirect. URLs leading to other channels don't show the same problem. Here's the output from --write-pages:

OkonomiyakiVtuberSUBEspanol_https_-_consent.youtube.com_mcontinue=https%3A%2F%2Fwww.youtube.com%2Fc%2FOkonomiyakiVtuberSUBEspa%25C3%25B1ol%2Fvideos_gl=RO_m=0_pc=yt_uxe=23983172_hl=en_src=1.log

Contributor

coletdjnz commented Apr 26, 2021

I think this has something to do with the cookies being cleared when the URL is escaped in YoutubeDLHandler:

youtube-dl/youtube_dl/utils.py

Lines 2602 to 2604 in 9452056

# Substitute URL if any change after escaping
if url != url_escaped:
    req = update_Request(req, url=url_escaped)
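To illustrate why only this channel takes the `url != url_escaped` branch, here is a minimal sketch of path escaping. The helper `escape_path` is hypothetical and built on the standard library; it is not youtube-dl's own escaping routine, only an approximation that reproduces the request path seen in the verbose log.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path(url):
    """Percent-encode non-ASCII characters in the URL path (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep '/' and existing '%' escapes untouched; encode everything else as UTF-8.
    return urlunsplit(parts._replace(path=quote(parts.path, safe='/%')))


non_ascii = 'https://www.youtube.com/c/OkonomiyakiVtuberSUBEspañol/videos'
ascii_only = 'https://www.youtube.com/c/Example/videos'

# The channel name with "ñ" changes when escaped, so the request gets rebuilt.
print(escape_path(non_ascii) != non_ascii)
# A pure-ASCII channel URL is unchanged, so the original request is kept.
print(escape_path(ascii_only) == ascii_only)
```

Under this assumption, a channel whose URL is already ASCII never reaches `update_Request` through this branch, which would match the report that other channels work.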

The relevant CONSENT cookie lives in unredirected_hdrs on the Request object, but those headers are not copied over to the new object in update_Request. I don't think you want to copy all of them over, though (I'm thinking of the Content-Length header).
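The mechanism can be reproduced with the standard library alone. The sketch below is an illustration, not youtube-dl code: the `make_cookie` helper and the example channel URL are assumptions made for the demonstration. It shows that `CookieJar.add_cookie_header` stores the `Cookie` header as an unredirected header, so a new request built only from `req.headers` no longer carries it.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain):
    """Build a minimal session cookie (hypothetical helper, placeholder attributes)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie('CONSENT', 'PENDING+156', '.youtube.com'))

req = urllib.request.Request(
    'https://www.youtube.com/c/Example/videos',
    headers={'Accept-Language': 'en-us,en;q=0.5'})
jar.add_cookie_header(req)

# The cookie ends up in unredirected_hdrs, not in the regular headers dict.
print(req.unredirected_hdrs)
print('Cookie' in req.headers)

# Rebuilding the request from req.headers alone silently drops the cookie.
rebuilt = urllib.request.Request(req.full_url, headers=req.headers.copy())
print(rebuilt.has_header('Cookie'))
```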

youtube-dl/youtube_dl/utils.py

Lines 3960 to 3977 in 9452056

def update_Request(req, url=None, data=None, headers={}, query={}):
    req_headers = req.headers.copy()
    req_headers.update(headers)
    req_data = data or req.data
    req_url = update_url_query(url or req.get_full_url(), query)
    req_get_method = req.get_method()
    if req_get_method == 'HEAD':
        req_type = HEADRequest
    elif req_get_method == 'PUT':
        req_type = PUTRequest
    else:
        req_type = compat_urllib_request.Request
    new_req = req_type(
        req_url, data=req_data, headers=req_headers,
        origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
    if hasattr(req, 'timeout'):
        new_req.timeout = req.timeout
    return new_req
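One possible direction, sketched here against plain `urllib.request` rather than youtube-dl's compat layer, is to carry the unredirected headers over while skipping the ones that should be recomputed. The function `rebuild_request` and the `SKIP_UNREDIRECTED` set are hypothetical names for this illustration; this is not the patch that landed in yt-dlp, and the choice to skip only Content-Length is an assumption taken from the concern raised above.

```python
import urllib.request

# Assumption: only Content-Length is excluded, per the concern in this thread.
# The set is compared in lower case and can be extended if other headers matter.
SKIP_UNREDIRECTED = {'content-length'}


def rebuild_request(req, url):
    """Return a copy of req pointing at url, keeping unredirected headers."""
    new_req = urllib.request.Request(
        url, data=req.data, headers=req.headers.copy(),
        origin_req_host=req.origin_req_host, unverifiable=req.unverifiable,
        method=req.get_method())
    for name, value in req.unredirected_hdrs.items():
        if name.lower() not in SKIP_UNREDIRECTED:
            new_req.add_unredirected_header(name, value)
    if hasattr(req, 'timeout'):
        new_req.timeout = req.timeout
    return new_req


original = urllib.request.Request('https://www.youtube.com/c/Espa\u00f1ol/videos')
original.add_unredirected_header('Cookie', 'CONSENT=PENDING+156')
original.add_unredirected_header('Content-Length', '0')

escaped = rebuild_request(original, 'https://www.youtube.com/c/Espa%C3%B1ol/videos')
print(escaped.get_header('Cookie'))
print(escaped.has_header('Content-length'))
```

Copying per header instead of assigning the whole `unredirected_hdrs` dict keeps the filter explicit, at the cost of having to maintain the skip list.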

Possibly related: #28705

Contributor

0xced commented Dec 12, 2021

I found that this issue is fixed in the active yt-dlp fork.

Contributor

pukkandan commented

Relevant patch, if anyone is interested: yt-dlp/yt-dlp@d255823
