[tiktok:user] Failed to parse JSON #3776

Open
7 tasks done
rsamay opened this issue May 18, 2022 · 135 comments · May be fixed by #4996 or #9661
Labels
site-bug Issue with a specific website

Comments

@rsamay

rsamay commented May 18, 2022

Checklist

Region

USA

Description

Starting earlier today, tiktok user pages started timing out. Downloading an individual video still works, but user pages don't.

For example (in the log below), yt-dlp.sh "https://www.tiktok.com/@derekbrunsonmma" -vU times out, but yt-dlp.sh "https://www.tiktok.com/@derekbrunsonmma/video/7098932076711284014" -vU works fine.

Verbose log

$ ~/yt-dlp/yt-dlp.sh "https://www.tiktok.com/@derekbrunsonmma" -vU
[debug] Command-line config: ['https://www.tiktok.com/@derekbrunsonmma', '-vU']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.05.18 [b14d52355] (source)
[debug] Lazy loading extractors is disabled
[debug] Plugins: ['SamplePluginIE', 'SamplePluginPP']
[debug] Git HEAD: 926ccc84e
[debug] Python version 3.8.10 (CPython 64bit) - Linux-5.4.0-1029-aws-x86_64-with-glibc2.29
[debug] Checking exe version: ffprobe -bsfs
[debug] Checking exe version: ffmpeg -bsfs
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Optional libraries: Cryptodome-3.13.0, certifi-2019.11.28, mutagen-1.45.1, secretstorage-2.3.1, sqlite3-2.6.0, websockets-10.1
[debug] Proxy map: {}
Latest version: 2022.05.18, Current version: 2022.05.18
yt-dlp is up to date (2022.05.18)
[debug] [tiktok:user] Extracting URL: https://www.tiktok.com/@derekbrunsonmma
[tiktok:user] derekbrunsonmma: Downloading webpage
ERROR: [tiktok:user] derekbrunsonmma: Unable to download webpage: The read operation timed out (caused by timeout('The read operation timed out')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 642, in extract
    ie_result = self._real_extract(url)
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/tiktok.py", line 629, in _real_extract
    webpage = self._download_webpage(url, user_name, headers={
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 933, in _download_webpage
    res = self._download_webpage_handle(
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 801, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 786, in _request_webpage
    raise ExtractorError(errmsg, cause=err)

  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 768, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/ubuntu/yt-dlp/yt_dlp/YoutubeDL.py", line 3596, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/ubuntu/yt-dlp/yt_dlp/utils.py", line 1419, in https_open
    return self.do_open(
  File "/usr/lib/python3.8/urllib/request.py", line 1358, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
rsamay added the site-bug and triage labels May 18, 2022
@rsamay
Author

rsamay commented May 18, 2022

I should add that I've tried this from multiple machines with different IPs, so it's not that I've personally been blocked.

@werid

werid commented May 18, 2022

Can confirm this report. Additionally, vm.tiktok.com URLs also time out.

@cgentry1972

This comment was marked as duplicate.

@ruizlenato

This comment was marked as resolved.

@dirkf
Contributor

dirkf commented May 18, 2022

Try --add-header 'user-agent:Mozilla/5.0' (aka --user-agent 'Mozilla/5.0').

Actual code changes are needed for the shortcut (and profile) URLs as the redirect extractor may not see the custom UA. For example, try this yt-dl PR, which currently finds the first 30 videos, and can download at least the first 10kB of at least the first item, with the problem URL, but says

WARNING: More videos are available but the current extractor doesn't know how to find them

@ruizlenato

ruizlenato commented May 18, 2022

> Try --add-header 'user-agent:Mozilla/5.0' (aka --user-agent 'Mozilla/5.0').
>
> Actual code changes are needed for the shortcut URLs as the redirect extractor may not see the custom UA. For example, try this yt-dl PR, which currently finds the first 30 videos, and can download at least the first 10kB of at least the first item, with the problem URL, but says
>
> WARNING: More videos are available but the current extractor doesn't know how to find them

❯ yt-dlp https://vm.tiktok.com/ZML3GMY6F --add-header 'user-agent:Mozilla/5.0' -vU
[debug] Command-line config: ['https://vm.tiktok.com/ZML3GMY6F', '--add-header', 'user-agent:Mozilla/5.0', '-vU']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2022.04.08 [7884ade65]
[debug] Python version 3.10.4 (CPython 64bit) - Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: avconv -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] Checking exe version: avprobe -bsfs
[debug] exe versions: none
[debug] Optional libraries: brotli, certifi, Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.05.18, Current version: 2022.04.08
ERROR: It looks like you installed yt-dlp with a package manager, pip or setup.py; Use that to update
[debug] [vm.tiktok] Extracting URL: https://vm.tiktok.com/ZML3GMY6F
[vm.tiktok] ZML3GMY6F: Downloading webpage
ERROR: [vm.tiktok] ZML3GMY6F: Unable to download webpage: The read operation timed out (caused by TimeoutError('The read operation timed out')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/home/renato/.local/lib/python3.10/site-packages/yt_dlp/extractor/common.py", line 641, in extract
    ie_result = self._real_extract(url)
  File "/home/renato/.local/lib/python3.10/site-packages/yt_dlp/extractor/tiktok.py", line 894, in _real_extract
    new_url = self._request_webpage(
  File "/home/renato/.local/lib/python3.10/site-packages/yt_dlp/extractor/common.py", line 785, in _request_webpage
    raise ExtractorError(errmsg, cause=err)

  File "/home/renato/.local/lib/python3.10/site-packages/yt_dlp/extractor/common.py", line 767, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/renato/.local/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 3601, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 557, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 749, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/renato/.local/lib/python3.10/site-packages/yt_dlp/utils.py", line 1543, in https_open
    return self.do_open(functools.partial(
  File "/usr/lib/python3.10/urllib/request.py", line 1352, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.10/ssl.py", line 1273, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.10/ssl.py", line 1129, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

Same problem :/

@dirkf
Contributor

dirkf commented May 18, 2022

Indeed, that is a shortcut URL and so the UA command-line option doesn't solve the problem.

@ruizlenato

This comment was marked as resolved.

@paulescobar

This comment was marked as resolved.

@Lrv-dev

This comment was marked as resolved.

@kagutaba256

This comment was marked as duplicate.

@Fahad-BA

This comment was marked as duplicate.

@xavery

xavery commented May 18, 2022

Workaround which just uses curl to get the redirect target and then passes it down to yt-dlp for the actual download :

curl -o /dev/null --silent https://vm.tiktok.com/foobar -w '%{redirect_url}' | yt-dlp -a -

Crude, but worked for me.
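
The same redirect lookup can be sketched in Python using only the standard library. This illustrates the idea behind the curl one-liner and is not yt-dlp code: the helper name get_redirect_target and the default User-Agent value are assumptions made for the example.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def get_redirect_target(url, user_agent='Mozilla/5.0', timeout=10):
    """Return the absolute redirect target of url, or None if it does not redirect."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    try:
        with opener.open(req, timeout=timeout):
            return None  # a 2xx answer means there was no redirect
    except urllib.error.HTTPError as err:
        location = err.headers.get('Location')
        if err.code in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the request URL
            return urljoin(url, location)
        raise
```

The returned URL can then be handed to yt-dlp in place of the vm.tiktok.com link.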

@rsamay
Author

rsamay commented May 18, 2022

I think a lot of the comments in this thread are related to a different bug than I reported above. This has nothing to do with the vm.tiktok.com domain, [Edit: it might be related, but this fix for that issue does not fix the issue reported in the bug] and the proposed solution does not work for the bug as it is reported.

$ ~/yt-dlp/yt-dlp.sh "https://www.tiktok.com/@derekbrunsonmma" -vU --add-header 'user-agent:Mozilla/5.0'
[debug] Command-line config: ['https://www.tiktok.com/@derekbrunsonmma', '-vU', '--add-header', 'user-agent:Mozilla/5.0']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.05.18 [b14d52355] (source)
[debug] Lazy loading extractors is disabled
[debug] Plugins: ['SamplePluginIE', 'SamplePluginPP']
[debug] Git HEAD: 926ccc84e
[debug] Python version 3.8.10 (CPython 64bit) - Linux-5.4.0-1029-aws-x86_64-with-glibc2.29
[debug] Checking exe version: ffprobe -bsfs
[debug] Checking exe version: ffmpeg -bsfs
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Optional libraries: Cryptodome-3.13.0, certifi-2019.11.28, mutagen-1.45.1, secretstorage-2.3.1, sqlite3-2.6.0, websockets-10.1
[debug] Proxy map: {}
Latest version: 2022.05.18, Current version: 2022.05.18
yt-dlp is up to date (2022.05.18)
[debug] [tiktok:user] Extracting URL: https://www.tiktok.com/@derekbrunsonmma
[tiktok:user] derekbrunsonmma: Downloading webpage
ERROR: [tiktok:user] derekbrunsonmma: Unable to download webpage: The read operation timed out (caused by timeout('The read operation timed out')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 642, in extract
    ie_result = self._real_extract(url)
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/tiktok.py", line 629, in _real_extract
    webpage = self._download_webpage(url, user_name, headers={
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 933, in _download_webpage
    res = self._download_webpage_handle(
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 801, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 786, in _request_webpage
    raise ExtractorError(errmsg, cause=err)

  File "/home/ubuntu/yt-dlp/yt_dlp/extractor/common.py", line 768, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/ubuntu/yt-dlp/yt_dlp/YoutubeDL.py", line 3596, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/ubuntu/yt-dlp/yt_dlp/utils.py", line 1419, in https_open
    return self.do_open(
  File "/usr/lib/python3.8/urllib/request.py", line 1358, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

@dirkf
Contributor

dirkf commented May 18, 2022

Your original problem URL is a profile page. Code changes are needed to fix this for yt-dlp.

The yt-dlp extractor bypasses the site's UA block for individual videos by not initially loading the video page, instead using a separate metadata URL if available.

@coletdjnz
Member

coletdjnz commented May 18, 2022

> Try --add-header 'user-agent:Mozilla/5.0' (aka --user-agent 'Mozilla/5.0').
>
> Actual code changes are needed for the shortcut (and profile) URLs as the redirect extractor may not see the custom UA. For example, try this yt-dl PR, which currently finds the first 30 videos, and can download at least the first 10kB of at least the first item, with the problem URL, but says
>
> WARNING: More videos are available but the current extractor doesn't know how to find them

--add-header or --user-agent won't override the user-agent that is set in the request by the extractor (which seems to be the one causing issues)

(we should probably look into changing that behaviour too)
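
A simplified model of why the command-line header loses is sketched below. It is not yt-dlp's actual code: the merge_headers helper and the extractor User-Agent value are hypothetical, and the only assumption carried over from the comment above is that headers set by the extractor are applied after the global ones.

```python
def merge_headers(*layers):
    """Merge header dicts left to right; later layers win.

    Header names are case-insensitive, so they are normalised
    ('user-agent' becomes 'User-Agent') before merging.
    """
    merged = {}
    for layer in layers:
        for name, value in layer.items():
            merged[name.title()] = value
    return merged


# Global headers, e.g. from --add-header 'user-agent:Mozilla/5.0'
cli_headers = {'user-agent': 'Mozilla/5.0'}
# Hypothetical value hard-coded by the extractor for its own request
extractor_headers = {'User-Agent': 'hypothetical-extractor-UA/1.0'}

effective = merge_headers(cli_headers, extractor_headers)
print(effective['User-Agent'])  # the extractor's value, not the CLI one
```

Swapping the argument order would give the command-line value priority, which is roughly the behaviour change suggested above.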

coletdjnz removed the triage label May 18, 2022
@kjerk

kjerk commented May 19, 2022

Did a little bit of triage on this, to add a bit of clarity to the two cascading issues:

Reverting tiktok.py entirely to the upstream branch version fixes the first and second problems, because it goes back to extracting the URLs from the webpage request instead of using the broken metadata API. However, that reintroduces the limitation mentioned earlier: only the first ~30 videos are listed because there's no pagination.

@coletdjnz
Member

coletdjnz commented May 19, 2022

> and then disable the hardcoded "Connection"="close" header with a flag

We can't disable this because urllib doesn't support http keep alive / persistent connections. For us that requires #3668 (which this tiktok issue still exists on even with persistent connections)

edit: unless you are meaning certain user agents only work if there is no connection: close.

@FNAFDEV

This comment was marked as duplicate.

@kjerk

kjerk commented May 19, 2022

> We can't disable this because urllib doesn't support http keep alive / persistent connections. For us that requires #3668 (which this tiktok issue still exists on even with persistent connections)
>
> edit: unless you are meaning certain user agents only work if there is no connection: close.

Yeah, it's a little counterintuitive, but I tested this out (hoisting the do_open() method and then disabling that header) in both Python 3 and Postman directly, and in either case it worked fine; the connection was still closed. Some combination of that header plus a user agent caused the response to hang. I suspect the Akamai CDN servers are behaving badly or something. Pretty peculiar.
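
For anyone who wants to repeat this kind of experiment, here is a minimal sketch. It is not the code used for the test described above. It sends one request through http.client, which, unlike do_open() in urllib.request, does not add a Connection: close header on its own. The helper name probe and the timeout value are assumptions.

```python
import http.client
import socket
from urllib.parse import urlsplit


def probe(url, headers, timeout=10):
    """Send a single GET and return the HTTP status, or 'timeout' if it hangs.

    http.client adds only Host and Accept-Encoding by itself, so the
    Connection header is sent only if the caller puts it in headers.
    """
    parts = urlsplit(url)
    if parts.scheme == 'https':
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    try:
        conn.request('GET', target, headers=headers)
        return conn.getresponse().status
    except socket.timeout:
        return 'timeout'
    finally:
        conn.close()
```

Calling probe() twice with the same User-Agent, once with and once without 'Connection': 'close' in the headers, shows whether that header alone changes the outcome.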

@pukkandan
Member

@dirkf

> For example, try this ytdl-org/youtube-dl#30479, which currently finds the first 30 videos, and can download at least the first 10kB of at least the first item, with the problem URL

This does not appear to work. The first issue is that a 403 is received in _real_initialize:

❯ py -2.7 -m youtube_dl "https://www.tiktok.com/@derekbrunsonmma" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.tiktok.com/@derekbrunsonmma', u'-v']
[debug] Encodings: locale cp65001, fs mbcs, out cp65001, pref cp65001
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2f65e205e
[debug] Python version 2.7.18 (CPython) - Windows-10-10.0.22000
[debug] exe versions: ffmpeg N-106550-g072101bd52-20220410, ffprobe N-106624-g391ce570c8-20220415, phantomjs 2.1.1
[debug] Proxy map: {}
[tiktok:user] Setting up session
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "youtube_dl\extractor\common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "youtube_dl\YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "D:\Programs\scoop\apps\python27\current\lib\urllib2.py", line 435, in open
    response = meth(req, response)
  File "D:\Programs\scoop\apps\python27\current\lib\urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Programs\scoop\apps\python27\current\lib\urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "D:\Programs\scoop\apps\python27\current\lib\urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "D:\Programs\scoop\apps\python27\current\lib\urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

This can be fixed by adding expected_status:

diff --git a/youtube_dl/extractor/tiktok.py b/youtube_dl/extractor/tiktok.py
index 49df0844b..1aec9386c 100644
--- a/youtube_dl/extractor/tiktok.py
+++ b/youtube_dl/extractor/tiktok.py
@@ -145,7 +145,7 @@ class TikTokIE(TikTokBaseIE):
     def _real_initialize(self):
         # Setup session (will set necessary cookies)
         self._request_webpage(
-            'https://www.tiktok.com/', None, note='Setting up session')
+            'https://www.tiktok.com/', None, note='Setting up session', expected_status=403)

     def _real_extract(self, url):
         m = re.match(self._VALID_URL, url).groupdict()
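
As a rough illustration of what an expected_status-style option does, the standard-library sketch below returns the error response instead of raising when the status code is one the caller has said it expects. The function name open_allowing is an assumption for this example and it is not the youtube-dl implementation.

```python
import urllib.error
import urllib.request


def open_allowing(url, expected_status=(), timeout=10):
    """Open url, tolerating HTTP error codes listed in expected_status.

    expected_status may be a single int or an iterable of ints.
    """
    if isinstance(expected_status, int):
        expected_status = (expected_status,)
    try:
        return urllib.request.urlopen(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code in expected_status:
            # HTTPError doubles as a response object (code, headers, read()),
            # so cookies sent along with the 403 are still available
            return err
        raise
```

With expected_status=403 the session-setup request no longer aborts extraction, which matches the effect of the one-line change in the diff.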

But the extraction still fails:

❯ py -2.7 -m youtube_dl "https://www.tiktok.com/@derekbrunsonmma" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.tiktok.com/@derekbrunsonmma', u'-v']
[debug] Encodings: locale cp65001, fs mbcs, out cp65001, pref cp65001
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2f65e205e
[debug] Python version 2.7.18 (CPython) - Windows-10-10.0.22000
[debug] exe versions: ffmpeg N-106550-g072101bd52-20220410, ffprobe N-106624-g391ce570c8-20220415, phantomjs 2.1.1
[debug] Proxy map: {}
[tiktok:user] Setting up session
[tiktok:user] derekbrunsonmma: Downloading webpage

The downloaded webpage is a captcha page and does not contain any SIGI_STATE data. Here's the dump:

<!DOCTYPE html>
<html>
    <Head>
        <meta charset="utf-8">
        <title>TikTok</title>
        <link rel="shortcut icon" type="image/x-icon" id="favicon">
        <meta name="screen-orientation" content="portrait">
        <meta name="x5-orientation" content="portrait">
        <meta name="format-detection" content="telephone=no">
        <meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no, minimum-scale=1, maximum-scale=1, minimal-ui, viewport-fit=cover">
        <meta name="apple-mobile-web-app-capable" content="yes">
        <meta name="applicable-device" content="pc,mobile"/>
        <link rel="dns-prefetch" href="https://sf16-scmcdn-va.ibytedtos.com" />
        <script async src="https://sf16-scmcdn-va.ibytedtos.com/goofy/log-sdk/collect/collect-tcpy.js"></script>
        <script>
            const option = {"title":"tiktok-verify-page","iid":"0","did":"0","app_name":"tiktok","aid":1284,"favicon":"https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/logo.png","mobileIcons":["https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_m.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_m2x.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_m3x.png"],"icons":["https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_w.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_w2x.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_w3x.png"],"region":"va","type":"slide","verifyConfig":{"code":10000,"type":"verify","subtype":"slide","fp":"verify_ba93a25726de3d5ce7d74fa8c489a339","region":"va","detail":"BTNI5LBD0rUnU4MesPjv-u4pTx360fUy*Ged9-rq9vxo0-xG4grMQqu2Vc7yR3A6gvNUN9BKCAa5rKxl1VYVslhMosF8xDDmyMUfh6jHhQ3NxS*IcPKHjuo29sny6nvZ66SS0smUrX0ObNqGq3fYOpP6L5BPsZi7HlnUrJTEuLvHAD3Xhs4qMyrk99XtDQJKq57Oj3Fv12qWd64e1SjJ-NvEiSU7T1s2k5pBZVCFj3ZqMpzIlV3Fo9rSqPCuUnxFIVaxo87ntdgzANL-wawL*VwrhoGAECteZEIpdsUT81PnrbV6--T7nnUI2Bj9nVkfvyIQW0OhPVcjINgeSCukUJZi5tNLF0e2SxWuOgLwp-IaguU*7NMwzZCf6rIXWVxacb6v3pH3A1n9WNnxUsoiGFU."},"lang":"en"};

            if(!option.region) {
                option.region = 'va';
            }
            var verifyTime = new Date().getTime();
            (function(win, export_obj) {
                win['TeaAnalyticsObject'] = export_obj;
                if (!win[export_obj]) {
                    function _collect() {
                        _collect.q.push(arguments);
                    }
                    _collect.q = _collect.q || [];
                    win[export_obj] = _collect;            
                }
                win[export_obj].l = +new Date();
            })(window, 'collectEvent');

            window.collectEvent('page.init', {
                app_id: option.region === 'cn' ? 2018 : 2740,
                channel: option.region === 'boe' ? 'cn' : option.region,
                log: true,
            });

            window.collectEvent('page.start');
            window.collectEvent('page.verify_page_load', {
                aid: option.aid,
                product_host: location.host,
                product_path: location.pathname,
                time: new Date().getTime(),
                is_success: 0,
                duration: new Date().getTime() - verifyTime
            })
            window.onbeforeunload = function () {
                window.collectEvent(document.readyState === 'complete' ? 'page.verify_page_close' : 'page.verify_page_load_close', {
                    product_host: location.host,
                    product_path: location.pathname,
                    aid: option.aid,
                    time: new Date().getTime(),
                    fp: (document.cookie.match(/s_v_web_id=(\w+)/) || [])[1],
                    is_success: Number(!!window.verify_is_success)
                })
            }
        </script>
        <script src="https://lf16-cdn-tos.tiktokcdn-us.com/obj/static-tx/sec_sdk_build/3.6.0/captcha/index.js"></script>
        <script async src="https://sf16-muse-va.ibytedtos.com/obj/eden-va2/fviylclsjeh7bogubfbd/tt-webapp/starling.browser.js"></script>
    </Head>
    <body>
        <div class="content">
            <div class="app_icon"></div>
            <div class="verify-wrap">
                <div id="verify-ele"></div>
            </div>
            <p class="page-desc" id="verifyEle"></p>
        </div>
    </body>
    <style>
        html, body {
            min-height: 500px;
        }
        body {
            background: #EDF0F5;
            display: flex;
        }
        .content { 
            width: 300px;
            margin: auto;
        }

        .verify-wrap {
            min-height: 306px;
        }

        @media (min-width: 1281px) {
            .content {
                width: 380px;
            }

            .verify-wrap {
                min-height: 386px;
            }
        }
        
        .captcha_verify_container {
            border: 1px solid #E8E8E8;
        }

        .captcha_verify_container #verify-bar-close {
            display: none;
        }

        .app_icon img {
            margin-bottom: 20px;
        }

        .page-desc{
            margin-top: 32px;
            font-family: PingFangSC-Medium;
            font-size: 12px;
            color: #505050;
            line-height: 19px;
        }

        .page-desc span {
            font-family: 'PingFangSC-Regular';
            color: #505050;

        }
    </style>
    <script>
        const pageDescKey = 'user_verify_page_description';
        const ua = navigator.userAgent;
        const isMobile = /Android|webOS|iPhone|iPod|BlackBerry|Windows Phone|iPad/i.test(ua);
        const hosts = {
            cn: '//verify.snssdk.com',
            boe: '//boe-verify.snssdk.com',
            sg: '//verify-sg.byteoversea.com',
            va: '//verification-va.byteoversea.com',
            ttp: '//verification.us.tiktok.com'
        };

        window.TTGCaptcha.init({
          commonOptions: {
            aid: option.aid || 0,
            did: '0',
            iid: '0',
          },
          captchaOptions: {
            ele: 'verify-ele',
            host: hosts.ttp,
            lang: option.lang.match(/zh/) ? 'zh-Hant' : option.lang,
            region: option.region,
            app_name: option.app_name || '',
            hideCloseBtn: true,
            successCb: successCb,
            feedbackSubmitCb: feedbackSubmitCb,
            autoClose: false
          },
        });
       
        window.onload = function () {
            window.verify_is_success = true;
            document.querySelector('#favicon').href = option.favicon;
            document.title = option.title || 'security verification';
            const icon = isMobile ? option.mobileIcons : option.icons;
            if (option.icons || true) {
                let img = document.createElement('img');
                img.srcset = `${icon[0]} 1x,
                    ${icon[1]} 2x,
                    ${icon[2]} 3x`;
                img.alt = 'app logo';
                img.src = icon[0];
                document.querySelector('.app_icon').appendChild(img);
            }

            window.collectEvent(location.href === document.referrer ? 'page.verify_page_refresh' : 'page.verify_page_init', {
                duration: new Date().getTime() - verifyTime,
                aid: option.aid,
                region: option.region,
                fp: (document.cookie.match(/s_v_web_id=(\w+)/) || [])[1],
                is_success: 1,
                product_host: location.host,
                product_path: location.pathname,
            })

            const starling = new Starling({
                api_key: '5dc26cf008d511e9b571e1bc0c9e23b5',
                namespace: 'Captcha',
                locale: option.lang,
                zone: (option.region || 'SG').toUpperCase(),
                test: false,
                fallbackLang: ['en'],
            });
            starling.load((texts) => {
                let desc = document.querySelector('#verifyEle');
                desc.innerText = texts[pageDescKey];
            });

            window.TTGCaptcha.render({
                verify_data: JSON.stringify({
                  subtype: option.type,
                  ...(option.verifyConfig || {}),
                })
            });

            // tt embed iframe
            if (window.self !== window.top && window.name.indexOf('__tt_embed__') !== -1) {
                const resizeData = JSON.stringify({
                    signalSource: window.name,
                    height: 600,
                });
                window.parent.postMessage(resizeData, '*');
            }
        }

        function successCb() {
            location.reload();
        }

        function feedbackSubmitCb() {
            window.TTGCaptcha.render({
                verify_data: JSON.stringify({
                  subtype: option.type,
                  ...(option.verifyConfig || {}),
                })
            });
        }
    </script>
</html>
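
A check along these lines could tell the two cases apart before any attempt to parse the page. It is a sketch only: the function name is made up, and the marker strings are taken from the dump above (tiktok-verify-page, TTGCaptcha) and from the SIGI_STATE name mentioned earlier, on the assumption that they stay stable.

```python
def classify_tiktok_page(html):
    """Roughly classify a downloaded TikTok page by the markers it contains."""
    if 'SIGI_STATE' in html:
        # The state blob that the extractor expects to find is present
        return 'content'
    if 'tiktok-verify-page' in html or 'TTGCaptcha' in html:
        # Markers seen in the verification page dumped above
        return 'captcha'
    return 'unknown'


sample = '<script>const option = {"title":"tiktok-verify-page"};</script>'
print(classify_tiktok_page(sample))
```

Reporting a captcha page explicitly would give a clearer error than a generic parse failure.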

I am using a VPN (TikTok is banned here), so that could be the issue. But I can access the webpage in a browser over the same VPN (it does not ask for a captcha). Passing the cookies to youtube-dl gives the same result.

Can someone confirm whether the VPN is the issue? I.e., does the PR work for you?

@pukkandan
Member

> and then disable the hardcoded "Connection"="close" header with a flag, a ye-olde issue from urllib: python/cpython@137fd3d/Lib/urllib/request.py#L1333
> After doing that other user-agents and headers work in testing also.

@kjerk Can you share the code you used for testing this?

@dirkf
Contributor

dirkf commented May 20, 2022

There must be some discrimination at TT. Here:

$ git checkout df-wranai-tiktok-patch
Switched to branch 'df-wranai-tiktok-patch'
$ python -m youtube_dl -F -v 'https://www.tiktok.com/@derekbrunsonmma'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-F', u'-v', u'https://www.tiktok.com/@derekbrunsonmma']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2f65e205e
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[tiktok:user] Setting up session
[tiktok:user] derekbrunsonmma: Downloading webpage
WARNING: More videos are available but the current extractor doesn't know how to find them
[download] Downloading playlist: 6805667245747209222
[tiktok:user] playlist 6805667245747209222: Collected 30 video ids (downloading 30 of them)
[download] Downloading video 1 of 30
[tiktok] Setting up session
[tiktok] 7099799828678380842: Downloading webpage
[info] Available formats for 7099799828678380842:
format code  extension  resolution note
0            mp4        576x1024   
[download] Downloading video 2 of 30
[tiktok] 7099586219537108270: Downloading webpage
[info] Available formats for 7099586219537108270:
format code  extension  resolution note
0            mp4        576x828    
[download] Downloading video 3 of 30
[tiktok] 7099505233373465898: Downloading webpage
...

And Py3.9 (same for 3.5):

$ python3.9 -m youtube_dl --flat-playlist -j -v 'https://www.tiktok.com/@derekbrunsonmma'
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--flat-playlist', '-j', '-v', 'https://www.tiktok.com/@derekbrunsonmma']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2f65e205e
[debug] Python version 3.9.12 (CPython) - Linux-4.4.0-210-generic-i686-with-glibc2.23
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
WARNING: More videos are available but the current extractor doesn't know how to find them
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099799828678380842", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099586219537108270", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099505233373465898", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099455914255633710", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099277655979248942", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099258188624579886", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099227036148976939", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099211696870411566", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099172441154473262", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099067570380197162", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098932076711284014", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098912942279839022", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098901483139173675", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098868510356557098", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098848419925527854", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098731859403623723", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098553426669243691", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098530750685007150", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098503324227423534", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098493495358410030", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098164938190998826", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098161970817404203", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098009011781520686", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097779473386327338", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097699241228766510", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097660895177952558", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097444267131538734", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097056072573259054", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097018259224022314", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7096969440977259818", "ie_key": "TikTok"}
$

If the _real_initialize() fails, the later requests will behave as you saw. Fetching the home page should set some state (cookies IIRC) that let the later requests fetch the actual pages instead of being sent to captcha hell.
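
The session priming described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the youtube-dl branch or from yt-dlp: the helper names `build_session` and `fetch_with_primed_session` are hypothetical, and using `https://www.tiktok.com/` as the home page URL is an assumption.

```python
import http.cookiejar
import urllib.request

# Assumption: the site's home page is what sets the session cookies.
HOME_URL = "https://www.tiktok.com/"


def build_session(user_agent):
    """Return an (opener, cookie jar) pair that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Send the same User-Agent on every request, since cookies may be tied to it.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_primed_session(opener, page_url, home_url=HOME_URL, timeout=10):
    """Fetch the home page first (only to collect cookies), then the real page."""
    with opener.open(home_url, timeout=timeout) as resp:
        resp.read()  # body is discarded; the cookie jar now holds any session state
    with opener.open(page_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")
```

If the first request fails or sets no cookies, the second request is made without session state, which would match the captcha responses reported earlier in the thread.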

@pukkandan
Member

Hm... that would make it very difficult for me to do anything about this issue. Someone would have to make a PR directly to yt-dlp.

@dirkf
Contributor

dirkf commented May 20, 2022

... Passing the cookies to youtube-dl gives the same result

They could be tied to the UA.

@dirkf
Contributor

dirkf commented May 20, 2022

If yt-dlp is tweaked to send the vanilla UA instead of the FB UA in the code, or indeed whatever yt-dlp sends by default, it fetches derek's page OK for me. But the user list API call fails (empty JSON).

@pukkandan
Member

This is the code I was working with - master...pukkandan:features/tiktok. It should bring us on par with ytdl-org/youtube-dl#30479 (for #3551 as well), but I can't fully test it because of #3776 (comment) 😞

@redraskal
Contributor

Successfully downloaded a whole user account -- nice work! If merged, I think it'd be a good idea to make a note of the pyppeteer dependency, and its possible need to grab a 150 MiB Chromium binary to do the extraction, in the README.md.

The one lingering question I have (and please forgive me if I'm missing something obvious here) is how to specify format for the whole-user download -- for example, how to get 720p all the time. It looks like the format ID changes for each individual video, but there might be a workaround I'm missing here.

The chromium binary should automatically install when using the tiktok extractor

@julian45

Successfully downloaded a whole user account -- nice work! If merged, I think it'd be a good idea to make a note of the pyppeteer dependency, and its possible need to grab a 150 MiB Chromium binary to do the extraction, in the README.md.

The one lingering question I have (and please forgive me if I'm missing something obvious here) is how to specify format for the whole-user download -- for example, how to get 720p all the time. It looks like the format ID changes for each individual video, but there might be a workaround I'm missing here.

The chromium binary should automatically install when using the tiktok extractor

Right, it did for me. To clarify, I meant I think it'd be a good idea to at least give users a heads up that using this particular extractor downloads a 150 MiB binary to your system, since some users' devices may not have much storage space available, that single binary is roughly 3x the size of the entire yt-dlp repo, some users might have internet/data plans with a quota, etc. Maybe this is just overly cautious on my part, but I'm not sure.
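
On the format question quoted above: one possible workaround, sketched here under assumptions, is to select by height rather than by format ID, since yt-dlp's format selection syntax accepts filters such as `[height<=720]`. The helper name `height_capped_opts` is hypothetical, and whether every TikTok entry exposes a usable `height` has not been verified in this thread.

```python
def height_capped_opts(max_height):
    """Build yt-dlp options that pick the best format at or below max_height."""
    return {
        # Filter on height instead of a format ID, since IDs differ per video.
        "format": f"best[height<={max_height}]",
    }


# Usage (requires yt-dlp to be installed; kept as a comment so this block
# runs without it):
#   import yt_dlp
#   with yt_dlp.YoutubeDL(height_capped_opts(720)) as ydl:
#       ydl.download(["https://www.tiktok.com/@derekbrunsonmma"])
```

The same selector can be passed on the command line with `-f "best[height<=720]"`.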

@dirkf
Contributor

dirkf commented Sep 23, 2022

Users already have a web browser that's at least 10x too big (cf. Netscape 9 for Windows, <6MB download, 2007) and only yt-dlp users who are dedicated TT/DY fiends are likely to want another one for this specific application. Even Node.js is a small fraction of 150MB. But perhaps this is an olde worlde concern when users may be regularly downloading GBs that are immediately discarded.

I wondered if there was a lower overhead solution using https://chromedevtools.github.io/devtools-protocol/ but apparently that's what pyppeteer uses anyway.

Accepting that this pyppeteer/chromium solution effectively resolves the problem in the issue and is the only successful approach found despite many other attempts, it seems like a setback for the yt-dl[p] approach as well as for the open Web in general. We can't back-port such a solution to yt-dl, whose use cases include overcoming deficiencies of, or actual lack of, a platform web browser.

@redraskal
Contributor

redraskal commented Sep 23, 2022

Users already have a web browser that's at least 10x too big (cf. Netscape 9 for Windows, <6MB download, 2007) and only yt-dlp users who are dedicated TT/DY fiends are likely to want another one for this specific application. Even Node.js is a small fraction of 150MB. But perhaps this is an olde worlde concern when users may be regularly downloading GBs that are immediately discarded.

I wondered if there was a lower overhead solution using https://chromedevtools.github.io/devtools-protocol/ but apparently that's what pyppeteer uses anyway.

Accepting that this pyppeteer/chromium solution effectively resolves the problem in the issue and is the only successful approach found despite many other attempts, it seems like a setback for the yt-dl[p] approach as well as for the open Web in general. We can't back-port such a solution to yt-dl, whose use cases include overcoming deficiencies of, or actual lack of, a platform web browser.

Using pyppeteer to sign requests is far more realistic. If we reverse-engineered the signing method instead, a single change to it could set the extractor back by weeks. pyppeteer is only used for the API requests that need to be signed (not for bloated web pages). The requests made through pyppeteer are actually faster than the requests for video information.

TikTok just doesn't want its content openly accessible.

We could rewrite the TLS layer (to emulate browsers) and reverse-engineer the TikTok signer, but that would take a long time and doesn't seem worth it when we could just add pyppeteer for this one use case.

@paulescobar

Hey guys, is RedRaskal's solution going to be integrated into the next YT-DLP update? Or is this something we users are meant to integrate separately?

@pukkandan
Member

Not for now. It can be considered once #1354 is done

@paulescobar

Not for now. It can be considered once #1354 is done

Sorry for the bother...but is there some way to use it while you guys are making the necessary decisions?
I'm a noob, but if you point me in the general direction of how to integrate it, I can learn by trial and error.

@julian45

Not for now. It can be considered once #1354 is done

Sorry for the bother...but is there some way to use it while you guys are making the necessary decisions?

I'm a noob, but if you point me in the general direction of how to integrate it, I can learn by trial and error.

Go to redraskal's fork of yt-dlp, and then either git clone it and checkout the fix/tiktok-user branch, or select the same branch from the web interface and download the ZIP of that. From there, compile yt-dlp for your OS using the README.md instructions, and you'll have an executable you can use on the command line like usual, except with redraskal's patch applied.

@FractaIism

Go to redraskal's fork of yt-dlp, and then either git clone it and checkout the fix/tiktok-user branch, or select the same branch from the web interface and download the ZIP of that. From there, compile yt-dlp for your OS using the README.md instructions, and you'll have an executable you can use on the command line like usual, except with redraskal's patch applied.

Also had to add pyppeteer as a dependency on pyinst.py line 83 so pyinstaller can include this package in the executable.
The updated extractor works perfectly btw, great work!

@zfiggueroa

zfiggueroa commented Sep 28, 2022

User friendly guide

  1. Install python-pip

  2. Install redraskal's fork with a single command:
    python3 -m pip install --force-reinstall https://github.com/redraskal/yt-dlp/archive/refs/heads/fix/tiktok-user.zip

  3. Test it:
    yt-dlp https://www.tiktok.com/@tiktok -v

@julian45

Welp, I seem to have encountered a regression. Using @redraskal's fork in sh, I attempted to download a user page that, when its videos are enumerated into a text file via the JavaScript trick mentioned a while ago in this thread, had 880 videos. However, when I used the compiled binary by itself, I only got 61 videos.

If I had to guess, this may be related to a behavior I've seen when browsing TikTok's website with a GUI: after scrolling through a certain number of videos on a given user's profile, it prompts you to solve a puzzle and won't let you scroll to the actual bottom of the page until you've solved it.

I've uploaded the list of enumerated URLs and debug output from the binary to this gist. (The info confirming the limited downloads is appended to the very bottom of the first file.) I think I saw something similar, but different, happen with another profile where it gave a distinct error in the debug output, but I want to try to reproduce and log that later this evening.
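
A comparison like the one described above (enumerated URLs versus what was actually downloaded) can be checked with a short script. This is only a sketch: the function names are hypothetical, and it assumes the enumerated list contains URLs of the form `https://www.tiktok.com/@user/video/<id>`, as in the example from the original report.

```python
import re

# Assumption: video URLs end in /video/<numeric id>.
_VIDEO_ID_RE = re.compile(r"/video/(\d+)")


def video_ids_from_urls(lines):
    """Extract the numeric video ID from each URL line, skipping non-matching lines."""
    ids = []
    for line in lines:
        match = _VIDEO_ID_RE.search(line)
        if match:
            ids.append(match.group(1))
    return ids


def missing_ids(expected_urls, downloaded_ids):
    """Return IDs that appear in the enumerated URL list but were not downloaded."""
    downloaded = set(downloaded_ids)
    return [vid for vid in video_ids_from_urls(expected_urls) if vid not in downloaded]
```

Feeding it the enumerated text file and the IDs of the downloaded files shows exactly which videos were skipped, rather than only the difference in counts.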

@bashonly
Member

bashonly commented Sep 30, 2022

I attempted to download a user page that, when its videos are enumerated into a text file via the JavaScript trick mentioned a while ago in this thread, had 880 videos. However, when I used the compiled binary by itself, I only got 61 videos.

From your log, it looks like video info extraction via the feed endpoint failed on the 62nd video (likely due to #4992); the extractor error broke the playlist entries extraction loop, and the downloader started downloading an incomplete playlist.
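
The failure mode described above, where one failing entry ends the whole loop, can be illustrated with a generic sketch. This is not the code from redraskal's branch: `iter_entries` and its parameters are hypothetical names, and `extract` stands in for whatever per-video extraction call the extractor makes.

```python
def iter_entries(video_ids, extract, failures):
    """Yield one entry per video ID, recording failures instead of stopping.

    Without the try/except, the first exception raised by extract() would end
    the generator and leave the playlist truncated at that point.
    """
    for video_id in video_ids:
        try:
            yield extract(video_id)
        except Exception as exc:  # in yt-dlp this would typically be an ExtractorError
            failures.append((video_id, str(exc)))
            continue
```

With this shape, a failure on the 62nd video would be recorded in `failures` and the loop would carry on with the 63rd, so the playlist would be missing one entry instead of everything after it.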

@pukkandan
Member

pukkandan commented Oct 7, 2022

If you want to learn how to install a git branch with pip, google it. This is not the place to discuss it

Reminder to self: Don't unlock this issue again

@yt-dlp yt-dlp locked as off-topic and limited conversation to collaborators Oct 7, 2022
TheMrRandomDude added a commit to TheMrRandomDude/tiktok-scraper-yt-dlp-based-easy-to-use that referenced this issue Jan 16, 2023