[tiktok:user] Failed to parse JSON #3776
Comments
I should add that I've tried this from multiple machines with different IPs, so it's not that I've personally been blocked. |
Can confirm this report. Additionally, vm.tiktok.com URLs also time out. |
Try Actual code changes are needed for the shortcut (and profile) URLs as the redirect extractor may not see the custom UA. For example, try this yt-dl PR, which currently finds the first 30 videos, and can download at least the first 10kB of at least the first item, with the problem URL, but says
|
Same problem :/ |
Indeed, that is a shortcut URL and so the UA command-line option doesn't solve the problem. |
Workaround which just uses curl to get the redirect target and then passes it down to yt-dlp for the actual download:
Crude, but worked for me. |
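The original curl command did not survive in this thread, so here is a rough Python sketch of the same idea: follow the shortcut URL's redirects with a browser-like User-Agent, then hand the final URL to yt-dlp. The names `resolve_redirect`, `build_command`, `download_shortcut` and the `BROWSER_UA` value are made up for illustration; which UA TikTok accepts is an assumption and may change.

```python
import subprocess
import urllib.request

# Assumption: some browser-like UA. The thread does not say which UA the
# curl workaround used, so this value is only a placeholder.
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0'


def resolve_redirect(url, user_agent=BROWSER_UA, opener=urllib.request.urlopen, timeout=20):
    """Follow HTTP redirects for `url` and return the final URL."""
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    # urlopen follows redirects itself; geturl() reports where we ended up.
    with opener(req, timeout=timeout) as resp:
        return resp.geturl()


def build_command(target_url, extra_args=()):
    """Build the yt-dlp command line for the resolved URL."""
    return ['yt-dlp', *extra_args, target_url]


def download_shortcut(url, extra_args=()):
    """Resolve a shortcut URL, then let yt-dlp do the actual download."""
    return subprocess.call(build_command(resolve_redirect(url), extra_args))
```

Usage would be something like `download_shortcut('https://vm.tiktok.com/<id>/', ['-v'])`, with `<id>` standing in for a real shortcut id. Like the curl version, this is crude: it only helps while the redirect itself is still served to that UA.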
I think a lot of the comments in this thread are related to a different bug than I reported above.
|
Your original problem URL is a profile page. Code changes are needed to fix this for yt-dlp. The yt-dlp extractor bypasses the site's UA block for individual videos by not initially loading the video page, instead using a separate metadata URL if available. |
(we should probably look into changing that behaviour too) |
Did a little bit of triage on this, to add a bit of clarity to the two cascading issues:
Reverting the tiktok.py file entirely to the upstream branch version fixes the first and second problems, because it goes back to extracting the URLs from the webpage request instead of using the broken metadata API. But that reintroduces the limitation people mentioned earlier: only the first ~30 videos are listed because there's no pagination. |
We can't disable this because urllib doesn't support HTTP keep-alive / persistent connections. For us that requires #3668 (and this tiktok issue still exists even with persistent connections). Edit: unless you mean that certain user agents only work if there is no Connection: close header. |
Yeah, it's a little counterintuitive, but I tested this out (hoisting the do_open() method and then disabling that header) both in Python 3 and directly in Postman, and in either case it worked fine: the connection was still closed. It's some combination of that header plus a user agent that caused the response to hang; I suspect the Akamai CDN servers are behaving badly or something. Pretty peculiar. |
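For anyone wanting to repeat that kind of test without patching do_open(), here is a minimal sketch using `http.client`, which, unlike urllib's handler, does not add `Connection: close` on its own. This is not the code used above; `build_headers` and `probe` are hypothetical names, and the host, path and UA to try are left to the caller.

```python
import http.client


def build_headers(user_agent, send_connection_close):
    """Headers for one probe: a UA, with or without 'Connection: close'."""
    headers = {'User-Agent': user_agent}
    if send_connection_close:
        headers['Connection'] = 'close'
    return headers


def probe(host, path, user_agent, send_connection_close, timeout=10):
    """Return the HTTP status, or an error string if the request hangs or fails."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request('GET', path, headers=build_headers(user_agent, send_connection_close))
        return conn.getresponse().status
    except OSError as exc:  # covers socket timeouts and connection errors
        return 'error: %s' % exc
    finally:
        conn.close()
```

Running `probe()` for the same UA with `send_connection_close` set to True and then False should show whether the header/UA combination is what makes the response hang.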
This does not appear to work. The first issue is that a 403 is received for the
This can be fixed by adding expected_status:
diff --git a/youtube_dl/extractor/tiktok.py b/youtube_dl/extractor/tiktok.py
index 49df0844b..1aec9386c 100644
--- a/youtube_dl/extractor/tiktok.py
+++ b/youtube_dl/extractor/tiktok.py
@@ -145,7 +145,7 @@ class TikTokIE(TikTokBaseIE):
     def _real_initialize(self):
         # Setup session (will set necessary cookies)
         self._request_webpage(
-            'https://www.tiktok.com/', None, note='Setting up session')
+            'https://www.tiktok.com/', None, note='Setting up session', expected_status=403)
 
     def _real_extract(self, url):
         m = re.match(self._VALID_URL, url).groupdict()
But the extraction still fails:
The downloaded webpage is a captcha page and does not have any SIGI_STATE. Here's the dump:
<!DOCTYPE html>
<html>
<Head>
<meta charset="utf-8">
<title>TikTok</title>
<link rel="shortcut icon" type="image/x-icon" id="favicon">
<meta name="screen-orientation" content="portrait">
<meta name="x5-orientation" content="portrait">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no, minimum-scale=1, maximum-scale=1, minimal-ui, viewport-fit=cover">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="applicable-device" content="pc,mobile"/>
<link rel="dns-prefetch" href="https://sf16-scmcdn-va.ibytedtos.com" />
<script async src="https://sf16-scmcdn-va.ibytedtos.com/goofy/log-sdk/collect/collect-tcpy.js"></script>
<script>
const option = {"title":"tiktok-verify-page","iid":"0","did":"0","app_name":"tiktok","aid":1284,"favicon":"https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/logo.png","mobileIcons":["https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_m.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_m2x.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_m3x.png"],"icons":["https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_w.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_w2x.png","https://lf16-tiktok-common.ibytedtos.com/obj/tiktok-web-common-sg/mtact/static/images/tiktok-logo/tiktok_w3x.png"],"region":"va","type":"slide","verifyConfig":{"code":10000,"type":"verify","subtype":"slide","fp":"verify_ba93a25726de3d5ce7d74fa8c489a339","region":"va","detail":"BTNI5LBD0rUnU4MesPjv-u4pTx360fUy*Ged9-rq9vxo0-xG4grMQqu2Vc7yR3A6gvNUN9BKCAa5rKxl1VYVslhMosF8xDDmyMUfh6jHhQ3NxS*IcPKHjuo29sny6nvZ66SS0smUrX0ObNqGq3fYOpP6L5BPsZi7HlnUrJTEuLvHAD3Xhs4qMyrk99XtDQJKq57Oj3Fv12qWd64e1SjJ-NvEiSU7T1s2k5pBZVCFj3ZqMpzIlV3Fo9rSqPCuUnxFIVaxo87ntdgzANL-wawL*VwrhoGAECteZEIpdsUT81PnrbV6--T7nnUI2Bj9nVkfvyIQW0OhPVcjINgeSCukUJZi5tNLF0e2SxWuOgLwp-IaguU*7NMwzZCf6rIXWVxacb6v3pH3A1n9WNnxUsoiGFU."},"lang":"en"};
if(!option.region) {
option.region = 'va';
}
var verifyTime = new Date().getTime();
(function(win, export_obj) {
win['TeaAnalyticsObject'] = export_obj;
if (!win[export_obj]) {
function _collect() {
_collect.q.push(arguments);
}
_collect.q = _collect.q || [];
win[export_obj] = _collect;
}
win[export_obj].l = +new Date();
})(window, 'collectEvent');
window.collectEvent('page.init', {
app_id: option.region === 'cn' ? 2018 : 2740,
channel: option.region === 'boe' ? 'cn' : option.region,
log: true,
});
window.collectEvent('page.start');
window.collectEvent('page.verify_page_load', {
aid: option.aid,
product_host: location.host,
product_path: location.pathname,
time: new Date().getTime(),
is_success: 0,
duration: new Date().getTime() - verifyTime
})
window.onbeforeunload = function () {
window.collectEvent(document.readyState === 'complete' ? 'page.verify_page_close' : 'page.verify_page_load_close', {
product_host: location.host,
product_path: location.pathname,
aid: option.aid,
time: new Date().getTime(),
fp: (document.cookie.match(/s_v_web_id=(\w+)/) || [])[1],
is_success: Number(!!window.verify_is_success)
})
}
</script>
<script src="https://lf16-cdn-tos.tiktokcdn-us.com/obj/static-tx/sec_sdk_build/3.6.0/captcha/index.js"></script>
<script async src="https://sf16-muse-va.ibytedtos.com/obj/eden-va2/fviylclsjeh7bogubfbd/tt-webapp/starling.browser.js"></script>
</Head>
<body>
<div class="content">
<div class="app_icon"></div>
<div class="verify-wrap">
<div id="verify-ele"></div>
</div>
<p class="page-desc" id="verifyEle"></p>
</div>
</body>
<style>
html, body {
min-height: 500px;
}
body {
background: #EDF0F5;
display: flex;
}
.content {
width: 300px;
margin: auto;
}
.verify-wrap {
min-height: 306px;
}
@media (min-width: 1281px) {
.content {
width: 380px;
}
.verify-wrap {
min-height: 386px;
}
}
.captcha_verify_container {
border: 1px solid #E8E8E8;
}
.captcha_verify_container #verify-bar-close {
display: none;
}
.app_icon img {
margin-bottom: 20px;
}
.page-desc{
margin-top: 32px;
font-family: PingFangSC-Medium;
font-size: 12px;
color: #505050;
line-height: 19px;
}
.page-desc span {
font-family: 'PingFangSC-Regular';
color: #505050;
}
</style>
<script>
const pageDescKey = 'user_verify_page_description';
const ua = navigator.userAgent;
const isMobile = /Android|webOS|iPhone|iPod|BlackBerry|Windows Phone|iPad/i.test(ua);
const hosts = {
cn: '//verify.snssdk.com',
boe: '//boe-verify.snssdk.com',
sg: '//verify-sg.byteoversea.com',
va: '//verification-va.byteoversea.com',
ttp: '//verification.us.tiktok.com'
};
window.TTGCaptcha.init({
commonOptions: {
aid: option.aid || 0,
did: '0',
iid: '0',
},
captchaOptions: {
ele: 'verify-ele',
host: hosts.ttp,
lang: option.lang.match(/zh/) ? 'zh-Hant' : option.lang,
region: option.region,
app_name: option.app_name || '',
hideCloseBtn: true,
successCb: successCb,
feedbackSubmitCb: feedbackSubmitCb,
autoClose: false
},
});
window.onload = function () {
window.verify_is_success = true;
document.querySelector('#favicon').href = option.favicon;
document.title = option.title || 'security verification';
const icon = isMobile ? option.mobileIcons : option.icons;
if (option.icons || true) {
let img = document.createElement('img');
img.srcset = `${icon[0]} 1x,
${icon[1]} 2x,
${icon[2]} 3x`;
img.alt = 'app logo';
img.src = icon[0];
document.querySelector('.app_icon').appendChild(img);
}
window.collectEvent(location.href === document.referrer ? 'page.verify_page_refresh' : 'page.verify_page_init', {
duration: new Date().getTime() - verifyTime,
aid: option.aid,
region: option.region,
fp: (document.cookie.match(/s_v_web_id=(\w+)/) || [])[1],
is_success: 1,
product_host: location.host,
product_path: location.pathname,
})
const starling = new Starling({
api_key: '5dc26cf008d511e9b571e1bc0c9e23b5',
namespace: 'Captcha',
locale: option.lang,
zone: (option.region || 'SG').toUpperCase(),
test: false,
fallbackLang: ['en'],
});
starling.load((texts) => {
let desc = document.querySelector('#verifyEle');
desc.innerText = texts[pageDescKey];
});
window.TTGCaptcha.render({
verify_data: JSON.stringify({
subtype: option.type,
...(option.verifyConfig || {}),
})
});
// tt embed iframe
if (window.self !== window.top && window.name.indexOf('__tt_embed__') !== -1) {
const resizeData = JSON.stringify({
signalSource: window.name,
height: 600,
});
window.parent.postMessage(resizeData, '*');
}
}
function successCb() {
location.reload();
}
function feedbackSubmitCb() {
window.TTGCaptcha.render({
verify_data: JSON.stringify({
subtype: option.type,
...(option.verifyConfig || {}),
})
});
}
</script>
</html>
I am using a VPN (TikTok is banned here), so that could be the issue. But I can access the webpage in a browser over the same VPN (it does not ask for a captcha). Passing the cookies to youtube-dl gives the same result.
Can someone confirm whether the VPN is the issue? I.e., does the PR work for you? |
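To make the two failure modes in this comment easier to tell apart, here is a small sketch in plain urllib (not youtube-dl code). It accepts a 403 the way `expected_status` does, then checks whether the body is the verify page from the dump or a page carrying SIGI_STATE. `fetch_page`, `classify_page` and `CAPTCHA_MARKERS` are hypothetical names; the markers are simply strings taken from the dump above and could change at any time.

```python
import urllib.error
import urllib.request

# Strings that appear in the captcha dump above; treated here as markers
# of the verify page. This is an assumption, not a documented contract.
CAPTCHA_MARKERS = ('tiktok-verify-page', 'TTGCaptcha')


def fetch_page(url, expected_status=(), opener=urllib.request.urlopen):
    """Return (status, body). HTTP errors listed in expected_status are not raised."""
    try:
        with opener(url) as resp:
            return resp.getcode(), resp.read().decode('utf-8', 'replace')
    except urllib.error.HTTPError as err:
        if err.code not in expected_status:
            raise
        # HTTPError doubles as a response object, so the body is still readable.
        return err.code, err.read().decode('utf-8', 'replace')


def classify_page(html):
    """Rough guess at what kind of page came back."""
    if 'SIGI_STATE' in html:
        return 'data'
    if any(marker in html for marker in CAPTCHA_MARKERS):
        return 'captcha'
    return 'unknown'
```

With this, `classify_page(fetch_page(url, expected_status=(403,))[1])` returning `'captcha'` would mean the request got through the status check but was still served the verification page.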
@kjerk Can you share the code you used for testing this? |
There must be some discrimination at TT. Here:
$ git checkout df-wranai-tiktok-patch
Switched to branch 'df-wranai-tiktok-patch'
$ python -m youtube_dl -F -v 'https://www.tiktok.com/@derekbrunsonmma'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-F', u'-v', u'https://www.tiktok.com/@derekbrunsonmma']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2f65e205e
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[tiktok:user] Setting up session
[tiktok:user] derekbrunsonmma: Downloading webpage
WARNING: More videos are available but the current extractor doesn't know how to find them
[download] Downloading playlist: 6805667245747209222
[tiktok:user] playlist 6805667245747209222: Collected 30 video ids (downloading 30 of them)
[download] Downloading video 1 of 30
[tiktok] Setting up session
[tiktok] 7099799828678380842: Downloading webpage
[info] Available formats for 7099799828678380842:
format code  extension  resolution note
0            mp4        576x1024
[download] Downloading video 2 of 30
[tiktok] 7099586219537108270: Downloading webpage
[info] Available formats for 7099586219537108270:
format code  extension  resolution note
0            mp4        576x828
[download] Downloading video 3 of 30
[tiktok] 7099505233373465898: Downloading webpage
...
And Py3.9 (same for 3.5):
$ python3.9 -m youtube_dl --flat-playlist -j -v 'https://www.tiktok.com/@derekbrunsonmma'
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--flat-playlist', '-j', '-v', 'https://www.tiktok.com/@derekbrunsonmma']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 2f65e205e
[debug] Python version 3.9.12 (CPython) - Linux-4.4.0-210-generic-i686-with-glibc2.23
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
WARNING: More videos are available but the current extractor doesn't know how to find them
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099799828678380842", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099586219537108270", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099505233373465898", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099455914255633710", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099277655979248942", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099258188624579886", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099227036148976939", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099211696870411566", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099172441154473262", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7099067570380197162", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098932076711284014", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098912942279839022", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098901483139173675", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098868510356557098", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098848419925527854", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098731859403623723", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098553426669243691", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098530750685007150", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098503324227423534", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098493495358410030", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098164938190998826", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098161970817404203", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7098009011781520686", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097779473386327338", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097699241228766510", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097660895177952558", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097444267131538734", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097056072573259054", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7097018259224022314", "ie_key": "TikTok"}
{"_type": "url", "url": "tiktok:derekbrunsonmma:7096969440977259818", "ie_key": "TikTok"}
$
If the |
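The `--flat-playlist -j` output above is one JSON object per line, so it is easy to post-process. A minimal sketch, assuming the `tiktok:<user>:<id>` URL form shown in that log (an internal form of the patched extractor, not a stable format); `parse_flat_playlist` is a made-up helper name.

```python
import json


def parse_flat_playlist(lines):
    """Extract (user, video_id) pairs from `--flat-playlist -j` output lines."""
    entries = []
    for line in lines:
        line = line.strip()
        # Skip [debug] lines, warnings and blanks; keep only JSON objects.
        if not line.startswith('{'):
            continue
        entry = json.loads(line)
        # URLs in the log look like 'tiktok:<user>:<video id>'.
        _scheme, user, video_id = entry['url'].split(':', 2)
        entries.append((user, video_id))
    return entries
```

Counting the result gives a quick check for the 30-video limit mentioned in the warning.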
hm.. that would make it very difficult for me to do anything about this issue... Someone would have to make a PR directly to ytdlp |
They could be tied to the UA. |
If yt-dlp is tweaked to send the vanilla UA instead of the FB UA in the code, or indeed whatever yt-dlp sends by default, it fetches derek's page OK for me. But the user list API call fails (empty JSON). |
This is the code I was working with - master...pukkandan:features/tiktok. It should bring us on par with ytdl-org/youtube-dl#30479 (for #3551 as well), but I can't fully test coz of #3776 (comment) 😞 |
The chromium binary should automatically install when using the tiktok extractor |
Right, it did for me. To clarify, I meant I think it'd be a good idea to at least give users a heads up that using this particular extractor downloads a 150 MiB binary to your system, since some users' devices may not have much storage space available, that single binary is roughly 3x the size of the entire yt-dlp repo, some users might have internet/data plans with a quota, etc. Maybe this is just overly cautious on my part, but I'm not sure. |
Users already have a web browser that's at least 10x too big (cf. Netscape 9 for Windows, <6MB download, 2007) and only yt-dlp users who are dedicated TT/DY fiends are likely to want another one for this specific application. Even Node.js is a small fraction of 150MB. But perhaps this is an olde worlde concern when users may be regularly downloading GBs that are immediately discarded. I wondered if there was a lower overhead solution using https://chromedevtools.github.io/devtools-protocol/ but apparently that's what pyppeteer uses anyway. Accepting that this pyppeteer/chromium solution effectively resolves the problem in the issue and is the only successful approach found despite many other attempts, it seems like a setback for the yt-dl[p] approach as well as for the open Web in general. We can't back-port such a solution to yt-dl, whose use cases include overcoming deficiencies of, or actual lack of, a platform web browser. |
It's so much more realistic to use pyppeteer to sign requests. One change to the signing method could set back the extractor weeks if we went ahead with reverse engineering it. It's only used for API requests that need to be signed (not bloated web pages). The requests by pyppeteer are actually faster than the requests for video information. TikTok just doesn't want their stuff openly accessible. We could rewrite TLS (to emulate browsers), as well as reverse engineer the tiktok signer, but that would take a long time and does not seem worth it when we could throw on pyppeteer for this one use case. |
Hey guys, is RedRaskal's solution going to be integrated into the next YT-DLP update? Or is this something we users are meant to integrate separately? |
Not for now. It can be considered once #1354 is done |
Sorry for the bother...but is there some way to use it while you guys are making the necessary decisions? |
Go to redraskal's fork of yt-dlp, and then either |
Also had to add |
User friendly guide
|
Welp, I seem to have encountered a regression. Using @redraskal's fork in If I had to guess, this may be related to a behavior I've seen when browsing TikTok's website with a GUI; after scrolling through a certain amount of videos on a given user's profile, it'll prompt you to solve a puzzle, and won't let you scroll to the actual bottom of the page until you've solved that. I've uploaded the list of enumerated URLs and debug output from the binary to this gist. (The info confirming the limited downloads is appended to the very bottom of the first file.) I think I saw something similar, but different, happen with another profile where it gave a distinct error in the debug output, but I want to try to reproduce and log that later this evening. |
from your log it looks like video info extraction via the |
If you want to learn how to install a git branch with pip, google it. This is not the place to discuss it.
Reminder to self: Don't unlock this issue again |
fixes yt-dlp/yt-dlp#3776 fixes yt-dlp/yt-dlp#4984 fixes yt-dlp/yt-dlp#5064 fixes yt-dlp/yt-dlp#5706 fixes yt-dlp/yt-dlp#5665
Checklist
Region
USA
Description
Starting earlier today, tiktok user pages started timing out. Downloading an individual video still works, but user pages don't.
For example (in the log below),
yt-dlp.sh "https://www.tiktok.com/@derekbrunsonmma" -vU
times out, but
yt-dlp.sh "https://www.tiktok.com/@derekbrunsonmma/video/7098932076711284014" -vU
works fine.
Verbose log