[extractor/tiktok] Fix TikTokUserIE #4996
base: master
Will look into the tests again tomorrow; just wanted to put this out there.
I switched the first test to therock because there is an issue where some videos cannot be extracted even though they are available. This is a video extraction problem, unrelated to the user extraction logic.
Some regressions with this new method:
“Heart count” should be changed to the like count.
Hello! I'm interested in this change but am not having any luck running this fork on either macOS or Linux. It's entirely possible that I'm doing something wrong and just need to be pointed in the right direction. My goal is to be able to pull/archive all of my own videos (I currently use
Install steps (both systems):
Linux:
macOS:
On macOS the Chromium process seems to hang around in the background until it is killed manually, whereas on Linux it's dead before I can
@bahamas10 Hmm, I will take a look on macOS.
I got the exact same errors as @bahamas10 described on both Linux and macOS. :(
I also had the same error occur; it appears I only needed to install the Chromium dependencies.
Retriving user id
[INFO] Starting Chromium download.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 109M/109M [00:00<00:00, 239Mb/s]
[INFO] Beginning extraction
[INFO] Chromium extracted to: /home/linux-user/.config/local/share/pyppeteer/local-chromium/588429
ERROR: Browser closed unexpectedly:
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/linux-user/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 153, in _close_process
self._loop.run_until_complete(self.killChrome())
File "/usr/lib/python3.8/asyncio/base_events.py", line 591, in run_until_complete
self._check_closed()
File "/usr/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
[tiktok:user] tiktok: Downloading user embed
[tiktok:user] 7151807169850051883: Downloading video feed
[tiktok:user] Downloading signature function
ERROR: Browser closed unexpectedly:
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/linux-user/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 153, in _close_process
self._loop.run_until_complete(self.killChrome())
File "/usr/lib/python3.8/asyncio/base_events.py", line 591, in run_until_complete
self._check_closed()
File "/usr/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
To fix it, I just followed this answer. The only difference was updating the Chromium path, which depends on your home directory and Chromium version:
-ldd ~/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome | grep 'not found'
+ldd ~/.config/local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome | grep 'not found'
Then I just followed the instructions to install Google Chrome and the extractor worked! Note: I am using a VPS, so that's probably why I didn't have the required dependencies.
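The ldd | grep 'not found' check above can also be scripted. This is a minimal Python sketch, assuming a Linux host with ldd on the PATH; missing_libraries and check_binary are hypothetical helper names, not part of yt-dlp or pyppeteer.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libnss3.so => not found"
        if "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary (Linux only) and list its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

For example, check_binary(os.path.expanduser("~/.config/local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome")) lists the libraries still to install; as the comment notes, the exact path depends on your home directory and Chromium revision. An empty list means every dependency resolved.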
I'm getting "ERROR: remove loop argument".
This works pretty well, thanks so much for putting this together! My only request at this point would be to adjust the loop logic so that if one video is unavailable for one reason or another, it notes the failure (if desired) but continues to download the other videos from the user's profile. This may be just me, but I'd rather the executable try every video in the profile first and then exit (whether with exit code 0 or otherwise), rather than stopping partway through the profile because it encountered a single inaccessible video. For reference, I'm working on macOS 12.6, using a freshly compiled version of this PR's branch. Strangely enough, I'm not encountering either of the errors that @bahamas10 or @zulc22 are. However:
Here is an excerpt from, for example, running the same command as @bahamas10:
[tiktok:user] 7134728030202875182: Downloading video feed
ERROR: 7134728030202875182: Unable to find video in feed; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
Traceback (most recent call last):
File "/Users/julian/rr-ytdlp/yt-dlp/yt_dlp/YoutubeDL.py", line 1477, in wrapper
return func(self, *args, **kwargs)
File "/Users/julian/rr-ytdlp/yt-dlp/yt_dlp/utils.py", line 2999, in <lambda>
return type(self.ydl)._handle_extraction_exceptions(lambda _, i: self._entries[i])(self.ydl, i)
File "/Users/julian/rr-ytdlp/yt-dlp/yt_dlp/utils.py", line 2769, in __getitem__
self._cache.extend(itertools.islice(self._iterable, n))
File "/Users/julian/rr-ytdlp/yt-dlp/yt_dlp/extractor/tiktok.py", line 675, in _entries_api
**self._extract_aweme_app(video['id']),
File "/Users/julian/rr-ytdlp/yt-dlp/yt_dlp/extractor/tiktok.py", line 534, in _extract_aweme_app
raise ExtractorError('Unable to find video in feed', video_id=aweme_id)
yt_dlp.utils.ExtractorError: 7134728030202875182: Unable to find video in feed; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
[tiktok:user] Playlist bahamas10_: Downloading 28 videos of 27
@julian45 Are you using Python 3.10?
@zulc22 Yes, I'm using version 3.10.8.
Does anyone have an idea what causes the error described above by @bahamas10 and @zulc22? I'm also experiencing it.
Replaced pyppeteer with Playwright, since pyppeteer will not be receiving fixes for the problems we encountered. The API requirements changed, but it looks like a headless browser is still required: the response is blank if I don't send requests from a browser. The downside is you will need to run
Thanks! Unfortunately, I'm encountering the same issue as I did on Oct. 19: although it begins to download en masse, it still stops upon encountering any video with missing sound. For example, with @bahamas10's issue above, I am able to download 37 videos (which makes sense, as 10 have been posted since Oct. 19 and I was able to [actually] download 27 then), but no more:
sh-3.2$ ~/rr-ytdlp/yt-dlp https://www.tiktok.com/@bahamas10_
[tiktok:user] bahamas10_: Downloading user embed
[tiktok:user] 7164841989530373422: Downloading video feed
[tiktok:user] Downloading page 1
[tiktok:user] Downloading page 2
[tiktok:user] Downloading page 3
[tiktok:user] Downloading page 4
[tiktok:user] Downloading page 5
[tiktok:user] Downloading page 6
[tiktok:user] Downloading page 7
[tiktok:user] Downloading page 8
[tiktok:user] Downloading page 9
[tiktok:user] Downloading page 10
[tiktok:user] Downloading page 11
[tiktok:user] Downloading page 12
[tiktok:user] Downloading page 13
[download] Downloading playlist: bahamas10_
[tiktok:user] 7164841989530373422: Downloading video feed
[tiktok:user] 7164821593967889710: Downloading video feed
[tiktok:user] 7164209740883430699: Downloading video feed
[tiktok:user] 7162680518335548715: Downloading video feed
[tiktok:user] 7162234814542728491: Downloading video feed
[tiktok:user] 7158503684312157482: Downloading video feed
[tiktok:user] 7158241982349987115: Downloading video feed
[tiktok:user] 7157457925466754347: Downloading video feed
[tiktok:user] 7156762277989895466: Downloading video feed
[tiktok:user] 7156734153147305262: Downloading video feed
[tiktok:user] 7154941390693109035: Downloading video feed
[tiktok:user] 7154172716684152106: Downloading video feed
[tiktok:user] 7152944176110423342: Downloading video feed
[tiktok:user] 7152547553580453163: Downloading video feed
[tiktok:user] 7151936193779748142: Downloading video feed
[tiktok:user] 7151925545683422507: Downloading video feed
[tiktok:user] 7151205459649645870: Downloading video feed
[tiktok:user] 7150785587887295786: Downloading video feed
[tiktok:user] 7150022740764806446: Downloading video feed
[tiktok:user] 7146825795997093166: Downloading video feed
[tiktok:user] 7146267469533990190: Downloading video feed
[tiktok:user] 7145281423388331307: Downloading video feed
[tiktok:user] 7144474645620722986: Downloading video feed
[tiktok:user] 7143719905806994734: Downloading video feed
[tiktok:user] 7143411265371704622: Downloading video feed
[tiktok:user] 7143001307996114219: Downloading video feed
[tiktok:user] 7142661476354886955: Downloading video feed
[tiktok:user] 7142275191962586414: Downloading video feed
[tiktok:user] 7142244394500902186: Downloading video feed
[tiktok:user] 7141013832662699310: Downloading video feed
[tiktok:user] 7140135300482862382: Downloading video feed
[tiktok:user] 7139705568700140846: Downloading video feed
[tiktok:user] 7139585213042150699: Downloading video feed
[tiktok:user] 7139548951212002603: Downloading video feed
[tiktok:user] 7138852612786654510: Downloading video feed
[tiktok:user] 7138148822366096682: Downloading video feed
[tiktok:user] 7137743845999234350: Downloading video feed
[tiktok:user] 7134728030202875182: Downloading video feed
ERROR: 7134728030202875182: Unable to find video in feed; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
[tiktok:user] Playlist bahamas10_: Downloading 38 items of 37
I can confirm, when browsing the profile, that the 38th video from the top (as of the time of writing) has no sound.
This extractor only passes video IDs along to another extractor, so that is an issue with the individual video extractor. At least, I'm fairly sure it has nothing to do with the user extractor, lol. FYI, each downloaded "page" now contains 30 video IDs.
@julian45 TikTokIE throws an ExtractorError when a video is not found. It says the video is unavailable when I try to open it on tiktok.com. Maybe that is not the proper way of handling this? Is this not a problem with other extractors?
I'm not sure, to be honest. In the mobile apps, one can still view the video and all its associated data and metadata, just with no sound. I don't know why they'd make that possible in one context and throw 404s in another. As for other extractors, I haven't used enough of them to give a good answer; I mostly use this one, YouTube, and occasionally Twitter.
I wrapped the TikTokIE function to downgrade the error to a warning. Not sure if this is the best way of going about things, but it fixed the issue on my end. Try it now.
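Downgrading a per-video failure to a warning and moving on can be sketched as below. This is not the code from the branch: ExtractorError here is a local stand-in for yt_dlp.utils.ExtractorError (the class seen in the traceback above), extract is any hypothetical per-video callable, and Python's standard warnings module stands in for yt-dlp's own warning output.

```python
import warnings
from typing import Callable, Iterable, Iterator, Optional


class ExtractorError(Exception):
    """Local stand-in for yt_dlp.utils.ExtractorError."""


def extract_or_warn(extract: Callable[[str], dict], video_id: str) -> Optional[dict]:
    """Run one per-video extraction; turn an ExtractorError into a warning."""
    try:
        return extract(video_id)
    except ExtractorError as err:
        warnings.warn(f"{video_id}: skipping unavailable video ({err})")
        return None


def extract_all(extract: Callable[[str], dict], video_ids: Iterable[str]) -> Iterator[dict]:
    """Yield info for each video that extracts; failed videos are skipped."""
    for video_id in video_ids:
        info = extract_or_warn(extract, video_id)
        if info is not None:
            yield info
```

Only ExtractorError is caught, so unexpected bugs still surface instead of being silently turned into warnings.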
That seems to have done the trick. I don't have enough time tonight to let bahamas10_'s profile download run in full, but it definitely isn't stopping at soundless videos. Unfortunately, this new way of downloading seems to preclude a nice side feature of the days of yore (i.e., before TikTok changed their API a bunch): once you reached a video in the feed that you'd already downloaded, it would simply log an informative message saying so and then continue to the next video, which was a nice tell for when to stop the process once all new videos were downloaded. In this system, it looks like it gets the metadata for every single video in the profile and then tries downloading them all en masse; if that's what works, that's what works, but alas. My Python is a bit rusty, but I'd like to test whether it's possible to move the video download calls into the same loop as the one that grabs video IDs. Your solution is definitely workable, and I mean no ill will to you whatsoever regarding the situation described above; I'd simply like to experiment a bit myself to try to get one particular bit of functionality back, and if that doesn't work out, there'll still be a well-functioning download solution to use. Thanks for your hard work in figuring all of this out!!!
Probably a good idea; I also notice that skipping is more delayed.
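Moving the download into the same loop that yields video IDs, so that work starts immediately and can stop at the first already-downloaded video, might look like the sketch below. It is a sketch only: download_new, the archive set, the download callable, and the assumption that the feed is ordered newest first are all hypothetical and not confirmed by anything in this thread.

```python
from typing import Callable, Iterable, List, Set


def download_new(video_ids: Iterable[str], archive: Set[str],
                 download: Callable[[str], None],
                 stop_on_existing: bool = True) -> List[str]:
    """Download each video as soon as its ID is seen.

    With stop_on_existing=True, stop at the first ID already in the archive,
    assuming (hypothetically) that the feed lists newest videos first.
    """
    downloaded = []
    for video_id in video_ids:  # may be a lazy generator that fetches pages on demand
        if video_id in archive:
            if stop_on_existing:
                break
            continue  # otherwise just skip it and keep going
        download(video_id)
        archive.add(video_id)
        downloaded.append(video_id)
    return downloaded
```

Because video_ids is consumed lazily, a generator that fetches pages on demand is never asked for IDs beyond the point where the loop stops.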
I can report that I had the same issue as julian45 and that it is now fixed. Great work! I'm also echoing a similar thought about the old days; it was possible to use
Thanks again for your work so far.
Made some changes:
I am using this fix (or, more precisely, https://github.com/TheMrRandomDude/yt-dlp-tiktok-scraper-fixed, which is basically the same) to download the TikTok videos of some users. While most users work (celebrities with 1M+ followers, but also ordinary users with only a few hundred), I found three users for whom I am unable to download anything:
@tomperchtold The TikTok HTML embeds are throwing 400 errors (e.g. https://tiktok.com/embed/@username). This page is required to fetch a user identifier, so in this case you can supply the user ID manually. To find the user ID for one of these accounts, first open the browser developer tools on the profile page (like the one you linked). Then, switch to the "Network" tab and filter the contents to XHR. Refresh the page and find a request for
Click on Response and then navigate through the JSON structure to userInfo -> user -> secUid. Copy the value of secUid. Then, substitute it into this additional argument (example):
yt-dlp --extractor-args "tiktok:secuid=PASTE_HERE" https://tiktok.com/@example
See https://github.com/redraskal/yt-dlp/tree/fix/tiktok-user#extractor-arguments for the guide on extractor arguments.
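The manual lookup above boils down to following userInfo -> user -> secUid in a JSON response and pasting the value into --extractor-args. A small sketch of those two steps, assuming the response body has been copied out of the developer tools as text; find_secuid and build_command are hypothetical names, and the secuid in the example is a placeholder.

```python
import json
import shlex


def find_secuid(response_text: str) -> str:
    """Follow userInfo -> user -> secUid in a JSON response body."""
    data = json.loads(response_text)
    return data["userInfo"]["user"]["secUid"]


def build_command(secuid: str, profile_url: str) -> str:
    """Build the yt-dlp invocation from the comment, quoted for a POSIX shell."""
    return shlex.join(["yt-dlp", "--extractor-args", f"tiktok:secuid={secuid}", profile_url])
```

shlex.join (Python 3.8+) only adds quotes when a value contains shell-unsafe characters, and it never changes the case of the secuid.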
@redraskal I then tried passing the value via --extractor-args "tiktok:secuid=MS4..."
It never even made a .dump file...
Branch force-pushed from f56ce39 to d27bde9
@tomperchtold I found the problem. The argument was automatically being converted to lowercase, and TikTok apparently only checks the case on accounts without embeds enabled 🤷♂️ Example of working syntax:
I screwed up the branch somehow, but I think it's fixed now 😅
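The lowercase problem @tomperchtold hit shows why an argument parser should normalize names but never values. Below is a simplified, hypothetical parser for strings of the form IE:key=value;key2=value2. yt-dlp's real extractor-args handling is more elaborate and is not reproduced here.

```python
from typing import Dict


def parse_extractor_args(arg: str) -> Dict[str, Dict[str, str]]:
    """Parse 'IE:key=value;key2=value2' into {ie: {key: value}}.

    Hypothetical, simplified parser: the extractor name and the keys are
    lowercased, but values keep their case, because a secuid is case-sensitive.
    """
    ie_name, _, rest = arg.partition(":")
    options = {}
    for item in rest.split(";"):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        options[key.strip().lower()] = value.strip()
    return {ie_name.strip().lower(): options}
```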
Doesn't work for me; it's stuck on playwright install.
Turns out Playwright saves browsers to
However, I still get another error, this time deeper:
wget works, so I doubt it's IP blocking:
@chavinlo I really doubt it's IP blocking. I can download the profile on Windows, so I'm guessing this is something with your environment or the x-tt-params header. The x-tt-params header contains operating system details, so TikTok might detect that the OS (or possibly the timezone) in the header is incorrect. Can you run yt-dlp with --verbose and send the output?
@redraskal When I use
@tomperchtold It may have to do with the account not having a lot of videos. For such accounts, the videos come with the profile page itself, without calling the API. TikTok might block API usage if they know the page could be pre-rendered with a captcha. Do you notice this with other accounts that have only a couple of videos?
@redraskal
Are there any plans to have Playwright obtain the secuid automatically?
Many videos can be downloaded without a watermark here, and this fork does fetch the secuid automatically. Sometimes, however, the request fails; the extractor argument ensures extraction succeeds even if TikTok blocks the request.
I graduated from Bing Chat University, but I can't code for sh*t. However, I was able to ask Bing Chat to write this code for me, which can scrape video URLs from a user's profile without requiring a secuid. I'm not sure if it will be helpful, since the original code seems to rely on sending API requests and other technical aspects that I don't understand. Lol. It was able to scrape 50,000+ URLs from the addisonre account. I figure you might be able to find a way to integrate the code into yours, since I don't know how to do it.
There is no way to retrieve a list of videos without a secuid. The TikTok web API does not internally index accounts by username; it uses the secuid, which you have to retrieve through various methods. My fork uses a method to retrieve the secuid automatically. TikTok will occasionally block this method, which is why I added an argument as a fallback. Behind the scenes, the browser in your example runs the same logic as my implementation. However, it is slower, because the browser renders all of the videos while scrolling and uses significantly more resources. My fork uses a minimal amount of browser interaction to achieve the same results. The other problem with the scrolling approach is that you risk captcha screening that will break the scraping. In the browser, TikTok supplies the secuid. This page can be blocked by captcha, which is why I designed a new approach to retrieve the secuid without risking captcha. Unfortunately, this solution still risks random failures from TikTok's security measures, but it's more reliable.
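The fallback described here (automatic secuid retrieval that TikTok sometimes blocks, plus a manually supplied value as a backup) can be sketched as follows. resolve_secuid, auto_fetch and SecUidUnavailable are hypothetical names, and the sketch assumes a supplied value takes priority over automatic retrieval, which the thread does not state explicitly.

```python
from typing import Callable, Optional


class SecUidUnavailable(Exception):
    """Hypothetical error raised when no secuid can be obtained."""


def resolve_secuid(username: str,
                   auto_fetch: Callable[[str], Optional[str]],
                   override: Optional[str] = None) -> str:
    """Prefer a user-supplied secuid; otherwise try automatic retrieval."""
    if override:
        return override  # e.g. from --extractor-args "tiktok:secuid=..."
    try:
        secuid = auto_fetch(username)
    except Exception as err:  # automatic retrieval can be blocked (HTTP 400, captcha)
        raise SecUidUnavailable(
            f"Could not retrieve secuid for @{username} ({err}); pass it manually") from err
    if not secuid:
        raise SecUidUnavailable(f"Empty secuid for @{username}; pass it manually")
    return secuid
```

Failing with a message that points at the manual argument keeps a blocked request from looking like an empty profile.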
I was able to fix the secuid issue, I think; it's no longer needed to download videos. If you have the time to double-check it, that would be great. Also, please fix the issue with batch file downloading: I was trying to do a batch file download and it didn't work. Do you think this method would work without having to deal with a captcha? Sorry if I'm bothering you. I'm just trying to fix the secUid and HTTP 400 errors and make it easy for everyone to use without having to deal with these problems. https://www.tiktok.com/@MS4wLjABAAAAqr9sFjn_sxBpBIXokOmQuOkDYAHd2H1vqA5WcNmW8AEpENMmU3YQcbEXLgqocQv1
Code is broken again:
Branch force-pushed from ee280c7 to 7aeda6c
I'm not sure how helpful this information is to @redraskal (thanks so much for all of your time and effort on this thus far!!) or anyone else here, but I tried using this branch a few different ways. No luck on any of them, unfortunately. However, I did put the
Description of your pull request and other information
Fixes #3776
Resolves #1923
Explanation
This PR fixes TikTok user extraction by fetching the user embed page, which bypasses captcha and contains basic user info plus a few videos, and pulling the latest video ID from it. That video's details are then fetched to obtain the secuid (TikTok's user identifier), which lets us call the video listing API. I run this API request through a headless browser (see below).
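The flow in this paragraph can be summarized as a small pipeline. Every callable below (fetch_embed, fetch_video, list_videos) and every dictionary key is a hypothetical placeholder for the real HTTP requests; in the actual branch, the listing call goes through the headless browser.

```python
from typing import Callable, Dict, Iterator


def extract_user(username: str,
                 fetch_embed: Callable[[str], Dict],
                 fetch_video: Callable[[str], Dict],
                 list_videos: Callable[[str, int], Dict]) -> Iterator[str]:
    """Yield a user's video IDs: embed page -> latest video -> secuid -> listing API."""
    # 1. The embed page carries basic user info plus a few recent videos.
    embed = fetch_embed(username)
    latest_id = embed["video_ids"][0]
    # 2. The details of that video include the author's secuid.
    secuid = fetch_video(latest_id)["author"]["secUid"]
    # 3. The listing API is keyed by secuid, not username; page through it.
    cursor = 0
    while True:
        page = list_videos(secuid, cursor)
        yield from page["ids"]
        if not page["has_more"]:
            return
        cursor = page["cursor"]
```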
Headless browser
This method requires a headless browser. Neither PhantomJS nor Deno works properly. I use Playwright (previously pyppeteer; Playwright is better supported). TikTok signature headers do not seem to affect the response and can be ignored. My guess is that TikTok fingerprints browsers based on TLS hello packets. Playwright prompts you to install browser binaries when the TikTok extractor runs; this can be done in advance by running
playwright install
Fingerprinting?
It looks like the web API uses fingerprinting to block automation tools like yt-dlp: web API URLs that work in browsers returned whitespace instead of JSON when requested from Python, even though the requests were identical apart from the TLS implementation. Someone suggested JA3 fingerprinting is the cause, which would mean sending custom TLS hello packets to TikTok to mitigate it.
(Non-issue while using a headless browser)
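The symptom described above (whitespace instead of JSON from Python, real JSON from a browser) suggests a simple check before parsing, with a browser-based fetch as the alternative. This is a sketch under stated assumptions. looks_blocked, parse_or_none and fetch_via_browser are hypothetical helpers. fetch_via_browser uses Playwright's synchronous API (sync_playwright, chromium.launch, new_page, goto), which needs pip install playwright plus playwright install. It only loads a URL in headless Chromium and does not reproduce the branch's actual requests.

```python
import json
from typing import Any


def looks_blocked(body: str) -> bool:
    """True when the API answered with nothing but whitespace."""
    return not body.strip()


def parse_or_none(body: str) -> Any:
    """Return parsed JSON, or None when the response looks fingerprint-blocked."""
    if looks_blocked(body):
        return None
    return json.loads(body)


def fetch_via_browser(url: str) -> str:
    """Load url in headless Chromium via Playwright and return the response body."""
    from playwright.sync_api import sync_playwright  # imported lazily; optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            response = page.goto(url)
            return response.text() if response is not None else ""
        finally:
            browser.close()
```

A caller would try a plain request first and switch to fetch_via_browser only when parse_or_none returns None. That keeps the expensive browser launch off the happy path.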