[ie/tiktok:user] Fix extractor #9661

Draft: wants to merge 2 commits into master



bashonly commented Apr 10, 2024

WIP, TODO:

  • try a version without prefixed (tiktokuser:) URLs, e.g. https://www.tiktok.com/@SEC_UID
  • fix video webpage URLs
  • investigate whether video webpage URLs are valid with the sec_uid in place of the username
  • simplify sec_uid extraction?

Closes #3776

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp, each piece of code must be in the public domain or released under the Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Fix or improvement to an extractor (Make sure to add/update tests)

Authored by: bashonly
Authored by: bashonly
bashonly added the site-bug (Issue with a specific website) label Apr 10, 2024
bashonly marked this pull request as draft April 10, 2024 15:38

old_cursor = cursor
cursor = traverse_obj(
response, ('itemList', -1, 'createTime', {lambda x: x * 1E3}, {int_or_none}))

Suggested change
response, ('itemList', -1, 'createTime', {lambda x: x * 1E3}, {int_or_none}))
response, ('itemList', -1, 'createTime', {lambda x: int_or_none(x, invscale=1E3)}))

or actually, just

Suggested change
response, ('itemList', -1, 'createTime', {lambda x: x * 1E3}, {int_or_none}))
response, ('itemList', -1, 'createTime', {lambda x: int(x * 1E3)}))
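Both suggestions do the same conversion: take the last item's createTime and scale it by 1E3 into an integer cursor. Assuming createTime is in seconds and the cursor is in milliseconds (which is what the 1E3 factor and the 604800000 ms "1 week" jump below imply), a stdlib-only sketch of that step follows. The helper name created_ms is hypothetical; it returns None on an empty list or a missing/non-numeric value, roughly what a failed traversal would yield.

```python
def created_ms(item_list):
    """Hypothetical helper: last item's createTime (assumed seconds) as an int in ms.

    Returns None if the list is empty/None or the value is missing or
    non-numeric, similar to a traversal that finds nothing.
    """
    try:
        return int(item_list[-1]['createTime'] * 1E3)
    except (IndexError, KeyError, TypeError):
        return None


print(created_ms([{'createTime': 1712760000}]))  # 1712760000000
print(created_ms([]))  # None
```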

cursor = traverse_obj(
response, ('itemList', -1, 'createTime', {lambda x: x * 1E3}, {int_or_none}))
if not cursor:
cursor = old_cursor - 604800000 # jump 1 week back in time

Suggested change
cursor = old_cursor - 604800000 # jump 1 week back in time
cursor = old_cursor - 7 * 86_400_000 # jump 1 week back in time

is more readable imo. As for the comment, it may be more useful to explain "why 1 week".
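The whole cursor step, including the fallback, can be sketched as below. This assumes the cursor is a millisecond timestamp; next_cursor and ONE_WEEK_MS are hypothetical names, and the week-sized step is the PR's own choice rather than a documented TikTok value. Spelling the constant as 7 * 86_400_000 makes the unit obvious while keeping the same number as the 604800000 literal.

```python
ONE_WEEK_MS = 7 * 86_400_000  # same value as the 604800000 literal in the PR


def next_cursor(item_list, old_cursor):
    """Hypothetical pagination step.

    Use the last item's createTime (scaled to ms) as the next cursor; if no
    usable timestamp is found, move the cursor back one week instead.
    """
    try:
        cursor = int(item_list[-1]['createTime'] * 1E3)
    except (IndexError, KeyError, TypeError):
        cursor = None
    if not cursor:
        cursor = old_cursor - ONE_WEEK_MS  # no timestamp: step back 1 week
    return cursor
```

With newest-to-oldest ordering, the last item on a page is the oldest one, so its timestamp is a natural starting point for the next request.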

_WORKING = False
_VALID_URL = [
r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w\.-]+)/?(?:$|[#?])',
r'tiktokuser:(?P<id>MS4wLjABAAAA[\w-]{64})',

pukkandan Apr 10, 2024

We could do something like this, if you think it's useful:

Suggested change
r'tiktokuser:(?P<id>MS4wLjABAAAA[\w-]{64})',
r'tiktokuser:(?P<id>MS4wLjABAAAA[\w-]{64})(?:@(?P<username>[\w.-]+))?',
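The suggested pattern lets a tiktokuser: URL optionally carry the username after the sec_uid. A quick stdlib check of how the two named groups behave is shown here; the sec_uid is a dummy value (the MS4wLjABAAAA prefix plus 64 word characters, as the pattern requires), and corgibobaa is borrowed from the test URL.

```python
import re

# Pattern from the review suggestion; the "@username" suffix is optional.
TIKTOKUSER_RE = re.compile(
    r'tiktokuser:(?P<id>MS4wLjABAAAA[\w-]{64})(?:@(?P<username>[\w.-]+))?')

# Dummy sec_uid for illustration only.
sec_uid = 'MS4wLjABAAAA' + 'a' * 64

for url in (f'tiktokuser:{sec_uid}', f'tiktokuser:{sec_uid}@corgibobaa'):
    m = TIKTOKUSER_RE.match(url)
    print(m.group('id') == sec_uid, m.group('username'))
# True None
# True corgibobaa
```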

_VALID_URL = r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w\.-]+)/?(?:$|[#?])'
_WORKING = False
_VALID_URL = [
r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w\.-]+)/?(?:$|[#?])',

Suggested change
r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w\.-]+)/?(?:$|[#?])',
r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w.-]+)/?(?:$|[#?])',
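This suggestion only drops a redundant escape: inside a character class, `.` is already a literal dot, so `[\w\.-]` and `[\w.-]` accept the same characters. A small check on sample URLs (some.user is a dummy username) shows both patterns extracting the same id and both rejecting a video URL:

```python
import re

OLD = r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w\.-]+)/?(?:$|[#?])'
NEW = r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w.-]+)/?(?:$|[#?])'

for url in ('https://tiktok.com/@corgibobaa?lang=en',
            'https://www.tiktok.com/@some.user',
            'https://www.tiktok.com/@some.user/video/123'):
    old, new = re.match(OLD, url), re.match(NEW, url)
    # Print the captured id from each pattern (None if it did not match).
    print(old and old.group('id'), new and new.group('id'))
# corgibobaa corgibobaa
# some.user some.user
# None None
```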

_TESTS = [{
'url': 'https://tiktok.com/@corgibobaa?lang=en',
'playlist_mincount': 45,
'info_dict': {
'id': '6935371178089399301',
'id': 'MS4wLjABAAAAepiJKgwWhulvCpSuUVsp7sgVVsFJbbNaLeQ6OQ0oAJERGDUIXhb2yxxHZedsItgT',

As I said on Discord, this isn't necessary to get tiktokuser: URLs working, but I leave it to your discretion.

'secUid': sec_uid,
'type': '1', # pagination type: 0 == oldest-to-newest, 1 == newest-to-oldest
'tz_name': 'UTC',
'verifyFp': 'verify_%s' % ''.join(random.choices(string.hexdigits, k=7)),

Suggested change
'verifyFp': 'verify_%s' % ''.join(random.choices(string.hexdigits, k=7)),
'verifyFp': f'verify_{"".join(random.choices(string.hexdigits, k=7))}',
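The f-string produces the same kind of token as the %-formatting version: "verify_" followed by 7 random characters from string.hexdigits. Note that string.hexdigits is '0123456789abcdefABCDEF', so the token can mix upper- and lower-case hex letters. The sketch below rebuilds only the query keys visible in this hunk; make_verify_fp is a hypothetical helper, the secUid is a dummy value, and any other parameters outside the hunk are omitted.

```python
import random
import string


def make_verify_fp(k=7):
    """Hypothetical helper: 'verify_' plus k random hex digits (mixed case)."""
    return f'verify_{"".join(random.choices(string.hexdigits, k=k))}'


# Only the keys shown in the hunk; the request's other parameters are omitted.
query = {
    'secUid': 'MS4wLjABAAAA' + 'a' * 64,  # dummy sec_uid
    'type': '1',  # pagination type: 0 == oldest-to-newest, 1 == newest-to-oldest
    'tz_name': 'UTC',
    'verifyFp': make_verify_fp(),
}
```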

Comment on lines +892 to +898
return traverse_obj(
    self._get_universal_data(webpage, user_name),
    ('webapp.user-detail', 'userInfo', 'user', 'secUid', {str})) or traverse_obj(
    self._get_sigi_state(webpage, user_name),
    ('LiveRoom', 'liveRoomUserInfo', 'user', 'secUid'),
    ('UserModule', 'users', ..., 'secUid'),
    get_all=False, expected_type=str)

pukkandan Apr 10, 2024

Personally, I try to avoid this type of indenting, where it's not obvious where one traverse_obj ends and the next begins. But it's not a big deal.

Suggested change
return traverse_obj(
    self._get_universal_data(webpage, user_name),
    ('webapp.user-detail', 'userInfo', 'user', 'secUid', {str})) or traverse_obj(
    self._get_sigi_state(webpage, user_name),
    ('LiveRoom', 'liveRoomUserInfo', 'user', 'secUid'),
    ('UserModule', 'users', ..., 'secUid'),
    get_all=False, expected_type=str)
return (traverse_obj(self._get_universal_data(webpage, user_name),
                     ('webapp.user-detail', 'userInfo', 'user', 'secUid', {str}))
        or traverse_obj(self._get_sigi_state(webpage, user_name),
                        ('LiveRoom', 'liveRoomUserInfo', 'user', 'secUid', {str}),
                        ('UserModule', 'users', ..., 'secUid', {str}, {any})))
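The suggestion keeps the same lookup order: the universal-data blob first, then two alternative locations in the SIGI state, taking the first string found. Outside yt-dlp, that order can be sketched with plain dict lookups. dig and find_sec_uid are hypothetical helpers, the `...` branch over UserModule.users is modeled as iterating that dict's values, and empty strings are treated as "not found", matching the falsy check of `or`.

```python
def dig(obj, *keys):
    """Hypothetical helper: follow dict keys, returning None on any miss."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def _valid(value):
    return isinstance(value, str) and bool(value)


def find_sec_uid(universal_data, sigi_state):
    """Sketch of the fallback order: universal data, then LiveRoom, then UserModule."""
    sec_uid = dig(universal_data, 'webapp.user-detail', 'userInfo', 'user', 'secUid')
    if _valid(sec_uid):
        return sec_uid
    sec_uid = dig(sigi_state, 'LiveRoom', 'liveRoomUserInfo', 'user', 'secUid')
    if _valid(sec_uid):
        return sec_uid
    users = dig(sigi_state, 'UserModule', 'users')
    if isinstance(users, dict):
        # Branch over every user entry and keep the first string secUid.
        for user in users.values():
            sec_uid = dig(user, 'secUid')
            if _valid(sec_uid):
                return sec_uid
    return None
```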

Comment on lines +901 to +905
user_name, sec_uid = None, None
if url.startswith('tiktokuser:'):
    sec_uid = self._match_id(url)
else:
    user_name = self._match_id(url)

Suggested change
user_name, sec_uid = None, None
if url.startswith('tiktokuser:'):
    sec_uid = self._match_id(url)
else:
    user_name = self._match_id(url)
if url.startswith('tiktokuser:'):
    sec_uid, user_name = self._match_id(url), None
else:
    sec_uid, user_name = None, self._match_id(url)
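The branch above decides, from the URL prefix alone, whether the matched id is a sec_uid or a username. A standalone sketch of that dispatch, using the two patterns from the PR's _VALID_URL list (with the `[\w.-]` form suggested earlier), could look like this; split_target is a hypothetical stand-in for the extractor code and returns None for a field it cannot fill instead of raising.

```python
import re

# Patterns from the PR's _VALID_URL list.
PROFILE_RE = r'https?://(?:www\.)?tiktok\.com/@(?P<id>[\w.-]+)/?(?:$|[#?])'
TIKTOKUSER_RE = r'tiktokuser:(?P<id>MS4wLjABAAAA[\w-]{64})'


def split_target(url):
    """Hypothetical stand-in for the branch above: return (sec_uid, user_name)."""
    if url.startswith('tiktokuser:'):
        m = re.match(TIKTOKUSER_RE, url)
        return (m.group('id') if m else None), None
    m = re.match(PROFILE_RE, url)
    return None, (m.group('id') if m else None)
```

Returning a (sec_uid, user_name) pair in one place mirrors the reviewer's point: each branch sets both names together, so neither can be left unassigned.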

Comment on lines +907 to +914
if not sec_uid:
    for user_url, msg in (
        (self._UPLOADER_URL_FORMAT % user_name, 'user'),
        (self._UPLOADER_URL_FORMAT % f'{user_name}/live', 'live'),
    ):
        sec_uid = self._get_sec_uid(user_url, user_name, msg)
        if sec_uid:
            break

Suggested change
if not sec_uid:
    for user_url, msg in (
        (self._UPLOADER_URL_FORMAT % user_name, 'user'),
        (self._UPLOADER_URL_FORMAT % f'{user_name}/live', 'live'),
    ):
        sec_uid = self._get_sec_uid(user_url, user_name, msg)
        if sec_uid:
            break
for user_url, msg in (
    (self._UPLOADER_URL_FORMAT % user_name, 'user'),
    (self._UPLOADER_URL_FORMAT % f'{user_name}/live', 'live'),
):
    if sec_uid:
        break
    sec_uid = self._get_sec_uid(user_url, user_name, msg)
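In the suggested loop, the "already have a sec_uid?" check runs before each fetch, so a sec_uid from a tiktokuser: URL short-circuits before any request is made, and the outer `if not sec_uid:` becomes unnecessary. A standalone sketch of that control flow follows. The URL template is an assumption (the PR only shows self._UPLOADER_URL_FORMAT), and get_sec_uid is a hypothetical callable standing in for self._get_sec_uid.

```python
# Assumed URL template; the PR only refers to self._UPLOADER_URL_FORMAT.
UPLOADER_URL_FORMAT = 'https://www.tiktok.com/@%s'


def resolve_sec_uid(sec_uid, user_name, get_sec_uid):
    """Sketch of the suggested loop: try the profile page, then the /live page.

    get_sec_uid(url, user_name, msg) is a hypothetical stand-in for
    self._get_sec_uid and should return a sec_uid string or None.
    """
    for user_url, msg in (
        (UPLOADER_URL_FORMAT % user_name, 'user'),
        (UPLOADER_URL_FORMAT % f'{user_name}/live', 'live'),
    ):
        if sec_uid:  # already known, or found on the previous page
            break
        sec_uid = get_sec_uid(user_url, user_name, msg)
    return sec_uid
```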

bashonly closed this May 10, 2024
bashonly deleted the fix/tiktok-creator branch May 10, 2024 19:07
bashonly restored the fix/tiktok-creator branch May 10, 2024 19:08
bashonly reopened this May 10, 2024
Labels: site-bug (Issue with a specific website)
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • [tiktok:user] Failed to parse JSON
2 participants