[ie/twitter] Add fallback, improve error handling #7621
yt_dlp/extractor/twitter.py
Outdated
                'cards_platform': 'Web-12',
                'include_cards': 1,
                'include_reply_count': 1,
                'include_user_entities': 0,
                'tweet_mode': 'extended',
            }), 'retweeted_status', None)
        elif not self.is_logged_in:
            status = self._graphql_to_legacy(
                self._call_graphql_api('2ICDjqPd81tulZcYrtpTuQ/TweetResultByRestId', twid), twid)
        else:
            if self.is_logged_in:
                status = self._graphql_to_legacy(
                    self._call_graphql_api('zZXycP0V6H7m-2r0mOnFcA/TweetDetail', twid), twid)
            else:
                try:
                    status = self._graphql_to_legacy(
                        self._call_graphql_api('2ICDjqPd81tulZcYrtpTuQ/TweetResultByRestId', twid), twid)
                except ExtractorError as e:
                    if self._login_hint() in e.msg or bug_reports_message() not in e.msg:
                        raise  # Do not try fallback when tweet is expected to be unavailable
                    self.report_warning(e.msg, video_id=twid)
                    self.report_warning('Falling back to syndication endpoint; some metadata may be missing')
                    status = self._download_json(
                        'https://cdn.syndication.twimg.com/tweet-result', twid, 'Downloading syndication JSON',
                        headers={'User-Agent': 'Googlebot'}, query={'id': twid})
might be worth splitting this into a function, but not mandatory
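For illustration only, here is one way the try/fallback branch could be split out, as suggested. This is a minimal standalone sketch, not the code that was merged: `ExtractorError` is a local stand-in for yt-dlp's class, and `should_try_fallback`, `fetch_status`, and their parameters are hypothetical names. The `primary` and `fallback` callables stand for the GraphQL call and the syndication request respectively.

```python
class ExtractorError(Exception):
    """Local stand-in for yt-dlp's ExtractorError; only .msg is modelled."""

    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg


def should_try_fallback(error_msg, login_hint, bug_report_msg):
    """Return True only for unexpected errors.

    Negation of the condition in the diff: an error that contains the login
    hint, or that lacks the bug-report text, is treated as expected (the
    tweet is unavailable) and must not trigger the fallback.
    """
    return login_hint not in error_msg and bug_report_msg in error_msg


def fetch_status(twid, primary, fallback, login_hint, bug_report_msg, warn=print):
    """Try primary(twid); on an unexpected ExtractorError, warn and use fallback(twid)."""
    try:
        return primary(twid)
    except ExtractorError as e:
        if not should_try_fallback(e.msg, login_hint, bug_report_msg):
            raise  # Expected failure: re-raise instead of falling back
        warn(e.msg)
        warn('Falling back to syndication endpoint; some metadata may be missing')
        return fallback(twid)
```

Keeping the decision in a small pure function would make the "expected vs. unexpected error" rule testable without any network access.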
The issue with the guest token was caused by a core regression (fixed by #7648), not by any change that Twitter made. I only realized this last night after I discovered the regression. I do think some of the changes in this PR are still valuable, like checking for protected tweets and adding the syndication fallback. Should I revert the unnecessary changes and continue working with this PR as a site-enhancement?
Sure. Go with whichever impl you think is better
Closes yt-dlp#7579, Closes yt-dlp#7625 Authored by: bashonly
Adds a syndication fallback for tweet extraction and improves error handling throughout the extractor
Closes #7579, Closes #7625
Outdated description
(EDIT: The guest token extraction was actually broken by a core regression which has been fixed)
Twitter has apparently decommissioned the guest token API endpoint, and the browser now gets the token from the webpage html instead. _perform_login was already doing this, so this patch updates _fetch_guest_token to always do this. No more guest token endpoint means that "legacy API" tweet extraction is now broken, so the dead code has been removed.
with master:
with patch:
Copilot Summary
🤖 Generated by Copilot at b13f71b
Summary
🔄🚀🔢
Improve the performance and reliability of the twitter.py extractor by streamlining the authentication process. Use the webpage to get the guest token and reduce redundant requests.
Walkthrough