Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ie/twitter] Fix retweet extraction & remove broken fallback #8016

Merged
merged 11 commits into from
Sep 9, 2023

Conversation

bashonly
Copy link
Member

@bashonly bashonly commented Sep 2, 2023

When the input URL is for a tweet of a retweeted video, the actual tweet that's being reposted should be extracted instead of the content-less retweet itself. This was being done correctly when passing the legacy_api extractor-arg, but the necessary transformations to the GraphQL response were not being made otherwise.

Since I was already rewriting TwitterIE._extract_status(), I cleaned it up by removing the syndication fallback that is now broken. This will make debugging much simpler.

Finally, in the process of running/updating the tests, I realized the TwitterBroadcast extractor was raising an unexpected AttributeError when a broadcast no longer exists, so I added 2 lines to catch that.

Addresses #7850 (comment)

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at e8b6fdb

Summary

🐦🎥🔄

Improve Twitter extraction and fix bugs. Add support for retweets, handle missing or private broadcasts, and use legacy API as fallback. Update and add tests for yt_dlp/extractor/twitter.py.

Oh we are the coders of the Twitter sea
We extract the tweets and the videos with glee
We handle the retweets and the broadcasts that are gone
And we use the legacy API when we need to carry on

Walkthrough

  • Remove unused import of functools (link)
  • Add and update test cases for TwitterIE class, testing different scenarios of tweets, retweets, and broadcasts (link, link, link, link, link, link, link)
  • Add _GRAPHQL_ENDPOINT property to TwitterIE class, returning the appropriate GraphQL endpoint depending on login status (link)
  • Modify _graphql_to_legacy method of TwitterIE class, adding support for extracting retweeted status and user from GraphQL response (link)
  • Modify _extract_status method of TwitterIE class, changing the order and logic of API calls, preferring legacy API over GraphQL API if not logged in and legacy_api argument is set, and returning retweeted status if present (link)
  • Modify _real_extract method of TwitterIE class, removing redundant extraction of media id from video URL (link)
  • Modify _real_extract method of TwitterBroadcastIE class, adding a check for the existence of broadcast in API response, and raising extractor error if missing (link)
  • Add comment to _real_extract method of TwitterBroadcastIE class, explaining the purpose of broadcast id extraction (link)

@bashonly bashonly added site-bug Issue with a specific website pending-review PR needs a review labels Sep 2, 2023
@gamer191
Copy link
Collaborator

gamer191 commented Sep 6, 2023

The fallback actually does work-see the message in Discord

@bashonly bashonly merged commit a006ce2 into yt-dlp:master Sep 9, 2023
13 checks passed
@bashonly bashonly removed the pending-review PR needs a review label Sep 9, 2023
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
@bashonly bashonly deleted the fix/twitter-retweets branch May 10, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
site-bug Issue with a specific website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants