looks like X/twitter(?) broke something again #983
Comments
broke again zedeus/nitter#983
|
Also, the syndication api for |
|
Yes, it is not working now. I hope the Nitter people fix this soon. |
|
Is there an online/CLI tool converting |
Not really. showReplies=false shows years-old content when not logged in. |
|
Down again... |
That's because in that specific example, those tweets were years ago. Look again at the like count, notice anything? |
We can just search for the first |
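Indeed
#!/usr/bin/python3
import re
import urllib.request

url = "https://syndication.twitter.com/srv/timeline-profile/screen-name/elonmusk"
with urllib.request.urlopen(url) as response:
    # Fall back to UTF-8 when the server does not declare a charset.
    encoding = response.info().get_param('charset', 'utf8')
    html = response.read().decode(encoding)
# The timeline is embedded as JSON inside the page's __NEXT_DATA__ script tag.
match = re.search(r'script id="__NEXT_DATA__" type="application/json">(.*?)</script>', html, re.DOTALL)
if match is None:
    raise SystemExit("__NEXT_DATA__ script tag not found")
print(match[1])
|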
Interesting, but this doesn't return RSS with 'item', 'pubDate' etc. tags. Maybe a script using https://github.com/lkiesow/python-feedgen would do the job? |
Not sure I understand? It exposes far more information than needed, and it does expose the date and everything else. Here's an example for one tweet only:
{
  "type": "tweet",
  "entry_id": "tweet-1519480761749016577",
  "sort_index": "1691455400412446720",
  "content": {
    "tweet": {
      "id": 0,
      "location": "",
      "conversation_id_str": "1519480761749016577",
      "created_at": "Thu Apr 28 00:56:58 +0000 2022",
      "display_text_range": [
        0,
        52
      ],
      "entities": {
        "user_mentions": [],
        "urls": [],
        "hashtags": [],
        "symbols": [],
        "media": []
      },
      "favorite_count": 4600599,
      "favorited": false,
      "full_text": "Next I’m buying Coca-Cola to put the cocaine back in",
      "id_str": "1519480761749016577",
      "lang": "en",
      "permalink": "/elonmusk/status/1519480761749016577",
      "possibly_sensitive": false,
      "quote_count": 171975,
      "reply_count": 187438,
      "retweet_count": 649833,
      "retweeted": false,
      "text": "Next I’m buying Coca-Cola to put the cocaine back in",
      "user": {
        "blocking": false,
        "created_at": "Tue Jun 02 20:12:29 +0000 2009",
        "default_profile": false,
        "default_profile_image": false,
        "description": "Blades of Glory",
        "entities": {
          "description": {
            "urls": []
          },
          "url": {}
        },
        "fast_followers_count": 0,
        "favourites_count": 30569,
        "follow_request_sent": false,
        "followed_by": false,
        "followers_count": 153112066,
        "following": false,
        "friends_count": 410,
        "has_custom_timelines": false,
        "highlightedLabel": {
          "badge": {
            "url": "https://pbs.twimg.com/profile_images/1683899100922511378/5lY42eHs_bigger.jpg"
          },
          "description": "X",
          "userLabelType": "BusinessLabel",
          "userLabelDisplayType": "Badge"
        },
        "id": 0,
        "id_str": "44196397",
        "is_translator": false,
        "listed_count": 126597,
        "location": "𝕏Ð",
        "media_count": 1659,
        "name": "Elon Musk",
        "normal_followers_count": 153112066,
        "notifications": false,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/44196397/1690621312",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1683325380441128960/yRsRRjGO_normal.jpg",
        "protected": false,
        "screen_name": "elonmusk",
        "show_all_inline_media": false,
        "statuses_count": 29441,
        "time_zone": "",
        "translator_type": "none",
        "url": "",
        "utc_offset": 0,
        "verified": false,
        "withheld_in_countries": [],
        "withheld_scope": "",
        "is_blue_verified": true
      }
    }
  }
},
EDIT: Maybe you meant a directly usable solution for an end user. Of course it's not that; the snippet needs to be adapted by a dev. |
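For anyone who wants the 'item' and 'pubDate' tags asked about earlier, here is a minimal sketch of turning entries shaped like the example above into RSS. It uses only the standard library rather than python-feedgen, and the function name `entries_to_rss` and the mapping of tweet fields to RSS tags are assumptions for illustration. Where the entry list sits inside the `__NEXT_DATA__` JSON is not shown in this thread, so the function simply takes the list as input.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime

# Layout of "created_at" in the example, e.g. "Thu Apr 28 00:56:58 +0000 2022".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def entries_to_rss(entries, screen_name):
    """Build an RSS 2.0 string from entries shaped like the example (hypothetical helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"@{screen_name}"
    ET.SubElement(channel, "link").text = f"https://twitter.com/{screen_name}"
    ET.SubElement(channel, "description").text = f"Tweets by @{screen_name}"
    for entry in entries:
        if entry.get("type") != "tweet":
            continue  # skip anything that is not a tweet entry
        tweet = entry["content"]["tweet"]
        created = datetime.strptime(tweet["created_at"], TWITTER_DATE_FORMAT)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = tweet["full_text"]
        # "permalink" is a site-relative path in the example, so prefix the host.
        ET.SubElement(item, "link").text = "https://twitter.com" + tweet["permalink"]
        ET.SubElement(item, "guid", isPermaLink="false").text = tweet["id_str"]
        # RSS expects RFC 822 style dates; format_datetime produces that form.
        ET.SubElement(item, "pubDate").text = format_datetime(created)
    return ET.tostring(rss, encoding="unicode")
```

This only covers the fields visible in the example; media, retweets and quoted tweets would need their own handling.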
Ok, thank you, I'll try this. |
Calling the syndication URL without being logged in to Twitter doesn't retrieve the most recent tweets. If I call this URL in Postman, I retrieve 100 tweets from 10/19/2018 to 07/31/2023; no tweets from August... |
It retrieves the tweets with the highest like count from that user, which doesn't sound good if your goal is retrieving the most recent tweets, as there's no guarantee new tweets will make it into that user's top 100. And even if they did, it might take a considerable amount of time. |
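One way to check what the endpoint actually returns is to summarize the date range and the lowest like count of the returned entries. This is a rough sketch that assumes the entries have the same shape as the single-tweet example earlier in the thread; `summarize_timeline` is a hypothetical helper name.

```python
from datetime import datetime

# Layout of "created_at" in the example, e.g. "Thu Apr 28 00:56:58 +0000 2022".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_timeline(entries):
    """Report count, date range and lowest like count of the tweet entries."""
    tweets = [e["content"]["tweet"] for e in entries if e.get("type") == "tweet"]
    dates = [datetime.strptime(t["created_at"], TWITTER_DATE_FORMAT) for t in tweets]
    likes = [t["favorite_count"] for t in tweets]
    return {
        "count": len(tweets),
        "oldest": min(dates) if dates else None,
        "newest": max(dates) if dates else None,
        # If the result really is a "top by likes" list, any tweet missing
        # from it should have fewer likes than this value.
        "min_likes": min(likes) if likes else None,
    }
```

If a recent tweet with more likes than `min_likes` is missing from the result, the "top 100 by likes" explanation does not fully hold.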
|
I've noticed that for smaller accounts that have less than 100 tweets, that syndication URL does not load any tweets. |
No. That is the case for all big accounts. I am interested in the most recent Tweets and this approach will lead to nothing. |
I'm using this for my bot and it's working fine with cookies and headers. |
Yes. And there is at least one tweet from August with more likes (>807K) than some older tweets which are included (e.g. <680K). |
|
is there any forecast for solving this problem? |
|
Looks like https://nitter.privacydev.net/ is working |
|
That one is a fork which uses account credentials. See #830 |
|
I am aware, but couldn't Nitter implement a system like the one aurora uses, with lots of accounts that rotate per user? |
That's hard to maintain and simple for twitter to ban by just filtering "if number of accounts per IP > SOME_CONSTANT: ban all of them" |
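To make that filter concrete, here is a small sketch of the kind of per-IP check described above. It is purely illustrative: nothing in this thread shows how Twitter actually detects this, and `accounts_to_ban` and its parameters are hypothetical names.

```python
from collections import defaultdict


def accounts_to_ban(logins, max_accounts_per_ip):
    """Return every account seen on an IP that hosts more accounts than the threshold.

    logins: iterable of (ip, account) pairs (an assumed input shape).
    """
    accounts_by_ip = defaultdict(set)
    for ip, account in logins:
        accounts_by_ip[ip].add(account)
    banned = set()
    for accounts in accounts_by_ip.values():
        if len(accounts) > max_accounts_per_ip:
            banned |= accounts  # ban all accounts sharing that IP
    return banned
```

An instance rotating many accounts from a single server address would trip a check like this right away.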
User feeds not working on this |
|
I switched to the privacydevel fork with credentials in, but it's still 404ing on the same endpoint that upstream is having problems with.
|
Strange. privacydev (without credentials) works more or less for @ElonMusk, but not for other users like for instance @BarackObama. |
Cheers for all the work that you do on Poast bud @animegrafmays
Apologies in advance for the plethora of questions but I would appreciate some further tips before I start smashing my head against a wall trying to get a private instance working well enough for my needs. Given your experience running the most reliable Nitter instance as of late I would be very grateful for any advice around accounts usage and I'm sure I'm not the only one.
Basically I need to scrape a few hundred accounts, their complete timelines including likes, complete list of followers and following. What would be the best way to go about it? Would it be more effective to whip up some quick scripts that query graphql rather than run a private nitter instance? I'm not as concerned about performance gains as I am with how nitter cycles through the accounts. I wonder if e.g. requesting the timelines in order with sane delays through some custom scripts would help as far as not getting the accounts blacklisted as much.
I've also thought about queuing up all the tweets containing media and then spinning up a headless browser separately in order to request them directly from twitter, since you can seemingly still do this much unauthenticated. Or is this overkill?
You've mentioned aged accounts faring better but most people, myself included, don't have access to any. Although the scale of private instances is infinitely smaller than the one you run, I'm still wondering what measures I could take to optimize fresh accounts usage.
Thank you in advance for any insight which will result in less hair pulling on my part. |
i run private instances for this so they don't have impact on the public instance, you can reach out to me at graf [at] poast.org if you are interested.
for what you are asking, if you are doing a one-time thing, running a quick nitter with some even new-ish accounts would be fine. the limits for accounts older than I think 2018 are doubled, whereas those prior to 2012 seem to be tripled or maybe quadrupled. i can't find documentation on this but in practice we run 200 accounts older than 2012 and very rarely run into limited accounts beyond maybe a dozen out of that in a day. older account limits are much, much more lax and definitely the way to go if you can get your hands on them
if you are using RSS it's limited to the cache time of the nitter instance. nitter.poast.org is on 30 minute refresh for rss. staggering requests on a private instance (i.e. request 5 accounts now, 5 more in 5 minutes, etc) would help spread out user account exhaustion, but in my experience as I said prior you can more or less ignore this with aged accounts. banned accounts have limits almost identical to those created in the last 12 months, so if you can get your hands on a bunch of tokens from those and you're just serving yourself/your own interests it's more than adequate. it's a lot easier to find banned/suspended accounts from people than current, active ones
you mean fetching a list of attachments on the tweet and bulk downloading? seems a bit overkill. if you are running a private nitter instance you should be fine without needing to go to this extreme
we happen to have a relatively large userbase (32k users at time of writing) who had donated a bunch of accounts and I have some that I had left over. I also have some surplus I could give you tokens for if you're using a private instance for yourself to get you started. we have 22-25k requests/s avg (just recently had to move it to a ryzen 7 server because the core clock on individual threads wasn't enough even running multiple nitter processes)
to be honest, as long as you aren't listing yourself on the wiki or on the status page you likely won't run into issues. new accounts have about ~200 profile queries each that seem to reset between 8-12 hours, so a private instance with maybe 5 or 6 accounts would do you with staggered scraping for quite some time
i don't want to clog up this issue or notify everybody so if you'd like to discuss it further you can email me at the address above |
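The staggering and budget figures in this reply can be sketched in a few lines. This is only an illustration under the numbers quoted above (5 profiles per batch, 5 minutes apart, roughly 200 profile queries per new account per reset window); the function names are hypothetical and the limits are the commenter's observations, not documented values.

```python
def stagger(targets, batch_size=5, interval_seconds=300):
    """Split target profiles into batches, each with a start delay in seconds."""
    schedule = []
    for start in range(0, len(targets), batch_size):
        delay = (start // batch_size) * interval_seconds
        schedule.append((delay, targets[start:start + batch_size]))
    return schedule


def queries_per_window(account_count, queries_per_account=200):
    """Rough profile-query budget per reset window for a pool of new accounts."""
    return account_count * queries_per_account
```

With 6 new accounts the budget works out to about 1200 profile queries per 8-12 hour window, which is why a handful of accounts can be enough for a small private instance.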
|
Why are you even engaging in a convo with someone that "needs to scrape a few hundred accounts" and then offering to help with his endeavour? I must be taking crazy pills without knowing. |
|
Twitter's fault for not providing an API for this. There's nothing wrong with scraping. |
I'm not going to take your bait and get bogged down in a pointless debate re the ethics of scraping and how it all might pertain to Twitter's newest policies. From a pragmatic standpoint, several Nitter instance maintainers have mentioned in this very thread that scraping is a big enough problem as far as keeping Nitter usable for regular users. Since I prefer to use Nitter myself for casual lurking rather than having to deal with Twitter's interface and algo, the last thing I wanted to do was contribute to the problem. I believed addressing this out in the open would go a long way towards helping those with this specific need get their own private instances up and running, rather than ruin a good thing for everyone. Taking my own use case as an example, since speed is not a concern it would've certainly been more convenient to scrape graf's instance for weeks rather than have to deal with Twitter socks, but it's a really shitty thing to do. As for why someone might be interested in scraping Twitter on a small scale - several hundred accounts is nothing really despite your scaremongering - there are legitimate reasons. From Twitter's shitty built-in search to archival purposes; some communities sharing precious tidbits on Twitter are already more or less shutting down in protest because Elon man bad. |
i'll break it down for you -- i'd rather somebody scrape an instance i run that nobody else is trying to use than use my main instance, that literally everybody is linking to. cool with you? no? i dont give a shit |
Scraping is why we can't have nice things such as Nitter in public the way it used to work, before "some" started to scrape Twitter through Nitter and brought Nitter to a halt. |
|
nitter.poast.org sometimes doesn't show search results or individual Tweets. nitter.privacydev.net shows the rate limit error on all searches and account pages even with a full bar of green at the Nitter status. It'd be great if people started working on a solution as opposed to dooming every moment about the death of Nitter 😐 |
|
the reason searches are "sometimes" not shown is due to the amount of accounts limited. there's currently 200 accounts on that instance and because it's serving traffic to almost 200 thousand unique visitors daily more than half of them are rate limited. with that amount of traffic having more than half the accounts rate limited will result in not being able to expand tweets, searches not functioning etc. it's a literal full time job maintaining it and preventing scraping but I am doing what I can including procuring an additional 100 to add to it this week. sorry I can't be good enough for you, @nkfm200 😭😭😭 |
|
I continue to be amazed that you've got this running so long after development ceased |
|
@animegrafmays That I've heard of, the small number of accounts serving a huge number of visitors every day causing rate limits on accounts. Hopefully that extra 100 accounts is there at the time when it's the end of the week. This may be off-topic but I'm just strict about stuff because this is a world where the people living in it refuse to do something unless it fits the rules of the culture |
|
hello, I have opened a new instance with around 500 accounts. feel free to try: https://xcancel.com Most popular RSS readers should also work on that instance. If you can't use it on your favorite rss reader then please send me an email rss [AT] xcancel [DOT] com Thanks. |
|
@unixfox saw this. mad work, thanks for sharing. |
|
fellas, looks like twitter backpedaled something. instances on commits from back last August (before the guest account branch) are working fine https://t.com.sb/dominickmatthew/status/1802031639792632297#m |
|
can confirm, I just rolled a fresh deployment of d7ca353 with no guest accounts and it is working. I will have my users use this instance, see what kind of limitations they've imposed and report back.
search doesn't work and the media tab doesn't load, but going to specific users' timelines and viewing their threads/comments are all fine.
edit: seems like they've reverted it again |
|
funny timing, I was also looking into why my code from a year ago was suddenly working lmfao |
|
Does this mean Nitter is back? |
i can't claim to know what they were or are doing, but i was able to spawn a pre-guest account branch version of nitter and have it work reliably. here's hoping maybe they fully reverse their choice but at this time nitter is still "dead" |
|
Looks like they switched back on the old v1.1 timeline endpoint? It's working on a bunch of old instances right now... |
|
it looks like they fucked up something again. my old instance, running 2023.08.08-d7ca353, suddenly started working again. |
this is definitely what's happened, they've turned it on and off randomly over the past several days so i was hesitant to comment. i don't want to speculate but I also don't believe they'd accidentally enable it again. bait, maybe? |
|
Now 'all' that needs to be done is to get video playback working again. TwiXXer videos can be played using 'any' media player, e.g. ...shows you some knuckleheaded adherents of the climate doom cult putting their tribal mark on the Stonehenge monument. The solution should be fairly simple, I might even have a look at it myself once the time comes. |
There is no need to mock people's irrational reaction to a grim future. |
|
Looks like the v1.1 API opening up was temporary. Most instances are back to being down. |
All this great stuff I'm missing because I'm not on Twitter... |
The future is quite bright as long as groups like JSO/XR (UK), Letzte Generation (Germany), Återställ Våtmarken (Sweden) and similar cookie-cutter activist collectivists are held in check. These are doom cults which, in contrast to the doom preachers of yore with their sandwich signs 'The End is Nigh, Repent!' who only rang their bells and shouted a bit use destructive tactics to proclaim their quasi-religious zeal. They do not help whatever cause they claim to adhere to, if anything they achieve the opposite. If you believe in the climate scare - which I do not - and want society to move away from fossil fuels - which I do, just not for those reasons - you should not root for these groups. As to that 'grim future', do you have children? Do you think it a good idea to tell them they won't live to reach the age where they might have children themselves? Read up on how children react to that type of rhetoric, especially younger children. They actually believe it word for word, abject nonsense though it is. What a way to portray, nay destroy their future. So yes, call those numbskulls for what they are.
|

