Twint doesn't get all followers list #340

mmosleh · 2019-01-29T04:26:16Z

It seems Twint doesn't get the full list of followers for accounts with a large number of followers; it stops abruptly at some random number. For example, I tried twint -u nasa --followers, and each time the script stopped at some random point with only a few thousand screen_names.

Python version is 3.6;
Updated Twint with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
I have searched the issues and there are no duplicates of this issue/question/request.


pielco11 · 2019-01-29T12:40:56Z

It's possible that Twitter stops returning new entities because you (like everyone else in that case) sent too many requests.

mmosleh · 2019-01-29T13:20:30Z

Thanks @pielco11, I was wondering whether there is a way around it, e.g. raising an error so that it can continue pulling the information from that point, or keeping the requests within some limit so that this doesn't happen. Thanks.

pielco11 · 2019-01-29T21:53:47Z

I'll look deeper (I can't commit to a deadline). For now I can say that the issue does not seem to have a unique pattern. I tried a couple of queries and got a long list, ~100k users. Also, when one stopped I started a new one, and it ran for a long time. So I guess it's not Twitter that's blocking you; from what I tested, I don't think using a VPN will get you around the issue.

A solution could be to retry the query when it fails; in any case, the code should only be changed after a deeper look at what is going on.

mmosleh · 2019-02-04T16:15:20Z

Thank you @pielco11 !

castrovictor · 2019-02-09T12:38:41Z

> I'll look deeper (can't determine a dead-line), for now I can say that the issue does not seem to have an unique pattern. I tried a couple of queries and got a long list, ~100k users. Plus when one stopped I started a new one, and this lasted a long. So I guess that's not Twitter that's blocking you, for what I tested I think that using a VPN will not get you around the issue.
>
> A solution could be to re-try the query when it fails, anyway the code should be changed after a deeper look at what-is-going-on

Hi, first of all, thanks for building such an amazing tool and publishing the code; I am sure I will learn a lot from your work. I tried getting a long list, ~130k, and it stopped at a random number of followers on each query.
On the other hand, I am writing a script to get all tweet links of a user, because I think your tool does not do that. It works without logging in and without using the API, but somehow, after a lot of queries (with my script), Twitter blocks the user's tweet search. Using a VPN solved the problem. This is just to give you some information.

Finally, if you have a PayPal account, I would like to buy you a coffee for posting the source code, because as I said, I would like to learn how you built the tool, which would be impossible without the source code.

pielco11 · 2019-02-09T13:05:07Z

~130k followers is a lot, so Twitter might be blocking requests at a random point.

As for the second point, Twitter blocks an IP if it makes too many requests; that's why using a VPN solves the problem.

What we could try is handling that "followers count" issue by asking the user to change the IP and then retry the query, and see if this solves the issue.

Unfortunately I do not have enough time to solve every issue, so the patch will be delayed. Any kind of help with development is very welcome.

pielco11 · 2019-02-10T13:28:48Z

@mmosleh Here is what's going on:

At first there is a "show more" link; Twint extracts that link and makes a new request. Then that button vanishes, so Twint is not able to make a new request.
If I take the last cursor-id and make a new request, changing the IP and so on, nothing changes.

I think we found the origin of the issue, and sadly we can't do anything about it, at least for now.

mmosleh · 2019-02-11T19:44:41Z

@pielco11 I made a quick and dirty patch to the previous version of Twint (the one with a single file). It just retries a few times on the last cursor-id when it receives the error message. I managed to download all 32M NASA followers this way. (I'm not familiar with the code base of the new version, though.)
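
The patch itself is not shown in the thread. As a rough illustration of the idea of retrying on the last cursor-id, here is a minimal sketch. The names fetch_all and fetch_page, and the (users, next_cursor) page shape, are hypothetical and are not Twint's actual API; the sketch assumes a missing "more" link shows up as next_cursor being None.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (usernames on the page, cursor of the next page or None).
Page = Tuple[List[str], Optional[str]]


def fetch_all(fetch_page: Callable[[Optional[str]], Page],
              max_retries: int = 3,
              pause: float = 1.0) -> List[str]:
    """Walk a cursor-paginated list. When a response has no next cursor,
    re-request the same cursor a few times before accepting the end."""
    followers: List[str] = []
    seen = set()
    cursor: Optional[str] = None  # None stands for the first page
    while True:
        next_cursor: Optional[str] = None
        for attempt in range(max_retries + 1):
            users, next_cursor = fetch_page(cursor)
            # Keep whatever came back, skipping names already collected.
            for user in users:
                if user not in seen:
                    seen.add(user)
                    followers.append(user)
            if next_cursor is not None:
                break  # the "more" link is present, move on
            if attempt < max_retries:
                time.sleep(pause)  # wait, then retry the same cursor
        if next_cursor is None:
            return followers  # still no next cursor after all retries
        cursor = next_cursor
```

The cost of this design is that the genuine end of the list is only accepted after max_retries extra requests.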

pielco11 · 2019-02-11T19:52:59Z

@mmosleh oh, nice... could you give me the commit id? git rev-parse HEAD

castrovictor · 2019-03-03T14:52:52Z

> @mmosleh oh, nice... may you provide me the commit id? git rev-parse HEAD

So, was the update uploaded? Is it possible to download a large list of followers, as @mmosleh managed to do?

pielco11 · 2019-03-26T18:17:55Z

Adding a timeout seems to solve the issue.

Without timeouts I'm able to get up to 40 followers/following; adding time.sleep(3) at line 161 in twint/get.py allows me to get up to 440 followers/following.

KrisM-tor · 2019-04-25T02:51:35Z

In the current iteration of get.py, has this issue been resolved? I'm not seeing the time.sleep(3) line in the script.

thanks once again!

pielco11 · 2019-04-25T10:58:32Z

@KristopherMakuch I did not apply that "patch" since I'm not sure that's a patch. More testing is needed, everyone is welcome to find a workaround

Matiusco · 2019-06-14T17:03:43Z

How do I include a control file to record which page it stopped on?

example:
twint -u username --followers -o username_followers.txt username_page.txt -t 3 -r 15

username_page.txt = file with the id of the last followers page.
-t = 3 (fixed wait time before a new followers page)
-r = 15 (random wait time before a new followers page)
The time to wait before going to the next page of followers should be the sum t + r. r is always random and can be, e.g., 2, 3 or 15, so the time will vary.

If processing is interrupted, after a while it can be run again and will continue from the followers page whose id is stored in the file.
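
The proposed t + r wait between pages can be sketched in a few lines. This is only an illustration of the timing idea: fetch_pages, page_delay and the fetch_page callable are hypothetical names, not Twint functions or options, and r is assumed to be an upper bound for the random part.

```python
import random
import time
from typing import Callable, List


def page_delay(t: float, r: float) -> float:
    """Seconds to wait before the next page: fixed part t plus a random part in [0, r]."""
    return t + random.uniform(0, r)


def fetch_pages(fetch_page: Callable[[int], str],
                n_pages: int,
                t: float = 3.0,
                r: float = 15.0,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch n_pages pages, waiting t + r seconds between consecutive requests."""
    results: List[str] = []
    for index in range(n_pages):
        if index > 0:
            sleep(page_delay(t, r))  # no wait before the very first request
        results.append(fetch_page(index))
    return results
```

Passing sleep as a parameter makes it possible to try the function without actually waiting.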

My original command:
twint -u username --followers -o username_followers.txt

My error today:
CRITICAL:root:twint.feed:Follow:IndexError

The file has 1036 followers, but this profile has 3,800 followers.

Thanks again for all the work.
Congrats

Sorry for my poor English....
My first language is Portuguese.

pielco11 · 2019-06-15T09:21:42Z

Hi @Matiusco

To add some timeouts you just have to add a line as described above.

You could resume the scrape with something like twint -u username --followers -o user_followers.txt --resume username_followers_resume.txt

When Twint stops (most probably because Twitter does not return more data), you just have to re-run the command to resume from where it stopped.
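
Re-running the command by hand can be automated with a small wrapper. The sketch below uses only the flags quoted in this thread; the stop rule (stop when a run adds no new lines to the output file) is an assumption made for illustration, not something Twint provides, and a single stalled run may only mean a temporary block. build_command, count_lines and run_until_stalled are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def build_command(username: str, output: str, resume: str) -> List[str]:
    """Build the resume command quoted in the thread as an argument list."""
    return ["twint", "-u", username, "--followers", "-o", output, "--resume", resume]


def count_lines(path: str) -> int:
    """Number of lines in the output file, or 0 if it does not exist yet."""
    file = Path(path)
    if not file.exists():
        return 0
    with file.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def run_until_stalled(cmd: List[str], output: str, max_runs: int = 20,
                      runner: Callable[..., object] = subprocess.run) -> int:
    """Re-run cmd until one run adds no new lines to output, or max_runs is reached.
    Returns the final line count."""
    previous = count_lines(output)
    for _ in range(max_runs):
        runner(cmd, check=False)  # a failing run is simply followed by the next one
        current = count_lines(output)
        if current == previous:
            break  # assumption: no growth means nothing more can be fetched right now
        previous = current
    return previous
```

A call would look like run_until_stalled(build_command("username", "user_followers.txt", "username_followers_resume.txt"), "user_followers.txt").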

Matiusco · 2019-06-17T11:58:50Z

hi @pielco11
Thanks for the information.
I will try to perform this operation.

Edit: yes, it works perfectly now.
resume.txt is [] when finished. ;)

Matiusco · 2019-06-19T14:20:28Z

I cannot get all the followers.
Sometimes it reaches up to 15,000, other times it ends at 9,000, but the resume file is empty [].

My command in the terminal:

twint -u zehdeabreu --followers -o user_followers.txt --resume zehdeabreu_followers_resume.txt -t 15 -l 50

-t 15 and -l 50 are not working...

How could I control the time between requests from inside a Python file, to leave a much longer time between requests?

=====
The max number of followers at the moment is:

wc -l user_followers.txt

16470 user_followers.txt

Thanks for all the help

pielco11 · 2019-06-19T21:09:37Z

-t is not implemented (at least not yet); -l is for the language, --limit is for the limit. If you want to control the time between requests, you have to play with get.py.

Your query should be something like twint -u zehdeabreu --followers -o user_followers.txt --resume zehdeabreu_followers_resume.txt --limit 60

I also tried the resume option and it works fine

Matiusco · 2019-06-20T10:51:03Z

tks @pielco11 , I'll try

mmosleh · 2019-06-20T10:55:57Z

I had a similar issue. The problem seems to be that sometimes the "more" button doesn't appear on Twitter's end, and Twint assumes it has reached the end of the list of followers. However, if it tries again and sends another request, the button might be available to the scraper. So one quick fix could be to get the number of followers first and then keep retrying until the number of collected followers matches.
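
The "retry until the count matches" idea could look roughly like this. It is a sketch under stated assumptions: collect_until_complete and fetch_batch are hypothetical names, and the expected count is assumed to be known beforehand. Since the collected total may never reach the displayed follower count exactly, the loop is capped by max_attempts.

```python
from typing import Callable, Iterable, Set


def collect_until_complete(fetch_batch: Callable[[], Iterable[str]],
                           expected: int,
                           max_attempts: int = 10) -> Set[str]:
    """Call fetch_batch repeatedly, merging results, until at least
    `expected` distinct names are collected or max_attempts is reached."""
    collected: Set[str] = set()
    for _ in range(max_attempts):
        collected.update(fetch_batch())  # a set ignores names seen before
        if len(collected) >= expected:
            break
    return collected
```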


Matiusco · 2019-06-21T15:30:59Z

OK @mmosleh, but I do not know how I could change this part of the code in get.py.

I still cannot get all the followers.

nxhuy-github · 2019-06-26T09:42:09Z

How can I get the IDs of the followers instead of the username, please? Thank you

pielco11 · 2019-06-26T14:46:46Z

@nxhuy-github please keep comments on the topic of the issue. Anyway, you can do that using .Lookup as shown in the wiki.

datduong · 2019-10-02T01:16:47Z

Hi, what is the current status of the code that retrieves all the followers of one person? I still have the problem that only a subset of followers is downloaded. I am using the command twint -u SpeakerPelosi --followers but am unable to get all 3 million followers (my result is only about 30k users). I saw that line 161 has a timeout. Would increasing this timeout help?

pielco11 · 2019-10-02T10:21:16Z

@datduong Twitter actively works to prevent Twint from getting all the followers; I strongly suggest you use the API.

yuiseki · 2019-11-15T13:01:43Z

Hi. I am facing the same issue now.
After a lot of trial and error, I may have found a workaround for this issue.

My finding is this:

For example, twint -u nasa --following --resume nasa_following_resume.txt --limit 60 basically works well.
When repeating the above command within a short period, we get CRITICAL:root:twint.feed:Follow:IndexError
But after waiting several seconds, we can resume the above command once again.
By waiting and then resuming the above command, I can collect hundreds of followings without problems.

My proposal is this:

The twint command should add a command-line argument like --wait-random 120, for example.
When twint hits CRITICAL:root:twint.feed:Follow:IndexError, it should wait a random number of seconds and retry the command.
The ideal final command would look like this: twint -u nasa --following --wait-random 120.
- The --resume filename should be determined automatically, or stored only in memory.
- --limit 60 should be replaced by an appropriate default value.
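
The proposed wait-and-retry behaviour can be sketched as a small helper. Note that --wait-random is only a proposal in this thread and does not exist as an option; retry_with_random_wait is a hypothetical name. The sketch also assumes the failure reaches the caller as a Python IndexError, which may not be the case, since the thread only shows it as a CRITICAL log line.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_random_wait(action: Callable[[], T],
                           max_wait: float = 120.0,
                           max_attempts: int = 5,
                           sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action; on IndexError wait a random time in [0, max_wait] seconds
    and try again, giving up after max_attempts attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except IndexError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the error
            sleep(random.uniform(0, max_wait))
    raise ValueError("max_attempts must be at least 1")
```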

pielco11 added the unknown label Jan 29, 2019

pielco11 mentioned this issue Feb 9, 2019

missing favorites and error occurred - list index out of range [x] feed.mobile #338

Closed

pielco11 closed this as completed Feb 10, 2019

pielco11 added Twitter Flaw and removed unknown labels Feb 10, 2019

pielco11 reopened this Feb 11, 2019

pielco11 mentioned this issue May 3, 2019

CRITICAL:root:twint.feed:Follow:IndexError #409

Closed

3 tasks

pielco11 pinned this issue May 26, 2019

rlleshi mentioned this issue Dec 17, 2019

Twint returning exceptions and timing out after --follower list pull #612

Closed

yuis-ice mentioned this issue May 26, 2021

CRITICAL:root:twint.feed:Follow:IndexError #1206

Open

3 tasks
