This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

Number of Extracted Tweets #106

Closed
DavidPerea opened this issue May 22, 2018 · 27 comments

@DavidPerea

Initial Check

I have made sure to check the following.

[] Python version is 3.6.2
[] Using the latest version of Twint.

Command

I run this command to get all of the user's published tweets.

python Twint.py -u malaga -o file.csv --csv

Description of Issue

I hope this question is not a duplicate. I have reviewed the closed issues and have not seen one directly related to this topic.

I want to extract all the tweets from a user. However, I cannot extract the total number of tweets that the user has published. Also, every time I run the command I get a different number of tweets.

Why does this happen?

Why is it that the number of extracted tweets varies?

OS Details

I am using Windows 10.

@pielco11
Member

Same problem here: sometimes it stops at just 10 tweets and sometimes it goes to the end.

It seems to be a problem on Twitter's side, so I suggest trying again later; this works for me.

I don't see a specific pattern across attempts, so it's quite hard for me to dig further into the problem, but if you figure it out, feel free to let us know!

@haccer
Member

haccer commented May 23, 2018

I am seeing the issue,

I will look into this further

@haccer
Member

haccer commented May 24, 2018

Weird,

Seems right now I cannot collect anything past 2018-05-14

@DavidPerea
Author

That's right.
I can't get tweets past that date either. I don't know why. It's strange.

@haccer
Member

haccer commented May 24, 2018

Okay, I've just fixed this in the recent commit. I'll update the PyPI later today after I add some more stuff

@haccer
Member

haccer commented May 24, 2018

I've tested with python Twint.py -u malaga -o file.csv --csv and everything seems to be running back to normal. Thanks @DavidPerea for reporting this!

@DavidPerea
Author

Thank you very much @haccer

But should I make any changes?
I ran the command again and the same thing happens. I still cannot get tweets from May 2018.

@haccer
Member

haccer commented May 24, 2018

Did you update the repo?

@haccer
Member

haccer commented May 24, 2018

I uploaded changes to pypi, so pip install twint --upgrade should suffice to update

@DavidPerea
Author

True. Now it works.
I had not used that command to update the repo.
Thank you very much, really. You are a genius.

@DavidPerea
Author

With the new updates, I once again cannot extract all the tweets. It only lets me extract about 3000 tweets.
Do you know why?

@DavidPerea
Author

Sorry to return to this topic, but I'm going to give some more details to see if it is possible to locate the error.

Curiously, when I use the --profile-full option I do not get all the tweets. The number varies by user, but it is always around 3000 tweets. However, when I try the same user at different times I always get the same number of tweets. So I think it may be something in the code, might it not?

Can you get all the tweets with the --profile-full option?

@haccer
Member

haccer commented Jun 5, 2018

Hi @DavidPerea

Can you provide the user so I may take a better look?

@DavidPerea
Author

The problem is that there are many tweets, about 5900, so you will have to wait a long time. I'm sorry that you have to wait so long.
But, for example, take this user, AytoSanFernando:

python Twint.py -u AytoSanFernando --profile-full -o file.csv --csv

Without --profile-full I get them all the way to the end, but since I'm interested in getting the retweets I use --profile-full, and in that case I only get 3175 tweets.

@haccer
Member

haccer commented Jun 5, 2018

Yes it is quite a long time lol @DavidPerea , I'll take a look later tonight or maybe this afternoon

@haccer haccer self-assigned this Jun 5, 2018
@haccer haccer added the Triaged label Jun 5, 2018
@haccer haccer reopened this Jun 5, 2018
@DavidPerea
Author

Of course, no problem. Whenever you have time.

Here is a better example, another user, AytoHuelva, who has fewer tweets: 4600. However, I only get 3115 tweets.

python Twint.py -u AytoHuelva --profile-full -o file.csv --csv

This has been happening since the last update of the package.

Please let me know whether it gets solved or not. Thank you very much!!

@Nestor75
Contributor

Nestor75 commented Jun 5, 2018

I'm facing the same issue too.

@haccer
Member

haccer commented Jun 5, 2018

@DavidPerea A quick thought,

3115 and 3175 are very close, perhaps my second method for grabbing retweets is limited to around that number; I haven't confirmed this yet, I will later though.

Question:
Does the last Tweet you grab with the --profile-full match the last Tweet with the regular option that utilizes Twitter's search? (i.e. their earliest Tweet)
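
To make the question concrete: tweet IDs grow over time, so the lowest ID in each output file is the earliest tweet that run reached. A minimal sketch of the comparison, assuming each CSV has an `id` column (the column name and the sample rows below are placeholders, not confirmed Twint output):

```python
import csv
import io

def earliest_tweet_id(csv_text):
    """Return the smallest tweet ID in a CSV that has an 'id' column (assumed name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return min(int(row["id"]) for row in reader)

# Hypothetical sample data standing in for the two output files.
search_csv = "id,tweet\n300,newest\n200,middle\n100,oldest\n"
profile_csv = "id,tweet\n300,newest\n200,middle\n"

same_end = earliest_tweet_id(search_csv) == earliest_tweet_id(profile_csv)
print("Both runs end on the same tweet:", same_end)
```

If the two values differ, the run with the higher minimum ID stopped before reaching the start of the timeline.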

@DavidPerea
Author

DavidPerea commented Jun 5, 2018

The strange thing is that before the last update I could get all the tweets with --profile-full. That is why I doubt that this second method is limited to that number. I don't know, I'm a little confused.

And excuse me @haccer, could you rephrase the question? I don't quite understand it.

@DavidPerea
Author

Thinking it over, you may be right @haccer, because I'm not sure whether I could get all the tweets with --profile-full before. So maybe that limit does exist with this second method.

@Nestor75
Contributor

Nestor75 commented Jun 5, 2018

I am running the query, but it is quite large, so I will let you know whether it works as soon as it finishes.

@Nestor75
Contributor

Nestor75 commented Jun 5, 2018

I'm still having problems getting the tweets :( Many are still missing: https://github.com/haccer/twint/issues/141#issuecomment-394861498

@haccer
Member

haccer commented Jun 7, 2018

Okay I looked at this tonight.

Number of Tweets advertised on the profile: 4,662

Test 1

  • Command Ran: python3 Twint.py -u aytohuelva -o file.csv --csv
  • Total Tweets Collected: 4,283
  • Final Tweet: 565119429986770944 2015-02-10 12:05:46 UTC <AytoHuelva> El jueves vive el Día de los Enamorados con el #concierto de la Banda. 20 h. Casa Colón, #Huelva gratis @BSMHuelva pic.twitter.com/Jp2hS4Mdt9

Test 2

  • Command Ran: python3 Twint.py -u aytohuelva -o file.csv --csv --profile-full
  • Total Tweets Collected: 3,124
  • Final Tweet: 700258959714873348 2016-02-18 05:02:00 EDT RT <detreslados> @AytoHuelva Jueves 18, 9:00PM, Gran Teatro Huelva, Delbosque, por tan solos 5€. ¡No lo dudes y ven a conocerlos! https://www.youtube.com/watch?v=qGmXwMBNmg4 … @aytoHuelva

Test 3

  • Command Ran: python3 Twint.py -u aytohuelva -o file.csv --csv --retweets
  • Total Tweets Collected: 1,020
  • Final Tweet: 932945666078277632 2017-11-21 12:15:54 UTC <AytoHuelva> Abrimos el plazo de recogida de solicitudes para la participación en un nuevo proyecto de inserción juvenil para menores de 30 inscritos en el Fichero de Garantía Juvenil. Toda la información en el enlace: https://goo.gl/H7TNhy  pic.twitter.com/eyZpB1hMX1

I'm going to make some adjustments to the code tonight and tomorrow, we'll see if these numbers change
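
For reference, the three totals above can be put against the advertised count. This is only arithmetic on the numbers reported in the tests; the `coverage` helper is an illustrative name, and the advertised count may itself include tweets that no method can reach:

```python
advertised = 4662  # tweet count shown on the profile

collected = {
    "search (default)": 4283,
    "--profile-full": 3124,
    "--retweets": 1020,
}

def coverage(count, total):
    """Percentage of the advertised total that a run collected."""
    return round(100 * count / total, 1)

for mode, count in collected.items():
    print(f"{mode}: {count} tweets, {coverage(count, advertised)}% of {advertised}")
```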

@DavidPerea
Author

Perfect. When you finish these adjustments, let me know and we will check whether the tweet counts change compared with that test.

As always, thank you very much.

@haccer
Member

haccer commented Jun 8, 2018

An update @DavidPerea,

I suspect this issue is very similar to #141, where the script stops early because it has hit a Twitter request limit and Twitter returns a 503 error... I'm rerunning the command with the requests being logged so I can see the last request when the script stops working. I should have more information about this tomorrow.
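
The idea of logging each request and keeping the last one can be sketched roughly as follows. This is not Twint's code: `crawl`, `fake_fetch`, and the example.invalid URLs are hypothetical, and the fetch function is a stand-in so the sketch runs without network access:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("request-log")

def crawl(urls, fetch):
    """Request each URL in order, log it, and stop at the first non-200 status.

    `fetch` is any callable that takes a URL and returns an HTTP status code.
    Returns the last URL requested and its status, or (None, None) if no URLs.
    """
    last_url, last_status = None, None
    for url in urls:
        last_url, last_status = url, fetch(url)
        log.info("%s -> %s", url, last_status)
        if last_status != 200:
            break
    return last_url, last_status

# Stand-in for a real HTTP client: the third page pretends to be rate limited.
def fake_fetch(url):
    return 503 if url.endswith("page=3") else 200

pages = [f"https://example.invalid/timeline?page={n}" for n in range(1, 6)]
print(crawl(pages, fake_fetch))
```

With a real client plugged in as `fetch`, the returned pair shows which request the run stopped on and what status came back.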

@DavidPerea
Author

Okay, it seems a possible cause has already been found. Please share any new information you find about this.

@haccer
Member

haccer commented Jun 20, 2018

Sorry for the delay, I've been quite busy the past week.

tl;dr There's no way to resolve this, Twitter has imposed a limit.

--

When requesting with the last ID there are no Tweets in the response.

From my URL debug logs:

https://mobile.twitter.com/aytohuelva?lang=en&max_id=715174184243105791
https://mobile.twitter.com/aytohuelva?lang=en&max_id=713660578628313087
https://mobile.twitter.com/aytohuelva?lang=en&max_id=710511190422773759 <-- Last request
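
Each `max_id` in these URLs is a tweet ID used as a paging cursor, and tweet IDs of this era encode their creation time (Snowflake format: milliseconds since a fixed epoch in the bits above the low 22). A minimal sketch for reading the cursor out of a logged URL and seeing how far back it reached; the helper names are illustrative and not part of Twint:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch (2010-11-04 UTC)

def max_id_from_url(url):
    """Extract the max_id query parameter from a timeline URL as an int."""
    return int(parse_qs(urlparse(url).query)["max_id"][0])

def snowflake_to_datetime(tweet_id):
    """Convert a Snowflake-style tweet ID to its creation time in UTC."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

last_request = "https://mobile.twitter.com/aytohuelva?lang=en&max_id=710511190422773759"
cursor = max_id_from_url(last_request)
print(cursor, snowflake_to_datetime(cursor).date())
```

The same conversion applied to the final tweet ID of Test 1 (565119429986770944) gives 2015-02-10, matching the date printed in that test.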

I then constructed a --resume option for the profile:

➜ twint-dev python3 Twint.py -u aytohuelva -o file.csv --csv --profile-full --debug --resume 710511190422773759

and it always receives this HTML response (no Tweets inside):

➜  twint-dev cat twint-last-request.log
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.1//EN" "http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
        <meta name="HandheldFriendly" content="True" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0"/>
    <link rel="canonical" href="https://twitter.com/aytohuelva">
    <meta name="twitter-redirect-url" content="twitter://user?screen_name=aytohuelva"/>
    <meta name="twitter-redirect-srcs" content="{&quot;pwreset-iphone&quot;:true,&quot;android&quot;:true,&quot;email&quot;:true}"/>
    <link href="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/favicon.ico" rel="icon" type="image/x-icon" />
    <title>Ayuntamiento Huelva (@AytoHuelva) on Twitter</title>
      <link href="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/assets/a.css" inline="false" media="screen" rel="stylesheet" type="text/css" />
    <script src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/javascripts/framebust.js" type="text/javascript"></script>
    <meta name="google-site-verification" content="V0yIS0Ec_o3Ii9KThrCoMCkwTYMMJ_JYx_RSaGhFYvw" />
    <meta name="deciders" content="{&quot;m2_mmw_scribe_get_url&quot;:true}" />
  </head>
  <body class="images nojs users-page users-show-page">
    <div id="container">


        <div id="brand_bar">
  <table id="top">
    <tr>
      <td class="left">
        <a href="/" class="brandmark">
              <img alt="Twitter" height="28"  src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/sprites/larry_28px.gif">


        </a>
      </td>
      <td class="right">
            <img alt="|" class="divider" height="28" src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/brandbar_divider.gif" />
          <a class="search" href="#search">
                <img alt="Search" height="28"  src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/sprites/search_28px.gif">


          </a>
            <img alt="|" class="divider" height="28" src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/brandbar_divider.gif" />
          <a class="signin" href="/session/new"><span>Log in</span></a>
            <img alt="|" class="divider" height="28" src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/brandbar_divider.gif" />
          <a href="/signup"><span class="signup">Sign up</span></a>
      </td>
    </tr>
  </table>
</div>




      <div id="main_content">

      <div class="profile">
        <table class="profile-details">
  <tr>
      <td class="avatar">
        <img alt="Ayuntamiento Huelva" src="https://pbs.twimg.com/profile_images/890511236533825536/5mgw8tfP_normal.jpg" />
      </td>
      <td class="user-info">
        <div class="fullname">Ayuntamiento Huelva
        </div>
        <div class="username">
          <span>@</span>
          <span class="screen-name">AytoHuelva</span>
        </div>
        <div class="location">Huelva</div>
      </td>
  </tr>
  <tr>
    <td class="details" colspan="2">
      <div class="bio">
        <div class="dir-ltr" dir="ltr">
          Cuenta oficial del Ayuntamiento de Huelva.
        </div>
      </div>
      <div class="url">
        <div class="dir-ltr">
          <a href="http://t.co/paMajpJDSk" data-url="huelva.es" class="twitter-timeline-link activeLink dir-ltr tco-link"
              dir="ltr" rel="nofollow" target="_blank">huelva.es</a>
        </div>
      </div>
    </td>
  </tr>
</table>

        <table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">4,730</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/AytoHuelva/following">
          <div class="statnum">333</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/AytoHuelva/followers">
          <div class="statnum">5,724</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>

        <div class="profile-actions">
        <form action="/i/guest/follow/AytoHuelva" method="post">
            <span class="m2-auth-token">
    <input name="authenticity_token" type="hidden" value="259db82899551d8f4a7a49c4bd2243517335eb14"/>
  </span>

          <span class="w-button-common w-button">
            <input name="commit" type="submit" value="Follow">
          </span>
        </form>
      <form action="/AytoHuelva/actions" method="get">

        <span class="w-button-common w-button">
          <input name="commit" type="submit" value="•••">
        </span>
      </form>
</div>

      </div>

        <div class="w-mediaonebox">
    <table>
      <tr>
        <td style="width: 73px;">
          <a href="/AytoHuelva/media/grid?idx=0"><img src="https://pbs.twimg.com/media/DgDdTP9X4AE3v3O.jpg:thumb" width="73" height="78"/></a>
        </td>
        <td style="width: 55px;">
          <a href="/AytoHuelva/media/grid?idx=1"><img src="https://pbs.twimg.com/media/DgDChdiXUAATyNi.jpg:thumb" width="55" height="78"/></a>
        </td>
        <td>&nbsp;</td>
      </tr>
    </table>
    <div class="see-more">
      <a href="/AytoHuelva/media/grid">View more photos</a>
    </div>
</div>



      </div>
      <div id="footer">
    <div class="search-fields">
    <div class="title">
      <label for="q">Enter a topic, @name, or fullname</label>
    </div>
    <form action="/search" class="search-input" method="get">
    <table>
      <tr>
        <td class="value" id="search"><div><input id="q" name="q" type="text" value=""/></div></td>
        <td class="button">
          <input type="hidden" name="s" value="typd" />
          <input type="image" src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/images/sprites/magnifying_glass.gif" alt="Search"/>
        </td>
      </tr>
    </table>
    </form>
</div>

    <table class="global-actions">
      <tr>
        <td><a href="/settings">Settings</a></td>
        <td><a href="https://support.twitter.com/"> Help</a></td>
      </tr>
    </table>
    <div class="view-actions"><a href="#top"> Back to top</a> &middot; <a href="/settings/profile_images">Turn images off</a></div>
</div>

    </div>
      <script id="scribe-configuration" type="application/json">{"page":"profile"}</script>
      <script src="https://ma.twimg.com/twitter-mobile/8f3445bd0e5eb63b939e25a6ff29981d947a4a51/assets/m2_tweets.js" type="text/javascript"></script>
      <img src="/i/anonymize?data=%5B%7B%22integration%22%3A%22ga%22%2C%22ref%22%3A%22%22%2C%22mobileMetricsToken%22%3A%22152948269839453175%22%7D%5D" height="0" width="0" style="opacity: 0">

  </body>
</html>

In conclusion the limitation is: 160 Pages/Requests
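
As a rough sanity check, the 160-page figure lines up with the --profile-full total from the earlier test. The sketch below mixes numbers from two different runs and assumes every page carries a similar number of tweets, so treat the result as an estimate only:

```python
pages = 160          # page/request limit reported above
collected = 3124     # tweets collected by the earlier --profile-full run
advertised = 4662    # tweets advertised on the profile at that time

per_page = collected / pages
pages_needed = -(-advertised // round(per_page))  # ceiling division

print(f"about {per_page:.1f} tweets per page")
print(f"roughly {pages_needed} pages would be needed for {advertised} tweets")
```

Under those assumptions, a full --profile-full crawl of this account would need noticeably more pages than the limit allows, which is consistent with the run stopping around 3,100 tweets.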

@haccer haccer closed this as completed Jun 20, 2018