Tweepy not getting full text #935

Closed
manoelhortaribeiro opened this issue Sep 27, 2017 · 37 comments
Labels
API This is regarding Twitter's API Invalid This is not valid Question This is a question Stale This is inactive, outdated, too old, or no longer applicable

Comments

@manoelhortaribeiro

Hey guys, just a heads up that currently the library is not getting the new tweets. According to the following example:

https://developer.twitter.com/en/docs/tweets/tweet-updates#chars280

Notice that tweepy currently retrieves only 140 chars and then adds "..." at the end. Does anyone know if there is an easy fix for that? Or is this a problem on Twitter's side?

@jaycech3n

jaycech3n commented Oct 1, 2017

If the tweet has 280 chars the JSON returned by Twitter will contain the extended_tweet property, which is a dictionary containing the untruncated tweet under the key full_text.

More details on the extended tweet API here: https://developer.twitter.com/en/docs/tweets/tweet-updates
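
As a minimal sketch of the lookup described above, assuming the tweet is available as a plain dict parsed from the JSON. The helper name `text_from_payload` and the sample payloads are hypothetical, made up for illustration:

```python
def text_from_payload(tweet):
    """Return the untruncated text from a tweet JSON dict, if present."""
    # Tweets over 140 characters carry an "extended_tweet" dict whose
    # "full_text" key holds the untruncated text.
    extended = tweet.get("extended_tweet")
    if extended is not None:
        return extended["full_text"]
    # Shorter tweets have no "extended_tweet"; "text" is already complete.
    return tweet["text"]


# Hypothetical payloads, trimmed to the relevant keys.
short_tweet = {"text": "hello", "truncated": False}
long_tweet = {
    "text": "a long tweet that got cut off…",
    "truncated": True,
    "extended_tweet": {"full_text": "a long tweet that got cut off, now in full"},
}
print(text_from_payload(short_tweet))
print(text_from_payload(long_tweet))
```

With a tweepy Status object, the same dict should be reachable through its `_json` attribute, which is used elsewhere in this thread.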

@manoelhortaribeiro
Author

@jaycech3n hey, you're correct, thank you for that. I assumed I was already using tweet_mode='extended' but I was not. For other people the way to go is:

a = api.get_status(912886007451676672, tweet_mode='extended')

Cheers

@trtm

trtm commented Nov 21, 2017

Hello,
How would I do this for the Stream class?

from tweepy import Stream
stream = Stream(auth, l)
stream.filter( track=['my_search'] )

is there also a tweet_mode='extended'?
Many thanks in advance!

@kodeine

kodeine commented Nov 21, 2017

also interested in knowing.

@Flyer4109

Hello,

For the search API would you do the same?

example: api.search(q="query", tweet_mode="extended")

Thanks.

@vism2889

vism2889 commented Jan 12, 2018

Hello,
I am having a similar issue with not getting the full tweet when using the api.user_timeline method. I was wondering if anyone could help me out with it. Tweepy code is below:

twitter_name = input("Enter a Twitter Handle: ")
# Keep asking until the timeline request succeeds
while True:
    try:
        stuff = api.user_timeline(screen_name=twitter_name, count=100, include_rts=False)
        break
    except tweepy.TweepError as e:
        twitter_name = input("The handle you entered was invalid, please try again: ")
catchcount = 0

This returns the tweet followed by '...' and then a link. Ideally I would like to get the full tweet. Is there a way to do that while still using the api.user_timeline method? Or is there another method that accepts a tweet count and returns a list of Status objects without needing a tweet ID number, like the api.get_status method mentioned above but returning hundreds of tweets at a time?
Much appreciated!!
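
A minimal sketch of how this could look with api.user_timeline, assuming that passing tweet_mode='extended' works there the same way as in the api.get_status example above. The helper name `timeline_texts` is hypothetical, and `api` is assumed to be an authenticated tweepy.API instance:

```python
def timeline_texts(api, screen_name, count=100):
    """Return the full text of a user's most recent tweets (sketch)."""
    statuses = api.user_timeline(
        screen_name=screen_name,
        count=count,
        include_rts=False,
        tweet_mode="extended",  # ask Twitter for untruncated tweets
    )
    # In extended mode the text lives in `full_text` instead of `text`.
    return [status.full_text for status in statuses]
```

Usage would be something like `texts = timeline_texts(api, twitter_name)`. The function only touches `api.user_timeline`, so it can be tried against a stub object before spending real API requests.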

@varadhbhatnagar

I tried using the tweet_mode="extended" add-on as shown in the code, but there are still a couple of tweets getting truncated.

Code :

auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)


class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print(status.text)

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener , tweet_mode='extended')

 
myStream.filter(track=['#bitcoin'])

Any idea why this is happening ?

Output :

RT @PundiXLabs: We are now having 10,000 followers in our #Telegram group. Thank you for joining and support! 🎉Photo by @NASA #pundix #bitc…
RT @CryptocoinAPI: GoUrl.(Myip.ms) Recommended Plugin for Google Chrome - Websites #Whois / #Reputation Plugin  - https://t.co/kcJmjFXEYF
#…
RT @WhalePanda: Huge spike in #Bitcoin hashrate. Spikes happen from time to time but this one is huge. Looks like a lot of hashrate came on…
Ford Pinto it is... https://t.co/1V9f6T7BoU #btc #bitcoin https://t.co/r4X125MlGi

@vism2889

@varadhbhatnagar change all instances of status.text to status.full_text. The 'extended' tweet mode returns a different or additional object called full_text (with the entire tweet) but still allows 'text' to work (only returning truncated tweet). Hope this was helpful, just had this issue a few days back! see #988

@varadhbhatnagar

@vism2889 Also , I am getting rate limited error 420 too often. Is there a way I can make that go away?

@vism2889

vism2889 commented Jan 18, 2018

@varadhbhatnagar So there's some info regarding handling errors at the bottom of this linked page http://docs.tweepy.org/en/v3.5.0/streaming_how_to.html , but it looks like it will disconnect your stream, which I don't think you want. More info regarding rate limiting: https://developer.twitter.com/en/docs/basics/rate-limiting.html . Do you have any more code than what you've posted above? If so, please post it.
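
The error handling on that docs page boils down to returning False from on_error when the status code is 420. A minimal sketch, written as a plain mixin class so it runs without tweepy installed. The class name `DisconnectOn420` is hypothetical, and combining it with tweepy.StreamListener is an assumption about how it would be used:

```python
class DisconnectOn420:
    """Mixin sketch: stop the stream when Twitter rate limits it."""

    def on_error(self, status_code):
        if status_code == 420:
            # Returning False tells the stream to disconnect.
            return False
        # Any other error: returning True keeps the stream connected.
        return True
```

A listener could then be declared as `class MyStreamListener(DisconnectOn420, tweepy.StreamListener)`. Disconnecting does stop the stream, so reconnecting after a pause is left to the caller.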

@varadhbhatnagar

varadhbhatnagar commented Jan 18, 2018

@vism2889 I read the link and handled the rate limiting issues pretty well . Also , I tried using full_text and here is the error that I am getting : AttributeError: 'Status' object has no attribute 'full_text'

Here is the complete code now :

auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)


class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):

        with open('fetched_tweets.txt', 'a') as tf:
            tf.write(status.full_text.encode('utf-8') + '\n')

        print(status.text)

    def on_error(self, status):
        print("Error Code : " + str(status))

    def test_rate_limit(api, wait=True, buffer=.1):
        """
        Tests whether the rate limit of the last request has been reached.
        :param api: The `tweepy` api instance.
        :param wait: A flag indicating whether to wait for the rate limit reset
                     if the rate limit has been reached.
        :param buffer: A buffer time in seconds that is added on to the waiting
                       time as an extra safety margin.
        :return: True if it is ok to proceed with the next request. False otherwise.
        """
        # Get the number of remaining requests
        remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
        # Check if we have reached the limit
        if remaining == 0:
            limit = int(api.last_response.getheader('x-rate-limit-limit'))
            reset = int(api.last_response.getheader('x-rate-limit-reset'))
            # Parse the UTC time
            reset = datetime.fromtimestamp(reset)
            # Let the user know we have reached the rate limit
            print("0 of {} requests remaining until {}.".format(limit, reset))

            if wait:
                # Determine the delay and sleep
                delay = (reset - datetime.now()).total_seconds() + buffer
                print("Sleeping for {}s...".format(delay))
                sleep(delay)
                # We have waited for the rate limit reset. OK to proceed.
                return True
            else:
                # We have reached the rate limit. The user needs to handle the rate limit manually.
                return False

        # We have not reached the rate limit
        return True

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener , tweet_mode='extended')

 
myStream.filter(track=['#bitcoin'],async=True)

@vism2889

@varadhbhatnagar hmm, personally I haven't used the StreamListener very much. You do still have an instance of status.text in this code, right before def on_error. In my experience both text and full_text have worked when in 'extended' tweet_mode, but it might be worth trying to change that so you are only using the full_text attribute. Hope this helps... if it does, or if you figure it out, I am definitely curious to know the outcome. @corycomer (hope it's cool to tag you in this) any idea why this isn't working, or any troubleshooting advice?

@Flyer4109

I have changed my tweet_mode to 'extended' and I have replaced all instances of '.text' to '.full_text' but there are still tweets that are being truncated.

My code:

tweets = api.search(q="#gameofthrones", lang="en", count=100, tweet_mode="extended")

for tweet in tweets:
    print("--------------------")
    print(tweet.full_text)
    print("--------------------\n")

Example result:


RT @WiCnet: Time for some news about filming on #GameofThrones season 8. We've got some intriguing new shots of the massive King's Landing…

@varadhbhatnagar

This comment has been minimized.

@Flyer4109

@varadhbhatnagar is this function part of Tweepy or was it created by you?

@varadhbhatnagar

@Flyer4109 . Hi. The on_data functionality is a part of tweepy. Since tweepy is poorly documented, I referred to this link and went through a couple of examples when I wanted to understand how on_data() worked.

@jaycech3n

jaycech3n commented Jan 29, 2018

@varadhbhatnagar @vism2889 @trtm @kodeine Note that the developer documentation says that using tweet_mode=extended is only guaranteed to work for the standard REST APIs.
The streaming API is not a REST API, and in this case I've found the best way to go is to get the tweet text directly from the JSON. See my earlier comment.

@jaycech3n

@Flyer4109 There is no issue, in your case the ellipsis is an actual part of the tweet's full text, and not a truncation. This is the usual behavior for retweets.

@Flyer4109

@jaycech3n Ah thank you, that has sorted it out

@AttributeErrorCat

def get_all_tweets(screen_name):
    #Twitter only allows access to a users most recent 3240 tweets with this method

    import tweepy
    import csv

    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_token_secret = ''

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    auth.secure = True
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=340, include_rts=False)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=340,max_id=oldest,tweet_mode = 'extended')

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv
    with open('%s_nyttweet2.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["tweetid","date","text"])
        writer.writerows(outtweets)

    from urllib import urlopen

    pass

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("nytimes")

Hi guys - I'm new to python and I'm trying to stop my tweets from being truncated. I added an extended tweet mode but I get this error.

" AttributeError: 'Status' object has no attribute 'text'"
I can't find where I should change text to full.text.

Also, does anyone know the code to remove the URL from the output too? My tweets look like this:

"The Trump administration released a list of 210 people who were identified because of their "closeness to the Russi… https://t.co/5NmKPtQNrO"

THANK YOU!!!
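
On the URL question, a rough sketch using a regular expression, assuming the links to remove are the https://t.co/ short links seen in the output above. The helper name `strip_tco_links` is hypothetical, and it removes every t.co link in the text, not only the trailing one:

```python
import re

# Matches Twitter's t.co short links plus any whitespace before them.
TCO_LINK = re.compile(r"\s*https?://t\.co/\w+")


def strip_tco_links(text):
    """Remove t.co short links from a tweet's text."""
    return TCO_LINK.sub("", text).strip()


print(strip_tco_links("their closeness to the Russi… https://t.co/5NmKPtQNrO"))
```

This only hides the link. It does not restore the part of the tweet that was cut off before it.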

@gabrer

gabrer commented Feb 28, 2018

Even using the extended_tweet modality (suggested by @jaycech3n), I was getting truncated tweets (and strangely, the flag "truncated" was set to False).

Actually, the problem arises for "retweeted" tweets. So, to avoid this problem, you need to get the "full_text" from the "retweeted_status", when available:

full_text_retweeted = tweet_status_object._json.get("retweeted_status")

if full_text_retweeted is not None:
    print(str(full_text_retweeted.get("full_text")))
else:
    print(str(tweet_status_object._json["full_text"]))
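
The same check can be sketched with Status attributes instead of _json, assuming the tweets were fetched with tweet_mode='extended'. The helper name `full_text_of` is hypothetical, and SimpleNamespace objects stand in for Status objects so the example runs without network access:

```python
from types import SimpleNamespace


def full_text_of(status):
    """Return the untruncated text of an extended-mode Status (sketch)."""
    # For a Retweet, the complete text is on the original tweet.
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return status.full_text


# Stand-ins for Status objects, with made-up text.
plain = SimpleNamespace(full_text="a normal tweet")
retweet = SimpleNamespace(
    full_text="RT @someone: the start of a long tweet…",
    retweeted_status=SimpleNamespace(full_text="the start of a long tweet, in full"),
)
print(full_text_of(plain))
print(full_text_of(retweet))
```

Note that for a Retweet this returns the original tweet's text without the "RT @user:" prefix.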

@clayadavis

@gabrer, I'm able to get the full text in a streaming listener by using
status.extended_tweet['full_text']

Likewise, the (basic) entities in the full tweet are available from
status.extended_tweet['entities']

@mefryar

mefryar commented Mar 24, 2018

For those who try status.extended_tweet['full_text'] while streaming and get
AttributeError: 'Status' object has no attribute 'extended_tweet', you can use:

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        try:
            text = status.extended_tweet["full_text"]
        except AttributeError:
            text = status.text

Hat tip to this StackOverflow post.

@shaharyar2643

I'm able to get full text in api.timeline but unable to get full text in api.statuses_lookup.
tweets = api.statuses_lookup(id_batch, tweet_mode='extended')
for tweet in tweets:
    all_data.append(dict(tweet._json))
with open(output_file) as json_data:
    data = json.load(json_data)
. . .
for entry in data:
    t = {
        "created_at": entry["created_at"],
        "text": entry["full_text"].encode('utf-8-sig'),

It displays an error of unexpected keyword argument 'tweet_mode'

@numan619

numan619 commented Apr 23, 2018

try this out. I guess you have to consider the retweeted_status attribute for retweets.
You have to use the try and except block for both the cases because status object has no attribute extended_tweet for tweets less than 140 characters long.

class Stream2Screen(tweepy.StreamListener):
    def on_status(self, status):
        if hasattr(status, 'retweeted_status'):
            # Retweet: read the text from the original tweet
            try:
                tweet = status.retweeted_status.extended_tweet["full_text"]
            except AttributeError:
                tweet = status.retweeted_status.text
        else:
            try:
                tweet = status.extended_tweet["full_text"]
            except AttributeError:
                tweet = status.text

@chaoswjz

chaoswjz commented Oct 15, 2018

on_data(

Is there any way to get full text for RT in the api.search? I need to use this for retrieving past tweets (ex. tweets between 8 days ago and 2 days ago).

I used the method mentioned above, changing status.text to status.full_text, but it does not work.

I know that this works for on_data() but I also need it for the other situation.
@varadhbhatnagar

@wangcongcong123

Hey guys, just a heads up that currently the library is not getting the new tweets. According to the following example:

https://developer.twitter.com/en/docs/tweets/tweet-updates#chars280

Notice that tweepy currently retrieves only 140 chars and then adds "..." at the end. Does anyone know if there is an easy fix for that? Or is this a problem on Twitter's side?

The following is the answer, including handling for several different situations (no satisfying answers are available online; I got it after many annoying hours of trial and error). Hope it helps.
status_json = status._json
if "extended_tweet" in status_json:
    print(status_json['extended_tweet']['full_text'])
elif 'retweeted_status' in status_json:
    if 'extended_tweet' in status_json['retweeted_status']:
        print(status_json['retweeted_status']['extended_tweet']['full_text'])
    else:
        print(status_json['text'])
else:
    print(status_json['text'])

@tushar2899

This comment has been minimized.

@ottobricks

Regarding the api.search() method, neither tweet_mode='extended' nor compression=False seems to stop the API from truncating text.
My full_text is still truncated on many occasions. Has anybody been able to figure this one out yet?

@fabianslife

@varadhbhatnagar i am currently having the same Problem, and you seem to be the only to have ever solved it. But unfortunately I can't get your code to work. What do I have to write in the on_data function to get the full Retweets?

@ValnirJr

@varadhbhatnagar i am currently having the same Problem, and you seem to be the only to have ever solved it. But unfortunately I can't get your code to work. What do I have to write in the on_data function to get the full Retweets?

And I'm having the same problem.... Any new ideas?

@fabianslife

fabianslife commented May 17, 2019

@ValnirJr I solved it, will post my code later today or tomorrow. The thing is you need to access the tweet element in the json file. For Example:
Tweet=tweet.json
Text=tweet[text]
If tweet.truncated==True:
Text=tweet[full_text][text]
elif “retweeted_status” in tweet:
Text=tweet[retweeted_status][text]

@Harmon758
Member

As Twitter’s Tweet updates documentation explains, Twitter's API only provides extended Tweets when using extended mode.

I've added a section in Tweepy's documentation that covers extended Tweets to supplement Twitter's documentation.
@trtm @kodeine @Flyer4109 @vism2889 @varadhbhatnagar @AttributeErrorCat @chaoswjz @wangcongcong123 @fabianslife @ValnirJr

@jaycech3n

If the tweet has 280 chars the JSON returned by Twitter will contain the extended_tweet property, which is a dictionary containing the untruncated tweet under the key full_text.

This is only true for streams.

@vism2889 @varadhbhatnagar

The 'extended' tweet mode returns a different or additional object called full_text (with the entire tweet) but still allows 'text' to work (only returning truncated tweet).

Extended mode returns the same type of Status object. It simply replaces the text attribute with a full_text attribute.

In my experience both text and full_text have worked when in 'extended' tweet_mode

This should not be the case. Extended Tweets will not have a text attribute.

@varadhbhatnagar

Also , I am getting rate limited error 420 too often. Is there a way I can make that go away?

This isn't relevant to this issue. Like you said, you're being rate limited by Twitter's API. This isn't an issue with Tweepy, and you can simply not hit the rate limit to not get rate limited.

Also , I tried using full_text and here is the error that I am getting : AttributeError: 'Status' object has no attribute 'full_text'

Status objects from streams use compatibility mode with an additional extended_tweet attribute/field rather than extended mode.

@varadhbhatnagar @Flyer4109

From my experience, the only way to get the complete tweet is to use the on_data() function of tweepy instead.

This is not the case. Not only is @Flyer4109 not using a stream (and therefore not using the data event handler), but the Status object provided to the Status event handler should be fully representative of the corresponding data provided to the data event handler.

The on_data functionality is a part of tweepy. Since tweepy is poorly documented,

The on_data event handler is covered in the section of the documentation regarding streaming, and PRs are welcome.

@jaycech3n @varadhbhatnagar @vism2889 @trtm @kodeine

The streaming API is not a REST API, and in this case I've found the best way to go is to get the tweet text directly from the JSON.

This shouldn't be necessary, as the Status object provided to the Status event handler should be fully representative of the corresponding data provided to the data event handler.

@jaycech3n @Flyer4109

There is no issue, in your case the ellipsis is an actual part of the tweet's full text, and not a truncation. This is the usual behavior for retweets.

This is not the case, as that is not the full text of the Retweeted Tweet. It is true however, that the text and full_text attributes for Retweets can be truncated and that this is normal for Retweets.

@AttributeErrorCat
As your traceback should indicate, this is occurring when you attempt to use tweet.text on the line where you set outtweets. When using extended mode, the text attribute is replaced by a full_text attribute, not a full attribute's text attribute.

Also, that is the normal text for a truncated Tweet. You can simply use the full, untruncated text instead.

For code block usage, see https://help.github.com/articles/creating-and-highlighting-code-blocks/.

@gabrer This is normal behavior for Retweets.

@shaharyar2643
This is a separate issue (#840), and should be resolved with c997ee7 (#926) as part of Tweepy v3.7.0.

For code block usage, see https://help.github.com/articles/creating-and-highlighting-code-blocks/.

@chaoswjz Yes, you can simply use extended mode. That is the only time the full_text attribute will be available in place of the text attribute. Without more information than "does not work", such as a traceback and/or any relevant code, it's impossible to determine what your issue is.

@wangcongcong123
Note, this can return a truncated Retweet in the case that the Retweeted Tweet itself is not an extended Tweet, but where the addition of the Retweet prefix exceeds the 140 character limit.

Also, you should be able to use the Status attributes, as the Status object should be fully representative of the corresponding JSON data used to generate it.
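
One way to cover that case, sketched under the assumption that the tweet comes from a stream (compatibility mode) and is handled as a dict such as status._json: take the text from retweeted_status whenever it is present, and only then look for extended_tweet. The helper names `_untruncated` and `stream_text` are hypothetical, as is the sample payload:

```python
def _untruncated(tweet):
    """Text of a single compatibility-mode tweet dict."""
    extended = tweet.get("extended_tweet")
    return extended["full_text"] if extended else tweet["text"]


def stream_text(tweet):
    """Untruncated text of a streamed tweet, following Retweets (sketch)."""
    # A Retweet's own text can be cut off by the "RT @user: " prefix even
    # when the original is short, so read the original tweet instead.
    original = tweet.get("retweeted_status")
    return _untruncated(original if original else tweet)


# Hypothetical Retweet whose original tweet is not an extended Tweet.
retweet = {
    "text": "RT @someone: almost 140 characters of text…",
    "retweeted_status": {"text": "almost 140 characters of text, complete"},
}
print(stream_text(retweet))
```

The returned text is that of the original tweet, so it does not include the "RT @user:" prefix.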

For code block usage, see https://help.github.com/articles/creating-and-highlighting-code-blocks/.

@tushar2899 That has nothing to do with this issue.

@ottok92 That should not be the case. Are you sure they are not Retweets? If so, can you provide a reproducible example (MCVE/SSCCE)?

@fabianslife

i am currently having the same Problem, and you seem to be the only to have ever solved it.

That's clearly not the case, as there are multiple better examples and explanations afterwards in this thread.

@fabianslife @ValnirJr

The thing is you need to access the tweet element in the json file.

I'm not sure what JSON file you're referring to, but if you mean the data event handler or the Status object's JSON data, that shouldn't be necessary, as the Status object provided to the Status event handler should be fully representative of the corresponding data provided to the data event handler and used to generate the Status object.

This code will error if tweet is a Status object, as json (rather than _json) is not a valid attribute and you're using undefined variables (rather than strings) to access dictionaries. You're also mixing usage of tweet as a Status object and as a dictionary/JSON and using an unnecessary ==True check.

For code block usage, see https://help.github.com/articles/creating-and-highlighting-code-blocks/.

@Harmon758 Harmon758 added API This is regarding Twitter's API Invalid This is not valid Question This is a question Stale This is inactive, outdated, too old, or no longer applicable labels Jul 31, 2019
@fabiomathu

This comment has been minimized.

@Harmon758
Member

@12Akansha

This comment was marked as off-topic.

@Harmon758

This comment was marked as resolved.

@Harmon758 Harmon758 closed this as not planned Won't fix, can't repro, duplicate, stale May 29, 2022
@tweepy tweepy locked as resolved and limited conversation to collaborators May 29, 2022