This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

TypeError(s), AttributeError(s): on format.py #960

Open
ivanlewin opened this issue Oct 13, 2020 · 13 comments

Comments

@ivanlewin

ivanlewin commented Oct 13, 2020

  • Python version is 3.8.4;
  • Updated Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
  • I have searched the issues and there are no duplicates of this issue/question/request.
  • Using Windows 10 ver 2004, running from VSCode / Terminal

Code ran:

import twint

c = twint.Config()

c.All = "realDonaldTrump"
c.Since = "2020-10-01"
c.Format = "User: {username} |Tweet: {tweet} |Replies: {replies} |Likes: {likes} |RT: {retweets} |Time: {date} {time}"
c.Store_csv = True
c.Output = "tweets.csv"

twint.run.Search(c)

I just updated twint and tried to run the same script I have been using the last months and got this error:

Traceback (most recent call last):
  File "c:/Users/Ivan/Documents/menta/repos/twint_scraping/pruebas.py", line 13, in <module>
    twint.run.Search(c)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 427, in Search
    run(config, callback)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 319, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 239, in main
    await task
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 290, in run
    await self.tweets()
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 230, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 175, in Tweets
    await checkData(tweets, config, conn)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 140, in checkData
    output = format.Tweet(config, tweet)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\format.py", line 23, in Tweet
    output = output.replace("{replies}", t.replies_count)
TypeError: replace() argument 2 must be str, not int

It was a type error, which I solved by casting the value to a string (see below).
I also had to cast the retweets_count and likes_count attributes.

What was

23 output = output.replace("{replies}", t.replies_count)
24 output = output.replace("{retweets}", t.retweets_count)
25 output = output.replace("{likes}", t.likes_count)

I changed to

23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))
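For anyone wondering why the cast is needed: str.replace() only accepts strings for both arguments, so an int count has to be converted first. Below is a minimal sketch of the same idea outside of twint. The helper name fill_placeholders and the sample values are made up for illustration and are not part of twint.

```python
def fill_placeholders(template, values):
    """Replace each {name} placeholder in template with str(value)."""
    output = template
    for name, value in values.items():
        # str.replace() raises TypeError for non-str arguments,
        # so every value is converted explicitly.
        output = output.replace("{" + name + "}", str(value))
    return output


# Hypothetical counts, standing in for t.replies_count and friends.
line = fill_placeholders(
    "Replies: {replies} |Likes: {likes} |RT: {retweets}",
    {"replies": 3, "likes": 120, "retweets": 7},
)
print(line)
```

Converting in one place like this means a field that changes type later (int, None, and so on) does not need its own cast.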

Then I was getting this error:

Traceback (most recent call last):
  File "c:/Users/Ivan/Documents/menta/repos/twint_scraping/pruebas.py", line 13, in <module>
    twint.run.Search(c)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 427, in Search
    run(config, callback)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 319, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "C:\Users\Ivan\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 239, in main
    await task
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 290, in run
    await self.tweets()
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\run.py", line 230, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 175, in Tweets
    await checkData(tweets, config, conn)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\output.py", line 140, in checkData
    output = format.Tweet(config, tweet)
  File "C:\Users\Ivan\AppData\Roaming\Python\Python38\site-packages\twint\format.py", line 27, in Tweet
    output = output.replace("{is_retweet}", str(t.retweet))
AttributeError: 'tweet' object has no attribute 'retweet'

I solved this by commenting out the retweet line and then the user_rt_id line.
What was

27 output = output.replace("{is_retweet}", str(t.retweet))
28 output = output.replace("{user_rt_id}", str(t.user_rt_id))

I changed to

27 # output = output.replace("{is_retweet}", str(t.retweet))
28 # output = output.replace("{user_rt_id}", str(t.user_rt_id))

I am not sure if my changes are a good solution, but they work for me now and maybe they will for someone else.
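An alternative to commenting the lines out would be to fall back to a default when the attribute is missing. This is only a sketch: FakeTweet and replace_optional are hypothetical names, and the real tweet object in twint may look different.

```python
class FakeTweet:
    """Hypothetical stand-in for a tweet object without the retweet fields."""
    username = "example_user"


def replace_optional(output, placeholder, obj, attr, default=""):
    # getattr() with a default avoids AttributeError when the field is absent.
    return output.replace(placeholder, str(getattr(obj, attr, default)))


t = FakeTweet()
out = "User: {username} |RT?: {is_retweet}"
out = replace_optional(out, "{username}", t, "username")
out = replace_optional(out, "{is_retweet}", t, "retweet")  # attribute missing
print(out)
```

With this approach the {is_retweet} placeholder is replaced by an empty string instead of being left in the output.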

@himanshudabas
Contributor

himanshudabas commented Oct 13, 2020

@ivanlewin
yes, you are right

I changed to

23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))

this will solve the first issue,

and

I changed to

27 # output = output.replace("{is_retweet}", str(t.retweet))
28 # output = output.replace("{user_rt_id}", str(t.user_rt_id))

this will solve the second issue too. (for now)
I have already put up a new PR, #955, which solves this issue with retweet & user_rt_id.

You can put up a PR for your fix, but first merge my branch into yours so that you have my fixes too. Then you'll have to fix up a few more things, which I can guide you through.

I'd also recommend trying as many Format strings as possible to find more bugs before you put up your PR.

@ivanlewin
Author

Sure, will do!

@himanshudabas
Contributor

This is what you'd need for handling mentions on line 34 in format.py, because in the new implementation each mention is a dict rather than a plain string (json has to be imported in format.py for this to work):

34 output = output.replace("{mentions}", ",".join([json.dumps(mention) for mention in t.mentions]))
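To see what that line produces, here is a small standalone sketch. The keys in the sample mention dicts are assumptions for illustration; the actual keys twint stores may differ.

```python
import json

# Hypothetical mention entries; the real keys in twint may differ.
mentions = [
    {"screen_name": "alice", "name": "Alice", "id": "1"},
    {"screen_name": "bob", "name": "Bob", "id": "2"},
]

# str.join() needs strings, so each dict is serialized with json.dumps() first.
joined = ",".join(json.dumps(mention) for mention in mentions)
output = "Mentions: {mentions}".replace("{mentions}", joined)
print(output)
```

Joining the dicts directly with ",".join(mentions) would raise a TypeError, which is why the json.dumps() step is there.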

ivanlewin added a commit to ivanlewin/twint that referenced this issue Oct 13, 2020
@ivanlewin
Author

ivanlewin commented Oct 13, 2020

Hi, I merged your branch into my fork and made the changes, is that okay? I don't want to break anything heh.

I tested it using the same script I used originally and it worked

@himanshudabas
Contributor

himanshudabas commented Oct 13, 2020

looks good.

@ivanlewin ok I found another issue.

Traceback (most recent call last):
  File "ivan.py", line 11, in <module>
    twint.run.Search(c)
  File "/home/baapuji/tor_test/twint/twint/run.py", line 419, in Search
    run(config, callback)
  File "/home/baapuji/tor_test/twint/twint/run.py", line 315, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/baapuji/tor_test/twint/twint/run.py", line 235, in main
    await task
  File "/home/baapuji/tor_test/twint/twint/run.py", line 286, in run
    await self.tweets()
  File "/home/baapuji/tor_test/twint/twint/run.py", line 226, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "/home/baapuji/tor_test/twint/twint/output.py", line 166, in Tweets
    await checkData(tweets, config, conn)
  File "/home/baapuji/tor_test/twint/twint/output.py", line 137, in checkData
    output = format.Tweet(config, tweet)
  File "/home/baapuji/tor_test/twint/twint/format.py", line 14, in Tweet
    output = output.replace("{place}", t.place)
TypeError: replace() argument 2 must be str, not dict

might wanna fix this too

@ivanlewin
Author

@himanshudabas cool. Could you share what you ran? I didn't get back any tweets with place info when I ran it.

@himanshudabas
Contributor

himanshudabas commented Oct 13, 2020

sure.

import twint
c = twint.Config()

c.Limit = 100
c.Search = "apple"
c.Store_json = True
c.Output = "tweets.json"
c.Format = "User: {username} |Tweet: {tweet} |Replies: {replies} |Likes: {likes} |RT: {retweets} |Time: {date} {time}"
c.Geo = "36.055121,-119.01595,10mi"

twint.run.Search(c)

Aah, never mind. I figured out why this is happening.
When I implemented the parser for tweet data, I got things mixed up: I thought place and geo were the same thing. Clearly they are not.
I'll put up a fix for that, perhaps tomorrow.
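Independent of the parser fix, the formatter could defensively convert whatever it receives into text before calling replace(). The following is only a sketch of that idea: as_text is a hypothetical helper, and the shape of the sample place dict is an assumption, not taken from twint.

```python
import json


def as_text(value):
    """Return a string representation suitable for str.replace()."""
    if isinstance(value, str):
        return value
    if isinstance(value, (dict, list)):
        # Structured values are serialized so nothing is lost in the output.
        return json.dumps(value)
    return str(value)


# Hypothetical place value; the real structure may differ.
place = {"type": "Point", "coordinates": [36.055121, -119.01595]}
output = "Place: {place}".replace("{place}", as_text(place))
print(output)
```

This keeps the TypeError from reaching the user, but it does not address the underlying place/geo mix-up described above.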

@ivanlewin
Author

For some reason I am getting 0 instead of an empty string for some t.quote_url values.

Should we check for that case (e.g. with str.isdigit()) and replace it with ""? Casting 0 to str would work, but an empty string might look cleaner in the final output.

Side question: is this format.py file only for printing to the console, or does it also have to do with saving files as .csv, .json, etc.?

@himanshudabas
Contributor

Yes, this is something I looked into yesterday.
I am planning to fix this inside tweet.py, which is where quote_url is assigned, so we won't have to check for that condition specifically.

Actually, the 0 means that the Tweet contains a Quoted Tweet which has been deleted. That is why its URL is not present.
I'm planning to replace the 0 with "<deleted>", which is a string, so no separate check will be required in format.py.
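As a rough illustration of that plan, the normalization could look something like the sketch below. normalize_quote_url and DELETED_MARKER are hypothetical names, and the actual change in tweet.py may be implemented differently.

```python
DELETED_MARKER = "<deleted>"


def normalize_quote_url(raw):
    """Map the raw quote_url value to a string that is safe for formatting."""
    if isinstance(raw, str):
        return raw  # a normal URL, or "" when there is no quoted tweet
    if raw == 0:
        # Per the explanation above, 0 marks a quoted tweet that was deleted.
        return DELETED_MARKER
    return "" if raw is None else str(raw)


print(normalize_quote_url(0))
print(normalize_quote_url("https://example.com/status/1"))
```

Doing this once where the value is assigned means format.py and the file writers all see a string.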

@skwolvie

How did you change it to this?
23 output = output.replace("{replies}", str(t.replies_count))
24 output = output.replace("{retweets}", str(t.retweets_count))
25 output = output.replace("{likes}", str(t.likes_count))

How can I edit the package?

@ivanlewin
Author

You can fork the repo, download it and modify it as you please. Keep the script you run in the same folder as your edited copy of the twint package, otherwise Python will import the files in the site-packages folder.
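The reason this matters is that Python imports the first matching package it finds on sys.path, and the script's own folder comes before site-packages. The sketch below demonstrates the mechanism with a throwaway package called demo_pkg (a made-up name standing in for a local twint checkout); putting its parent folder at the front of sys.path has the same effect as keeping the script next to it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway "checkout" with a tiny package, standing in for a local clone.
checkout = Path(tempfile.mkdtemp())
pkg = checkout / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("VERSION = 'local edit'\n")

# Entries earlier in sys.path win, so the local copy is found first.
sys.path.insert(0, str(checkout))
importlib.invalidate_caches()

import demo_pkg

print(demo_pkg.VERSION)
print(demo_pkg.__file__)  # shows which copy was actually imported
```

Printing __file__ on the imported package is a quick way to confirm that the edited copy, and not the installed one, is in use.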

@skwolvie

skwolvie commented Nov 8, 2020

     21         output = output.replace("{hashtags}", ",".join(t.hashtags))
     22         output = output.replace("{cashtags}", ",".join(t.cashtags))
---> 23         output = output.replace("{replies}", str(t.replies_count))
     24         output = output.replace("{retweets}", str(t.retweets_count))
     25         output = output.replace("{likes}", str(t.likes_count))

TypeError: replace() argument 2 must be str, not int

I replaced it. Yet, I still get the same error with updated code.
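A traceback that displays the already-edited line but still raises the old error can happen when the source file on disk was changed while the running interpreter (for example a notebook kernel) still holds the old module in memory, or when a different copy of the package is the one being imported. That is only a guess at the cause here. A minimal check, using a hypothetical helper module_origin, is to ask Python where it would load a module from:

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" is used here only because it is always available;
# for this issue the name to check would be "twint.format".
print(module_origin("json"))
print(module_origin("no_such_module_xyz"))
```

If the reported path is not the file that was edited, the edit went into the wrong copy. If it is the right file, restarting the interpreter or kernel (or calling importlib.reload on the module) makes Python pick up the change.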

@vxhl

vxhl commented Jun 13, 2021

Yeah, it's the same for me.
