This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

[Question] Is it possible to get the replies (and replies to the replies) to a tweet with twint? #513

Open
pbabvey opened this issue Sep 6, 2019 · 15 comments
@pbabvey

pbabvey commented Sep 6, 2019

No description provided.

@pielco11
Member

pielco11 commented Sep 7, 2019

Yes and no.

No, because there is currently no way to pass in a tweet ID and get only the replies to that tweet.

Yes, because if you know the ID of the tweet whose replies you want (call it TweetID), you just have to search for tweets sent to your target and then filter for conversation_id == TweetID.
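
The filtering step described above can be sketched with pandas. This is only a minimal illustration, assuming the search results are already loaded in a DataFrame with `id` and `conversation_id` columns (the column names used elsewhere in this thread). The sample rows, the `username` values, and the `replies_to` helper are hypothetical.

```python
import pandas as pd


def replies_to(tweets: pd.DataFrame, tweet_id: str) -> pd.DataFrame:
    """Return the rows in the conversation started by tweet_id,
    excluding the root tweet itself."""
    same_conversation = tweets["conversation_id"] == tweet_id
    not_root = tweets["id"] != tweet_id
    return tweets[same_conversation & not_root]


# Made-up sample data; IDs are kept as strings to avoid precision loss.
tweets = pd.DataFrame({
    "id": ["100", "101", "102", "200"],
    "conversation_id": ["100", "100", "100", "200"],
    "username": ["target", "alice", "bob", "carol"],
})

result = replies_to(tweets, "100")
print(result)
```

Note that this keeps every tweet in the conversation, not only direct replies, so it does not by itself recover the reply tree.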

@pbabvey
Author

pbabvey commented Sep 8, 2019

Actually, I am trying to collect all the reply threads under a primary tweet. Since reply_to_id does not exist among the tweet attributes, the method above mixes up the reply threads in some cases.

Assume user A replies to a tweet and then gets a reply from user B, after which user A responds to B's reply. Or assume user A replies to the main tweet several times and gets replies from a group of users each time. In those cases, the "To" configuration cannot recover the correct tree structure of the replies. Is there any other way to work around the problem?

I suspect the only way to resolve some of the ambiguity is to use the 'reply_to' attribute, unless we can add some fields to the tweet attributes.

@pielco11
Member

pielco11 commented Sep 9, 2019

Assume user A replies to a tweet, then it gets a reply from user B. After, user A responds to B's reply.

Technically speaking, in that case a new discussion is started.

In your case I would get all the replies to the "mother tweet", and then, for every "child tweet", get the corresponding replies.

In the from field you can add up to about 20 users. Also consider that when you reply to a tweet and (maybe) start a new discussion, all the users involved in the "inherited discussion(s)" get notified. So you can filter with the from field and the to field, and then organize the data using conversation_id as the key.

That's what I'd do, as a starting point at least.
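
Organizing the collected tweets by conversation_id could look roughly like the sketch below. It assumes the tweets are in a pandas DataFrame with `id` and `conversation_id` columns plus a sortable timestamp column, here called `created_at`, which is an assumed name. The sample rows and the `group_by_conversation` helper are hypothetical.

```python
import pandas as pd


def group_by_conversation(tweets: pd.DataFrame) -> dict:
    """Map each conversation_id to the list of tweet ids in it, oldest first."""
    ordered = tweets.sort_values("created_at")
    return {
        conversation: list(group["id"])
        for conversation, group in ordered.groupby("conversation_id")
    }


# Made-up sample data: three tweets in conversation "10", one in "20".
tweets = pd.DataFrame({
    "id": ["11", "10", "21", "12"],
    "conversation_id": ["10", "10", "20", "10"],
    "created_at": [
        "2019-09-05 10:05",
        "2019-09-05 10:00",
        "2019-09-06 08:00",
        "2019-09-05 10:10",
    ],
})

threads = group_by_conversation(tweets)
print(threads)
```

This gives a flat, time-ordered list per conversation. It does not say which tweet each reply answers.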

Hope this helps

@pushshift

pushshift commented Sep 10, 2019

I am fairly certain that searching "to:@user" picks up all replies regardless of where they are in the tree. This should get replies to the original user, replies to replies, etc.

If you notice on Twitter itself, all names involved in a reply are there. So if @user1 makes a tweet and @user2 replies to @user1 and @user3 replies to @user2, when @user3 makes a reply, you will see both @user2 and @user1 in that tweet.

In this case, simply doing a search for "to:@original_user" should eventually find all tweets in the tree.

(I'm ~80% sure this is the case.)

@pbabvey
Author

pbabvey commented Sep 10, 2019

I am fairly certain that searching "to:@user" picks up all replies regardless of where they are in the tree. This should get replies to the original user, replies to replies, etc.

If you notice on Twitter itself, all names involved in a reply are there. So if @user1 makes a tweet and @user2 replies to @user1 and @user3 replies to @user2, when @user3 makes a reply, you will see both @user2 and @user1 in that tweet.

In this case, simply doing a search for "to:@original_user" should eventually find all tweets in the tree.

(I'm ~80% this is the case.)

Thank you for your response.
Unfortunately, "to:@original_user" does not return replies to the replies. It finds only the tweets that directly involve original_user. For example, if we run the code below, "to:@original_user" gets just the direct replies.

import twint
from collections import Counter

# Step 1: fetch the original ("mother") tweets posted by the account.
mothers = twint.Config()
mothers.Username = "@JonAcuff"
mothers.Since = "2019-09-04"
mothers.Until = "2019-09-08"
mothers.Lang = 'en'
mothers.Pandas = True
mothers.Store_csv = True
mothers.Hide_output = True
twint.run.Search(mothers)
df = twint.storage.panda.Tweets_df
# Map each conversation_id to the reply count reported for the mother tweet.
Replies = {x: y for x, y in zip(df['conversation_id'], df['nreplies'])}

# Step 2: fetch the tweets sent to the account, over a slightly longer window.
replies = twint.Config()
replies.Since = "2019-09-04"
replies.Until = "2019-09-10"
replies.Pandas = True
replies.To = "@JonAcuff"
replies.Hide_output = True
twint.run.Search(replies)
df = twint.storage.panda.Tweets_df

# Step 3: compare the reported reply count with the number of replies fetched
# for each conversation.
fetchedReplies = Counter(df['conversation_id'])
for tweet in Replies:
    print(tweet, "\t{}\t{}\t".format(Replies[tweet], fetchedReplies[tweet]))

If we could customize the "to:@original_user" query, that would make everything easier.

@pushshift

My bad -- the mistake I made was using "to:@user1" instead of "@user1" -- if you do a search for "@user1" it picks up all tweets in the tree (replies, replies to replies, etc.)

I just tested this and was able to reconstruct the entire tree for a few sample cases.

So this works well -- you will also get all user mentions with "@user1" but you can throw everything out except the tweets with "in_reply_to_status_id" or "in_reply_to_screen_name".
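
As a rough sketch of the tree reconstruction described above: once every tweet carries a pointer to its parent, plain mentions can be dropped and the replies grouped under the tweet they answer. The field name `in_reply_to_status_id` is taken from this comment and may not be present in twint's output. The dictionaries, the sample IDs, and both helper functions are hypothetical.

```python
from collections import defaultdict


def build_reply_tree(tweets):
    """Map each parent tweet id to the ids of its direct replies.

    Tweets without a parent (plain mentions or root tweets) are skipped.
    """
    children = defaultdict(list)
    for tweet in tweets:
        parent = tweet.get("in_reply_to_status_id")
        if parent is not None:
            children[parent].append(tweet["id"])
    return children


def walk(children, root, depth=0):
    """Yield (depth, tweet_id) pairs in depth-first order, starting at root."""
    yield depth, root
    for child in children.get(root, []):
        yield from walk(children, child, depth + 1)


# Made-up data: "2" and "4" reply to "1", "3" replies to "2",
# and "5" is a plain mention with no parent.
tweets = [
    {"id": "1", "in_reply_to_status_id": None},
    {"id": "2", "in_reply_to_status_id": "1"},
    {"id": "3", "in_reply_to_status_id": "2"},
    {"id": "4", "in_reply_to_status_id": "1"},
    {"id": "5", "in_reply_to_status_id": None},
]

tree = build_reply_tree(tweets)
thread = list(walk(tree, "1"))
for depth, tweet_id in thread:
    print("  " * depth + tweet_id)
```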

@pbabvey pbabvey changed the title [Question] Is it possible to get the replies to a tweet with twint? [Question] Is it possible to get the replies (and replies to the replies) to a tweet with twint? Sep 12, 2019
@pbabvey
Author

pbabvey commented Sep 12, 2019 via email

@dotgodly

Is in_reply_to_status_id still part of the response? Using the search query "@username" or "to:@username" does not give me that field.

I can track replies with reply_to and conversation_id but I can't put them in the correct order with just that information.

@AmanKabra

To fetch replies to a tweet, one needs the tweet ID. The tweet IDs scraped by twint (20 characters long) are being rounded off after the 5th digit from the left, so they no longer identify the actual tweets. Can anyone help?

@himanshudabas
Contributor

@AmanKabra
Are you opening the CSV in MS Excel?
Excel cannot represent integers longer than 15 digits correctly.
Try importing the CSV into pandas as a DataFrame, which should give you the correct tweet IDs, or open the CSV in a text editor such as Notepad++. I'd suggest pandas, though.
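
One way to load the CSV in pandas without touching the IDs is to read the ID columns as strings. The sketch below uses an in-memory CSV with made-up IDs and a made-up `tweet` column. With a real file you would pass its path to `pd.read_csv` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a twint export.
csv_text = (
    "id,conversation_id,tweet\n"
    "1289375576856330241,1289375576856330241,hello\n"
    "1289370524196298753,1289375576856330241,a reply\n"
)

# Reading the ID columns as strings keeps every digit exactly as written.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": str, "conversation_id": str},
)
print(df["id"].tolist())
```

Keeping IDs as strings also avoids the float conversion pandas applies to integer columns that contain missing values.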

@AmanKabra

@himanshudabas
To give an example, here is what the first 10 rows of my dataset look like in pandas:

0 1.289350e+18
1 1.289350e+18
2 1.289340e+18
3 1.289330e+18
4 1.289330e+18
5 1.289330e+18
6 1.289310e+18
7 1.289310e+18
8 1.289310e+18
9 1.289310e+18

These should all be unique tweets, yet 4 IDs are reported as duplicates (using the pandas drop_duplicates() function). That is only possible if the last few digits of conversation_id are being set to zero.
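
For context on why distinct IDs can collapse into duplicates: once a 19-digit ID is stored as a 64-bit float, nearby integers can no longer be told apart. The sketch below demonstrates this with one ID value that appears later in this thread. The offsets added to it are arbitrary and chosen only for illustration.

```python
import pandas as pd

a = 1289375576856330240  # exactly representable as a float64
b = a + 1                # a different tweet ID
c = a + 100              # another different tweet ID

# All three integers map to the same float64 value.
print(float(a) == float(b) == float(c))

# Converting back to int does not recover the original digits.
print(int(float(c)) == a)

# In a float64 column, the three distinct IDs look like one value.
as_float = pd.Series([a, b, c]).astype("float64")
print(as_float.nunique())
```

This shows only the loss caused by float storage. Values rounded as coarsely as those listed above may have lost further digits elsewhere, for example when the file was opened and saved by another program.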

Could you suggest something else? Thanks in advance.

@himanshudabas
Contributor

@AmanKabra
Use something like this to extract all the IDs in the desired format.
Note that tweet_data_pd_df here is your CSV data loaded as a DataFrame, and np is NumPy.

all_ids = tweet_data_pd_df["id"].fillna(0.0).astype(np.int64)

@git175

git175 commented Nov 4, 2020

int64

@AmanKabra

AmanKabra commented Nov 5, 2020

@himanshudabas

Response:

0 1289350000000000000
1 1289350000000000000
2 1289340000000000000
3 1289330000000000000
4 1289330000000000000
...
14750 1289375576856330240
14751 1289370524196298752
14752 1289354168709165056
14753 1289354145258856448
14754 1289354143362985984
Name: id, Length: 14755, dtype: int64

Should I scrape from scratch?

@himanshudabas
Contributor

@AmanKabra
For some reason your initial tweet IDs have been changed.
Perhaps you opened them somewhere that truncated the values and then saved the modified data.
Try a fresh scrape and load the resulting CSV in pandas.
That should give you the correct tweet IDs.
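
Before re-scraping, a quick heuristic check can show how many IDs in an existing file were affected. The sketch below flags IDs that end in a long run of zeros, as the first rows of the output above do. The `looks_truncated` helper and its threshold of 8 trailing zeros are assumptions made for illustration. A genuine ID could in principle end in zeros, so treat the result as a hint rather than proof.

```python
def looks_truncated(tweet_id: str, min_trailing_zeros: int = 8) -> bool:
    """Heuristic: an ID ending in many zeros was probably rounded."""
    trailing_zeros = len(tweet_id) - len(tweet_id.rstrip("0"))
    return trailing_zeros >= min_trailing_zeros


# ID values taken from the output shown earlier in this thread.
ids = [
    "1289350000000000000",
    "1289340000000000000",
    "1289375576856330240",
    "1289354143362985984",
]

flags = [looks_truncated(tweet_id) for tweet_id in ids]
for tweet_id, flag in zip(ids, flags):
    print(tweet_id, "truncated?", flag)
```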
