## Extracting Up-to-Date Metadata for Original Tweets

This step accounts for how the Twitter Streaming API works: tweets are delivered almost instantly after their creation, so the metadata captured at that moment (e.g., likes and retweets) is only a snapshot. Over the lifetime of a tweet, this metadata keeps changing. The “fresh” metadata can be extracted from retweets of an original tweet, with the most recent retweet providing the most up-to-date values. This notebook walks through the extraction process step by step.

## Target Audience

This tutorial is aimed at beginners. You should have basic knowledge of Python programming and pandas.

## Duration

About half an hour.

## Use Cases

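The pickle file created at the end of this notebook, which contains only original tweets with their most up-to-date metadata, is used as input for subsequent analyses.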

## Environment Setup

Run the cells below to import pandas and load the joint dataset:

In [1]:
import pandas as pd

In [2]:
df_joint = pd.read_pickle("joint_tweetplomacy_23.pkl")

In [3]:
len(df_joint)

3836611

## Timestamps to Datetime

In [4]:
df_joint['dt'] = pd.to_datetime(df_joint['timeStamp'])

In [5]:
len(df_joint['tweetId'].unique())

2048232

In [6]:
df_joint['retweet_dt'] = pd.to_datetime(df_joint['retweetTimeStamp'])
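
The two conversion cells above let pandas infer the timestamp format. As a minimal sketch, the same conversion can be written with an explicit format string, which avoids per-row format inference on a frame of this size. The toy frame, its values, and the `TWITTER_FORMAT` constant are illustrative assumptions; the format string is derived from the `timeStamp` values visible in the outputs of this notebook (e.g., `Mon Jan 01 00:05:19 +0000 2018`).

```python
import pandas as pd

# Toy frame: column names mirror the notebook, the rows are made up.
toy = pd.DataFrame({
    "timeStamp": [
        "Mon Jan 01 00:05:19 +0000 2018",
        "Tue May 30 23:26:00 +0000 2023",
    ],
    # The second tweet has no retweet, so its retweet timestamp is missing.
    "retweetTimeStamp": ["Mon Jan 01 00:07:39 +0000 2018", None],
})

# Assumed layout of the raw strings: weekday, month, day, time, UTC offset, year.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

toy["dt"] = pd.to_datetime(toy["timeStamp"], format=TWITTER_FORMAT, utc=True)

# errors="coerce" turns missing or malformed values into NaT instead of raising.
toy["retweet_dt"] = pd.to_datetime(
    toy["retweetTimeStamp"], format=TWITTER_FORMAT, utc=True, errors="coerce"
)
```

Rows without a retweet end up with `NaT` in `retweet_dt`, which matters for the filtering step that follows.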

## Extract Most Up-to-Date Metadata from Retweets

In [7]:
# Keep, for every original tweet, the row(s) belonging to its most recent retweet
max_scores = df_joint.loc[df_joint.groupby('tweetId')['retweet_dt'].transform('max') == df_joint['retweet_dt']]

In [8]:
len(max_scores)

1174482

In [9]:
len(max_scores['tweetId'].unique())

1174359
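
The filter above keeps, per `tweetId`, every row whose `retweet_dt` equals the latest `retweet_dt` of that tweet. The two counts differ by 123 rows, which is consistent with some tweets having several retweets that share the same latest timestamp. The following sketch reproduces this behaviour on a made-up toy frame (all values are illustrative assumptions): it shows that ties are kept and that tweets without any retweet drop out, because a comparison with `NaT` is never true.

```python
import pandas as pd

# Toy data: tweet 1 has three retweets, tweet 2 has none,
# tweet 3 has two retweets with an identical timestamp (a tie).
toy = pd.DataFrame({
    "tweetId": [1, 1, 1, 2, 3, 3],
    "retweetId": [10, 11, 12, None, 13, 14],
    "favorites": [0, 4, 9, 2, 5, 5],
    "retweet_dt": pd.to_datetime(
        [
            "2018-01-01 00:10", "2018-01-01 02:00", "2018-01-02 08:00",
            None,
            "2018-01-03 09:00", "2018-01-03 09:00",
        ],
        utc=True,
    ),
})

# Latest retweet time per tweet, broadcast back to every row of that tweet.
latest = toy.groupby("tweetId")["retweet_dt"].transform("max")

# Keep only the rows that carry the latest retweet time.
toy_max = toy.loc[latest == toy["retweet_dt"]]

print(len(toy_max))                   # rows kept, ties included
print(toy_max["tweetId"].nunique())   # distinct original tweets kept
```

Here the tied retweets of tweet 3 both survive, so the row count exceeds the number of unique tweets, and tweet 2 is absent from the result altogether.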

In [10]:
max_scores_first = max_scores.groupby('tweetId').first().reset_index()
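
One detail worth knowing about the cell above: `groupby(...).first()` returns the first non-null value of each column within a group, so when tied rows differ, the result can combine values from different rows. Whether this matters for this dataset is not established here. The sketch below, on a made-up two-row tie, contrasts it with `drop_duplicates`, which keeps one whole row per tweet. The frame and its values are illustrative assumptions.

```python
import pandas as pd

# Two tied rows for the same tweet; the first one lacks a user bio.
tied = pd.DataFrame({
    "tweetId": [3, 3],
    "favorites": [5, 6],
    "userBio": [None, "bio text"],
})

# Column-wise: first non-null value per column, possibly from different rows.
mixed = tied.groupby("tweetId").first().reset_index()

# Row-wise alternative: keep the first tied row exactly as it is.
whole = tied.drop_duplicates(subset="tweetId", keep="first").reset_index(drop=True)

print(mixed)
print(whole)
```

In `mixed`, `favorites` comes from the first row and `userBio` from the second; in `whole`, both values come from the first row and `userBio` stays missing.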

In [11]:
max_scores_first

Unnamed: 0,tweetId,entities,favorites,followees,followers,hashedUserName,hashtags,language,matchingKeywords,matchingUserMentions,...,sentiments,timeStamp,urls,userBio,userName,topic_covid_19,topic_energy_security,topic_climate_change,dt,retweet_dt
0,947619712149663744,"{'software': 'entity-fishing', 'version': '0.0...",6,9740,224186,7910f73a0c3e908d711d6e8e9b94e367,[],ES,[energía],[@GobiernodeChile],...,,Mon Jan 01 00:05:19 +0000 2018,[],,{'hashed': '7910f73a0c3e908d711d6e8e9b94e367'},False,True,False,2018-01-01 00:05:19+00:00,2018-01-01 00:07:39+00:00
1,947621953870983169,"{'software': 'entity-fishing', 'version': '0.0...",6,7467,7926,9c0e91fbbfa2db3434f1b6cf91bed903,"[NewYear2018, maga, americafirst, realtalk, Ha...",EN,[gas],[@realDonaldTrump],...,"{'software': 'vaderSentiment', 'version': '3.3...",Mon Jan 01 00:14:14 +0000 2018,[],,{'hashed': '9c0e91fbbfa2db3434f1b6cf91bed903'},False,True,False,2018-01-01 00:14:14+00:00,2018-01-01 00:36:26+00:00
2,947629767532171264,"{'software': 'entity-fishing', 'version': '0.0...",0,31,39,65fd240595e50d8afbf89015db295266,[],EN,[climate],[@realDonaldTrump],...,"{'software': 'vaderSentiment', 'version': '3.3...",Mon Jan 01 00:45:17 +0000 2018,[],,{'hashed': '65fd240595e50d8afbf89015db295266'},False,False,True,2018-01-01 00:45:17+00:00,2018-01-01 16:47:19+00:00
3,947634759508942848,"{'software': 'entity-fishing', 'version': '0.0...",0,630,2705905,32730e15a7ca0bf0686ea8d4a3a46ab9,[pilotauctionfacility],EN,[climate],[],...,"{'software': 'vaderSentiment', 'version': '3.3...",Mon Jan 01 01:05:07 +0000 2018,"[{'short': 'https://t.co/aHYsz9SKBg', 'resolve...",The official World Bank Twitter feed. The Worl...,"{'userName': 'WorldBank', 'hashed': '32730e15a...",False,False,True,2018-01-01 01:05:07+00:00,2018-01-01 01:06:03+00:00
4,947638885668081664,"{'software': 'entity-fishing', 'version': '0.0...",1,7266,20407,4c75618b5c0a8645f0757d418cf88024,[],ES,[gas],[@NicolasMaduro],...,,Mon Jan 01 01:21:31 +0000 2018,[],,{'hashed': '4c75618b5c0a8645f0757d418cf88024'},False,True,False,2018-01-01 01:21:31+00:00,2018-01-01 05:23:06+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1174354,1663688236961796096,"{'software': 'entity-fishing', 'version': '0.0...",13,1003,33351,585b1540233e7e59e3f4aa6e9bd4aa08,[HungerHotspots],EN,[climate],[],...,"{'software': 'vaderSentiment', 'version': '3.3...",Tue May 30 23:26:00 +0000 2023,"[{'short': 'https://t.co/JUcnzbd8pK', 'resolve...",UN Assistant Secretary-General for Humanitaria...,"{'userName': 'JoyceMsuya', 'hashed': '585b1540...",False,False,True,2023-05-30 23:26:00+00:00,2023-05-31 11:07:17+00:00
1174355,1663688398752690185,"{'software': 'entity-fishing', 'version': '0.0...",58,822,91652,1509aee35a9df705df7b1fe934ca6123,[RenacimientoDelSur],ES,[energía],"[@NicolasMaduro, @LuchoXBolivia]",...,,Tue May 30 23:26:38 +0000 2023,"[{'short': 'https://t.co/RE1Xwv1rkw', 'resolve...",Canciller de la República Bolivariana de Venez...,"{'userName': 'yvangil', 'hashed': '1509aee35a9...",False,True,False,2023-05-30 23:26:38+00:00,2023-05-31 06:04:15+00:00
1174356,1663688908238987265,"{'software': 'entity-fishing', 'version': '0.0...",3,2392,1529,1b31ab3179478e14d13a3afb27bb4a13,[],ES,[pandemia],[@sanchezcastejon],...,,Tue May 30 23:28:40 +0000 2023,[],,{'hashed': '1b31ab3179478e14d13a3afb27bb4a13'},True,False,False,2023-05-30 23:28:40+00:00,2023-05-31 10:10:48+00:00
1174357,1663692867812950026,"{'software': 'entity-fishing', 'version': '0.0...",214,16839,16595,b014981d42eff47e6b784f800a294e37,[KatiePersinger],EN,[covid],[@cabinetofficeuk],...,"{'software': 'vaderSentiment', 'version': '3.3...",Tue May 30 23:44:24 +0000 2023,"[{'short': 'https://t.co/NGXGZntZ5B', 'resolve...",,{'hashed': 'b014981d42eff47e6b784f800a294e37'},True,False,False,2023-05-30 23:44:24+00:00,2023-05-31 09:17:07+00:00


## Create Dataframe Containing only Original Tweets with Most Up-to-Date Metadata

In [12]:
retweet_is_na = df_joint[df_joint['retweetId'].isna() & ~df_joint['tweetId'].isin(max_scores_first['tweetId'].unique())]

In [13]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
unique_tweets_with_freshest_retweet_metadata = pd.concat([retweet_is_na, max_scores_first], sort=True)

In [14]:
len(unique_tweets_with_freshest_retweet_metadata)

2048232
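
The length above matches the number of unique `tweetId` values counted earlier (2048232), which suggests that every original tweet appears exactly once in the combined frame. As a hedged sketch, this check can be made explicit with assertions. The toy frame and the names `freshest`, `never_retweeted`, and `combined` are hypothetical stand-ins for `max_scores_first`, `retweet_is_na`, and the final dataframe.

```python
import pandas as pd

# Toy data: tweet 1 was retweeted twice, tweets 2 and 3 were never retweeted.
toy = pd.DataFrame({
    "tweetId": [1, 1, 2, 3],
    "retweetId": [10.0, 11.0, None, None],
    "favorites": [0, 9, 2, 7],
})

# Stand-in for the freshest row per retweeted tweet (here: the second row).
freshest = toy.loc[[1]]

# Original tweets that have no retweet row at all.
never_retweeted = toy[
    toy["retweetId"].isna() & ~toy["tweetId"].isin(freshest["tweetId"])
]

# Combine both parts; sort=True orders the columns alphabetically.
combined = pd.concat([never_retweeted, freshest], sort=True)

# Every original tweet should appear exactly once.
assert len(combined) == toy["tweetId"].nunique()
assert combined["tweetId"].is_unique
```

Running the same two assertions against the real frames would fail loudly if a tweet were duplicated or lost during the combination step.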

In [15]:
unique_tweets_with_freshest_retweet_metadata

Unnamed: 0,dt,entities,favorites,followees,followers,hashedUserName,hashtags,language,matchingKeywords,matchingUserMentions,...,sentimentPositive,sentiments,timeStamp,topic_climate_change,topic_covid_19,topic_energy_security,tweetId,urls,userBio,userName
11,2018-06-19 10:09:45+00:00,"{'software': 'entity-fishing', 'version': '0.0...",0,126,914692,de562ec84c7998e0b09a4f03841143ed,"[Merkel, PetersbergDialogue]",DE,[Klima],[],...,0.202,"{'software': 'vaderSentiment', 'version': '3.3...",Tue Jun 19 10:09:45 +0000 2018,True,False,False,1009015370823368704,"[{'short': 'https://t.co/sVqDkd4JD9', 'resolve...",Sprecher der Bundesregierung und Chef des Bund...,"{'userName': 'RegSprecher', 'hashed': 'de562ec..."
74,2018-11-11 20:51:56+00:00,"{'software': 'entity-fishing', 'version': '0.0...",0,134,909805,de562ec84c7998e0b09a4f03841143ed,"[Merkel, ParisPeaceForum]",DE,[Ukraine],[@poroshenko],...,0.062,"{'software': 'vaderSentiment', 'version': '3.3...",Sun Nov 11 20:51:56 +0000 2018,False,False,True,1061723222079664131,"[{'short': 'https://t.co/JzuavXcXOm', 'resolve...",Sprecher der Bundesregierung und Chef des Bund...,"{'userName': 'RegSprecher', 'hashed': 'de562ec..."
105,2019-01-22 14:52:15+00:00,"{'software': 'entity-fishing', 'version': '0.0...",0,1053,652136,dd4c8ed8f1015bd4826be9de24b355a2,[Sicherheitsrat:],DE,[Klima],[@UN],...,0.000,"{'software': 'vaderSentiment', 'version': '3.3...",Tue Jan 22 14:52:15 +0000 2019,True,False,False,1087724631128244224,"[{'short': 'https://t.co/Rn50XYKPOh', 'resolve...",Aktuelle Nachrichten aus dem Auswärtigen Amt -...,"{'userName': 'AuswaertigesAmt', 'hashed': 'dd4..."
120,2019-02-12 13:08:14+00:00,"{'software': 'entity-fishing', 'version': '0.0...",0,3106,307342,3584d2d73a504767f27ba07decb7dd70,[Ukraine],DE,[Ukraine],[@UN],...,0.366,"{'software': 'vaderSentiment', 'version': '3.3...",Tue Feb 12 13:08:14 +0000 2019,False,False,True,1095308599654580225,[],Bundesaußenminister & Saarländer. MdB für den ...,"{'userName': 'HeikoMaas', 'hashed': '3584d2d73..."
176,2019-05-16 13:09:28+00:00,"{'software': 'entity-fishing', 'version': '0.0...",0,144,916022,de562ec84c7998e0b09a4f03841143ed,[Merkel],DE,"[Gas, Klima]",[@MinPres],...,0.111,"{'software': 'vaderSentiment', 'version': '3.3...",Thu May 16 13:09:28 +0000 2019,True,False,True,1129010981504475136,[],Sprecher der Bundesregierung und Chef des Bund...,"{'userName': 'RegSprecher', 'hashed': 'de562ec..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1174354,2023-05-30 23:26:00+00:00,"{'software': 'entity-fishing', 'version': '0.0...",13,1003,33351,585b1540233e7e59e3f4aa6e9bd4aa08,[HungerHotspots],EN,[climate],[],...,0.040,"{'software': 'vaderSentiment', 'version': '3.3...",Tue May 30 23:26:00 +0000 2023,True,False,False,1663688236961796096,"[{'short': 'https://t.co/JUcnzbd8pK', 'resolve...",UN Assistant Secretary-General for Humanitaria...,"{'userName': 'JoyceMsuya', 'hashed': '585b1540..."
1174355,2023-05-30 23:26:38+00:00,"{'software': 'entity-fishing', 'version': '0.0...",58,822,91652,1509aee35a9df705df7b1fe934ca6123,[RenacimientoDelSur],ES,[energía],"[@NicolasMaduro, @LuchoXBolivia]",...,,,Tue May 30 23:26:38 +0000 2023,False,False,True,1663688398752690185,"[{'short': 'https://t.co/RE1Xwv1rkw', 'resolve...",Canciller de la República Bolivariana de Venez...,"{'userName': 'yvangil', 'hashed': '1509aee35a9..."
1174356,2023-05-30 23:28:40+00:00,"{'software': 'entity-fishing', 'version': '0.0...",3,2392,1529,1b31ab3179478e14d13a3afb27bb4a13,[],ES,[pandemia],[@sanchezcastejon],...,,,Tue May 30 23:28:40 +0000 2023,False,True,False,1663688908238987265,[],,{'hashed': '1b31ab3179478e14d13a3afb27bb4a13'}
1174357,2023-05-30 23:44:24+00:00,"{'software': 'entity-fishing', 'version': '0.0...",214,16839,16595,b014981d42eff47e6b784f800a294e37,[KatiePersinger],EN,[covid],[@cabinetofficeuk],...,0.000,"{'software': 'vaderSentiment', 'version': '3.3...",Tue May 30 23:44:24 +0000 2023,False,True,False,1663692867812950026,"[{'short': 'https://t.co/NGXGZntZ5B', 'resolve...",,{'hashed': 'b014981d42eff47e6b784f800a294e37'}


## Pickle Dataframe Containing only Original Tweets with Most Up-to-Date Metadata

In [16]:
unique_tweets_with_freshest_retweet_metadata.to_pickle("joint_tweetplomacy_23_unique_tweets_with_freshest_retweet_metadata.pkl")
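
Before using the pickle in later notebooks, it can be worth confirming that it loads back unchanged. The following is a minimal round-trip sketch on a made-up frame written to a temporary directory; the file name `toy.pkl` and the frame contents are illustrative assumptions, not part of the dataset.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the final dataframe.
frame = pd.DataFrame({"tweetId": [1, 2], "favorites": [9, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    frame.to_pickle(path)             # write the pickle
    restored = pd.read_pickle(path)   # read it back

# equals() compares values, dtypes, index, and column labels.
print(restored.equals(frame))
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.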