### Collecting tweets based on keywords

This notebook shows how to use the tweepy Python library to collect tweets from Twitter based on keywords.

### STEP 1: PYTHON PACKAGES INSTALLATION

Install the following Python packages, which will help you collect data from twitter.com.

In [1]:
!pip install tweepy



In [2]:
!pip install unidecode

Collecting unidecode
  Downloading Unidecode-1.3.4-py3-none-any.whl (235 kB)
Installing collected packages: unidecode
Successfully installed unidecode-1.3.4


### STEP 2: IMPORT IMPORTANT PACKAGES

In [3]:
#import dependencies
import tweepy
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from unidecode import unidecode
import time
import datetime
from tqdm import tqdm 
import pandas as pd  
import numpy as np 

### STEP 3: AUTHENTICATING TWITTER'S API

In [4]:
# Replace the placeholders with your own credentials; never publish real keys.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'

access_token = 'YOUR_ACCESS_TOKEN'
access_secret = 'YOUR_ACCESS_SECRET'
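
Hard-coding credentials in a notebook exposes them to anyone who can read the file. A minimal sketch of one alternative, assuming the four values have been exported as environment variables beforehand. The variable names and the helper `load_twitter_credentials` are hypothetical and not part of tweepy:

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in the next step instead of string literals.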

### STEP 4: CONNECT TO TWITTER API USING THE SECRET KEY AND ACCESS TOKEN

In [5]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

### STEP 5: DEFINE A FUNCTION THAT WILL TAKE THE SEARCH QUERY

In [6]:
def tweetSearch(query, limit):
    """
    Search Twitter for the given query and return a list of
    up to `limit` matching tweets.
    """

    # Start with an empty list of results
    tweets = []

    # Page through the search results with Tweepy. The search endpoint returns
    # at most 100 tweets per request, so `count` is capped at 100 and `limit`
    # bounds the total number of tweets via items().
    # Note: api.search is the Tweepy 3.x name; Tweepy 4.x renamed it to api.search_tweets.
    for tweet in tweepy.Cursor(
        api.search, q=query, count=min(limit, 100), tweet_mode="extended"
    ).items(limit):
        tweets.append(tweet)

    return tweets
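
The `query` string is passed to Twitter's search as-is, so search operators can be included in it. Searches like this also return retweets. One way to leave them out is to append the `-filter:retweets` operator to the query. A small sketch, where `without_retweets` is a hypothetical helper and not part of tweepy:

```python
def without_retweets(query):
    """Append the search operator that excludes retweets, at most once."""
    operator = "-filter:retweets"
    if operator in query:
        return query
    return "{} {}".format(query.strip(), operator)


print(without_retweets("#GBV"))
```

The result can be passed as the `query` argument of `tweetSearch`.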

### STEP 6: CREATE A FUNCTION TO SAVE THE TWEETS INTO A DATAFRAME

In [7]:
def tweets_to_data_frame(tweets):
    """
    Extract selected fields from each tweet (text, place, likes,
    retweets, and so on) and store them in a pandas data frame.

    Returns a pandas data frame containing the extracted Twitter data.
    """
    df = pd.DataFrame(data=[tweet.full_text.encode('utf-8') for tweet in tweets], columns=["Tweets"])

    df["id"] = np.array([tweet.id for tweet in tweets])
    df["lens"] = np.array([len(tweet.full_text) for tweet in tweets])
    df["date"] = np.array([tweet.created_at for tweet in tweets])
    df["place"] = np.array([tweet.place for tweet in tweets])
    df["coordinateS"] = np.array([tweet.coordinates for tweet in tweets])
    df["lang"] = np.array([tweet.lang for tweet in tweets])
    df["source"] = np.array([tweet.source for tweet in tweets])
    df["likes"] = np.array([tweet.favorite_count for tweet in tweets])
    df["retweets"] = np.array([tweet.retweet_count for tweet in tweets])
    return df
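
To see which attributes the function reads without calling the API, the same fields can be pulled from a stand-in object. The sketch below uses `types.SimpleNamespace` as a fake tweet. The helper `tweet_to_row` and all sample values are made up for illustration:

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Collect the same attributes as tweets_to_data_frame into one dict."""
    return {
        "Tweets": tweet.full_text,  # kept as str here, not encoded to bytes
        "id": tweet.id,
        "lens": len(tweet.full_text),
        "date": tweet.created_at,
        "place": tweet.place,
        "coordinateS": tweet.coordinates,
        "lang": tweet.lang,
        "source": tweet.source,
        "likes": tweet.favorite_count,
        "retweets": tweet.retweet_count,
    }


# A made-up tweet carrying only the attributes the function reads.
fake_tweet = SimpleNamespace(
    full_text="Example text #GBV",
    id=1,
    created_at="2022-04-22 07:23:23",
    place=None,
    coordinates=None,
    lang="en",
    source="Twitter Web App",
    favorite_count=0,
    retweet_count=2,
)

rows = [tweet_to_row(fake_tweet)]
print(rows[0])
```

A list of such dicts could be passed to `pd.DataFrame(rows)` to obtain the same columns.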

### STEP 7: ADD TWITTER HASHTAGS RELATED TO GENDER-BASED VIOLENCE

In [8]:
# add hashtags in the following list
hashtags = [
'#GBV',
'#sexism',
'#rape'    
]

### STEP 8: RUN BOTH FUNCTIONS TO COLLECT DATA FROM TWITTER RELATED TO THE HASHTAGS LISTED ABOVE

In [9]:
total_tweets = 0

"""
The following for loop collects the tweets that contain the hashtags
listed above and saves them to CSV.
"""

for n in tqdm(hashtags):
    # first we fetch all tweets that have specific hashtag
    hash_tweets = tweetSearch(query=n,limit=7000)
    total_tweets += int(len(hash_tweets))
    
    # second we convert our tweets into a dataframe
    df = tweets_to_data_frame(hash_tweets)
    
    # third we save the dataframe into a csv file, one file per hashtag
    df.to_csv("tweets_{}.csv".format(n.lstrip("#")))

100%|██████████| 3/3 [00:39<00:00, 13.28s/it]


In [10]:
df

Unnamed: 0,Tweets,id,lens,date,place,coordinateS,lang,source,likes,retweets
0,b'RT @DawitTe70002088: Sexual violence is bein...,1517403679074267137,143,2022-04-22 07:23:23,,,en,Twitter for Android,0,288
1,b'RT @DawitTe70002088: Sexual violence is bein...,1517403534450429952,143,2022-04-22 07:22:49,,,en,Twitter for Android,0,288
2,b'\xe0\xa6\xb9\xe0\xa6\xbe\xe0\xa6\xa4\xe0\xa7...,1517401934487261185,121,2022-04-22 07:16:27,,,bn,Twitter Web App,0,0
3,b'@Chella38641 @sumanthraman \xe0\xae\xa8\xe0\...,1517401747165425664,93,2022-04-22 07:15:43,,,ta,Twitter for Android,0,0
4,b'\xe0\xb0\xaf\xe0\xb1\x81\xe0\xb0\xb5\xe0\xb0...,1517400665915813889,147,2022-04-22 07:11:25,,,te,Twitter Web App,0,0
...,...,...,...,...,...,...,...,...,...,...
4366,b'#South Africa: #rape crisis (Cape Town) 0214...,1514444853131812864,50,2022-04-14 03:26:04,,,en,PixelTweeter,0,0
4367,b'Never be ashamed of #rape be angry and tell ...,1514443768488943622,75,2022-04-14 03:21:46,,,en,Twitter for Android,0,0
4368,b'One of my cousin just raped. \xf0\x9f\x98\x9...,1514442493068255232,59,2022-04-14 03:16:41,,,en,Twitter for iPhone,0,0
4369,b'https://t.co/8I1KiwWpFD #Dairy #DairyKills #...,1514439575086710787,198,2022-04-14 03:05:06,,,und,Twitter for iPhone,14,6
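
The `Tweets` column shows values such as `b'...'` because the text was encoded to bytes before being stored. If such a value has been written to CSV and read back as the text `"b'...'"`, one possible way to recover the readable string is `ast.literal_eval` followed by `decode`. A sketch, where `decode_tweet_cell` is a hypothetical helper:

```python
import ast


def decode_tweet_cell(cell):
    """Turn a bytes value, or a bytes literal stored as text, back into a str."""
    if isinstance(cell, bytes):
        return cell.decode("utf-8")
    if isinstance(cell, str) and cell[:2] in ("b'", 'b"'):
        # literal_eval safely parses the bytes literal without executing code
        return ast.literal_eval(cell).decode("utf-8")
    return cell


print(decode_tweet_cell("b'caf\\xc3\\xa9'"))
```

With a data frame, this could be applied to the column with `df["Tweets"].apply(decode_tweet_cell)`.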


### STEP 9: SHOW TOTAL NUMBER OF TWEETS COLLECTED

In [11]:
# show total number of tweets collected
print("total_tweets: {}".format(total_tweets))

total_tweets: 6187
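
Note that `total_tweets` adds up the per-hashtag counts, so a tweet that contains two of the hashtags is counted twice. A sketch of counting distinct tweets by `id` instead. The helper `count_unique` is hypothetical and the stand-in tweets are made up:

```python
from types import SimpleNamespace


def count_unique(tweet_lists):
    """Count distinct tweet ids across several lists of tweets."""
    seen = set()
    for tweets in tweet_lists:
        for tweet in tweets:
            seen.add(tweet.id)
    return len(seen)


# Two made-up result lists that share one tweet (id=2).
first = [SimpleNamespace(id=1), SimpleNamespace(id=2)]
second = [SimpleNamespace(id=2), SimpleNamespace(id=3)]
print(count_unique([first, second]))
```

In the loop above, this would require keeping each `hash_tweets` list rather than overwriting it on every iteration.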


For more tweepy configuration options, see the tweepy documentation (https://docs.tweepy.org/en/latest/).