# Introduction

This notebook scrapes tweets from the Twitter API and exports them as dataframes to be consumed by other processes.

Two Twitter API endpoints are used:

1. Standard search API: queries tweets using key phrases.
2. Get trends/place API: queries the trending topics for a location.

## 1. Import Libraries

In [1]:
## Import libraries
import tweepy
import configparser
import pandas as pd
from datetime import date

## 2. Twitter Authentication

Set up Twitter authentication. The credentials are stored separately in a config file (`config.ini`).

In [2]:
## Read in API Configs

## Create config parser instance
config = configparser.ConfigParser()

## Read credentials from config file
config.read('config.ini')

api_key = config['twitter']['api_key']
api_key_secret = config['twitter']['api_key_secret']

access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']
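
The cell above assumes that `config.ini` has a `[twitter]` section with four keys. Note that `config.read` silently skips a file it cannot find, so a missing file only surfaces later as a `KeyError` on the first lookup. A minimal sketch of the assumed layout, with a check that fails early and names what is missing, could look like this. The placeholder values and the helper name `load_twitter_credentials` are illustrative assumptions, not part of the original notebook.

```python
import configparser

# Assumed layout of config.ini; the values are placeholders, not real credentials.
SAMPLE_CONFIG = """
[twitter]
api_key = YOUR_API_KEY
api_key_secret = YOUR_API_KEY_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
"""

REQUIRED_KEYS = ("api_key", "api_key_secret", "access_token", "access_token_secret")


def load_twitter_credentials(parser):
    """Return the four credentials as a dict, or raise KeyError naming what is missing."""
    if not parser.has_section("twitter"):
        raise KeyError("config is missing the [twitter] section")
    missing = [key for key in REQUIRED_KEYS if not parser.has_option("twitter", key)]
    if missing:
        raise KeyError("config is missing keys: " + ", ".join(missing))
    return {key: parser.get("twitter", key) for key in REQUIRED_KEYS}


# Hypothetical usage: parse the sample text instead of a file on disk.
sample_parser = configparser.ConfigParser()
sample_parser.read_string(SAMPLE_CONFIG)
credentials = load_twitter_credentials(sample_parser)
print(sorted(credentials))
```

In the notebook itself, the same helper could be called on the `config` object right after `config.read('config.ini')`.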


In [3]:
## Authenticate 

auth = tweepy.OAuth1UserHandler(api_key,api_key_secret)

auth.set_access_token(access_token,access_token_secret)

api = tweepy.API(auth)

## 3. Harvest Tweets

### 3.1 Standard search API

Returns a collection of relevant Tweets matching a specified query.

In [4]:
%%time

## List query phrases
topics = ["Africa","Kenya","Mauritius","South Africa"]

## Tweets to be returned by API
tweet_count = 100

## List containers for API output
tweets = []
time_stamps = []
screen_names = []
topic_country = []

## Query API
for topic in topics:
    
    for tweet in api.search_tweets(q=topic,count=tweet_count,lang='en',result_type="recent"):
        tweets.append(tweet.text)
        time_stamps.append(tweet.created_at)
        screen_names.append(tweet.user.screen_name)
        topic_country.append(topic)
    


CPU times: user 188 ms, sys: 46.9 ms, total: 235 ms
Wall time: 6.36 s
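
The four parallel lists in the cell above stay aligned only as long as every `append` runs for every tweet. One alternative, sketched here on the assumption that each result exposes `text`, `created_at`, and `user.screen_name` as in the loop above, is to collect one dict per tweet. The `SimpleNamespace` objects are stand-ins with made-up data so that the sketch runs without network access, and `tweet_to_row` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def tweet_to_row(tweet, topic):
    """Flatten one tweet-like object into a dict keyed by output column name."""
    return {
        "screen_name": tweet.user.screen_name,
        "Source": topic,
        "tweets": tweet.text,
        "time_stamp": tweet.created_at,
    }


# Stand-in objects mimicking the attributes read in the loop above (made-up data).
fake_results = [
    SimpleNamespace(
        text="example tweet about Kenya",
        created_at=datetime(2022, 3, 31, 8, 0, tzinfo=timezone.utc),
        user=SimpleNamespace(screen_name="example_user"),
    )
]

rows = [tweet_to_row(tweet, "Kenya") for tweet in fake_results]
print(rows[0]["screen_name"], rows[0]["Source"])
```

Passing such a list to `pd.DataFrame(rows)` would yield the same four columns as the `zip` approach, with the column names taken from the dict keys.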


In [5]:

## Create df from API output
topic_df = pd.DataFrame(list(zip(screen_names,topic_country ,tweets,time_stamps)),
               columns =['screen_name', 'Source','tweets','time_stamp'])
topic_df

,screen_name,Source,tweets,time_stamp
0,SelamTsegay1927,Africa,RT @Wedimekelle12: |‘.. A communication blacko...,2022-03-31 08:28:40+00:00
1,brie_ake,Africa,RT @Robelgz: “'Catastrophic' humanitarian cris...,2022-03-31 08:28:40+00:00
2,firdausamo,Africa,RT @mohammedhersi: This was found to TOP prior...,2022-03-31 08:28:40+00:00
3,Tasko_twa,Africa,RT @TimesLIVE: Businessman Malcom X has pledge...,2022-03-31 08:28:40+00:00
4,thefield_in,Africa,#CWC22 #SAvENG\n\nDefending champions #TeamEng...,2022-03-31 08:28:39+00:00
...,...,...,...,...
395,prof_oak123,South Africa,Match was over after first innings. \nNo way s...,2022-03-31 08:26:23+00:00
396,onecrapguy,South Africa,No snow here today I’m pleased to confirm.\n.\...,2022-03-31 08:26:20+00:00
397,karabo80630538,South Africa,"@MmusiMaimane kana ha ore We , ore wena le Man...",2022-03-31 08:26:19+00:00
398,MohamedHamdhoon,South Africa,RT @RYOmoha: In South&amp; North American and ...,2022-03-31 08:26:19+00:00


In [6]:
## Topic distribution
topic_df['Source'].value_counts()

Africa          100
Kenya           100
Mauritius       100
South Africa    100
Name: Source, dtype: int64

### 3.2 Get trends near a location API

#### 3.2.1 Query Trends

Returns the top 50 trending topics for a specific id, if trending information is available for it. Note: The id parameter for this endpoint is the "where on earth identifier" or WOEID.

In [9]:
%%time
## Get the trending topics for a location (Kenya)

# WOEID for Kenya (Where On Earth IDentifier)
woeid = 23424863

# fetching the trends
trends = api.get_place_trends(id = woeid)

# printing the information
print("The top trends for the location are :")
print("")

## Topic placeholder
trending_topics = []

## Query and list trends
for value in trends:
    for trend in value['trends']:
        print(trend['name'])
        trending_topics.append(trend['name'])

The top trends for the location are :

#BBIFinalVerdict
CJ Martha Koome
The BBI
#JKLive
#LetThePeopleDecide
#mainaandkingangi
The President
#BBIJudgement
Supreme Court
Constitution
Basic Structure Doctrine
Attorney General
Susan Kihika
BBI Bill
Court of Appeal
Junet
moses kajwang
Wanjiku
Null and Void
Reggae
Junior
Kileleshwa
mithika linturi
Judiciary
Alai
Maraga
churchill show experience
Cameroon
violent sugoi man
Hon Agnes Kagure
mercy mathai
Airtel
Daily Nation
HELB
Nyando
Jeff
KCPE
majimaji tosha
Building Bridges Initiative
IEBC
Benji
MCAs
kuria
The Messiah
Mendy
Ledama
Bruce Willis
Mane
Senegal
Mighty Diamonds
CPU times: user 24.7 ms, sys: 8.38 ms, total: 33.1 ms
Wall time: 878 ms
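
The nested loop above reflects the shape of the response: a list whose elements each hold a `trends` list of dicts with a `name` key. A small sketch of that extraction on a hand-written payload follows. The sample payload is abbreviated for illustration (real responses carry more fields), and `extract_trend_names` is a hypothetical helper.

```python
def extract_trend_names(response):
    """Collect trend names from a trends-style response, preserving order."""
    return [trend["name"] for place in response for trend in place.get("trends", [])]


# Abbreviated, hand-written payload in the shape the loop above iterates over.
sample_response = [
    {
        "trends": [
            {"name": "#BBIFinalVerdict"},
            {"name": "Supreme Court"},
        ]
    }
]

names = extract_trend_names(sample_response)
print(names)
```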


#### 3.2.2 Query Tweets for Trends

Query tweets for each of the trending topics listed above.

In [12]:
%%time
# Return the most recent tweets for each trend

## tweet count for each hashtag
tweet_count = 50

## List containers for API output
trending_tweets = []
trending_time_stamps = []
trending_screen_names = []
trending_topic = []

## Query tweets from trends

for topic in trending_topics:
    
    for tweet in api.search_tweets(q=topic,count=tweet_count,lang='en'):
        trending_tweets.append(tweet.text)
        trending_time_stamps.append(tweet.created_at)
        trending_screen_names.append(tweet.user.screen_name)
        trending_topic.append(topic)
    



CPU times: user 1.76 s, sys: 428 ms, total: 2.19 s
Wall time: 1min 4s
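
Several trend names are multi-word phrases, and many of the returned tweets are retweets. A possible refinement, sketched here on the assumption that the standard search operators for exact phrases (double quotes) and `-filter:retweets` apply to this endpoint, is to build the query string before calling the API. `build_query` is a hypothetical helper and is not part of the original notebook.

```python
def build_query(term, exclude_retweets=False):
    """Quote multi-word terms as exact phrases and optionally exclude retweets."""
    term = term.strip()
    query = f'"{term}"' if " " in term else term
    if exclude_retweets:
        query += " -filter:retweets"
    return query


# Two trend names taken from the list above.
sample_trends = ["#JKLive", "Supreme Court"]
queries = [build_query(name, exclude_retweets=True) for name in sample_trends]
print(queries)
```

The resulting strings would be passed as `q` in place of the raw trend name. Whether excluding retweets is desirable depends on what the downstream processes expect.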


In [13]:
## Create df from API output
trends_df = pd.DataFrame(list(zip(trending_screen_names,trending_topic,trending_tweets,trending_time_stamps)),
               columns =['screen_name','hashtag','tweet','time_stamp'])
trends_df

,screen_name,hashtag,tweet,time_stamp
0,EliasKabere,#BBIFinalVerdict,RT @Belive_Kinuthia: “IEBC was legally constit...,2022-03-31 08:47:01+00:00
1,Channel54News,#BBIFinalVerdict,"KENYA:#BBIFinalVerdict \n\n"" If the Supreme Co...",2022-03-31 08:47:00+00:00
2,KoneMoheavy,#BBIFinalVerdict,RT @BravinYuri: Summary of CJ Martha Koome's v...,2022-03-31 08:47:00+00:00
3,GodfearingDude,#BBIFinalVerdict,RT @ntvkenya: CJ Koome: I endorse the findings...,2022-03-31 08:46:59+00:00
4,godwin_sakaya,#BBIFinalVerdict,#Supreme court Judge William Ouko has acted th...,2022-03-31 08:46:59+00:00
...,...,...,...,...
2494,abdiazizhashim1,Mighty Diamonds,The BBI Susan Kihika Sonko Junet Odingas Ledam...,2022-03-31 08:30:00+00:00
2495,exclusiveska,Mighty Diamonds,RT @BigshipSounds: The Mighty Diamonds 🔥🔥 http...,2022-03-31 08:29:55+00:00
2496,Breasman1,Mighty Diamonds,RT @VPRecords: Devastated to hear of the passi...,2022-03-31 08:26:54+00:00
2497,royalrampnews,Mighty Diamonds,MIGHTY DIAMONDS Singer Shot &amp; Killed https...,2022-03-31 08:25:20+00:00


In [14]:
## Hashtag distribution
trends_df.hashtag.value_counts()

#BBIFinalVerdict               50
CJ Martha Koome                50
Cameroon                       50
violent sugoi man              50
Hon Agnes Kagure               50
mercy mathai                   50
Airtel                         50
Daily Nation                   50
HELB                           50
Nyando                         50
Jeff                           50
KCPE                           50
majimaji tosha                 50
Building Bridges Initiative    50
IEBC                           50
MCAs                           50
kuria                          50
The Messiah                    50
Mendy                          50
Ledama                         50
Bruce Willis                   50
Mane                           50
Senegal                        50
churchill show experience      50
Maraga                         50
Alai                           50
Judiciary                      50
The BBI                        50
#JKLive                        50
#LetThePeopleD

## 4. Export Data

Export the dataframes as CSV files, with the current date in each file name.

In [18]:
## Set export file names

today = date.today()
topic_df_name =  'Continent & Country Tweets {}.csv'.format(today)
trends_df_name = 'Location Trend Tweets {}.csv'.format(today)

In [19]:
## Export dataframes

topic_df.to_csv(topic_df_name,index=False)

trends_df.to_csv(trends_df_name,index= False)
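
For reference, formatting a `date` with `str.format` produces its ISO form (`YYYY-MM-DD`), so each file name ends in the export date. A short sketch with a fixed date shows the resulting name; `export_filename` is a hypothetical helper used only for illustration.

```python
from datetime import date


def export_filename(label, day):
    """Build '<label> <YYYY-MM-DD>.csv' from a label and a date."""
    return "{} {}.csv".format(label, day.isoformat())


# A fixed date is used here so the result is reproducible.
example_name = export_filename("Location Trend Tweets", date(2022, 3, 31))
print(example_name)
```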