# Easily Get TWEETS about a TOPIC on Twitter    
In this video, we will see how to retrieve tweets related to COVID-19 and load them into a pandas DataFrame.

In [6]:
# Useful Modules
import tweepy as tw
import pandas as pd
import json

### 1- Load Credentials For Authentication      

To access the Twitter API, you will need your Twitter developer credentials. I made a separate video explaining how to get them (link: https://www.youtube.com/watch?v=PqqXjwoDQiY).   
To make sure nobody but me can access those credentials, I stored them in a JSON file. The following function loads them.

In [7]:
def load_credentials(path_to_file):
    """Load the Twitter API credentials from a JSON file."""
    with open(path_to_file) as json_file:
        credentials = json.load(json_file)
    return credentials
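
For reference, here is a minimal sketch of what such a credentials file could look like. The four key names are the ones read by the code in this notebook; the values are placeholders, and the temporary file path is only an assumption made so the example runs on its own.

```python
import json
import os
import tempfile

# Placeholder values only; real keys come from the Twitter developer portal.
sample_credentials = {
    "consumer_key": "YOUR_CONSUMER_KEY",
    "consumer_secret": "YOUR_CONSUMER_SECRET",
    "access_token": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}


def load_credentials(path_to_file):
    """Same loader as above, repeated so this sketch runs on its own."""
    with open(path_to_file) as json_file:
        return json.load(json_file)


# Write the sample to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "credentials.json")
    with open(sample_path, "w") as json_file:
        json.dump(sample_credentials, json_file, indent=2)
    loaded = load_credentials(sample_path)

print(sorted(loaded))
```

Keeping this file out of version control (for example through a `.gitignore` entry) is what actually keeps the keys private.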

### 1- Access the API   
Once I have the credentials, I can easily access the API.

In [8]:
def get_API(credentials):
    auth = tw.OAuthHandler(credentials['consumer_key'], credentials['consumer_secret'])
    auth.set_access_token(credentials['access_token'], credentials['access_token_secret'])
    
    return tw.API(auth, wait_on_rate_limit=True)

In [9]:
credentials = load_credentials('./data/credentials.json')
api = get_API(credentials)

### 2- Search Tweets     
At this stage, I already have the API object, so I can search for tweets on a topic, starting from a given date (date_since), and cap the number of tweets retrieved (limit).

In [10]:
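def search_tweets(search_word, date_since, limit=20):
    # Relies on the global `api` object created above
    tweets_cursor = tw.Cursor(api.search,
              q=search_word,  # use the function argument, not a global variable
              lang="en",
              since=date_since).items(limit)
    
    return tweets_cursor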
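
A note on library versions: this notebook calls `api.search`, which is the Tweepy 3.x name. In Tweepy 4.x the same method is named `search_tweets`. The sketch below shows one possible way to pick whichever method exists; `pick_search_method` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins, not real `tweepy.API` instances.

```python
from types import SimpleNamespace


def pick_search_method(api):
    """Return the search method available on this API object.

    Assumption: Tweepy 4.x exposes `search_tweets`, Tweepy 3.x exposes `search`.
    """
    method = getattr(api, "search_tweets", None)
    if method is None:
        method = getattr(api, "search")
    return method


# Stand-in objects (not real tweepy.API instances) to show the lookup.
old_style_api = SimpleNamespace(search=lambda **kwargs: "3.x search")
new_style_api = SimpleNamespace(search_tweets=lambda **kwargs: "4.x search_tweets")

print(pick_search_method(old_style_api)(q="#covid19"))
print(pick_search_method(new_style_api)(q="#covid19"))
```

With a real API object, the result could be passed to the cursor in place of `api.search`, as in `tw.Cursor(pick_search_method(api), q=search_word, lang="en")`.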

Let's say we want tweets containing the term #covid19 posted since October 1st.

In [11]:
# Define the search term and the date_since date as variables
search_words = "#covid19"
date_since = "2020-10-01" # October 1st

In the function above, we used .Cursor() to search Twitter for tweets containing the term #covid19. We can restrict the number of tweets returned by passing a number to the .items() method. For example, .items(15) returns up to 15 of the most recent tweets.
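
To make the effect of the limit concrete, here is a plain-Python analogy rather than Tweepy's actual implementation. It assumes the cursor behaves like an ordinary lazy iterator: items are produced on demand, at most `limit` of them, and the iterator is empty once it has been looped over. The `stream` generator is a made-up stand-in for search results.

```python
from itertools import count, islice

# Stand-in for an endless stream of search results (not real tweets).
stream = (f"tweet {n}" for n in count(1))

# Cap the stream at 15 items, similar in spirit to .items(15).
limited = islice(stream, 15)

first_pass = list(limited)   # consumes the 15 items
second_pass = list(limited)  # nothing left on a second loop

print(len(first_pass), len(second_pass))  # prints: 15 0
```

If the cursor is single-pass in the same way, the search has to be run again before the results can be looped over a second time.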

In [12]:
covid_cursor = search_tweets(search_words, date_since)

.Cursor() returns an object that you can loop over to access the data collected. Each item in the iterator has various attributes that give you information about the tweet, including:

- the text of the tweet  
- the person who posted the tweet  
- the date the tweet was posted   
- and more.    
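
As a small sketch of how these attributes are read, the example below uses a stand-in object instead of a real tweet, so it runs without any API call. The attribute names (`text`, `created_at`, `user.name`, `user.screen_name`, `favorite_count`, `retweet_count`) are the ones used later in this notebook; the values are made up for illustration.

```python
from datetime import datetime
from types import SimpleNamespace

# Made-up stand-in for one item yielded by the cursor.
fake_tweet = SimpleNamespace(
    text="Stay safe everyone #covid19",
    created_at=datetime(2020, 10, 19, 10, 44, 28),
    favorite_count=3,
    retweet_count=1,
    user=SimpleNamespace(name="Example User", screen_name="example_user"),
)

# Attributes are read with dot notation, including the nested user object.
summary = (
    f"@{fake_tweet.user.screen_name} ({fake_tweet.user.name}) "
    f"on {fake_tweet.created_at:%Y-%m-%d}: {fake_tweet.text}"
)
print(summary)
```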

In [13]:
for i, tweet in enumerate(covid_cursor, start=1):
    print(f"Tweet N° {i}: {tweet.text}\n")

Tweet N° 1: RT @FaceTheNation: “There’s not going to be an intervention that really thwarts this, short of the ability to get a vaccine, which is proba…

Tweet N° 2: RT @QuickTake: "A 2nd Jacinda Ardern term is going to be full of all sorts of different challenges to the ones that characterized the first…

Tweet N° 3: RT @MickyJo98017844: So my genius of a nephew decided to tell his distraught little sister that Christmas is canceled because Santa Claus h…

Tweet N° 4: « One recent study of 100 recovered adults found that 78 of them showed signs of heart damage. We have no idea whet… https://t.co/LZKvdFNJLB

Tweet N° 5: @SenKamalaHarris Trump has failed America on #covid19. He threatens our health &amp; national security. If a Covid19 di… https://t.co/ZU3pbfkeKa

Tweet N° 6: RT @AnkitLal: I would miss Durga Puja at CR Park.

But strongly believe that the decision by local people to not have public celebrations w…

Tweet N° 7: @fuseboo @Rickie_Special Since #COVID19 stops us from travel

### Transform the tweets into Pandas DataFrame

In [14]:
def create_df_from_cursor(tweet_cursor):
    """Create a pandas DataFrame from a tweet Cursor."""
    
    all_tweets = []
    
    for tweet in tweet_cursor:
        parsed_tweet = {}
        parsed_tweet['date'] = tweet.created_at
        parsed_tweet['author'] = tweet.user.name
        parsed_tweet['twitter_name'] = tweet.user.screen_name
        parsed_tweet['text'] = tweet.text
        parsed_tweet['number_of_likes'] = tweet.favorite_count
        parsed_tweet['number_of_retweets'] = tweet.retweet_count
        
        all_tweets.append(parsed_tweet)
    
    tweets_df = pd.DataFrame(all_tweets)
    
    # Remove rows with duplicate text, keeping the first occurrence
    tweets_df = tweets_df.drop_duplicates(subset="text", keep="first")
    
    return tweets_df 
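
To illustrate the de-duplication rule used above (keep the first row for each distinct text), here is a plain-Python sketch of the same idea. It is not how pandas implements `drop_duplicates`; `drop_duplicate_texts` is a hypothetical helper and the rows are made up.

```python
def drop_duplicate_texts(rows):
    """Keep the first row for each distinct 'text' value, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row["text"] not in seen:
            seen.add(row["text"])
            unique_rows.append(row)
    return unique_rows


# Made-up rows: two accounts retweeting the same message, plus one other tweet.
rows = [
    {"twitter_name": "user_a", "text": "RT @someone: same message"},
    {"twitter_name": "user_b", "text": "RT @someone: same message"},
    {"twitter_name": "user_c", "text": "a different message"},
]

unique_rows = drop_duplicate_texts(rows)
print([row["twitter_name"] for row in unique_rows])  # prints: ['user_a', 'user_c']
```

Because several accounts often retweet the same message, this step can leave the DataFrame with fewer rows than the requested limit.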

### 3- Final Step - Putting It All Together

In [15]:
covid_cursor = search_tweets(search_words, date_since)

In [16]:
df = create_df_from_cursor(covid_cursor)

In [17]:
df.shape

(17, 6)

In [18]:
df.head()

| | date | author | twitter_name | text | number_of_likes | number_of_retweets |
|---|---|---|---|---|---|---|
| 0 | 2020-10-19 10:44:28 | PerSueTheDreamNow | suevess | RT @FaceTheNation: “There’s not going to be an... | 0 | 152 |
| 1 | 2020-10-19 10:44:28 | Owen Franks | TheGingerJourno | RT @QuickTake: "A 2nd Jacinda Ardern term is g... | 0 | 7 |
| 2 | 2020-10-19 10:44:28 | Duchess of Dalston #FBPE #FBR | DalstonOf | RT @MickyJo98017844: So my genius of a nephew ... | 0 | 3 |
| 3 | 2020-10-19 10:44:27 | Olivier Racle | olivieracle | « One recent study of 100 recovered adults fou... | 0 | 0 |
| 4 | 2020-10-19 10:44:27 | EJ | congosdad | @SenKamalaHarris Trump has failed America on #... | 0 | 0 |
