# Twitter Sentiment Analysis Using TextBlob and Tweepy

This is a small project that tries out the TextBlob library for sentiment analysis.

## Imports and configuration

Here we import the libraries used in the code: **tweepy** (a Python client for the Twitter API), **re** for regular expressions, and **TextBlob** for sentiment analysis. We also use **csv** to save the results to a .csv file, and **sys** to look up the highest Unicode code point.

In [16]:
import tweepy
import sys
import csv
import re
from textblob import TextBlob

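We also build a translation table (a dict) that maps every character outside the Basic Multilingual Plane, such as most emoji, to the replacement character U+FFFD, so that we don't run into encoding errors.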

In [17]:
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
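
To make the effect of this table concrete, here is a minimal sketch that applies it to a made-up string containing an emoji. The `sample` text is an assumption for illustration only; `str.translate` is the same call used on the tweet text later on.

```python
import sys

# Same mapping as above: every code point outside the Basic Multilingual Plane
# is mapped to U+FFFD, the Unicode replacement character.
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xFFFD)

# Hypothetical tweet text; U+1F600 is an emoji that lies outside the BMP.
sample = "Great news \U0001F600 today"
cleaned = sample.translate(non_bmp_map)

# The emoji is replaced by a single U+FFFD; everything else is untouched.
print(cleaned == "Great news \ufffd today")
print(len(cleaned) == len(sample))
```

Both checks should print `True`, since each non-BMP character is swapped for exactly one replacement character.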


After that, we set the Twitter API credentials and create an instance of the API client.

In [18]:
consumer_key = 'CONSUMER_KEY'
consumer_secret = 'CONSUMER_SECRET'
access_token = 'ACCESS_TOKEN'
access_secret = 'ACCESS_SECRET'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
 
api = tweepy.API(auth, wait_on_rate_limit=True)
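
If you would rather not keep the keys in the notebook itself, one option is to read them from environment variables. The sketch below is only a suggestion: the variable names and the `load_credentials` helper are hypothetical and not part of tweepy.

```python
import os

# Hypothetical environment variable names; use whatever names you export.
REQUIRED = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
]


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# Example with a plain dict standing in for the real environment.
fake_env = {name: "dummy-value" for name in REQUIRED}
creds = load_credentials(fake_env)
print(sorted(creds))
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the hard-coded strings.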

Then we set some configuration variables: the search query *(searchQuery)*, the maximum number of tweets to fetch *(maxTweets)*, the regex describing the characters we keep *(acceptedCharsRegex)*, and the *headers* for the csv file.

In [29]:
searchQuery = 'Trump'
maxTweets = 1000
# the literal hyphen goes last in the class; written as '@-_' it would be read as a range
acceptedCharsRegex = r"[^0-9a-zA-Z.:/()&@_=+;?!*'-]+"
# header for csv file
headers = [
    'tweet', 
    'created_at',
    'word_count',
    'favorite_count', 
    'retweet_count',
    'user_followers_count',
    'user_following',  # boolean flag (tweet.user.following), not a count
    'user_friends_count',
    'user_verified',
    'subjectivity', 
    'polarity', 
    'sentiment'
]
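
As a quick illustration of what a negated character class like this does, the sketch below runs two patterns over made-up strings. The names `pattern` and `range_pattern` and the sample inputs are assumptions for this example. It also shows why hyphen placement matters: between two characters a hyphen denotes a range, so `@-_` covers everything from `@` to `_`, including `[`, `]` and `^`.

```python
import re

# Literal hyphen placed last, so it is not treated as a range operator.
pattern = r"[^0-9a-zA-Z.:/()&@_=+;?!*'-]+"

# Hypothetical tweet text with characters outside the accepted set.
sample = "Breaking: #news \u2192 100% true!! ~really~"
cleaned = re.sub(pattern, " ", sample)
print(repr(cleaned))

# With '@-_' inside the class, '^', '[' and ']' fall inside the range and survive.
range_pattern = r"[^0-9a-z@-_]+"
print(repr(re.sub(range_pattern, " ", "a^b[c]")))
print(repr(re.sub(pattern, " ", "a^b[c]")))
```

Each run of rejected characters, spaces included, collapses into a single space, which is why the cleaned text never contains double spaces.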

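We define a function *(get_sentiment)* that returns the label for a tweet based on its polarity. It uses a configurable *threshold*: a polarity below *neutral_min* is labeled Negative, a polarity at or above *neutral_max* is labeled Positive, and anything in between is labeled Neutral.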

In [30]:
threshold = {'neutral_min': -0.02, 'neutral_max': 0.25}

# map a TextBlob polarity score to a label using the thresholds above
# (the text argument is unused but kept so existing calls keep working)
def get_sentiment(text, analysis):
    polarity = analysis.sentiment.polarity
    if polarity < threshold['neutral_min']:
        return "Negative"
    elif polarity >= threshold['neutral_max']:
        return "Positive"
    else:
        return "Neutral"
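
To check how the boundaries behave without calling TextBlob, the following sketch feeds the same logic a few made-up polarity values. `fake_analysis` is a hypothetical stand-in built with `types.SimpleNamespace`; it only mimics the `sentiment.polarity` attribute that the function reads.

```python
from types import SimpleNamespace

threshold = {'neutral_min': -0.02, 'neutral_max': 0.25}


def get_sentiment(text, analysis):
    polarity = analysis.sentiment.polarity
    if polarity < threshold['neutral_min']:
        return "Negative"
    elif polarity >= threshold['neutral_max']:
        return "Positive"
    else:
        return "Neutral"


def fake_analysis(polarity):
    """Hypothetical stand-in for a TextBlob object with a fixed polarity."""
    return SimpleNamespace(sentiment=SimpleNamespace(polarity=polarity))


# Made-up polarity values, including both boundary values.
polarities = [-0.5, -0.02, 0.0, 0.25, 0.9]
labels = [get_sentiment("", fake_analysis(p)) for p in polarities]
for p, label in zip(polarities, labels):
    print(p, label)
```

Note that the lower boundary is exclusive (a polarity of exactly -0.02 is Neutral) while the upper boundary is inclusive (exactly 0.25 is Positive).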

Here we create our *.csv* file and write the *headers* to it. Then we loop over the tweets returned for the *searchQuery* specified in the configuration section.<br>
For every tweet, we take its full text, clean it, and create a TextBlob instance from it. Then we collect some *features* to save to the *.csv* file.<br>
Lastly, we use the *get_sentiment* function to get the label for that tweet.

In [31]:
with open('trump_twitter.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(headers)
    # note: api.search was renamed to api.search_tweets in Tweepy 4.0
    for tweet in tweepy.Cursor(api.search, q=searchQuery, tweet_mode='extended').items(maxTweets):
        # getting the tweet dict
        # print(tweet.__dict__)
        
        # cleaning the tweet text of special chars
        tweetText = re.sub(acceptedCharsRegex, ' ', tweet.full_text.translate(non_bmp_map))
        
        # creating the TextBlobObject for analysis
        analysis = TextBlob(tweetText)
        
        # saving all the information into an array
        tweetInfos = [
            tweetText, 
            tweet.created_at.strftime('%Y-%m-%d'),
            len(analysis.words),  # number of words (__sizeof__ would give bytes of memory)
            tweet.favorite_count,
            tweet.retweet_count,
            tweet.user.followers_count,
            tweet.user.following,
            tweet.user.friends_count,
            tweet.user.verified,
            analysis.sentiment.subjectivity, 
            analysis.sentiment.polarity,
            get_sentiment(tweetText, analysis)
        ]
        
        # print(tweetInfos)
        
        # writing the array in the csv file
        writer.writerow(tweetInfos)
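
Once the file is written, a natural next step is to look at how the labels are distributed. The sketch below is one possible way to do that with the standard library only. `count_sentiments` is a hypothetical helper, and the rows in `sample` are invented data with a reduced set of columns, used here so the example runs without the real file.

```python
import csv
import io
from collections import Counter


def count_sentiments(csv_file):
    """Count how often each value of the 'sentiment' column occurs."""
    reader = csv.DictReader(csv_file)
    return Counter(row['sentiment'] for row in reader)


# Invented rows standing in for the generated file.
sample = io.StringIO(
    "tweet,polarity,sentiment\n"
    "good day,0.7,Positive\n"
    "meh,0.0,Neutral\n"
    "bad day,-0.7,Negative\n"
    "great,0.8,Positive\n"
)
counts = count_sentiments(sample)
for label, n in counts.most_common():
    print(label, n)
```

For the real data, the same function could be called on `open('trump_twitter.csv', newline='')` instead of the in-memory sample.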
