# Sentiment Analysis of Tesla's Cybertruck Release Using Twitter Data

Problem: a company wants to know the sentiment toward a new product right after its release.

Why this is useful: With a pre-release, the company can gauge the overall sentiment about the product online and make any needed changes. After a release, it can help predict how well sales will go or surface insights for changes to the product, communication, etc.

Tech:
1. Web scraping, via an API or not
2. NLP

Tasks/Breakdown:
1. Get data from Twitter covering the 2 days after the release of Tesla's Cybertruck. It was released on November 21, 2019
2. Run a sentiment analysis over the tweets
3. Determine the overall sentiment about the product and give insights about possible changes to the product, future communication, marketing, etc.

- Day 1 - 11/03/2021: I tried to find the best library to get the tweets. tweepy, which uses the Twitter API, was the first attempt, but I couldn't figure out how to filter by tweet date, so I used snscrape, which web scrapes Twitter and other social media. Figured out how to get the tweets based on words and dates
- Day 2 - 12/03/2021: Cleaned the tweets for analysis and ran sentiment analysis on them with TextBlob. It took a while to figure out the right functions to clean the tweets.
- Day 3 - 15/03/2021: Adapted the query to return only English tweets. Realized that filtering by a single word can return too many tweets that have no relation to how people feel about the product, so to be more targeted I ran the search with "cybertruck" and "looks". With this we can draw more specific insights, for example for the design area
- Day 4 - Find the most used words to get an idea of the context behind the positive/negative tweets

In [87]:
import snscrape.modules.twitter as sntwitter
import pandas as pd
import re
import string
from textblob import TextBlob

In [88]:
# Setting number of tweets to return
maxTweets = 2000

# Creating list to append tweet data to
tweets_list = []

# Using TwitterSearchScraper to scrape data and append tweets to list, filtering by words, language and date range
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('cybertruck looks lang:en since:2019-11-22 until:2019-11-24').get_items()):
    # Note: i starts at 0, so `i > maxTweets` stops only after maxTweets + 1 tweets (2001 here)
    if i > maxTweets:
        break
    tweets_list.append([tweet.content])
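
The search string above packs keywords, language and date range into one literal. As a minimal sketch, it could be assembled from its parts so the dates are easier to change. `build_query` is a hypothetical helper, not part of snscrape; it only produces the string that is then passed to `TwitterSearchScraper`.

```python
from datetime import date


def build_query(keywords, lang, since, until):
    """Assemble a Twitter search string from its parts (hypothetical helper)."""
    parts = list(keywords)
    parts.append(f"lang:{lang}")
    parts.append(f"since:{since.isoformat()}")
    parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)


# Same filters as the query used in this notebook
query = build_query(["cybertruck", "looks"], "en", date(2019, 11, 22), date(2019, 11, 24))
print(query)
```

The resulting `query` is the same string used in the loop above, so it can be dropped in without changing the results.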


In [89]:
# Creating a dataframe from the tweets list above
tweets_df = pd.DataFrame(tweets_list, columns=['Tweet'])

# Display first 5 entries from dataframe
tweets_df.head()

Unnamed: 0,Tweet
0,Not in the truck market but totally digging th...
1,"Tesla received 146,000 Cybertruck pre-orders i..."
2,#Cybertruck looks like a pinewood derby car cr...
3,#CyberTruck in (confirmed) Matte Black looks r...
4,The Tesla cybertruck looks like a PS1 car from...


In [90]:
tweets_df['Tweet'][3]

'#CyberTruck in (confirmed) Matte Black looks real good! 👍🏻 https://t.co/cJtR1cCVTK'

In [91]:
# We notice a lot of non-word tokens that can hurt the sentiment analysis, like @, RT, numbers etc
tweets_df

Unnamed: 0,Tweet
0,Not in the truck market but totally digging th...
1,"Tesla received 146,000 Cybertruck pre-orders i..."
2,#Cybertruck looks like a pinewood derby car cr...
3,#CyberTruck in (confirmed) Matte Black looks r...
4,The Tesla cybertruck looks like a PS1 car from...
...,...
1996,I actually like the look of the cybertruck. Bu...
1997,".@Tesla unveils its #Cybertruck, with a price ..."
1998,The cybertruck is the dumbest and best thing I...
1999,why does Tesla’s new cybertruck look like some...


In [92]:
# Function to clean the tweet text
def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # remove @ mentions (stops at the first underscore in a handle)
    text = re.sub(r'_[A-Za-z0-9]+', '', text)  # remove underscore-prefixed fragments, e.g. what is left of handles containing "_"
    text = re.sub(r'\n', ' ', text)  # replace newlines with spaces
    text = re.sub(r'#', '', text)  # remove the hashtag symbol, keeping the word
    text = re.sub(r'RT[\s]+', '', text)  # remove the retweet marker "RT"
    text = re.sub(r'https?:\/\/\S+', '', text)  # remove hyperlinks

    return text
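
One caveat with the patterns above: `RT[\s]+` is not anchored, so it also matches "RT " at the end of an uppercase word (for example "SMART car" would become "SMAcar"). A possible variant, given only as a sketch, removes the retweet prefix at the start of the tweet, handles underscores in mentions with `\w`, and collapses leftover whitespace. `clean_tweet` is a hypothetical name, and this variant was not used to produce the results in this notebook.

```python
import re


def clean_tweet(text: str) -> str:
    """Alternative cleaning sketch; not the function used for the results here."""
    text = re.sub(r'https?://\S+', '', text)    # remove hyperlinks
    text = re.sub(r'^RT\s+@\w+:\s*', '', text)  # remove a leading "RT @user:" prefix only
    text = re.sub(r'@\w+', '', text)            # remove @ mentions, underscores included
    text = text.replace('#', '')                # drop the hashtag symbol, keep the word
    text = re.sub(r'\s+', ' ', text).strip()    # collapse newlines and repeated spaces
    return text


print(clean_tweet("RT @elon_musk: #Cybertruck looks great\nhttps://t.co/abc"))
print(clean_tweet("SMART car or #Cybertruck?"))
```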

In [93]:
#cleaning the text
tweets_df['Tweet'] = tweets_df['Tweet'].apply(cleanTxt)

tweets_df

Unnamed: 0,Tweet
0,Not in the truck market but totally digging th...
1,"Tesla received 146,000 Cybertruck pre-orders i..."
2,Cybertruck looks like a pinewood derby car cra...
3,CyberTruck in (confirmed) Matte Black looks re...
4,The Tesla cybertruck looks like a PS1 car from...
...,...
1996,I actually like the look of the cybertruck. Bu...
1997,". unveils its Cybertruck, with a price startin..."
1998,The cybertruck is the dumbest and best thing I...
1999,why does Tesla’s new cybertruck look like some...


In [94]:
#create a function to get subjectivity
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

#create a function to get polarity
def getPolarity(text):
    return TextBlob(text).sentiment.polarity

# Create the 2 new columns with the analysis
tweets_df['Subjectivity'] = tweets_df['Tweet'].apply(getSubjectivity)
tweets_df['Polarity'] = tweets_df['Tweet'].apply(getPolarity)

tweets_df.head()

Unnamed: 0,Tweet,Subjectivity,Polarity
0,Not in the truck market but totally digging th...,0.527778,0.083333
1,"Tesla received 146,000 Cybertruck pre-orders i...",0.5,-0.225
2,Cybertruck looks like a pinewood derby car cra...,0.214286,-0.071429
3,CyberTruck in (confirmed) Matte Black looks re...,0.777778,0.411111
4,The Tesla cybertruck looks like a PS1 car from...,0.0,0.0


In [95]:
tweets_df['Polarity'].describe()

count    2001.000000
mean        0.044887
std         0.324268
min        -1.000000
25%        -0.088333
50%         0.000000
75%         0.234127
max         1.000000
Name: Polarity, dtype: float64

In [96]:
tweets_df.loc[tweets_df['Polarity']  == 1]

Unnamed: 0,Tweet,Subjectivity,Polarity
199,man the Cybertruck looks awesome. Stretching...,1.0,1.0
376,We meet again Tesla.😍 No Cybertruck here. Took...,1.0,1.0
384,The cybertruck looks awesome. if you could c...,1.0,1.0
456,The Cybertruck looks awesome to me,1.0,1.0
511,"in the meantime, before production starts, if...",0.9,1.0
655,I don't care what anyone says. The Cybertruck ...,1.0,1.0
742,do not change the design of the Cybertruck it...,1.0,1.0
763,The cybertruck looks fucking awesome,1.0,1.0
996,Best design part of 's cybertruck is the fact ...,0.3,1.0
1539,how can you make a pick up truck aerodynamic...,0.75,1.0


In [97]:
#create a function to determine if the tweet has positive or negative sentiment
def getSentiment(score):
    if score < 0:
        return "negative"
    elif score == 0:
        return 'neutral'
    else:
        return "positive"
    
#create the column with the sentiment
tweets_df['Sentiment'] = tweets_df['Polarity'].apply(getSentiment)
tweets_df

Unnamed: 0,Tweet,Subjectivity,Polarity,Sentiment
0,Not in the truck market but totally digging th...,0.527778,0.083333,positive
1,"Tesla received 146,000 Cybertruck pre-orders i...",0.500000,-0.225000,negative
2,Cybertruck looks like a pinewood derby car cra...,0.214286,-0.071429,negative
3,CyberTruck in (confirmed) Matte Black looks re...,0.777778,0.411111,positive
4,The Tesla cybertruck looks like a PS1 car from...,0.000000,0.000000,neutral
...,...,...,...,...
1996,I actually like the look of the cybertruck. Bu...,0.383333,0.200000,positive
1997,". unveils its Cybertruck, with a price startin...",0.500000,0.100000,positive
1998,The cybertruck is the dumbest and best thing I...,0.361905,0.511905,positive
1999,why does Tesla’s new cybertruck look like some...,0.443939,-0.015152,negative
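
With the strict `score < 0` / `score > 0` rule, tweets whose polarity is barely different from zero (such as -0.015152 in the last row above) still get a positive or negative label. A hedged alternative is to treat a small band around zero as neutral. The function name `label_with_band` and the 0.05 threshold are assumptions chosen for illustration, not values used in this analysis.

```python
def label_with_band(score, band=0.05):
    """Label a polarity score, treating anything within +/- band as neutral.

    The default band of 0.05 is an illustrative assumption, not a tuned value.
    """
    if score > band:
        return "positive"
    if score < -band:
        return "negative"
    return "neutral"


# Polarity values taken from the table above
for score in (0.411111, -0.071429, -0.015152, 0.0):
    print(score, label_with_band(score))
```

Applying it would look like `tweets_df['Polarity'].apply(label_with_band)`; the sentiment counts would shift toward neutral compared with the ones reported in this notebook.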


In [98]:
tweets_df['Tweet'][3]

'CyberTruck in (confirmed) Matte Black looks real good! 👍🏻 '

In [99]:
tweets_df['Sentiment'].value_counts()

positive    991
negative    599
neutral     411
Name: Sentiment, dtype: int64

In [168]:
# The overall sentiment across the tweets appears to be positive: there are many more positive tweets than negative ones
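
To make that comparison easier to communicate, the counts above can be turned into percentages and a simple net score (positive minus negative, divided by the total). This is a small sketch using only the counts reported by `value_counts()`; the net score is an illustrative summary, not a metric computed elsewhere in this notebook.

```python
# Counts copied from the value_counts() output above
counts = {"positive": 991, "negative": 599, "neutral": 411}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}
net = (counts["positive"] - counts["negative"]) / total

for label, pct in shares.items():
    print(f"{label:<8} {pct:5.1f}%")
print(f"net sentiment: {net:+.3f}")
```

On a DataFrame, `tweets_df['Sentiment'].value_counts(normalize=True)` gives the same shares as fractions.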

In [100]:
tweets_df.loc[tweets_df['Sentiment'] == "positive"]

Unnamed: 0,Tweet,Subjectivity,Polarity,Sentiment
0,Not in the truck market but totally digging th...,0.527778,0.083333,positive
3,CyberTruck in (confirmed) Matte Black looks re...,0.777778,0.411111,positive
5,"Ok I'm just gonna say it, I think the new Tesl...",0.477273,0.318182,positive
6,I think the Cybertruck has real potential. Nee...,0.633333,0.433333,positive
17,Just imagine Tesla's cybertruck was built by I...,0.650000,0.350000,positive
...,...,...,...,...
1994,The problem with the Cybertruck isn’t that it ...,1.000000,0.250000,positive
1996,I actually like the look of the cybertruck. Bu...,0.383333,0.200000,positive
1997,". unveils its Cybertruck, with a price startin...",0.500000,0.100000,positive
1998,The cybertruck is the dumbest and best thing I...,0.361905,0.511905,positive


In [101]:
tweets_df['Tweet'][9]

' Sorry, the Cybertruck looks like it was designed with 8-bit software.  fugly'

In [102]:
tweets_df.loc[tweets_df['Sentiment'] == "negative"]

Unnamed: 0,Tweet,Subjectivity,Polarity,Sentiment
1,"Tesla received 146,000 Cybertruck pre-orders i...",0.500000,-0.225000,negative
2,Cybertruck looks like a pinewood derby car cra...,0.214286,-0.071429,negative
7,The Cybertruck would look so sick in Vantablack,0.857143,-0.714286,negative
9,"Sorry, the Cybertruck looks like it was desig...",1.000000,-0.500000,negative
12,Why the Tesla Cybertruck Looks So Weird - WIRE...,1.000000,-0.500000,negative
...,...,...,...,...
1984,Here's why the Tesla Cybertruck has its crazy ...,0.900000,-0.600000,negative
1986,I know I’m a little late to chime in on this C...,0.508333,-0.094271,negative
1987,Here's why the Tesla Cybertruck has its crazy ...,0.900000,-0.600000,negative
1995,Current pick up trucks look ugly as hell any...,0.645833,-0.004167,negative
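
The Day 4 task, finding the most used words in the positive and negative tweets, could be sketched as follows. `top_words` and the `STOP_WORDS` set are hypothetical: the stop-word list is a tiny hand-made one for illustration, and the two sample tweets are taken from the outputs above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real run would need a fuller one
STOP_WORDS = {
    "the", "and", "for", "that", "this", "with", "would", "but", "was", "are",
    "like", "look", "looks", "cybertruck", "tesla",
}


def top_words(tweets, n=10):
    """Return the n most common words across tweets, skipping stop words and very short words."""
    counter = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        counter.update(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counter.most_common(n)


sample = [
    "The Cybertruck looks awesome to me",
    "The Cybertruck would look so sick in Vantablack",
]
print(top_words(sample, n=3))
```

With the DataFrame from this notebook, the positive tweets could be passed as `top_words(tweets_df.loc[tweets_df['Sentiment'] == 'positive', 'Tweet'])`, and likewise for the negative ones.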


In my opinion, this would be enough for a business team to draw insights. Hand-picked tweets whose label does not match their actual sentiment should not invalidate the results. The overall sentiment about the Cybertruck is positive.
