

# AfterWork Data Science: Getting Started with NLP Project

### Prerequisites

In [40]:
# Importing the required libraries

import pandas as pd # library for data manipulation
import numpy as np  # library for scientific computations
import re           # regex library to perform text preprocessing
import string       # library to work with strings
import nltk         # library for natural language processing
import scipy        # scientific computing 
import seaborn as sns # library for data visualisation

# to display all columns
pd.set_option('display.max_columns', None)

# to display the entire contents of a cell
pd.set_option('display.max_colwidth', None)

# Libraries for splitting concatenated words (wordninja) and text processing (textblob),
# followed by the NLTK stop words
!pip3 install wordninja
!pip3 install textblob
import wordninja 
from textblob import TextBlob

nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')

# Library for Lemmatization
nltk.download('wordnet')
from textblob import Word

# Library for Noun count
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Library for TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer 

# Library for metrics
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


### 1. Importing our Data

In [41]:
# Question: Given new tweets, create a sentiment analysis model that will 
# predict whether a tweet contains positive or negative sentiment.
# ---
# Dataset url = https://bit.ly/31kqByD 
# ---
#
df = pd.read_csv('https://bit.ly/31kqByD', encoding='latin-1')
df.head()

Unnamed: 0.1,Unnamed: 0,0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D"
0,346508,0,2016177685,Wed Jun 03 06:18:50 PDT 2009,NO_QUERY,UriGrey,Obama forges his Muslim alliance against the civilized world - http://tinyurl.com/pqcops . And he didn't even drop in for a cup of tea
1,883537,4,1686152287,Sun May 03 04:02:08 PDT 2009,NO_QUERY,MariesolW,"Had the most spectacular prom ever but now my bed is serenading me and i must answer, sweet dreams my friends what a wonderful day"
2,764173,0,2298725623,Tue Jun 23 12:02:12 PDT 2009,NO_QUERY,ColleenBurns,I am overwhelmed today taking a moment to eat and pray!!!!
3,638701,0,2234530495,Thu Jun 18 23:13:54 PDT 2009,NO_QUERY,queenarchy,@lindork Tres sad. I was totally a Max fan. #SYTYCD
4,664821,0,2244623416,Fri Jun 19 14:59:46 PDT 2009,NO_QUERY,reinventingjess,"Crap, I was counting down the hours until my dad could come home &amp; help me watch my son but now he said is going out to dinner first"


### 2. Data Exploration

In [42]:
# We can determine the size of our dataset
# ---
#
df.shape

(10000, 7)

This dataset needs some cleaning: the columns have no meaningful names, and several of them are not needed to create our model. We will rename the columns and drop the ones we don't need.

### 3. Data Preparation

#### Basic Data Cleaning Techniques

In [43]:
# We rename the columns for ease of referencing our columns later on
# ---
#
df.columns = ['id', 'target', 't_id', 'created_at', 'query', 'user', 'text']
df.head()

Unnamed: 0,id,target,t_id,created_at,query,user,text
0,346508,0,2016177685,Wed Jun 03 06:18:50 PDT 2009,NO_QUERY,UriGrey,Obama forges his Muslim alliance against the civilized world - http://tinyurl.com/pqcops . And he didn't even drop in for a cup of tea
1,883537,4,1686152287,Sun May 03 04:02:08 PDT 2009,NO_QUERY,MariesolW,"Had the most spectacular prom ever but now my bed is serenading me and i must answer, sweet dreams my friends what a wonderful day"
2,764173,0,2298725623,Tue Jun 23 12:02:12 PDT 2009,NO_QUERY,ColleenBurns,I am overwhelmed today taking a moment to eat and pray!!!!
3,638701,0,2234530495,Thu Jun 18 23:13:54 PDT 2009,NO_QUERY,queenarchy,@lindork Tres sad. I was totally a Max fan. #SYTYCD
4,664821,0,2244623416,Fri Jun 19 14:59:46 PDT 2009,NO_QUERY,reinventingjess,"Crap, I was counting down the hours until my dad could come home &amp; help me watch my son but now he said is going out to dinner first"


In [44]:
# We retain the relevant columns by dropping the columns we don't need 
# for creating a sentiment analysis model. 
# ---
#
df = df.drop(['id', 't_id', 'created_at', 'query', 'user'], axis = 1)
df.head()

Unnamed: 0,target,text
0,0,Obama forges his Muslim alliance against the civilized world - http://tinyurl.com/pqcops . And he didn't even drop in for a cup of tea
1,4,"Had the most spectacular prom ever but now my bed is serenading me and i must answer, sweet dreams my friends what a wonderful day"
2,0,I am overwhelmed today taking a moment to eat and pray!!!!
3,0,@lindork Tres sad. I was totally a Max fan. #SYTYCD
4,0,"Crap, I was counting down the hours until my dad could come home &amp; help me watch my son but now he said is going out to dinner first"


In [45]:
# Understanding the distribution of target
# ---
#
df.target.value_counts() 

0    5067
4    4933
Name: target, dtype: int64

In [46]:
# Let's determine whether our columns have the right data types
# ---
#
df.dtypes

target     int64
text      object
dtype: object

In [47]:
# What values are in our target variable?
# ---
#
df.target.unique()

array([0, 4])

These are the two classes to which each document (text) belongs. A target value of 0 marks a text with a negative sentiment, while a target value of 4 marks a text with a positive sentiment.

In [48]:
# Let's check for missing values 
# ---
# 
df.isnull().sum()

target    0
text      0
dtype: int64

We don't have any missing values, so we are good to go.

#### Text Processing

In [49]:
# Text Cleaning: Removing all urls/links
# ---
# 
df['text'] =  df['text'].apply(lambda x: re.sub(r'http\S+|www\S+|https\S+','', str(x)))
df[['text']].head()

Unnamed: 0,text
0,Obama forges his Muslim alliance against the civilized world - . And he didn't even drop in for a cup of tea
1,"Had the most spectacular prom ever but now my bed is serenading me and i must answer, sweet dreams my friends what a wonderful day"
2,I am overwhelmed today taking a moment to eat and pray!!!!
3,@lindork Tres sad. I was totally a Max fan. #SYTYCD
4,"Crap, I was counting down the hours until my dad could come home &amp; help me watch my son but now he said is going out to dinner first"


In [50]:
# Text Cleaning: Removing the @ and # characters
# ---
df['text'] = df.text.str.replace('#','')
df['text'] = df.text.str.replace('@','')

In [51]:
# Text Cleaning: Conversion to lowercase

df['text'] = df.text.apply(lambda x: " ".join(x.lower() for x in x.split()))
df[['text']].head(10)

Unnamed: 0,text
0,obama forges his muslim alliance against the civilized world - . and he didn't even drop in for a cup of tea
1,"had the most spectacular prom ever but now my bed is serenading me and i must answer, sweet dreams my friends what a wonderful day"
2,i am overwhelmed today taking a moment to eat and pray!!!!
3,lindork tres sad. i was totally a max fan. sytycd
4,"crap, i was counting down the hours until my dad could come home &amp; help me watch my son but now he said is going out to dinner first"
5,"dcbtv dcbtv i had to go check some things, buy others and look for other things"
6,smrorke why are you never on gmail anymore
7,"alex_jeffreys i'd have loved to have come, just a couple of unfortunate things as such held me back!"
8,brrrr ! heading to work.... chilly today
9,gabriiiiella i neeed to talk to youu.. good newsss!!!!


In [52]:
# Text Cleaning: Splitting concatenated words
# ---
# Performing the split
df['text'] = df.text.apply(lambda x: wordninja.split(x))  
df['text'] = df.text.str.join(' ')
df[['text']].head(5) 


Unnamed: 0,text
0,obama forges his muslim alliance against the civilized world and he didn't even drop in for a cup of tea
1,had the most spectacular prom ever but now my bed is serenading me and i must answer sweet dreams my friends what a wonderful day
2,i am overwhelmed today taking a moment to eat and pray
3,lin dork tres sad i was totally a max fan sytycd
4,crap i was counting down the hours until my dad could come home amp help me watch my son but now he said is going out to dinner first


In [53]:
# Text Cleaning: Removing punctuation characters
# ---

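# regex=True is required in recent pandas versions, where str.replace
# treats the pattern as a literal string by default
df['text'] = df.text.str.replace(r'[^\w\s]', '', regex=True) 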


In [54]:
# Text Cleaning: Removing stop words

df['text'] = df.text.apply(lambda x: " ".join(x for x in x.split() if x not in stop))

# Previewing our dataset

df[['text']].head(10)



Unnamed: 0,text
0,obama forges muslim alliance civilized world didnt even drop cup tea
1,spectacular prom ever bed serenading must answer sweet dreams friends wonderful day
2,overwhelmed today taking moment eat pray
3,lin dork tres sad totally max fan sytycd
4,crap counting hours dad could come home amp help watch son said going dinner first
5,dc b tv dc b tv go check things buy others look things
6,mr ke never gmail anymore
7,alex jeffrey id loved come couple unfortunate things held back
8,br rrr heading work chilly today
9,ga bri iii ella nee ed talk u good new sss
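
The cleaning steps so far can be sketched as a single function, which is handy when the same preparation has to be applied to new tweets. This is only an illustration under stated assumptions: `clean_tweet` and `SAMPLE_STOP_WORDS` are hypothetical names, the stop-word set is a tiny hand-picked sample rather than NLTK's full English list, and the wordninja splitting step is left out.

```python
import re

# Hypothetical, tiny stop-word set for illustration only; the notebook uses
# NLTK's full English list (stopwords.words('english')).
SAMPLE_STOP_WORDS = {"i", "a", "the", "was", "to", "and", "my", "of",
                     "for", "in", "he", "his"}

def clean_tweet(text, stop_words=SAMPLE_STOP_WORDS):
    """Apply the URL, symbol, lowercase, punctuation and stop-word steps in order."""
    text = re.sub(r"http\S+|www\S+", "", text)     # remove URLs
    text = text.replace("#", "").replace("@", "")  # remove hashtag and mention markers
    text = text.lower()                            # convert to lowercase
    text = re.sub(r"[^\w\s]", "", text)            # remove punctuation
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)

print(clean_tweet("@lindork Tres sad. I was totally a Max fan. #SYTYCD"))
```

Because the word-splitting step is skipped here, the handle stays as `lindork` instead of becoming `lin dork`.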


In [55]:
# Text Cleaning: Lemmatization
# ---
# Lemmatization relies on the WordNet corpus downloaded in the Prerequisites cell
#

df['text'] = df.text.apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()])) 
df.head()


Unnamed: 0,target,text
0,0,obama forge muslim alliance civilized world didnt even drop cup tea
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day
2,0,overwhelmed today taking moment eat pray
3,0,lin dork tres sad totally max fan sytycd
4,0,crap counting hour dad could come home amp help watch son said going dinner first


We won't remove numerics because our text could lose meaning without them. We could also further prepare our text by performing spelling correction, but this is a resource-intensive process that we will skip for now.

#### Feature Engineering Techniques 

In [57]:
# Feature Construction: Length of tweet
# ---

df['length_of_tweet'] = df.text.str.len()



In [58]:
# Feature Construction: Word count 
# ---
df['word_count'] = df.text.apply(lambda x: len(str(x).split(" ")))
df[['text', 'word_count']].sample(5)


Unnamed: 0,text,word_count
5269,well well well really praying good thing wednesday could big break,11
5939,robin taylor roth couldnt tame hair save life lol food could eat fruit day,14
8235,jasmin love sure friend whats stupid friend ter b c uz hc g app wont work,16
5881,gunn fall asleep watching ubisoft conf rence j tv,9
9449,tummy ache start contest soon,5


In [60]:
# Feature Construction: Average word length (average no. of characters / word)
# ---

def avg_word(sentence):
  words = sentence.split()
  try:
    z = (sum(len(word) for word in words)/len(words))
  except ZeroDivisionError:
    z = 0 
  return z
df['avg_word_length'] = df.text.apply(lambda x: avg_word(x)) 
df.head()

Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333
2,0,overwhelmed today taking moment eat pray,6,40,5.833333
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667


In [62]:
# Feature Construction: Noun count
# ---
pos_dic = {
    'noun' : ['NN','NNS','NNP','NNPS'],
    'pron' : ['PRP','PRP$','WP','WP$'],
    'verb' : ['VB','VBD','VBG','VBN','VBP','VBZ'],
    'adj' :  ['JJ','JJR','JJS'],
    'adv' : ['RB','RBR','RBS','WRB']
}


# We create a function that counts the words in a given sentence whose part-of-speech tag belongs to the requested family


def pos_check(x, flag):
    cnt = 0
    try:
        wiki = TextBlob(x)
        for tup in wiki.tags:
            ppo = list(tup)[1]
            if ppo in pos_dic[flag]:
                cnt += 1
    except Exception:
        # Leave the count at 0 if the text cannot be tagged
        pass
    return cnt
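
To see what `pos_check` counts, here is a minimal sketch that skips the tagger and works on tokens that are already tagged. The (word, tag) pairs are hand-written for illustration and are not real tagger output, and `count_pos` is a hypothetical helper that mirrors only the counting logic.

```python
pos_dic = {
    'noun': ['NN', 'NNS', 'NNP', 'NNPS'],
    'verb': ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'],
}

def count_pos(tagged_tokens, flag):
    """Count the tokens whose tag belongs to the requested part-of-speech family."""
    return sum(1 for _word, tag in tagged_tokens if tag in pos_dic[flag])

# Hand-written tags (an assumption, not produced by TextBlob or NLTK)
tagged = [('tummy', 'NN'), ('ache', 'NN'), ('start', 'VB'),
          ('contest', 'NN'), ('soon', 'RB')]

print(count_pos(tagged, 'noun'))  # three tokens carry a noun tag
print(count_pos(tagged, 'verb'))  # one token carries a verb tag
```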




In [64]:
# Noun Count
# ---
df['noun_count'] = df.text.apply(lambda x: pos_check(x, 'noun'))
df.head()


Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length,noun_count
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818,6
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333,5
2,0,overwhelmed today taking moment eat pray,6,40,5.833333,4
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125,5
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667,8


In [65]:
# Feature Construction: Verb count


# ---
df['verb_count'] = df.text.apply(lambda x: pos_check(x, 'verb'))
df.head()



Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length,noun_count,verb_count
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818,6,2
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333,5,3
2,0,overwhelmed today taking moment eat pray,6,40,5.833333,4,2
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125,5,1
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667,8,5


In [68]:
# Feature Construction: Adjective count / Tweet
# ---
df['Adj_count'] = df.text.apply(lambda x: pos_check(x, 'adj'))
df.head()



Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length,noun_count,verb_count,Adjective_count,Adv_count,Adj_count
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818,6,2,0,2,1
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333,5,3,0,1,2
2,0,overwhelmed today taking moment eat pray,6,40,5.833333,4,2,0,0,0
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125,5,1,0,1,1
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667,8,5,0,1,0


In [67]:
# Feature Construction: Adverb count / Tweet
# ---
df['Adv_count'] = df.text.apply(lambda x: pos_check(x, 'adv'))
df.head()



Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length,noun_count,verb_count,Adjective_count,Adv_count
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818,6,2,0,2
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333,5,3,0,1
2,0,overwhelmed today taking moment eat pray,6,40,5.833333,4,2,0,0
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125,5,1,0,1
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667,8,5,0,1


In [69]:
# Feature Construction: Pronoun count / Tweet
# ---

df['Pron_count'] = df.text.apply(lambda x: pos_check(x, 'pron'))
df.head()


Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length,noun_count,verb_count,Adjective_count,Adv_count,Adj_count,Pron_count
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818,6,2,0,2,1,0
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333,5,3,0,1,2,0
2,0,overwhelmed today taking moment eat pray,6,40,5.833333,4,2,0,0,0,0
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125,5,1,0,1,1,0
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667,8,5,0,1,0,0


In [70]:
# Feature Construction: Subjectivity
# ---
def get_subjectivity(text):
    textblob = TextBlob(text)
    subj = textblob.sentiment.subjectivity
    return subj

df['subjectivity'] = df.text.apply(get_subjectivity)
df[['text', 'subjectivity']].sample(10)


Unnamed: 0,text,subjectivity
3133,lab risa photo working,0.0
307,panda baggins uhh h maybe vodka brown sugar crushed ice lime house cai pir vka absolutely love btw back uni,0.233333
6513,quarter queen im jealous give sleep,0.0
8203,fix reply jie catering ok thanks,0.35
6892,fear ne cotton fear ne well co ooo l,0.0
9069,dont understand twitter yet,0.0
5941,milk tea life,0.0
6166,damn forgot dawson creek tv earlier,0.5
5830,lol love selling book back guy ooo ooo hot,0.5375
1897,mtv canada sorry guy apparently none text getting night,0.675


In [72]:
# Feature Construction: Polarity
# ---
def get_polarity(text):
    textblob = TextBlob(text)
    pol = textblob.sentiment.polarity
    return pol

df['polarity'] = df.text.apply(get_polarity)
df[['text', 'polarity']].sample(10)


Unnamed: 0,text,polarity
6305,sims 3 keep crashing ooo ooo cool,0.35
3815,im thinking road trip byron bay summer sound like good idea,0.55
7493,forgot pretty shoe vega,0.25
8783,tracey win worry happy mother day x,0.8
7555,enjoying perk jour got ticket africa fashion week,0.5
7860,quite impressed internet still work garden feel like summer,1.0
3360,wish could quite weekend going happen weekend,0.0
515,nine ace picture blank hot riding always feel good live central location biking lucky,0.236616
9450,rockstar seed walking montgomery inn c cy,0.0
9416,m kwik lolol kno basic cable suited fine upgraded,0.405556


In [74]:
# Feature Construction: Word Level N-Gram TF-IDF Feature 


tfidf = TfidfVectorizer(max_features=1000, lowercase=True, analyzer='word', ngram_range=(1,3),  stop_words= 'english')
df_word_vect = tfidf.fit_transform(df.text) 




In [75]:
# Feature Construction: Character Level N-Gram TF-IDF Feature
# ---

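# stop_words only applies when analyzer='word', so it is omitted here.
# A separate variable name keeps the word-level vectorizer from being overwritten.
tfidf_char = TfidfVectorizer(max_features=1000, lowercase=True, analyzer='char', ngram_range=(1,3))
df_char_vect = tfidf_char.fit_transform(df.text)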
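
To make `ngram_range=(1,3)` concrete, the sketch below lists the word-level and character-level n-grams of a short text. It is a simplified illustration, not scikit-learn's implementation: `word_ngrams` and `char_ngrams` are hypothetical helpers, and the real vectorizer also applies its own tokenization, stop-word filtering (word level) and the `max_features` cut-off.

```python
def word_ngrams(text, min_n=1, max_n=3):
    """Return all runs of min_n to max_n consecutive words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(min_n, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def char_ngrams(text, min_n=1, max_n=3):
    """Return all runs of min_n to max_n consecutive characters."""
    return [text[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(text) - n + 1)]

print(word_ngrams("milk tea life"))
print(char_ngrams("tea"))
```

Each distinct n-gram becomes a candidate column of the TF-IDF matrix, which is why the character-level features can pick up on misspellings and elongated words that word-level features miss.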

In [76]:
#previewing
df.head()

Unnamed: 0,target,text,word_count,length_of_tweet,avg_word_length,noun_count,verb_count,Adjective_count,Adv_count,Adj_count,Pron_count,subjectivity,polarity
0,0,obama forge muslim alliance civilized world didnt even drop cup tea,11,67,5.181818,6,2,0,2,1,0,0.9,0.4
1,4,spectacular prom ever bed serenading must answer sweet dream friend wonderful day,12,81,5.833333,5,3,0,1,2,0,0.85,0.65
2,0,overwhelmed today taking moment eat pray,6,40,5.833333,4,2,0,0,0,0,0.0,0.0
3,0,lin dork tres sad totally max fan sytycd,8,40,4.125,5,1,0,1,1,0,0.875,-0.25
4,0,crap counting hour dad could come home amp help watch son said going dinner first,15,81,4.466667,8,5,0,1,0,0,0.566667,-0.275


In [77]:
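# Let's prepare the constructed features for modeling
# ---
# Columns 2 to 11 run from word_count to subjectivity; polarity (column 12)
# is not selected. Note that polarity can be negative and MultinomialNB
# only accepts non-negative feature values.
#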
X_metadata = np.array(df.iloc[:, 2:12])
X_metadata

array([[11.        , 67.        ,  5.18181818, ...,  1.        ,
         0.        ,  0.9       ],
       [12.        , 81.        ,  5.83333333, ...,  2.        ,
         0.        ,  0.85      ],
       [ 6.        , 40.        ,  5.83333333, ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [ 8.        , 45.        ,  4.75      , ...,  2.        ,
         0.        ,  0.83333333],
       [ 6.        , 34.        ,  4.83333333, ...,  2.        ,
         0.        ,  0.56785714],
       [10.        , 44.        ,  3.5       , ...,  0.        ,
         0.        ,  0.6       ]])

In [78]:
# We combine our two tfidf (sparse) matrices and X_metadata
# ---
#
X = scipy.sparse.hstack([df_word_vect, df_char_vect,  X_metadata])
X

<10000x2010 sparse matrix of type '<class 'numpy.float64'>'
	with 944965 stored elements in COOrdinate format>

In [79]:
# Getting our response variable
# ---
#
y = np.array(df.iloc[:, 0])
y

array([0, 4, 0, ..., 0, 4, 0])

### 4. Data Modelling

During this step, we will use machine learning algorithms to train and test our sentiment analysis models.

In [80]:
# Splitting our data
# ---
#
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [81]:
# Fitting our model
# ---
#

# Importing the algorithms
from sklearn.naive_bayes import MultinomialNB 
from sklearn.linear_model import LogisticRegression

nb_classifier = MultinomialNB() 
lr_classifier = LogisticRegression(max_iter=1000) 

# Training our model
nb_classifier.fit(X_train, y_train) 
lr_classifier.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression(max_iter=1000)
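
The warning above suggests scaling the data. One possible way to do that, which is not what this notebook does, is to put a `MaxAbsScaler` in front of the logistic regression. The sketch below uses a small made-up matrix (`X_toy`, `y_toy`) as a stand-in for our features; all names in it are assumptions for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

# Toy stand-in for X: two TF-IDF-like columns in [0, 1] and one large count column
X_toy = sparse.csr_matrix(np.array([
    [0.1, 0.0, 67.0],
    [0.0, 0.4, 81.0],
    [0.3, 0.0, 40.0],
    [0.0, 0.2, 45.0],
]))
y_toy = np.array([0, 4, 0, 4])

# MaxAbsScaler divides each column by its largest absolute value, so sparse
# input stays sparse and non-negative values stay non-negative
scaled_lr = make_pipeline(MaxAbsScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_toy, y_toy)

# After scaling, the largest value in every column is 1
X_scaled = scaled_lr.named_steps["maxabsscaler"].transform(X_toy)
print(X_scaled.max(axis=0).toarray())
```

With the notebook's data, the pipeline would be fitted on `X_train` and `y_train` in place of the toy arrays. Whether this removes the warning or changes the accuracy would need to be checked by running it.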

In [82]:
# Making predictions
# ---
#
y_predict_nb = nb_classifier.predict(X_test) 
y_predict_lr = lr_classifier.predict(X_test)

In [83]:
# Evaluating the Models
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Accuracy scores
# ---
#
print("Naive Bayes Classifier:\n", accuracy_score(y_test, y_predict_nb)) 
print("Logistic Regression Classifier: \n", accuracy_score(y_test, y_predict_lr))

Naive Bayes Classifier:
 0.727
Logistic Regression Classifier: 
 0.733


In [84]:
# Confusion matrices
# ---
# 
print("Naive Bayes Classifier: \n", confusion_matrix(y_test, y_predict_nb)) 
print("Logistic Regression Classifier: \n", confusion_matrix(y_test, y_predict_lr))

Naive Bayes Classifier: 
 [[761 289]
 [257 693]]
Logistic Regression Classifier: 
 [[762 288]
 [246 704]]


In [85]:
# Classification Reports
# ---
#
print("Naive Bayes Classifier: \n", classification_report(y_test, y_predict_nb)) 
print("Logistic Regression Classifier: \n", classification_report(y_test, y_predict_lr))

Naive Bayes Classifier: 
               precision    recall  f1-score   support

           0       0.75      0.72      0.74      1050
           4       0.71      0.73      0.72       950

    accuracy                           0.73      2000
   macro avg       0.73      0.73      0.73      2000
weighted avg       0.73      0.73      0.73      2000

Logistic Regression Classifier: 
               precision    recall  f1-score   support

           0       0.76      0.73      0.74      1050
           4       0.71      0.74      0.73       950

    accuracy                           0.73      2000
   macro avg       0.73      0.73      0.73      2000
weighted avg       0.73      0.73      0.73      2000



**Evaluating our Models**

* **Accuracy:** the percentage of texts that were assigned the correct class.
* **Precision:** for each class, the percentage of texts predicted as that class that actually belong to it.
* **Recall:** for each class, the percentage of texts belonging to that class that the model predicted as that class.
* **F1 Score:** the harmonic mean of precision and recall.
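
These definitions can be checked by hand against the logistic regression confusion matrix printed earlier. The sketch below recomputes the metrics in plain Python; `per_class_metrics` is a hypothetical helper, and the matrix is read with rows as true classes and columns as predicted classes, which is the layout `confusion_matrix` uses.

```python
# Logistic regression confusion matrix from the output above:
# rows are true classes, columns are predicted classes, in the order [0, 4]
cm = [[762, 288],
      [246, 704]]

def per_class_metrics(cm, idx):
    """Return (precision, recall, f1) for the class at position idx."""
    tp = cm[idx][idx]
    fp = sum(cm[r][idx] for r in range(len(cm))) - tp  # predicted as idx, but wrong
    fn = sum(cm[idx]) - tp                             # truly idx, but missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = (cm[0][0] + cm[1][1]) / sum(sum(row) for row in cm)
print(f"accuracy: {accuracy:.3f}")
for idx, label in enumerate([0, 4]):
    p, r, f = per_class_metrics(cm, idx)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The printed values agree with the accuracy score and the classification report for the logistic regression classifier.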

To improve our model, we can try performing other text processing techniques that would better prepare our data for fitting. We can also use different vectorizing techniques, implement other machine learning models and perform hyperparameter tuning.

### 5. Recommendations


Our best model, the logistic regression classifier, had an accuracy of 73.3% on the test set, and we can use it for classifying new tweets. We can improve this performance through hyperparameter tuning and further feature engineering. 
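
Hyperparameter tuning could look like the following minimal sketch, which searches over the regularisation strength `C` of a logistic regression with cross-validation. The data here is synthetic (`X_demo`, `y_demo`) and the grid values are arbitrary assumptions, so the printed numbers say nothing about our tweets.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the tweet features (assumption: not the real data)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

# Hypothetical grid over the inverse regularisation strength C
param_grid = {"C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,                # 5-fold cross-validation on the data passed to fit
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

On the notebook's data, `search.fit(X_train, y_train)` would be used instead, keeping `X_test` aside for the final evaluation.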