## Aspect-Based Sentiment Analysis: Findings from Natural Language
#### Code File \#3: Implementations for our Proposed Models

Tahmeed Tureen - University of Michigan, Ann Arbor<br>
Python file: <b>asba-proposed-model-implementations.ipynb</b> <br>
Description: Code that implements our proposed models/algorithms for SemEval-2014 Task 4 (Pontiki et al., 2014)

In [1]:
# Load up relevant libraries
import math
import pickle

In [42]:
# Read in Data
train_data = pickle.load(open("pickled_data/pickled_train_data.pkl", "rb"))
print(train_data.shape)
test_data = pickle.load(open("pickled_data/pickled_test_data.pkl", "rb"))
print(test_data.shape)
semEval_test_data = pickle.load(open("pickled_data/semEval_TestData.pkl", "rb"))
print(semEval_test_data.shape)

(3156, 7)
(557, 7)
(1025, 4)


### Phase A: Task 1
We will first do aspect term extraction (subtask 1) and aspect category classification (subtask 3) before we perform sentiment analysis.

In [43]:
# Load up spaCy for POS Tagging and Dependency Parsing
import spacy
from spacy import displacy

In [44]:
my_spacy = spacy.load("en_core_web_md") # load up spaCy

In [188]:
stopwords = set(
    ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves',
     'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their',
     'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was',
     'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the',
     'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against',
     'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in',
     'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why',
     'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only',
     'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now'])

In [45]:
# test if my_spacy works first
random_sen_spacy = my_spacy(u"The chinese pasta was great, but the waiter, Jason, was rude! I love Lucy, the tv show")

# list(random_sen_spacy.ents)
# list(random_sen_spacy.sents)
# for word in random_sen_spacy:
#     print(word.text, word.pos_, word.dep_)

Define the F1 score function

In [106]:
def F1_SemEval(predictions, truth):
    # Micro-averaged precision, recall and F1: counts are pooled over all reviews
    intersect_SnG = 0 # |S n G|: number of extracted terms that also appear in the truth
    cap_S = 0 # |S|: total number of extracted terms
    cap_G = 0 # |G|: total number of ground-truth terms
    
    for current_pred, current_truth in zip(predictions, truth):
        # numerator for both precision and recall (number of terms in prediction that are also in the truth)
        intersect_SnG += len([term for term in current_pred if term in current_truth])
        cap_S += len(current_pred)
        cap_G += len(current_truth)
        
    # After the loop is over we can calculate the Precision and Recall values
    # (guard against empty inputs so we never divide by zero)
    prec = intersect_SnG / cap_S if cap_S else 0.0
    recall = intersect_SnG / cap_G if cap_G else 0.0
    
    f1_score = (2.0 * prec * recall) / (prec + recall) if (prec + recall) else 0.0
    return f1_score, prec, recall
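
As a quick sanity check, the metric can be exercised on a few hand-made reviews. The sketch below re-implements the same counting logic in compact form; the name `micro_f1` and the toy lists are illustrative and not part of the notebook's data.

```python
def micro_f1(predictions, truth):
    """Micro-averaged F1, precision and recall over per-review term lists."""
    hits = sum(len([t for t in p if t in g]) for p, g in zip(predictions, truth))
    n_pred = sum(len(p) for p in predictions)
    n_true = sum(len(g) for g in truth)
    prec = hits / n_pred if n_pred else 0.0
    rec = hits / n_true if n_true else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return f1, prec, rec

# Toy data (hypothetical): three reviews
toy_pred = [["pizza", "service"], ["evening"], ["wine"]]
toy_truth = [["pizza", "service"], [], ["red wine", "waiter"]]

f1, prec, rec = micro_f1(toy_pred, toy_truth)
print(f1, prec, rec)  # 0.5 0.5 0.5
```

Because the counts are pooled before dividing, reviews with many terms weigh more than reviews with few. Matching is exact, so a partial hit such as "wine" against "red wine" scores zero.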

#### Task 1 Approach:
There is no model training associated with this method.

- Create a word bank of all of the aspect terms in the training data (all lower-cased)
- Part-Of-Speech (POS) tag the reviews
- Use the POS tags to extract only the nouns
- Filter out all nouns that do not appear in the training word bank
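
The steps above can be sketched on toy data. This is only an illustration: the training terms are made up, and a hand-tagged review stands in for the POS tagger's output so that the snippet runs without spaCy.

```python
# Hypothetical training annotations: one list of aspect terms per review
toy_train_terms = [["Pizza", "service"], ["wine list"], ["Food"]]

# Step 1: lower-cased word bank of every aspect term seen in training
word_bank = {term.lower() for terms in toy_train_terms for term in terms}

# Steps 2-3: a hand-tagged review stands in for the POS tagger's output
tagged_review = [("the", "DET"), ("pizza", "NOUN"), ("was", "AUX"),
                 ("great", "ADJ"), ("but", "CCONJ"), ("the", "DET"),
                 ("evening", "NOUN"), ("dragged", "VERB")]
nouns = [word for word, pos in tagged_review if pos == "NOUN"]

# Step 4: keep only the nouns that appear in the training word bank
extracted = [noun for noun in nouns if noun in word_bank]
print(extracted)  # ['pizza']
```

Here "evening" is tagged as a noun but is dropped because it never occurred as an aspect term in the (toy) training data.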

In [159]:
truth_aspTerms = [] # make a vector of the ground truth aspect terms but lower cased

for terms in test_data.Aspect_Term:
    curr_review = []
    
    for term in terms:
        curr_review.append(term.lower())
        
    truth_aspTerms.append(curr_review)

# print(truth_aspTerms[15:20])
test_data.Aspect_Term[15:20]

3171                       [Lassi]
3172                      [coffee]
3173                            []
3174    [wine list, food, service]
3175    [wine list, food, service]
Name: Aspect_Term, dtype: object

#### Lowercasing
Convert everything to lower case and see whether this improves our F1 score.

In [181]:
# Create the word bank
train_aspTermsBank = []
for terms in train_data.Aspect_Term:
    for term in terms:
        train_aspTermsBank.append(term.lower()) # convert to lower case
        
train_aspTermsBank = set(train_aspTermsBank)

len(train_aspTermsBank) # goes down to 1059 when we convert to lower case

1059

In [182]:
# Let's see how much performance goes up/down when we just do this
aspTerms_test = []
for review in test_data.Review:
    review = review.split()
    current_rev = []
    
    for term in review:
        term = term.lower()
        
        if term in train_aspTermsBank:
            current_rev.append(term)
    aspTerms_test.append(current_rev)

In [183]:
F1_SemEval(predictions=aspTerms_test, truth=truth_aspTerms) # goes up by only 1%

(0.5477707006369427, 0.5569948186528497, 0.5388471177944862)
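
One limitation of splitting on whitespace is that multi-word terms such as "wine list" can never match, and tokens with attached punctuation ("food,") are missed. A possible variant, which is not what the notebook does, strips punctuation and tries the longest n-gram first. The function name `match_bank_terms`, the `max_n` parameter and the toy bank are assumptions made for this sketch.

```python
import string

def match_bank_terms(review, bank, max_n=3):
    """Return bank terms found in the review, trying the longest n-gram first."""
    # Lower-case and strip punctuation from the edges of every token
    tokens = [tok.strip(string.punctuation) for tok in review.lower().split()]
    tokens = [tok for tok in tokens if tok]

    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in bank:
                found.append(candidate)
                i += n  # skip past the matched n-gram
                break
        else:
            i += 1  # no bank term starts at this token
    return found

toy_bank = {"wine list", "wine", "food", "service"}  # hypothetical word bank
result = match_bank_terms("Great wine list, good food, slow service!", toy_bank)
print(result)  # ['wine list', 'food', 'service']
```

Trying the longest span first means "wine list" is preferred over "wine" when both are in the bank.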

#### Incorporate POS tags for extraction

Use the POS tagger from the spaCy library and extract the noun tokens (n-grams) from each noun chunk. Let's see how well this does!

In [184]:
# Let's incorporate spaCy: extract the nouns of every noun chunk
aspTerms_test = []

for review in test_data.Review:
    asp_TERMS = []
    
    curr_review = review.lower()
    curr_review = my_spacy(curr_review) # convert to spaCy object
    
    for n_chunk in curr_review.noun_chunks:
        # Keep only the NOUN tokens; join with spaces in case we have a compound
        asp_term = " ".join(obj.text for obj in n_chunk if obj.pos_ == "NOUN")
        
        if (len(asp_term) > 0):
            asp_TERMS.append(asp_term) # append the extracted aspect term to the list of extractions
        
    aspTerms_test.append(asp_TERMS) # append to the full extraction list

In [185]:
aspTerms_test[0:5]

[['evening', 'evening', 'friends'],
 ['lamb meat'],
 ['pad thai'],
 ['time', 'food quality', 'service', 'part', 'reason'],
 ['time', 'food quality', 'service', 'part', 'reason']]

In [224]:
truth_aspTerms[0:5]

[[],
 ['lamb meat'],
 ['pad thai'],
 ['food quality', 'service'],
 ['food quality', 'service']]

In [179]:
F1_SemEval(aspTerms_test, truth_aspTerms)

(0.5777777777777778, 0.4476584022038568, 0.8145363408521303)

Our recall shoots up immensely (from about 0.54 to 0.81), and our F1 score rises from about 0.55 to 0.58 compared with the baseline. Our precision, however, drops from about 0.56 to 0.45, which may be a consequence of extracting too many words, including nouns that don't even belong to any of the aspect categories!
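
To see why a large recall gain translates into a modest F1 gain, the F1 values can be recomputed directly from the precision and recall printed above. The helper name `f1_from` is ours; the numbers are copied from the two outputs.

```python
def f1_from(prec, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * prec * recall / (prec + recall)

# (precision, recall) pairs copied from the outputs above
baseline = (0.5569948186528497, 0.5388471177944862)
noun_chunks = (0.4476584022038568, 0.8145363408521303)

f1_baseline = f1_from(*baseline)
f1_chunks = f1_from(*noun_chunks)
print(round(f1_baseline, 4))  # 0.5478
print(round(f1_chunks, 4))    # 0.5778
```

The harmonic mean is pulled toward the smaller of its two inputs, so the drop in precision offsets most of the recall gain and F1 moves by only about 0.03 in absolute terms.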

#### Incorporate the trained aspect term words

In [190]:
# spaCy noun extraction, now filtered by the training word bank
aspTerms_test = []

for review in test_data.Review:
    asp_TERMS = []
    
    curr_review = review.lower()
    curr_review = my_spacy(curr_review) # convert to spaCy object
    
    for n_chunk in curr_review.noun_chunks:
        # Keep only the NOUN tokens; join with spaces in case we have a compound
        asp_term = " ".join(obj.text for obj in n_chunk if obj.pos_ == "NOUN")
        
        if (len(asp_term) > 0) and asp_term in train_aspTermsBank: 
            asp_TERMS.append(asp_term) # append the extracted aspect term to the list of extractions
        
    aspTerms_test.append(asp_TERMS) # append to the full extraction list

In [192]:
F1_SemEval(aspTerms_test, truth_aspTerms)

(0.746971736204576, 0.8066860465116279, 0.6954887218045113)

In [242]:
print(aspTerms_test[5:10])
print(truth_aspTerms[5:10])
"hot tea" in train_aspTermsBank

[['pizza', 'service'], ['pizza', 'service'], ['waiter', 'wine'], ['oil'], ['place', 'food']]
[['pizza', 'service'], ['pizza', 'service'], ['waiter', 'red wine', 'hot tea', 'outside'], ['oil'], ['place', 'food']]


False

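Dang! The F1 score goes up to about 75%... this is great, but we can probably do better. The output above shows that we still miss terms such as "red wine" and "hot tea", which are not in the training word bank.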

#### Incorporate word similarities using pre-trained Word Embeddings from Wikipedia

The word embeddings were provided by the SI 630: NLP Instructional Team

In [193]:
import gensim # for reading in Word Embeddings

In [194]:
# Load up word2vec word embeddings from wikipedia
# word2vec_wiki = gensim.models.KeyedVectors.load_word2vec_format('wiki.word2vec.min-100.bin',
#                                                                 binary = False, unicode_errors = 'ignore')
# Pickle this file
# pickle.dump(word2vec_wiki, open("pickled_data/pickled_wordEmbeddings_wiki.pkl", 'wb'))

In [246]:
# Load up Pickled word embeddings
word2vec_wiki = pickle.load(open("pickled_data/pickled_wordEmbeddings_wiki.pkl", 'rb'))

In [260]:
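# NOTE: n_similarity expects two lists of words; a bare string is iterated character by
# character, so these scores may be character-level rather than word-level.
# similarity("pizza", "food") is the single-word call.
print(word2vec_wiki.n_similarity("pizza", "food"))
print(word2vec_wiki.n_similarity("waiter", "service"))
print(word2vec_wiki.n_similarity("music", "ambience"))
print(word2vec_wiki.n_similarity("bill", "price"))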

0.85072654
0.9314064
0.92584974
0.87186205
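
A caveat worth checking before trusting these scores: as we understand gensim's `n_similarity`, it compares the mean vectors of two collections of words. If a bare string is passed, iterating over it yields single characters, so the score may be built from character vectors. The toy re-implementation below is not gensim itself; `toy_vectors`, `mean_vector` and `toy_n_similarity` are hypothetical names with made-up 2-d vectors, used only to show how the two calls can diverge.

```python
import math

# Hypothetical 2-d embeddings; the real vectors come from word2vec_wiki
toy_vectors = {
    "pizza": [1.0, 0.0], "food": [0.6, 0.8],
    "p": [0.1, 0.9], "i": [0.2, 0.8], "z": [0.3, 0.7], "a": [0.1, 0.8],
    "f": [0.2, 0.9], "o": [0.1, 0.7], "d": [0.3, 0.8],
}

def mean_vector(keys):
    """Average the vectors of every key in the iterable."""
    vecs = [toy_vectors[k] for k in keys]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def toy_n_similarity(keys1, keys2):
    """Cosine similarity between the mean vectors of two key collections."""
    v1, v2 = mean_vector(keys1), mean_vector(keys2)
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm

word_level = toy_n_similarity(["pizza"], ["food"])  # looks up whole words
char_level = toy_n_similarity("pizza", "food")      # looks up 'p', 'i', 'z', ...
print(round(word_level, 3), round(char_level, 3))
```

In this toy setup the character-level score is close to 1 even though the word vectors are only moderately similar. If the same effect applies to the real embeddings, uniformly high scores would make any threshold hard to tune.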


We're gonna make a simple empirical decision and say that any candidate term whose similarity with any of the aspect categories used below (food, service, ambience, price) exceeds a threshold can be included in the aspect term extraction. We considered 0.85 as a cut-off; the cell below sets `sim_threshold = 0.95`. Let's see if this helps us improve our F1 score.

In [317]:
# spaCy noun extraction + training word bank, with word-embedding similarity as a fallback
aspTerms_test = []
sim_threshold = 0.95

for review in test_data.Review:
    asp_TERMS = []
    
    curr_review = review.lower()
    curr_review = my_spacy(curr_review) # convert to spaCy object
    
    for n_chunk in curr_review.noun_chunks:
        # Keep only the NOUN tokens; join with spaces in case we have a compound
        asp_term = " ".join(obj.text for obj in n_chunk if obj.pos_ == "NOUN")
        
        if (len(asp_term) > 0) and asp_term in train_aspTermsBank: 
            asp_TERMS.append(asp_term) # append the extracted aspect term to the list of extractions
            
        elif (len(asp_term) > 0):
            # replace white space with underscore, otherwise n_similarity won't work
            asp_term_no_WS = asp_term.replace(" ", "_")
            
            # accept the term if it is similar enough to any of the aspect categories
            categories = ("food", "service", "ambience", "price")
            if any(word2vec_wiki.n_similarity(cat, asp_term_no_WS) > sim_threshold for cat in categories):
                asp_TERMS.append(asp_term)
                
    aspTerms_test.append(asp_TERMS) # append to the full extraction list

In [318]:
print(F1_SemEval(aspTerms_test, truth_aspTerms), "threshold: 0.98")

(0.6893716970052847, 0.6486187845303868, 0.7355889724310777) threshold: 0.98


In [311]:
print(aspTerms_test[0:5])
print(truth_aspTerms[0:5])

[['evening', 'evening', 'friends'], [], ['pad thai'], ['food quality', 'service'], ['food quality', 'service']]
[[], ['lamb meat'], ['pad thai'], ['food quality', 'service'], ['food quality', 'service']]


Adding the word-similarity feature doesn't seem to help much. We played around with the similarity thresholds (0.95 and 0.98 appear in the cells above), and we haven't beaten the previous F1 score of about 0.75; the run shown here reaches about 0.69.
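
Since the thresholds were tuned by hand, one way to make the search systematic is a small sweep that reports precision, recall and F1 per cut-off. The sketch below uses made-up candidate scores and labels; `toy_candidates` and `sweep` are hypothetical names. In practice the sweep would preferably run on a held-out split rather than the test set.

```python
# Hypothetical candidates: (term, best similarity to any category, true aspect?)
toy_candidates = [
    ("lamb meat", 0.97, True),
    ("hot tea", 0.93, True),
    ("evening", 0.96, False),
    ("friends", 0.90, False),
    ("outside", 0.86, True),
]

def sweep(candidates, thresholds):
    """Return (threshold, precision, recall, F1) for each cut-off."""
    n_true = sum(1 for _, _, is_aspect in candidates if is_aspect)
    rows = []
    for th in thresholds:
        # Labels of the candidates that survive the cut-off
        kept = [is_aspect for _, score, is_aspect in candidates if score > th]
        hits = sum(kept)
        prec = hits / len(kept) if kept else 0.0
        rec = hits / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        rows.append((th, prec, rec, f1))
    return rows

rows = sweep(toy_candidates, [0.85, 0.90, 0.95, 0.98])
for th, prec, rec, f1 in rows:
    print(f"threshold={th:.2f}  P={prec:.2f}  R={rec:.2f}  F1={f1:.2f}")
```

On this toy data a stricter threshold removes true aspects along with the noise, which is the kind of trade-off a sweep makes visible.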

### Future Steps

For now, we stop here with this task. There are multiple ways we could improve the score:

- Better Word Embeddings
- Dependency Parsing