# Sentiment Analysis Assessment - Solution

## Task #1: Perform vector arithmetic on your own words
Write code that evaluates vector arithmetic on your own set of related words. The goal is to come as close to an expected word as possible. Please feel free to share success stories in the Q&A Forum for this section!

In [4]:
# Import spaCy and load the language library. Remember to use a larger model!
import spacy
nlp = spacy.load('en_core_web_md')


In [5]:
# Choose the words you wish to compare, and obtain their vectors
token1 = nlp.vocab['Conspiracy'].vector
token2 = nlp.vocab['Government'].vector
token3 = nlp.vocab['Cover-up'].vector
token4 = nlp.vocab['Propaganda'].vector



In [6]:
# Import spatial and define a cosine_similarity function
from scipy import spatial

cosine_similarity = lambda x, y: 1 - spatial.distance.cosine(x, y)
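
The lambda above works because `spatial.distance.cosine` returns the cosine *distance*, so subtracting it from 1 gives the similarity. As a rough sketch of what that number means, the same quantity can be computed by hand. The helper name `cosine_similarity_plain` and the toy vectors below are illustrative assumptions, not part of the assessment.

```python
import math

def cosine_similarity_plain(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy 2-D vectors (made up for illustration, not real word vectors)
same_direction = cosine_similarity_plain([1.0, 2.0], [2.0, 4.0])  # close to 1
orthogonal = cosine_similarity_plain([1.0, 0.0], [0.0, 1.0])      # close to 0
opposite = cosine_similarity_plain([1.0, 0.0], [-1.0, 0.0])       # close to -1
print(same_direction, orthogonal, opposite)
```

Values near 1 mean the vectors point the same way, which is why the later cells sort candidates by this score in descending order.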

In [8]:
# Write an expression for vector arithmetic
# For example: new_vector = word1 - word2 + word3
new_vector = token1 - token2 + token3 - token4


In [9]:
# List the top ten closest vectors in the vocabulary to the result of the expression above
comp_similarities = []

for word in nlp.vocab:
    # Keep only lowercase alphabetic entries that have a vector
    if word.has_vector and word.is_lower and word.is_alpha:
        similarity = cosine_similarity(new_vector, word.vector)
        comp_similarities.append((word, similarity))

computed_similarities = sorted(comp_similarities, key=lambda item: -item[1])

print([w[0].text for w in computed_similarities[:10]])

['ai', 'got', 'gon', 'goin', 'na', 'ta', 'nuff', 'ca', 'wo', 'let']
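
The cell above sorts every candidate and then slices off the first ten. When only the top few matches are needed, `heapq.nlargest` is one alternative that avoids sorting the full list. The sketch below uses a hypothetical `toy_vectors` dictionary with made-up 2-D vectors in place of the spaCy vocabulary, so it runs without a model.

```python
import heapq
import math

def cosine(x, y):
    """Cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hypothetical stand-in for the vocabulary: word -> made-up 2-D vector
toy_vectors = {
    "north": [0.0, 1.0],
    "east": [1.0, 0.0],
    "northeast": [1.0, 1.0],
    "south": [0.0, -1.0],
}
query = [1.0, 0.9]

# Pick the two words whose vectors are most similar to the query
top_two = heapq.nlargest(2, toy_vectors, key=lambda w: cosine(query, toy_vectors[w]))
print(top_two)
```

In a real notebook the candidates would come from the loaded model. Which words that loop visits can depend on the spaCy version and on what the vocabulary currently holds, so results may differ between installs.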


### CHALLENGE: Write a function that takes in 3 strings, performs a-b+c arithmetic, and returns a top-ten result

In [11]:
def vector_math(a, b, c):
    # Look up the vector for each input word
    vec_a = nlp.vocab[a].vector
    vec_b = nlp.vocab[b].vector
    vec_c = nlp.vocab[c].vector

    new_vector = vec_a - vec_b + vec_c

    similarities = []

    for word in nlp.vocab:
        # Keep only lowercase alphabetic entries that have a vector
        if word.has_vector and word.is_alpha and word.is_lower:
            similarity = cosine_similarity(word.vector, new_vector)
            similarities.append((word, similarity))

    similarities = sorted(similarities, key=lambda item: -item[1])
    # Return the ten closest words
    return [w[0].text for w in similarities[:10]]

In [12]:
# Test the function on known words:
vector_math('king','man','woman')

['king', 'and', 'that', 'havin', 'where', 'she', 'they', 'woman', 'somethin', 'there']
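
In the result above, the input words `'king'` and `'woman'` appear among their own nearest neighbours. A common variation, offered here as an assumption rather than something the assessment requires, is to exclude the input words from the candidates. The sketch below shows the idea on a hypothetical `toy_vectors` table whose two made-up dimensions loosely stand for "royalty" and "gender".

```python
import math

def cosine(x, y):
    """Cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hypothetical 2-D vectors (made up for illustration, not from spaCy)
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.9, 0.8],
    "princess": [0.9, -0.8],
}

def toy_analogy(a, b, c, n=3):
    """Rank words by similarity to a - b + c, skipping the three inputs."""
    target = [x - y + z for x, y, z in zip(toy_vectors[a], toy_vectors[b], toy_vectors[c])]
    candidates = [w for w in toy_vectors if w not in (a, b, c)]
    return sorted(candidates, key=lambda w: cosine(target, toy_vectors[w]), reverse=True)[:n]

result = toy_analogy("king", "man", "woman")
print(result)
```

With real word vectors the ranking is far noisier than in this toy table, but filtering out the inputs keeps them from crowding the top of the list.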


## Task #2: Perform VADER Sentiment Analysis on your own review
Write code that returns a set of SentimentIntensityAnalyzer polarity scores based on your own written review.

In [14]:
# Import SentimentIntensityAnalyzer and create an sid object
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\joseg\AppData\Roaming\nltk_data...


In [18]:
# Write a review as one continuous string (multiple sentences are ok)
review = '''Over the years, I have seen this film a total of 6 times. And on each rewatch, 
            I always find something new to pull from it. A new way to look at its themes and its dynamics. 
            While the show is complex and layered what makes it truly impressive is how much it changes 
            depending on who is watching it. I am not the same person 8 years ago which is why this film 
            just keeps getting better the older I get. One day I relate to this character and then in the 
            next day I relate to this other character. One could say that this movie grew up alongside with 
            me and that is indeed the case. I think that is a sign of a great movie. One that changes 
            something inside you after you have watched it. With that, I can honestly say that this 
            movie has implanted itself in my mind and in my spirit. If that is what you are looking for, 
            please do give this film a watch. It might not be that enjoyable or easy to understand on 
            first watch, but it will stick with you leaving you no choice but to think about it for the 
            rest of your life.'''

In [21]:
# Obtain the sid scores for your review
sid_scores = sid.polarity_scores(review)

sid_scores

{'neg': 0.022, 'neu': 0.893, 'pos': 0.084, 'compound': 0.8223}
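
The `neg`, `neu` and `pos` values are proportions of the text and should add up to roughly 1 (rounding aside), while `compound` is a separate normalized score between -1 and 1. That is why a review can be mostly neutral by proportion and still have a strongly positive compound score. A small sketch, using the scores copied from the output above under the assumed name `example_scores`:

```python
# Scores copied from the output above
example_scores = {'neg': 0.022, 'neu': 0.893, 'pos': 0.084, 'compound': 0.8223}

# The three proportions should sum to approximately 1
proportion_total = example_scores['neg'] + example_scores['neu'] + example_scores['pos']

# The category covering the largest share of the text
dominant = max(('neg', 'neu', 'pos'), key=lambda k: example_scores[k])

print(round(proportion_total, 3), dominant)
```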

### CHALLENGE: Write a function that takes in a review and returns a score of "Positive", "Negative" or "Neutral"

In [22]:
def review_rating(string):
    # Compute polarity scores for the review passed in
    sid_scores = sid.polarity_scores(string)

    # Determine the sentiment label based on the compound score
    if sid_scores['compound'] >= 0.05:
        return 'Positive'
    elif sid_scores['compound'] <= -0.05:
        return 'Negative'
    else:
        return 'Neutral'

In [23]:
# Test the function on your review above:
review_rating(review)

'Positive'
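
The labelling step depends only on the compound score, so it can be checked without loading the analyzer. The sketch below pulls the threshold logic into a hypothetical helper, `label_from_compound`, with the 0.05 cutoff exposed as a parameter. The helper name and the extra test values are assumptions for illustration.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# 0.8223 is the compound score from the review above; the others are made up
labels = [label_from_compound(c) for c in (0.8223, 0.0, -0.4)]
print(labels)
```

Scores exactly at the cutoff count as Positive or Negative, matching the `>=` and `<=` comparisons used in the function above.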