### Lab 2: Sentiment Analysis using SentiWordNet

#### Sentiment analysis
Sentiment analysis is the process of determining the emotional tone behind a piece of text. Approaches fall into three broad categories: lexicon-based, machine learning-based, and deep learning-based. This lab uses a lexicon-based approach built on SentiWordNet.

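**SentiWordNet:** A lexical resource built on WordNet in which each synset (word sense), rather than each surface word, is annotated with positive, negative, and objective scores. The three scores of a synset sum to 1.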
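To make the relationship between the three scores concrete, here is a minimal sketch that does not touch the real lexicon. The `SenseScores` class and the numbers in it are hypothetical and purely illustrative. It only encodes the fact that objectivity is what remains after positivity and negativity, and that the lab later works with the difference `pos - neg`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseScores:
    """Hypothetical container for one synset's SentiWordNet-style scores."""
    pos: float
    neg: float

    @property
    def obj(self) -> float:
        # Objectivity is whatever is left after positivity and negativity.
        return 1.0 - (self.pos + self.neg)

    @property
    def polarity(self) -> float:
        # Net polarity: the quantity summed over words in the scoring function of this lab.
        return self.pos - self.neg


# Illustrative values only, not looked up from the real lexicon.
example = SenseScores(pos=0.75, neg=0.0)
print(example.obj, example.polarity)
```

With NLTK, the corresponding values for a real synset come from `pos_score()`, `neg_score()`, and `obj_score()` on the object returned by `swn.senti_synset(...)`.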

In [2]:
import nltk
nltk.download('wordnet')
nltk.download('sentiwordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\bhawa\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     C:\Users\bhawa\AppData\Roaming\nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


True

In [5]:
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.tokenize import word_tokenize

Stopwords are common words in a language (like "the", "is", "in", etc.) that are often removed from text data because they don't carry significant meaning or contribute much to the analysis. Removing stopwords is a common preprocessing step in natural language processing (NLP) tasks.
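As a rough illustration of this preprocessing step, the sketch below filters a token list against a small hand-picked stopword set. The set `STOPWORDS` and the helper `remove_stopwords` are assumptions made for this example and are not part of the lab's code. NLTK ships a fuller English list, available as `nltk.corpus.stopwords.words('english')` after `nltk.download('stopwords')`.

```python
# Small hand-picked stopword set: an assumption for illustration only.
STOPWORDS = {"the", "is", "in", "was", "it", "i", "a", "an", "and", "of", "to"}


def remove_stopwords(tokens):
    """Return the tokens that are not stopwords, comparing case-insensitively."""
    return [tok for tok in tokens if tok.lower() not in STOPWORDS]


sample_tokens = ["The", "movie", "was", "fantastic", "!", "I", "really", "enjoyed", "it", "."]
filtered = remove_stopwords(sample_tokens)
print(filtered)
```

Punctuation survives this filter because it is not in the stopword set. Whether to drop it as well depends on the downstream task.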

In [6]:
# Ensure necessary NLTK data is downloaded
nltk.download('wordnet')
nltk.download('sentiwordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\bhawa\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package sentiwordnet to
[nltk_data]     C:\Users\bhawa\AppData\Roaming\nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\bhawa\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\bhawa\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.


True

In [7]:
# Map Penn Treebank POS tags (as returned by nltk.pos_tag) to WordNet POS constants
def penn_to_wn(tag):
    if tag.startswith('N'):
        return wn.NOUN
    if tag.startswith('V'):
        return wn.VERB
    if tag.startswith('J'):
        return wn.ADJ
    if tag.startswith('R'):
        return wn.ADV
    # Other tags (determiners, pronouns, punctuation, ...) have no WordNet equivalent
    return None

# Sum the (positive - negative) SentiWordNet scores over the words of a text
def get_sentiment_scores(text):
    sentiment = 0.0
    tokens = word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    for word, tag in tagged:
        wn_tag = penn_to_wn(tag)
        if wn_tag:
            synsets = wn.synsets(word, pos=wn_tag)
            if synsets:
                # Use only the first listed sense; no word-sense disambiguation is done
                swn_synset = swn.senti_synset(synsets[0].name())
                sentiment += swn_synset.pos_score() - swn_synset.neg_score()
    return sentiment

# Sample text for sentiment analysis
text = "The movie was fantastic! I really enjoyed it."

# Get sentiment score
sentiment_score = get_sentiment_scores(text)

print(f"Text: {text}")
print(f"SentiWordNet Sentiment Score: {sentiment_score}")


Text: The movie was fantastic! I really enjoyed it.
SentiWordNet Sentiment Score: 1.5
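A raw score such as 1.5 is often easier to use once it is mapped to a label. The sketch below shows one possible mapping. The function name `label_from_score` and the `threshold` parameter are hypothetical, and the default threshold of 0 is an assumption rather than a tuned value.

```python
def label_from_score(score, threshold=0.0):
    """Map a summed polarity score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    # Scores within [-threshold, threshold] are treated as neutral.
    return "neutral"


print(label_from_score(1.5))
```

Because the score is a sum over words, longer texts tend to produce larger magnitudes. Dividing by the number of scored words before applying a threshold is one way to make texts of different lengths comparable.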


In [8]:
text = "The movie was fantastic! I really enjoyed it."
tokens = word_tokenize(text)

In [9]:
tokens

['The', 'movie', 'was', 'fantastic', '!', 'I', 'really', 'enjoyed', 'it', '.']

In [10]:
tagged = nltk.pos_tag(tokens)
tagged

[('The', 'DT'),
 ('movie', 'NN'),
 ('was', 'VBD'),
 ('fantastic', 'JJ'),
 ('!', '.'),
 ('I', 'PRP'),
 ('really', 'RB'),
 ('enjoyed', 'VBD'),
 ('it', 'PRP'),
 ('.', '.')]

In [12]:
def penn_to_wn(tag):
    if tag.startswith('N'):
        return wn.NOUN
    if tag.startswith('V'):
        return wn.VERB
    if tag.startswith('J'):
        return wn.ADJ
    if tag.startswith('R'):
        return wn.ADV
    return None

In [None]:
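for word, tag in tagged:
    # 'B-' is an IOB chunk-tag prefix; Penn Treebank tags from pos_tag never start with it
    if tag.startswith('B-'):
        print(word, tag)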

In [14]:
for word, tag in tagged:
    wn_tag = penn_to_wn(tag)
    if wn_tag:
        synsets = wn.synsets(word, pos=wn_tag)
        print(synsets)

[Synset('movie.n.01')]
[Synset('be.v.01'), Synset('be.v.02'), Synset('be.v.03'), Synset('exist.v.01'), Synset('be.v.05'), Synset('equal.v.01'), Synset('constitute.v.01'), Synset('be.v.08'), Synset('embody.v.02'), Synset('be.v.10'), Synset('be.v.11'), Synset('be.v.12'), Synset('cost.v.01')]
[Synset('antic.s.01'), Synset('fantastic.s.02'), Synset('fantastic.s.03'), Synset('fantastic.s.04'), Synset('fantastic.s.05')]
[Synset('truly.r.01'), Synset('actually.r.01'), Synset('in_truth.r.01'), Synset('very.r.01')]
[Synset('enjoy.v.01'), Synset('enjoy.v.02'), Synset('love.v.02'), Synset('enjoy.v.04'), Synset('delight.v.02')]


In [13]:
wn.synsets?

Signature: wn.synsets(lemma, pos=None, lang='eng', check_exceptions=True)
Docstring:
Load all synsets with a given lemma and part of speech tag.
If no pos is specified, all synsets for all parts of speech
will be loaded.
If lang is specified, all the synsets associated with the lemma name
of that language will be returned.
File:      c:\users\bhawa\miniconda3\envs\datascience\lib\site-packages\nltk\corpus\reader\wordnet.py
Type:      method

In [17]:
words = ["good", "honest"]
# Tagging isolated words gives the tagger no sentence context, so the tags can be unreliable
tagged = nltk.pos_tag(words)
tagged

[('good', 'JJ'), ('honest', 'NN')]

In [20]:
wn_tags = [penn_to_wn(tag) for word, tag in tagged]
wn_tags

['a', 'n']

In [21]:
wn.synsets("good", pos="a")

[Synset('good.a.01'),
 Synset('full.s.06'),
 Synset('good.a.03'),
 Synset('estimable.s.02'),
 Synset('beneficial.s.01'),
 Synset('good.s.06'),
 Synset('good.s.07'),
 Synset('adept.s.01'),
 Synset('good.s.09'),
 Synset('dear.s.02'),
 Synset('dependable.s.04'),
 Synset('good.s.12'),
 Synset('good.s.13'),
 Synset('effective.s.04'),
 Synset('good.s.15'),
 Synset('good.s.16'),
 Synset('good.s.17'),
 Synset('good.s.18'),
 Synset('good.s.19'),
 Synset('good.s.20'),
 Synset('good.s.21')]