# Natural Language Processing: Word-level Tasks

Hi everyone! Today, we're exploring natural language processing. We'll look at word-level NLP problems, features we can extract at the word level, and common word-level tasks such as dictionary-based sentiment analysis.

### WordNet lemmas and stems 

In [None]:
# Import the WordNet-based lemmatizer and two rule-based stemmers (Porter and Lancaster)
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.stem import PorterStemmer, LancasterStemmer
wnl = WordNetLemmatizer()
ps = PorterStemmer()
ls = LancasterStemmer()

In [None]:
# Lemmatize words (WordNet lookup; words are treated as nouns unless pos is given)
print(wnl.lemmatize('corpus')) # Already in root form
print(wnl.lemmatize('corpora')) # Root is corpus

In [None]:
# Lemmatization may leave proper nouns unchanged when they are not in WordNet
wnl.lemmatize('Alexas')

In [None]:
# Stem words using PorterStemmer (uses suffix stripping)
print(ps.stem('corpus')) # Strips the final 's', which yields the non-word 'corpu'
print(ps.stem('corpora')) # Does not change
print(ps.stem('destabilized'))

In [None]:
# Stemming can be appropriate for finding the root word of proper nouns (note that the stem is lowercased by default)
ps.stem("Alexas")

In [None]:
# Stem words using LancasterStemmer (a more aggressive rule-based stemmer)
print(ls.stem('corpus'))
print(ls.stem('corpora'))
print(ls.stem('destabilized'))
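
Porter's suffix stripping proceeds in ordered steps. As a rough illustration of why `corpus` loses its final `s` while `corpora` is left alone, here is a minimal sketch of only the first step (step 1a, plural suffixes) of the original Porter algorithm. The function name `porter_step_1a` is made up for this example, and NLTK's `PorterStemmer` applies further steps and some extensions, so its output can differ on other words.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the original Porter algorithm (plural suffixes)."""
    word = word.lower()
    # Rules are tried in order; the first matching suffix wins.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # No plural suffix matched, so the word is unchanged


for w in ["corpus", "corpora", "caresses", "ponies", "caress"]:
    print(w, "->", porter_step_1a(w))
```

Because the rules only look at the spelling of the suffix, the step cannot tell that the `s` in `corpus` is part of the root. This is the trade-off of stemming compared with dictionary-backed lemmatization.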

### WordNet synsets

In [None]:
# Retrieve the synsets (sets of synonyms, one per sense) of "plane" from WordNet
from nltk.corpus import wordnet
plane = wordnet.synsets('plane')
plane

In [None]:
# Retrieve the definition of the first synset of "plane"
plane[0].definition()

In [None]:
# Get example sentences for the first synset of "plane"
plane[0].examples()

In [None]:
# Get the lemma names (synonyms) in the first synset of "plane"
plane[0].lemma_names()

In [None]:
# Get the part of speech of the first synset of "plane"
plane[0].pos()
# 'n' is short for noun

In [None]:
# Other synsets capture different senses of "plane", each with its own attributes
print(plane[4].definition())
print(plane[4].examples())
print(plane[4].lemma_names())

### WordNet word similarity 

Path similarity scores how similar two word senses are, based on the shortest path that connects them in the is-a (hypernym/hyponym) taxonomy. The score lies in the range 0 to 1, with 1 for identical senses.
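
NLTK computes this score as 1 / (1 + d), where d is the number of edges on the shortest path between the two senses. The sketch below applies the same formula to a tiny taxonomy. The `HYPERNYM` table and the helper names are hypothetical and much shallower than WordNet's real hierarchy, so the numbers it prints will not match the WordNet scores computed in the cells that follow.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> parent); NOT WordNet's real hierarchy.
HYPERNYM = {
    "terrier": "dog",
    "dog": "carnivore",
    "cat": "carnivore",
    "carnivore": "animal",
}


def shortest_path_length(a, b):
    """Breadth-first search over is-a links, followed in both directions."""
    graph = {}
    for child, parent in HYPERNYM.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # The two nodes are not connected


def toy_path_similarity(a, b):
    """Return 1 / (1 + shortest path length), or None if no path exists."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1 / (1 + dist)


print(toy_path_similarity("dog", "terrier"))  # one edge apart
print(toy_path_similarity("dog", "cat"))      # two edges apart, via "carnivore"
```

Shorter paths give higher scores, so in this toy taxonomy a sense is more similar to its direct hyponym than to a sibling that it only reaches through a shared hypernym.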

In [None]:
dog = wordnet.synsets('dog')[0]
cat = wordnet.synsets('cat')[0]
terrier = wordnet.synsets('terrier')[0]

In [None]:
# Similarity between "dog" and "cat"
dog.path_similarity(cat)

In [None]:
# Similarity between "dog" and "terrier"
dog.path_similarity(terrier)

### SentiWordNet sentiment analysis

In [None]:
# Look up the SentiWordNet entries for "happy" and keep the first sense
from nltk.corpus import sentiwordnet
word = sentiwordnet.senti_synsets('happy')
word = list(word)[0]
word

In [None]:
print("Positive:", word.pos_score())
print("Negative:", word.neg_score())
print("Objective:", word.obj_score())
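
Per-word scores like these can be combined into a simple dictionary-based sentiment score for a whole sentence. The following is a minimal sketch under stated assumptions: `LEXICON` is a hypothetical hand-made table whose values are invented for illustration (they are not real SentiWordNet scores), tokenization is a plain whitespace split, and `sentence_sentiment` is a made-up helper name.

```python
# Hypothetical lexicon: word -> (positive score, negative score).
# The values are invented for illustration, not taken from SentiWordNet.
LEXICON = {
    "happy": (0.75, 0.0),
    "great": (0.5, 0.0),
    "sad": (0.0, 0.5),
    "terrible": (0.0, 0.75),
}


def sentence_sentiment(sentence):
    """Sum positive minus negative scores over the words found in the lexicon."""
    tokens = sentence.lower().split()
    # Words missing from the lexicon contribute nothing to the total.
    score = sum(
        LEXICON.get(tok, (0.0, 0.0))[0] - LEXICON.get(tok, (0.0, 0.0))[1]
        for tok in tokens
    )
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label


print(sentence_sentiment("I am happy with this great phone"))
print(sentence_sentiment("a terrible and sad ending"))
```

A real pipeline would typically also pick the right sense for each word and handle negation ("not happy"), which this sketch ignores.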