## Introduction to NLP and spaCy

In [3]:
import spacy

In [7]:
# Load en_core_web_sm and create an nlp object
nlp = spacy.load("en_core_web_sm")

In [8]:
text = "This is a sample text for demonstration."

In [9]:
# Process the text with the pipeline to create a Doc container
doc = nlp(text)
doc

This is a sample text for demonstration.

In [10]:
# Create a list of the Token objects in the Doc container (use token.text to get plain strings)
print([token for token in doc])

[This, is, a, sample, text, for, demonstration, .]
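A `Doc` is non-destructive. spaCy stores each token's trailing whitespace (exposed as `token.whitespace_` and `token.text_with_ws`), so joining the tokens back together reproduces the input exactly. The sketch below mimics that idea in plain Python, using a hypothetical list of `(text, trailing_whitespace)` pairs as a stand-in for the `Doc`. It illustrates the principle only and is not spaCy's internal data structure.

```python
def detokenize(pairs: list[tuple[str, str]]) -> str:
    """Rebuild the original string from (token_text, trailing_whitespace) pairs."""
    return "".join(text + ws for text, ws in pairs)


# Hypothetical stand-in for the Doc above: each token keeps the space that followed it.
pairs = [
    ("This", " "), ("is", " "), ("a", " "), ("sample", " "),
    ("text", " "), ("for", " "), ("demonstration", ""), (".", ""),
]

print([text for text, _ in pairs])  # the tokens, like the list printed above
print(detokenize(pairs))            # the original text, unchanged
```

With a real `Doc`, the equivalent check is `"".join(t.text_with_ws for t in doc) == doc.text`.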


- ###### Tokenization with spaCy

In [1]:
import spacy

In [3]:
nlp = spacy.load("en_core_web_sm")

In [4]:
text = "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most."

In [5]:
doc = nlp(text)

In [15]:
tokens = [token for token in doc]

In [16]:
print(tokens)

[I, have, bought, several, of, the, Vitality, canned, dog, food, products, and, have, found, them, all, to, be, of, good, quality, ., The, product, looks, more, like, a, stew, than, a, processed, meat, and, it, smells, better, ., My, Labrador, is, finicky, and, she, appreciates, this, product, better, than,  , most, .]


In [17]:
tokens = [token.text for token in doc]

In [20]:
print(tokens)

['I', 'have', 'bought', 'several', 'of', 'the', 'Vitality', 'canned', 'dog', 'food', 'products', 'and', 'have', 'found', 'them', 'all', 'to', 'be', 'of', 'good', 'quality', '.', 'The', 'product', 'looks', 'more', 'like', 'a', 'stew', 'than', 'a', 'processed', 'meat', 'and', 'it', 'smells', 'better', '.', 'My', 'Labrador', 'is', 'finicky', 'and', 'she', 'appreciates', 'this', 'product', 'better', 'than', ' ', 'most', '.']
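Two details of this output are worth noting. Punctuation such as `.` is split off into its own token. The double space in `than  most` also produces a whitespace token `' '`, because spaCy treats a single space as a separator and keeps any extra whitespace as a token. The toy tokenizer below reproduces only those two behaviors with a regular expression. The name `toy_tokenize` and its small punctuation set are assumptions for illustration. spaCy's real tokenizer additionally applies prefix, suffix, infix and special-case rules.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Toy tokenizer: one space separates tokens, extra spaces become a token,
    and trailing . , ! ? are split off into separate tokens."""
    tokens = []
    # Each match is either a run of non-space characters or a run of 2+ spaces.
    for chunk in re.findall(r"\S+| {2,}", text):
        if chunk.startswith(" "):
            # The first space is the separator; the remaining spaces form a token.
            tokens.append(chunk[1:])
            continue
        # Peel trailing punctuation off the end of the chunk.
        trailing = []
        while len(chunk) > 1 and chunk[-1] in ".,!?":
            trailing.append(chunk[-1])
            chunk = chunk[:-1]
        tokens.append(chunk)
        tokens.extend(reversed(trailing))
    return tokens


print(toy_tokenize("she appreciates this product better than  most."))
```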


- ###### Lemmatization with spaCy

In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [3]:
text = "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most."

In [4]:
doc = nlp(text)

In [5]:
tokens = [token.text for token in doc]

In [6]:
lemmas = [token.lemma_ for token in doc]

In [11]:
for text, lemma in zip(tokens, lemmas):
    print(text, " : ", lemma)

I  :  I
have  :  have
bought  :  buy
several  :  several
of  :  of
the  :  the
Vitality  :  Vitality
canned  :  can
dog  :  dog
food  :  food
products  :  product
and  :  and
have  :  have
found  :  find
them  :  they
all  :  all
to  :  to
be  :  be
of  :  of
good  :  good
quality  :  quality
.  :  .
The  :  the
product  :  product
looks  :  look
more  :  more
like  :  like
a  :  a
stew  :  stew
than  :  than
a  :  a
processed  :  process
meat  :  meat
and  :  and
it  :  it
smells  :  smell
better  :  well
.  :  .
My  :  my
Labrador  :  Labrador
is  :  be
finicky  :  finicky
and  :  and
she  :  she
appreciates  :  appreciate
this  :  this
product  :  product
better  :  well
than  :  than
   :   
most  :  most
.  :  .
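The lemmas show that this is not a simple suffix stripper. Irregular forms map to their base (`bought` → `buy`, `found` → `find`, `is` → `be`). `better` becomes `well` rather than `good`, which suggests it was analyzed as an adverb in both sentences. The English pipelines in spaCy v3 use a rule-based lemmatizer that consults each token's part-of-speech tag, so the same word can receive different lemmas in different contexts. The sketch below illustrates that idea with a tiny exception table and suffix rules keyed by POS tag. The tables and the `toy_lemmatize` name are hypothetical and are not spaCy's actual data.

```python
# Hypothetical exception table: (lowercased form, coarse POS) -> lemma.
EXCEPTIONS = {
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
    ("bought", "VERB"): "buy",
    ("found", "VERB"): "find",
    ("is", "AUX"): "be",
}

# Hypothetical suffix rules per POS, tried in order.
SUFFIX_RULES = {
    "NOUN": [("s", "")],
    "VERB": [("ed", ""), ("s", "")],
}


def toy_lemmatize(word: str, pos: str) -> str:
    lower = word.lower()
    # 1. Irregular forms come from the exception table.
    if (lower, pos) in EXCEPTIONS:
        return EXCEPTIONS[(lower, pos)]
    # 2. Proper nouns are returned as written (keeps "Vitality", "Labrador").
    if pos == "PROPN":
        return word
    # 3. Strip the first matching suffix, as long as a stem of 3+ letters remains.
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)] + replacement
    # 4. Otherwise fall back to the lowercased form.
    return lower


for word, pos in [("better", "ADV"), ("better", "ADJ"), ("products", "NOUN"),
                  ("processed", "VERB"), ("Labrador", "PROPN"), ("The", "DET")]:
    print(word, " : ", toy_lemmatize(word, pos))
```

Naive rules like these break quickly (`toy_lemmatize("canned", "VERB")` returns `cann`, whereas spaCy produced `can`), which is why real lemmatizers rely on large exception lists alongside their rules.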


- ###### Sentence segmentation with spaCy

In [34]:
import spacy

In [35]:
nlp = spacy.load("en_core_web_sm")

In [36]:
texts = ['I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.', 'Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".', 'This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis\' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.', 'If you are looking for the secret ingredient in Robitussin I believe I have found it.  I got this in addition to the Root Beer Extract I ordered (which was good) and made some cherry soda.  The flavor is very medicinal.', 'Great taffy at a great price.  There was a wide assortment of yummy taffy.  Delivery was very quick.  If your a taffy lover, this is a deal.', 'I got a wild hair for taffy and ordered this five pound bag. The taffy was all very enjoyable with many flavors: watermelon, root beer, melon, peppermint, grape, etc. My only complaint is there was a bit too much red/black licorice-flavored pieces (just not my particular favorites). Between me, my kids, and my husband, this lasted only two weeks! I would recommend this brand of taffy -- it was a delightful treat.', "This saltwater taffy had great flavors and was very soft and chewy.  Each candy was individually wrapped well.  None of the candies were stuck together, which did happen in the expensive version, Fralinger's.  Would highly recommend this candy!  I served it at a beach-themed party and everyone loved it!", 'This taffy is so good.  It is very soft and chewy.  The flavors are amazing.  I would definitely recommend you buying it.  Very satisfying!!', "Right now I'm mostly just sprouting this so my cats can eat the grass. They love it. I rotate it around with Wheatgrass and Rye too", 'This is a very healthy dog food. Good for their digestion. Also good for small puppies. My dog eats her required amount at every feeding.']

In [37]:
documents = [nlp(text) for text in texts]

In [38]:
sentences = []
for doc in documents:
    sentences.append([s for s in doc.sents])

In [40]:
sentences

[[I have bought several of the Vitality canned dog food products and have found them all to be of good quality.,
  The product looks more like a stew than a processed meat and it smells better.,
  My Labrador is finicky and she appreciates this product better than  most.],
 [Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted.,
  Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".],
 [This is a confection that has been around a few centuries.  ,
  It is a light, pillowy citrus gelatin with nuts - in this case Filberts.,
  And it is cut into tiny squares and then liberally coated with powdered sugar.  ,
  And it is a tiny mouthful of heaven.  ,
  Not too chewy, and very flavorful.  ,
  I highly recommend this yummy treat.  ,
  If you are familiar with the story of C.S. Lewis' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the 
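
The segmentation handles cases that trip up naive rules: `Peanuts...the` is not split, and the period in `C.S. Lewis` does not end a sentence. In trained English pipelines such as `en_core_web_sm`, sentence boundaries come from the dependency parser by default rather than from punctuation alone. For contrast, the punctuation-based splitter below shows the kind of hand-written rules a statistical approach avoids. The `naive_sentences` helper and its deliberately tiny abbreviation list are hypothetical. This is a sketch for comparison, not how spaCy segments text.

```python
import re

# Hypothetical, deliberately tiny set of words that end in "." without ending a sentence.
ABBREVIATIONS = {"C.S.", "Mr.", "Mrs.", "Dr."}


def naive_sentences(text: str) -> list[str]:
    """Split after . ! or ? when followed by whitespace and an uppercase letter or quote."""
    sentences, start = [], 0
    for match in re.finditer(r'[.!?]+(?=\s+["A-Z])', text):
        end = match.end()
        # Skip candidate boundaries whose last word is a known abbreviation.
        if text[start:end].split()[-1] in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = end
    # Whatever follows the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


review = ("I highly recommend this yummy treat.  If you are familiar with the story "
          "of C.S. Lewis' book - this is the treat.")
for sentence in naive_sentences(review):
    print(sentence)
```

Every new abbreviation, ellipsis style or lowercase sentence start needs another rule. A trained model avoids that maintenance burden.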

- ###### Linguistic features in spaCy

- ###### POS tagging with spaCy

In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [3]:
texts = ['What is the arrival time in San francisco for the 7:55 AM flight leaving Washington?', 'Cheapest airfare from Tacoma to Orlando is 650 dollars.', 'Round trip fares from Pittsburgh to Philadelphia are under 1000 dollars!']

In [10]:
documents = [nlp(text) for text in texts]

In [11]:
documents

[What is the arrival time in San francisco for the 7:55 AM flight leaving Washington?,
 Cheapest airfare from Tacoma to Orlando is 650 dollars.,
 Round trip fares from Pittsburgh to Philadelphia are under 1000 dollars!]

In [13]:
for doc in documents:
    print(doc)

What is the arrival time in San francisco for the 7:55 AM flight leaving Washington?
Cheapest airfare from Tacoma to Orlando is 650 dollars.
Round trip fares from Pittsburgh to Philadelphia are under 1000 dollars!


In [16]:
for doc in documents:
    for token in doc:
        print(token, token.pos_)

What PRON
is AUX
the DET
arrival NOUN
time NOUN
in ADP
San PROPN
francisco PROPN
for ADP
the DET
7:55 NUM
AM PROPN
flight NOUN
leaving VERB
Washington PROPN
? PUNCT
Cheapest ADJ
airfare NOUN
from ADP
Tacoma PROPN
to ADP
Orlando PROPN
is AUX
650 NUM
dollars NOUN
. PUNCT
Round ADJ
trip NOUN
fares NOUN
from ADP
Pittsburgh PROPN
to ADP
Philadelphia PROPN
are AUX
under ADP
1000 NUM
dollars NOUN
! PUNCT
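The coarse tags in `token.pos_` follow the Universal Dependencies part-of-speech tag set (`PROPN` for proper nouns, `AUX` for auxiliaries, `ADP` for adpositions, and so on), and `spacy.explain("ADP")` returns a short description of a tag. Note that `francisco` is still tagged `PROPN` despite its lowercase spelling. Once tokens carry tags, filtering and counting them is ordinary Python. The sketch below works on `(text, tag)` pairs copied by hand from the output for the second sentence, as a stand-in for a real `Doc`.

```python
from collections import Counter

# (token text, coarse POS) pairs copied from the output for
# "Cheapest airfare from Tacoma to Orlando is 650 dollars."
tagged = [
    ("Cheapest", "ADJ"), ("airfare", "NOUN"), ("from", "ADP"), ("Tacoma", "PROPN"),
    ("to", "ADP"), ("Orlando", "PROPN"), ("is", "AUX"), ("650", "NUM"),
    ("dollars", "NOUN"), (".", "PUNCT"),
]

# Keep only proper nouns (with a real Doc: [t.text for t in doc if t.pos_ == "PROPN"]).
proper_nouns = [text for text, pos in tagged if pos == "PROPN"]

# Count how often each tag occurs.
tag_counts = Counter(pos for _, pos in tagged)

print(proper_nouns)  # ['Tacoma', 'Orlando']
print(tag_counts)
```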


- ###### NER with spaCy