# A Brief Introduction to POS Tagging

Identifying the part of speech associated with a particular word is complex, even for humans. Let's talk through how a computer would go about doing this. We have to get really basic. 

In Python there are a few base level units of data. There are more, but to begin let's just look at two:

* Integers
* Strings

In [2]:
# an integer is a whole number that you can do number-like things to.
4 + 4

8

In [3]:
our_int = 4
our_int + 15

19

In contrast, a string is something we can do word-like things to:

In [4]:
"four".upper()

'FOUR'

In [5]:
our_string = "four"

our_string + our_string
# what happened here?

'fourfour'

Numbers can do numerical things and strings (word-type bits) can do word-type things. You can, of course, go way deeper in Python with data types, but here is one more example of things you can do to strings (word-type data):

In [6]:
# we can print each letter out - a string is made up of its constituent pieces.
for letter in our_string:
    print(letter)

f
o
u
r


In [7]:
# is 4 equal to four?
our_int == our_string

False

Python is very literal: the number four is not equal to the word four. Similarly, we can see that a word is not equal to its individual letters (though we could combine those letters and get a different result).

In [8]:
['f','o','u','r'] == 'four'

False
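
The comparison above is `False` because a list of letters and a string are different types, even when they hold the same characters. As a small sketch in plain Python, `str.join` glues the letters back into a word and `list` breaks a word into letters. The variable names are only for illustration.

```python
# A list of single-character strings and the word they spell
letters = ['f', 'o', 'u', 'r']
word = 'four'

# Joining the letters with an empty separator rebuilds the string
rejoined = ''.join(letters)
print(rejoined == word)        # True

# Going the other way: list() splits a string into its characters
print(list(word) == letters)   # True
```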

Given how difficult it is for Python to parse these basic elements, how can it do more complicated things, like recognize the part of speech for a word? For a poem? We don't have to work from scratch: people build on the work of others. We can test out a basic part-of-speech tagger, but in order to do so we have to feed it a series of words (a list) rather than a single word.

In [9]:
import nltk
nltk.pos_tag(["A", "sentence", "made", "of", "words"])

[('A', 'DT'), ('sentence', 'NN'), ('made', 'VBN'), ('of', 'IN'), ('words', 'NNS')]

But how did that work? Time for…

## Pause for a Brief interlude on Nicholas Sparks

Given these basic building blocks, let's take a poem and try to work out how we would tag it, converting it into a readout of parts of speech. We'll be about as hand-wave-y as you can possibly be here, only gesturing at what the code does on a macro level. Let's run it on "The Rape of the Lock" by Alexander Pope, a text marked up for parts of speech in the same way by Josephine Miles.

In [10]:
# Take a poem (stored in Pope-RapeoftheLock.txt) and read it in

filename = "Pope-RapeoftheLock.txt"

with open(filename, 'r') as filein:
    text = filein.read()

# by default this is going to be the whole text as one long string (line breaks
# are represented by a \n character)

print(text[0:1000])

Canto I

What dire offence from am'rous causes springs, 
What mighty contests rise from trivial things, 
I sing — This verse to Caryl, Muse! is due: 
This, ev'n Belinda may vouchsafe to view: 
Slight is the subject, but not so the praise,
If She inspire, and He approve my lays. 

Say what strange motive, Goddess! could compel
A well-bred Lord t' assault a gentle Belle?
O say what stranger cause, yet unexplor'd,
Could make a gentle Belle reject a Lord?
In tasks so bold, can little men engage,
And in soft bosoms dwells such mighty Rage? 

Sol thro' white curtains shot a tim'rous ray,
And oped those eyes that must eclipse the day:
Now lap-dogs give themselves the rousing shake,
And sleepless lovers, just at twelve, awake:
Thrice rung the bell, the slipper knock'd the ground,
And the press'd watch return'd a silver sound.
Belinda still her downy pillow prest, 
Her guardian Sylph prolong'd the balmy rest:
'Twas He had summon'd to her silent bed
The morning-dream that hover'd o'er her head;


In [11]:
# tag the poem!

import nltk
nltk.pos_tag(text[0:50])

[('C', 'VB'), ('a', 'DT'), ('n', 'JJ'), ('t', 'NN'), ('o', 'IN'), (' ', 'NN'), ('I', 'PRP'), ('\n', 'VBP'), ('\n', 'JJ'), ('W', 'NNP'), ('h', 'NN'), ('a', 'DT'), ('t', 'NN'), (' ', 'NNP'), ('d', 'NN'), ('i', 'NN'), ('r', 'VBP'), ('e', 'NN'), (' ', 'NNP'), ('o', 'VBZ'), ('f', 'JJ'), ('f', 'JJ'), ('e', 'NN'), ('n', 'JJ'), ('c', 'NN'), ('e', 'NN'), (' ', 'NNP'), ('f', 'NN'), ('r', 'NN'), ('o', 'IN'), ('m', 'NN'), (' ', 'VBP'), ('a', 'DT'), ('m', 'NN'), ("'", 'POS'), ('r', 'NN'), ('o', 'IN'), ('u', 'JJ'), ('s', 'NN'), (' ', 'NNP'), ('c', 'VBZ'), ('a', 'DT'), ('u', 'JJ'), ('s', 'NN'), ('e', 'NN'), ('s', 'NN'), (' ', 'NNP'), ('s', 'NN'), ('p', 'NN'), ('r', 'NN')]

Oops, that didn't work. Remember that the POS tagger we're using requires a list of words, and our file was read in as one long string. When Python steps through a string it goes character by character; it doesn't know what a "word" is. So we have to break that poem into words.
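
To see the difference between characters and words without any library, here is a minimal sketch on one line of the poem. The regular expression is a rough stand-in chosen for illustration, not the rule set that `nltk.word_tokenize` uses, and `naive_tokenize` is a hypothetical helper name.

```python
import re

line = "What dire offence from am'rous causes springs,"

# Iterating over a string yields single characters, which is what the tagger saw
print(list(line)[:6])

def naive_tokenize(text):
    """Split text into word-like chunks and single punctuation marks.

    Illustrative only: real tokenizers handle many more cases.
    """
    # a run of word characters (optionally joined by ' or -), or one punctuation mark
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", text)

print(naive_tokenize(line))
```

On this line the sketch keeps `am'rous` together and splits the trailing comma off as its own token.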

In [12]:
# break a text into a series of words

words = nltk.word_tokenize(text)
    
words[0:50]

['Canto', 'I', 'What', 'dire', 'offence', 'from', "am'rous", 'causes', 'springs', ',', 'What', 'mighty', 'contests', 'rise', 'from', 'trivial', 'things', ',', 'I', 'sing', '—', 'This', 'verse', 'to', 'Caryl', ',', 'Muse', '!', 'is', 'due', ':', 'This', ',', 'ev', "'", 'n', 'Belinda', 'may', 'vouchsafe', 'to', 'view', ':', 'Slight', 'is', 'the', 'subject', ',', 'but', 'not', 'so']

In [13]:
tag_pairs = nltk.pos_tag(words)

just_tags = []
for tag_pair in tag_pairs:
    just_tags.append(tag_pair[1])
    
' '.join(just_tags)[:1000]

"NNP PRP WP VBD NN IN JJ NNS NNS , WP VBD NNS NN IN JJ NNS , PRP VBP PDT DT NN TO NNP , NNP . VBZ JJ : DT , NN '' JJ NNP MD VB TO VB : NNP VBZ DT NN , CC RB IN DT NN , IN PRP VBP , CC PRP VB PRP$ NNS . VB WP JJ NN , NNP . MD VB DT JJ NNP NN '' NN DT JJ NNP . NNP VBP WP NN NN , CC JJ NN , NNP VB DT JJ NNP VBP DT NNP . IN NNS RB RB , MD VB NNS VB , CC IN JJ NNS NNS JJ JJ NNP . NNP NN '' JJ NNS VBD DT JJ NN , CC VBD DT NNS WDT MD VB DT NN : RB VBP JJ PRP DT VBG NN , CC NN NNS , RB IN NN , NN : NNP VBD DT NN , DT NN NN VBD DT NN , CC DT NN MD VB NN MD DT JJ NN . NNP RB PRP$ NN NN NN , NNP NN NNP RB VBD DT NN NN : CC PRP VBD VBN MD TO PRP$ JJ NN DT NN IN NN MD VB PRP$ NN : NNP NNP RBR NN IN DT JJ NNP , ( IN NN '' NNS IN NN NN MD PRP$ NN TO VB ) NNP MD TO PRP$ VB PRP$ VBG NNS TO VB , CC RB IN NNS VBD , CC VBP MD TO VB . NNP IN NNS , EX JJ MD VB IN NN JJ NNS IN NNP . IN JJ CD NN NN . MD VB JJ NN , IN PDT DT NNP CC PDT DT NNP VBP VBN : IN NN NNS IN NN NNS VBN , DT NN NN , CC DT JJ JJ , CC NNS 

But that is a little unhelpful, because it takes the tags and combines them into one long line of text. This is poetry, and we want to respect the lines. Until now we have been arbitrarily cutting off our output to make it more legible: after the first 1000 characters, the first 50 words, etc. But, again, that's not very meaningful for poetry. Having things in terms of lines will make it easier to select a smaller unit of the poem.
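
As a quick sketch of what "in terms of lines" means, here is a short hard-coded excerpt split into lines with `str.splitlines()`. This is only an illustration on a pasted string; reading from a file with `readlines()` gives a similar list, except that each line keeps its trailing `\n`.

```python
excerpt = """Canto I

What dire offence from am'rous causes springs,
What mighty contests rise from trivial things,"""

# splitlines() returns one string per line and drops the newline characters
lines = excerpt.splitlines()
print(len(lines))     # 4, counting the blank line
print(lines[2])

# each line can then be handled on its own, e.g. split on whitespace
print(lines[3].split()[:3])
```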

In [14]:
with open(filename, 'r') as filein:
    lines = filein.readlines()

tokenized_lines = []
for line in lines:
    words = nltk.word_tokenize(line)
    tokenized_lines.append(words)

tokenized_lines[0:20]

[['Canto', 'I'], [], ['What', 'dire', 'offence', 'from', "am'rous", 'causes', 'springs', ','], ['What', 'mighty', 'contests', 'rise', 'from', 'trivial', 'things', ','], ['I', 'sing', '—', 'This', 'verse', 'to', 'Caryl', ',', 'Muse', '!', 'is', 'due', ':'], ['This', ',', 'ev', "'", 'n', 'Belinda', 'may', 'vouchsafe', 'to', 'view', ':'], ['Slight', 'is', 'the', 'subject', ',', 'but', 'not', 'so', 'the', 'praise', ','], ['If', 'She', 'inspire', ',', 'and', 'He', 'approve', 'my', 'lays', '.'], [], ['Say', 'what', 'strange', 'motive', ',', 'Goddess', '!', 'could', 'compel'], ['A', 'well-bred', 'Lord', 't', "'", 'assault', 'a', 'gentle', 'Belle', '?'], ['O', 'say', 'what', 'stranger', 'cause', ',', 'yet', 'unexplor', "'d", ','], ['Could', 'make', 'a', 'gentle', 'Belle', 'reject', 'a', 'Lord', '?'], ['In', 'tasks', 'so', 'bold', ',', 'can', 'little', 'men', 'engage', ','], ['And', 'in', 'soft', 'bosoms', 'dwells', 'such', 'mighty', 'Rage', '?'], [], ['Sol', 'thro', "'", 'white', 'curtains', '

In [15]:
# now that we have the lines as a series of words (or tokens) - let's go through and tag them

tagged_lines = []
for line in tokenized_lines:
    tagged_lines.append(nltk.pos_tag(line))

just_tags_for_lines = []

for line in tagged_lines:
    this_line = []
    for tag_pair in line:
        this_line.append(tag_pair[0] +'/'+ tag_pair[1])
    just_tags_for_lines.append(this_line)

just_tags_for_lines[0:30]

[['Canto/NNP', 'I/PRP'], [], ['What/WP', 'dire/VBD', 'offence/NN', 'from/IN', "am'rous/JJ", 'causes/NNS', 'springs/NNS', ',/,'], ['What/WP', 'mighty/NN', 'contests/NNS', 'rise/VBP', 'from/IN', 'trivial/JJ', 'things/NNS', ',/,'], ['I/PRP', 'sing/VBG', '—/RBR', 'This/DT', 'verse/NN', 'to/TO', 'Caryl/NNP', ',/,', 'Muse/NNP', '!/.', 'is/VBZ', 'due/JJ', ':/:'], ['This/DT', ',/,', 'ev/NN', "'/''", 'n/JJ', 'Belinda/NNP', 'may/MD', 'vouchsafe/VB', 'to/TO', 'view/NN', ':/:'], ['Slight/NNP', 'is/VBZ', 'the/DT', 'subject/NN', ',/,', 'but/CC', 'not/RB', 'so/IN', 'the/DT', 'praise/NN', ',/,'], ['If/IN', 'She/PRP', 'inspire/VBP', ',/,', 'and/CC', 'He/PRP', 'approve/VB', 'my/PRP$', 'lays/NNS', './.'], [], ['Say/NNP', 'what/WP', 'strange/JJ', 'motive/NN', ',/,', 'Goddess/NNP', '!/.', 'could/MD', 'compel/VB'], ['A/DT', 'well-bred/JJ', 'Lord/NNP', 't/NN', "'/''", 'assault/NN', 'a/DT', 'gentle/JJ', 'Belle/NNP', '?/.'], ['O/NNP', 'say/VBP', 'what/WP', 'stranger/NN', 'cause/NN', ',/,', 'yet/CC', 'unexplor/

In [16]:
# but that is kind of gross to read, so let's put things back together as a poem,
# without the brackets, commas, etc. that python requires to run

transformed_poem = []
for line in just_tags_for_lines:
    transformed_poem.append('  '.join(line))
    
    
for line in transformed_poem[0:30]:
    print(line)

Canto/NNP  I/PRP

What/WP  dire/VBD  offence/NN  from/IN  am'rous/JJ  causes/NNS  springs/NNS  ,/,
What/WP  mighty/NN  contests/NNS  rise/VBP  from/IN  trivial/JJ  things/NNS  ,/,
I/PRP  sing/VBG  —/RBR  This/DT  verse/NN  to/TO  Caryl/NNP  ,/,  Muse/NNP  !/.  is/VBZ  due/JJ  :/:
This/DT  ,/,  ev/NN  '/''  n/JJ  Belinda/NNP  may/MD  vouchsafe/VB  to/TO  view/NN  :/:
Slight/NNP  is/VBZ  the/DT  subject/NN  ,/,  but/CC  not/RB  so/IN  the/DT  praise/NN  ,/,
If/IN  She/PRP  inspire/VBP  ,/,  and/CC  He/PRP  approve/VB  my/PRP$  lays/NNS  ./.

Say/NNP  what/WP  strange/JJ  motive/NN  ,/,  Goddess/NNP  !/.  could/MD  compel/VB
A/DT  well-bred/JJ  Lord/NNP  t/NN  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/.
O/NNP  say/VBP  what/WP  stranger/NN  cause/NN  ,/,  yet/CC  unexplor/JJ  'd/NN  ,/,
Could/NNP  make/VB  a/DT  gentle/JJ  Belle/NNP  reject/VBP  a/DT  Lord/NNP  ?/.
In/IN  tasks/NNS  so/RB  bold/RB  ,/,  can/MD  little/VB  men/NNS  engage/VB  ,/,
And/CC  in/IN  soft/JJ  bosoms/NNS  d

In [19]:
#  let's make that a function for ease of use and just operate on the first two stanzas:
def nltk_pos_transform(filename):
    """Given a filename, take a poem and transform it into its POS tags"""
    with open(filename, 'r') as filein:
        lines = filein.readlines()

    tokenized_lines = []
    for line in lines:
        words = nltk.word_tokenize(line)
        tokenized_lines.append(words)

    tagged_lines = []
    for line in tokenized_lines:
        tagged_lines.append(nltk.pos_tag(line))

    just_tags_for_lines = []

    for line in tagged_lines[:15]:
        this_line = []
        for tag_pair in line:
            this_line.append(tag_pair[0] + '/' + tag_pair[1])
        just_tags_for_lines.append(this_line)

    # reconstituting them now
    transformed_poem = []
    for line in just_tags_for_lines:
        transformed_poem.append('  '.join(line))


    for line in transformed_poem:
        print(line)

nltk_pos_transform('Pope-RapeoftheLock.txt')


Canto/NNP  I/PRP

What/WP  dire/VBD  offence/NN  from/IN  am'rous/JJ  causes/NNS  springs/NNS  ,/,
What/WP  mighty/NN  contests/NNS  rise/VBP  from/IN  trivial/JJ  things/NNS  ,/,
I/PRP  sing/VBG  —/RBR  This/DT  verse/NN  to/TO  Caryl/NNP  ,/,  Muse/NNP  !/.  is/VBZ  due/JJ  :/:
This/DT  ,/,  ev/NN  '/''  n/JJ  Belinda/NNP  may/MD  vouchsafe/VB  to/TO  view/NN  :/:
Slight/NNP  is/VBZ  the/DT  subject/NN  ,/,  but/CC  not/RB  so/IN  the/DT  praise/NN  ,/,
If/IN  She/PRP  inspire/VBP  ,/,  and/CC  He/PRP  approve/VB  my/PRP$  lays/NNS  ./.

Say/NNP  what/WP  strange/JJ  motive/NN  ,/,  Goddess/NNP  !/.  could/MD  compel/VB
A/DT  well-bred/JJ  Lord/NNP  t/NN  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/.
O/NNP  say/VBP  what/WP  stranger/NN  cause/NN  ,/,  yet/CC  unexplor/JJ  'd/NN  ,/,
Could/NNP  make/VB  a/DT  gentle/JJ  Belle/NNP  reject/VBP  a/DT  Lord/NNP  ?/.
In/IN  tasks/NNS  so/RB  bold/RB  ,/,  can/MD  little/VB  men/NNS  engage/VB  ,/,
And/CC  in/IN  soft/JJ  bosoms/NNS  d
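
The formatting loops in the function above can also be written more compactly. The sketch below assumes the tagging has already happened and starts from a small hand-copied sample of `(word, tag)` pairs (shortened from the NLTK output above), so it runs without NLTK installed.

```python
# A shortened sample of tagged lines, copied by hand from the output above
tagged_lines = [
    [('Canto', 'NNP'), ('I', 'PRP')],
    [],
    [('What', 'WP'), ('dire', 'VBD'), ('offence', 'NN')],
]

# One 'word/TAG' string per token, one joined string per line
transformed_poem = [
    '  '.join(f'{word}/{tag}' for word, tag in line)
    for line in tagged_lines
]

for line in transformed_poem:
    print(line)
```

Unpacking each pair as `word, tag` avoids the `tag_pair[0]` and `tag_pair[1]` indexing, and an empty line simply becomes an empty string.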

In [20]:
import spacy
import en_core_web_sm

# let's do the same thing with spacy

def spacy_pos_transform(filename):
    """Given a filename, take a poem and transform it into its POS tags"""
    nlp = en_core_web_sm.load()
    with open(filename, 'r') as filein:
        lines = filein.readlines()

    spacy_lines = []
    for line in lines[:15]:
        this_line = []
        doc = nlp(line)
        for token in doc:
            if token.tag_ != "_SP":
                this_line.append(token.text + '/' + token.tag_) 
        spacy_lines.append(this_line)
    # reconstituting them now
    transformed_poem = []
    for line in spacy_lines:
        transformed_poem.append('  '.join(line))


    for line in transformed_poem:
        print(line)

spacy_pos_transform('Pope-RapeoftheLock.txt')


Canto/NNP  I/CD

What/WP  dire/JJ  offence/NN  from/IN  am'rous/JJ  causes/NNS  springs/NNS  ,/,
What/WDT  mighty/JJ  contests/NNS  rise/VBP  from/IN  trivial/JJ  things/NNS  ,/,
I/PRP  sing/VBP  —/:  This/DT  verse/NN  to/IN  Caryl/NNP  ,/,  Muse/NNP  !/.  is/VBZ  due/JJ  :/:
This/DT  ,/,  ev'n/NNP  Belinda/NNP  may/MD  vouchsafe/VB  to/TO  view/VB  :/:
Slight/JJ  is/VBZ  the/DT  subject/NN  ,/,  but/CC  not/RB  so/RB  the/DT  praise/NN  ,/,
If/IN  She/PRP  inspire/VBP  ,/,  and/CC  He/PRP  approve/VBP  my/PRP$  lays/NNS  ./.

Say/VB  what/WP  strange/JJ  motive/NN  ,/,  Goddess/NNP  !/.  could/MD  compel/VB
A/DT  well/RB  -/HYPH  bred/VBN  Lord/NNP  t/NNP  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/.
O/UH  say/VB  what/WP  stranger/NN  cause/VBP  ,/,  yet/CC  unexplor'd/JJ  ,/,
Could/MD  make/VB  a/DT  gentle/JJ  Belle/NNP  reject/VB  a/DT  Lord/NNP  ?/.
In/IN  tasks/NNS  so/RB  bold/RB  ,/,  can/MD  little/JJ  men/NNS  engage/VB  ,/,
And/CC  in/IN  soft/JJ  bosoms/NNS  dwells/N

Let's compare the two against each other:

In [21]:
print('NLTK transform results:')
nltk_pos_transform('Pope-RapeoftheLock.txt')
print('=========')
print('Spacy transform results:')
spacy_pos_transform('Pope-RapeoftheLock.txt')

NLTK transform results:
Canto/NNP  I/PRP

What/WP  dire/VBD  offence/NN  from/IN  am'rous/JJ  causes/NNS  springs/NNS  ,/,
What/WP  mighty/NN  contests/NNS  rise/VBP  from/IN  trivial/JJ  things/NNS  ,/,
I/PRP  sing/VBG  —/RBR  This/DT  verse/NN  to/TO  Caryl/NNP  ,/,  Muse/NNP  !/.  is/VBZ  due/JJ  :/:
This/DT  ,/,  ev/NN  '/''  n/JJ  Belinda/NNP  may/MD  vouchsafe/VB  to/TO  view/NN  :/:
Slight/NNP  is/VBZ  the/DT  subject/NN  ,/,  but/CC  not/RB  so/IN  the/DT  praise/NN  ,/,
If/IN  She/PRP  inspire/VBP  ,/,  and/CC  He/PRP  approve/VB  my/PRP$  lays/NNS  ./.

Say/NNP  what/WP  strange/JJ  motive/NN  ,/,  Goddess/NNP  !/.  could/MD  compel/VB
A/DT  well-bred/JJ  Lord/NNP  t/NN  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/.
O/NNP  say/VBP  what/WP  stranger/NN  cause/NN  ,/,  yet/CC  unexplor/JJ  'd/NN  ,/,
Could/NNP  make/VB  a/DT  gentle/JJ  Belle/NNP  reject/VBP  a/DT  Lord/NNP  ?/.
In/IN  tasks/NNS  so/RB  bold/RB  ,/,  can/MD  little/VB  men/NNS  engage/VB  ,/,
And/CC  in/IN

The two can be difficult to compare by eye. Let's make a function that compares the two outputs and gives a 1 where they are the same, or the word with both tags where they are different. And since you might want to use your own text, let's change things slightly. Rather than use an external file, the following code block will just take a long pasted string. So you could paste in your own poem from the web if you'd like!



In [22]:
our_text = """
Canto I

What dire offence from am'rous causes springs, 
What mighty contests rise from trivial things, 
I sing — This verse to Caryl, Muse! is due: 
This, ev'n Belinda may vouchsafe to view: 
Slight is the subject, but not so the praise,
If She inspire, and He approve my lays. 

Say what strange motive, Goddess! could compel
A well-bred Lord t' assault a gentle Belle?
O say what stranger cause, yet unexplor'd,
Could make a gentle Belle reject a Lord?
In tasks so bold, can little men engage,
And in soft bosoms dwells such mighty Rage? 
"""

def spacy_pos_transform(text):
    """Given a string pasted in, take a poem and transform it into its POS tags"""
    nlp = en_core_web_sm.load()
    lines = text.split('\n')
    spacy_lines = []
    for line in lines:
        this_line = []
        doc = nlp(line)
        for token in doc:
            if token.tag_ != "_SP":
                this_line.append(token.text + '/' + token.tag_) 
        spacy_lines.append(this_line)
    # reconstituting them now
    transformed_poem = []
    for line in spacy_lines:
        transformed_poem.append('  '.join(line))

    return transformed_poem

def nltk_pos_transform(text):
    """Given a string pasted in, take a poem and transform it into its POS tags"""
    lines = text.split('\n')
    tokenized_lines = []
    for line in lines:
        words = nltk.word_tokenize(line)
        tokenized_lines.append(words)
    tagged_lines = []
    for line in tokenized_lines:
        tagged_lines.append(nltk.pos_tag(line))
    just_tags_for_lines = []

    for line in tagged_lines:
        this_line = []
        for tag_pair in line:
            this_line.append(tag_pair[0] + '/' + tag_pair[1])
        just_tags_for_lines.append(this_line)
    # reconstituting them now
    transformed_poem = []
    for line in just_tags_for_lines:
        transformed_poem.append('  '.join(line))
    return transformed_poem

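def binary_poem(spacy_text, nltk_text):
    """Compare two tagged poems line by line, token by token (by position)."""
    comparison = []
    # zip pairs up the corresponding lines from the two taggers
    for spacy_tagged, nltk_tagged in zip(spacy_text, nltk_text):
        this_line = []
        # the transform functions join 'word/TAG' items with spaces, so split()
        # recovers them (word_tokenize would break up tags such as PRP$)
        spacy_line = spacy_tagged.split()
        nltk_line = nltk_tagged.split()
        for num, item in enumerate(spacy_line):
            # the two tokenizers can disagree on line length; stop at the shorter one
            if num >= len(nltk_line):
                break
            if item == nltk_line[num]:
                this_line.append(1)
            else:
                this_line.append(item + '|' + nltk_line[num].rsplit('/', 1)[1])
        comparison.append(this_line)
    return comparison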

spacy_text = spacy_pos_transform(our_text)
nltk_text = nltk_pos_transform(our_text)
# use a new name for the result so the binary_poem function is not overwritten
comparison = binary_poem(spacy_text, nltk_text)

print('NLTK transform results:')
for line in nltk_text:
    print(line)
print('=========')
print('Spacy transform results:')
for line in spacy_text:
    print(line)
print('Comparison legend: 1 means both systems gave the same tag. If different, it prints the word followed by the spacy tag and then the nltk tag, separated by a |')
for line in comparison:
    line = [str(item) for item in line]
    print('  '.join(line))

NLTK transform results:

Canto/NNP  I/PRP

What/WP  dire/VBD  offence/NN  from/IN  am'rous/JJ  causes/NNS  springs/NNS  ,/,
What/WP  mighty/NN  contests/NNS  rise/VBP  from/IN  trivial/JJ  things/NNS  ,/,
I/PRP  sing/VBG  —/RBR  This/DT  verse/NN  to/TO  Caryl/NNP  ,/,  Muse/NNP  !/.  is/VBZ  due/JJ  :/:
This/DT  ,/,  ev/NN  '/''  n/JJ  Belinda/NNP  may/MD  vouchsafe/VB  to/TO  view/NN  :/:
Slight/NNP  is/VBZ  the/DT  subject/NN  ,/,  but/CC  not/RB  so/IN  the/DT  praise/NN  ,/,
If/IN  She/PRP  inspire/VBP  ,/,  and/CC  He/PRP  approve/VB  my/PRP$  lays/NNS  ./.

Say/NNP  what/WP  strange/JJ  motive/NN  ,/,  Goddess/NNP  !/.  could/MD  compel/VB
A/DT  well-bred/JJ  Lord/NNP  t/NN  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/.
O/NNP  say/VBP  what/WP  stranger/NN  cause/NN  ,/,  yet/CC  unexplor/JJ  'd/NN  ,/,
Could/NNP  make/VB  a/DT  gentle/JJ  Belle/NNP  reject/VBP  a/DT  Lord/NNP  ?/.
In/IN  tasks/NNS  so/RB  bold/RB  ,/,  can/MD  little/VB  men/NNS  engage/VB  ,/,
And/CC  in/I
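
The comparison above lines tokens up by position, so a single tokenization difference (spaCy splits `well-bred` into three tokens, NLTK keeps it as one) shifts everything after it. One possible way around that, sketched here under the assumption that both inputs are `word/TAG` strings separated by whitespace, is to align the words first with the standard library's `difflib.SequenceMatcher`. The helper names `split_tagged` and `compare_tagged_lines` are hypothetical and not part of NLTK or spaCy.

```python
from difflib import SequenceMatcher

def split_tagged(line):
    """Turn 'word/TAG  word/TAG' into a list of (word, tag) pairs."""
    return [tuple(item.rsplit('/', 1)) for item in line.split()]

def compare_tagged_lines(spacy_line, nltk_line):
    """Compare tags only for words that both tokenizers produced."""
    spacy_pairs = split_tagged(spacy_line)
    nltk_pairs = split_tagged(nltk_line)
    spacy_words = [word for word, _ in spacy_pairs]
    nltk_words = [word for word, _ in nltk_pairs]

    # find runs of identical words in the two token sequences
    matcher = SequenceMatcher(None, spacy_words, nltk_words, autojunk=False)
    result = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            word, spacy_tag = spacy_pairs[block.a + offset]
            _, nltk_tag = nltk_pairs[block.b + offset]
            if spacy_tag == nltk_tag:
                result.append('1')
            else:
                result.append(f'{word}/{spacy_tag}|{nltk_tag}')
    return result

# one line of the poem as tagged by each system (copied from the outputs above)
spacy_line = "A/DT  well/RB  -/HYPH  bred/VBN  Lord/NNP  t/NNP  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/."
nltk_line = "A/DT  well-bred/JJ  Lord/NNP  t/NN  '/''  assault/NN  a/DT  gentle/JJ  Belle/NNP  ?/."

print('  '.join(compare_tagged_lines(spacy_line, nltk_line)))
```

Words that only one tokenizer produced (here `well`, `-`, `bred` and `well-bred`) are left out of the comparison instead of knocking the remaining tokens out of step. Whether skipping them is the right choice depends on the research question.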

In [62]:
our_text = """
Canto I

What dire offence from am'rous causes springs, 
What mighty contests rise from trivial things, 
I sing — This verse to Caryl, Muse! is due: 
This, ev'n Belinda may vouchsafe to view: 
Slight is the subject, but not so the praise,
If She inspire, and He approve my lays. 

Say what strange motive, Goddess! could compel
A well-bred Lord t' assault a gentle Belle?
O say what stranger cause, yet unexplor'd,
Could make a gentle Belle reject a Lord?
In tasks so bold, can little men engage,
And in soft bosoms dwells such mighty Rage? 

Sol thro' white curtains shot a tim'rous ray,
And oped those eyes that must eclipse the day:
Now lap-dogs give themselves the rousing shake,
And sleepless lovers, just at twelve, awake:
Thrice rung the bell, the slipper knock'd the ground,
And the press'd watch return'd a silver sound.
Belinda still her downy pillow prest, 
Her guardian Sylph prolong'd the balmy rest:
'Twas He had summon'd to her silent bed
The morning-dream that hover'd o'er her head;
A Youth more glitt'ring than a Birth-night Beau,
(That ev'n in slumber caus'd her cheek to glow)
Seem'd to her ear his winning lips to lay,
And thus in whispers said, or seem'd to say. 
Fairest of mortals, thou distinguish'd care
Of thousand bright Inhabitants of Air! 
If e'er one vision touch.'d thy infant thought, 
Of all the Nurse and all the Priest have taught;
Of airy Elves by moonlight shadows seen, 
The silver token, and the circled green, 
Or virgins visited by Angel-pow'rs, 
With golden crowns and wreaths of heav'nly flow'rs; 
Hear and believe! thy own importance know,
Nor bound thy narrow views to things below. 
Some secret truths, from learned pride conceal'd,
To Maids alone and Children are reveal'd:
What tho' no credit doubting Wits may give? 
The Fair and Innocent shall still believe.
Know, then, unnumber'd Spirits round thee fly, 
The light Militia of the lower sky: 
These, tho' unseen, are ever on the wing, 
Hang o'er the Box, and hover round the Ring. 
Think what an equipage thou hast in Air,
And view with scorn two Pages and a Chair. 
As now your own, our beings were of old, 
And once inclos'd in Woman's beauteous mould; 
Thence, by a soft transition, we repair 
From earthly Vehicles to these of air.
Think not, when Woman's transient breath is fled 
That all her vanities at once are dead; 
Succeeding vanities she still regards, 
And tho' she plays no more, o'erlooks the cards. 
Her joy in gilded Chariots, when alive,
And love of Ombre, after death survive. 
For when the Fair in all their pride expire, 
To their first Elements their Souls retire: 
The Sprites of fiery Termagants in Flame 
Mount up, and take a Salamander's name.
Soft yielding minds to Water glide away, 
And sip, with Nymphs, their elemental Tea. 
The graver Prude sinks downward to a Gnome, 
In search of mischief still on Earth to roam. 
The light Coquettes in Sylphs aloft repair,
And sport and flutter in the fields of Air.  
"""


def spacy_pos_transform(text):
    """Given a string pasted in, take a poem and transform it into its POS tags"""
    nlp = en_core_web_sm.load()
    lines = text.split('\n')
    spacy_tags_sorted = {}
    for line in lines:
        doc = nlp(line)
        for token in doc:
            if token.tag_ != "_SP":
                if token.tag_ in spacy_tags_sorted:
                    spacy_tags_sorted[token.tag_].append(token.text)
                else:
                    spacy_tags_sorted[token.tag_] = [token.text]

    return spacy_tags_sorted

POS_mapping = {'ADJ': ['CD', 'DT', 'PDT','VBN', 'VBG', 'JJ', 'JJR', 'JJS'], 'NOUN': ['NN', 'NNS', 'NNP', 'NNPS'], 'VB': ['VB']}

spacy_tags_transformed = spacy_pos_transform(our_text)

leftover_tags = list(spacy_tags_transformed.keys())

for key, value in POS_mapping.items():
    print('===============')
    print('Words tagged with variations of ' + key + '\n')
    for item in value:
        if item in spacy_tags_transformed:
            print(item + '  ' + '  '.join(spacy_tags_transformed[item]) + '\n')
            leftover_tags.remove(item)
    print('\n')

print('leftover tags that were not accounted for in the mappings Brad gave me:')
leftover_tags

Words tagged with variations of ADJ

CD  twelve  thousand  one  two

DT  This  This  the  the  A  a  a  a  a  those  the  the  the  the  the  the  a  the  The  A  a  the  the  The  the  Some  no  The  The  the  These  the  the  the  an  a  a  these  all  no  the  the  all  The  a  The  a  The  the

PDT  all  all

VBN  bred  summon'd  taught  seen  circled  visited  bound  learned  reveal'd  inclos'd  fled  gilded

VBG  rousing  doubting  Succeeding  yielding

JJ  dire  am'rous  mighty  trivial  due  Slight  strange  gentle  unexplor'd  gentle  little  soft  such  mighty  white  tim'rous  awake  silver  downy  silent  glitt'ring  bright  airy  golden  own  narrow  secret  Fair  Innocent  round  light  scorn  own  old  beauteous  soft  earthly  transient  dead  alive  first  fiery  Soft  elemental  light

JJR  lower

JJS  Fairest



Words tagged with variations of NOUN

NN  offence  verse  subject  praise  motive  assault  stranger  Rage  thro  ray  day  lap  shake  sleepless  bell  slip

['PRP', 'WP', 'IN', ',', 'WDT', 'VBP', ':', '.', 'VBZ', 'MD', 'TO', 'CC', 'RB', 'PRP$', 'HYPH', "''", 'UH', 'POS', 'VBD', '``', 'RBR', '-LRB-', '-RRB-', 'WRB', 'RP']
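
The grouping step in the cell above (collecting words under each tag, then reading them off by broader category) can be sketched on a handful of pairs without loading spaCy. The pairs below are copied by hand from the spaCy output for one line of the poem. The `coarse` mapping is a hypothetical, cut-down version of `POS_mapping`, and the `'OTHER'` label is an assumption added for tags the mapping does not cover.

```python
from collections import defaultdict

# (word, tag) pairs taken from the spaCy output for one line of the poem
pairs = [('What', 'WP'), ('dire', 'JJ'), ('offence', 'NN'), ('from', 'IN'),
         ("am'rous", 'JJ'), ('causes', 'NNS'), ('springs', 'NNS'), (',', ',')]

# Hypothetical coarse mapping, in the same shape as POS_mapping above
coarse = {'ADJ': ['JJ', 'JJR', 'JJS'], 'NOUN': ['NN', 'NNS', 'NNP', 'NNPS']}

# defaultdict(list) creates the empty list the first time a tag is seen
words_by_tag = defaultdict(list)
for word, tag in pairs:
    words_by_tag[tag].append(word)

# Invert the mapping so each fine-grained tag points at its coarse label
fine_to_coarse = {fine: label for label, fines in coarse.items() for fine in fines}

words_by_coarse = defaultdict(list)
for word, tag in pairs:
    words_by_coarse[fine_to_coarse.get(tag, 'OTHER')].append(word)

print(dict(words_by_tag))
print(dict(words_by_coarse))
```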

If you're interested in digging deeper into how each of these tagging systems assigns parts of speech:

* NLTK's default tagger is trained on a Wall Street Journal corpus: https://stackoverflow.com/questions/32016545/how-does-nltk-pos-tag-work/41384824. It is an averaged perceptron, which means it uses weighted averages.
* More information on POS tagging systems: https://universaldependencies.org/docs/u/pos/
* spaCy's English model is trained on OntoNotes Release 5.0, the final release of the OntoNotes project, a collaborative effort between BBN Technologies, the University of Colorado, the University of Pennsylvania and the University of Southern California's Information Sciences Institute. The goal of the project was to annotate a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference).



Discussion Questions:

* How could you imagine context playing a role here?
* What are some other literary applications for POS tagging?
* What about applications to supervised learning problems?
* What other kinds of research questions are available here?

