# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [None]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [None]:
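# Read the whole story into one string; adjust the path if the file lives elsewhere.
with open("peterrabbit.txt", encoding="utf-8") as f:
    text = f.read()
doc = nlp(text)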

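**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained tag, and the description of the fine-grained tag**

In [None]:
# Enter your code here: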
for sent_num, sent in enumerate(doc.sents, start=1):
    if sent_num == 3:
        for token in sent:
            print(f'{token.text:{10}} {token.pos_:{8}}  {token.tag_:{6}} {spacy.explain(token.tag_)}')
        break



'          VERB      VBP    verb, non-3rd person singular present
Now        ADV       RB     adverb
my         PRON      PRP$   pronoun, possessive
dears      NOUN      NNS    noun, plural
,          PUNCT     ,      punctuation mark, comma
'          PUNCT     ''     closing quotation mark
said       VERB      VBD    verb, past tense
old        ADJ       JJ     adjective (English), other noun-modifier (Chinese)
Mrs.       PROPN     NNP    noun, proper singular
Rabbit     PROPN     NNP    noun, proper singular
one        NUM       CD     cardinal number
morning    NOUN      NN     noun, singular or mass
,          PUNCT     ,      punctuation mark, comma
'          PUNCT     ``     opening quotation mark
you        PRON      PRP    pronoun, personal
may        AUX       MD     verb, modal auxiliary
go         VERB      VB     verb, base form
into       ADP       IN     conjunction, subordinating or preposition

          SPACE     _SP    whitespace
the        DET       DT     determin

**3. Provide a frequency list of POS tags from the entire document**

In [None]:
pos_frequency = doc.count_by(spacy.attrs.POS)
for key, value in pos_frequency.items():
    print(f'{key}.\t{doc.vocab[key].text:{5}}\t{value}')



86.	ADV  	63
98.	SCONJ	19
90.	DET  	89
92.	NOUN 	172
95.	PRON 	110
100.	VERB 	135
93.	NUM  	8
84.	ADJ  	53
97.	PUNCT	167
89.	CCONJ	61
103.	SPACE	97
96.	PROPN	69
85.	ADP  	123
87.	AUX  	49
94.	PART 	28
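
The list above is printed in dictionary order, which makes it hard to see which tags dominate. As a minimal sketch, the counts can be re-sorted by frequency. The dictionary below is copied by hand from the output above rather than recomputed with spaCy, and the names `pos_counts`, `ranked`, and `total` are illustrative only.

```python
# Counts copied from the frequency list printed above (not recomputed with spaCy).
pos_counts = {
    "ADV": 63, "SCONJ": 19, "DET": 89, "NOUN": 172, "PRON": 110,
    "VERB": 135, "NUM": 8, "ADJ": 53, "PUNCT": 167, "CCONJ": 61,
    "SPACE": 97, "PROPN": 69, "ADP": 123, "AUX": 49, "PART": 28,
}

# Sort tags from most to least frequent.
ranked = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)
total = sum(pos_counts.values())

# Print tag, raw count, and share of all tokens.
for tag, count in ranked:
    print(f"{tag:<6}{count:>5}{count / total:>8.1%}")
```

With a live `Doc`, the same dictionary could presumably be built from `doc.count_by(spacy.attrs.POS)` by mapping each key through `doc.vocab[key].text`, as the cell above already does when printing.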


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 in the frequency list above; the value can differ between spaCy versions

In [None]:
# The total token count is the sum of all POS counts.
total_tokens = sum(pos_frequency.values())
# Look up the NOUN ID by name instead of hard-coding it.
noun_id = spacy.symbols.NOUN
percent = pos_frequency[noun_id] * 100 / total_tokens
print(f"Percentage of tokens that are nouns: {percent}")


Percentage of tokens that are nouns: 13.837489943684634

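As a check on the arithmetic, the counts from the frequency list in problem 3 give 172 `NOUN` tokens out of 1243 tokens in total. Note that the total includes the `SPACE` and `PUNCT` tokens, so the share would be higher if those were excluded.

$$
\frac{172}{1243} \times 100 \approx 13.84\%
$$
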

**5. Display the Dependency Parse for the third sentence**

In [None]:
for num, sent in enumerate(doc.sents, start=1):
    if num == 3:
        for token in sent:
            print(f'{token.text:{10}} {token.pos_:{8}}  {token.dep_:{6}} {spacy.explain(token.dep_)}')
        displacy.render(sent, style='dep', jupyter=True, options={'distance': 110})
        break

'          VERB      punct  punctuation
Now        ADV       advmod adverbial modifier
my         PRON      poss   possession modifier
dears      NOUN      nsubj  nominal subject
,          PUNCT     punct  punctuation
'          PUNCT     punct  punctuation
said       VERB      ROOT   root
old        ADJ       amod   adjectival modifier
Mrs.       PROPN     compound compound
Rabbit     PROPN     nsubj  nominal subject
one        NUM       nummod numeric modifier
morning    NOUN      npadvmod noun phrase as adverbial modifier
,          PUNCT     punct  punctuation
'          PUNCT     punct  punctuation
you        PRON      nsubj  nominal subject
may        AUX       aux    auxiliary
go         VERB      ccomp  clausal complement
into       ADP       prep   prepositional modifier

          SPACE     dep    unclassified dependent
the        DET       det    determiner
fields     NOUN      pobj   object of preposition
or         CCONJ     cc     coordinating conjunction
down       ADP 
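
In the output above, labels longer than six characters (`compound`, `npadvmod`) push the description column out of line, because the format string reserves only six characters for `token.dep_`. One possible fix, sketched here on a few rows copied from that output rather than on live spaCy tokens, is to size the column from the longest label. The `rows` list and the variable names are illustrative.

```python
# A few (text, pos, dep, description) rows copied from the output above.
rows = [
    ("old", "ADJ", "amod", "adjectival modifier"),
    ("Mrs.", "PROPN", "compound", "compound"),
    ("morning", "NOUN", "npadvmod", "noun phrase as adverbial modifier"),
]

# Reserve enough room for the longest dependency label.
dep_width = max(len(dep) for _, _, dep, _ in rows)

# Build one aligned line per row; strings are left-aligned by default.
lines = [
    f"{text:{10}} {pos:{8}} {dep:{dep_width}} {description}"
    for text, pos, dep, description in rows
]
print("\n".join(lines))
```

With live tokens, the width could be computed as `max(len(token.dep_) for token in sent)` before the printing loop.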

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [None]:
named_entity = list(doc.ents)[0:2]
print("The first two named entities are: ")
for i in named_entity:
    print(i)


The first two named entities are: 
four
were--

          Flopsy


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [None]:
sentence_count = len(list(doc.sents))
print("Number of sentences:", sentence_count)

Number of sentences: 54


**8. CHALLENGE: How many sentences contain named entities?**

In [None]:
ner_sentences = 0
for sent in doc.sents:
    if sent.ents:
        ner_sentences += 1
print("Number of sentences containing named entities:", ner_sentences)



Number of sentences containing named entities: 34


**9. CHALLENGE: Display the named entity visualization for the first sentence of the document (`list_of_sents[0]`, where `list_of_sents` is the list of sentences in `doc.sents`)**

In [None]:
# Render entities for the first sentence; 'distance' is a dependency-parse option, so it is dropped here.
first_sent = next(doc.sents)
displacy.render(first_sent, style='ent', jupyter=True)
