$\textbf{Part-of-Speech Tagging}$
-

- The process of tagging words in a textual input with their appropriate part of speech

\begin{gather}
\begin{bmatrix}
\begin{array}{c|c}
Noun & \text{The name of a person, place, thing, or idea} \\
\hline
Verb & \text{The action or being} \\
\hline
Adjective & \text{This modifies or describes a noun or a pronoun} \\
\hline
Adverb & \text{This modifies or describes a verb, adjective, or another adverb} \\
\hline
Pronoun & \text{The word used in place of a noun} \\
\hline
Preposition & \text{The word placed before a noun or pronoun to form a phrase modifying another word in the sentence} \\
\hline
Conjunction & \text{This joins words, phrases, or clauses} \\
\hline
Interjection & \text{A word used to express emotion} \\
\end{array}
\end{bmatrix}
\end{gather}

- There is no official list of all the parts of speech that exist.
- POS tagging isn't always straightforward, because a word can take different POS tags depending on its context. A simple example is the word refuse: used as a verb, it means to decline an offer; used as a noun, it refers to rubbish, something you throw away.
- Context is also crucial for identifying the POS tag in the first place: an ambiguous word such as refuse cannot be tagged with its part of speech unless it appears in a sentence or phrase.
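
To make the role of context concrete, here is a minimal sketch comparing a lookup that ignores context with one that looks at the previous word. The `LOOKUP` table, the tag names, and the "noun after a determiner" rule are all hypothetical choices made for this illustration, not the behaviour of any real tagger.

```python
# Toy illustration: every tag and rule below is a hypothetical example,
# not the output of a real tagger.

# A context-free lookup can store only one tag per word.
LOOKUP = {
    "they": "PRON", "to": "PART", "the": "DET",
    "collect": "VERB", "pay": "VERB", "refuse": "VERB",
}


def tag_without_context(tokens):
    """Tag each token from the lookup alone; unknown words get 'X'."""
    return [(tok, LOOKUP.get(tok.lower(), "X")) for tok in tokens]


def tag_with_context(tokens):
    """Same lookup, plus one rule: a word right after a determiner is a noun."""
    tagged = []
    for i, tok in enumerate(tokens):
        tag = LOOKUP.get(tok.lower(), "X")
        prev = tokens[i - 1].lower() if i > 0 else None
        if prev is not None and LOOKUP.get(prev) == "DET":
            tag = "NOUN"
        tagged.append((tok, tag))
    return tagged


verb_use = "They refuse to pay".split()
noun_use = "They collect the refuse".split()

print(tag_without_context(noun_use))  # 'refuse' is wrongly tagged VERB
print(tag_with_context(noun_use))     # 'refuse' is tagged NOUN
print(tag_with_context(verb_use))     # 'refuse' stays VERB
```

The single rule is far too crude for real text, but it shows why a tagger needs to see the neighbouring words before it can choose between the two readings.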

$\textbf{The Hidden Markov Model and Other POS Models}$
-

- Hidden Markov Models (HMMs) are commonly used for sequential data. This turns out to be useful here because we can use information about the context of a word, such as the tags of its neighbours, to predict what its POS tag might be.

- Apart from statistical models, there are also rule-based POS taggers, which use predefined rules to perform the tagging or learn these rules from the corpus. Of course, these methods do not throw away statistical methods; they just rely on them less.

- There are other, more naive methods that you can try out to get a feel for the task, such as using regular expressions to guess the part of speech, or simply storing the most likely tag for each word and tagging all future occurrences with that tag. Part-of-speech tagging has since moved on quite a bit, though, and as with most computational tasks that are now completed with high accuracy, statistical learning or deep learning is the way to go.

- One of spaCy's very first POS taggers was an averaged perceptron. A perceptron used for POS tagging learns a weight for each pairing of a feature with a tag; features are pieces of information such as the tag of the previous word or the last few letters of the word. Weights that support a correct classification are rewarded and those behind an incorrect classification are penalized, and the learned weights are then used to predict the tag of a new word. Many supervised machine learning algorithms work on similar principles, and these are the algorithms that perform well in POS-tagging tasks.
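
The perceptron idea described above can be sketched in a few lines of plain Python. This is a simplified illustration and not spaCy's implementation: the three-sentence training set, the feature names, and the naive averaging loop are all assumptions chosen to keep the example short.

```python
from collections import defaultdict

# Tiny hand-made training set (an assumption for illustration only).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("jumped", "VBD")],
    [("the", "DT"), ("fox", "NN"), ("walked", "VBD")],
    [("a", "DT"), ("cat", "NN"), ("jumped", "VBD")],
]
TAGS = sorted({tag for sent in TRAIN for _, tag in sent})


def features(word, prev_tag):
    """Features named in the text: the previous tag and the last few letters."""
    return ["bias", "word=" + word, "suffix=" + word[-2:], "prev_tag=" + prev_tag]


def predict(weights, feats):
    """Return the tag with the highest summed feature weight."""
    scores = {tag: sum(weights[(f, tag)] for f in feats) for tag in TAGS}
    return max(TAGS, key=lambda tag: scores[tag])


def train(sentences, epochs=5):
    weights = defaultdict(float)  # current weights
    totals = defaultdict(float)   # running sum of the weights after each step
    steps = 0
    for _ in range(epochs):
        for sent in sentences:
            prev_tag = "<START>"
            for word, gold in sent:
                feats = features(word, prev_tag)
                guess = predict(weights, feats)
                if guess != gold:
                    for f in feats:
                        weights[(f, gold)] += 1.0   # reward the correct tag
                        weights[(f, guess)] -= 1.0  # punish the wrong guess
                steps += 1
                # Naive averaging: add the full weight vector at every step.
                for key, value in weights.items():
                    totals[key] += value
                prev_tag = gold  # simplification: train on the gold previous tag
    return defaultdict(float, {k: v / steps for k, v in totals.items()})


def tag(weights, words):
    """Tag left to right, feeding each predicted tag into the next prediction."""
    prev_tag, tagged = "<START>", []
    for word in words:
        prev_tag = predict(weights, features(word, prev_tag))
        tagged.append((word, prev_tag))
    return tagged


weights = train(TRAIN)
print(tag(weights, ["the", "cat", "walked"]))
# [('the', 'DT'), ('cat', 'NN'), ('walked', 'VBD')]
```

The test sentence never appears in the training data, so the correct tags come from shared features: the `ed` suffix and the previous tag carry over from the sentences the model has seen. Production taggers use many more features and a lazier averaging scheme, since summing every weight at every step is slow.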

$\textbf{Why POS Tag?}$
-

- POS tags have historically been used in natural language processing for a variety of purposes. Two interesting ones are speech-to-text conversion and language translation, where a powerful POS tagger can be used to disambiguate homonyms. Suppose someone says "I am going to fish a fish" and wants the sentence translated into another language such as French or Spanish. It is vital to know whether each fish is a noun or a verb: unlike in English, the target language is highly likely to use quite different words for the act of fishing and for the animal.
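
As a rough sketch of how a sequence model could resolve the two uses of fish, the example below decodes a toy Hidden Markov Model with the Viterbi algorithm and then picks a French word based on the tag. Every probability, the reduced tag set, and the two-entry `FRENCH` lexicon are invented for illustration; a real HMM would estimate its probabilities from a tagged corpus.

```python
TAGS = ["PART", "VERB", "DET", "NOUN"]

# All probabilities below are made up for this example.
START = {"PART": 0.4, "VERB": 0.1, "DET": 0.4, "NOUN": 0.1}
TRANS = {  # TRANS[previous_tag][next_tag]
    "PART": {"PART": 0.05, "VERB": 0.80, "DET": 0.10, "NOUN": 0.05},
    "VERB": {"PART": 0.10, "VERB": 0.05, "DET": 0.60, "NOUN": 0.25},
    "DET":  {"PART": 0.02, "VERB": 0.03, "DET": 0.05, "NOUN": 0.90},
    "NOUN": {"PART": 0.20, "VERB": 0.50, "DET": 0.10, "NOUN": 0.20},
}
EMIT = {  # EMIT[tag][word]; words not listed have probability 0
    "PART": {"to": 1.0},
    "VERB": {"fish": 0.5, "going": 0.5},
    "DET":  {"a": 1.0},
    "NOUN": {"fish": 1.0},
}


def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # best[i][tag] = (probability of the best path ending in tag, previous tag)
    best = [{t: (START[t] * EMIT[t].get(words[0], 0.0), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p][0] * TRANS[p][t])
            prob = best[-1][prev][0] * TRANS[prev][t] * EMIT[t].get(word, 0.0)
            column[t] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


# Hypothetical two-entry lexicon keyed by (word, tag).
FRENCH = {("fish", "VERB"): "pêcher", ("fish", "NOUN"): "poisson"}

words = ["to", "fish", "a", "fish"]
tags = viterbi(words)
print(list(zip(words, tags)))
# [('to', 'PART'), ('fish', 'VERB'), ('a', 'DET'), ('fish', 'NOUN')]
print([FRENCH[(w, t)] for w, t in zip(words, tags) if (w, t) in FRENCH])
# ['pêcher', 'poisson']
```

Note that the emission table on its own favours the noun reading of fish; it is the transition from `to` that tips the first occurrence towards the verb.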


$\textbf{POS Tag in NLTK}$
-

In [2]:
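import nltk

# Requires the NLTK tokenizer and tagger data to be installed (see nltk.download).
# Split the sentence into tokens, then tag each token with a Penn Treebank tag.
tokens = nltk.word_tokenize("The quick brown fox jumped over the lazy dog.")
nltk.pos_tag(tokens)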

[('The', 'DT'),
 ('quick', 'JJ'),
 ('brown', 'NN'),
 ('fox', 'NN'),
 ('jumped', 'VBD'),
 ('over', 'IN'),
 ('the', 'DT'),
 ('lazy', 'JJ'),
 ('dog', 'NN'),
 ('.', '.')]

$Definitions$

DT - Determiner

JJ - Adjective

NN - Noun

VBD - Verb, past tense

IN - Preposition or subordinating conjunction

$\textbf{POS Tag in spaCy}$
-

In [4]:
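import spacy

# Load the small English pipeline (it must be installed separately).
nlp = spacy.load("en_core_web_sm")

doc = nlp("The quick brown fox jumped over the lazy dog.")

# pos_ is the coarse-grained tag; tag_ is the fine-grained tag.
for token in doc:
    print(token.text, token.pos_, token.tag_)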

The DET DT
quick ADJ JJ
brown ADJ JJ
fox NOUN NN
jumped VERB VBD
over ADP IN
the DET DT
lazy ADJ JJ
dog NOUN NN
. PUNCT .
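
The two taggers do not fully agree on this sentence. The short sketch below compares the fine-grained tags, with both lists copied by hand from the outputs shown in this notebook; other library or model versions may produce different tags.

```python
# Tags copied from the NLTK and spaCy outputs shown in this notebook.
nltk_tags = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]
spacy_tags = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]

# Collect (word, NLTK tag, spaCy tag) wherever the two taggers differ.
disagreements = [
    (word, a, b)
    for (word, a), (_, b) in zip(nltk_tags, spacy_tags)
    if a != b
]
print(disagreements)  # [('brown', 'NN', 'JJ')]
```

Here the only difference is brown, which NLTK tagged as a noun and spaCy as an adjective; in this sentence it modifies fox, so the adjective reading is the expected one.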
