NLTK is a large toolkit designed to help you with the entire Natural Language Processing (NLP) workflow. It provides everything from splitting paragraphs into sentences and sentences into words, to identifying parts of speech, highlighting the main subjects, and even helping your machine understand what a text is about. In this series, we will focus on opinion mining, also known as sentiment analysis.

As we learn how to use NLTK for sentiment analysis, we will learn the following:

Tokenizing: splitting a body of text into sentences and words.
Part-of-speech tagging
Machine learning with the naive Bayes classifier
How to use scikit-learn (sklearn) together with NLTK
Training classifiers with data sets
Real-time streaming sentiment analysis with Twitter.
…and more.

In [2]:
pip install nltk



Next, we need to install some components for NLTK. Open Python in any of your usual ways and run import nltk followed by nltk.download().

Choose to download "all" for all packages, then click Download. This gives you all the tokenizers, chunkers, other algorithms, and all of the corpora. If space is an issue, you can instead pick and download only what you need. The NLTK module itself takes up approximately 7MB; the full nltk_data directory, which includes the chunkers, parsers, and corpora, is considerably larger.

If you are working on a headless machine such as a VPS, you can install everything by running Python and doing the following:


In [6]:
import nltk
nltk.download()
# At the text prompt that appears, type:
#   d    (for download)
#   all  (to download everything)

This will download everything for you.

Now that you have everything you need, let's go over some basic vocabulary:

Corpus: a body of text, singular. Corpora is the plural. Example: a collection of medical journals.
Lexicon: words and their meanings. Example: an English dictionary. Keep in mind, though, that different fields have different lexicons. To a financial investor, the first meaning of the word bull is someone who is confident about the market, whereas in everyday English the first meaning is an animal. Financial investors, doctors, children, mechanics, and so on each have their own lexicon.
Token: each "entity" that results from splitting text according to a set of rules. For example, when a sentence is tokenized into words, each word is a token. If you tokenize a paragraph into sentences, each sentence is also a token.

These are the words you will hear most often when entering the natural language processing (NLP) field, and we will cover more in time. With that, let's look at an example of splitting text into tokens with the NLTK module.


In [9]:
nltk.download('punkt')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [10]:
from nltk.tokenize import sent_tokenize, word_tokenize
EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."
print(sent_tokenize(EXAMPLE_TEXT))

['Hello Mr. Smith, how are you doing today?', 'The weather is great, and Python is awesome.', 'The sky is pinkish-blue.', "You shouldn't eat cardboard."]


At first, you might think that tokenizing by word or by sentence is a trivial matter. For many sentences, it can be. A first attempt might be a simple .split('. '), that is, splitting on a period followed by a space. Then you might bring in regular expressions to split on a period, a space, and then an uppercase letter. The problem is that things like Mr. Smith, and many others, will trip you up. Splitting by word is also a challenge, especially with contractions such as we and we're. NLTK saves you a lot of time on this seemingly simple but actually quite complicated operation.

The code above outputs a list of sentences, which you can iterate over with a for loop.
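To see why the naive approach described above falls short, here is a minimal sketch that uses only plain Python string splitting on the same example text. The helper name naive_sentences is made up for this illustration.

```python
EXAMPLE_TEXT = ("Hello Mr. Smith, how are you doing today? The weather is great, "
                "and Python is awesome. The sky is pinkish-blue. "
                "You shouldn't eat cardboard.")

def naive_sentences(text):
    """Split on a period followed by a space (a naive baseline)."""
    return [piece for piece in text.split(". ") if piece]

pieces = naive_sentences(EXAMPLE_TEXT)
for piece in pieces:
    print(repr(piece))
```

The title Mr. gets cut off from Smith, and the question stays glued to the sentence after it, because it ends in a question mark rather than a period.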


So we have created tokens, which here are sentences. This time, let's tokenize by word.

In [11]:
print (word_tokenize(EXAMPLE_TEXT))

['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'The', 'weather', 'is', 'great', ',', 'and', 'Python', 'is', 'awesome', '.', 'The', 'sky', 'is', 'pinkish-blue', '.', 'You', 'should', "n't", 'eat', 'cardboard', '.']


There are a few things to note here. First, notice that punctuation is treated as a separate token. Also, notice that the word shouldn't is separated into should and n't. Finally, notice that pinkish-blue is indeed treated as a single word, which is what it is meant to be. Cool!

Now, looking at these tokenized words, we have to start thinking about what our next step might be. We begin to wonder how we might derive meaning by looking at these words. We can clearly see how to assign value to many of them, but we also see a few that are basically worthless. These are a form of "stop words", which we can also handle. That is what we will discuss in the next section.

II. NLTK and stop words
The idea of natural language processing is to perform some form of analysis or processing such that a machine can understand, at least to some degree, what the text means, says, or implies.

This is obviously a huge challenge, but there are steps that anyone can follow. The main idea, however, is that computers do not understand words directly. Shockingly, neither do humans. In humans, memory is broken down into electrical signals in the brain, in the form of groups of neurons that fire in patterns. Much about the brain is still unknown, but the more we break it down into its basic elements, the more basic those elements turn out to be. Well, it turns out that computers store information in a very similar way! If we want to mimic how humans read and understand text, we need a method that comes as close as possible.

In general, computers represent everything with numbers, but in programming we often work with binary signals directly: True or False, which convert directly to 1 or 0 and stem from the presence (True, 1) or absence (False, 0) of an electrical signal. To do this, we need a way to convert words into numbers or signal patterns. Converting data into something a computer can understand is called "preprocessing". One of the major forms of preprocessing is filtering out useless data. In natural language processing, useless words (data) are called stop words.
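As a rough illustration of turning words into numbers, the sketch below maps a sentence onto a list of 1s and 0s that records which vocabulary words are present. The vocabulary and the helper name presence_vector are invented for this example, and this is only one simple encoding among many.

```python
def presence_vector(tokens, vocabulary):
    """Return 1 for each vocabulary word found in tokens, else 0."""
    seen = {token.lower() for token in tokens}
    return [1 if word in seen else 0 for word in vocabulary]

# Hypothetical vocabulary, chosen only for illustration
vocabulary = ["python", "weather", "great", "cardboard"]
tokens = "The weather is great and Python is awesome".split()

vector = presence_vector(tokens, vocabulary)
print(vector)
```

Each position in the result answers a True/False question ("is this word present?"), which is exactly the kind of signal a computer can work with.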

We can recognize right away that some words carry more meaning than others. We can also see that some words are plain useless filler. In English, for example, we use them to fill out sentences so that they do not sound strange. One of the most common unofficial examples is the word umm. People pad their speech with umm more than with almost any other word. It means nothing, unless perhaps we are looking for someone who lacks confidence, is confused, or has not practiced speaking much. We all do it; you can hear me say umm or uhh in the videos... uh... plenty of times. For most analysis, these words are useless.

We do not want these words taking up space in our database or using up valuable processing time. We therefore call them "stop words", because they are useless and we want to do nothing with them. The term can also be read more literally: words we stop on.

For example, you may want to stop analysis immediately if you detect words that are commonly used sarcastically. Sarcastic words and phrases will vary by lexicon and corpus. For now, we will treat stop words as words that carry no meaning, and we will remove them.

You can do this easily by storing a list of the words you consider to be stop words. NLTK gets you started with a list of words that it considers stop words, which you can access through the NLTK corpus:



In [14]:
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [15]:
from nltk.corpus import stopwords

In [16]:
set(stopwords.words('english'))

{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r

Here’s how to use the stop_words collection to remove stop words from the text:

In [27]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
example_sent = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)
# Keep only the tokens that are not stop words
filtered_sentence = [w for w in word_tokens if w not in stop_words]
# The same filter written as an explicit loop
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)
print(word_tokens)
print(filtered_sentence)

['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
['This', 'sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
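In the output above, This survives the filter because the comparison is case-sensitive and the stop word list is lowercase. A minimal sketch of a case-insensitive filter follows. To stay self-contained, it uses a small hand-written stop word set (an assumption for illustration) rather than NLTK's list, and the tokens are copied from the output above.

```python
# Hypothetical, tiny stop word set; in practice you would use
# set(stopwords.words('english')) from NLTK instead.
stop_words = {"this", "is", "a", "off", "the"}

word_tokens = ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing',
               'off', 'the', 'stop', 'words', 'filtration', '.']

# Lowercase each token only for the comparison, keeping its original form.
filtered = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered)
```

Lowercasing only inside the comparison leaves the surviving tokens untouched, which matters if capitalization is needed later, for example for part-of-speech tagging.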


III. NLTK stemming
Stemming is a form of normalization. Many variations of a word carry the same meaning, apart from tense.

The reason we stem is to shorten lookups and normalize sentences.

Consider:


I was taking a ride in the car.
I was riding in the car.
These two sentences mean the same thing. The phrase "in the car" is the same. "I" is the same. In both cases, was followed by an ing form clearly places the action in the past, so when we are just trying to figure out the meaning of this past-tense activity, is it really necessary to distinguish between riding and taking a ride?

No, not really.

This is just one small example, but imagine every word in the English language, with every possible tense and affix that can be attached to it. Keeping a separate dictionary entry for each version would be highly redundant and inefficient, especially since, once we convert to numbers, the "value" is going to be identical.
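To make the redundancy concrete, here is a deliberately crude sketch that strips a few suffixes and counts how many distinct entries remain. This is not the Porter algorithm introduced below; the suffix list, the word list, and the helper name crude_stem are all made up for illustration.

```python
from collections import Counter

# Hypothetical suffix list, far from complete
SUFFIXES = ("ing", "ed", "s")

def crude_stem(word):
    """Lowercase the word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["walk", "walks", "walked", "walking", "Walking", "talk", "talked"]

raw_counts = Counter(words)
stem_counts = Counter(crude_stem(w) for w in words)

print(len(raw_counts), "distinct raw forms")
print(dict(stem_counts))
```

Seven separate entries collapse into two once the variations are normalized. A real stemmer handles far more cases than this sketch, which would already fail on a pair like riding and ride.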

One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.

First, we import and define our stemmer:

In [28]:
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
ps = PorterStemmer()

Now let’s choose some words with similar stems, for example:

In [29]:
example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

In [30]:
for w in example_words:
    print(ps.stem(w))

python
python
python
python
pythonli


Now let's try stemming a typical sentence rather than individual words:

In [31]:
new_text = "It is important to by very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."
words = word_tokenize(new_text)
for w in words:
    print(ps.stem(w))

it
is
import
to
by
veri
pythonli
while
you
are
python
with
python
.
all
python
have
python
poorli
at
least
onc
.


Next, we will move on to something a bit more advanced in the NLTK module, part-of-speech tagging, where we use NLTK to identify the part of speech of each word in a sentence.

IV. NLTK part-of-speech tagging
One of the more powerful aspects of the NLTK module is the part-of-speech tagging it can do for you. This means labeling the words in a sentence as nouns, adjectives, verbs, and so on. Even more impressive, it also labels by tense, and more. Here is a list of the tags, what they mean, and some examples:

POS tag list:
 CC coordinating conjunction
 CD cardinal digit
 DT determiner
 EX existential there (like: "there is" ... think of it like "there exists")
 FW foreign word
 IN preposition/subordinating conjunction
 JJ adjective 'big'
 JJR adjective, comparative 'bigger'
 JJS adjective, superlative 'biggest'
 LS list marker 1)
 MD modal could, will
 NN noun, singular 'desk'
 NNS noun, plural 'desks'
 NNP proper noun, singular 'Harrison'
 NNPS proper noun, plural 'Americans'
 PDT predeterminer 'all the kids'
 POS possessive ending parent's
 PRP personal pronoun I, he, she
 PRP$ possessive pronoun my, his, hers
 RB adverb very, silently
 RBR adverb, comparative better
 RBS adverb, superlative best
 RP particle give up
 TO to go 'to' the store
 UH interjection errrrrrrrm
 VB verb, base form take
 VBD verb, past tense took
 VBG verb, gerund/present participle taking
 VBN verb, past participle taken
 VBP verb, sing. present, non-3rd person take
 VBZ verb, 3rd person sing. present takes
 WDT wh-determiner which
 WP wh-pronoun who, what
 WP$ possessive wh-pronoun whose
 WRB wh-adverb where, when

How do we use this? Along the way, we will also introduce a new sentence tokenizer called PunktSentenceTokenizer. This tokenizer is capable of unsupervised machine learning, so you can actually train it on any body of text you use. First, let's get the imports we plan to use:

In [32]:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

Now let’s create the training and test data:

In [34]:
nltk.download('state_union')

[nltk_data] Downloading package state_union to /root/nltk_data...
[nltk_data]   Unzipping corpora/state_union.zip.


True

In [35]:
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")

One is President George W. Bush's State of the Union address from 2005, and the other is his address from 2006.

Next, we can train the Punkt tokenizer as follows:


In [36]:
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

In [37]:
tokenized = custom_sent_tokenizer.tokenize(sample_text)

Now we can finish the part-of-speech tagging script by creating a function that runs through the sentences and tags the part of speech of each word, as follows:


In [39]:
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [40]:
def process_content():
    try:
        # Tag only the first five sentences to keep the output short
        for i in tokenized[:5]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            print(tagged)
    except Exception as e:
        print(str(e))
process_content()

[('PRESIDENT', 'NNP'), ('GEORGE', 'NNP'), ('W.', 'NNP'), ('BUSH', 'NNP'), ("'S", 'POS'), ('ADDRESS', 'NNP'), ('BEFORE', 'IN'), ('A', 'NNP'), ('JOINT', 'NNP'), ('SESSION', 'NNP'), ('OF', 'IN'), ('THE', 'NNP'), ('CONGRESS', 'NNP'), ('ON', 'NNP'), ('THE', 'NNP'), ('STATE', 'NNP'), ('OF', 'IN'), ('THE', 'NNP'), ('UNION', 'NNP'), ('January', 'NNP'), ('31', 'CD'), (',', ','), ('2006', 'CD'), ('THE', 'NNP'), ('PRESIDENT', 'NNP'), (':', ':'), ('Thank', 'NNP'), ('you', 'PRP'), ('all', 'DT'), ('.', '.')]
[('Mr.', 'NNP'), ('Speaker', 'NNP'), (',', ','), ('Vice', 'NNP'), ('President', 'NNP'), ('Cheney', 'NNP'), (',', ','), ('members', 'NNS'), ('of', 'IN'), ('Congress', 'NNP'), (',', ','), ('members', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('Supreme', 'NNP'), ('Court', 'NNP'), ('and', 'CC'), ('diplomatic', 'JJ'), ('corps', 'NN'), (',', ','), ('distinguished', 'JJ'), ('guests', 'NNS'), (',', ','), ('and', 'CC'), ('fellow', 'JJ'), ('citizens', 'NNS'), (':', ':'), ('Today', 'VB'), ('our', 'PRP$'), ('nat
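Since the tagger returns plain (word, tag) tuples, ordinary Python is enough to work with the result. The sketch below counts tags and pulls out nouns; the tagged list is copied from the first ten pairs of the second sentence in the output above, so no tagger needs to run here.

```python
from collections import Counter

# First ten (word, tag) pairs of the second sentence in the output above
tagged = [('Mr.', 'NNP'), ('Speaker', 'NNP'), (',', ','), ('Vice', 'NNP'),
          ('President', 'NNP'), ('Cheney', 'NNP'), (',', ','),
          ('members', 'NNS'), ('of', 'IN'), ('Congress', 'NNP')]

# How often each tag occurs
tag_counts = Counter(tag for _, tag in tagged)

# Proper nouns only, then every kind of noun (NN, NNS, NNP, NNPS)
proper_nouns = [word for word, tag in tagged if tag == 'NNP']
nouns = [word for word, tag in tagged if tag.startswith('NN')]

print(tag_counts.most_common())
print(proper_nouns)
print(nouns)
```

Matching on the tag prefix is a handy trick, because related tags in the list above share their first letters.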

V. NLTK chunking
Now that we know the parts of speech, we can move on to what is called chunking, which groups words into meaningful chunks. One of the main goals of chunking is to group so-called "noun phrases". These are phrases of one or more words that contain a noun, perhaps some descriptive words, perhaps a verb, and perhaps something like an adverb. The idea is to group nouns with the words that relate to them.

To chunk, we combine the part-of-speech tags with regular expressions. From regular expressions, we mainly want to make use of these:



+ = match 1 or more repetitions
? = match 0 or 1 repetitions
* = match 0 or more repetitions
. = any character except a newline

In [45]:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
def process_content():
    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)
            # draw() opens a GUI window, so it needs a graphical display
            chunked.draw()
    except Exception as e:
        print(str(e))
process_content()

no display name and no $DISPLAY environment variable


The main line here is:

In [46]:
chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""

Breaking this line down:

<RB.?>* : zero or more adverbs of any kind (RB, RBR, RBS), followed by:

<VB.?>* : zero or more verbs of any form, followed by:

<NNP>+ : one or more proper nouns, followed by:

<NN>? : zero or one singular noun.

Try playing around with different combinations to group various instances until you feel comfortable with chunking.
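One way to build intuition for the tag pattern is to picture the tagged sentence as a string of <TAG> markers and run an ordinary regular expression over it. The sketch below does this with Python's re module only. It is a rough approximation for illustration, not NLTK's actual implementation, and the helper names tags_to_string and find_chunks are made up. The tagged words are copied from the start of the tagger output shown earlier.

```python
import re

# Rough regex translation of {<RB.?>*<VB.?>*<NNP>+<NN>?}.
# [^<>]? stands in for ".?" so a match can never spill past one tag.
CHUNK_RE = re.compile(r"(?:<RB[^<>]?>)*(?:<VB[^<>]?>)*(?:<NNP>)+(?:<NN>)?")

def tags_to_string(tagged):
    """Encode a tagged sentence as one string of <TAG> markers."""
    return "".join(f"<{tag}>" for _, tag in tagged)

def find_chunks(tagged):
    """Return the word groups whose tag sequence matches CHUNK_RE."""
    # Character offset at which each token's <TAG> marker starts
    starts, offset = [], 0
    for _, tag in tagged:
        starts.append(offset)
        offset += len(tag) + 2  # the tag plus its < and >
    chunks = []
    for match in CHUNK_RE.finditer(tags_to_string(tagged)):
        chunks.append([word for (word, _), start in zip(tagged, starts)
                       if match.start() <= start < match.end()])
    return chunks

tagged = [('PRESIDENT', 'NNP'), ('GEORGE', 'NNP'), ('W.', 'NNP'),
          ('BUSH', 'NNP'), ("'S", 'POS'), ('ADDRESS', 'NNP'),
          ('BEFORE', 'IN'), ('A', 'NNP'), ('JOINT', 'NNP'),
          ('SESSION', 'NNP'), ('OF', 'IN')]

for chunk in find_chunks(tagged):
    print(" ".join(chunk))
```

The possessive 'S and the prepositions BEFORE and OF break the runs of proper nouns, so the sketch groups PRESIDENT GEORGE W. BUSH, then ADDRESS, then A JOINT SESSION.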

Something not covered in the video, but still a reasonable task, is actually accessing specific chunks. This is rarely mentioned, but depending on what you are doing it can be an essential step. Say you print the chunks out; you will see output like this:

(S (Chunk PRESIDENT/NNP GEORGE/NNP W./NNP BUSH/NNP) 'S/POS (Chunk ADDRESS/NNP BEFORE/NNP A/NNP JOINT/NNP SESSION/NNP OF/NNP THE/NNP CONGRESS/NNP ON/NNP THE/NNP STATE/NNP OF/NNP THE/NNP UNION/NNP January/NNP) 31/CD ,/, 2006/CD THE/DT (Chunk PRESIDENT/NNP) :/: (Chunk Thank/NNP) you/PRP All/DT ./.)

Cool, that helps us visually, but what if we want to access this data from our program? What is happening here is that our chunked variable is an NLTK tree. Each chunk is a "subtree" of that tree, while tokens outside any chunk are plain leaves. We can reference the subtrees with chunked.subtrees(), and then iterate through them like so:

In [47]:
# Run this inside process_content(), where chunked is defined
for subtree in chunked.subtrees():
    print(subtree)

The complete code:

In [48]:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
def process_content():
    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""
            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)
            print(chunked)
            # Visit only the subtrees labeled 'Chunk'
            for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
                print(subtree)
            chunked.draw()
    except Exception as e:
        print(str(e))
process_content()

(S
  (Chunk PRESIDENT/NNP GEORGE/NNP W./NNP BUSH/NNP)
  'S/POS
  (Chunk ADDRESS/NNP)
  BEFORE/IN
  (Chunk A/NNP JOINT/NNP SESSION/NNP)
  OF/IN
  (Chunk THE/NNP CONGRESS/NNP ON/NNP THE/NNP STATE/NNP)
  OF/IN
  (Chunk THE/NNP UNION/NNP January/NNP)
  31/CD
  ,/,
  2006/CD
  (Chunk THE/NNP PRESIDENT/NNP)
  :/:
  (Chunk Thank/NNP)
  you/PRP
  all/DT
  ./.)
(Chunk PRESIDENT/NNP GEORGE/NNP W./NNP BUSH/NNP)
(Chunk ADDRESS/NNP)
(Chunk A/NNP JOINT/NNP SESSION/NNP)
(Chunk THE/NNP CONGRESS/NNP ON/NNP THE/NNP STATE/NNP)
(Chunk THE/NNP UNION/NNP January/NNP)
(Chunk THE/NNP PRESIDENT/NNP)
(Chunk Thank/NNP)
no display name and no $DISPLAY environment variable


VI. NLTK chinking
You may find that after a lot of chunking, some of your chunks contain words you do not want, but you have no idea how to get rid of them with chunking alone. Chinking may be your solution.

Chinking is a lot like chunking; it is basically a way to remove a chunk from a chunk. The chunk that you remove from your chunk is your chink.

The code is very similar; you just denote the chink, after the chunk, with }{ instead of the chunk's {}.


In [49]:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
def process_content():
    try:
        for i in tokenized[5:]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            # The chink rule }...{ must sit on its own line; written on the
            # same line as the chunk rule, NLTK rejects the grammar with
            # "Illegal chunk pattern"
            chunkGram = r"""Chunk: {<.*>+}
                                    }<VB.?|IN|DT|TO>+{"""
            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)
            chunked.draw()
    except Exception as e:
        print(str(e))
process_content()


Now, the main difference is:

}<VB.?|IN|DT|TO>+{
This means that we remove from the chunk one or more verbs, prepositions, determiners, or the word to.
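The effect of a chink rule is easiest to see on a short sentence. The sketch below is a minimal example that assumes NLTK is installed; it uses hand-tagged tokens (made up for illustration) so that no tagger model or corpus download is needed, and it writes the chunk rule and the chink rule on separate lines of the grammar.

```python
import nltk

# Hand-tagged tokens (an assumption for illustration), so no tagger
# model or corpus download is required.
tagged = [("The", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# First chunk everything, then chink out verbs, prepositions,
# determiners and "to". Each rule sits on its own line.
grammar = r"""
Chunk: {<.*>+}
       }<VB.?|IN|DT|TO>+{
"""

parser = nltk.RegexpParser(grammar)
chunked = parser.parse(tagged)

# Collect the words of every subtree labeled "Chunk"
chunks = [[word for word, tag in subtree.leaves()]
          for subtree in chunked.subtrees(filter=lambda t: t.label() == "Chunk")]
print(chunked)
print(chunks)
```

Everything starts out as one big chunk; chinking then carves out The and barked at the, leaving little dog and cat as the two remaining chunks.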

Now that we have learned how to do some custom chunking and chinking, let's discuss a built-in form of chunking that comes with NLTK: named entity recognition.

VII. NLTK named entity recognition
One of the most important forms of chunking in natural language processing is called "named entity recognition". The idea is to have the machine immediately pull out "entities" such as people, places, things, locations, monetary figures, and more.

This can be a bit of a challenge, but NLTK has it built in for us. There are two main options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities by their respective type, such as people, places, locations, and so on.

Here is an example:

In [51]:
nltk.download('maxent_ne_chunker')

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.


True

In [53]:
nltk.download('words')

[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

In [54]:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
def process_content():
    try:
        for i in tokenized[5:]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            namedEnt = nltk.ne_chunk(tagged, binary=True)
            namedEnt.draw()
    except Exception as e:
        print(str(e))
process_content()

no display name and no $DISPLAY environment variable


Right away, you can see a few things. When binary is False, it picks up the same things, but splits a term like White House into White and House as if they were different entities, whereas with binary=True, the named entity recognition treats White House as part of the same named entity, which is correct.

Depending on your goals, you may or may not want the binary option. With binary set to False, these are the named entity types you can get:

NE Type and Examples
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty am, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian

Either way, you will probably find that you need to do a bit more work to get it just right, but this is pretty powerful out of the box.

In the next tutorial, we will discuss something similar to stemming, called "lemmatizing".

VIII. NLTK lemmatizing
IX. NLTK corpora
X. NLTK and WordNet
XI. NLTK text classification
XII. Using NLTK to convert words into features
XIII. NLTK naive Bayes classifier
XIV. Saving the classifier with NLTK
XV. NLTK and sklearn
XVI. Combining algorithms with NLTK
XVII. Investigating bias using NLTK
XVIII. Using NLTK to improve training data for sentiment analysis
XIX. Using NLTK to create a module for sentiment analysis
XX. NLTK Twitter sentiment analysis
XXI. Using NLTK to graph real-time Twitter sentiment analysis
XXII. Stanford NER tagger and named entity recognition
XXIII. Testing the accuracy of the NLTK and Stanford NER taggers
XXIV. Testing the speed of the NLTK and Stanford NER taggers
https://medium.datadriveninvestor.com/python-data-science-getting-started-tutorial-nltk-2d8842fedfdd