<a href="https://colab.research.google.com/github/teddius/TensorFlow_Chatbot_for_20171128_Talk_BotsHub_Meetup_Vienna/blob/master/spacy_introduction_for_pvi_WS2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Spacy Introduction for Programming Voice Interfaces WS2020
* Author: [Andreas Rath](https://github.com/teddius) - andreas.rath@ondewo.com

# NLP text preprocessing with Spacy, NLTK and spelling correction in Python

Three key libraries for NLP with Python:

* [NLTK](https://www.nltk.org/)
* [Spacy](https://spacy.io/)
* [SymSpell](https://github.com/wolfgarbe/SymSpell/)


### Installing Spacy
You can install Spacy with pip:

In [1]:
!pip install spacy tqdm 



Spacy also needs a trained language model. If the model is missing, loading it fails with the following error:

```
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem 
to be a shortcut link, a Python package or a valid path to a 
data directory.
```

To install English, use the following command:

In [2]:
# "en" is a shortcut from spaCy v2; the full package name also works:
# !python -m spacy download en_core_web_sm
!python -m spacy download en

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.6/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')


In [3]:
# Import Spacy library
import spacy

# Load the small English model (en_core_web_sm) which we just downloaded
nlp = spacy.load("en_core_web_sm")


### First step: Tokenization

Tokenization is the task of splitting text into tokens such as words, numbers and punctuation marks. Spacy does not discard punctuation; each mark becomes a token of its own.

Let's try a couple of examples:

In [4]:
doc = nlp(u"My name is Andreas!")
for token in doc:
    print(token.text)

My
name
is
Andreas
!


In [5]:
# Tokenization is not always perfect: "1." and "2." are each split in two, giving 4 tokens
doc = nlp(u"Hi Peter, can you please check the 1. and 2. example, while I board the plain to the U.K.?")
for token in doc:
    print(token.text)

Hi
Peter
,
can
you
please
check
the
1
.
and
2
.
example
,
while
I
board
the
plain
to
the
U.K.
?


In [6]:
doc = nlp(u"The URL is http://www.gmx.at.")
for token in doc:
    print(token.text)

The
URL
is
http://www.gmx.at
.
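
To get a feeling for why tokenization needs language-specific rules, compare the Spacy output above with a naive regular-expression tokenizer. This is only a sketch: `naive_tokenize` is a hypothetical helper and not how Spacy works internally. It treats every punctuation mark as a token of its own, so it handles "Andreas!" fine but tears "U.K." apart, whereas Spacy keeps it in one piece.

```python
import re

def naive_tokenize(text):
    # \w+ matches a run of letters/digits, [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

simple = naive_tokenize("My name is Andreas!")
abbreviation = naive_tokenize("the plane to the U.K.?")

print(simple)        # punctuation becomes its own token
print(abbreviation)  # the abbreviation is split into single letters and dots
```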


### Second step: Stop Words

Stop words are very frequent words that are filtered out before or after processing natural language text. Spacy ships a stop word list for each language that you can use right away.

For intent classification in voice interfaces it is important to remove filler words such as "ah", "mhm", "please" or "indeed".


In [7]:
from spacy.lang.en.stop_words import STOP_WORDS

print(STOP_WORDS)

{'again', 'none', 'what', 'whenever', 'anyone', 'one', 'whom', 'all', 'give', '‘re', 'had', 'six', 'themselves', 'you', 'those', 'rather', 'up', 'whoever', 'as', 'made', 'hereupon', 'in', 'always', 'because', 'seeming', 'bottom', 'but', 'top', 'therein', 'off', 'unless', 'who', 'first', 'such', 'several', 'either', 'former', 'might', 'thereupon', 'has', 'herself', 'five', 'thru', 'thereby', 'doing', 'to', 'serious', 'within', 'however', 'keep', 'nowhere', 'there', 'are', 'some', 'how', 'now', 'alone', 'formerly', 'namely', 'n‘t', 'about', 'along', "'m", 'n’t', 'nobody', 'which', 'whatever', 'very', 'she', 'himself', 'various', 'ours', 'since', 'still', 'him', 'well', 'eight', 'twelve', 'whose', 'will', 'beforehand', 'amount', 'they', 'via', 'even', 'least', 'throughout', 'while', 'his', 'why', 'were', 'nothing', 'another', 'anyway', 'forty', 'seems', 'this', 'until', 'these', 'whither', 'cannot', 'put', 'its', 'whence', "'d", 'most', 'own', 'our', 'her', 'whereupon', 'much', 'ever', 'y

In [8]:
doc = nlp(u"I like a red apple please")
for token in doc:
    if token.text not in STOP_WORDS:
        print(token.text)

I
like
red
apple


In [9]:
doc = nlp(u"I like one red apple please")
for token in doc:
    if token.text not in STOP_WORDS:
        print(token.text)

I
like
red
apple


In [10]:
doc = nlp(u"I would very much like a red apple please")
for token in doc:
    if token.text not in STOP_WORDS:
        print(token.text)

I
like
red
apple


**WARNING: Always check that the stop word list fits your use case**

Stop word lists are great, but they sometimes contain your "entities" or the key words you need for "intent detection". In the following examples "in", "next to" and "on top of" are removed, although they distinguish three different commands. Make sure to remove such words from the stop word list so that you do not lose them.

In [11]:
doc = nlp(u"Place red apple in box")
for token in doc:
    if token.text not in STOP_WORDS:
        print(token.text)

Place
red
apple
box


In [12]:
doc = nlp(u"Place red apple next to box")
for token in doc:
    if token.text not in STOP_WORDS:
        print(token.text)

Place
red
apple
box


In [13]:
doc = nlp(u"Place red apple on top of box")
for token in doc:
    if token.text not in STOP_WORDS:
        print(token.text)

Place
red
apple
box
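
The three commands above all collapse to the same tokens once the stop words are gone. One possible fix is to subtract the words you need from the stop word set before filtering. The sketch below is plain Python so it runs without a model: `DEMO_STOP_WORDS` is a tiny stand-in for Spacy's `STOP_WORDS` set, and `KEEP_WORDS` and `remove_stop_words` are hypothetical names chosen for illustration. Since `STOP_WORDS` is an ordinary Python set, the same set difference should work on it as well.

```python
# Stand-in for spacy.lang.en.stop_words.STOP_WORDS (assumption: tiny subset for illustration)
DEMO_STOP_WORDS = {"a", "in", "next", "to", "on", "top", "of", "please"}

# Words that carry meaning for a "place object" intent and must survive filtering
KEEP_WORDS = {"in", "next", "to", "on", "top", "of"}

custom_stop_words = DEMO_STOP_WORDS - KEEP_WORDS

def remove_stop_words(text, stop_words):
    # Compare in lower case so that capitalised words are filtered as well
    return [word for word in text.split() if word.lower() not in stop_words]

sentences = [
    "Place red apple in box",
    "Place red apple next to box",
    "Place red apple on top of box",
]
filtered = [remove_stop_words(sentence, custom_stop_words) for sentence in sentences]
for tokens in filtered:
    print(tokens)
```

With the customised set the three commands stay distinguishable, while filler words such as "a" and "please" are still removed.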


### Third step: Text Normalization 

#### Stemming


In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form. The stem does not have to be a valid word: in the examples below all four forms of "compute" are reduced to "comput".
=> https://en.wikipedia.org/wiki/Stemming




In [14]:
# Porter Stemmer

from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
tokens = ['compute', 'computer', 'computed', 'computing']
for token in tokens:
    print(token + ' --> ' + stemmer.stem(token))

compute --> comput
computer --> comput
computed --> comput
computing --> comput


In [15]:
# Snowball Stemmer (also known as Porter2) => a refinement of the original PorterStemmer
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer(language='english')
tokens = ['compute', 'computer', 'computed', 'computing']
for token in tokens:
    print(token + ' --> ' + stemmer.stem(token))

compute --> comput
computer --> comput
computed --> comput
computing --> comput
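
Both stemmers work by stripping suffixes according to rules, without consulting a dictionary. The following toy version illustrates the idea only. It is not the Porter or Snowball algorithm; `SUFFIXES`, `toy_stem` and the minimum stem length of 3 are assumptions made for this sketch.

```python
SUFFIXES = ("ing", "ed", "er", "e")  # tried in this order

def toy_stem(word, min_stem_length=3):
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        # Only strip the suffix if enough of the word is left over
        if word.endswith(suffix) and len(stem) >= min_stem_length:
            return stem
    return word

words = ["compute", "computer", "computed", "computing", "sing"]
stems = {word: toy_stem(word) for word in words}
for word, stem in stems.items():
    print(word + " --> " + stem)
```

The length guard keeps short words such as "sing" intact. Real stemmers use many more rules and conditions of this kind.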


#### Lemmatization

Lemmatization reduces a word to its lemma, the canonical form under which it is listed in a dictionary ("dictionary form"). Unlike a stem, a lemma is always a valid word.
=> https://en.wikipedia.org/wiki/Lemmatisation

In [16]:
# tqdm draws the progress bar used in lemmatize()
from tqdm.notebook import tqdm

# Lemmatize a list of sentences
def lemmatize(sentence_list, nlp):
    new_norm=[]
    print("Lemmatizing Sentences")
    for sentence in tqdm(sentence_list):
        new_norm.append(lemmatize_text(sentence, nlp).strip())
    return new_norm

# Lemmatization is language dependent hence we need to pass Spacy "nlp" object 
def lemmatize_text(sentence, nlp):
    sent = ""
    doc = nlp(sentence)
    for token in doc:
        if '@' in token.text:
            sent+=" @MENTION"
        elif '#' in token.text:
            sent+= " #HASHTAG"
        else:
            sent+=" "+token.lemma_
    return sent

In [17]:
lemmatize_text(u"Place red apples on top of two boxes", nlp)

' place red apple on top of two box'

In [18]:
# Note: "-PRON-" in the output is the placeholder lemma that spaCy v2 assigns to all pronouns
lemmatize_text(u"I went home from his parents", nlp)

' -PRON- go home from -PRON- parent'

#### Spelling correction

Spelling correction replaces a misspelled word with the most likely intended word :-)

We will use **SymSpell** for spelling correction!

SymSpell describes itself as: **1 million times faster through Symmetric Delete spelling correction algorithm**

According to the project README, the Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is reported to be six orders of magnitude faster than the standard approach (deletes + transposes + replaces + inserts) and is language independent.

https://github.com/wolfgarbe/SymSpell
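
The core idea can be sketched in a few lines of plain Python: generate only delete variants, both for the dictionary terms (once, up front) and for the input word (at lookup time), and match the two sets. This is a simplified illustration and not the SymSpell implementation. The names `deletes`, `build_index` and `candidates` are hypothetical, and the real library additionally verifies the edit distance of each candidate and ranks candidates by word frequency.

```python
def deletes(word, max_distance):
    """Return the word plus all strings created by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results

def build_index(vocabulary, max_distance):
    # Map every delete variant to the dictionary terms it was derived from
    index = {}
    for term in vocabulary:
        for variant in deletes(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index

def candidates(word, index, max_distance):
    # A term is a candidate if it shares a delete variant with the input word
    found = set()
    for variant in deletes(word, max_distance):
        found |= index.get(variant, set())
    return found

vocabulary = ["third", "quarter", "year", "learned"]
index = build_index(vocabulary, max_distance=2)

print(candidates("dhird", index, max_distance=2))
print(candidates("qarter", index, max_distance=2))
```

"dhird" and "third" meet at the shared variant "hird", so a replaced character is found without ever generating replacements, inserts or transpositions.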


In [19]:
# Symspell 
!pip install symspellpy



In [20]:
from symspellpy.symspellpy import SymSpell, Verbosity
from tqdm.notebook import tqdm
import re, string, json
from itertools import islice
import pkg_resources

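# Maximum edit distance for which delete variants are precomputed
max_edit_distance_dictionary = 3
# Length of the word prefixes used for spell checking
prefix_length = 4

spellchecker = SymSpell(max_edit_distance_dictionary, prefix_length)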

In [21]:
# Load word frequency dictionary 
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
spellchecker.load_dictionary(dictionary_path, term_index=0, count_index=1)

# Print out first 5 elements to show dictionary is successfully loaded
print(list(islice(spellchecker.words.items(), 5)))

[('the', 23135851162), ('of', 13151942776), ('and', 12997637966), ('to', 12136980858), ('a', 9081174698)]


**You can also create your own dictionary from plain text file** 

Input text file:
`abc abc-def abc_def abc'def abc qwe qwe1 1qwe q1we 1234 1234`

You can create a dictionary from the file using create_dictionary() as in https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html:

```
from symspellpy import SymSpell

sym_spell = SymSpell()
corpus_path = "path/to/plain/text/file.txt"  # placeholder: replace with the path to your own corpus
sym_spell.create_dictionary(corpus_path)

print(sym_spell.words)
```
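
If you prefer to control the counting yourself, you can also write a frequency file in the same "term count" layout as the bundled English dictionary and pass it to `load_dictionary(path, term_index=0, count_index=1)`. The sketch below uses only the standard library. The sample corpus, the whitespace tokenization and the file name are assumptions for illustration, and the tokenization is simpler than what `create_dictionary()` does.

```python
import os
import tempfile
from collections import Counter

# Hypothetical mini corpus; in practice read your domain texts from disk
corpus = "the cat sat on the mat and the dog sat too"

# Count how often each lower-cased, whitespace-separated word occurs
counts = Counter(corpus.lower().split())

dictionary_path = os.path.join(tempfile.mkdtemp(), "my_frequency_dictionary.txt")
with open(dictionary_path, "w", encoding="utf-8") as handle:
    for term, count in counts.most_common():
        handle.write(f"{term} {count}\n")  # one "term count" pair per line

with open(dictionary_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines[:2])
```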



In [22]:
# Inspired by article https://towardsdatascience.com/text-normalization-7ecc8e084e31
# and examples taken from https://colab.research.google.com/drive/1U_C_4wAtlWQdaA84yVwHUCdkvQWEd7r9 

def _reduce_exaggerations(text):
    # Auxiliary function to help with exaggerated words.
    # Examples: woooooords -> words,  yaaaaaaaaaaaaaaay -> yay
    # Caveat: legitimate double letters are collapsed too (good -> god).
    correction = str(text)
    return re.sub(r'([\w])\1+', r'\1', correction)

def is_numeric(text):
    for char in text:
        if not (char in "0123456789" or char in ",%.$"):
            return False
    return True

def spell_correction(sentence_list, 
                     max_edit_distance_dictionary= 3,
                     prefix_length = 4):
    # Load word frequency dictionary 
    dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
    spellchecker.load_dictionary(dictionary_path, term_index=0, count_index=1)
    norm_sents = []
    print("Spell correcting")
    for sentence in tqdm(sentence_list):
        norm_sents.append(spell_correction_text(sentence,
                                                spellchecker,
                                                max_edit_distance_dictionary,
                                                prefix_length))
    return norm_sents

def spell_correction_text(text, 
                          spellchecker, 
                          max_edit_distance_dictionary= 3,
                          prefix_length = 4):
    """
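    Very simple spell correction using the symspellpy module. The text is
    split on whitespace and each unknown token is replaced by the top
    suggestion. Note: max_edit_distance_dictionary and prefix_length are
    accepted but not used in this function; both are fixed when the
    SymSpell object is constructed.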
    """
    if len(text) < 1:
        return ""
    #Spell checker config
    max_edit_distance_lookup = 2
    suggestion_verbosity = Verbosity.TOP # TOP, CLOSEST, ALL
    #End of Spell checker config
    token_list = text.split()
    for word_pos in range(len(token_list)):
        word = token_list[word_pos]
        if word is None:
            token_list[word_pos] = ""
            continue
        if ('\n' not in word
                and word not in string.punctuation
                and not is_numeric(word)
                and word.lower() not in spellchecker.words):
            suggestions = spellchecker.lookup(word.lower(), suggestion_verbosity, max_edit_distance_lookup)
            #Checks first uppercase to conserve the case.
            upperfirst = word[0].isupper()
            #Checks for correction suggestions.
            if len(suggestions) > 0:
                correction = suggestions[0].term
                replacement = correction
            #We call our _reduce_exaggerations function if no suggestion is found. Maybe there are repeated chars.
            else:
                replacement = _reduce_exaggerations(word)
            #Takes the case back to the word.
            if upperfirst:
                replacement = replacement[0].upper()+replacement[1:]
            word = replacement
            token_list[word_pos] = word
    return " ".join(token_list).strip()

In [23]:
sentence_original="in te dhird qarter oflast jear he had elarned aoubt namials"
sentence_corrected = spell_correction_text(sentence_original,
                                           spellchecker, 
                                           max_edit_distance_dictionary= 10,
                                           prefix_length = 1)
print("Original:  " + sentence_original)
print("Corrected: " + sentence_corrected)

Original:  in te dhird qarter oflast jear he had elarned aoubt namials
Corrected: in the third quarter oblast year he had learned doubt animals
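
One detail of `_reduce_exaggerations` is worth a closer look: its regular expression collapses every run of a repeated character to a single character, which also damages words with legitimate double letters. The sketch below compares that pattern with a more conservative variant. `collapse_all` and `collapse_long_runs` are hypothetical names, and the variant is a suggestion rather than part of the original notebook.

```python
import re

def collapse_all(text):
    # Same pattern as _reduce_exaggerations: any repeated character becomes one
    return re.sub(r"(\w)\1+", r"\1", text)

def collapse_long_runs(text):
    # Variant: only runs of three or more are shortened, and to two characters
    return re.sub(r"(\w)\1{2,}", r"\1\1", text)

for word in ["woooooords", "yaaaaay", "good"]:
    print(word, "->", collapse_all(word), "|", collapse_long_runs(word))
```

The variant leaves "good" alone and turns "woooooords" into "woords", which is close enough for the spell checker to fix in a second pass.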


### Fourth step: Part-of-speech tagging

You can also obtain the part-of-speech tag for each word. Common parts of speech include nouns, verbs, pronouns and adjectives. For the full list see https://spacy.io/api/annotation

Examples of Universal Part-of-speech tags:
* ADJ: adjective (big, old, green, incomprehensible, first)
* ADP: adposition (in, to, during)
* ADV: adverb (very, tomorrow, down, where, there)
* AUX: auxiliary (is, has (done), will (do), should (do))
* CONJ: conjunction (and, or, but)
* CCONJ: coordinating conjunction (and, or, but)
* DET: determiner (a, an, the)
* INTJ: interjection (psst, ouch, bravo, hello)
* NOUN: noun (girl, cat, tree, air, beauty)
* NUM: numeral (1, 2017, one, seventy-seven, IV, MMXIV)
* PART: particle (’s, not)
* PRON: pronoun (I, you, he, she, myself, themselves, somebody)
* PROPN: proper noun (Mary, John, London, NATO, HBO)
* PUNCT: punctuation (., (, ), ?)
* SCONJ: subordinating conjunction (if, while, that)
* SYM: symbol ($, %, §, ©, +, −, ×, ÷, =, :), 😝)
* VERB: verb (run, runs, running, eat, ate, eating)
* X: other (sfpksdpsxmsa)
* SPACE: space

In [24]:
doc = nlp(u"My name is Andreas.")
for word in doc:  
    print(word.text,  word.pos_)

My DET
name NOUN
is AUX
Andreas PROPN
. PUNCT


In [25]:
doc = nlp(u"Peter has two cats and one dog!")
for word in doc:  
    print(word.text,  word.pos_)

Peter PROPN
has AUX
two NUM
cats NOUN
and CCONJ
one NUM
dog NOUN
! PUNCT


In [26]:
doc = nlp(u"Peter (peter@coolguy.com, www.github.com/peter) has two cats and 1 dog!")
for word in doc:
    if word.pos_ not in ("PUNCT", "CCONJ"):
        print(word.text,  word.pos_)

Peter PROPN
peter@coolguy.com X
www.github.com/peter PROPN
has AUX
two NUM
cats NOUN
1 NUM
dog NOUN


Spacy tokens also carry attributes that tell you whether a token looks like a number, an email address or a URL.

In [27]:
# Print our Spacy doc
doc = nlp(u"Peter (peter@coolguy.com, https://github.com/peter) has two cats and 1 dog!")
print(f"---\n Spacy doc: '{doc}''")
for word in doc:
    print(f"---\n Word: '{word}''")
    print(f"'{word}' is like number? {word.like_num}")
    print(f"'{word}' is like email? {word.like_email}")
    print(f"'{word}' is like url? {word.like_url}")

---
 Spacy doc: 'Peter (peter@coolguy.com, https://github.com/peter) has two cats and 1 dog!''
---
 Word: 'Peter''
'Peter' is like number? False
'Peter' is like email? False
'Peter' is like url? False
---
 Word: '(''
'(' is like number? False
'(' is like email? False
'(' is like url? False
---
 Word: 'peter@coolguy.com''
'peter@coolguy.com' is like number? False
'peter@coolguy.com' is like email? True
'peter@coolguy.com' is like url? False
---
 Word: ',''
',' is like number? False
',' is like email? False
',' is like url? False
---
 Word: 'https://github.com/peter''
'https://github.com/peter' is like number? False
'https://github.com/peter' is like email? False
'https://github.com/peter' is like url? True
---
 Word: ')''
')' is like number? False
')' is like email? False
')' is like url? False
---
 Word: 'has''
'has' is like number? False
'has' is like email? False
'has' is like url? False
---
 Word: 'two''
'two' is like number? True
'two' is like email? False
'two' is like url? False


# Named Entity Recognition (NER) with Spacy

Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.


## Rule-based matching



In [28]:
import spacy

# Import the Matcher
from spacy.matcher import Matcher

# Load a model and create the nlp object
nlp = spacy.load("en_core_web_sm")

# Initialize the matcher with the shared vocab
matcher = Matcher(nlp.vocab)

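# Define the token patterns.
# "OP" controls how often a token pattern may match:
#   "!"  negation: match 0 times
#   "?"  optional: match 0 or 1 times
#   "+"  match 1 or more times
#   "*"  match 0 or more times
# Without "OP" a token pattern matches exactly once, which "1" spells out here.
pattern_1 = [
    {"TEXT": "iPhone"},
    {"POS": "NUM", "OP": "1"},
]
pattern_2 = [
    {"POS": "NUM", "OP": "1"},
    {"TEXT": "EUR"},
]

# Add the patterns to the matcher. This is the spaCy v2 signature, where the
# second argument is an optional on_match callback; spaCy v3 expects
# matcher.add("PATTERN_1", [pattern_1]) instead.
matcher.add("PATTERN_1", None, pattern_1)
matcher.add("PATTERN_2", None, pattern_2)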

# Process some text
doc = nlp("Upcoming iPhone 5 230 EUR with release date is 23.12.2020")

# Call the matcher on the doc
matches = matcher(doc)

# Iterate over the matches
for match_id, start, end in matches:
    # Get the matched span
    matched_span = doc[start:end]
    print(matched_span.text)

iPhone 5
230 EUR


## Machine-learning based matching

Spacy comes out of the box with pre-trained named entity recognition models for organizations, countries, dates, money, books, persons, etc.

=> pretty cool, have a look at https://spacy.io/api/annotation#named-entities
```
TYPE        DESCRIPTION
PERSON      People, including fictional.
NORP        Nationalities or religious or political groups.
FAC         Buildings, airports, highways, bridges, etc.
ORG         Companies, agencies, institutions, etc.
GPE         Countries, cities, states.
LOC         Non-GPE locations, mountain ranges, bodies of water.
PRODUCT     Objects, vehicles, foods, etc. (Not services.)
EVENT       Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART Titles of books, songs, etc.
LAW         Named documents made into laws.
LANGUAGE    Any named language.
DATE        Absolute or relative dates or periods.
TIME        Times smaller than a day.
PERCENT     Percentage, including ”%“.
MONEY       Monetary values, including unit.
QUANTITY    Measurements, as of weight or distance.
ORDINAL     “first”, “second”, etc.
CARDINAL    Numerals that do not fall under another type.
```



In [29]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple wants to buy two U.K. startup in London headed by German Peter Miller for €1 billion next year")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Apple 0 5 ORG
two 19 22 CARDINAL
U.K. 23 27 GPE
London 39 45 GPE
German 56 62 NORP
Peter Miller 63 75 PERSON
€1 billion 80 90 MONEY
next year 91 100 DATE
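
The `start_char` and `end_char` values are plain character offsets into the original text, so they can be used for slicing or for marking up the sentence. The sketch below reuses the offsets printed above and needs no model. `annotate` is a hypothetical helper, and the bracket notation is just one possible way to display the result.

```python
text = ("Apple wants to buy two U.K. startup in London headed by "
        "German Peter Miller for €1 billion next year")

# (start_char, end_char, label) triples copied from the output above
entities = [
    (0, 5, "ORG"), (19, 22, "CARDINAL"), (23, 27, "GPE"), (39, 45, "GPE"),
    (56, 62, "NORP"), (63, 75, "PERSON"), (80, 90, "MONEY"), (91, 100, "DATE"),
]

def annotate(text, entities):
    # Copy the text piece by piece and wrap every entity span as [text](LABEL)
    parts, cursor = [], 0
    for start, end, label in sorted(entities):
        parts.append(text[cursor:start])
        parts.append(f"[{text[start:end]}]({label})")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)

annotated = annotate(text, entities)
print(annotated)
```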


In [30]:
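from spacy import displacy
from IPython.display import display, HTML

# jupyter=False makes render() return the markup as a string instead of displaying it directly
html = displacy.render(doc, style="ent", jupyter=False)
display(HTML(html))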

In [31]:
from spacy import displacy

# Visualize the dependency parse of the same doc
displacy.render(doc, style='dep', jupyter=True, options={'distance': 120})


Spacy also allows you to train your own NER pipeline on your domain vocabulary, e.g. for a voice user interface, chatbot or voicebot. See https://spacy.io/api/annotation#named-entities

In [32]:
TRAIN_DATA = [('what is the price of polo?', {'entities': [(21, 25, 'PrdName')]}), 
              ('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}), 
              ('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of jeans?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of bat?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of shirt?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of bag?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of cup?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of jug?', {'entities': [(21, 24, 'PrdName')]}), 
              ('what is the price of plate?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of glass?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of moniter?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of desktop?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of bottle?', {'entities': [(21, 27, 'PrdName')]}), 
              ('what is the price of mouse?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of keyboad?', {'entities': [(21, 28, 'PrdName')]}), 
              ('what is the price of chair?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of table?', {'entities': [(21, 26, 'PrdName')]}), 
              ('what is the price of watch?', {'entities': [(21, 26, 'PrdName')]})]
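
Counting character offsets by hand is error-prone, and an offset that is off by one silently teaches the model the wrong span. A small helper can compute the offsets from a template instead. This is a sketch under assumptions: `make_example`, `entity_texts` and the `{product}` placeholder are hypothetical names, and the helper only supports one entity per sentence.

```python
def make_example(template, product, label="PrdName"):
    # The entity starts where the {product} placeholder sits in the template
    start = template.index("{product}")
    end = start + len(product)
    text = template.format(product=product)
    return (text, {"entities": [(start, end, label)]})

def entity_texts(example):
    # Slice the annotated spans back out of the text to check them by eye
    text, annotations = example
    return [text[start:end] for start, end, _ in annotations["entities"]]

template = "what is the price of {product}?"
examples = [make_example(template, product) for product in ["polo", "t-shirt", "bottle"]]

for example in examples:
    print(example, "->", entity_texts(example))
```

The generated tuples have the same shape as the entries of `TRAIN_DATA`, and `entity_texts` can be run over the hand-written list as a sanity check.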

In [33]:
import random

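def train_spacy(data, iterations, nlp=None):
    # Note: this function uses the spaCy v2 training API
    # (create_pipe, begin_training, update with raw texts and annotations).
    if nlp is None:
        # spacy.blank('en') creates a blank Language class; create it per call
        # rather than as a default argument, which would be shared between calls
        nlp = spacy.blank('en')
    # Copy the list so that random.shuffle() does not reorder the caller's data
    training_data = list(data)
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        print("Create new NER pipe...")
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        print("Get existing NER pipe...")
        ner = nlp.get_pipe("ner")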

    # add labels
    for _, annotations in training_data:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Statring iteration " + str(itn))
            random.shuffle(training_data)
            losses = {}
            for text, annotations in training_data:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.35,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp

In [34]:
# Train our custom spacy model

# specify the training data and the number of training iteration
training_iterations=40
nlp_blank_model = spacy.blank('en') # creates blank Language class
nlp = train_spacy(TRAIN_DATA, training_iterations, nlp_blank_model)

Create new NER pipe...
Statring iteration 0
{'ner': 56.68599391818498}
Statring iteration 1
{'ner': 5.28018560021256}
Statring iteration 2
{'ner': 2.081231088360586}
Statring iteration 3
{'ner': 1.9855227574135828}
Statring iteration 4
{'ner': 1.8048718859922444}
Statring iteration 5
{'ner': 1.5616133593668349}
Statring iteration 6
{'ner': 6.320991190412769}
Statring iteration 7
{'ner': 0.9403462220264008}
Statring iteration 8
{'ner': 0.9812280102499094}
Statring iteration 9
{'ner': 2.151801657317973}
Statring iteration 10
{'ner': 2.440918168532199}
Statring iteration 11
{'ner': 1.754771351208338}
Statring iteration 12
{'ner': 2.2679379770854853}
Statring iteration 13
{'ner': 1.8323523977110472}
Statring iteration 14
{'ner': 1.2504291374232614}
Statring iteration 15
{'ner': 0.36129471359205617}
Statring iteration 16
{'ner': 0.43554425108106193}
Statring iteration 17
{'ner': 0.1593883594172609}
Statring iteration 18
{'ner': 0.03863711543412125}
Statring iteration 19
{'ner': 0.0339309389

In [35]:
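# Test your model with a text which is very similar to the training data.
# Note: the leading u' sits inside the string literal, so it is part of the text
# and shifts all reported character offsets by two.
test_text = "u'what is the price of a bat?"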
doc = nlp(test_text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

a bat 23 28 PrdName


In [36]:
# Test a shorter variation of the text (the stray u' is again part of the string)
test_text = "u'price of bat?"
doc = nlp(test_text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

bat 11 14 PrdName


In [37]:
# Test your model with variations of text 
test_text = "u'what is the price of a bat and a pollo?"
# test_text = "u'what is the price of one bat and two pollo?"
# test_text = "u'price bat and three pollo?"
# test_text = "u'price of bat and three pollo?"

doc = nlp(test_text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

a bat and a pollo 23 40 PrdName


In [38]:
# Save your trained model to disk and load it again

modelfile = "model_file_1.ner.spacy"
# save to disk (to_disk creates a directory with this name)
nlp.to_disk(modelfile)

# load model from disk
nlp = spacy.load(modelfile)