In this exercise, you’ll prepare a spaCy pipeline to train the entity recognizer to recognize "GADGET" entities in a text – for example, “iPhone X”.

- Create a blank "en" model, for example using the spacy.blank method.
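- Create a new entity recognizer and add it to the pipeline. In spaCy v3, nlp.add_pipe("ner") does both and returns the component (spaCy v2 used nlp.create_pipe followed by nlp.add_pipe).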
- Add the new label "GADGET" to the entity recognizer using the add_label method on the pipeline component.

In [4]:
import spacy

# Create a blank "en" model
nlp = spacy.blank("en")

# Create a new entity recognizer and add it to the pipeline.
# add_pipe returns the component, so keep a reference to it.
ner = nlp.add_pipe("ner")

# Add the label "GADGET" to the entity recognizer
ner.add_label("GADGET")

1
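A quick way to confirm the setup is to inspect the pipeline and the component's labels. The sketch below assumes spaCy v3 is installed; the variable names added and added_again are only illustrative. The 1 shown above is the return value of add_label, which reports whether the label was newly added.

```python
import spacy

# Build the same pipeline as in the cell above
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# add_label returns 1 when the label is new and 0 when it already exists
added = ner.add_label("GADGET")
added_again = ner.add_label("GADGET")

print(nlp.pipe_names)      # names of the components in the pipeline
print(ner.labels)          # labels known to the entity recognizer
print(added, added_again)
```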

Let’s write a simple training loop from scratch!

The pipeline you’ve created in the previous exercise is available as the nlp object. It already contains the entity recognizer with the added label "GADGET".

The small set of labelled examples that you’ve created previously is available as TRAINING_DATA. To see the examples, you can print them in your script.
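For orientation, here is a minimal sketch of the shape such training data usually takes: a list of (text, annotations) pairs, where annotations holds character offsets under an "entities" key. The helper annotate and the two sample sentences are hypothetical and are not taken from gadgets.json.

```python
def annotate(text, phrase, label="GADGET"):
    """Hypothetical helper: build one (text, annotations) pair by locating
    the first occurrence of phrase in text."""
    start = text.index(phrase)
    end = start + len(phrase)  # end offset is exclusive
    return (text, {"entities": [(start, end, label)]})


# Illustrative examples in the assumed TRAINING_DATA format
examples = [
    annotate("How to preorder the iPhone X", "iPhone X"),
    annotate("iPhone X is coming", "iPhone X"),
]

# Slicing the text with the offsets should give back the entity string
for text, annotations in examples:
    for start, end, label in annotations["entities"]:
        print(repr(text[start:end]), label)
```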

- Call nlp.initialize to get an optimizer (spaCy v2 called this nlp.begin_training), then create a training loop for 10 iterations and shuffle the training data on each iteration.
- Create batches of training data using spacy.util.minibatch and iterate over the batches.
- Convert each (text, annotations) tuple in the batch to an Example object, for instance with Example.from_dict.
- For each batch, use nlp.update to update the model with the list of examples.

In [62]:
import json
import random

import spacy
from spacy.training import Example

# Load the labelled examples: a list of [text, annotations] pairs
with open("gadgets.json", encoding="utf8") as f:
    TRAINING_DATA = json.load(f)

# Recreate the pipeline from the previous exercise
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("GADGET")

# Start the training: initialize the weights and get an optimizer
optimizer = nlp.initialize()

# Loop for 10 iterations
for itn in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}
    # Update on one example at a time (effectively a batch size of 1)
    for text, annotations in TRAINING_DATA:
        # Create an Example from the raw text and its annotations
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        # Update the model; losses accumulate within each iteration
        nlp.update([example], sgd=optimizer, losses=losses)
        print(losses)

{'ner': 3.3333334922790527}
{'ner': 10.57428377866745}
{'ner': 15.18784213066101}
{'ner': 19.61316305398941}
{'ner': 26.33499163389206}
{'ner': 30.953143417835236}
{'ner': 3.9773558974266052}
{'ner': 6.107253015041351}
{'ner': 9.60909390449524}
{'ner': 11.454748257994652}
{'ner': 12.618445686995983}
{'ner': 13.592833898961544}
{'ner': 0.784626726526767}
{'ner': 5.281341481837444}
{'ner': 7.676397502771579}
{'ner': 9.67427089868579}
{'ner': 9.696120108070318}
{'ner': 11.379895677411696}
{'ner': 1.1947878953651525}
{'ner': 2.302857048867736}
{'ner': 3.186356058147794}
{'ner': 3.759413271651283}
{'ner': 3.759849281462266}
{'ner': 4.085408241021469}
{'ner': 1.1526227818833945}
{'ner': 1.1526492638080867}
{'ner': 1.186012097821731}
{'ner': 1.1945256181799238}
{'ner': 1.20071875577624}
{'ner': 1.2018156952802936}
{'ner': 0.00012432494122993631}
{'ner': 0.0002371366704680966}
{'ner': 0.0002513230536865807}
{'ner': 0.00025984274499027404}
{'ner': 1.8531933726453627}
{'ner': 1.8531933967555805}
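The cell above updates the model on one example at a time. A batched variant that follows the exercise steps with spacy.util.minibatch might look like the sketch below. It assumes spaCy v3; the TRAIN list is a hypothetical stand-in for TRAINING_DATA, and the batch size of 2 is an arbitrary choice.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Hypothetical stand-in for TRAINING_DATA (not the contents of gadgets.json)
TRAIN = [
    ("How to preorder the iPhone X", {"entities": [(20, 28, "GADGET")]}),
    ("iPhone X is coming", {"entities": [(0, 8, "GADGET")]}),
    ("Should I pay $1,000 for the iPhone X?", {"entities": [(28, 36, "GADGET")]}),
    ("I need a new phone! Any tips?", {"entities": []}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("GADGET")

# Initialize the model weights and get an optimizer
optimizer = nlp.initialize()

for itn in range(10):
    random.shuffle(TRAIN)
    losses = {}
    # Split the shuffled data into batches of (at most) 2 examples
    for batch in minibatch(TRAIN, size=2):
        # Convert each (text, annotations) pair into an Example
        examples = [
            Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in batch
        ]
        # One update per batch; losses accumulate within the iteration
        nlp.update(examples, sgd=optimizer, losses=losses)
    # Print once per iteration rather than once per update
    print(itn, losses)
```

Printing the losses once per iteration makes it easier to see whether the accumulated loss falls from one pass over the data to the next.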