# Training A Neural Network Model - II

## Training loop

The steps of a training loop:
1. Loop over the training data for a number of iterations.
2. Shuffle the training data.
3. Divide the data into batches.
4. Update the model for each batch.
5. Save the updated model.
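
The steps above can be sketched in plain Python, without spaCy. This is only an illustrative skeleton: the function name train, the update callback and the seed parameter are assumptions made for this example, and saving the model (step 5) is left to the caller.

```python
import random


def train(examples, update, n_iter=10, batch_size=2, seed=0):
    """Generic training-loop skeleton (hypothetical helper, not a spaCy API)."""
    rng = random.Random(seed)
    examples = list(examples)  # copy so the caller's list is not reordered
    n_updates = 0
    for _ in range(n_iter):                             # 1. loop a number of times
        rng.shuffle(examples)                           # 2. shuffle the data
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]  # 3. divide into batches
            update(batch)                               # 4. update on each batch
            n_updates += 1
    return n_updates                                    # 5. saving is left to the caller


# Stand-in "update" that only records the batches it receives
seen_batches = []
total = train(range(6), seen_batches.append, n_iter=10, batch_size=2)
print(total)  # 10 iterations x 3 batches = 30 updates
```

With six examples and a batch size of 2 there are three updates per iteration, which is why the spaCy loop later in this notebook prints 30 loss values.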

In [120]:
import spacy

In [121]:
# Create an NLP object
from spacy.lang.en import English
nlp = English()

In [122]:
# Import the Doc class
from spacy.tokens import Doc, Span, Token

In [None]:
!python -m spacy download en_core_web_sm

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


spaCy's rule-based Matcher is a great way to quickly create training data for named entity models. A list of sentences is available as the variable TEXTS.

We want to find all mentions of different iPhone models, so we can create training data to teach a model to recognize them as 'GADGET'.

In [123]:
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)

In [124]:
TEXTS = ['How to preorder the iPhone X',
 'iPhone X is coming',
 'Should I pay $1,000 for the iPhone X?',
 'The iPhone reviews are here',
 'Your iPhone goes up to 11 today',
 'I need a new phone! Any tips?']

### Building a training loop

Let's write a simple training loop from scratch!

The loop needs two things: an nlp object whose pipeline contains an entity recognizer with the added label 'GADGET', and a small set of labelled examples in the global variable TRAINING_DATA. Both are created in the cells below.

In [125]:
import random

In [126]:
nlp = spacy.load('en_core_web_sm')

In [127]:
from spacy.lang.en import English
nlp = English()

In [128]:
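# A token whose lowercase form is 'iphone', optionally followed by a digit token
pattern3 = [{'LOWER': 'iphone'}, {'IS_DIGIT': True, 'OP': '?'}]

# Add the pattern to the matcher (spaCy v2 signature;
# in spaCy v3 this is matcher.add('GADGET', [pattern3]))
matcher.add('GADGET', None, pattern3)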

In [129]:
# Create a Doc object for each text in TEXTS
for doc in nlp.pipe(TEXTS):
    # Find the matches in the doc
    matches = matcher(doc)
    
    # Get a list of (start, end, label) tuples of matches in the text
    entities = [(start, end, 'GADGET') for match_id, start, end in matches]
    print(doc.text, entities)    

How to preorder the iPhone X [(4, 5, 'GADGET')]
iPhone X is coming [(0, 1, 'GADGET')]
Should I pay $1,000 for the iPhone X? [(7, 8, 'GADGET')]
The iPhone reviews are here [(1, 2, 'GADGET')]
Your iPhone goes up to 11 today [(1, 2, 'GADGET')]
I need a new phone! Any tips? []
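
Every match above is a single token because the optional second token must be a digit: 'X' is not one, and in the fifth sentence '11' does not directly follow 'iPhone'. The sketch below imitates the pattern in plain Python to make that visible. It is an assumption-laden illustration, not spaCy's implementation: match_iphone is a hypothetical helper, it splits on whitespace instead of using spaCy's tokenizer, and the sentence 'The iPhone 8 is here' is made up for the example.

```python
def match_iphone(tokens):
    """Return (start, end) token spans for 'iphone' optionally followed by a digit token."""
    spans = []
    for i, token in enumerate(tokens):
        if token.lower() == 'iphone':
            # The pattern matches with the optional token absent...
            spans.append((i, i + 1))
            # ...and also with it present, if the next token is all digits
            if i + 1 < len(tokens) and tokens[i + 1].isdigit():
                spans.append((i, i + 2))
    return spans


print(match_iphone('How to preorder the iPhone X'.split()))     # [(4, 5)]
print(match_iphone('Your iPhone goes up to 11 today'.split()))  # [(1, 2)]
print(match_iphone('The iPhone 8 is here'.split()))             # [(1, 2), (1, 3)]
```

The last call shows that an optional token can produce two overlapping spans for one mention. If the training texts contained such mentions, the overlaps would likely need filtering before being used as entity annotations.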


In [130]:
TRAINING_DATA = []

# Create a Doc object for each text in TEXTS
for doc in nlp.pipe(TEXTS):
    # Match on the doc and create a list of matched spans
    spans = [doc[start:end] for match_id, start, end in matcher(doc)]
    # Get (start character, end character, label) tuples of matches
    entities = [(span.start_char, span.end_char, 'GADGET') for span in spans]
    
    # Format the matches as a (doc.text, entities) tuple
    training_example = (doc.text, {'entities': entities})
    # Append the example to the training data
    TRAINING_DATA.append(training_example)
    
print(*TRAINING_DATA, sep='\n')    

('How to preorder the iPhone X', {'entities': [(20, 26, 'GADGET')]})
('iPhone X is coming', {'entities': [(0, 6, 'GADGET')]})
('Should I pay $1,000 for the iPhone X?', {'entities': [(28, 34, 'GADGET')]})
('The iPhone reviews are here', {'entities': [(4, 10, 'GADGET')]})
('Your iPhone goes up to 11 today', {'entities': [(5, 11, 'GADGET')]})
('I need a new phone! Any tips?', {'entities': []})
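
Note that the matcher returns token indices, for example (4, 5), while the training data stores character offsets, for example (20, 26). A quick way to sanity-check character offsets is to slice each text with them. The sketch below copies the examples printed above; entity_texts is a hypothetical helper name introduced for this check.

```python
TRAINING_DATA = [
    ('How to preorder the iPhone X', {'entities': [(20, 26, 'GADGET')]}),
    ('iPhone X is coming', {'entities': [(0, 6, 'GADGET')]}),
    ('Should I pay $1,000 for the iPhone X?', {'entities': [(28, 34, 'GADGET')]}),
    ('The iPhone reviews are here', {'entities': [(4, 10, 'GADGET')]}),
    ('Your iPhone goes up to 11 today', {'entities': [(5, 11, 'GADGET')]}),
    ('I need a new phone! Any tips?', {'entities': []}),
]


def entity_texts(example):
    """Return the substring covered by each (start_char, end_char, label) annotation."""
    text, annotations = example
    return [text[start:end] for start, end, label in annotations['entities']]


for example in TRAINING_DATA:
    print(entity_texts(example))
```

Each of the first five examples yields ['iPhone'] and the last yields an empty list, so the model is only ever shown 'iPhone' as a GADGET, never 'iPhone X'.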


### Setting up the pipeline

In [131]:
# Create a blank 'en' model
nlp = spacy.blank('en')

In [132]:
# Create a new entity recognizer and add it to the pipeline
# (spaCy v2 API; in spaCy v3 this is a single call: ner = nlp.add_pipe('ner'))
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)

In [133]:
# Add the label 'GADGET' to the entity recognizer
ner.add_label('GADGET')

Call nlp.begin_training, create a training loop for 10 iterations and shuffle the training data.

In [134]:
# Start the training
nlp.begin_training()

# Loop for 10 iterations
for itn in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)

1. Create batches of training data using spacy.util.minibatch and iterate over the batches.
2. Convert the (text, annotations) tuples in the batch to lists of texts and annotations.
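
As a rough picture of these two steps, the sketch below batches a list with a small stand-in generator and then separates texts from annotations. simple_minibatch is a hypothetical helper written for this example; spacy.util.minibatch is the real utility and, unlike this sketch, can also take a sequence of batch sizes.

```python
def simple_minibatch(items, size):
    """Yield consecutive slices of at most `size` items (stand-in for spacy.util.minibatch)."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Three examples copied from TRAINING_DATA
examples = [
    ('How to preorder the iPhone X', {'entities': [(20, 26, 'GADGET')]}),
    ('iPhone X is coming', {'entities': [(0, 6, 'GADGET')]}),
    ('I need a new phone! Any tips?', {'entities': []}),
]

for batch in simple_minibatch(examples, size=2):
    # Split the (text, annotations) tuples into two parallel lists
    texts = [text for text, annotations in batch]
    annotations = [annotations for text, annotations in batch]
    print(len(batch), texts, annotations)
```

The last batch is shorter when the number of examples is not a multiple of the batch size, so the update step should not assume a fixed batch length.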

In [None]:
# Start the training
nlp.begin_training()

# Loop for 10 iterations
for itn in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}
    
    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
        texts = [text for text, entities in batch]
        annotations = [entities for text, entities in batch]

For each batch, use nlp.update to update the model with the texts and annotations.

In [135]:
# Start the training
nlp.begin_training()

# Loop for 10 iterations
for itn in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}
    
    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
        texts = [text for text, entities in batch]
        annotations = [entities for text, entities in batch]
        
        # Update the model (spaCy v2 signature; spaCy v3 expects a list of Example objects)
        nlp.update(texts, annotations, losses=losses)
        print(losses)

{'ner': 10.0}
{'ner': 21.905810594558716}
{'ner': 32.25822949409485}
{'ner': 10.269250392913818}
{'ner': 15.439290881156921}
{'ner': 19.906322598457336}
{'ner': 2.8490229845046997}
{'ner': 4.489939351100475}
{'ner': 7.512486565625295}
{'ner': 2.6266299698036164}
{'ner': 3.5055149205436464}
{'ner': 4.419238194212085}
{'ner': 0.3983194096945226}
{'ner': 0.48387045215349644}
{'ner': 0.6165173742047045}
{'ner': 0.017605453482246958}
{'ner': 0.01866214360052254}
{'ner': 0.021317225425036668}
{'ner': 0.00020233733502550422}
{'ner': 0.00020756504967778255}
{'ner': 0.0002859472314993283}
{'ner': 2.7428648396998767e-07}
{'ner': 1.4316221002721227e-05}
{'ner': 1.4523199756152818e-05}
{'ner': 1.8436688343933622e-06}
{'ner': 1.8478045194204412e-06}
{'ner': 1.8513102033167448e-06}
{'ner': 1.127002382804882e-09}
{'ner': 1.0077552362912969e-08}
{'ner': 3.1408443326034474e-08}
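
The printed values are easier to read once grouped. The losses dict is reset once per iteration and passed to every nlp.update call, and the values rise within each group of three, which suggests each line is a running total for the current iteration. Under that assumption, the sketch below recovers the per-iteration totals and the per-batch losses from the first two iterations printed above; both helper names are hypothetical.

```python
def per_iteration_totals(printed, batches_per_iter):
    """Keep the last running total printed in each iteration."""
    return [printed[start:start + batches_per_iter][-1]
            for start in range(0, len(printed), batches_per_iter)]


def per_batch_losses(printed, batches_per_iter):
    """Undo the running total: difference between consecutive values within an iteration."""
    result = []
    for start in range(0, len(printed), batches_per_iter):
        group = printed[start:start + batches_per_iter]
        previous = 0.0
        for value in group:
            result.append(value - previous)
            previous = value
    return result


# First two iterations of the output above (6 examples / batch size 2 = 3 batches each)
printed = [10.0, 21.905810594558716, 32.25822949409485,
           10.269250392913818, 15.439290881156921, 19.906322598457336]

print(per_iteration_totals(printed, 3))
print(per_batch_losses(printed, 3))
```

Read this way, the total loss drops from about 32.3 in the first iteration to about 3e-08 in the last. On only six examples, a loss this close to zero most likely means the model has memorized the training set, so it says little about performance on new text.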


### Exploring the model

Let's see how the model performs on unseen data! 

1. Process each text in TEST_DATA using nlp.pipe.
2. Print the document text and the entities in the text.

In [137]:
TEST_DATA = ['Apple is slowing down the iPhone 8 and iPhone X - how to stop it',
 "I finally understand what the iPhone X 'notch' is for",
 'Everything you need to know about the Samsung Galaxy S9',
 'Looking to compare iPad models? Here’s how the 2018 lineup stacks up',
 'The iPhone 8 and iPhone 8 Plus are smartphones designed, developed, and marketed by Apple',
 'what is the cheapest ipad, especially ipad pro???',
 'Samsung Galaxy is a series of mobile computing devices designed, manufactured and marketed by Samsung Electronics']

In [138]:
# Process each text in TEST_DATA
for doc in nlp.pipe(TEST_DATA):
    # Print the document text and entities
    print(doc.text)
    print(doc.ents, '\n\n')

Apple is slowing down the iPhone 8 and iPhone X - how to stop it
(iPhone, iPhone) 


I finally understand what the iPhone X 'notch' is for
(iPhone,) 


Everything you need to know about the Samsung Galaxy S9
() 


Looking to compare iPad models? Here’s how the 2018 lineup stacks up
() 


The iPhone 8 and iPhone 8 Plus are smartphones designed, developed, and marketed by Apple
(iPhone, iPhone) 


what is the cheapest ipad, especially ipad pro???
() 


Samsung Galaxy is a series of mobile computing devices designed, manufactured and marketed by Samsung Electronics
() 


