# NLP training example
In this example, we'll train an NLP model for sentiment analysis of tweets using spaCy.

First, we download spaCy's small pretrained English model, `en_core_web_sm`.

In [1]:
```
!python -m spacy download en_core_web_sm
```

```
Collecting en_core_web_sm==2.3.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz (12.0 MB)
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Created wheel for en-core-web-sm: filename=en_core_web_sm-2.3.1-py3-none-any.whl size=12047106 sha256=b66c79f8c85ff5d2cd2e24f59ebf72b08d696c2a766709256bd13587f29dcab2
  Stored in directory: /tmp/pip-ephem-wheel-cache-3pe2_nic/wheels/10/6f/a6/ddd8204ceecdedddea923f8514e13afb0c1f0f556d2c9c3da0
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.3.1
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
```


Next, we import the libraries used throughout the notebook.

In [2]:
```python
from __future__ import unicode_literals, print_function

import boto3
import json
import numpy as np
import pandas as pd
import spacy
```

## Data prep

Download the dataset from S3.

In [4]:
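```python
S3_BUCKET = "verta-strata"
S3_KEY = "english-tweets.csv"
FILENAME = S3_KEY  # save locally under the same name as the S3 object

# download_file(Bucket, Key, Filename) writes the object to a local file.
boto3.client('s3').download_file(S3_BUCKET, S3_KEY, FILENAME)
```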

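This cell fails with `NoCredentialsError: Unable to locate credentials` when boto3 cannot find AWS credentials. Configure them first (for example with `aws configure`, or by setting the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables) and make sure they allow reading from the `verta-strata` bucket.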
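A cheap pre-flight check can catch missing credentials before the download is attempted. The sketch below is a rough approximation, not boto3's own lookup logic: it only inspects two of the sources boto3 searches (the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables and the shared credentials file, `~/.aws/credentials` by default), so it can return `False` on machines that authenticate another way, such as an EC2 instance role. The helper name `aws_credentials_probably_configured` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def aws_credentials_probably_configured(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[str] = None,
) -> bool:
    """Hypothetical helper: rough check of two credential sources boto3 uses.

    Returns True if static keys are set in the environment or a shared
    credentials file exists. Other sources (SSO profiles, instance roles,
    container credentials, ...) are NOT checked.
    """
    env = os.environ if env is None else env

    # Source 1: static keys in environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return True

    # Source 2: the shared credentials file. Its location can be overridden
    # with AWS_SHARED_CREDENTIALS_FILE; otherwise it is ~/.aws/credentials.
    base = Path(home) if home is not None else Path.home()
    override = env.get("AWS_SHARED_CREDENTIALS_FILE")
    cred_file = Path(override).expanduser() if override else base / ".aws" / "credentials"
    return cred_file.is_file()


if not aws_credentials_probably_configured():
    print("No AWS keys found in the environment or the shared credentials file; "
          "run `aws configure` or set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY.")
```

With boto3 available, the authoritative check is `boto3.Session().get_credentials()`, which returns `None` when no credentials can be resolved from any source.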

Load the CSV, shuffle its rows, and clean the data using our `utils` library.

In [5]:
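```python
import utils  # our helper library

# sample(frac=1) returns every row in random order (a shuffle), and
# reset_index(drop=True) renumbers the shuffled rows 0..n-1.
data = pd.read_csv(FILENAME).sample(frac=1).reset_index(drop=True)
utils.clean_data(data)

data.head()
```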

If the S3 download above did not succeed, this cell raises `FileNotFoundError: [Errno 2] File b'english-tweets.csv' does not exist`, because the CSV was never written to the local directory.
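The contents of `utils.clean_data` aren't shown in this notebook. As a purely hypothetical illustration of the kind of normalization commonly applied to tweets, the sketch below removes links and `@mentions`, keeps hashtag words without the `#`, and collapses whitespace. The function `clean_tweet` and its regular expressions are assumptions, not the library's code.

```python
import re

# Hypothetical patterns; utils.clean_data may use entirely different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Hypothetical tweet normalization, for illustration only."""
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user handles
    text = text.replace("#", "")      # keep hashtag words, drop the '#'
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace


print(clean_tweet("@bob I love this!! https://t.co/abc #happy"))  # I love this!! happy
```

Applied to the DataFrame, this would look like `data["text"] = data["text"].map(clean_tweet)`, where the column name `text` is also an assumption about the CSV.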

## Train the model
We'll start from a pretrained spaCy model and fine-tune it on our tweet dataset.

In [None]:
```python
nlp = spacy.load('en_core_web_sm')
```

Fine-tune the model on the current data using the `train` function from our `training` library.

In [None]:
```python
import training

training.train(nlp, data, n_iter=20)
```
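`training.train` is our own helper and its internals aren't shown here. For orientation, here is a hypothetical sketch of two pieces such a helper typically needs with spaCy 2.x (the version installed above): converting labeled rows into the `(text, {"cats": {...}})` annotation format that spaCy 2.x text classifiers train on, and a loop that reshuffles the examples on every iteration and feeds them to an update function in minibatches. The names `to_textcat_examples` and `run_training_loop`, the `POSITIVE`/`NEGATIVE` labels, the "1 means positive" encoding, and reading `n_iter` as the number of passes over the data are all assumptions. The update step is passed in as a function so the sketch runs without spaCy.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# (text, {"cats": {label: score}}): the spaCy 2.x textcat annotation shape.
Example = Tuple[str, Dict[str, Dict[str, float]]]


def to_textcat_examples(rows: Iterable[Tuple[str, int]]) -> List[Example]:
    """Hypothetical conversion: label 1 = positive, anything else = negative."""
    examples = []
    for text, label in rows:
        positive = 1.0 if label == 1 else 0.0
        cats = {"POSITIVE": positive, "NEGATIVE": 1.0 - positive}
        examples.append((text, {"cats": cats}))
    return examples


def run_training_loop(
    examples: Sequence[Example],
    update: Callable[[Tuple[str, ...], Tuple[dict, ...]], None],
    n_iter: int = 20,
    batch_size: int = 8,
    seed: int = 0,
) -> None:
    """Reshuffle once per iteration, then call `update` on each minibatch."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    data = list(examples)      # copy so the caller's sequence is not reordered
    for _ in range(n_iter):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            texts, annotations = zip(*batch)  # split into parallel tuples
            update(texts, annotations)


# Usage with a stand-in update function that just records batch sizes.
examples = to_textcat_examples([("great game", 1), ("so boring", 0), ("loved it", 1)])
batch_sizes = []
run_training_loop(examples, lambda texts, annots: batch_sizes.append(len(texts)),
                  n_iter=2, batch_size=2)
print(batch_sizes)  # [2, 1, 2, 1]
```

With spaCy 2.x, the update step would typically wrap `nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)`, where `optimizer` comes from `nlp.begin_training()` after a `textcat` component with these labels has been added to the pipeline.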

Now we save the model to a well-known S3 location (make sure it's one you can write to!) so that we can fetch it later. We upload two files: the serialized pipeline and its metadata.

In [None]:
```python
# Serialize the whole pipeline to a single binary blob.
filename = "/tmp/model.spacy"
with open(filename, 'wb') as f:
    f.write(nlp.to_bytes())
```

In [None]:
```python
boto3.client('s3').upload_file(filename, S3_BUCKET, "models/01/model.spacy")
```

In [None]:
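```python
# Store nlp.meta (language, pipeline component names, version, ...) as JSON
# so the pipeline can be reconstructed before the bytes are loaded back.
filename = "/tmp/model_metadata.json"
with open(filename, 'w') as f:
    json.dump(nlp.meta, f)
```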

In [None]:
```python
boto3.client('s3').upload_file(filename, S3_BUCKET, "models/01/model_metadata.json")
```
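Saving `nlp.meta` next to the serialized bytes is what makes the model fetchable later: in spaCy 2.x, `from_bytes` restores data into an existing pipeline, so the loader first has to recreate the language class and the components listed in the metadata's `lang` and `pipeline` fields. Below is a minimal sketch of the standard-library half of that, assuming the metadata was written with `json.dump(nlp.meta, f)`; the function names `pipeline_spec_from_metadata` and `load_pipeline_spec` are hypothetical.

```python
import json
from typing import List, Tuple


def pipeline_spec_from_metadata(meta_json: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: (language code, ordered component names) from nlp.meta JSON."""
    meta = json.loads(meta_json)
    missing = [key for key in ("lang", "pipeline") if key not in meta]
    if missing:
        raise ValueError(f"model metadata is missing {missing}")
    return meta["lang"], list(meta["pipeline"])


def load_pipeline_spec(path: str) -> Tuple[str, List[str]]:
    """Read a downloaded metadata file, e.g. a local copy of model_metadata.json."""
    with open(path, encoding="utf-8") as f:
        return pipeline_spec_from_metadata(f.read())
```

On the spaCy 2.x side, the documented pattern is then roughly: build `nlp = spacy.util.get_lang_class(lang)()`, call `nlp.add_pipe(nlp.create_pipe(name))` for each name in order, and finish with `nlp.from_bytes(model_bytes)`.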

## Deployment

Great! You now have a trained model stored in S3 that you can run predictions against. The next step of this tutorial shows how.