# Classifications

In this notebook, we will create a custom model for the classifications feature of Watson Natural Language Understanding (NLU) using the Train API.

We will go through the following steps:
- [How to create training data file](#Create-Training-Data-File)
   - [Create a JSON file from scratch](#Create-training-data-from-scratch)
   - [Convert data from NLC to NLU format](#Convert-data-from-NLC-to-NLU-format): how to convert data in the [Watson Natural Language Classifier data (CSV)](https://cloud.ibm.com/docs/natural-language-classifier?topic=natural-language-classifier-using-your-data) format to the Natural Language Understanding data (JSON) format.
   - [Fetch data from NLC](#Fetch-data-from-NLC): how to download the training data from an existing NLC classifier and convert it to the NLU data format for classifications.
- [How to train a classifications model with NLU train API](#Train-Classifications-model-with-NLU)
- [How to get the status of the model](#Retrieve-classifications-model-by-ID)
- [How to use the trained model using NLU Analyze API](#Use-the-trained-model-using-NLU-Analyze-API)

To start, we will need an NLU instance and an API key.


## Add your IBM Cloud service credentials here.
- If you use IAM service credentials, leave 'username' set to 'apikey' and set 'password' to the value of your IAM API key.
- If you use pre-IAM service credentials, set the values to your 'username' and 'password'.

Also set 'url' to the URL for your service instance as provided in your service credentials.
See the following instructions for getting your own credentials: https://cloud.ibm.com/docs/watson?topic=watson-iam
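
As an alternative to pasting the key into the notebook, the credentials could be read from environment variables. The sketch below is only an illustration: the variable names `NLU_APIKEY` and `NLU_URL` and the helper `load_nlu_credentials` are hypothetical and not part of any IBM tooling.

```python
import os


def load_nlu_credentials(env=os.environ):
    """Read NLU credentials from environment variables.

    NLU_APIKEY and NLU_URL are hypothetical names chosen for this sketch.
    """
    apikey = env.get("NLU_APIKEY")
    service_url = env.get("NLU_URL")
    if not apikey or not service_url:
        raise ValueError("Set NLU_APIKEY and NLU_URL before running the notebook")
    # IAM credentials use the literal username 'apikey'
    return "apikey", apikey, service_url.rstrip("/")
```

The returned tuple can be unpacked into `username`, `password`, and `url`, which are the names the rest of the notebook expects.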


In [24]:
username = 'apikey'
password = 'YOUR_IAM_APIKEY'
url = 'NLU_URI'

# Create Training Data File

Classifications training data requires a list of text documents, each annotated by one or more labels. 

The training data for classifications must be in the following JSON format:

```json
[
    {
        "text": "document 1",
        "labels": ["label1"]
    },
    {
        "text": "document 2",
        "labels": ["label2", "label3"]
    }
]
```
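
Before uploading, it can help to check that every record matches this shape. The function below is a minimal sketch of such a check; `validate_training_data` is a hypothetical helper written for this notebook, and the rules it applies are inferred from the format above rather than taken from the service's own validation.

```python
def validate_training_data(records):
    """Return a list of problems found in NLU classifications training records."""
    if not isinstance(records, list):
        return ["top-level value must be a list"]

    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: not an object")
            continue
        text = record.get("text")
        labels = record.get("labels")
        # 'text' must be a non-empty string
        if not isinstance(text, str) or not text.strip():
            problems.append(f"record {i}: 'text' must be a non-empty string")
        # 'labels' must be a non-empty list of non-empty strings
        if (not isinstance(labels, list) or not labels
                or not all(isinstance(label, str) and label for label in labels)):
            problems.append(f"record {i}: 'labels' must be a non-empty list of strings")
    return problems
```

An empty list means no problems were found, for example `validate_training_data([{"text": "document 1", "labels": ["label1"]}])`.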


## Create training data from scratch

Build the training data as a Python list and save it to a JSON file:

In [18]:
training_data = [
    {
        "text": "How hot is it today?",
        "labels": ["temperature"]
    },
    {
        "text": "Is it hot outside?",
        "labels": ["temperature"]
    },
    {
        "text": "Will it be uncomfortably hot?",
        "labels": ["temperature"]
    },
    {
        "text": "Will it be sweltering?",
        "labels": ["temperature"]
    },
    {
        "text": "How cold is it today?",
        "labels": ["temperature"]
    },
    {
        "text": "Will we get snow?",
        "labels": ["conditions"]
    },
    {
        "text": "Are we expecting sunny conditions?",
        "labels": ["conditions"]
    },
    {
        "text": "Is it overcast?",
        "labels": ["conditions"]
    },
    {
        "text": "Will it be cloudy?",
        "labels": ["conditions"]
    },
    {
        "text": "How much rain will fall today?",
        "labels": ["conditions"]
    }
]

# Save Training data in a file
import json

training_data_filename = 'training_data.json'

with open(training_data_filename, 'w', encoding='utf-8') as f:
    json.dump(training_data, f, indent=4)


## Convert data from NLC to NLU format 

This section converts training data stored **locally** in the CSV format required by [Watson Natural Language Classifier](https://cloud.ibm.com/docs/natural-language-classifier?topic=natural-language-classifier-using-your-data#training-structure) into the JSON format required for NLU classifications training. Each CSV row is read as the text in the first column followed by one or more labels in the remaining columns.

In [1]:
# Set path to training data file used to train Natural Language Classifier
nlc_training_data_file_name = 'nlc_training_data.csv'

## Imports

import csv
import json


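def convert_nlc_to_nlu(filename):
    """Convert an NLC training CSV (text,label[,label...]) into NLU training records."""
    nlu_data = []

    # Read the file passed as an argument rather than the module-level variable
    with open(filename, 'r', encoding='utf-8', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            # Skip blank lines, which would otherwise raise an IndexError
            if not row:
                continue
            text = row[0]
            labels = row[1:]
            # Convert the text and labels into an NLU training data JSON object
            data_dict = {
                'text': text,
                'labels': labels
            }
            nlu_data.append(data_dict)

    return nlu_data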

nlu_data = convert_nlc_to_nlu(nlc_training_data_file_name)
        
# Save Training data in a file
training_data_filename = 'training_data.json'

with open(training_data_filename, 'w', encoding='utf-8') as f:
    json.dump(nlu_data, f, indent=4)

print('Data successfully converted to NLU format and saved locally in ' + training_data_filename)
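
To see what the conversion does without touching the file system, the same row-to-record mapping can be applied to a small in-memory CSV string. This is a sketch only: `rows_to_nlu` and the sample rows are made up for illustration and mirror the logic of `convert_nlc_to_nlu` above.

```python
import csv
import io


def rows_to_nlu(csv_text):
    """Convert NLC-style CSV text (text,label[,label...]) to NLU records."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        records.append({"text": row[0], "labels": row[1:]})
    return records


# Two illustrative rows: one label, then a quoted text with two labels
sample = (
    'How hot is it today?,temperature\n'
    '"Hot, and will it rain?",temperature,conditions\n'
)
converted = rows_to_nlu(sample)
print(converted)
```

The second row shows why the `csv` module is used instead of splitting on commas: the quoted text keeps its embedded comma, and both trailing columns end up in `labels`.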

## Fetch data from NLC

This section downloads the training data from an **already** trained Natural Language Classifier (NLC) and converts it to the NLU classifications training data format. To download the data from an existing NLC classifier, you need the NLC API key and the classifier URL. The classifier URL is returned by a GET call to NLC, as described in the [API documentation](https://cloud.ibm.com/apidocs/natural-language-classifier#getclassifier).

In [None]:
# Add NLC Credentials
NLC_USERNAME = "apikey"
NLC_API_KEY = "NLC_API_KEY"

# Add the classifier URL returned by NLC. Should contain the instance id and classifier id
NLC_CLASSIFIER_URL = "NLC_URL"


import json
import ntpath
import requests
import csv

from contextlib import closing

from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


# Provide the filename to save the data downloaded from existing NLC classifier
nlc_training_data_file_name = "nlc_training_data.csv"

# Fetch data from NLC
# newline='' stops the csv module from writing extra blank rows on Windows
with open(nlc_training_data_file_name, 'w', encoding='utf-8', newline='') as out_file:
    uri = "{}/training_data".format(NLC_CLASSIFIER_URL)
    with closing(requests.get(uri, auth=(NLC_USERNAME, NLC_API_KEY), verify=False, stream=True)) as res:
        # Fail early instead of writing an error body into the CSV file
        res.raise_for_status()
        lines = (line.decode('utf-8') for line in res.iter_lines())
        csv_writer = csv.writer(out_file)
        for row in csv.reader(lines):
            csv_writer.writerow(row)


# Convert to NLU format    
nlu_data = convert_nlc_to_nlu(nlc_training_data_file_name)
        
# Save Training data in a file
training_data_filename = 'training_data.json'

with open(training_data_filename, 'w', encoding='utf-8') as f:
    json.dump(nlu_data, f, indent=4)
    
print('Data successfully converted to NLU format and saved locally in ' + training_data_filename)

## Train Classifications model with NLU

In [25]:
import json
import ntpath
import requests
import sys
import time

from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


######### Create parameters required for making a call to NLU ######### 
feature_to_train = 'classifications'

headers = {}

data = {
    'name':'Classifications model #1',
    'language':'en',
    'version':'1.0.1'
}

params = {
    'version': '2021-03-25'
}

uri = url + '/v1/models/{}'.format(feature_to_train)


print('\nCreating custom model...')

training_data_filename = 'training_data.json'

######### Make a call to NLU to train the model ######### 
with open(training_data_filename, 'rb') as f:
    response = requests.post(uri,
                         params=params,
                         data=data,
                         headers=headers,
                         files={'training_data': (ntpath.basename(training_data_filename), f, 'application/json')},
                         auth=(username, password),
                         verify=False,
                        )

######### Parse response from NLU ######### 
    
print('Model creation returned: ', response.status_code)

if response.status_code != 201:
    print('Failed to create model')
    print(response.text)
else:
    print('\nCustom model training started...')
    response_json = response.json()
    model_id = response_json['model_id']
    print('Custom Model ID: ', model_id)


Creating custom model...
Model creation returned:  201

Custom model training started...
Custom Model ID:  a39b5357-c0f7-4d30-98c7-d4724c73806f


## Retrieve classifications model by ID

In [30]:
import json
import requests

params = {
    'version': '2021-02-15'
}

uri = url + '/v1/models/classifications/' + model_id

######### Make a call to NLU ######### 

response = requests.get(uri, auth=(username, password), params=params, verify=False, headers=headers)

######### Parse response from NLU ######### 

print('\033[1m'+ '\033[4m' + 'Response from NLU:' + '\033[4m' + '\033[0m')

print('\nStatus: ', response.status_code)

response_json = response.json()
print("Response body:", json.dumps(response_json, indent=4, sort_keys=True))

Response from NLU:

Status:  200
Response body: {
    "created": "2021-03-15T07:31:01Z",
    "description": null,
    "features": [
        "classifications"
    ],
    "language": "en",
    "last_deployed": "2021-03-15T07:36:59Z",
    "last_trained": "2021-03-15T07:31:01Z",
    "model_id": "a39b5357-c0f7-4d30-98c7-d4724c73806f",
    "model_version": "1.0.1",
    "name": "Classifications model #1",
    "status": "available",
    "user_metadata": null,
    "version": "1.0.1",
    "version_description": null,
    "workspace_id": null
}
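
Because training takes some time, the status request typically has to be repeated until `status` becomes `available`. The helper below is a minimal polling sketch, not part of the NLU API: `wait_until_available`, its parameters, and the placeholder statuses in the demo are assumptions made for illustration. The status lookup is passed in as a function so the loop can be tried without a network call.

```python
import time


def wait_until_available(fetch_status, interval_seconds=30, max_attempts=20,
                         sleep=time.sleep):
    """Poll fetch_status() until it returns 'available'.

    Returns the last status seen, which is 'available' on success or
    whatever the service last reported if max_attempts ran out.
    """
    status = None
    for attempt in range(max_attempts):
        status = fetch_status()
        if status == "available":
            break
        # Do not sleep after the final attempt
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return status


# Demo with placeholder statuses instead of real HTTP calls
fake_statuses = iter(["not ready", "not ready", "available"])
final_status = wait_until_available(lambda: next(fake_statuses),
                                    sleep=lambda seconds: None)
print(final_status)
```

In this notebook, `fetch_status` could wrap the GET request above, for example `lambda: requests.get(uri, auth=(username, password), params=params, verify=False).json()['status']`. A stricter version might also stop early when the service reports a failure status.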


## Use the trained model using NLU Analyze API

Once training completes, the status returned by the GET request above changes to `available`. You can then send an analyze request that references the model by its `model_id`.

In [17]:
######### Create request #########

analyze_request_data = {
        "text":"What is the expected high for today?",
        "language": "en",
        "features": {
            "classifications": {
                "model": model_id
            }
        }
}

uri = url + '/v1/analyze'

params = {
    'version': '2021-02-15'
}

headers = {'Content-Type' : 'application/json'}

######### Make a call to NLU #########

response = requests.post(uri,
                         params=params,
                         json=analyze_request_data,
                         headers=headers,
                         auth=(username, password),
                         verify=False,
                        )

if response.status_code != 200:
    print('Failed to make request to model. Reason:')
    print(response.text)

else:
    response_json = response.json()

    print("Successfully analyzed request. Response from NLU:\n")
    print(json.dumps(response_json, indent=4, sort_keys=True))

Successfully analyzed request. Response from NLU:

{
    "classifications": [
        {
            "class_name": "temperature",
            "confidence": 0.562519
        },
        {
            "class_name": "conditions",
            "confidence": 0.433996
        }
    ],
    "language": "en",
    "usage": {
        "features": 0,
        "text_characters": 36,
        "text_units": 1
    }
}
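
The response lists every class with a confidence score, so an application usually needs to pick one. The sketch below selects the highest-confidence class and optionally rejects weak predictions. `top_classification` and the `min_confidence` threshold are hypothetical additions for illustration; the sample dictionary copies the classifications from the response above.

```python
def top_classification(analyze_response, min_confidence=0.0):
    """Return (class_name, confidence) for the best class, or None.

    None is returned when the response has no classifications or when the
    best confidence is below min_confidence.
    """
    candidates = analyze_response.get("classifications", [])
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["confidence"])
    if best["confidence"] < min_confidence:
        return None
    return best["class_name"], best["confidence"]


# Classifications copied from the analyze response shown above
sample_response = {
    "classifications": [
        {"class_name": "temperature", "confidence": 0.562519},
        {"class_name": "conditions", "confidence": 0.433996},
    ]
}
print(top_classification(sample_response))
```

With only ten training examples the two scores are close, so a threshold such as `min_confidence=0.9` would return `None` here. What threshold is appropriate depends on the application.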
