
# RASA NLU

Rasa NLU is an open-source natural language processing tool for intent classification, response retrieval and entity extraction.

In this notebook we reproduce the code of the Rasa scripts step by step, so that we have direct control over each action the package performs for Natural Language Understanding.

https://rasa.com/docs/rasa/nlu/about/

Github: https://github.com/RasaHQ/rasa


## Installation

Rasa NLU is a continuously evolving project, so the code in this notebook may stop working with future versions. For this reason, we pin the version of the `rasa_nlu` package.

In [0]:
! pip install rasa_nlu=='0.15.1'

Some pipelines require additional packages.

The full list of optional dependencies is available at: https://github.com/RasaHQ/rasa/blob/master/requirements.txt

In [0]:
! pip install "sklearn-crfsuite==0.3.6"

## Paths definition

`rasa_nlu` works using files and folders as input and output. 

For this demonstration, we therefore define the folder and file paths used throughout the notebook.

In [0]:
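rasa_folder = "./rasa"
rasa_data_folder = "./rasa/data"
# Training examples (JSON; the Markdown format would use e.g. "training.md")
training_data_filepath = rasa_data_folder + "/training.json"
# Optional labelled evaluation data with the same structure (none in this demo)
evaluation_data_filepath = None
# Pipeline configuration (its content is YAML, despite the .json extension)
pipeline_data_filepath = rasa_data_folder + "/config.json"
# Folder where the trained models are persisted
models_folder = rasa_folder + "/model"
model_name = "rasa_demo"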

In [2]:
print('Content of the rasa folder (it should not exist): ')
!ls $rasa_folder

Content of the rasa folder (it should not exist): 
data


A helper function to write a string to a file, creating the parent folder if needed:

In [0]:
import os

def write_file(path, content):
  # Create the parent folder (if needed) and write the string to the file
  os.makedirs(os.path.dirname(path), exist_ok=True)
  with open(path, "w") as f:
    f.write(content)

* **Training Data**

To train the NLU model we have to provide training data, either in *Markdown* or in *JSON* format. The structure of these files is described at: https://rasa.com/docs/rasa/nlu/training-data-format/


For this example, we download the demo training file from the Rasa repository. This training data covers intents such as greet (e.g., "hi"), affirm (e.g., "yes"), ask_name, ask_weather and restaurant_search (e.g., "show me chinese restaurants"), among others.

For the `restaurant_search` intent, the model also extracts the type of cuisine and the location as entities.
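As a minimal sketch of the JSON format (assuming the `rasa_nlu_data` / `common_examples` layout described in the linked documentation), the snippet below builds annotated examples in plain Python. The helper `make_example` and the sample sentences are hypothetical illustrations, not taken from the demo file; the helper computes each entity's `start`/`end` character offsets from the text so they cannot drift out of sync with it.

```python
import json

def make_example(text, intent, entities=()):
    """Build one training example (hypothetical helper).

    `entities` is a sequence of (entity_name, value) pairs; each value must
    occur verbatim in `text`. Offsets come from its first occurrence.
    """
    annotated = []
    for name, value in entities:
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        annotated.append({"start": start, "end": start + len(value),
                          "value": value, "entity": name})
    return {"text": text, "intent": intent, "entities": annotated}

# Illustrative training data in the JSON layout (assumed structure)
training = {
    "rasa_nlu_data": {
        "common_examples": [
            make_example("hi", "greet"),
            make_example("show me chinese restaurants in the north",
                         "restaurant_search",
                         [("cuisine", "chinese"), ("location", "north")]),
        ],
        "regex_features": [],
        "entity_synonyms": [],
    }
}
print(json.dumps(training, indent=2))
```

Computing offsets instead of typing them by hand avoids a common source of silently wrong entity annotations.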

In [4]:
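import requests

# Demo training data from the Rasa repository (JSON and Markdown versions)
linkjson = "https://raw.githubusercontent.com/RasaHQ/rasa/master/data/examples/rasa/demo-rasa.json"
linkmd = "https://raw.githubusercontent.com/RasaHQ/rasa/master/data/examples/rasa/demo-rasa.md"
training_content = requests.get(linkjson)
training_content.raise_for_status()  # fail early if the download did not succeed

write_file(training_data_filepath, training_content.text)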

print('Content of the rasa data folder: ')
!ls $rasa_data_folder

Content of the rasa data folder: 
config.json  training.json
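Before training, it can help to check how many examples each intent has, since intents with very few examples tend to be classified poorly. A minimal sketch, assuming the downloaded file follows the `rasa_nlu_data.common_examples` JSON layout; `count_intents` is a hypothetical helper, not part of `rasa_nlu`:

```python
import json
from collections import Counter

def count_intents(path):
    """Return a Counter mapping intent name -> number of common examples."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    examples = data.get("rasa_nlu_data", {}).get("common_examples", [])
    return Counter(ex["intent"] for ex in examples)

# Usage in this notebook (training_data_filepath is defined above):
# for intent, n in count_intents(training_data_filepath).most_common():
#     print(f"{intent:25s} {n}")
```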


* **Pipeline**

`rasa_nlu` lets the user choose the pipeline of components that is executed to identify intents and entities.

There are several preconfigured pipelines, but you can also define a custom one.

The pipeline is configured in a file with a specific structure, which is explained in the Rasa documentation: https://rasa.com/docs/rasa/nlu/training-data-format/

Some of these pipelines support languages other than English. For more details, see: https://rasa.com/docs/rasa/nlu/language-support/

Furthermore, as with spaCy, you can create your own components and integrate them into your pipeline: https://blog.rasa.com/enhancing-rasa-nlu-with-custom-components/

In this example, we define several alternative pipeline configurations.

In [0]:
# PRECONFIGURED PIPELINES

rasa_pretraemb_config_content = """
language: "en"
pipeline: "pretrained_embeddings_spacy"
"""

rasa_superemb_config_content = """
language: "en"
pipeline: "supervised_embeddings"
"""

rasa_spaskl_config_content = """
language: "en"
pipeline: "spacy_sklearn"
"""

# CUSTOM PIPELINE
rasa_manual_config_content = """
language: "en"
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
"""

We select one of these configurations and write it to the configuration file:

In [6]:
write_file(pipeline_data_filepath, rasa_spaskl_config_content)

print('Content of the rasa data folder: ')
!ls $rasa_data_folder

Content of the rasa data folder: 
config.json  training.json


Now, we have the files with the training data and the pipeline configuration. The next step is training the NLU model.

## RASA_NLU training

Training produces a model capable of identifying, from a given text, its intent and the related entities.



In [0]:
from rasa_nlu.training_data import load_data
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.test import run_evaluation

We load the training data from the previously created file:

In [0]:
training_data = load_data(training_data_filepath)

We define the trainer, using the previously created pipeline file as its configuration:

In [9]:
#Define the trainer with the pipeline
trainer = Trainer(config.load(pipeline_data_filepath))

Now we run the training process on the training data:

In [10]:
#Train the model
interpreter = trainer.train(training_data)

Fitting 2 folds for each of 6 candidates, totalling 12 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
  'precision', 'predicted', average, warn_for)
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.1s finished


<rasa_nlu.model.Interpreter at 0x7f369169dfd0>

The next step is to store (persist) the trained NLU model:

In [0]:
#Store the trained model
model_directory = trainer.persist(models_folder, project_name=model_name)

In [12]:
print('Content of the rasa model folder: ')
!ls $model_directory

Content of the rasa model folder: 
component_3_RegexFeaturizer.pkl       component_6_SklearnIntentClassifier.pkl
component_4_CRFEntityExtractor.pkl    metadata.json
component_5_EntitySynonymMapper.json  training_data.json


Finally, if we have labelled evaluation examples (with exactly the same structure as the training examples), we can run an evaluation:

In [0]:
#Run evaluation of the trained model (only if evaluation data is available)
if evaluation_data_filepath is not None:
  run_evaluation(evaluation_data_filepath, model_directory)
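This demo has no separate evaluation file, so `evaluation_data_filepath` stays `None`. One hypothetical way to obtain labelled evaluation data is to hold out part of the training examples. The sketch below (the helper `split_training_file`, the 20% ratio and the seed are assumptions, not part of `rasa_nlu`) shuffles `common_examples` with a fixed seed and writes two JSON files with the same structure as the original, copying the other sections (such as `entity_synonyms`) to both.

```python
import json
import random

def split_training_file(src, train_dst, eval_dst, eval_ratio=0.2, seed=42):
    """Split a JSON training file into train/eval files (hypothetical helper).

    Returns (n_train, n_eval). Non-example sections are copied to both files.
    """
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    nlu = data["rasa_nlu_data"]
    examples = list(nlu.get("common_examples", []))
    random.Random(seed).shuffle(examples)  # reproducible shuffle
    n_eval = int(len(examples) * eval_ratio)
    for dst, subset in ((eval_dst, examples[:n_eval]),
                        (train_dst, examples[n_eval:])):
        out = {"rasa_nlu_data": dict(nlu, common_examples=subset)}
        with open(dst, "w", encoding="utf-8") as f:
            json.dump(out, f, indent=2)
    return len(examples) - n_eval, n_eval
```

You would then retrain on the training part and set `evaluation_data_filepath` to the held-out file before running the evaluation cell. Note that a plain random split is not stratified: with a small dataset, a rare intent can end up entirely in one of the two files.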

## RASA_NLU Interpretation

We use the trained model for intent classification and entity extraction. The interpreter returns its result with a specific structure; example outputs are shown at: https://rasa.com/docs/rasa/nlu/about/


In [0]:
import pprint
from rasa_nlu.model import Interpreter

We can load the persisted model into a new interpreter:

In [0]:
# Define the Interpreter (with the appropriate model)
interpreter = Interpreter.load(model_directory)
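In a fresh session, `model_directory` is no longer defined. A minimal sketch for recovering it, assuming that each training run persists its own subfolder under the project folder and that every model folder contains a `metadata.json` file (as the model folder listing above shows); `latest_model_dir` is a hypothetical helper that picks the most recently modified such folder:

```python
import os

def latest_model_dir(project_dir):
    """Return the most recently modified model folder under project_dir.

    A subfolder counts as a model if it contains metadata.json.
    Raises FileNotFoundError if no such subfolder exists.
    """
    candidates = [
        os.path.join(project_dir, name)
        for name in os.listdir(project_dir)
        if os.path.isfile(os.path.join(project_dir, name, "metadata.json"))
    ]
    if not candidates:
        raise FileNotFoundError(f"no model folders in {project_dir!r}")
    return max(candidates, key=os.path.getmtime)

# Usage (assumes the layout <models_folder>/<model_name>/<model folder>):
# interpreter = Interpreter.load(latest_model_dir(os.path.join(models_folder, model_name)))
```

Selecting by modification time avoids depending on how the model folders are named.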

* **Example 1**

In [0]:
text = "Hello"

In [0]:
# Get the interpretation of the text using the Interpreter
interpretation = interpreter.parse(text)

In [18]:
# Pretty-print the interpretation
pprint.pprint(interpretation)

{'entities': [],
 'intent': {'confidence': 0.5658857462138052, 'name': 'greet'},
 'intent_ranking': [{'confidence': 0.5658857462138052, 'name': 'greet'},
                    {'confidence': 0.24832025137140976, 'name': 'goodbye'},
                    {'confidence': 0.1362036966385877, 'name': 'affirm'},
                    {'confidence': 0.019737694697191617,
                     'name': 'chitchat/ask_name'},
                    {'confidence': 0.016811858550056086,
                     'name': 'chitchat/ask_weather'},
                    {'confidence': 0.01304075252894959,
                     'name': 'restaurant_search'}],
 'text': 'Hello'}
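`Interpreter.parse` returns a plain Python dict, so its fields can be post-processed directly. As an illustrative sketch (the helper `top_intents` and the 0.5 threshold are hypothetical choices, not `rasa_nlu` behaviour), the function below keeps the best-ranked intents and flags the prediction as uncertain when the top confidence is below a threshold, a common way to decide when a bot should ask the user to rephrase.

```python
def top_intents(interpretation, k=3, threshold=0.5):
    """Return ([(intent_name, rounded_confidence), ...], is_confident)."""
    ranking = sorted(interpretation.get("intent_ranking", []),
                     key=lambda r: r["confidence"], reverse=True)
    top = [(r["name"], round(r["confidence"], 3)) for r in ranking[:k]]
    best = interpretation.get("intent") or {}  # guard against a missing intent
    return top, best.get("confidence", 0.0) >= threshold

# Usage with the result above:
# top, confident = top_intents(interpretation)
```

For Example 1, the top three intents would be `greet`, `goodbye` and `affirm`, and the prediction counts as confident because 0.566 is above 0.5.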


* **Example 2**

In [0]:
text = "show me chinese resturants in the west"

In [0]:
# Get the interpretation of the text using the Interpreter
interpretation = interpreter.parse(text)

In [33]:
# Pretty-print the interpretation
pprint.pprint(interpretation)

{'entities': [{'confidence': 0.5155491861262322,
               'end': 15,
               'entity': 'cuisine',
               'extractor': 'CRFEntityExtractor',
               'processors': ['EntitySynonymMapper'],
               'start': 8,
               'value': 'chinese'},
              {'confidence': 0.808502143057703,
               'end': 38,
               'entity': 'location',
               'extractor': 'CRFEntityExtractor',
               'start': 34,
               'value': 'west'}],
 'intent': {'confidence': 0.7371095381829954, 'name': 'restaurant_search'},
 'intent_ranking': [{'confidence': 0.7371095381829954,
                     'name': 'restaurant_search'},
                    {'confidence': 0.08972339958627339, 'name': 'affirm'},
                    {'confidence': 0.05145488831184812,
                     'name': 'chitchat/ask_name'},
                    {'confidence': 0.04780190574297566, 'name': 'goodbye'},
                    {'confidence': 0.045195788574323166,
  

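Each entity's `start` and `end` are character offsets into the parsed text, so `text[start:end]` recovers the matched span. The `value` may differ from that span when a synonym mapper (here `EntitySynonymMapper`, listed under `processors`) normalises it. A minimal sketch using the two entities from the Example 2 output; `entity_spans` is a hypothetical helper, and only the `start`, `end`, `value` and `entity` fields are copied from the output:

```python
def entity_spans(text, entities):
    """Pair each extracted entity with the substring it was found at."""
    spans = []
    for ent in entities:
        surface = text[ent["start"]:ent["end"]]
        spans.append({
            "entity": ent["entity"],
            "surface": surface,               # what appeared in the input text
            "value": ent["value"],            # possibly normalised value
            "mapped": surface != ent["value"],  # True if value differs from text
        })
    return spans

text = "show me chinese resturants in the west"
entities = [  # start/end/value copied from the parse output above
    {"entity": "cuisine", "start": 8, "end": 15, "value": "chinese"},
    {"entity": "location", "start": 34, "end": 38, "value": "west"},
]
for span in entity_spans(text, entities):
    print(span)
```

Both spans match their values here, so neither entity is reported as mapped.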
To finish the example, we remove the created files.

In [34]:
print("Remove the rasa folder")
!rm -rf $rasa_folder

print('Does the rasa folder still exist? ')
!ls $rasa_folder

Remove the rasa folder
Does the rasa folder still exist? 
ls: cannot access './rasa': No such file or directory
