# FlashText

[FlashText](https://github.com/vi3k6i5/flashtext) is an open-source Natural Language Processing (NLP) library focused on keyword handling. That is, it is possible to "train" a FlashText *KeywordProcessor()* with a set of keywords so that they can be detected, extracted, replaced, etc. in a given text. Although the approach is simpler than others based on pretrained deep learning models, such as spaCy, its simplicity and ease of use make it worth considering for a scenario as domain-specific as this one. Furthermore, since the script defines the entire collection of keywords of interest, the processing remains language-agnostic.

In [None]:
from flashtext import KeywordProcessor
import numpy as np

## Creating a Keyword Processor instance

The Keyword Processor is FlashText's core tool. Once initialized, it stores all the defined keywords, as explained in the next sections. It is then used to process any kind of text and look for these keywords, with several options available, such as locating them or replacing them. Let's begin by creating an instance.

In [2]:
keyword_processor = KeywordProcessor(case_sensitive=False)

## Adding keywords

This is the main step when using FlashText. With the *add_keyword* method you define the keywords of interest that might appear in a given text. It also takes an optional second argument, the *"clean name"*, which is the value reported when the keyword is extracted and the text that replaces the keyword when it is found.

In [3]:
keyword_processor.add_keyword("framework standalone", "standalone")

True

It's also possible to add multiple keywords at once by passing a dictionary. In this case, the *"clean name"* comes first as the key, and the value is a list of one or more keywords that map to that key. The value must be a list even when it holds a single keyword.

In [12]:
keyword_dict = {
    "fwk online": ["framework online", "online"],
    "compilar": ["lanzar", "ejecutar"]
}
keyword_processor.add_keywords_from_dict(keyword_dict)

This means that every occurrence of "framework online" or "online" will be replaced by the *clean name* defined in the dictionary, "fwk online" in this case.

## Getting all keywords in dictionary

In [49]:
keyword_processor.get_all_keywords()

{'framework standalone': 'standalone',
 'framework online': 'fwk online',
 'online': 'fwk online'}

Note that the "lanzar" and "ejecutar" entries do not appear in this output. Judging by the execution counter (`In [49]`), this cell was run after the removal step shown in the last section.

## Extract keywords

Find which of the defined keywords appear in a given text. The result lists the clean name of each match rather than the matched text itself.

In [21]:
text = "Quiero lanzar el framework standalone en Jenkins"
keywords_found = keyword_processor.extract_keywords(text)
print("Text: {0}, \nKeywords found: {1}".format(text, keywords_found))

Text: Quiero lanzar el framework standalone en Jenkins, 
Keywords found: ['compilar', 'standalone']


## Replace keywords

In [42]:
text = "Quiero lanzar el framework standalone en Jenkins"
replaced_text = keyword_processor.replace_keywords(text)
print("Original text: {0} \nNew text: {1}".format(text, replaced_text))

Original text: Quiero lanzar el framework standalone en Jenkins 
New text: Quiero compilar el standalone en Jenkins


## Remove keywords

In [48]:
keyword_processor.remove_keywords_from_dict({"compilar": ["lanzar", "ejecutar"]})
text = "Quiero lanzar el framework standalone en Jenkins"
keywords_found = keyword_processor.extract_keywords(text)
print("Text: {0}, \nKeywords found: {1}".format(text, keywords_found))

Text: Quiero lanzar el framework standalone en Jenkins, 
Keywords found: ['standalone']
