### Importing the Required Packages

In [1]:
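import nltk
import re
import random
import time
from nltk.tokenize import RegexpTokenizer, word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.tokenize.regexp import WhitespaceTokenizer
from nltk.stem import WordNetLemmatizer

# Fetch the corpora used below (skipped if they are already installed)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)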

### Defining Intents
The following list defines two intents, each paired with the keywords that identify it.

In [2]:
intents = [
    ["greetings", 
     ["hi", 
      "hello"]],
    
    ["farewell", 
     ["bye"]]
    
]

### Defining Responses for each Intent
The following code block defines a list of responses for each intent. Once the intent has been identified from the user's query, a random response is returned from the list for that intent.

In [3]:
intent_responses = [
    ["greetings", 
     ["hi, how may I help you today?", 
      "hello, what can I do for you today?",  
      "it’s nice to meet you, how may we be of service?"]],
    
    ["farewell", 
     ["it was a pleasure to help you. do come back. type 'quit' to exit the program.",
      "see you later. let me know if you have any other queries. type 'quit' to exit the program.",
      "i hope the interaction was helpful. type 'quit' to exit the program.",
      "thank you for your time. type 'quit' to exit the program."]]
]

### Tokenizing Input Text
The **WhitespaceTokenizer** in the **nltk** module splits the input on whitespace (spaces, tabs and newlines) only. Punctuation and contractions stay attached to the neighbouring word, so trailing punctuation is stripped in a later step.

In [4]:
def tokenize_input(user_response):
    wst = WhitespaceTokenizer()
    return wst.tokenize(user_response)
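
To make the tokenizer's behaviour concrete, here is a minimal sketch that uses Python's built-in `str.split()` as a stand-in for `WhitespaceTokenizer` (an assumption made so the example runs without **nltk**; `whitespace_tokens` is a hypothetical helper, not part of the notebook). It shows that punctuation stays attached to the tokens.

```python
# Stand-in for WhitespaceTokenizer: str.split() with no arguments also
# splits on runs of whitespace and leaves punctuation attached to words.
def whitespace_tokens(text):
    return text.split()

sample_tokens = whitespace_tokens("hello,  how are you?")
print(sample_tokens)  # ['hello,', 'how', 'are', 'you?']
```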

### Removing Stopwords
The English stopwords are stored in the **corpus** module of the **nltk** library. They are removed after the text has been tokenized by the previous function.

In [5]:
def remove_stopwords(tokens):
    stop_words = set(stopwords.words('english'))
    new_tokens = list()
    for w in tokens:
        if w not in stop_words: new_tokens.append(w)
    return new_tokens
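
The sketch below illustrates one consequence of filtering before punctuation is stripped: a token such as `you?` does not equal the stopword `you`, so it survives this step. The three-word `demo_stop_words` set and the `drop_stopwords` helper are hypothetical stand-ins for the **nltk** stopword list and the function above.

```python
# Hypothetical, tiny stand-in for stopwords.words('english')
demo_stop_words = {"how", "are", "you"}

def drop_stopwords(tokens, stop_words):
    # Keep only tokens that are not an exact match for a stopword
    return [t for t in tokens if t not in stop_words]

kept = drop_stopwords(["hello,", "how", "are", "you?"], demo_stop_words)
print(kept)  # ['hello,', 'you?']
```

If this matters for a given set of intents, one option is to strip punctuation first and filter stopwords afterwards.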

### Remove Punctuation
After removing the stopwords, a punctuation mark at the end of each token - **full stop (.)**, **exclamation mark (!)**, **question mark (?)** or **comma (,)** - is removed. Simple pattern matching with Python's Regular Expressions (**re**) module is sufficient for this. Note that only one trailing character is removed per token.

In [6]:
def remove_punct(user_tokens):
    punct_re = r"(.*)[?,.!]$"
    for i, word in enumerate(user_tokens):
        if re.match(punct_re, word):
            user_tokens[i] = word[:-1]
    return user_tokens
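
As a possible alternative, the following sketch strips every trailing punctuation mark with `str.rstrip()` and drops tokens that end up empty. The function above removes a single character, so `hello!!` becomes `hello!` and `?` becomes an empty string. `strip_trailing_punct` is a hypothetical name, not part of the notebook.

```python
def strip_trailing_punct(tokens, chars="?,.!"):
    # rstrip removes every trailing character that appears in `chars`
    stripped = [t.rstrip(chars) for t in tokens]
    # Drop tokens that consisted of punctuation only
    return [t for t in stripped if t]

cleaned = strip_trailing_punct(["hello!!", "bye.", "?", "ok"])
print(cleaned)  # ['hello', 'bye', 'ok']
```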

### Lemmatize Tokens
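Once all the unwanted punctuation is removed from the tokenized text, the **WordNetLemmatizer** is used to lemmatize each token, i.e. reduce it to its dictionary base form. This step matters because inflected forms of the same word (for example, plural nouns) are mapped to a single form, so they match the same intent keyword. Called without a part-of-speech argument, as it is here, the lemmatizer treats every token as a noun.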

In [7]:
def lemmatize_tokens(tokens):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(word) for word in tokens]

### Jaccard Similarity
In this code section, a function is defined to calculate the Jaccard Similarity between 2 input sets.<br>
Jaccard Similarity(**JS**) between 2 sets **A** and **B** is defined as - $JS = |A \cap B| / |A \cup B|$

In [8]:
def get_jaccard_sim(set1, set2):
    set3 = set1.intersection(set2)
    set4 = set1.union(set2)
    # Two empty sets have an empty union; return 0.0 instead of dividing by zero
    if not set4:
        return 0.0
    return len(set3) / len(set4)
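
A small worked example of the formula, as a sketch: for $A = \{hi, there\}$ and $B = \{hi, hello\}$ the intersection has 1 element and the union has 3, so $JS = 1/3$. The `jaccard` helper below is a hypothetical, self-contained version; returning 0.0 for two empty sets is a convention chosen here, not something the formula defines.

```python
def jaccard(a, b):
    union = a | b
    # Convention assumed here: two empty sets have similarity 0.0
    return len(a & b) / len(union) if union else 0.0

score = jaccard({"hi", "there"}, {"hi", "hello"})
print(round(score, 3))  # 0.333
```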

### Matching Intent
In this section, the preprocessed input text is matched with each intent defined in the **intents** list. For each intent, the **Intent Name** and the **Jaccard Similarity** value are stored in a list.

In [9]:
def match_intents(lemma_tokens):
    intents_matched = list()
    for intent in intents:
        intents_matched.append([intent[0], get_jaccard_sim(set(lemma_tokens), set(intent[1]))])
    return intents_matched
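
To show what the resulting list looks like, here is a self-contained sketch that scores the tokens `["hi", "there"]` against a copy of the two intents. `demo_intents` and `score_intents` are hypothetical names used only for this illustration.

```python
demo_intents = [["greetings", ["hi", "hello"]], ["farewell", ["bye"]]]

def score_intents(tokens, intents):
    token_set = set(tokens)
    scores = []
    for name, keywords in intents:
        keyword_set = set(keywords)
        union = token_set | keyword_set
        # Jaccard similarity: shared tokens divided by all distinct tokens
        sim = len(token_set & keyword_set) / len(union) if union else 0.0
        scores.append([name, sim])
    return scores

demo_scores = score_intents(["hi", "there"], demo_intents)
print(demo_scores)
```

Here `greetings` scores 1/3 and `farewell` scores 0.0. Every extra token in the input enlarges the union, so longer inputs score lower even when they contain a keyword.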

### Finding the most suited Intent
The intent that has the maximum **Jaccard Similarity** with the user's input is extracted in the following code block. If no intent has a similarity above 0, an empty list is returned.

In [10]:
def max_sim_intent(intents_matched):
    max_sim = 0
    user_intent = list()
    for i, intent in enumerate(intents_matched):
        if intent[1]>max_sim:
            max_sim = intent[1]
            user_intent = intents_matched[i]
    return user_intent
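
The same selection can be sketched more compactly with the built-in `max()`. This hypothetical `best_intent` helper returns only the intent name, and `None` when nothing scores above 0, which is a design choice made for this example and not the notebook's behaviour.

```python
def best_intent(scored):
    # scored: list of [intent_name, similarity] pairs
    if not scored:
        return None
    name, sim = max(scored, key=lambda pair: pair[1])
    # Treat a best score of 0 as "no intent matched"
    return name if sim > 0 else None

print(best_intent([["greetings", 0.5], ["farewell", 0.0]]))  # greetings
print(best_intent([["greetings", 0.0], ["farewell", 0.0]]))  # None
```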

### Finding an appropriate response
Once the most appropriate intent is identified, the **intent_responses** list is used to retrieve the list of corresponding responses. A random response from this list is returned to the user.

In [11]:
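def responses(user_intent):
    # No intent matched (empty list from max_sim_intent): avoid an IndexError
    # and return a fallback message (placeholder wording)
    if not user_intent:
        return "sorry, I did not understand that. could you rephrase?"
    response = str()
    for intent_response in intent_responses:
        if intent_response[0] == user_intent[0]:
            response = random.choice(intent_response[1])
    return response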
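
If the responses were stored in a dictionary keyed by intent name, the lookup would not need a loop. The sketch below assumes such a structure; `demo_responses`, `FALLBACK` and `pick_response` are hypothetical names, and the fallback wording is a placeholder.

```python
import random

# Hypothetical dictionary form of the intent_responses list
demo_responses = {
    "greetings": ["hi, how may I help you today?"],
    "farewell": ["thank you for your time."],
}
FALLBACK = "sorry, I did not understand that."  # placeholder wording

def pick_response(intent_name, responses=demo_responses):
    # Unknown or missing intents fall through to the fallback message
    options = responses.get(intent_name)
    return random.choice(options) if options else FALLBACK

print(pick_response("greetings"))  # hi, how may I help you today?
```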

### Defining the logic that generates the bot's response
This function first preprocesses the user's input in the following manner - 
- The input is tokenized. (**tokenize_input()**)
- Stopwords and punctuation are removed from the tokenized input. (**remove_stopwords()** and **remove_punct()**)
- The resulting tokens are lemmatized to generate a list of keywords for intent matching. (**lemmatize_tokens()**)

After preprocessing the text, the keywords are matched to an appropriate intent in the following manner - 
- The Jaccard Similarity of the input text with each intent is calculated. (**match_intents()**)
- The intent with the maximum Jaccard Similarity is returned as the user's intent. (**max_sim_intent()**)

After identifying the intent, an appropriate response is randomly selected from the responses stored in the **intent_responses** list. (**responses()**)

In [12]:
def bot_response(user_response):
    # Input Text Preprocessing 
    tokens = tokenize_input(user_response)
    new_tokens = remove_stopwords(tokens)
    new_tokens = remove_punct(new_tokens)
    lemma_tokens = lemmatize_tokens(new_tokens)
    
    # Intent Matching
    intents_matched = match_intents(lemma_tokens)
    user_intent = max_sim_intent(intents_matched)
    
    # Generating an appropriate response for the intent matched
    return responses(user_intent)

### Main Function
The **while loop** keeps prompting the user for input until they type **quit**.

In [13]:
print("Intent Classification BOT")
user_response = str()
while(user_response!="quit"):
    print()
    time.sleep(0.2)
    user_response = input("YOU: ")
    user_response = user_response.lower()
    if(user_response!='quit'):
        print("BOT: "+bot_response(user_response))
    else:
        time.sleep(0.2)
        print("BOT: Bye! take care..")

Intent Classification BOT

YOU: hi
BOT: hello, what can I do for you today?

YOU: bye
BOT: see you later. let me know if you have any other queries. type 'quit' to exit the program.

YOU: quit
BOT: Bye! take care..
