# An example of parsing a customer message

Here is the message.

```Hi, my  name is Alice. I am calling from Alabama. I wonder if you could help me with my headaches. I have them frequently sometimes every day,  sometimes twice a week. I do not   know the reason. I have a very stressful job and have to communicate with many annoyed clients. Very tiring!!! My boss has yelled at me this morning. 
I cannot    concentrate because of the issues with my head.. My dad died three months ago because of kidney failure. I kept taking ibuprofen but it does not help me any more. Do you have alternatives that might help with my headache?```

In [1]:
message="Hi, my  name is Alice. I am calling from Alabama. I wonder if you could help me with my headaches. I have them frequently sometimes every day,  sometimes twice a week. I do not   know the reason. I have a very stressful job and have to communicate with many annoyed clients. Very tiring!!! My boss has yelled at me this morning.  I cannot    concentrate because of the issues with my head.. My dad died three months ago because of kidney failure. I kept taking ibuprofen but it does not help me any more. Do you have alternatives that might help with my headache?"

In [2]:
import pandas as pd
import spacy
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer, sent_tokenize, word_tokenize

In [3]:
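nlp = spacy.load('en_core_web_sm')
lemmatizer = WordNetLemmatizer()
# Split the message into sentences on '.', '?' and '!'
tokenizer = RegexpTokenizer(r"[.?!]", gaps=True)
sent_list = tokenizer.tokenize(message)
# Runs of sentence-ending punctuation (e.g. '!!!', '..')
sent_end_list = RegexpTokenizer(r"[^.?!]", gaps=True).tokenize(message)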
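
To see what the two tokenizers produce, here is a minimal sketch that uses only the standard-library `re` module as a stand-in for `RegexpTokenizer(..., gaps=True)`. It assumes that splitting on each punctuation character and discarding the empty pieces approximates `sent_list`, and that the runs of punctuation approximate `sent_end_list`. The helper name `split_with_terminators` and the sample text are made up for illustration.

```python
import re

def split_with_terminators(text):
    """Return (sentence, terminator) pairs, e.g. ("Very tiring", "!!!")."""
    # Split on every single '.', '?' or '!' and drop the empty pieces
    # left between repeated marks such as "!!!" or "..".
    sentences = [s for s in re.split(r"[.?!]", text) if s]
    # Each run of consecutive marks closes one sentence.
    terminators = re.findall(r"[.?!]+", text)
    return list(zip(sentences, terminators))

sample = "Very tiring!!! I cannot concentrate.. Do you have alternatives?"
for sentence, end in split_with_terminators(sample):
    print(repr(sentence.strip()), repr(end))
```

Pairing each sentence with its terminator keeps the punctuation available for later analysis, even though the sentence list itself no longer contains it.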



I borrowed a piece of code from https://stackoverflow.com/questions/28618400/how-to-identify-the-subject-of-a-sentence and adapted it to my needs. First, I use the functions to extract the subject and the objects of each sentence. The subjects identify the sentences that might relate to the customer's condition: sentences about *I*, about the customer's body parts, and about family members (in case of inherited diseases). Sentences with other subjects can be ignored. To be more precise about the subjects and objects, I extract only the nouns and adjectives from the separated parts. Then I check which of these words describe what I am looking for by reading their WordNet definitions: a word is selected as a keyword if its definition contains a term from my own list of criteria.

In [4]:
def is_noun(pos):
    return pos[:2]=='NN'

def is_adj(pos):
    return pos[:2]=='JJ'
# Getting the subject of a sentence
def get_subject_phrase(doc):
    for token in doc:
        if "subj" in token.dep_:
            # The subtree of the subject token spans the whole subject phrase
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            sent_subj = doc[start:end].text
            if sent_subj == 'I':
                return lemmatizer.lemmatize(sent_subj)
            elif sent_subj is None:
                # Placeholder for a missing subject phrase
                return "0"
            else:
                # Keep only the first noun of the subject phrase
                tok_sent_subj = word_tokenize(sent_subj)
                subj = [word for (word, pos) in pos_tag(tok_sent_subj) if is_noun(pos)]
                if not subj:
                    return None
                else:
                    return lemmatizer.lemmatize(subj[0])
# Getting nouns and adjectives from the first direct object (dobj) phrase in a sentence
def get_object_phrase_nouns_adj(doc):
    for token in doc:
        if ("dobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            sent_subj=doc[start:end].text           
            tok_sent_subj=word_tokenize(sent_subj)
            subj=[word for (word, pos) in pos_tag(tok_sent_subj) if is_noun(pos) or is_adj(pos)]
            if not subj:
                return None
            else:
                return list(map(lemmatizer.lemmatize, subj))

# Getting nouns and adjectives from the first prepositional object (pobj) phrase in a sentence
def get_second_object_phrase_nouns_adj(doc):
    for token in doc:
        if ("pobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            sent_subj=doc[start:end].text           
            tok_sent_subj=word_tokenize(sent_subj)
            subj=[word for (word, pos) in pos_tag(tok_sent_subj) if is_noun(pos) or is_adj(pos)]
            if not subj:
                return None
            else:
                return list(map(lemmatizer.lemmatize, subj))

            
family_members=['father', 'dad', 'mom', 'sis', 'bro', 'granny', 'mother', 'son', 'daughter', 'aunt', 'uncle', 'cousin', 'grandfather', 'grandmother', \
                'granddaughter', 'grandson']
# Finding family members among the WordNet synonyms of a word
def word_def_family(doc):
    if doc is not None:
        synonyms = []
        for syn in wordnet.synsets(doc):
            for l in syn.lemmas():
                synonyms.append(l.name())
        inters = list(set(synonyms) & set(family_members))
        # The count is returned as a string, e.g. '0' when nothing matches
        return str(len(inters))
    else:
        return None

# Finding specific keywords in the list of words
def word_def_keywords(doc):
    if doc is not None:
        extract_w = []
        for word in doc:
            synsets = wordnet.synsets(word)
            if not synsets:
                # Words unknown to WordNet have no definition to check
                continue
            # Only the definition of the first listed sense is used
            define = synsets[0].definition()
            list_w = word_tokenize(define)
            fin_list = list(map(lemmatizer.lemmatize, list_w))
            inters = list(set(fin_list) & set(['body', 'organ', 'stress', 'nerve', 'anxiety', 'medicine', 'drug']))
            detect = len(inters)
            if detect > 0:
                extract_w.append(word)
        return extract_w
    else:
        return None
# To deal with a list of lists
def flatten(doc):
    if doc is not None:
        flatter_list = list()
        for sub_list in doc:
            # Skip missing entries (None)
            if sub_list is not None:
                flatter_list += sub_list
        return flatter_list
    else:
        return []
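
The definition-matching idea behind `word_def_keywords` can be seen in isolation in the sketch below. It is only an illustration: `TOY_GLOSSES` is a hypothetical hand-written dictionary that stands in for WordNet (the definitions are made up, not WordNet output), `select_keywords` is an invented name, and the lemmatization step is left out for brevity. The target terms are the ones used in the notebook.

```python
# Hypothetical mini-dictionary standing in for WordNet definitions.
# These glosses are made up for illustration only.
TOY_GLOSSES = {
    "kidney": "an organ that filters waste from the blood",
    "ibuprofen": "a drug used to relieve pain and inflammation",
    "reason": "an explanation for why something happened",
    "client": "a person who pays for a service",
}

# Terms that mark a definition as health-related (taken from the notebook).
TARGET_TERMS = {"body", "organ", "stress", "nerve", "anxiety", "medicine", "drug"}

def select_keywords(words, glosses=TOY_GLOSSES, targets=TARGET_TERMS):
    """Keep the words whose definition mentions at least one target term."""
    selected = []
    for word in words:
        gloss = glosses.get(word)
        if gloss is None:
            # No definition available: nothing to match against.
            continue
        if set(gloss.lower().split()) & targets:
            selected.append(word)
    return selected

print(select_keywords(["kidney", "reason", "ibuprofen", "client", "unknown"]))
```

With these toy definitions, only `kidney` and `ibuprofen` are selected, because their definitions contain `organ` and `drug`.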

This piece of code extracts the subject of each sentence and all the nouns and adjectives from its object phrases.

In [5]:
subj_list=list()
obj_noun_adv=list()

for sentence in sent_list:
    doc = nlp(sentence)
    a = get_subject_phrase(doc)
    b = get_object_phrase_nouns_adj(doc)
    c = get_second_object_phrase_nouns_adj(doc)
    obj_noun_adv.append([b,c])
    subj_list.append(a)

I put everything into a pandas DataFrame because it makes it easier for me to follow which sentences to delete: one row corresponds to one sentence.

In [6]:
df=pd.DataFrame({"subj":subj_list,"nouns_adj":obj_noun_adv})
df['nouns_adj']=df['nouns_adj'].apply(flatten)
df['family']=df['subj'].apply(word_def_family)
df=df[((df['family'] != '0') | (df['subj']=='I')) & (~df['subj'].isnull())].copy()
df['keywords']=df['nouns_adj'].apply(word_def_keywords)
df.head()

| | subj | nouns_adj | family | keywords |
|---|---|---|---|---|
| 1 | I | [Alabama] | 0 | [] |
| 2 | I | [headache] | 0 | [headache] |
| 3 | I | [] | 0 | [] |
| 4 | I | [reason] | 0 | [] |
| 5 | I | [stressful, job, many, client] | 0 | [stressful] |
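
The row filter in cell `In [6]` keeps a sentence when its subject is present and is either `I` or a family member. A plain-Python restatement of that rule is sketched below. The function name `keep_row` and the sample rows are made up for illustration, and `family` is assumed to hold the match count as a string, as `word_def_family` returns it.

```python
def keep_row(subj, family):
    """Mirror of the mask: (family != '0' or subj == 'I') and subj is present."""
    if subj is None:
        # Sentences without a detected subject are dropped.
        return False
    return family != "0" or subj == "I"

# Illustrative (subject, family) pairs, not actual notebook output.
rows = [
    ("name", "0"),   # subject is neither the customer nor a relative
    ("I", "0"),      # the customer speaking about herself
    ("dad", "1"),    # a family member
    ("boss", "0"),   # an unrelated person
    (None, None),    # no subject found
]

kept = [subj for subj, family in rows if keep_row(subj, family)]
print(kept)
```

Only the rows for `I` and `dad` survive, which is why the first row of the table starts at index 1 rather than 0.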


This is **the list of keywords**.

In [7]:
print(flatten(df['keywords'].to_list()))

['headache', 'stressful', 'head', 'kidney', 'ibuprofen']


## Conclusions
I selected a list of keywords that might prompt a pharmacist to ask questions, e.g. 'Is ibuprofen safe for people with kidney problems?' However, there are several things to note:
1. Currently, the terms *headache* and *kidney* have the same weight. In fact, the patient called because of her headaches, while kidney problems are only a potential risk (because of family history). This has to be taken into account.
2. I did not analyze verb phrases. It is important to know that *ibuprofen did not help*. The pharmacist would be encouraged to search for alternatives and to ask which alternatives to ibuprofen were tried, what dose of ibuprofen was taken, etc.
3. The gender, the time of the request (e.g. full moon, Friday night, etc.), the place (maybe there was some unusual pollution in Alabama), and the device used to make contact (phone, computer) could also be processed.
4. I did not make any use of exclamation marks or broken sentences like *Very tiring*. They indicate lack of concentration, stress, and hurry or tiredness.
5. **All in all**, this lady needs a recommendation to see a doctor and ask about drugs like Xanax. Whether a pharmacist wants to give such a recommendation is an open question. A more thorough analysis has to be performed before proposing this suggestion.
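
Points 1 and 4 could be prototyped without further NLP tooling. The sketch below is one possible approach and not part of the analysis above: `weight_keywords` gives keywords from the customer's own sentences a higher weight than keywords from sentences about family members, and `stress_signals` counts repeated punctuation marks and very short fragments. The weights (1.0 and 0.5), the three-word fragment threshold, both function names, and the sample rows are assumptions chosen for illustration.

```python
import re

# Assumed weights: the customer's own complaints count fully,
# conditions mentioned for relatives count as background risk.
OWN_WEIGHT = 1.0
FAMILY_WEIGHT = 0.5

def weight_keywords(rows):
    """rows: iterable of (subject, keywords); returns {keyword: weight}."""
    weights = {}
    for subject, keywords in rows:
        weight = OWN_WEIGHT if subject == "I" else FAMILY_WEIGHT
        for keyword in keywords:
            # Keep the highest weight if a keyword occurs more than once.
            weights[keyword] = max(weight, weights.get(keyword, 0.0))
    return weights

def stress_signals(text, max_fragment_words=3):
    """Count repeated sentence-final marks and collect very short fragments."""
    repeated_marks = re.findall(r"[.?!]{2,}", text)
    sentences = [s.strip() for s in re.split(r"[.?!]+", text) if s.strip()]
    fragments = [s for s in sentences if len(s.split()) < max_fragment_words]
    return {"repeated_marks": len(repeated_marks), "fragments": fragments}

# Illustrative rows in the shape of the 'subj' and 'keywords' columns.
rows = [("I", ["headache"]), ("I", ["ibuprofen"]), ("dad", ["kidney"])]
print(weight_keywords(rows))
print(stress_signals("Very tiring!!! My boss has yelled at me this morning."))
```

With these assumed weights, *headache* and *ibuprofen* rank above *kidney*, and *Very tiring* is flagged as a fragment followed by a repeated exclamation mark.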