# Problem Statement: Natural Language Processing of Yelp Recommendation



Link to the Dataset: https://www.kaggle.com/c/yelp-recsys-2013/data
Description of Data: Each observation in this dataset is a review of a particular business by a
particular user. The "stars" column is the number of stars (1 through 5) that the reviewer assigned to
the business, that is, the reviewer's rating of the business. More stars means a better rating.


# Steps
1. Read the yelp.csv file into a DataFrame called yelp. Call the head, info, and describe
methods on yelp.
2. Remove punctuation and stopwords from the text in the ‘text’ column.
3. Create two objects, X and y. X will be the 'text' column of the yelp DataFrame and y will be the 'stars'
column. Create a CountVectorizer object and split the data into training and testing sets.
Train a MultinomialNB model and display the confusion matrix.
4. Display the HMM POS tagging on the first 4 rows of ‘text’.
5. Parse the first 4 rows of ‘text’ using the Viterbi parser.

# Import the necessary libraries

In [1]:
import pandas as pd

In [2]:
# Change into the dataset directory, relative to the current working directory
from os import chdir, getcwd
wd = getcwd()
chdir(wd + "/Dataset/yelp_dataset/yelp_training_set")

## 1. Read the yelp file into Dataframes, Check the head, info, and describe methods

In [3]:
# Read the JSON Lines review file (lines=True: one JSON object per line)
yelp_review_data = pd.read_json('yelp_training_set_review.json', lines=True)

In [4]:
import nltk
# nltk.download()  # uncomment to fetch NLTK data if it is not installed yet

In [5]:
dir(nltk)

['AbstractLazySequence',
 'AffixTagger',
 'AlignedSent',
 'Alignment',
 'AnnotationTask',
 'ApplicationExpression',
 'Assignment',
 'BigramAssocMeasures',
 'BigramCollocationFinder',
 'BigramTagger',
 'BinaryMaxentFeatureEncoding',
 'BlanklineTokenizer',
 'BllipParser',
 'BottomUpChartParser',
 'BottomUpLeftCornerChartParser',
 'BottomUpProbabilisticChartParser',
 'Boxer',
 'BrillTagger',
 'BrillTaggerTrainer',
 'CFG',
 'CRFTagger',
 'CfgReadingCommand',
 'ChartParser',
 'ChunkParserI',
 'ChunkScore',
 'Cistem',
 'ClassifierBasedPOSTagger',
 'ClassifierBasedTagger',
 'ClassifierI',
 'ConcordanceIndex',
 'ConditionalExponentialClassifier',
 'ConditionalFreqDist',
 'ConditionalProbDist',
 'ConditionalProbDistI',
 'ConfusionMatrix',
 'ContextIndex',
 'ContextTagger',
 'ContingencyMeasures',
 'CoreNLPDependencyParser',
 'CoreNLPParser',
 'Counter',
 'CrossValidationProbDist',
 'DRS',
 'DecisionTreeClassifier',
 'DefaultTagger',
 'DependencyEvaluator',
 'DependencyGrammar',
 'DependencyGrap

In [6]:
yelp_review_data.head()

Unnamed: 0,votes,user_id,review_id,stars,date,text,type,business_id
0,"{'funny': 0, 'useful': 5, 'cool': 2}",rLtl8ZkDX5vH5nAx9C3q5Q,fWKvX83p0-ka4JS3dc6E5A,5,2011-01-26,My wife took me here on my birthday for breakf...,review,9yKzy9PApeiPPOUJEtnvkg
1,"{'funny': 0, 'useful': 0, 'cool': 0}",0a2KyEL0d3Yb1V6aivbIuQ,IjZ33sJrzXqU-0X6U8NwyA,5,2011-07-27,I have no idea why some people give bad review...,review,ZRJwVLyzEJq1VAihDhYiow
2,"{'funny': 0, 'useful': 1, 'cool': 0}",0hT2KtfLiobPvh6cDC8JQg,IESLBzqUCLdSzSqm0eCSxQ,4,2012-06-14,love the gyro plate. Rice is so good and I als...,review,6oRAC4uyJCsJl1X0WZpVSA
3,"{'funny': 0, 'useful': 2, 'cool': 1}",uZetl9T0NcROGOyFfughhg,G-WvGaISbqqaMHlNnByodA,5,2010-05-27,"Rosie, Dakota, and I LOVE Chaparral Dog Park!!...",review,_1QQZuf4zZOyFCvXc0o6Vg
4,"{'funny': 0, 'useful': 0, 'cool': 0}",vYmM4KTsC8ZfQBg-j5MWkw,1uJFq2r5QfJG_6ExMRCaGw,5,2012-01-05,General Manager Scott Petello is a good egg!!!...,review,6ozycU1RpktNG2-1BroVtw


In [7]:
yelp_review_data.describe()

Unnamed: 0,stars
count,229907.0
mean,3.766723
std,1.21701
min,1.0
25%,3.0
50%,4.0
75%,5.0
max,5.0


In [8]:
yelp_review_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 229907 entries, 0 to 229906
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   votes        229907 non-null  object        
 1   user_id      229907 non-null  object        
 2   review_id    229907 non-null  object        
 3   stars        229907 non-null  int64         
 4   date         229907 non-null  datetime64[ns]
 5   text         229907 non-null  object        
 6   type         229907 non-null  object        
 7   business_id  229907 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(6)
memory usage: 14.0+ MB


In [9]:
yelp_review_data.groupby('stars').describe()

Unnamed: 0_level_0,business_id,business_id,business_id,business_id,business_id,business_id,date,date,date,date,...,user_id,user_id,user_id,user_id,votes,votes,votes,votes,votes,votes
Unnamed: 0_level_1,count,unique,top,freq,first,last,count,unique,top,freq,...,top,freq,first,last,count,unique,top,freq,first,last
stars,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
1,17516,6292,bA-Cj6N9TEMlDlOh2aAnUw,95,,,17516,2018,2013-01-02 00:00:00,47,...,q9XgOylNsSbqZqF_SO3-OQ,33,,,17516,718,"{'funny': 0, 'useful': 0, 'cool': 0}",5129,,
2,20957,5655,hW0Ne_HTHEAgGF1rAdmR-g,140,,,20957,2132,2013-01-02 00:00:00,31,...,joIzw_aUiNvBTuGoytrH7g,59,,,20957,788,"{'funny': 0, 'useful': 0, 'cool': 0}",6332,,
3,35363,6723,hW0Ne_HTHEAgGF1rAdmR-g,280,,,35363,2293,2011-03-14 00:00:00,57,...,0bNXP9quoJEgyVZu9ipGgQ,158,,,35363,944,"{'funny': 0, 'useful': 0, 'cool': 0}",12836,,
4,79878,8683,JokKtdXU7zXHcr20Lrk29A,304,,,79878,2397,2013-01-03 00:00:00,139,...,fczQCSmaWF78toLEmb0Zsw,298,,,79878,1383,"{'funny': 0, 'useful': 0, 'cool': 0}",29524,,
5,76193,9285,VVeogjZya58oiTxK7qUjAQ,363,,,76193,2371,2013-01-02 00:00:00,129,...,4ozupHULqGyO42s3zNUzOQ,164,,,76193,1376,"{'funny': 0, 'useful': 0, 'cool': 0}",29002,,
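The per-star counts in this output show that the ratings are far from evenly distributed, which matters when reading the accuracy figures later on. As a rough sketch (the counts are copied from the table above; the variable names are illustrative), the class shares and the accuracy of a trivial "always predict the most common rating" baseline can be worked out as follows:

```python
# Review counts per star rating, copied from the groupby output above.
star_counts = {1: 17516, 2: 20957, 3: 35363, 4: 79878, 5: 76193}

total = sum(star_counts.values())
shares = {star: count / total for star, count in star_counts.items()}

# Accuracy of a trivial classifier that always predicts the most common rating.
majority_star = max(star_counts, key=star_counts.get)
baseline_accuracy = star_counts[majority_star] / total

for star, share in shares.items():
    print(f"{star} stars: {share:.1%}")
print(f"majority class: {majority_star}, baseline accuracy: {baseline_accuracy:.3f}")
```

Four- and five-star reviews together make up about two thirds of the data, so any classifier should be compared against this baseline rather than against chance.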


## 2. Remove punctuation and stopwords from the text in the ‘text’ column

In [9]:
#Remove Punctuation
import string
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [10]:
#Remove stopwords
from nltk.corpus import stopwords

In [11]:
stopwords.words('english')[0:10] # Show some stop words

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

In [12]:
# Combine both steps in one function so it can be applied to the DataFrame later on
# Build the stopword set once; calling stopwords.words() for every token is very slow
STOP_WORDS = set(stopwords.words('english'))

def text_process(mess):
    """
    Takes in a string of text, then performs the following:
    1. Removes all punctuation
    2. Splits the text into tokens on whitespace
    3. Removes all stopwords
    4. Returns the remaining tokens as a list
    """
    # Keep only the characters that are not punctuation
    nopunc = [char for char in mess if char not in string.punctuation]

    # Join the characters again to form the string.
    nopunc = ''.join(nopunc)
    # Split on whitespace and drop stopwords (case-insensitive match)
    return [word for word in nopunc.split() if word.lower() not in STOP_WORDS]
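The same clean-up can be sketched with the standard library alone, which makes it easy to try on a single string. This is only an illustration: `SAMPLE_STOPWORDS` is a hypothetical, hand-picked set that is far smaller than NLTK's English list, and `clean_tokens` is not a function from the notebook. `str.translate` removes all punctuation in one pass, and the set gives constant-time stopword lookups.

```python
import string

# Assumption: a tiny illustrative stop-word set, not NLTK's full English list.
SAMPLE_STOPWORDS = {"i", "me", "my", "the", "is", "so", "and", "on", "for", "here"}

# Translation table that deletes every character in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tokens(text, stop_words=SAMPLE_STOPWORDS):
    """Strip punctuation, split on whitespace, and drop stop words."""
    no_punct = text.translate(_PUNCT_TABLE)
    return [tok for tok in no_punct.split() if tok.lower() not in stop_words]


print(clean_tokens("love the gyro plate. Rice is so good!"))
```

Note that, as in `text_process`, the original casing of the kept tokens is preserved; only the stopword comparison is case-insensitive.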


### Because this step is time consuming, only a sample of the data (20 rows) is preprocessed here.
### Applying it to every row would be: yelp_review_data['processed_text'] = yelp_review_data['text'].apply(text_process)
### The cleaned text is placed in the processed_text column and displayed below.
### The entire text data is cleaned and tokenized in Step 3 (model training).


In [13]:

# Take an explicit copy so that adding a column does not raise SettingWithCopyWarning
data_sample = yelp_review_data[0:20].copy()
data_sample['processed_text'] = data_sample['text'].apply(text_process)


In [16]:
data_sample.head()

Unnamed: 0,votes,user_id,review_id,stars,date,text,type,business_id,processed_text
0,"{'funny': 0, 'useful': 5, 'cool': 2}",rLtl8ZkDX5vH5nAx9C3q5Q,fWKvX83p0-ka4JS3dc6E5A,5,2011-01-26,My wife took me here on my birthday for breakf...,review,9yKzy9PApeiPPOUJEtnvkg,"[wife, took, birthday, breakfast, excellent, w..."
1,"{'funny': 0, 'useful': 0, 'cool': 0}",0a2KyEL0d3Yb1V6aivbIuQ,IjZ33sJrzXqU-0X6U8NwyA,5,2011-07-27,I have no idea why some people give bad review...,review,ZRJwVLyzEJq1VAihDhYiow,"[idea, people, give, bad, reviews, place, goes..."
2,"{'funny': 0, 'useful': 1, 'cool': 0}",0hT2KtfLiobPvh6cDC8JQg,IESLBzqUCLdSzSqm0eCSxQ,4,2012-06-14,love the gyro plate. Rice is so good and I als...,review,6oRAC4uyJCsJl1X0WZpVSA,"[love, gyro, plate, Rice, good, also, dig, can..."
3,"{'funny': 0, 'useful': 2, 'cool': 1}",uZetl9T0NcROGOyFfughhg,G-WvGaISbqqaMHlNnByodA,5,2010-05-27,"Rosie, Dakota, and I LOVE Chaparral Dog Park!!...",review,_1QQZuf4zZOyFCvXc0o6Vg,"[Rosie, Dakota, LOVE, Chaparral, Dog, Park, co..."
4,"{'funny': 0, 'useful': 0, 'cool': 0}",vYmM4KTsC8ZfQBg-j5MWkw,1uJFq2r5QfJG_6ExMRCaGw,5,2012-01-05,General Manager Scott Petello is a good egg!!!...,review,6ozycU1RpktNG2-1BroVtw,"[General, Manager, Scott, Petello, good, egg, ..."


## 3. CountVectorize, Train MultinomialNB model and Display the confusion Matrix

In [14]:
from sklearn.feature_extraction.text import CountVectorizer

In [15]:
# Count-vectorize the 20-row sample; the entire dataset is vectorized in the model training step
count_vectors = CountVectorizer(analyzer=text_process)
X_counts = count_vectors.fit_transform(data_sample['text'])
# Print total number of vocab words
print(len(count_vectors.vocabulary_))
print(count_vectors.get_feature_names())

912
['100', '10pm', '11', '15', '16th', '2', '21', '22', '23', '25', '3', '3Great', '4', '45', '475', '5', '550', '552', '602', '642', '75', 'ATM', 'Absolutely', 'Also', 'Anyhow', 'Anyway', 'Apollo', 'Apparently', 'Arizona', 'Around', 'Arrogant', 'Ascent', 'BBQ', 'Bad', 'Baja', 'Bastard', 'Beach', 'Beef', 'Bloody', 'Boo', 'Brandon', 'Bring', 'Burro', 'Carefully', 'Chaparral', 'Cheese', 'Chicken', 'Chile', 'Clean', 'Condesa', 'DEFINITELY', 'Dakota', 'Dawn', 'Definitely', 'Deli', 'Dept', 'Diego', 'Dog', 'Dogfish', 'Drop', 'Dusk', 'EVER', 'EVERYTHING', 'Everyone', 'Except', 'Farm', 'Followed', 'Franks', 'Fresh', 'Frida', 'Friend', 'Full', 'General', 'Giant', 'Good', 'Grand', 'Green', 'Groupon', 'Groupons', 'Happy', 'Heres', 'Hes', 'Hey', 'Hopefully', 'Hot', 'Id', 'Ill', 'Im', 'Inside', 'Irish', 'Ive', 'Jasons', 'Jet', 'Kahlo', 'Kitchen', 'LOVE', 'La', 'Like', 'Loved', 'Luckily', 'Mac', 'Manager', 'Mandarin', 'Marcy', 'Mary', 'Maybe', 'Mexican', 'Mistakes', 'Next', 'Nobuo', 'Oakville', 'Op

In [16]:
# Display the Count Vectors for Sample data
X_counts_df = pd.DataFrame(X_counts.toarray())
X_counts_df.columns = count_vectors.get_feature_names()
#Vector Dimension
print(X_counts.shape)
X_counts_df

(20, 912)


Unnamed: 0,100,10pm,11,15,16th,2,21,22,23,25,...,wouldnt,wrapped,written,x,xeriscape,yelping,yesterday,yet,youre,yummy
0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
6,0,0,0,0,1,0,0,0,0,0,...,0,1,0,0,0,1,0,0,1,1
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
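To make the shape of this document-term matrix concrete, here is a minimal standard-library sketch of the same idea: one row per document, one column per vocabulary word, each cell holding a count. The three tokenised documents are made up for illustration, and the sorted column order is chosen to mirror the sorted feature list shown above.

```python
from collections import Counter

# Assumption: three already-tokenised toy documents stand in for the reviews.
docs = [
    ["love", "gyro", "plate", "rice", "good"],
    ["good", "dog", "park", "good"],
    ["bad", "rice"],
]

# Vocabulary: sorted unique tokens, one matrix column per word.
vocabulary = sorted({tok for doc in docs for tok in doc})

# One row per document, one column per vocabulary word.
matrix = []
for doc in docs:
    counts = Counter(doc)
    matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```

Most cells are zero even in this tiny example, which is why the real matrix (20 rows by 912 columns for the sample) is stored in sparse form and only converted with `toarray()` for display.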


In [17]:
from sklearn.model_selection import train_test_split

#Split the data into Training and Test Set

#Complete Data
X_train, X_test, y_train, y_test = train_test_split(yelp_review_data['text'], yelp_review_data['stars'], test_size = 0.1, random_state = 1)
#Sample Data
#X_train, X_test, y_train, y_test = train_test_split(data_sample['text'], data_sample['stars'], test_size = 0.1, random_state = 1)

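# Learn the vocabulary from the training text only, then reuse it for the test text
count_vectors = CountVectorizer(analyzer=text_process)
X_train = count_vectors.fit_transform(X_train)
X_test = count_vectors.transform(X_test)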

In [21]:
# Print total number of vocab words
print(len(count_vectors.vocabulary_))
print(X_train.shape)
print(count_vectors.get_feature_names()[20000:210000])

254035
(206916, 254035)


In [25]:
from sklearn.naive_bayes import MultinomialNB

# Train a MultinomialNB model; the predictions below are made on the training set itself

ratings_model = MultinomialNB().fit(X_train, y_train)
all_predictions = ratings_model.predict(X_train)
print(list(all_predictions))

[5, 2, 1, 3, 5, 4, 3, 4, 4, 4, 4, 5, 4, 4, 4, 1, 5, 5, 4, 1, 4, 4, 4, 5, 2, 4, 1, 4, 4, 5, 4, 4, 4, 5, 2, 2, 5, 4, 5, 4, 5, 5, 4, 5, 2, 3, 5, 2, 1, 5, 4, 3, 2, 1, 4, 4, 5, 3, 1, 4, 4, 3, 5, 4, 5, 5, 2, 4, 4, 3, 4, 5, 4, 3, 3, 3, 4, 3, 5, 5, 5, 3, 4, 3, 5, 5, 1, 2, 3, 4, 2, 2, 4, 1, 1, 4, 5, 5, 5, 5, 1, 5, 5, 1, 5, 2, 4, 3, 1, 4, 4, 4, 4, 4, 5, 2, 4, 5, 4, 4, 5, 4, 2, 5, 4, 1, 5, 3, 4, 5, 3, 5, 5, 5, 3, 5, 4, 4, 4, 5, 2, 2, 2, 4, 4, 2, 5, 3, 4, 5, 3, 2, 3, 5, 5, 5, 4, 5, 5, 5, 2, 3, 1, 4, 4, 1, 4, 1, 4, 4, 4, 1, 4, 4, 5, 5, 4, 5, 4, 4, 1, 4, 5, 4, 4, 3, 4, 4, 4, 4, 5, 4, 2, 3, 5, 4, 4, 5, 4, 3, 5, 1, 4, 3, 4, 4, 4, 3, 4, 4, 3, 3, 4, 3, 4, 5, 1, 5, 4, 4, 3, 4, 5, 2, 4, 4, 5, 5, 5, 1, 5, 4, 5, 4, 4, 3, 5, 5, 4, 1, 5, 1, 4, 5, 2, 2, 3, 5, 4, 3, 4, 3, 4, 4, 4, 5, 4, 5, 2, 1, 4, 3, 4, 4, 4, 1, 3, 4, 2, 1, 5, 3, 4, 5, 5, 4, 5, 3, 4, 5, 5, 4, 3, 5, 4, 4, 4, 1, 4, 4, 5, 1, 4, 1, 5, 4, 4, 4, 5, 5, 5, 4, 4, 1, 4, 5, 4, 3, 1, 4, 5, 1, 5, 5, 4, 4, 1, 4, 5, 5, 4, 4, 5, 5, 5, 5, 5, 4, 5, 4, 1, 4, 1, 
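For readers who want to see what a multinomial Naive Bayes model estimates from the count vectors, the following is a from-scratch sketch with Laplace (add-one) smoothing. It is not scikit-learn's implementation; `train_nb`, `predict_nb` and the toy reviews are hypothetical names and data introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_like


def predict_nb(doc, log_prior, log_like):
    """Return the class with the highest log score; words outside the vocabulary are skipped."""
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Assumption: toy tokenised reviews with star labels, purely for illustration.
train_docs = [["great", "food"], ["awful", "service"], ["great", "service"], ["awful", "food", "awful"]]
train_labels = [5, 1, 5, 1]

log_prior, log_like = train_nb(train_docs, train_labels)
print(predict_nb(["great", "place"], log_prior, log_like))
print(predict_nb(["awful"], log_prior, log_like))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, which matters for long reviews and a vocabulary of this size.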

In [26]:
from sklearn.metrics import classification_report

# Display the classification report (per-class precision, recall and F1) for predicted vs. ground-truth star ratings on the training set

print (classification_report(y_train, all_predictions))

              precision    recall  f1-score   support

           1       0.63      0.62      0.62     15805
           2       0.66      0.43      0.52     18866
           3       0.64      0.49      0.56     31727
           4       0.61      0.77      0.68     71890
           5       0.75      0.70      0.72     68628

    accuracy                           0.66    206916
   macro avg       0.66      0.60      0.62    206916
weighted avg       0.67      0.66      0.66    206916
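The report above summarises precision and recall per class, while the task statement asks for a confusion matrix, which shows where the errors go. A minimal standard-library sketch of how such a matrix is built is given below; the ratings are made up and the helper names are illustrative. With scikit-learn the equivalent table comes from `sklearn.metrics.confusion_matrix`, and a fairer estimate would use predictions on held-out test data rather than on the training set.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def recall_per_class(matrix):
    """Diagonal count divided by the row total (true instances of the class)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Assumption: made-up star ratings, not the notebook's predictions.
y_true = [5, 4, 4, 1, 3, 5, 2, 4]
y_pred = [5, 4, 5, 1, 4, 5, 1, 4]
labels = [1, 2, 3, 4, 5]

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
print(recall_per_class(cm))
```

Off-diagonal cells next to the diagonal (for example true 3 predicted as 4) indicate near misses, which are typical for ordinal targets such as star ratings.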



## 4. Display the HMM POS tagging on the first 4 rows of ‘text’

In [27]:
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/honeywell/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/honeywell/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [28]:
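# POS tagging for the first 4 rows of text
# Note: nltk.pos_tag uses the averaged perceptron tagger downloaded above, not an HMM
POSTag_Data = yelp_review_data[0:4]
# Show first 4 rows
print(POSTag_Data['text'])

# Tag the cleaned vocabulary of these rows; the tokens are tagged in sorted order,
# not in sentence order, so the tagger sees no real sentence context
count_vect_POSTag_Data = CountVectorizer(analyzer=text_process)
Word_Count = count_vect_POSTag_Data.fit_transform(POSTag_Data['text'])
nltk.pos_tag(count_vect_POSTag_Data.get_feature_names())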

0    My wife took me here on my birthday for breakf...
1    I have no idea why some people give bad review...
2    love the gyro plate. Rice is so good and I als...
3    Rosie, Dakota, and I LOVE Chaparral Dog Park!!...
Name: text, dtype: object


[('11', 'CD'),
 ('2', 'CD'),
 ('550', 'CD'),
 ('552', 'CD'),
 ('602', 'CD'),
 ('642', 'CD'),
 ('Anyway', 'NNP'),
 ('Beef', 'NNP'),
 ('Bloody', 'NNP'),
 ('Chaparral', 'NNP'),
 ('Dakota', 'NNP'),
 ('Dept', 'NNP'),
 ('Dog', 'NNP'),
 ('EVERYTHING', 'NNP'),
 ('Everyone', 'NNP'),
 ('Heres', 'NNP'),
 ('Im', 'NNP'),
 ('Ive', 'NNP'),
 ('LOVE', 'NNP'),
 ('Mary', 'NNP'),
 ('PM', 'NNP'),
 ('Park', 'NNP'),
 ('Rec', 'NNP'),
 ('Rice', 'NNP'),
 ('Rosie', 'NNP'),
 ('Saturday', 'NNP'),
 ('Scottsdale', 'NNP'),
 ('Sunday', 'NNP'),
 ('absolute', 'VBP'),
 ('absolutely', 'RB'),
 ('also', 'RB'),
 ('amazing', 'VBG'),
 ('area', 'NN'),
 ('arrived', 'VBD'),
 ('awesome', 'JJ'),
 ('back', 'RB'),
 ('bad', 'JJ'),
 ('baked', 'VBD'),
 ('ballparks', 'NNS'),
 ('baseball', 'NN'),
 ('best', 'RB'),
 ('better', 'RB'),
 ('birthday', 'JJ'),
 ('blend', 'VBP'),
 ('box', 'NN'),
 ('bread', 'NN'),
 ('breakfast', 'NN'),
 ('calzone', 'NN'),
 ('came', 'VBD'),
 ('candy', 'JJ'),
 ('cans', 'NNS'),
 ('cant', 'JJ'),
 ('case', 'NN'),
 ('cle
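The tags above come from `nltk.pos_tag`, which relies on the averaged perceptron tagger rather than a hidden Markov model. To illustrate what HMM tagging involves, here is a self-contained sketch of Viterbi decoding over a toy HMM. All probabilities are hand-specified assumptions, not values estimated from the Yelp data; a real tagger would learn them from a tagged corpus (NLTK's `nltk.tag.hmm` module offers a trainable HMM tagger).

```python
import math

# Assumption: a hand-specified toy HMM; real taggers estimate these from a tagged corpus.
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.4, "park": 0.4, "love": 0.2},
    "VB": {"love": 0.6, "dig": 0.4},
}
UNSEEN = 1e-6  # small floor so unknown words do not zero out a path


def viterbi_tag(words):
    """Return the most probable tag sequence under the toy HMM."""
    # best[i][tag] = (log probability of the best path ending in tag at position i, previous tag)
    best = [{t: (math.log(START[t]) + math.log(EMIT[t].get(words[0], UNSEEN)), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda item: item[1],
            )
            column[t] = (score + math.log(EMIT[t].get(word, UNSEEN)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))


print(viterbi_tag(["the", "dog", "love", "the", "park"]))
```

Unlike tagging a sorted vocabulary list, this decoding depends on word order: "love" is tagged as a verb here because it follows a noun, even though the toy model also allows it as a noun.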

## 5. Parse any 3 sentences using Viterbi Parser

In [29]:
from nltk import sent_tokenize
from nltk import word_tokenize
from nltk.grammar import toy_pcfg1 
from nltk.grammar import toy_pcfg2 
from nltk import ViterbiParser

In [30]:
# 3 sentences to be parsed using the Viterbi parser, each paired with the toy grammar that covers its words
yelp_sent_parser = [('Bob saw Jack with a cookie', toy_pcfg2),
                   ('the boy ran under the table', toy_pcfg2),
                   ('I saw John with the telescope', toy_pcfg1)]
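A Viterbi parser returns the single most probable parse under a probabilistic grammar, where the probability of a parse is the product of the probabilities of the rules it uses. As a worked sketch, the snippet below compares the two attachments of "with a cookie" in the first sentence, using the `toy_pcfg2` rule probabilities as they are printed in the trace output further down. The dictionary `P` and the variable names are illustrative, not part of NLTK.

```python
from math import prod

# Rule probabilities copied from the toy_pcfg2 grammar printed in the trace.
P = {
    "S -> NP VP": 1.0, "VP -> V NP": 0.59, "VP -> VP PP": 0.01,
    "NP -> Det N": 0.41, "NP -> Name": 0.28, "NP -> NP PP": 0.31,
    "PP -> P NP": 1.0, "V -> saw": 0.21, "N -> cookie": 0.12,
    "Name -> Jack": 0.52, "Name -> Bob": 0.48, "P -> with": 0.61, "Det -> a": 0.31,
}

# Rules shared by both readings of "Bob saw Jack with a cookie".
shared = [
    "S -> NP VP", "NP -> Name", "Name -> Bob", "VP -> V NP", "V -> saw",
    "NP -> Name", "Name -> Jack", "PP -> P NP", "P -> with",
    "NP -> Det N", "Det -> a", "N -> cookie",
]

# Reading 1: the PP modifies "Jack" (NP -> NP PP).
np_attachment = prod(P[r] for r in shared + ["NP -> NP PP"])
# Reading 2: the PP modifies the seeing event (VP -> VP PP).
vp_attachment = prod(P[r] for r in shared + ["VP -> VP PP"])

print(f"NP attachment: {np_attachment:.3e}")
print(f"VP attachment: {vp_attachment:.3e}")
print("More probable:", "NP attachment" if np_attachment > vp_attachment else "VP attachment")
```

The two readings differ in exactly one rule, so their ratio is 0.31 / 0.01: under this grammar the noun-phrase attachment is 31 times more probable, and that is the reading a Viterbi parser would be expected to keep.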



In [31]:
# Loop over all 3 sentences and the corresponding grammars defined above in yelp_sent_parser
for i in range(3):
    # Unpack the i-th sentence and its grammar
    sent, grammar = yelp_sent_parser[i]
    # Build a Viterbi parser for this grammar
    parser = ViterbiParser(grammar)
    print('\n \n VITERBI PARSING FOR SENTENCE ',i+1,'.............')
    print('--------------------------------------------------')
    print('\n sentence: %s\n parser: %s\n grammar_rules: %s' % (sent,parser,grammar))
    # Tokenize the sentence using word tokenizer
    tokens = word_tokenize(sent)
    # Print tokens in sentence
    print('Tokens for Sentence ', i+1, tokens)
    parser.trace(3)
    # Parse and print tree with probabilities
    for parse in parser.parse_all(tokens):
        print(parse)


 
 VITERBI PARSING FOR SENTENCE  1 .............
--------------------------------------------------

 sentence: Bob saw Jack with a cookie
 parser: <ViterbiParser for <Grammar with 23 productions>>
 grammar_rules: Grammar with 23 productions (start state = S)
    S -> NP VP [1.0]
    VP -> V NP [0.59]
    VP -> V [0.4]
    VP -> VP PP [0.01]
    NP -> Det N [0.41]
    NP -> Name [0.28]
    NP -> NP PP [0.31]
    PP -> P NP [1.0]
    V -> 'saw' [0.21]
    V -> 'ate' [0.51]
    V -> 'ran' [0.28]
    N -> 'boy' [0.11]
    N -> 'cookie' [0.12]
    N -> 'table' [0.13]
    N -> 'telescope' [0.14]
    N -> 'hill' [0.5]
    Name -> 'Jack' [0.52]
    Name -> 'Bob' [0.48]
    P -> 'with' [0.61]
    P -> 'under' [0.39]
    Det -> 'the' [0.41]
    Det -> 'a' [0.31]
    Det -> 'my' [0.28]
Tokens for Sentence  1 ['Bob', 'saw', 'Jack', 'with', 'a', 'cookie']
Inserting tokens into the most likely constituents table...
   Insert: |=.....| Bob
   Insert: |.=....| saw
   Insert: |..=...| Jack
   Insert: