**Identify Advocates in Own Customer Data**

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("data/ci-data.csv")

In [3]:
df.remarks[0:5]

0    In hac habitasse platea dictumst. Etiam faucib...
1    Praesent blandit. Nam nulla. Integer pede just...
2    Praesent id massa id nisl venenatis lacinia. A...
3    In hac habitasse platea dictumst. Morbi vestib...
4    Pellentesque at nulla. Suspendisse potenti. Cr...
Name: remarks, dtype: object

We want to break the sample remarks below into tokens, then find remarks in our customer data that are similar to them.

In [4]:
remarks = ['This is the best bank on the planet.',
           'Lots of changes to their savings account product. It is terrible.',
           'The new app takes some getting used to but it is good once you learn it']

Import the Natural Language Toolkit (NLTK) and download its Punkt sentence tokenizer, which nltk.word_tokenize relies on.

In [5]:
from sklearn.feature_extraction.text import CountVectorizer
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\vince\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.


True

Set up a CountVectorizer, which converts a collection of remarks into a matrix of token counts. Background reading:

https://towardsdatascience.com/natural-language-processing-count-vectorization-with-scikit-learn-e7804269bb5e

In [6]:
remarks_token_counts = CountVectorizer(min_df=1, tokenizer=nltk.word_tokenize)

Fit the vectorizer on the remarks and transform them into a sparse matrix of word frequency counts (one row per remark, one column per token).

In [7]:
remarks_as_sparse_vector = remarks_token_counts.fit_transform(remarks)
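
To make the result of fit_transform concrete, here is a minimal, dependency-free sketch of count vectorization. It is not scikit-learn's implementation: the regex tokenizer and the helper names (tokenize, build_vocabulary, count_matrix) are assumptions made for illustration, standing in for nltk.word_tokenize and CountVectorizer.

```python
import re
from collections import Counter

remarks = [
    'This is the best bank on the planet.',
    'Lots of changes to their savings account product. It is terrible.',
    'The new app takes some getting used to but it is good once you learn it',
]

def tokenize(text):
    # Hypothetical stand-in for nltk.word_tokenize: lowercase the text,
    # then keep runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocabulary(docs):
    # Give each unique token a column index, assigned in sorted order.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def count_matrix(docs, vocabulary):
    # One row per document, one column per vocabulary entry.
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok, n in Counter(tokenize(doc)).items():
            row[vocabulary[tok]] = n
        rows.append(row)
    return rows

vocabulary = build_vocabulary(remarks)
matrix = count_matrix(remarks, vocabulary)
print(len(vocabulary))                # number of distinct tokens
print(matrix[0][vocabulary['the']])   # occurrences of 'the' in the first remark
```

Unlike this dense list of lists, fit_transform returns a SciPy sparse matrix; calling .toarray() on it gives a dense array that is easier to inspect for small inputs.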

In [15]:
# Show the mapping from each unique token to its column index
remarks_token_counts.vocabulary_

{'this': 25,
 'is': 9,
 'the': 23,
 'best': 4,
 'bank': 3,
 'on': 15,
 'planet': 17,
 '.': 0,
 'lots': 12,
 'of': 14,
 'changes': 6,
 'to': 26,
 'their': 24,
 'savings': 19,
 'account': 1,
 'product': 18,
 'it': 10,
 'terrible': 22,
 'new': 13,
 'app': 2,
 'takes': 21,
 'some': 20,
 'getting': 7,
 'used': 27,
 'but': 5,
 'good': 8,
 'once': 16,
 'you': 28,
 'learn': 11}
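
Once each remark is a vector of token counts, "similar" can be given a concrete meaning. One common choice, assumed here rather than taken from this notebook, is cosine similarity between count vectors. The sketch below is self-contained; the function names and the two candidate remarks are made up for illustration, and the regex tokenizer is a simple stand-in for nltk.word_tokenize.

```python
import math
import re
from collections import Counter

def token_counts(text):
    # Bag of words: lowercase tokens (words and punctuation) with their counts.
    return Counter(re.findall(r"\w+|[^\w\s]", text.lower()))

def cosine_similarity(a, b):
    # Dot product over shared tokens divided by the product of vector lengths.
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, candidates):
    # Return the candidate whose count vector is closest to the query's.
    q = token_counts(query)
    return max(candidates, key=lambda c: cosine_similarity(q, token_counts(c)))

reference = 'This is the best bank on the planet.'
candidates = [
    'Best bank I have ever used.',   # made-up example, not from the data
    'Fees went up again',            # made-up example, not from the data
]
print(most_similar(reference, candidates))
```

Raw counts give weight to frequent tokens such as "the" and ".", so in practice stop-word removal or TF-IDF weighting is often applied before comparing vectors.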

In [11]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [12]:
# Create a SentimentIntensityAnalyzer instance
analyser = SentimentIntensityAnalyzer()

In [13]:
# Score a sentence and print it followed by its polarity scores
def sentiment_analyser_scores(sentence):
    score = analyser.polarity_scores(sentence)
    print("{}{}".format(sentence, str(score)))

In [14]:
sentiment_analyser_scores("Best!!!!")

Best!!!!{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.7482}


In [19]:
sentiment_analyser_scores('ok!!')

ok!!{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


In [20]:
for x in remarks:
    # The function prints its own output and returns None, so no outer print() is needed
    sentiment_analyser_scores(x)

This is the best bank on the planet.{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.6369}
Lots of changes to their savings account product. It is terrible.{'neg': 0.237, 'neu': 0.763, 'pos': 0.0, 'compound': -0.4767}
The new app takes some getting used to but it is good once you learn it{'neg': 0.0, 'neu': 0.796, 'pos': 0.204, 'compound': 0.5927}
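
The compound value is the single summary score in each result. The vaderSentiment documentation suggests treating a compound score of 0.05 or above as positive, -0.05 or below as negative, and anything in between as neutral. A minimal sketch of using that convention to flag potential advocates follows; label_from_compound and the scored list are hypothetical names, and the compound values are copied from the outputs shown earlier in this notebook.

```python
def label_from_compound(compound, threshold=0.05):
    # Conventional VADER cut-offs; the threshold can be tuned for your data.
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# (remark, compound score) pairs taken from the notebook outputs
scored = [
    ('This is the best bank on the planet.', 0.6369),
    ('Lots of changes to their savings account product. It is terrible.', -0.4767),
    ('ok!!', 0.0),
]

# Treat remarks labelled positive as candidate advocates
advocates = [text for text, compound in scored
             if label_from_compound(compound) == 'positive']
print(advocates)
```

The same labelling could be applied to every entry in df.remarks once each remark has been scored with analyser.polarity_scores.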
