## Sentiment Analysis

### Also referred to as "Opinion Mining" or "Opinion Analysis"
#### A text-analysis technique that uses NLP or ML to assign weighted sentiment scores to the opinions expressed in a text.
#### Sentiment: a view or opinion that is held or expressed.
#### Widely applied to online reviews, survey responses, and social media posts to improve customer service and marketing.
#### When 2 people express their opinions, there can be 3 opinions: your opinion, my opinion, and our opinion.
#### Given the enormous volume of text on social media, we want to understand its sentiment automatically.
### Sentiment Analysis: Techniques
#### 1. Machine Learning Approach
#### 2. Lexicon-Based Approach
#### Product Reviews -> Sentiment Identification (opinion words or phrases) -> Feature Selection -> Sentiment Classification -> Sentiment Polarity
### Steps involved
####  1. The user gives feedback or a comment on the portal
####  2. The possible sentiment categories are identified
####  3. Feature selection is applied after noise removal and text processing
####  4. An ML or lexicon-based approach is applied for sentiment classification
####  5. The polarity score is assigned.
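A minimal sketch of steps 1 to 5 with a lexicon-based classifier, assuming a hypothetical six-word lexicon (`TOY_LEXICON`) rather than a real sentiment dictionary. Noise removal here is just lowercasing and stripping non-letters, and feature selection simply keeps the words that appear in the lexicon.

```python
import re

# Hypothetical toy lexicon (word -> sentiment weight); not taken from any real library.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
               "bad": -1.0, "terrible": -2.0, "slow": -0.5}


def preprocess(text):
    """Noise removal and text processing: lowercase, drop non-letters, tokenize."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return text.split()


def classify(tokens):
    """Lexicon-based classification: sum the weights of the opinion words found."""
    # Feature selection is implicit: only tokens present in the lexicon contribute.
    return sum(TOY_LEXICON.get(token, 0.0) for token in tokens)


def polarity(score):
    """Map the numeric score to a polarity label (the possible sentiment categories)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


feedback = "Great product, but delivery was slow!!"  # step 1: user feedback
tokens = preprocess(feedback)
score = classify(tokens)
print(tokens, score, polarity(score))
```

The plain sum ignores the contrast introduced by "but"; rule-based tools such as VADER add rules for cases like this.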
### Sentiment Analysis: Applications
#### 1. Brand Monitoring
#### 2. Customer Service Prioritization
#### 3. Product Analysis
#### 4. Competitive Research
#### 5. Market Research and Insights into Industry Trends
#### 6. Employee Engagement Monitoring

### VADER - Valence Aware Dictionary and sEntiment Reasoner
#### A Lexicon-Based Approach
#### A Rule-Based Approach

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
!pip install vaderSentiment


In [4]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [5]:
vader_analyzer = SentimentIntensityAnalyzer()

In [8]:
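def sentiment_analyzer_calculation(sentence):
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', and 'compound' keys
    score = vader_analyzer.polarity_scores(sentence)
    # Print the sentence left-aligned and padded with '-' to 40 characters, then the scores.
    # The function returns None, so wrapping a call in print() also prints "None".
    print("{:-<40} {}".format(sentence, str(score)))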

In [13]:
# The statement below prints the polarity scores.
# 'neg' is 0.0: no part of the sentence carries negative sentiment.
# 'neu' is 0.635: the proportion of the sentence that is neutral.
# 'pos' is 0.365: the proportion of the sentence that is positive.
# 'compound' is the sum of the words' lexicon valences, adjusted by VADER's rules and normalized to [-1, 1].
print(sentiment_analyzer_calculation('The NLP Session is Cool'))

The NLP Session is Cool----------------- {'neg': 0.0, 'neu': 0.635, 'pos': 0.365, 'compound': 0.3182}
None
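To see where these numbers come from, here is a minimal sketch that recomputes them by hand. It relies on two details assumed from the vaderSentiment source rather than from a documented API: the compound score is the valence sum x normalized as x / sqrt(x² + α) with α = 15, and for the neg/neu/pos proportions each positive word contributes (valence + 1) while each neutral word contributes 1. It also assumes that "cool" has valence 1.3 in the VADER lexicon and that the other four words are not in the lexicon; these values are consistent with the output above.

```python
import math

ALPHA = 15  # assumed default of vaderSentiment's normalize()


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Assumed valences for 'The NLP Session is Cool': only "cool" (1.3) is in the lexicon.
valences = [0.0, 0.0, 0.0, 0.0, 1.3]

compound = normalize(sum(valences))

# Proportions: positive words add (valence + 1), negative words add (valence - 1),
# and each neutral word adds 1 to the total.
pos_sum = sum(v + 1 for v in valences if v > 0)
neg_sum = sum(v - 1 for v in valences if v < 0)
neu_count = sum(1 for v in valences if v == 0)
total = pos_sum + abs(neg_sum) + neu_count

print({"neg": round(abs(neg_sum) / total, 3),
       "neu": round(neu_count / total, 3),
       "pos": round(pos_sum / total, 3),
       "compound": round(compound, 4)})
```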


In [14]:
print(sentiment_analyzer_calculation('The NLP Session is Cool!'))

The NLP Session is Cool!---------------- {'neg': 0.0, 'neu': 0.607, 'pos': 0.393, 'compound': 0.3802}
None


In [15]:
print(sentiment_analyzer_calculation('The NLP Session is Cool!!'))

The NLP Session is Cool!!--------------- {'neg': 0.0, 'neu': 0.581, 'pos': 0.419, 'compound': 0.4374}
None


In [16]:
print(sentiment_analyzer_calculation('The NLP Session is Cool!!!'))

The NLP Session is Cool!!!-------------- {'neg': 0.0, 'neu': 0.557, 'pos': 0.443, 'compound': 0.4898}
None


In [24]:
print(sentiment_analyzer_calculation('The NLP Session is Cool!!!!!!'))

The NLP Session is Cool!!!!!!----------- {'neg': 0.0, 'neu': 0.536, 'pos': 0.464, 'compound': 0.5374}
None


In [25]:
print(sentiment_analyzer_calculation('The nlp Session is COOL'))

The nlp Session is COOL----------------- {'neg': 0.0, 'neu': 0.569, 'pos': 0.431, 'compound': 0.4648}
None


In [26]:
print(sentiment_analyzer_calculation('THE NLP SESSION IS COOL'))

THE NLP SESSION IS COOL----------------- {'neg': 0.0, 'neu': 0.635, 'pos': 0.365, 'compound': 0.3182}
None


In [27]:
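## The examples above show that punctuation and capitalization change the scores. Each extra '!' raises
## the compound score (VADER counts at most four, so '!!!!!!' scores like '!!!!'). Writing COOL in capitals
## while the rest of the sentence is lowercase increases the score further, but when the whole sentence is in
## capitals no word stands out, and the score is the same as for the plain sentence.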
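The emphasis effects can be reproduced with the same normalization. This sketch assumes constants taken from the vaderSentiment source: 0.292 added per '!' (at most 4 counted), 0.733 added to a positive word written in capitals when the rest of the text is not all caps, and a valence of 1.3 for "cool". Under these assumptions it matches the compound scores printed above; it illustrates the rules and is not a reimplementation of the library.

```python
import math

COOL = 1.3        # assumed VADER lexicon valence of "cool"
C_INCR = 0.733    # assumed boost for an ALL-CAPS word in mixed-case text
EP_BOOST = 0.292  # assumed boost per exclamation mark
EP_MAX = 4        # assumed maximum number of "!" that count


def normalize(score, alpha=15):
    """vaderSentiment-style squashing of a valence sum into (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def cool_compound(exclamations=0, caps_emphasis=False):
    """Compound score of 'The NLP Session is Cool' with the given emphasis."""
    valence = COOL
    if caps_emphasis:  # e.g. 'The nlp Session is COOL'
        valence += C_INCR
    # Exclamation marks push a positive sum further above zero, up to EP_MAX of them.
    valence += min(exclamations, EP_MAX) * EP_BOOST
    return round(normalize(valence), 4)


for n in (0, 1, 2, 3, 6):
    print(f"{n} '!': {cool_compound(exclamations=n)}")
print("COOL in caps:", cool_compound(caps_emphasis=True))
```

An all-caps sentence such as 'THE NLP SESSION IS COOL' gets no caps boost, which corresponds to `cool_compound()` with its defaults.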

In [30]:
# Degree modifiers (booster words) such as very, extremely, really, absolutely, and marginally increase or decrease the intensity of the sentiment word that follows them.
print(sentiment_analyzer_calculation('The nlp session is extremely COOL'))

The nlp session is extremely COOL------- {'neg': 0.0, 'neu': 0.601, 'pos': 0.399, 'compound': 0.5149}
None


In [29]:
print(sentiment_analyzer_calculation('The nlp session is marginally COOL'))

The nlp session is marginally COOL------ {'neg': 0.0, 'neu': 0.646, 'pos': 0.354, 'compound': 0.4098}
None
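Boosters follow the same pattern. Assuming that vaderSentiment adds 0.293 for an intensifier such as "extremely" and subtracts 0.293 for a dampener such as "marginally" when the modifier sits directly before the sentiment word, a short sketch reproduces the two scores above. The booster dictionary here is a hypothetical three-word subset.

```python
import math

COOL_CAPS = 1.3 + 0.733          # assumed: "cool" valence plus the ALL-CAPS boost for "COOL"
B_INCR, B_DECR = 0.293, -0.293   # assumed booster increments from the vaderSentiment source

# Hypothetical three-word subset of a booster dictionary.
BOOSTERS = {"extremely": B_INCR, "very": B_INCR, "marginally": B_DECR}


def normalize(score, alpha=15):
    """vaderSentiment-style squashing of a valence sum into (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def boosted_compound(modifier):
    """Compound score of 'The nlp session is <modifier> COOL'."""
    # For a positive word directly after the modifier, the increment is added as-is (assumed).
    valence = COOL_CAPS + BOOSTERS.get(modifier, 0.0)
    return round(normalize(valence), 4)


print(boosted_compound("extremely"), boosted_compound("marginally"))
```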


In [31]:
print(sentiment_analyzer_calculation("The nlp session is cool but I don't like the complexity"))

The nlp session is cool but I don't like the complexity {'neg': 0.2, 'neu': 0.676, 'pos': 0.124, 'compound': -0.2535}
None


In [32]:
import pandas as pd
import numpy as np
import pickle
import sys
import os
import io
import re
from sys import path

### Natural Language Generation 

#### A subfield of AI and computational linguistics that focuses on computer systems that produce understandable text in human language.
#### It converts a computer-based representation into a natural language representation, the opposite of natural language understanding (NLU).
### Stages in NLG
#### 1. Content Determination
#### 2. Document Structuring
#### 3. Lexical Selection (choosing the words to use)
#### 4. Referring Expression Generation
#### 5. Aggregation and Realization
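The stages can be illustrated with a toy template-based generator that turns a hypothetical weather record into text. Every rule, threshold, and field name here is an assumption chosen for illustration; real NLG systems implement each stage far more elaborately.

```python
def content_determination(data):
    """Choose which facts to report (hypothetical rule: mention wind only if >= 20 km/h)."""
    facts = [("temp", data["temp_c"]), ("rain", data["rain"])]
    if data["wind_kmh"] >= 20:
        facts.append(("wind", data["wind_kmh"]))
    return facts


def document_structuring(facts):
    """Order the facts: temperature, then rain, then wind."""
    order = {"temp": 0, "rain": 1, "wind": 2}
    return sorted(facts, key=lambda fact: order[fact[0]])


def lexical_selection(fact):
    """Pick the words that express one fact."""
    kind, value = fact
    if kind == "temp":
        word = "hot" if value >= 30 else "mild" if value >= 15 else "cold"
        return f"will be {word} at {value} degrees"
    if kind == "rain":
        return "will see rain" if value else "will stay dry"
    return f"will be windy at {value} km/h"


def realize(city, phrases):
    """Aggregation, referring expressions, and surface realization."""
    # Aggregation: merge the first two phrases into one sentence joined by "and".
    sentences = [f"{city} " + " and ".join(phrases[:2]) + "."]
    # Referring expression generation: later sentences refer to the city as "It".
    sentences += [f"It {phrase}." for phrase in phrases[2:]]
    return " ".join(sentences)


def generate(data):
    facts = document_structuring(content_determination(data))
    return realize(data["city"], [lexical_selection(f) for f in facts])


print(generate({"city": "Oslo", "temp_c": 12, "rain": True, "wind_kmh": 35}))
```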
### Response Generation Mechanism
#### 1. Generative Based Model
#### 2. Retrieval Based Model
### Applications of NLG
#### 1. Autocomplete
#### 2. Translation
#### 3. Question Answering
#### 4. Dialogue System
### Retrieval Based Model: Introduction
#### 1. The model selects responses from a set of predefined responses
#### 2. The input and the conversation context are the key parameters for picking a response
#### 3. It uses heuristics to fetch the best match from the available responses
#### 4. A score is computed for each candidate, and the most relevant response is returned
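A minimal sketch of retrieval-based response selection, assuming a hypothetical set of patterns and canned replies, Jaccard word overlap as the scoring heuristic, and a hypothetical threshold of 0.2 below which a fallback reply is used. Real systems use richer scores (for example TF-IDF or embeddings), but the structure is the same: score every predefined response against the input and return the best one.

```python
import re

# Hypothetical predefined patterns paired with canned, grammatically correct replies.
RESPONSES = [
    ("what are your opening hours", "We are open 9am to 5pm, Monday to Friday."),
    ("how do i reset my password", "Click 'Forgot password' on the login page."),
    ("i want to cancel my order", "Please share your order number and we will cancel it."),
]
FALLBACK = "Sorry, I did not understand. Could you rephrase?"


def tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap score between two token sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0


def respond(user_input, threshold=0.2):
    query = tokens(user_input)
    scored = [(jaccard(query, tokens(pattern)), reply) for pattern, reply in RESPONSES]
    best_score, best_reply = max(scored, key=lambda pair: pair[0])
    # Only predefined replies can be returned; unmatched input gets the fallback.
    return best_reply if best_score >= threshold else FALLBACK


print(respond("How can I reset my password?"))
print(respond("tell me a joke"))
```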
### Retrieval Based Model: Cons
#### 1. Cannot generate new text, because its responses are fixed
#### 2. Relies on many hand-written heuristics, so the system is not truly intelligent
#### 3. Can handle only predefined scenarios
### Retrieval Based Model: Pros
#### 1. Fewer errors, because every response is pre-written and grammatically correct
#### 2. Well suited to customer-service and other business problems
#### 3. Requires less effort and data

### Examples: 
#### Search Engine or Document Retrieval System
#### 1. Used in information retrieval systems
#### 2. The knowledge base is a set of documents, and the input is a search term or query
#### 3. Task is to retrieve documents that are most relevant to the search query
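Document retrieval can be sketched with TF-IDF weighting and cosine similarity in pure Python. The three documents are hypothetical, and the IDF formula ln(N / df) + 1 is one common smoothing choice among several.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: document name -> text.
DOCS = {
    "sentiment.txt": "VADER assigns sentiment scores to social media text",
    "nlg.txt": "Natural language generation produces text from data",
    "retrieval.txt": "A search engine retrieves documents relevant to a query",
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return IDF weights and a TF-IDF vector (term -> weight) per document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    df = Counter(term for tf in counts.values() for term in tf)
    idf = {term: math.log(len(docs) / n) + 1.0 for term, n in df.items()}
    vectors = {name: {t: c * idf[t] for t, c in tf.items()} for name, tf in counts.items()}
    return idf, vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, docs, top_k=2):
    """Rank documents by cosine similarity to the query; drop zero scores."""
    idf, vectors = build_index(docs)
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(((cosine(q, vec), name) for name, vec in vectors.items()), reverse=True)
    return [name for score, name in ranked[:top_k] if score > 0]


print(search("search engine query", DOCS))
print(search("sentiment text", DOCS))
```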

In [33]:
#### Vinyals and Le Approach
#### ELIZA
#### AIML

In [35]:
### AIML: Artificial Intelligence Markup Language
#### 1. An XML-based markup language for writing chatbot rules (pattern and response pairs)
#### 2. Used in chatbot platforms such as Superbot, Verbot, and Pandorabot
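To make the AIML idea concrete, this sketch parses a small hypothetical AIML document with Python's `xml.etree.ElementTree` and answers user input. It handles only `<category>`, `<pattern>`, `<template>`, the `*` wildcard, and `<star/>`. The matching and input normalization are simplifications (categories are tried in file order), whereas real AIML interpreters use a priority-based pattern graph.

```python
import re
import xml.etree.ElementTree as ET

# A tiny, hypothetical AIML document with two categories (pattern -> template).
AIML_SOURCE = """
<aiml version="2.0">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
  <category>
    <pattern>MY NAME IS *</pattern>
    <template>Nice to meet you, <star/>!</template>
  </category>
</aiml>
""".strip()

FALLBACK = "I have no answer for that."


def load_categories(source):
    """Parse each <category> into (compiled regex, template element)."""
    categories = []
    for category in ET.fromstring(source).iter("category"):
        words = category.find("pattern").text.split()
        # Simplification: "*" matches one or more words; other words must match exactly.
        regex = " ".join(r"(.+)" if w == "*" else re.escape(w) for w in words)
        categories.append((re.compile(f"^{regex}$"), category.find("template")))
    return categories


def render(template, star):
    """Fill the template, replacing <star/> with the text matched by '*'."""
    parts = [template.text or ""]
    for child in template:
        if child.tag == "star":
            parts.append(star)
        parts.append(child.tail or "")
    return "".join(parts).strip()


CATEGORIES = load_categories(AIML_SOURCE)


def respond(user_input):
    # Assumed normalization: uppercase, replace punctuation with spaces, collapse spaces.
    normalized = " ".join(re.sub(r"[^A-Z0-9 ]", " ", user_input.upper()).split())
    for regex, template in CATEGORIES:
        match = regex.match(normalized)
        if match:
            star = match.group(1) if match.groups() else ""
            return render(template, star)
    return FALLBACK


print(respond("Hello!"))
print(respond("my name is Alice"))
```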

### Generative Based Model

#### A statistical model of the joint probability distribution of X and Y:
####   X is observable
####   Y is target
#### Describes how a dataset is generated, in terms of a probabilistic model
#### Learns the distribution of the data itself, often with unsupervised learning

### Approaches
#### 1. Variational Autoencoders (VAE)
#### 2. Generative Adversarial Networks (GAN)
### Examples
#### Create a model that can generate new images of dogs:
####   Input: Dataset of dogs
####   Model: To learn generic rules to create new outputs
####   Output: New images of dogs
#### Training Data -> Generative Model (Random Noise) -> Generated Sample
#### $\arg\max_y P(Y=y \mid X=x) = \arg\max_y \frac{P(Y=y, X=x)}{P(X=x)} = \arg\max_y P(Y=y, X=x)$, since $P(X=x)$ does not depend on $y$.
#### Generative models can do more than prediction (i.e., maximizing $P(Y \mid X=x)$): by estimating the joint distribution $P(X, Y)$, they can also sample $(X, Y)$ pairs.
#### Generative models can be used to:
#### 1. Impute missing data
#### 2. Compress the dataset
#### 3. Generate unseen data
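A discrete toy example of these uses, assuming a hypothetical dataset of (word, label) pairs. It estimates P(X, Y) by counting, predicts y by maximizing the joint probability (and checks that dividing by P(X=x) does not change the answer), imputes a missing x for a known y, and samples new pairs.

```python
import random
from collections import Counter

# Hypothetical toy dataset of (x, y) pairs: x = a review word, y = its sentiment label.
DATA = ([("good", "pos")] * 3 + [("good", "neg")] * 1 +
        [("bad", "neg")] * 3 + [("bad", "pos")] * 1 +
        [("okay", "neg")] * 2 + [("okay", "pos")] * 1)

COUNTS = Counter(DATA)
TOTAL = len(DATA)
XS = sorted({x for x, _ in DATA})
YS = sorted({y for _, y in DATA})


def joint(x, y):
    """Maximum-likelihood estimate of P(X=x, Y=y)."""
    return COUNTS[(x, y)] / TOTAL


def predict(x):
    """argmax_y P(Y=y, X=x), which equals argmax_y P(Y=y | X=x)."""
    return max(YS, key=lambda y: joint(x, y))


def impute_x(y):
    """Fill a missing x for a known y with its most probable value."""
    return max(XS, key=lambda x: joint(x, y))


def sample(n, seed=0):
    """Generate new (x, y) pairs by sampling from the estimated joint distribution."""
    pairs = [(x, y) for x in XS for y in YS]
    weights = [joint(x, y) for x, y in pairs]
    return random.Random(seed).choices(pairs, weights=weights, k=n)


# Dividing by P(X=x) does not change the argmax over y.
for x in XS:
    p_x = sum(joint(x, y) for y in YS)
    assert predict(x) == max(YS, key=lambda y: joint(x, y) / p_x)

print({x: predict(x) for x in XS})
print(impute_x("pos"), impute_x("neg"))
print(sample(3))
```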
### Language Modeling
#### 1. Used to compute the probability of a sequence of tokens, or of the next token given the previous ones.
#### 2. A statistical model of natural language.
#### 3. One of the fundamental tasks of NLP that has many applications. 
### Language Modeling is used for
#### 1. Word Prediction: Probability distribution of sequence of words
#### 2. Spelling Correction
#### 3. Automatic Speech Recognition
#### 4. Machine Translation
### Next Word Prediction
### Define Language Models
#### Calculate the probability of a sentence or sequence of words, $P(W) = P(w_1, w_2, w_3, \dots, w_n)$, which the chain rule of conditional probability expands as $P(W) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$
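A minimal sketch of the chain rule with maximum-likelihood estimates from a hypothetical four-sentence corpus, where P(w_i | w_1..w_{i-1}) = count(w_1..w_i) / count(w_1..w_{i-1}) over sentence prefixes. The same counts give a simple next-word predictor. Conditioning on the full history makes most sentences unseen, so their probability is 0; that sparsity is what the Markov assumption is meant to address.

```python
from collections import Counter

# Hypothetical toy corpus.
CORPUS = ["i like nlp", "i like python", "i love nlp", "you like nlp"]


def prefix_counts(corpus):
    """Count every sentence prefix (tuple of leading words), including the empty one."""
    counts = Counter()
    for sentence in corpus:
        words = tuple(sentence.split())
        for i in range(len(words) + 1):
            counts[words[:i]] += 1
    return counts


COUNTS = prefix_counts(CORPUS)


def cond_prob(history, word):
    """MLE of P(word | history) = count(history + word) / count(history)."""
    if COUNTS[history] == 0:
        return 0.0
    return COUNTS[history + (word,)] / COUNTS[history]


def sentence_prob(sentence):
    """Chain rule: P(w1..wn) is the product of P(wi | w1..w(i-1))."""
    words = tuple(sentence.split())
    prob = 1.0
    for i, word in enumerate(words):
        prob *= cond_prob(words[:i], word)
    return prob


def predict_next(history):
    """Most frequent continuation of the history in the corpus (None if unseen)."""
    candidates = {p[-1]: c for p, c in COUNTS.items()
                  if len(p) == len(history) + 1 and p[:-1] == history}
    return max(candidates, key=candidates.get) if candidates else None


print(sentence_prob("i like nlp"))       # 3/4 * 2/3 * 1/2
print(sentence_prob("you love python"))  # unseen history -> 0
print(predict_next(("i",)))
```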
#### Markov Assumption: 
#### The conditional probability of a sentence of sequence of words (P