<table align="left" width=100%>
    <tr>
        <td width="10%">
            <img src="../images/RA_Logo.png">
        </td>
        <td>
            <div align="center">
                <font color="#21618C" size=8px>
                  <b> 9. Semantic Analysis </b>
                </font>
            </div>
        </td>
    </tr>
</table>

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/vidyadharbendre/learn_nlp_using_examples/blob/main/notebooks/09_Semantic_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
  <td>
    <a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/vidyadharbendre/learn_nlp_using_examples/blob/main/notebooks/09_Semantic_Analysis.ipynb"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" /></a>
  </td>
</table>

## What is Semantic Analysis?
Semantic Analysis is the process of understanding the meaning and interpretation of words, phrases, and sentences in context. It involves analyzing the relationships and meanings of words to determine the intended message of the text.

Sentiment Analysis, a closely related task covered later in this notebook, determines whether the sentiment expressed in a piece of text is positive, negative, or neutral. It analyzes the subjective information conveyed in the text to infer the author's emotions and opinions.

## Why Semantic Analysis?
Semantic Analysis is essential for:

- Improving the accuracy of NLP tasks like sentiment analysis, machine translation, and information retrieval.
- Enhancing the understanding of context and meaning in text.
- Supporting more sophisticated text analysis and interpretation.

Sentiment Analysis is crucial for various applications:

- Understanding Customer Feedback: Analyzing customer reviews and social media posts to gauge customer sentiment towards products or services.
- Brand Monitoring: Monitoring sentiment towards a brand or product to manage reputation and customer satisfaction.
- Market Research: Analyzing sentiment in news articles or financial reports to gauge market trends and investor sentiment.

## How to Achieve Semantic Analysis Programmatically?
Using spaCy:

In [1]:
import spacy

# Print the installed spaCy version
print(spacy.__version__)

3.5.4



In [4]:
import spacy

# Load spaCy's small English pipeline
nlp = spacy.load("en_core_web_sm")

# Example text
text = "Apple is looking at buying a U.K. startup for $1 billion."

# Process the text with spaCy
doc = nlp(text)

# Collect a vector for each token. The small ("sm") English pipeline ships
# without static word vectors, so token.vector falls back to context-sensitive
# tensors here; en_core_web_md or en_core_web_lg provide real word vectors.
semantic_analysis_spacy = [(token.text, token.vector) for token in doc]

# Display each token with the first 5 dimensions of its vector for brevity
for token_text, token_vector in semantic_analysis_spacy:
    print(f"Token: {token_text}, Vector: {token_vector[:5]}")

Token: Apple, Vector: [-1.2311026  -1.1917264   0.15840489  0.35988116  0.6805322 ]
Token: is, Vector: [-1.0027504  -0.25142258  0.28945386  0.7576598  -0.58256423]
Token: looking, Vector: [-0.41153106  1.094019    0.73340464  0.08318283 -1.0782878 ]
Token: at, Vector: [ 1.3064765   1.9255978   0.5234667  -1.0940402  -0.33415377]
Token: buying, Vector: [-0.38275096  0.8829504  -0.06646706  0.16533871 -0.43682557]
Token: a, Vector: [0.93501806 0.00710958 0.04742083 1.0769678  0.44846863]
Token: U.K., Vector: [-0.28209645 -1.3021066  -0.2161325   1.3588122   0.06865764]
Token: startup, Vector: [-0.7089213  -0.34339467  0.4557867  -0.7568426  -0.3378846 ]
Token: for, Vector: [-0.05590749  0.15916066  1.2973515  -0.43173665 -0.79201484]
Token: $, Vector: [-0.6496361   1.2837512   0.17862085  0.393975    1.7535145 ]
Token: 1, Vector: [-1.1002016   0.63555384  5.3034716   0.27870944  3.1097717 ]
Token: billion, Vector: [-1.4627483  0.5298873 -1.3364623  1.4983535  3.1686215]
Token: ., Vector
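
spaCy's Doc.vector defaults to the average of its token vectors, which is a common way to turn per-token vectors like the ones printed above into a single vector for the whole text. The sketch below shows only that averaging step in plain Python; the three-dimensional vectors are made-up placeholders, not real spaCy output, and average_vector is a hypothetical helper name.

```python
from typing import Dict, List


def average_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (one per token)."""
    if not vectors:
        raise ValueError("need at least one vector to average")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 3-dimensional token vectors (placeholders, not spaCy output)
token_vectors: Dict[str, List[float]] = {
    "Apple": [1.0, 0.0, 2.0],
    "buys": [3.0, 2.0, 0.0],
    "startup": [2.0, 4.0, 1.0],
}

doc_vector = average_vector(list(token_vectors.values()))
print(doc_vector)  # [2.0, 2.0, 1.0]
```

With a real spaCy pipeline, the same result is available directly as doc.vector.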

Using NLTK:
NLTK has no built-in support for vector-based semantic analysis. You can, however, combine NLTK's tokenizer with pre-trained word2vec or GloVe vectors loaded through an external library such as Gensim. Here's an example using Gensim:
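
Before the Gensim example, a note on the GloVe format mentioned above: GloVe vectors are distributed as plain text, with each line holding a word followed by its vector components separated by spaces. As a rough sketch of what a loader does with such a file, the snippet below parses that layout using only the standard library; load_glove_text is a hypothetical helper and the two-line sample is invented. (Recent Gensim releases can also read header-less GloVe text via KeyedVectors.load_word2vec_format with binary=False and no_header=True.)

```python
import io
from typing import Dict, List, TextIO


def load_glove_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse GloVe-style text: 'word v1 v2 ... vn' on each line."""
    vectors: Dict[str, List[float]] = {}
    dim = None
    for line_no, line in enumerate(stream, start=1):
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)  # the first vector fixes the dimensionality
        elif len(values) != dim:
            raise ValueError(f"line {line_no}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors


# Invented two-word sample in GloVe's text layout (not real GloVe values)
sample = io.StringIO("king 0.5 0.1 -0.2\nqueen 0.4 0.2 -0.1\n")
glove = load_glove_text(sample)
print(sorted(glove))   # ['king', 'queen']
print(glove["queen"])  # [0.4, 0.2, -0.1]
```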

In [6]:
# Uncomment the next line to install Gensim from conda-forge
#!conda install -c conda-forge gensim

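Gensim depends on SciPy, and some Gensim releases import scipy.linalg.triu, which newer SciPy versions have removed. The next cells check the installed SciPy version and confirm that triu can still be imported.

In [8]: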
import scipy
print(scipy.__version__)

1.10.1


In [11]:
import numpy as np
from scipy.linalg import triu

# Create a sample matrix
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Extract the upper triangular part of the matrix
upper_triangular = triu(A)

print("Original Matrix:")
print(A)
print("\nUpper Triangular Part:")
print(upper_triangular)

Original Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Upper Triangular Part:
[[1 2 3]
 [0 5 6]
 [0 0 9]]


In [14]:
import nltk
from nltk.tokenize import word_tokenize
from gensim.models import KeyedVectors

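# Ensure the Punkt tokenizer models used by word_tokenize are downloaded
# (newer NLTK releases may additionally require nltk.download('punkt_tab'))
nltk.download('punkt')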

# Sample text
text = "Hello, how are you?"

# Tokenize text
tokens = word_tokenize(text)
print(tokens)

['Hello', ',', 'how', 'are', 'you', '?']


[nltk_data] Downloading package punkt to
[nltk_data]     /Users/vidyadharbendre/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [20]:
import os

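# Machine-specific location of the pre-trained Google News word2vec file;
# change this to wherever you saved GoogleNews-vectors-negative300.bin
path_to_model = '/Users/vidyadharbendre/nlp_workspace/learn_nlp_using_examples/GoogleNews-vectors-negative300.bin'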
path_to_model

'/Users/vidyadharbendre/nlp_workspace/learn_nlp_using_examples/GoogleNews-vectors-negative300.bin'

In [21]:
if os.path.exists(path_to_model):
    word2vec_model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)
else:
    print(f"Error: File '{path_to_model}' not found.")

In [22]:
import nltk
from nltk.tokenize import word_tokenize
from gensim.models import KeyedVectors

# Ensure necessary resources are downloaded
nltk.download('punkt')

# Load the pre-trained Google News word2vec vectors (300-dimensional vectors for
# about 3 million words and phrases). This is a large file, so loading is slow;
# it repeats the load from the previous cell so this cell can run on its own.
word2vec_model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)

# Example text
text = "Apple is looking at buying a U.K. startup for $1 billion."

# Tokenize the text
words = word_tokenize(text)

# Look up the vector of each token; tokens missing from the vocabulary
# (here "a" and ".") are skipped
semantic_analysis_word2vec = [(word, word2vec_model[word]) for word in words if word in word2vec_model]

# Display token text and vector
for word, vector in semantic_analysis_word2vec:
    print(f"Word: {word}, Vector: {vector[:5]}")  # Displaying first 5 dimensions of the vector for brevity

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/vidyadharbendre/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Word: Apple, Vector: [-0.17480469  0.0300293  -0.21679688  0.15625    -0.35742188]
Word: is, Vector: [ 0.00704956 -0.07324219  0.171875    0.02258301 -0.1328125 ]
Word: looking, Vector: [ 0.02783203  0.25585938  0.15820312 -0.0480957  -0.05395508]
Word: at, Vector: [-0.05859375 -0.03759766  0.07275391  0.10888672  0.06640625]
Word: buying, Vector: [ 0.12109375  0.05664062 -0.2421875   0.18164062 -0.03637695]
Word: U.K., Vector: [ 0.11328125  0.07373047 -0.2890625   0.05639648 -0.11474609]
Word: startup, Vector: [ 0.0390625  -0.08251953  0.00311279 -0.07373047  0.10791016]
Word: for, Vector: [-0.01177979 -0.04736328  0.04467773  0.06347656 -0.01818848]
Word: $, Vector: [ 0.11376953 -0.11767578  0.06494141  0.1328125   0.05493164]
Word: 1, Vector: [ 0.05078125 -0.09326172  0.06494141  0.11425781 -0.06494141]
Word: billion, Vector: [0.04541016 0.02648926 0.09960938 0.13964844 0.01708984]


## Explanation:

What: This snippet demonstrates semantic analysis using NLTK for tokenization and pre-trained word2vec vectors loaded with Gensim.

Why: NLTK's word_tokenize splits the text into word tokens. Gensim's KeyedVectors class then loads the pre-trained vectors so that each token can be mapped to a 300-dimensional vector.

How: After tokenizing the text (words = word_tokenize(text)), the script checks whether each word exists in the model's vocabulary (word in word2vec_model). If it does, it retrieves the word's vector (word2vec_model[word]); tokens without a vector, such as "a" and "." in this example, are skipped.
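
The membership check above silently drops tokens the model does not know. The sketch below makes that visible by reporting which tokens were found and which were missing; as an optional assumption that the original code does not use, it also retries the lowercase form before giving up. A plain dict with invented two-dimensional vectors stands in for word2vec_model (KeyedVectors supports the same in and [] operations), and lookup and vectorize are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def lookup(word: str, vocab: Dict[str, Vector], lowercase_fallback: bool = True) -> Optional[Vector]:
    """Return the vector for word, optionally retrying its lowercase form."""
    if word in vocab:
        return vocab[word]
    if lowercase_fallback and word.lower() in vocab:
        return vocab[word.lower()]
    return None


def vectorize(tokens: List[str], vocab: Dict[str, Vector]) -> Tuple[List[Tuple[str, Vector]], List[str]]:
    """Split tokens into (token, vector) pairs and out-of-vocabulary tokens."""
    found: List[Tuple[str, Vector]] = []
    missing: List[str] = []
    for tok in tokens:
        vec = lookup(tok, vocab)
        if vec is None:
            missing.append(tok)
        else:
            found.append((tok, vec))
    return found, missing


# Hypothetical stand-in for word2vec_model: a tiny dict with invented vectors
toy_vocab = {"apple": [0.1, 0.3], "startup": [0.2, -0.1], "billion": [0.0, 0.4]}
tokens = ["Apple", "buys", "a", "startup", "for", "1", "billion", "."]

found, missing = vectorize(tokens, toy_vocab)
print([tok for tok, _ in found])              # ['Apple', 'startup', 'billion']
print(missing)                                # ['buys', 'a', 'for', '1', '.']
print(f"coverage: {len(found)}/{len(tokens)}")  # coverage: 3/8
```

The lowercase fallback is a judgment call: the Google News vectors are case-sensitive (the outputs in this notebook show different vectors for "Apple" and "apple"), so lowercasing can swap the company sense for the fruit sense.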

## Summary
- spaCy exposes a vector for each token through token.vector; for meaningful word vectors, use its medium or large English pipelines (en_core_web_md, en_core_web_lg), since the small pipeline ships without them.
- NLTK provides tools for text processing and tokenization, which are essential for preparing text for semantic analysis.
- Gensim loads pre-trained word2vec vectors, making it easy to extract semantic information from word vectors and to compare words by meaning.
- Together, these libraries and models let developers build semantic analysis solutions for a range of NLP tasks.

In [23]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon; without it, SentimentIntensityAnalyzer() raises
# a LookupError for the missing 'vader_lexicon' resource
nltk.download('vader_lexicon')

# Sample text
text = "I love this product! It works really well and meets all my expectations."

# Initialize Sentiment Intensity Analyzer
sid = SentimentIntensityAnalyzer()

# Perform sentiment analysis: returns 'neg', 'neu', 'pos' and 'compound' scores
scores = sid.polarity_scores(text)

# Interpret the compound score with the usual +/-0.05 thresholds
if scores['compound'] >= 0.05:
    sentiment = "Positive"
elif scores['compound'] <= -0.05:
    sentiment = "Negative"
else:
    sentiment = "Neutral"

# Display results
print(f"Text: {text}")
print(f"Sentiment: {sentiment}")
print(f"Scores: {scores}")


## Explanation:

What: The code snippet demonstrates sentiment analysis using NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner).

Why: VADER is a lexicon- and rule-based analyzer that needs no training data, which makes it simple to apply; it was designed for social media text and handles informal language well.

How: The SentimentIntensityAnalyzer is initialized (sid = SentimentIntensityAnalyzer()), and sentiment analysis is performed on the sample text (scores = sid.polarity_scores(text)). The compound score (scores['compound']) summarizes the overall sentiment as a normalized value between -1 (most negative) and +1 (most positive); scores of at least 0.05 are labeled Positive, scores of at most -0.05 Negative, and anything in between Neutral.
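
For reference, VADER builds the compound score by summing the valence scores of the words in the text (after rule-based adjustments for negation, intensifiers, punctuation emphasis and so on) and squashing that sum into the range -1 to 1 with x / sqrt(x^2 + alpha), where the VADER implementation uses alpha = 15. The sketch below reproduces only that normalization plus the +/-0.05 labeling rule from the cell above; the raw valence sums fed to it are made-up inputs, not values computed by NLTK.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into (-1, 1), as VADER's compound score does."""
    return score / math.sqrt(score * score + alpha)


def label(compound: float, threshold: float = 0.05) -> str:
    """Apply the same +/-0.05 cut-offs used in the sentiment cell above."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Made-up raw valence sums (not produced by NLTK)
for raw in (0.0, 1.0, -2.5, 7.0):
    c = normalize(raw)
    print(f"raw={raw:+.1f} compound={c:+.4f} -> {label(c)}")
```

Because x / sqrt(x^2 + 15) never reaches +/-1, strongly worded texts push the compound score toward the extremes without ever hitting them.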

## Word Embeddings

What are Word Embeddings?
Word Embeddings are a type of word representation that allows words with similar meanings to have a similar representation. They capture semantic relationships between words and can be used to find similarities between words or analyze textual context.

Why Word Embeddings?
Word Embeddings are useful for:

- Semantic Similarity: Calculating similarity between words based on their vector representations.
- NLP Tasks: Enhancing performance in various NLP tasks such as machine translation, named entity recognition, and sentiment analysis.
- Feature Representation: Providing dense, meaningful representations of words for machine learning models.
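
Semantic similarity between two embeddings is most often measured with cosine similarity, cos(u, v) = (u · v) / (|u| |v|), which is also what Gensim's KeyedVectors.similarity and spaCy's similarity methods compute by default. A minimal pure-Python sketch, with invented three-dimensional vectors standing in for real embeddings:

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Invented toy embeddings (not taken from any trained model)
apple = [1.0, 2.0, 2.0]
orange = [2.0, 1.0, 2.0]
computer = [-2.0, 0.0, 1.0]

print(round(cosine_similarity(apple, orange), 4))    # 0.8889
print(round(cosine_similarity(apple, computer), 4))  # 0.0
```

Cosine similarity ignores vector length and compares only direction, which is why it is the usual choice for word embeddings.
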

How to Use Word Embeddings Programmatically?

Using Gensim for Word Embeddings:

In [24]:
# Machine-specific path to the Google News word2vec file; adjust as needed
path_to_model = '/Users/vidyadharbendre/nlp_workspace/learn_nlp_using_examples/GoogleNews-vectors-negative300.bin'
path_to_model

'/Users/vidyadharbendre/nlp_workspace/learn_nlp_using_examples/GoogleNews-vectors-negative300.bin'

In [25]:
from gensim.models import KeyedVectors

# Load the pre-trained Google News word2vec vectors (a large binary file);
# path_to_model is set in the previous cell, so point it at your own copy
word2vec_model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)

# Example words
words = ['apple', 'orange', 'fruit', 'computer', 'phone']

# Obtain word embeddings
word_embeddings = [(word, word2vec_model[word]) for word in words if word in word2vec_model]

# Display word and corresponding vector
for word, vector in word_embeddings:
    print(f"Word: {word}, Vector: {vector[:5]}")

Word: apple, Vector: [-0.06445312 -0.16015625 -0.01208496  0.13476562 -0.22949219]
Word: orange, Vector: [-0.10498047 -0.18261719  0.09912109  0.26367188 -0.19628906]
Word: fruit, Vector: [-0.05834961  0.06787109 -0.05395508  0.33398438 -0.13574219]
Word: computer, Vector: [ 0.10742188 -0.20117188  0.12304688  0.21191406 -0.09130859]
Word: phone, Vector: [-0.01446533 -0.12792969 -0.11572266 -0.22167969 -0.07373047]


## Explanation:

What: This snippet demonstrates how to use Gensim to load a pre-trained Word2Vec model and obtain word embeddings.

Why: Word embeddings represent words as points in a continuous vector space, so words with related meanings end up close to each other.

How: The pre-trained Google News vectors are loaded (word2vec_model = KeyedVectors.load_word2vec_format(path_to_model, binary=True)), and an embedding is retrieved for each example word that is in the vocabulary (word2vec_model[word]). The resulting vectors can be used for semantic analysis, similarity calculations, or as features in machine learning models.
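
Gensim exposes similarity search on loaded vectors through KeyedVectors.most_similar, which ranks the vocabulary by cosine similarity to a query word. The sketch below re-creates that idea over a small in-memory dict; the vectors are invented for illustration, and nearest_neighbors is a hypothetical function name, not part of Gensim.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(query: str, vectors: Dict[str, List[float]], topn: int = 2) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to query, highest first."""
    q = vectors[query]  # raises KeyError for out-of-vocabulary queries
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Invented 3-dimensional vectors for the example words used above
toy_vectors = {
    "apple": [0.9, 0.8, 0.1],
    "orange": [0.8, 0.9, 0.0],
    "fruit": [0.7, 0.7, 0.2],
    "computer": [0.1, 0.0, 0.9],
    "phone": [0.0, 0.2, 0.8],
}

for word, score in nearest_neighbors("apple", toy_vectors):
    print(f"{word}: {score:.3f}")
```

With the real model loaded above, the corresponding call would be word2vec_model.most_similar('apple', topn=2); its results depend on the trained vectors, not on these toy values.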

## Summary
- Sentiment Analysis using NLTK's VADER provides a straightforward way to determine the sentiment expressed in text, which is useful for applications such as customer feedback analysis.
- Word Embeddings obtained from Gensim's Word2Vec models represent words as dense vectors that capture semantic meanings and relationships, which benefits many NLP tasks and machine learning applications.
- By using these libraries and models, developers can perform Sentiment Analysis and use Word Embeddings to strengthen their NLP solutions and text processing applications.