# Semantic Similarity - Experiment 03
This notebook uses the models trained in the previous notebooks to evaluate semantic similarity on the SICK dataset.

## Introduction

### Libraries

In [3]:
import sys
sys.path.append("../source")  # Add the directory 'source' to sys.path

In [5]:
from sca_utils import TextClassifier

## Using `sca_utils` methods:

In [11]:
## Instantiating the SCA Text Classifier:
classifier = TextClassifier(model_multiclass_path='../models/model_02_E.h5',
                            encoder_multiclass_path='../models/encoder_oneHot_E.pickle',
                            model_regression_path='../models/model_01_D2.h5')
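
The constructor above depends on three files under `../models`. A minimal sketch for checking that they exist before instantiating the classifier is shown below; `missing_paths` is a hypothetical helper introduced here for illustration and is not part of `sca_utils`.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


# Paths copied from the TextClassifier call above.
artifact_paths = [
    "../models/model_02_E.h5",
    "../models/encoder_oneHot_E.pickle",
    "../models/model_01_D2.h5",
]

missing = missing_paths(artifact_paths)
if missing:
    print("Missing artifacts:", ", ".join(missing))
else:
    print("All model artifacts found.")
```

Running this first can make a wrong working directory easier to spot than an error raised while a model file is being loaded.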

### Finding similar words:

In [12]:
## Replace keyword with the word of interest:
keyword = "technology"

similar_words = classifier.similarity_findWords(keyword, n=5)

print(f'--- Similar words for: "{keyword}":')
for word, similarity in similar_words:
    # Scores are fractions between 0 and 1, so no percent sign is printed
    print(f'"{word}", similarity of {similarity:.2f}')

--- Similar words for: "technology":
"technologic", similarity of 0.95
"technologies", similarity of 0.94
"technologie", similarity of 0.93
"technological", similarity of 0.90
"technoscience", similarity of 0.89


### Comparing similarity between words or sentences:

In [13]:
## Define an input word or sentence to be compared:
input_text = 'Last things for last!'

## Define a set of reference words or sentences to which input_text will be compared:
reference_texts = ["First things first",
                   "another example",
                   "universal sentence encoder",
                   "natural language processing"]

In [14]:
## Find the closest embeddings
closest_embeddings = classifier.similarity_compareSentences(input_text, reference_texts)

## Show the results
for text, similarity in closest_embeddings:
    # Scores are fractions, so no percent sign is printed
    print(f"Text: {text}, Similarity: {similarity:.2f}")

Text: First things first, Similarity: 0.40
Text: another example, Similarity: 0.08
Text: universal sentence encoder, Similarity: 0.06
Text: natural language processing, Similarity: 0.01
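
The source does not show how `similarity_compareSentences` or `similarity_findWords` compute their scores. Assuming they rank candidates by cosine similarity between embedding vectors, which is a common choice but is not confirmed here, the ranking step could be sketched as follows. The vectors are toy values made up for illustration, and `cosine_similarity` and `rank_by_cosine` are hypothetical helpers, not `sca_utils` functions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, reference_vecs, labels):
    """Pair each label with its similarity to the query, highest first."""
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in zip(labels, reference_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
query = [1.0, 0.0, 0.0]
references = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
labels = ["orthogonal", "diagonal", "parallel"]

for label, similarity in rank_by_cosine(query, references, labels):
    print(f"{label}: {similarity:.2f}")
```

Cosine similarity ignores vector length, so the "parallel" vector scores 1.00 even though it is twice as long as the query. Scores lie in [-1, 1] and are not percentages.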


### Estimating subjective and objective load for a given sentence:

In [15]:
## Regression inference for a given word:
classifier.textClassifier_regression('happyness')

--- happyness:
20.03 of objectivity
78.65 of subjectivity


In [17]:
## Multiclass classification for a given word:
classifier.textClassifier_multiclass('happyness')

--- SCA: "happyness" has Latent content.
Model used: Model_02_E_regularized
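
The file name `encoder_oneHot_E.pickle` suggests that class labels were one-hot encoded during training. Under that assumption, turning a vector of per-class scores into a label amounts to picking the highest-scoring index, as in the sketch below. Only the label "Latent" appears in the output above; the other labels, the scores, and the `decode_prediction` helper are hypothetical.

```python
def decode_prediction(probabilities, class_labels):
    """Return the label with the highest score, together with that score."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_labels[best], probabilities[best]


# Hypothetical labels and scores; only "Latent" comes from the notebook output.
class_labels = ["Latent", "Class B", "Class C"]
probabilities = [0.72, 0.18, 0.10]

label, score = decode_prediction(probabilities, class_labels)
print(f'Predicted class: "{label}" (score {score:.2f})')
```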
