# Semantic Content Classifier (SCC)
This Jupyter notebook walks through practical uses of the SCA_utils.py library for Semantic Content Analysis (SCA): retrieving word vectors, finding similar words, comparing sentences, and estimating the objective and subjective content of a text.



## Introduction

### Libraries

In [2]:
import sys
sys.path.append("../source")  # Add the directory 'source' to sys.path

In [3]:
from sca_utils import TextClassifier




## Using SCA_utils methods:

In [4]:
## Instantiating the SCA Text Classifier:
classifier = TextClassifier(model_multiclass_path='../models/model_02_E.h5',
                            encoder_multiclass_path='../models/encoder_oneHot_E.pickle',
                            model_regression_path='../models/model_01_D2.h5')
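
The constructor takes three file paths, so checking that they exist beforehand can give a clearer message than a failed model load. The following is a minimal sketch, assuming the notebook is run from its own directory; `missing_files` is a hypothetical helper and is not part of `sca_utils`.

```python
from pathlib import Path

# Paths copied from the cell above; adjust them if the notebook lives elsewhere.
MODEL_FILES = [
    "../models/model_02_E.h5",
    "../models/encoder_oneHot_E.pickle",
    "../models/model_01_D2.h5",
]


def missing_files(paths):
    """Return the paths that do not point to an existing file (hypothetical helper)."""
    return [p for p in paths if not Path(p).is_file()]


missing = missing_files(MODEL_FILES)
if missing:
    print("Missing model files:", missing)
else:
    print("All model files found.")
```

Running this before instantiating `TextClassifier` makes a wrong working directory easy to spot.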

### Getting the embedded vector for a given word:

In [22]:
## Experimenting with an existing word (here the string 'None', not the None object):
word_text, word_vector = classifier.nlp_getVector(word='None')
print(word_text)
print(f'{word_vector[:2]}..')


None
[-0.0562946  -0.00365345]..


In [21]:
## Passing None instead of a string to check how errors are reported:
classifier.nlp_getVector(word=None)

Error: word vector not available for None due to:
[E1041] Expected a string, Doc, or bytes as input, but got: <class 'NoneType'>
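
The output above suggests that the method catches the underlying error (the `[E1041]` code looks like a spaCy error) and prints a message instead of re-raising it. One way such a guard could look is sketched below. This is an assumption about the pattern, not the actual `sca_utils` code: `get_vector` and `TOY_VECTORS` are hypothetical, and the vector values are made up.

```python
# Toy embedding table standing in for the real model (values are made up).
TOY_VECTORS = {"technology": [0.1, 0.3], "science": [0.2, 0.1]}


def get_vector(word):
    """Return (word, vector), or (word, None) when no vector can be produced."""
    try:
        if not isinstance(word, str):
            raise TypeError(f"Expected a string as input, but got: {type(word)}")
        return word, TOY_VECTORS[word.lower()]
    except (TypeError, KeyError) as err:
        # Report the problem without interrupting the notebook.
        print(f"Error: word vector not available for {word} due to:\n{err}")
        return word, None


get_vector("science")   # found in the toy table
get_vector(None)        # prints an error message and returns (None, None)
```

Returning `None` for the vector lets the caller test the result instead of wrapping every call in its own `try` block.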


### Finding similar words:

In [5]:
## Replace keyword with the word of interest:
keyword = "technology"

similar_words = classifier.similarity_findWords(keyword, n=5)

print(f'--- Similar words for: "{keyword}":')
for word, similarity in similar_words:
    print(f'"{word}", similarity of {similarity:.2f}')

--- Similar words for: "technology":
"technologic", similarity of 0.95
"technologies", similarity of 0.94
"technologie", similarity of 0.93
"technological", similarity of 0.90
"technoscience", similarity of 0.89

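A nearest-word lookup of this kind is commonly built on cosine similarity between embedding vectors. The sketch below illustrates the idea in plain Python on a toy vocabulary. It assumes cosine similarity is the metric and does not reproduce the internals of `similarity_findWords`; `cosine_similarity`, `find_similar_words`, and the vectors are all illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Avoid dividing by zero for empty vectors.
    return dot / (norm_a * norm_b)


def find_similar_words(keyword, vectors, n=5):
    """Return the n words whose vectors are closest to the keyword's vector."""
    query = vectors[keyword]
    scored = [
        (word, cosine_similarity(query, vec))
        for word, vec in vectors.items()
        if word != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up 2-D vectors, for illustration only.
toy_vectors = {
    "technology": [1.0, 0.0],
    "technologies": [0.9, 0.1],
    "engineering": [0.7, 0.7],
    "banana": [0.0, 1.0],
}

for word, score in find_similar_words("technology", toy_vectors, n=2):
    print(f'"{word}", similarity of {score:.2f}')
```

Real embeddings have hundreds of dimensions, but the ranking logic is the same.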

### Comparing similarity between words or sentences:

In [6]:
## Define an input word or sentence to be compared:
input_text = 'Last things for last!'

## Define a set of reference words or sentences to compare input_text against:
reference_texts = ["First things first",
                   "another example",
                   "universal sentence encoder",
                   "natural language processing"]

In [7]:
## Find the closest embeddings
closest_embeddings = classifier.similarity_compareSentences(input_text, reference_texts)

## Show the results
for text, similarity in closest_embeddings:
    print(f"Text: {text}, Similarity: {similarity:.2f}")

Text: First things first, Similarity: 0.40
Text: another example, Similarity: 0.08
Text: universal sentence encoder, Similarity: 0.06
Text: natural language processing, Similarity: 0.01
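
The similarity scores in this notebook appear to be on a unit scale (a score of 0.40 means 0.40 out of 1, not 0.40 percent). If a percentage display is preferred, Python's `%` format type multiplies by 100 and appends the sign. A small sketch follows, with `format_similarity` as a hypothetical helper:

```python
def format_similarity(score, as_percent=False):
    """Format a unit-scale similarity score as a ratio or as a percentage."""
    if as_percent:
        return f"{score:.0%}"  # Multiplies by 100 and appends '%'.
    return f"{score:.2f}"


print(format_similarity(0.40))                    # 0.40
print(format_similarity(0.40, as_percent=True))   # 40%
```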


### Estimating the subjective and objective load of a word or sentence:

In [28]:
## Regression inference for a given word:
classifier.textClassifier_regression('happyness')

--- happyness:
20.03 of objectivity
78.65 of subjectivity


In [24]:
## Multiclass classification for a given word:
classifier.textClassifier_multiclass('happyness')

--- SCA: "happyness" has Latent content.
Model used: Model_02_E_regularized


In [34]:
## Regression inference for a given sentence:
classifier.textClassifier_regression('Your sample sentence')

--- Your sample sentence:
38.72 of objectivity
15.49 of subjectivity
