In [1]:
import os

if "jbook" in os.getcwd():
    os.chdir(os.path.abspath(os.path.join("../..")))
import warnings

warnings.filterwarnings("ignore")
FORCE = False

# Sentiment Classification 
This notebook uses a **DistilBERT-based sentiment classification model**, `tabularisai/robust-sentiment-analysis`, to classify the sentiment of the text in a dataset. The results support **Data Quality Assessment (DQA)** and **Exploratory Data Analysis (EDA)**. Using an off-the-shelf, pre-trained model gives us a computationally efficient view of sentiment class balance without training a model of our own.

## Model Overview
- **Model Name**: `tabularisai/robust-sentiment-analysis`
- **Base Model**: `distilbert/distilbert-base-uncased`
- **Task**: Text Classification (Sentiment Analysis)
- **Language**: English
- **Number of Classes**: 5 sentiment categories:
  - **Very Negative**
  - **Negative**
  - **Neutral**
  - **Positive**
  - **Very Positive**

## Model Description
This model is a fine-tuned version of `distilbert-base-uncased`, optimized for sentiment analysis using synthetic data generated by language models such as **Llama3.1** and **Gemma2**. Because it was trained exclusively on synthetic data, the model has been exposed to a diverse range of sentiment expressions, which is intended to help it generalize across different use cases.

## Purpose of the Notebook
1. **Data Quality Assessment (DQA)**: By running sentiment analysis on the dataset, we can assess sentiment distribution and identify any potential biases or issues in the data that may impact subsequent analysis.
2. **Exploratory Data Analysis (EDA)**: Understanding the overall sentiment landscape of the dataset provides critical context for deeper analysis, revealing trends, patterns, or anomalies in the data.
3. **Pre-Tuned Efficiency**: Using an off-the-shelf model keeps the analysis quick, allowing us to focus on insights rather than model optimization. This is particularly valuable because we will later fine-tune a more specialized model for aspect-based sentiment analysis (ABSA).
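
As a rough illustration of the class-balance check mentioned above, the sketch below computes the share of each sentiment class from a list of labels. The labels are hypothetical stand-ins for the `an_sentiment` column produced later in this notebook, and `class_balance` is an illustrative helper, not part of the `discover` package.

```python
from collections import Counter

# Hypothetical labels standing in for the notebook's `an_sentiment` column.
labels = [
    "Very Positive", "Very Positive", "Positive",
    "Neutral", "Negative", "Very Positive",
]


def class_balance(labels):
    """Return the share of each sentiment class, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


balance = class_balance(labels)
for label, share in balance.items():
    print(f"{label:<14} {share:.1%}")
```

On the actual DataFrame, `df["an_sentiment"].value_counts(normalize=True)` should give the same proportions in one call.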



## Imports

In [2]:
import pandas as pd
from tqdm import tqdm
from enum import Enum

from discover.container import DiscoverContainer
from discover.infra.service.datamanager.sentiment import SentimentAnalysisDataManager

# Register `tqdm` with pandas
tqdm.pandas()

pd.options.display.max_colwidth = None

In [3]:
container = DiscoverContainer()
container.init_resources()
container.wire(
    modules=[
        "discover.infra.service.datamanager.base",
    ],
)

## Data Manager
The `SentimentAnalysisDataManager` owns persistence of data and datasets used in this notebook.

In [4]:
datamanager = SentimentAnalysisDataManager()

## Execution Path Options
This notebook supports three execution paths:

1. **Load Endpoint**: If the notebook has already been executed and results are stored in the repository, they will be loaded. This path is used unless the `FORCE` parameter is set to `True`.
2. **Load Sentiments**: If sentiment analysis results have been precomputed on cloud-based GPUs and saved in a CSV file, the file will be loaded and merged with the dataset, unless `FORCE` is `True`.
3. **Execute Inference**: If `FORCE` is set to `True` or if neither the endpoint nor the sentiment file is available, the notebook will perform inference using the sentiment analysis model.

The following code determines the execution path from these conditions.

In [5]:
class ExecutionPath(Enum):
    LOAD_ENDPOINT = "load_endpoint"
    LOAD_SENTIMENTS = "load_sentiments"
    EXECUTE_INFERENCE = "execute_inference"


def determine_execution_path(
    force: bool, datamanager: SentimentAnalysisDataManager
) -> ExecutionPath:
    """Determines the execution path based on the existence of data and the force parameter.

    Args:
        force (bool): Whether to force execution, overriding existing data checks.
        datamanager (SentimentAnalysisDataManager): The data manager to check for existing datasets and sentiments.

    Returns:
        ExecutionPath: The determined execution path.
    """
    if force:
        return ExecutionPath.EXECUTE_INFERENCE

    elif datamanager.dataset_exists(stage="sentiment"):
        return ExecutionPath.LOAD_ENDPOINT

    elif datamanager.sentiments_exist():
        return ExecutionPath.LOAD_SENTIMENTS

    else:
        return ExecutionPath.EXECUTE_INFERENCE


execution_path = determine_execution_path(force=FORCE, datamanager=datamanager)

## Load Endpoint
Loads the endpoint if appropriate given the execution path.

In [6]:
if execution_path == ExecutionPath.LOAD_ENDPOINT:
    df = datamanager.get_dataset(stage="sentiment", name="review")

## Load Pre-Computed Sentiments
Obtain the dataset from the prior stage, 'ingest', and merge in the sentiments from file. 

In [7]:
if execution_path == ExecutionPath.LOAD_SENTIMENTS:
    df = datamanager.get_dataset(stage="ingest", name="review")
    sentiments = datamanager.get_sentiments()
    df = datamanager.merge_sentiments(df=df, sentiments=sentiments)
    datamanager.add_dataset(df=df, stage="sentiment")

## Execute Inference
The following cells perform inference using the sentiment analysis model according to the execution path.

### Import Model and Transformer Libraries
Import PyTorch, the `transformers` tokenizer and model classes, and `tqdm` for progress monitoring.

In [8]:
if execution_path == ExecutionPath.EXECUTE_INFERENCE:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch
    from tqdm import tqdm

### Check GPU Availability and Prepare for Inference 
Verify GPU availability, ensuring GPU resources are being detected and utilized. To mitigate memory issues, release all unused cached memory held by the caching allocator, making it available for other GPU applications and visible in `nvidia-smi`.

In [9]:
if execution_path == ExecutionPath.EXECUTE_INFERENCE:
    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version:", torch.version.cuda)
    print("GPU count:", torch.cuda.device_count())
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    !nvidia-smi
    torch.cuda.empty_cache()

## Load Data
Load the ingest-stage data from the repository.

In [10]:
if execution_path == ExecutionPath.EXECUTE_INFERENCE:
    df = datamanager.get_dataset(stage="ingest", name="review")

## Load Model and Tokenizer
Import and load the sentiment analyzer and the tokenizer designed for sequence classification, then move the model to the device detected.

In [11]:
# Load model and tokenizer
if execution_path == ExecutionPath.EXECUTE_INFERENCE:
    model_name = "tabularisai/robust-sentiment-analysis"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.to(device)

## Create the Classifier
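Tokenize the text, truncating it to at most 512 tokens (not characters), the maximum sequence length for DistilBERT. Because each call passes a single string, `padding=True` pads only to the longest sequence in the batch, so no padding is actually added here. Move the tokenized input to the detected device. Softmax probabilities are computed for each class, and the function returns the label of the highest-probability class.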

In [12]:
# Function to predict sentiment
def predict_sentiment(text):
    with torch.no_grad():
        inputs = tokenizer(
            text.lower(),
            return_tensors="pt",
            truncation=True,
            padding=True,
            max_length=512,
        )
        inputs = {
            key: value.to(device) for key, value in inputs.items()
        }  # Move inputs to the same device as the model (GPU or CPU)
        outputs = model(**inputs)

        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()
    sentiment_map = {
        0: "Very Negative",
        1: "Negative",
        2: "Neutral",
        3: "Positive",
        4: "Very Positive",
    }
    return sentiment_map[predicted_class]
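
To make the last step of the classifier concrete, here is a minimal pure-Python sketch of turning five logits into a label. The logits are made up for illustration, `softmax` and `label_from_logits` are hypothetical helpers, and only the label order is taken from `sentiment_map` above.

```python
import math

# Label order copied from the notebook's `sentiment_map`.
SENTIMENT_MAP = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Return the most probable label and its probability."""
    probabilities = softmax(logits)
    predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
    return SENTIMENT_MAP[predicted_class], probabilities[predicted_class]


# Made-up logits, for illustration only.
logits = [-1.2, -0.3, 0.1, 1.4, 2.9]
label, confidence = label_from_logits(logits)
print(label, round(confidence, 3))
```

Softmax preserves the ordering of the logits, so taking the argmax of the raw logits selects the same class. The softmax is only needed when the probability itself is of interest, for example to flag low-confidence predictions.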

## Run Inference
Run inference using the classification function above.

In [13]:
if execution_path == ExecutionPath.EXECUTE_INFERENCE:
    df["an_sentiment"] = df["content"].progress_apply(predict_sentiment)
    datamanager.add_dataset(df=df, stage="sentiment")
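
The cell above calls the classifier once per row. On a GPU, grouping texts into batches is usually faster. The sketch below shows only the chunking logic, under stated assumptions: `chunks`, `predict_in_batches`, and `fake_predict_batch` are hypothetical names, and the stand-in predictor replaces the real model so the example runs without `torch`.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(
    texts: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Apply `predict_batch` to each chunk and concatenate the labels in order."""
    labels: List[str] = []
    for batch in chunks(texts, batch_size):
        labels.extend(predict_batch(batch))
    return labels


# Stand-in predictor (an assumption, not the real model): a keyword rule.
def fake_predict_batch(batch: Sequence[str]) -> List[str]:
    return ["Positive" if "good" in text.lower() else "Neutral" for text in batch]


texts = ["Good app", "It opens", "Really good", "Okay", "good enough"]
labels = predict_in_batches(texts, fake_predict_batch, batch_size=2)
print(labels)
```

A real `predict_batch` would pass the list of strings to the tokenizer, where `padding=True` pads each batch to its longest sequence, and then take the argmax of each row of logits.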

## Check Results

In [14]:
df[["id", "app_name", "content", "rating", "an_sentiment"]].sample(n=5, random_state=22)

| | id | app_name | content | rating | an_sentiment |
|---|---|---|---|---|---|
| 76544 | 10201072201 | Ad Block One: Tube Ad Blocker | Awesome | 5 | Very Positive |
| 82700 | 8833410900 | Cleanup: Phone Storage Cleaner | Save time and space | 5 | Very Positive |
| 26968 | 8183002438 | sweetgreen | I’ve used Chipotle’s and other restaurants’ apps and this is by far the easiest to use and best interface. Not to mention it is similar in price to get a salad delivered and the food is absolutely amazing!! I do have two suggestions: (I) allow the user to add more than two bases and (ii) allow for the use of Apple Pay at checkout. Thanks :)!!! | 5 | Very Positive |
| 2161 | 9187505053 | OwO Novel - Read Romance Story | The app is not worth 5 stars and the cost for chapters keeps going up | 5 | Negative |
| 60842 | 9288815829 | Bible | I use this app every day. Easy and intuitive. I like all the different versions. I would love to see a chronological version and a Reference to Jesus version. I want plans that are for one day a week. | 5 | Neutral |


From this sample, several observations are notable:

1. **Ad Block One: Tube Ad Blocker ("Awesome")**: The 5-star rating and "Very Positive" sentiment are well aligned, as the single-word feedback conveys a clear and enthusiastic endorsement. No further action is needed here.

2. **Cleanup: Phone Storage Cleaner ("Save time and space")**: The model correctly identifies the positive tone of the review, which matches the 5-star rating. The short statement reflects high user satisfaction with the app's functionality.

3. **sweetgreen**: The review content justifies both the "Very Positive" sentiment and the 5-star rating. The user is enthusiastic about the app's interface, ease of use, and the quality of the food. Despite the suggested improvements (adding more bases and supporting Apple Pay), the overall sentiment is overwhelmingly positive. This is a good example of constructive feedback coexisting with high satisfaction, and the model captures the overall positive tone.

4. **OwO Novel - Read Romance Story**: The negative content conflicts with the 5-star rating. The user explicitly states that the app "is not worth 5 stars" and criticizes the rising cost of chapters. The sentiment model is likely correct in detecting negativity, while the rating contradicts the written review. Cases like this show that ratings do not always reflect written feedback, so ratings alone are an unreliable proxy for sentiment.

5. **Bible**: The review content provides constructive feedback alongside a description of regular app use. The suggestions for additional features, like a chronological version and specific plans, are not emotionally charged, which supports the "Neutral" sentiment label. However, the 5-star rating indicates a high level of satisfaction despite the neutral tone of the review. This suggests that the user is content overall but expressed feedback in a more factual manner. The model’s labeling is understandable, but incorporating more contextual understanding might help align sentiment labels more closely with ratings in cases like this.

### Key Takeaways and Recommendations
- **sweetgreen**: The sentiment analysis does well to capture overall positivity despite the presence of suggestions for improvement, demonstrating robustness in handling mixed feedback.
- **OwO Novel - Read Romance Story**: This highlights a potential gap in understanding user intent behind ratings. Further investigation into user behavior (such as high ratings paired with negative comments) may provide insights into refining sentiment analysis models.
- **Bible**: This review underscores the challenge of interpreting reviews that are positive overall but expressed in a neutral tone. Sentiment analysis might benefit from additional heuristics or metadata to better align with user ratings.

Overall, these examples illustrate the complexities of sentiment analysis when ratings and content do not align perfectly. On this small sample, the model appears to capture the general sentiment conveyed by the text.
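
One way to follow up on the rating and sentiment disagreements discussed above is to flag rows where the two differ by a wide margin. The sketch below is a minimal illustration: the ordinal mapping in `SENTIMENT_SCORE`, the `threshold` of 2, and the helper names are all assumptions, and the rows are taken from the sample shown earlier.

```python
# Assumed ordinal scale placing the five labels on the 1-5 rating scale.
SENTIMENT_SCORE = {
    "Very Negative": 1,
    "Negative": 2,
    "Neutral": 3,
    "Positive": 4,
    "Very Positive": 5,
}


def rating_sentiment_gap(rating, sentiment):
    """Signed gap: positive when the rating is higher than the text suggests."""
    return rating - SENTIMENT_SCORE[sentiment]


def flag_mismatches(rows, threshold=2):
    """Return rows whose absolute gap is at least `threshold` (an assumed cutoff)."""
    return [
        row
        for row in rows
        if abs(rating_sentiment_gap(row["rating"], row["an_sentiment"])) >= threshold
    ]


# Three rows from the sample above.
rows = [
    {"app_name": "Ad Block One: Tube Ad Blocker", "rating": 5, "an_sentiment": "Very Positive"},
    {"app_name": "OwO Novel - Read Romance Story", "rating": 5, "an_sentiment": "Negative"},
    {"app_name": "Bible", "rating": 5, "an_sentiment": "Neutral"},
]

flagged = flag_mismatches(rows)
for row in flagged:
    print(row["app_name"], rating_sentiment_gap(row["rating"], row["an_sentiment"]))
```

With this cutoff, both the OwO Novel and Bible reviews are flagged. Raising `threshold` to 3 would keep only the stronger disagreement.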

In the next section, we will add perplexity, a proxy measure for gibberish in review text, to the dataset.