# Aspect-Based Sentiment Analysis Exploration

This notebook explores and compares three aspect-based sentiment analysis (ABSA) implementations on the same test sentences.

## Overview
- **Lexicon-based ABSA**: Using spaCy and sentiment lexicons
- **Transformer-based ABSA**: Using pre-trained Hugging Face models
- **LLM-based ABSA**: Using local LLM through Ollama
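Later cells use all three implementations through the same small interface: each analyzer exposes `analyze(text)` and returns a list of results with `aspect`, `sentiment`, and `confidence` attributes. The sketch below models that contract with a dataclass and a `typing.Protocol`. It is an assumption inferred from how the notebook uses the analyzers, and the real `src.base.AspectSentiment` class may define more fields. `AspectResult`, `ABSAAnalyzer`, `KeywordABSA`, and `run` are hypothetical names; `KeywordABSA` is a toy analyzer included only to show the shape of the interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class AspectResult:
    """Assumed shape of one ABSA result (mirrors the attributes the notebook reads)."""
    aspect: str
    sentiment: str  # e.g. "positive", "negative" or "neutral"
    confidence: float


class ABSAAnalyzer(Protocol):
    """Structural type: any object with a matching analyze() method qualifies."""

    def analyze(self, text: str) -> List[AspectResult]:
        ...


class KeywordABSA:
    """Hypothetical toy analyzer: tags known aspect words with a fixed sentiment."""

    def __init__(self, aspect_sentiments: Dict[str, str]):
        self.aspect_sentiments = aspect_sentiments

    def analyze(self, text: str) -> List[AspectResult]:
        lowered = text.lower()
        return [
            AspectResult(aspect, sentiment, 1.0)
            for aspect, sentiment in self.aspect_sentiments.items()
            if aspect in lowered
        ]


def run(analyzer: ABSAAnalyzer, text: str) -> List[AspectResult]:
    # Works for any analyzer that satisfies the protocol; no shared base class needed.
    return analyzer.analyze(text)


toy = KeywordABSA({"pizza": "positive", "service": "negative"})
for r in run(toy, "The pizza was delicious but the service was terrible."):
    print(f"{r.aspect}: {r.sentiment} (confidence: {r.confidence:.3f})")
```

Because `Protocol` checks structure rather than inheritance, the three real analyzers would fit this contract as long as their `analyze` methods return objects with those three attributes.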

## Setup and Imports


In [1]:
import sys
import os
import time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

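# Put the project root (the notebook directory's parent) on sys.path so that
# `src` is importable as a package by the `from src...` imports below.
sys.path.append(str(Path.cwd().parent))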

from src.base import AspectSentiment
from src.lexicon_absa import LexiconABSA
from src.transformer_absa import TransformerABSA
from src.llm_absa import LLMABSA
from src.utils import (
    load_test_data, save_results, calculate_accuracy, 
    calculate_precision_recall_f1, benchmark_analyzer,
    create_sample_data, print_analysis_results, ensure_data_directory
)

print("Imports successful!")


Imports successful!


## Initialize Analyzers

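Let's initialize all three ABSA implementations. Each one is created in its own `try`/`except` block, so a failure in one (for example, a missing dependency) does not stop the others from loading: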


In [2]:
analyzers = {}
results = {}

print("Initializing analyzers...")

try:
    lexicon_analyzer = LexiconABSA()
    analyzers['Lexicon'] = lexicon_analyzer
    print("✓ LexiconABSA initialized successfully")
except Exception as e:
    print(f"✗ LexiconABSA failed to initialize: {e}")

try:
    transformer_analyzer = TransformerABSA()
    analyzers['Transformer'] = transformer_analyzer
    print("✓ TransformerABSA initialized successfully")
except Exception as e:
    print(f"✗ TransformerABSA failed to initialize: {e}")

try:
    llm_analyzer = LLMABSA()
    analyzers['LLM'] = llm_analyzer
    print("✓ LLMABSA initialized successfully")
except Exception as e:
    print(f"✗ LLMABSA failed to initialize: {e}")

print(f"\nSuccessfully initialized {len(analyzers)} analyzers")


Initializing analyzers...
✓ LexiconABSA initialized successfully
Error loading model yangheng/deberta-v3-base-absa-v1.1: 
DebertaV2Tokenizer requires the SentencePiece library but it was not found in your environment. Check out the instructions on the
installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

Falling back to alternative ABSA model...


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


✓ TransformerABSA initialized successfully
✓ LLMABSA initialized successfully

Successfully initialized 3 analyzers
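Although the cell reports three initialized analyzers, the log shows that `TransformerABSA` could not load `yangheng/deberta-v3-base-absa-v1.1` because the SentencePiece library was missing, and then loaded `cardiffnlp/twitter-roberta-base-sentiment-latest`, a general sentiment classifier trained on tweets rather than an aspect-based model. The transformer results below therefore most likely come from this fallback. A minimal pre-flight check using only the standard library can report missing optional packages before the analyzers are built. The package list is an assumption based on this log; the error message itself says to install SentencePiece (`pip install sentencepiece`) and restart the runtime.

```python
import importlib.util
from typing import Dict, List


def missing_packages(packages: List[str]) -> List[str]:
    """Return the top-level import names from `packages` that cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Assumed requirement, taken from the log: DebertaV2Tokenizer needs `sentencepiece`.
REQUIRED: Dict[str, str] = {"sentencepiece": "pip install sentencepiece"}

for name in missing_packages(list(REQUIRED)):
    print(f"Missing '{name}': run `{REQUIRED[name]}` and restart the kernel")
```

`find_spec` only locates the module without importing it, so the check is cheap to run at the top of the notebook.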


## Load Test Data

Let's load the test data, generating and saving sample data if nothing is loaded:


In [3]:
data_file = ensure_data_directory()
test_data = load_test_data(data_file)

if not test_data:
    print("Creating sample test data...")
    test_data = create_sample_data()
    save_results(test_data, data_file)

print(f"Loaded {len(test_data)} test samples")
print("\nSample test data:")
for i, item in enumerate(test_data[:3]):
    aspects = [f"{a['aspect']} ({a['sentiment']})" for a in item['ground_truth']]
    print(f"{i+1}. {item['text']}")
    print(f"   Expected: {aspects}")
    print()


Loaded 5 test samples

Sample test data:
1. The pizza was delicious but the service was terrible.
   Expected: ['pizza (positive)', 'service (negative)']

2. The laptop has great performance and excellent battery life.
   Expected: ['performance (positive)', 'battery life (positive)']

3. The hotel room was clean and comfortable, but the WiFi was slow.
   Expected: ['hotel room (positive)', 'WiFi (negative)']
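The printing loop above assumes each test record is a dict with a `text` string and a `ground_truth` list of `{'aspect', 'sentiment'}` dicts. If the data file is edited by hand, a small validator can catch malformed records before they reach the analyzers. `validate_record` is a hypothetical helper (it is not part of `src.utils`), and the allowed sentiment labels are an assumption based on the labels that appear in this notebook.

```python
from typing import Any, Dict, List

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}  # assumed label set


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one test record (empty if valid)."""
    problems = []
    if not isinstance(record.get("text"), str) or not record.get("text"):
        problems.append("missing or empty 'text'")
    ground_truth = record.get("ground_truth")
    if not isinstance(ground_truth, list):
        problems.append("'ground_truth' must be a list")
        return problems
    for i, item in enumerate(ground_truth):
        if not isinstance(item, dict) or "aspect" not in item:
            problems.append(f"ground_truth[{i}] lacks an 'aspect'")
        elif item.get("sentiment") not in ALLOWED_SENTIMENTS:
            problems.append(f"ground_truth[{i}] has unknown sentiment {item.get('sentiment')!r}")
    return problems


sample = {
    "text": "The pizza was delicious but the service was terrible.",
    "ground_truth": [
        {"aspect": "pizza", "sentiment": "positive"},
        {"aspect": "service", "sentiment": "negative"},
    ],
}
print(validate_record(sample))  # → []
```

Applied to the loaded data, `[(i, p) for i, r in enumerate(test_data) if (p := validate_record(r))]` would list only the records that need fixing.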



## Individual Testing

Let's test each analyzer individually on sample texts:
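The cell below times each call only once, so a single measurement can include one-off costs such as model warm-up, and the numbers vary between runs. For steadier comparisons, a sketch like the following runs one or more unmeasured warm-up calls, then reports the median of several runs timed with `time.perf_counter()`, the standard library clock intended for measuring short durations. `time_call` is a hypothetical helper; `src.utils.benchmark_analyzer` may already do something similar, but its behavior is not shown here.

```python
import statistics
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median duration in seconds of fn() over `repeats` timed runs.

    The first `warmup` calls are executed but not measured, so one-off costs
    such as lazy model loading do not skew the result.
    """
    for _ in range(warmup):
        fn()
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Usage with a cheap stand-in for analyzer.analyze(text):
elapsed = time_call(lambda: sum(range(10_000)), repeats=3)
print(f"median: {elapsed:.6f}s")
```

In the notebook this would be called as `time_call(lambda: analyzer.analyze(text))`. For the LLM analyzer, which takes roughly 10 to 17 seconds per call here, a small `repeats` value keeps the total run time manageable.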

In [5]:
test_texts = [
    "The pizza was delicious but the service was terrible.",
    "The laptop has great performance and excellent battery life.",
    "The hotel room was clean and comfortable, but the WiFi was slow."
]

for text in test_texts:
    print(f"\nText: {text}")
    print("-" * 50)

    for name, analyzer in analyzers.items():
        try:
            # perf_counter() is a monotonic, high-resolution clock meant for timing;
            # the list gets its own name so it does not overwrite the `results` dict.
            start_time = time.perf_counter()
            aspect_results = analyzer.analyze(text)
            processing_time = time.perf_counter() - start_time

            print(f"\n{name} Results ({processing_time:.3f}s):")
            if aspect_results:
                for result in aspect_results:
                    print(f"  - {result.aspect}: {result.sentiment} (confidence: {result.confidence:.3f})")
            else:
                print("  No aspects found")

        except Exception as e:
            print(f"\n{name} Error: {e}")

Device set to use cpu



Text: The pizza was delicious but the service was terrible.
--------------------------------------------------

Lexicon Results (0.007s):
  - pizza: positive (confidence: 0.572)
  - service: positive (confidence: 0.572)
  - the pizza: positive (confidence: 0.572)
  - the service: positive (confidence: 0.572)

Transformer Results (0.111s):
  - pizza: positive (confidence: 0.953)
  - service: negative (confidence: 0.901)


Device set to use cpu



LLM Results (16.964s):
  - pizza: positive (confidence: 0.900)
  - service: negative (confidence: 0.800)

Text: The laptop has great performance and excellent battery life.
--------------------------------------------------

Lexicon Results (0.045s):
  - laptop: positive (confidence: 0.625)
  - performance: positive (confidence: 0.625)
  - battery: positive (confidence: 0.625)
  - life: positive (confidence: 0.625)
  - the laptop: positive (confidence: 0.625)
  - great performance: positive (confidence: 0.625)
  - excellent battery life: positive (confidence: 0.625)

Transformer Results (1.010s):
  - life: positive (confidence: 0.986)
  - battery: positive (confidence: 0.972)
  - laptop: positive (confidence: 0.986)
  - performance: positive (confidence: 0.986)
  - excellent battery life: positive (confidence: 0.983)
  - great performance: positive (confidence: 0.986)


Device set to use cpu



LLM Results (9.617s):
  - performance: positive (confidence: 0.900)
  - battery life: positive (confidence: 0.800)

Text: The hotel room was clean and comfortable, but the WiFi was slow.
--------------------------------------------------

Lexicon Results (0.019s):
  - hotel: positive (confidence: 0.511)
  - room: positive (confidence: 0.511)
  - wifi: positive (confidence: 0.511)
  - the hotel room: positive (confidence: 0.511)
  - the wifi: positive (confidence: 0.511)

Transformer Results (0.220s):
  - room: positive (confidence: 0.591)
  - hotel: positive (confidence: 0.925)
  - wifi: neutral (confidence: 0.472)

LLM Results (10.342s):
  - cleanliness: positive (confidence: 0.900)
  - comfort: positive (confidence: 0.800)
  - WiFi speed: negative (confidence: 0.700)
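The analyzers differ in what they return as an aspect, not just in sentiment: the lexicon analyzer reports both `pizza` and `the pizza`, while the LLM paraphrases aspects (`cleanliness`, `WiFi speed`) instead of quoting the text. Comparing them against `ground_truth` therefore needs some normalization before matching. The sketch below lowercases aspect terms, strips one leading article, keeps the first prediction per normalized aspect, and counts a prediction as correct only when both aspect and sentiment match. Simple string normalization handles the lexicon duplicates but not the LLM paraphrases, which would count as misses. The normalization rules and exact-match criterion are assumptions for illustration; `src.utils.calculate_precision_recall_f1` may define matching differently.

```python
from typing import Dict, List, Tuple

ARTICLES = ("the ", "a ", "an ")


def normalize(aspect: str) -> str:
    """Lowercase an aspect term and strip one leading article."""
    term = aspect.strip().lower()
    for article in ARTICLES:
        if term.startswith(article):
            return term[len(article):]
    return term


def to_pairs(items: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map normalized aspect -> sentiment, keeping the first occurrence."""
    pairs: Dict[str, str] = {}
    for aspect, sentiment in items:
        pairs.setdefault(normalize(aspect), sentiment)
    return pairs


def precision_recall_f1(predicted: List[Tuple[str, str]],
                        expected: List[Tuple[str, str]]) -> Tuple[float, float, float]:
    pred, gold = to_pairs(predicted), to_pairs(expected)
    # A hit requires the same normalized aspect AND the same sentiment label.
    hits = sum(1 for a, s in pred.items() if gold.get(a) == s)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Lexicon output and ground truth for the first sentence, copied from above.
lexicon = [("pizza", "positive"), ("service", "positive"),
           ("the pizza", "positive"), ("the service", "positive")]
gold = [("pizza", "positive"), ("service", "negative")]
print(precision_recall_f1(lexicon, gold))  # → (0.5, 0.5, 0.5)
```

With the transformer output for the same sentence (`pizza` positive, `service` negative) the same function gives a perfect score, which matches what the printed results show by eye.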

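The lexicon analyzer gives every aspect in a sentence the same label and confidence (0.572 for all four aspects in the first sentence, 0.511 for all five in the third), so `service` and `wifi` come out positive although the ground truth labels them negative. That pattern suggests the score is computed over the whole sentence rather than around each aspect, though the implementation of `LexiconABSA` is not shown here. A common remedy is to split the sentence at commas and contrastive conjunctions such as "but", then score only the clause that mentions the aspect. The sketch below illustrates the idea with a tiny hand-written lexicon; the word list, the scores, and the splitting rule are all assumptions, not the notebook's method.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> polarity score.
LEXICON: Dict[str, float] = {
    "delicious": 1.0, "great": 1.0, "excellent": 1.0, "clean": 0.5,
    "comfortable": 0.5, "terrible": -1.0, "slow": -0.5,
}

# Split on commas, semicolons and contrastive conjunctions (assumed rule).
CLAUSE_SPLIT = re.compile(r"[,;]|\bbut\b|\bhowever\b|\balthough\b", re.IGNORECASE)


def clause_sentiment(text: str, aspects: List[str]) -> List[Tuple[str, str]]:
    """Label each aspect using only the words in the clause that mentions it."""
    clauses = [c.strip().lower() for c in CLAUSE_SPLIT.split(text) if c.strip()]
    results = []
    for aspect in aspects:
        # First clause containing the aspect; empty string if none does.
        clause = next((c for c in clauses if aspect.lower() in c), "")
        score = sum(LEXICON.get(w, 0.0) for w in re.findall(r"[a-z]+", clause))
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.append((aspect, label))
    return results


print(clause_sentiment("The pizza was delicious but the service was terrible.",
                       ["pizza", "service"]))
# → [('pizza', 'positive'), ('service', 'negative')]
```

Clause windows fix the contrastive sentences in this test set, but this sketch ignores negation ("not good") and intensifiers, which a practical lexicon approach would also need to handle.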