# Notebook Overview

## Key Objectives
- Loading a previously trained sentiment analysis model (weights and tokenizer) from disk
- Validating the model on sample reviews, the held-out test set, and edge cases
- Analyzing training performance through visualizations

# Imports
This cell imports the libraries and local modules the sentiment analysis project needs, makes the project's own packages importable, and sets up the paths used to load the saved model for evaluation. Here's an explanation of each part:

1. **Library Imports:**
   - `os` and `sys`: Used for interacting with the operating system, including setting paths and changing directories.
   - `json`: Parses and serializes JSON; in this notebook it reads the saved model's `config.json` and pretty-prints prediction results.
   - `tensorflow`: The deep learning framework the model is built on; here it is used to build the model and load its saved weights.
   - `DistilBertTokenizer` from `transformers`: The tokenizer for DistilBERT (a smaller, distilled version of BERT), which converts raw text into the token IDs and attention masks the model expects.
   - `numpy`: Provides support for numerical operations, often used for array handling.
   - `matplotlib.pyplot` and `seaborn`: Used for plotting graphs and visualizations (e.g., to visualize model performance).
   - `defaultdict` from `collections`: A dictionary subclass that provides a default value for missing keys.

2. **Project Path Setup:**
   - `project_root`: Specifies the root directory of the project.
   - `sys.path.insert(0, project_root)`: Adds the root directory to the Python module search path to allow importing local project modules.
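   - `os.chdir('../')`: Changes the current working directory to the parent of the notebook's starting directory, so relative paths resolve from there. Because the path is relative, re-running the cell moves up one more level each time.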

3. **Imports from Local Modules:**
   - `EnhancedDistilBertForSentiment` and `ModelTrainer` come from `models.sentiment_model`: the first is the model class, and the second (judging by its name and usage) is the training helper, which this notebook also passes to the analyzer and evaluator.
   - `Config`: The configuration class for model parameters, such as `Config.MAX_LENGTH`.
   - `SentimentAnalyzer`: The main entry point (from `main.py`); in this notebook it wraps the model and tokenizer, runs predictions, and prepares the dataset.
   - `ModelPersistence`: A class for saving and loading trained models.
   - `ModelEvaluator`: Evaluates the model on the test set, visualizes performance, and runs edge-case tests.
   - `DataProcessor`: A class to handle preprocessing and data management.

4. **Path Setup for Models:**
   - `project_path`: The project's root directory (the same location as `project_root`).
   - `model_path`: Constructs the path to a specific saved model version (`model_v2.0.0_epoch6`), which will be used for inference or evaluation.

This cell primarily sets up the environment, imports necessary libraries, and configures paths for the sentiment analysis project.
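
To see why `sys.path.insert(0, project_root)` makes the local packages importable, here is a minimal, self-contained sketch. It writes a throwaway module into a temporary directory, prepends that directory to `sys.path`, and imports from it. The module name `demo_local_module` and its contents are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_local_dir(module_name: str, source: str):
    """Write `source` to <tmpdir>/<module_name>.py, prepend tmpdir to sys.path, and import it."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, f"{module_name}.py").write_text(source)
    # Index 0 means this directory is searched before every other sys.path entry.
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()  # ensure the import system sees the newly written file
    return importlib.import_module(module_name), tmp_dir


module, tmp_dir = import_from_local_dir(
    "demo_local_module",  # hypothetical module name used only for this demo
    "GREETING = 'imported from the project root'\n",
)
print(sys.path[0] == tmp_dir)  # True
print(module.GREETING)         # imported from the project root
```

In the notebook, `project_root` plays the role of `tmp_dir`, which is why `from models.sentiment_model import ...` resolves to the project's own `models` package.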

In [None]:
import os
import sys
import json
import tensorflow as tf
from transformers import DistilBertTokenizer
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict
# Make the project's local packages (models, config, utils, data) importable
project_root = '/Users/tayebekavousi/Desktop/sentimentAnalysisPortfolio'
sys.path.insert(0, project_root)
# Move the working directory up one level (relative, so re-running this cell moves up again)
os.chdir('../')
from models.sentiment_model import EnhancedDistilBertForSentiment, ModelTrainer
from config.model_config import Config
from main import SentimentAnalyzer
from models.modelPersistence import ModelPersistence
from utils.modelEvaluator import ModelEvaluator
from data.data_processing import DataProcessor

# Set paths (project_path is the same directory as project_root)
project_path = project_root
model_path = os.path.join(project_path, "saved_models", "model_v2.0.0_epoch6")

In [None]:
model = EnhancedDistilBertForSentiment()

In [None]:
# Build the model's variables so the saved weights have somewhere to load into
print("\nInitializing model architecture...")
# A dummy input only needs the right shapes and dtypes; its values are irrelevant
dummy_input = {
    'input_ids': tf.zeros((1, Config.MAX_LENGTH), dtype=tf.int32),
    'attention_mask': tf.zeros((1, Config.MAX_LENGTH), dtype=tf.int32)
}
_ = model(dummy_input)  # one forward pass creates the model's weights

# Load saved weights from the checkpoint prefix inside the SavedModel's variables/ directory
print("\nLoading model weights...")
model.load_weights(os.path.join(model_path, "full_model", "variables", "variables"))

# Load tokenizer
print("\nLoading tokenizer...")
loaded_tokenizer = DistilBertTokenizer.from_pretrained(os.path.join(model_path, "tokenizer"))

# Initialize analyzer
analyzer = SentimentAnalyzer(model=model)
analyzer.tokenizer = loaded_tokenizer
analyzer.trainer = ModelTrainer(model, loaded_tokenizer)
analyzer.data_processor = DataProcessor()

# Test the loaded model
test_texts = [
    "The food was absolutely amazing! Best restaurant experience ever!",
    "Service was terrible and the food was cold.",
    "It was okay, nothing special but nothing terrible either."
]

print("\nTesting model with sample reviews:")
for text in test_texts:
    result = analyzer.predict(text)
    print(f"\nInput: {text}")
    print("Prediction:", json.dumps(result, indent=2))

print("\nModel loaded successfully! Ready for predictions.")
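
The dummy input above only fixes shapes and dtypes. For a real review, `input_ids` holds token IDs padded (or truncated) to `Config.MAX_LENGTH`, and `attention_mask` marks real tokens with 1 and padding with 0. Below is a plain-Python sketch of that convention. It assumes padding ID 0 and uses illustrative token IDs rather than real tokenizer output. It also uses simple right truncation, whereas real tokenizers keep the special tokens when truncating.

```python
from typing import Dict, List


def pad_and_mask(token_ids: List[int], max_length: int, pad_id: int = 0) -> Dict[str, List[int]]:
    """Truncate/pad token IDs to max_length and build the matching attention mask."""
    ids = token_ids[:max_length]  # simple right truncation (ignores special tokens)
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,             # pad short sequences on the right
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding
    }


# Illustrative IDs only; not output from the project's tokenizer.
example = pad_and_mask([101, 2307, 2833, 102], max_length=8)
print(example["input_ids"])       # [101, 2307, 2833, 102, 0, 0, 0, 0]
print(example["attention_mask"])  # [1, 1, 1, 1, 0, 0, 0, 0]
```

With the Hugging Face tokenizer, calling `loaded_tokenizer(text, padding="max_length", truncation=True, max_length=Config.MAX_LENGTH)` returns the same two fields. Whether `SentimentAnalyzer.predict` tokenizes exactly this way is not shown in this notebook.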

In [None]:
# Initialize the evaluator
evaluator = ModelEvaluator(model, analyzer.trainer)

In [None]:
# Process the data first
analyzer.process_data()

# Now get the test set
test_texts = analyzer.processed_data['dataframes']['test']['processed_text'].to_numpy()
# model_inputs[5] is assumed to hold the test-set labels (index as used by DataProcessor)
test_labels = {'sentiment': analyzer.processed_data['model_inputs'][5]['sentiment']}

# Evaluate model on test set
test_metrics = evaluator.evaluate_model(test_texts, test_labels)

In [None]:
# Get training history from config
with open(os.path.join(model_path, "config.json"), 'r') as f:
    training_history = json.load(f)['performance']

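# Create and display visualization
# NOTE: visualize_results is not imported in the cells above; import or define it before running this cell
fig = visualize_results(training_history, test_metrics, evaluator.label_map)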
plt.show()

# Print additional summary metrics
print("\nTest Set Performance Summary:")
print(f"Overall Accuracy: {test_metrics['overall_accuracy']:.3f}")
print("\nPer-class Performance:")
for label, metrics in test_metrics['class_metrics'].items():
    print(f"\n{label}:")
    print(f"Accuracy: {metrics['accuracy']:.3f}")
    print(f"Support: {metrics['support']}")
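
The summary above prints a per-class accuracy and support for each label. `ModelEvaluator`'s implementation is not shown here, so the following is only a sketch of one common definition. Support is the number of test examples whose true label is that class, and per-class accuracy is the fraction of those examples predicted correctly (equivalently, per-class recall). It uses the `defaultdict` imported at the top; the label names and toy predictions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Per-class accuracy (correct / support) and support, keyed by true label."""
    support = defaultdict(int)  # missing labels start at 0
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {
        label: {"accuracy": correct[label] / support[label], "support": support[label]}
        for label in support
    }


# Hypothetical labels and predictions for illustration only.
y_true = ["positive", "positive", "negative", "neutral", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral", "positive", "neutral"]
metrics = per_class_metrics(y_true, y_pred)
for label, m in metrics.items():
    print(f"{label}: accuracy={m['accuracy']:.3f}, support={m['support']}")
```

Because per-class accuracy divides by each class's own support, a class with few test examples can show a very high or very low score from only a handful of predictions. Reading it alongside `support` guards against over-interpreting those numbers.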

In [None]:
evaluator.performance_visualizer()

In [None]:
print("\n=== Testing Edge Cases ===")
edge_case_results = evaluator.test_sentiment_edge_cases(analyzer)