# 🏆 TechJam 2025: Review Quality Assessment System

## 🎯 Challenge: ML for Trustworthy Location Reviews

This notebook will guide you through building a system to detect policy violations in Google location reviews:
- 🚫 **Advertisements**: Reviews containing promotional content
- 🚫 **Irrelevant Content**: Reviews not related to the location
- 🚫 **Fake Rants**: Complaints from users who never visited

**Today's Goal (Day 1)**: Set up environment, explore data, and build basic understanding

---

## 📚 Step 1: Import Required Libraries

Let's start by importing all the libraries we'll need for data processing, ML models, and visualization.

In [None]:
# Install required packages (run this if packages are not installed)
# Uncomment the lines below if you need to install packages

# !pip install pandas numpy matplotlib seaborn
# !pip install transformers torch
# !pip install huggingface_hub
# !pip install scikit-learn
# !pip install streamlit --quiet

In [None]:
# Core data processing libraries
import pandas as pd
import numpy as np
import re
import json
from typing import List, Dict, Tuple

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# ML and NLP libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix, classification_report

# Hugging Face transformers
try:
    from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
    import torch
    print("✅ Transformers library loaded successfully")
except ImportError:
    print("❌ Transformers not installed. Please run: pip install transformers torch")

# Set style for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("📚 Import step finished (see the messages above for any missing packages)")

## 📊 Step 2: Data Loading and Initial Exploration

First, let's load the review data and understand its structure. Until the Google Reviews dataset is available, a small hand-written sample stands in for it.

In [None]:
def load_sample_data():
    """
    Create sample data for testing if you don't have the dataset yet.
    Replace this with actual data loading when you get the dataset.
    """
    
    # Sample reviews with different violation types
    sample_reviews = [
        # Normal reviews
        {"review_text": "Great food and excellent service. The pasta was delicious and the staff was very friendly. Highly recommend!", "rating": 5, "business_name": "Mario's Restaurant"},
        {"review_text": "Average experience. Food was okay but service was slow. Not bad but not great either.", "rating": 3, "business_name": "Central Cafe"},
        {"review_text": "Terrible experience. Food was cold and the waiter was rude. Will not return.", "rating": 1, "business_name": "Downtown Diner"},
        
        # Advertisement examples
        {"review_text": "Amazing pizza! Visit our website www.pizzadeals.com for 50% off coupons and special offers!", "rating": 5, "business_name": "Tony's Pizza"},
        {"review_text": "Great burgers! Call us at 555-BURGER for catering services and party packages!", "rating": 5, "business_name": "Burger Palace"},
        {"review_text": "Delicious food! Check out our new location on Main Street. Grand opening specials available!", "rating": 5, "business_name": "Fresh Bites"},
        
        # Irrelevant content examples
        {"review_text": "I love my new smartphone camera! Anyway, this restaurant has okay food I guess.", "rating": 3, "business_name": "City Grill"},
        {"review_text": "Traffic was terrible today because of construction. Politics are crazy these days. Oh, the coffee was fine.", "rating": 3, "business_name": "Corner Coffee"},
        {"review_text": "My car broke down on the way here, what a terrible day. The weather is also awful. Food was decent though.", "rating": 2, "business_name": "Highway Diner"},
        
        # Fake rant examples
        {"review_text": "Never been here but I heard from my neighbor that it's absolutely terrible. Probably overpriced too.", "rating": 1, "business_name": "Elite Restaurant"},
        {"review_text": "I hate these fancy restaurants, they're all scams. Never visited but I'm sure it's pretentious.", "rating": 1, "business_name": "Fine Dining Co"},
        {"review_text": "Looks dirty from the outside, probably awful inside too. Won't waste my time going there.", "rating": 1, "business_name": "Street Food Truck"}
    ]
    
    return pd.DataFrame(sample_reviews)

# Load data
# TODO: Replace this with actual dataset loading
# df = pd.read_csv('path_to_google_reviews_dataset.csv')

# For now, use sample data
df = load_sample_data()

print(f"📊 Dataset loaded with {len(df)} reviews")
print(f"📋 Columns: {df.columns.tolist()}")
print("\n📝 First 3 reviews:")
df.head(3)

In [None]:
# Basic data exploration
def explore_data(df):
    """
    Perform basic exploration of the review dataset
    """
    print("🔍 BASIC DATA EXPLORATION")
    print("=" * 40)
    
    # Dataset info
    print(f"Dataset shape: {df.shape}")
    print(f"Missing values: {df.isnull().sum().sum()}")
    
    # Text statistics
    df['review_length'] = df['review_text'].str.len()
    df['word_count'] = df['review_text'].str.split().str.len()
    
    print(f"\n📏 Review Length Statistics:")
    print(f"  Average length: {df['review_length'].mean():.1f} characters")
    print(f"  Average words: {df['word_count'].mean():.1f} words")
    print(f"  Shortest review: {df['review_length'].min()} characters")
    print(f"  Longest review: {df['review_length'].max()} characters")
    
    # Rating distribution
    print(f"\n⭐ Rating Distribution:")
    print(df['rating'].value_counts().sort_index())
    
    return df

df = explore_data(df)

In [None]:
# Visualize data distribution
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Review length distribution
axes[0, 0].hist(df['review_length'], bins=20, alpha=0.7, color='skyblue')
axes[0, 0].set_title('Distribution of Review Length (Characters)')
axes[0, 0].set_xlabel('Characters')
axes[0, 0].set_ylabel('Frequency')

# Word count distribution
axes[0, 1].hist(df['word_count'], bins=20, alpha=0.7, color='lightgreen')
axes[0, 1].set_title('Distribution of Word Count')
axes[0, 1].set_xlabel('Words')
axes[0, 1].set_ylabel('Frequency')

# Rating distribution
rating_counts = df['rating'].value_counts().sort_index()
axes[1, 0].bar(rating_counts.index, rating_counts.values, alpha=0.7, color='orange')
axes[1, 0].set_title('Distribution of Ratings')
axes[1, 0].set_xlabel('Rating')
axes[1, 0].set_ylabel('Count')

# Review length vs rating scatter
axes[1, 1].scatter(df['rating'], df['review_length'], alpha=0.6, color='purple')
axes[1, 1].set_title('Review Length vs Rating')
axes[1, 1].set_xlabel('Rating')
axes[1, 1].set_ylabel('Review Length (Characters)')

plt.tight_layout()
plt.show()

print("📈 Data visualization complete!")

## 🔧 Step 3: Feature Engineering

Let's extract useful features that can help identify policy violations.

In [None]:
def extract_features(df):
    """
    Extract features that might indicate policy violations
    """
    print("🔧 EXTRACTING FEATURES FOR VIOLATION DETECTION")
    print("=" * 50)
    
    # Basic text features
    df['review_length'] = df['review_text'].str.len()
    df['word_count'] = df['review_text'].str.split().str.len()
    df['exclamation_count'] = df['review_text'].str.count('!')
    df['question_count'] = df['review_text'].str.count(r'\?')  # raw string avoids an invalid-escape warning
    
    # Capitalization features (potential indicators of spam/rants)
    df['caps_ratio'] = df['review_text'].apply(lambda x: sum(1 for c in x if c.isupper()) / len(x) if len(x) > 0 else 0)
    df['excessive_caps'] = df['caps_ratio'] > 0.3
    
    # Advertisement indicators
    df['has_url'] = df['review_text'].str.contains(r'http[s]?://|www\.', regex=True, na=False)
    df['has_phone'] = df['review_text'].str.contains(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b|call|phone', regex=True, na=False, case=False)
    df['has_promo_words'] = df['review_text'].str.contains(r'discount|deal|promo|sale|coupon|special offer|visit|website', regex=True, na=False, case=False)
    
    # Irrelevant content indicators
    df['mentions_unrelated'] = df['review_text'].str.contains(r'my phone|my car|politics|weather|traffic|news|government', regex=True, na=False, case=False)
    
    # Fake rant indicators
    df['never_visited'] = df['review_text'].str.contains(r'never been|never visited|heard it|looks like|probably|i hate these|all these places', regex=True, na=False, case=False)
    
    # Length-based features
    df['very_short'] = df['word_count'] < 5
    df['very_long'] = df['word_count'] > 200
    
    print(f"✅ Extracted {len([col for col in df.columns if col not in ['review_text', 'rating', 'business_name']])} features")
    
    # Show feature summary
    feature_cols = ['has_url', 'has_phone', 'has_promo_words', 'mentions_unrelated', 'never_visited', 'excessive_caps']
    print("\n📊 Feature Summary:")
    for col in feature_cols:
        count = df[col].sum()
        print(f"  {col}: {count} reviews ({count/len(df)*100:.1f}%)")
    
    return df

df = extract_features(df)
print("\n📋 Sample of extracted features:")
df[['review_text', 'has_url', 'has_promo_words', 'mentions_unrelated', 'never_visited']].head()
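
The keyword patterns above match substrings, so `deal` also fires on "ideal" and `visit` on "visited" (as in "never visited"). A minimal sketch of a whole-word variant using only the standard `re` module; the helper name `contains_keyword`, the optional plural `s`, and the shortened word list are assumptions for illustration, not part of the notebook's pipeline:

```python
import re
from typing import List

def contains_keyword(text: str, keywords: List[str]) -> bool:
    """Return True if any keyword occurs as a whole word (optionally pluralised)."""
    # re.escape guards against regex metacharacters inside a keyword
    alternatives = "|".join(re.escape(kw) for kw in keywords)
    pattern = re.compile(rf"\b(?:{alternatives})s?\b", flags=re.IGNORECASE)
    return pattern.search(text) is not None

promo_words = ["discount", "deal", "coupon", "special offer", "visit"]

print(contains_keyword("An ideal spot for lunch", promo_words))
print(contains_keyword("Never visited, but I'm sure it's bad", promo_words))
print(contains_keyword("Visit our site for coupons and deals!", promo_words))
```

The trade-off is recall: whole-word matching misses inflected forms unless they are listed explicitly, so it is worth checking both variants against the labeled sample.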

## 🏷️ Step 4: Manual Labeling (Create Ground Truth)

Let's manually label our sample data to create ground truth for evaluation.

In [None]:
def create_manual_labels(df):
    """
    Create manual labels for the sample data
    In a real scenario, you would label a larger subset manually
    """
    print("🏷️ CREATING MANUAL LABELS FOR GROUND TRUTH")
    print("=" * 45)
    
    # Manual labels based on our sample data
    # 0 = No violation, 1 = Violation
    
    # Advertisement labels (rows at 0-based indices 3, 4, 5 in our sample)
    ad_labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    
    # Irrelevant content labels (rows at 0-based indices 6, 7, 8 in our sample)
    irrelevant_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
    
    # Fake rant labels (rows at 0-based indices 9, 10, 11 in our sample)
    fake_rant_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    
    df['is_advertisement'] = ad_labels
    df['is_irrelevant'] = irrelevant_labels
    df['is_fake_rant'] = fake_rant_labels
    
    # Summary
    print(f"📊 Label Summary:")
    print(f"  Advertisements: {df['is_advertisement'].sum()}/{len(df)} ({df['is_advertisement'].sum()/len(df)*100:.1f}%)")
    print(f"  Irrelevant: {df['is_irrelevant'].sum()}/{len(df)} ({df['is_irrelevant'].sum()/len(df)*100:.1f}%)")
    print(f"  Fake Rants: {df['is_fake_rant'].sum()}/{len(df)} ({df['is_fake_rant'].sum()/len(df)*100:.1f}%)")
    
    return df

df = create_manual_labels(df)
print("\n✅ Manual labels created successfully!")
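
Hand-typed 0/1 lists are easy to misalign with the rows they describe. As a sketch, the same labels can be generated from 0-based row indices, which also makes it simple to check that no review carries two labels. `labels_from_indices` is a hypothetical helper, and the index ranges mirror the 12-row sample above:

```python
from typing import Iterable, List

def labels_from_indices(n_rows: int, positive_rows: Iterable[int]) -> List[int]:
    """Build a 0/1 label list with 1 at each 0-based index in positive_rows."""
    positives = set(positive_rows)
    if any(i < 0 or i >= n_rows for i in positives):
        raise ValueError("label index outside the dataset")
    return [1 if i in positives else 0 for i in range(n_rows)]

n_rows = 12  # size of the sample dataset
ad_labels = labels_from_indices(n_rows, range(3, 6))
irrelevant_labels = labels_from_indices(n_rows, range(6, 9))
fake_rant_labels = labels_from_indices(n_rows, range(9, 12))

# In the sample, each review has at most one violation type
overlaps = [a + b + c for a, b, c in zip(ad_labels, irrelevant_labels, fake_rant_labels)]
assert max(overlaps) <= 1

print(ad_labels)
```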

## 🤖 Step 5: LLM-Based Classifier with Real Language Models

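Now let's build a classifier on top of Hugging Face transformers. With the default DistilBERT sentiment model, predictions combine sentiment scores with keyword rules; the few-shot prompts are only used when a text-generation model is loaded, and keyword rules alone serve as the fallback if no model loads.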

In [None]:
class ReviewClassifier:
    """
    LLM-based classifier for policy violation detection using Hugging Face transformers
    """
    
    def __init__(self, model_name: str = "distilbert-base-uncased-finetuned-sst-2-english"):
        """Initialize the review classifier with specified model"""
        self.model_name = model_name
        self.classifier = None
        self.text_generator = None
        self._setup_model()
    
    def _setup_model(self):
        """Setup the Hugging Face model"""
        # Always setup fallback keywords first
        self._setup_fallback()
        
        try:
            print(f"🤖 Loading language model: {self.model_name}")
            
            # Encoder models (BERT, DistilBERT, RoBERTa, ...) use the text-classification pipeline
            if 'bert' in self.model_name.lower():
                self.classifier = pipeline(
                    "text-classification",
                    model=self.model_name,
                    device=-1  # Use CPU for compatibility
                )
                print("✅ Loaded classification model successfully")
            else:
                # Causal LMs are prompted through the text-generation pipeline.
                # Output length is capped per call with max_new_tokens, so no
                # max_length is set here (setting both can trigger a warning).
                self.text_generator = pipeline(
                    "text-generation",
                    model=self.model_name,
                    device=-1,
                    do_sample=False
                )
                print("✅ Loaded text generation model successfully")
                
        except Exception as e:
            # Keyword lists were already initialised above, so nothing else to set up
            print(f"❌ Error loading {self.model_name}: {e}")
            print("🔄 Falling back to rule-based classification")
    
    def _setup_fallback(self):
        """Setup fallback rule-based classification"""
        self.ad_keywords = [
            'visit', 'website', 'www', 'http', 'call', 'phone', 'discount',
            'deal', 'promo', 'sale', 'coupon', 'special offer'
        ]
        
        self.irrelevant_keywords = [
            'my phone', 'my car', 'politics', 'weather', 'traffic',
            'my day', 'my life', 'news', 'government', 'president'
        ]
        
        self.fake_rant_keywords = [
            'never been', 'heard it', 'looks like', 'probably',
            'i hate these', 'all these places', 'never visited'
        ]
    
    def create_prompts(self) -> dict:
        """Create prompts for each policy violation type"""
        return {
            'advertisement': '''
Task: Determine if this review contains advertisements or promotional content.

Examples of ADVERTISEMENTS:
- "Great food! Visit www.discount-deals.com for coupons!"
- "Call 555-1234 for catering services!"
- "Check out our new location on Main Street!"

Examples of NOT ADVERTISEMENTS:
- "The food was delicious and service was great"
- "I loved the atmosphere, will definitely come back"
- "Terrible experience, would not recommend"

Review to analyze: "{review_text}"

Is this review an ADVERTISEMENT? Answer only YES or NO:''',

            'irrelevant': '''
Task: Determine if this review is about the business location being reviewed.

Examples of IRRELEVANT reviews:
- "I love my new phone, but this place is noisy" (about phone, not restaurant)
- "My car broke down on the way here, terrible day" (about car, not business)
- "Politics these days are crazy, anyway the food was ok" (mostly about politics)

Examples of RELEVANT reviews:
- "The pizza was amazing, great service too"
- "Parking was difficult but the experience was worth it"
- "Staff was rude and food was cold"

Review to analyze: "{review_text}"

Is this review IRRELEVANT to the business? Answer only YES or NO:''',

            'fake_rant': '''
Task: Determine if this is a rant from someone who likely never visited the place.

Examples of FAKE RANTS:
- "Never been here but heard it's terrible from my friend"
- "I hate this type of business, they're all scams"
- "Looks dirty from the outside, probably awful inside"

Examples of GENUINE reviews (even if negative):
- "I visited yesterday and the service was terrible"
- "Went there for lunch, food was cold and overpriced"
- "Been there multiple times, quality has declined"

Review to analyze: "{review_text}"

Is this a FAKE RANT from someone who likely never visited? Answer only YES or NO:'''
        }
    
    def _classify_with_llm(self, review_text: str, violation_type: str) -> bool:
        """Use LLM to classify for a specific violation type"""
        if self.text_generator is None:
            return self._classify_fallback(review_text, violation_type)
        
        try:
            prompts = self.create_prompts()
            prompt = prompts[violation_type].format(review_text=review_text)
            
            response = self.text_generator(
                prompt,
                max_new_tokens=10,
                do_sample=False,
                pad_token_id=self.text_generator.tokenizer.eos_token_id
            )
            
            generated_text = response[0]['generated_text'][len(prompt):].strip().upper()
            return 'YES' in generated_text
            
        except Exception as e:
            print(f"⚠️ LLM classification failed for {violation_type}: {e}")
            return self._classify_fallback(review_text, violation_type)
    
    def _classify_with_bert(self, review_text: str, violation_type: str) -> bool:
        """Use BERT-style model for classification with heuristics"""
        if self.classifier is None:
            return self._classify_fallback(review_text, violation_type)
        
        try:
            # For BERT models, we'll use sentiment analysis as a proxy and combine with heuristics
            result = self.classifier(review_text)
            sentiment_score = result[0]['score'] if result[0]['label'] == 'POSITIVE' else 1 - result[0]['score']
            
            # Combine sentiment with rule-based heuristics for better accuracy
            rule_based = self._classify_fallback(review_text, violation_type)
            
            # Advertisement: typically positive sentiment + promotional keywords
            if violation_type == 'advertisement':
                return rule_based or (sentiment_score > 0.8 and any(kw in review_text.lower() 
                                                                   for kw in ['www', 'http', 'call', 'visit']))
            
            # Irrelevant: often contains off-topic keywords regardless of sentiment
            elif violation_type == 'irrelevant':
                return rule_based
            
            # Fake rant: typically very negative + fake indicators
            elif violation_type == 'fake_rant':
                return rule_based or (sentiment_score < 0.3 and any(kw in review_text.lower() 
                                                                   for kw in ['never been', 'heard', 'probably']))
            
            return rule_based
            
        except Exception as e:
            print(f"⚠️ BERT classification failed for {violation_type}: {e}")
            return self._classify_fallback(review_text, violation_type)
    
    def _classify_fallback(self, review_text: str, violation_type: str) -> bool:
        """Fallback rule-based classification"""
        text_lower = review_text.lower()
        
        if violation_type == 'advertisement':
            return any(keyword in text_lower for keyword in self.ad_keywords)
        elif violation_type == 'irrelevant':
            return any(keyword in text_lower for keyword in self.irrelevant_keywords)
        elif violation_type == 'fake_rant':
            return any(keyword in text_lower for keyword in self.fake_rant_keywords)
        
        return False
    
    def classify_review(self, text: str) -> dict:
        """Classify a single review for all violation types"""
        results = {}
        
        for violation_type in ['advertisement', 'irrelevant', 'fake_rant']:
            if self.text_generator:
                results[violation_type] = self._classify_with_llm(text, violation_type)
            elif self.classifier:
                results[violation_type] = self._classify_with_bert(text, violation_type)
            else:
                results[violation_type] = self._classify_fallback(text, violation_type)
        
        return results
    
    def classify_batch(self, texts: list) -> list:
        """Classify multiple reviews"""
        results = []
        total = len(texts)
        
        for i, text in enumerate(texts):
            if i % 50 == 0:
                print(f"📊 Processing review {i+1}/{total}")
            results.append(self.classify_review(text))
        
        return results

# Test the LLM-based classifier
print("🤖 TESTING LLM-BASED CLASSIFIER")
print("=" * 35)

classifier = ReviewClassifier()

# Get predictions
reviews = df['review_text'].tolist()
predictions = classifier.classify_batch(reviews)

# Add predictions to dataframe
df['pred_advertisement'] = [pred['advertisement'] for pred in predictions]
df['pred_irrelevant'] = [pred['irrelevant'] for pred in predictions]
df['pred_fake_rant'] = [pred['fake_rant'] for pred in predictions]

print("✅ LLM-based classification complete!")
print(f"\n📊 Prediction Summary:")
print(f"  Predicted Advertisements: {df['pred_advertisement'].sum()}")
print(f"  Predicted Irrelevant: {df['pred_irrelevant'].sum()}")
print(f"  Predicted Fake Rants: {df['pred_fake_rant'].sum()}")
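
`_classify_with_llm` treats any completion containing "YES" as a positive, so an answer such as "NO. YES would apply only if..." is counted as a violation. A stricter parser could look only at the first word of the completion. A minimal sketch, where the function name `parse_yes_no` and the choice to return `None` for unclear output are assumptions:

```python
import re
from typing import Optional

def parse_yes_no(generated_text: str) -> Optional[bool]:
    """Map a model completion to True (YES), False (NO), or None (unclear)."""
    # Take the first alphabetic token, ignoring case and surrounding punctuation
    match = re.search(r"[A-Za-z]+", generated_text)
    if match is None:
        return None
    first_word = match.group(0).upper()
    if first_word == "YES":
        return True
    if first_word == "NO":
        return False
    return None

print(parse_yes_no(" Yes."))
print(parse_yes_no("NO. YES would apply only if a URL were present."))
print(parse_yes_no("It depends"))
```

Returning `None` for unclear output leaves room to route those reviews to the rule-based fallback instead of silently treating them as clean.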

## 📊 Step 6: Evaluation and Metrics

Let's evaluate our LLM-based classifier performance.

In [None]:
def evaluate_classifier(df, violation_type):
    """
    Evaluate classifier performance for a specific violation type
    """
    true_col = f'is_{violation_type}'
    pred_col = f'pred_{violation_type}'
    
    y_true = df[true_col].tolist()
    y_pred = df[pred_col].tolist()
    
    # Calculate metrics
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='binary', zero_division=0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    
    return {
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'accuracy': accuracy
    }

def plot_confusion_matrix(df, violation_type):
    """
    Plot confusion matrix for a violation type
    """
    true_col = f'is_{violation_type}'
    pred_col = f'pred_{violation_type}'
    
    y_true = df[true_col].tolist()
    y_pred = df[pred_col].tolist()
    
    # Fixing the label order keeps the matrix 2x2 even if one class is absent
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
               xticklabels=['No Violation', 'Violation'],
               yticklabels=['No Violation', 'Violation'])
    plt.title(f'Confusion Matrix - {violation_type.title()} Detection')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.show()

# Evaluate all violation types
print("📊 LLM-BASED CLASSIFIER EVALUATION")
print("=" * 40)

violation_types = ['advertisement', 'irrelevant', 'fake_rant']
results = {}

for vtype in violation_types:
    metrics = evaluate_classifier(df, vtype)
    results[vtype] = metrics
    
    print(f"\n🎯 {vtype.upper()} DETECTION:")
    print(f"  Precision: {metrics['precision']:.3f}")
    print(f"  Recall:    {metrics['recall']:.3f}")
    print(f"  F1-Score:  {metrics['f1_score']:.3f}")
    print(f"  Accuracy:  {metrics['accuracy']:.3f}")
    
    # Plot confusion matrix
    plot_confusion_matrix(df, vtype)

# Overall average F1-score
avg_f1 = np.mean([results[vtype]['f1_score'] for vtype in violation_types])
print(f"\n🏆 OVERALL AVERAGE F1-SCORE: {avg_f1:.3f}")
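
With 12 reviews and only 3 positives per violation type, a single mistake moves these metrics a lot. The arithmetic behind the scores can be sketched as follows; the counts are made up for illustration and are not results from this notebook, and `binary_metrics` is a hypothetical helper that returns 0 for undefined ratios, mirroring `zero_division=0`:

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1_score": f1, "accuracy": accuracy}

# Hypothetical outcome: 2 of 3 ads caught, 1 false alarm among 9 clean reviews
metrics = binary_metrics(tp=2, fp=1, fn=1, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```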

## 🔮 Step 7: Advanced LLM Model Options and Comparison

Let's explore different language models and compare their performance on our classification task.

In [None]:
def compare_llm_models():
    """
    Compare different language models for classification performance
    """
    print("🔮 COMPARING DIFFERENT LLM MODELS")
    print("=" * 35)
    
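    # List of models to try (ordered by computational requirements)
    # Note: ReviewClassifier loads non-BERT models with the text-generation
    # pipeline, which expects a causal LM. A seq2seq model such as FLAN-T5 may
    # fail to load there, in which case the classifier falls back to keyword rules.
    models_to_test = [
        "distilbert-base-uncased-finetuned-sst-2-english",  # Lightweight sentiment classifier
        "microsoft/DialoGPT-medium",  # Medium-sized conversational causal LM
        "google/flan-t5-small",  # Small seq2seq (text-to-text) model
    ]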
    
    model_results = {}
    
    # Test each model on a sample review
    test_reviews = [
        "Great food! Visit www.discount-deals.com for coupons!",  # Advertisement
        "I love my new phone, but this place is noisy",  # Irrelevant
        "Never been here but heard it's terrible from my friend"  # Fake rant
    ]
    
    for model_name in models_to_test:
        print(f"\n🧪 Testing {model_name}...")
        
        try:
            # Create classifier with this model
            test_classifier = ReviewClassifier(model_name=model_name)
            
            # Test on sample reviews
            results = []
            for review in test_reviews:
                prediction = test_classifier.classify_review(review)
                results.append(prediction)
            
            model_results[model_name] = {
                'status': 'success',
                'results': results
            }
            
            print(f"✅ {model_name} tested (a load error above means keyword rules were used instead)")
            
        except Exception as e:
            print(f"❌ {model_name} failed: {e}")
            model_results[model_name] = {
                'status': 'failed',
                'error': str(e)
            }
    
    # Display comparison results
    print("\n📊 MODEL COMPARISON RESULTS:")
    print("=" * 30)
    
    for model_name, result in model_results.items():
        status = "✅" if result['status'] == 'success' else "❌"
        print(f"{status} {model_name}: {result['status']}")
        
        if result['status'] == 'success':
            print("   Sample classifications:")
            for i, (review, pred) in enumerate(zip(test_reviews, result['results'])):
                violations = [k for k, v in pred.items() if v]
                violation_str = ", ".join(violations) if violations else "No violations"
                print(f"     Review {i+1}: {violation_str}")
    
    return model_results

# Run model comparison
model_comparison = compare_llm_models()

# Suggest best practices
print("\n💡 RECOMMENDATIONS:")
print("=" * 20)
print("• For production: DistilBERT is the lightest option here; benchmark T5-style models on labeled reviews before assuming they are more accurate")
print("• For experimentation: Try larger models like GPT-2 or Llama")
print("• Consider ensemble methods: Combine multiple model predictions")
print("• Fine-tuning: Train models on domain-specific review data")
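
The ensemble recommendation can be illustrated with a simple majority vote over per-model predictions. The sketch assumes every model returns the same dict shape as `classify_review` (violation type mapped to a bool); `majority_vote` is a hypothetical helper, the three outputs are invented, and ties count as no violation:

```python
from typing import Dict, List

def majority_vote(predictions: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Combine per-model predictions; a violation needs a strict majority."""
    if not predictions:
        raise ValueError("need at least one prediction")
    combined = {}
    for violation_type in predictions[0]:
        votes = sum(1 for pred in predictions if pred[violation_type])
        combined[violation_type] = votes * 2 > len(predictions)
    return combined

# Invented outputs from three models for the same review
model_outputs = [
    {"advertisement": True, "irrelevant": False, "fake_rant": False},
    {"advertisement": True, "irrelevant": True, "fake_rant": False},
    {"advertisement": False, "irrelevant": False, "fake_rant": False},
]

print(majority_vote(model_outputs))
```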

## 💾 Step 8: Save Progress and Plan Next Steps

Let's save our work and prepare for tomorrow.

In [None]:
# Save processed data for next session
df.to_csv('processed_reviews_day1.csv', index=False)
print("💾 Data saved to 'processed_reviews_day1.csv'")

# Save results summary
summary = {
    'day': 1,
    'date': '2025-08-25',
    'dataset_size': len(df),
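    # Count engineered features only: skip raw columns, ground-truth labels (is_*) and predictions (pred_*)
    'features_extracted': len([col for col in df.columns if col not in ('review_text', 'rating', 'business_name') and not col.startswith(('is_', 'pred_'))]),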
    'baseline_results': results,
    'avg_f1_score': avg_f1,
    'next_steps': [
        'Fine-tune models on domain data',
        'Implement ensemble approach',
        'Test on larger dataset',
        'Prepare for demo creation'
    ]
}

with open('day1_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("📄 Summary saved to 'day1_summary.json'")

print("\n🎉 DAY 1 COMPLETE!")
print("=" * 20)
print("✅ Environment set up")
print("✅ Data loaded and explored")
print("✅ Features extracted")
print("✅ LLM-based classifier implemented")
print("✅ Real language models integrated")
print("✅ Evaluation metrics calculated")
print(f"✅ LLM F1-score: {avg_f1:.3f}")

print("\n📅 TOMORROW'S PLAN (Day 2):")
print("🎯 Fine-tune models on domain data")
print("🎯 Implement ensemble approach")
print("🎯 Test and improve accuracy")
print("🎯 Prepare for demo creation")

print("\n🏆 Great job! You're on track to win this hackathon! 🏆")