# 🤖 TweetGuard AI Detector - Model Training

This notebook trains a DistilBERT model to detect AI-generated text, optimized for Twitter/X content.

## Overview
- **Model**: DistilBERT (40% smaller, 60% faster than BERT)
- **Task**: Binary classification (Human vs AI-generated)
- **Datasets**: HC3, with `artem9k/ai-text-detection-pile` and `aadityaubhat/GPT-wiki-intro` as fallbacks
- **Output**: TensorFlow.js model (uint8-quantized, roughly 65-70 MB for DistilBERT's ~67M parameters)

## Requirements
- Google Colab with GPU runtime (free tier works)
- ~2-4 hours training time

---

## Step 1: Setup Environment

First, install the required packages and confirm that a GPU is available.

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required packages
!pip install -q transformers datasets accelerate tensorflowjs torch scikit-learn

In [None]:
import torch
import numpy as np
import pandas as pd
from datasets import load_dataset, Dataset, DatasetDict
from transformers import (
    DistilBertTokenizer,
    DistilBertForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## Step 2: Load and Prepare Datasets

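The notebook trains on one corpus at a time, trying these sources in order until one loads:
1. **HC3** - Human ChatGPT Comparison corpus (`Hello-SimpleAI/HC3`)
2. **AI Text Detection Pile** - `artem9k/ai-text-detection-pile`
3. **GPT-wiki-intro** - `aadityaubhat/GPT-wiki-intro`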

In [None]:
print("Loading datasets...")

# Load HC3 dataset (using the updated method)
try:
    # Try loading with trust_remote_code for newer datasets library
    hc3 = load_dataset("Hello-SimpleAI/HC3", "all", trust_remote_code=True)
    print(f"✅ HC3 loaded: {len(hc3['train'])} samples")
except Exception as e1:
    try:
        # Alternative: Load without config name
        hc3 = load_dataset("Hello-SimpleAI/HC3", trust_remote_code=True)
        print(f"✅ HC3 loaded (alt method)")
    except Exception as e2:
        try:
            # Fallback: Use a different AI detection dataset
            print("⚠️ HC3 not available, trying alternative dataset...")
            hc3 = load_dataset("artem9k/ai-text-detection-pile", trust_remote_code=True)
            print(f"✅ Alternative dataset loaded")
        except Exception as e3:
            print(f"❌ Could not load datasets: {e3}")
            hc3 = None

# If every attempt above failed, fall back to the GPT-wiki-intro dataset
if hc3 is None:
    try:
        print("Trying GPT-wiki-intro dataset...")
        hc3 = load_dataset("aadityaubhat/GPT-wiki-intro", trust_remote_code=True)
        print(f"✅ GPT-wiki-intro loaded")
    except Exception as e4:
        print(f"❌ No datasets available ({e4}). Please check your internet connection.")

In [None]:
def prepare_hc3_data(dataset):
    """
    Prepare HC3 dataset for training.
    Handles multiple dataset formats.
    """
    texts = []
    labels = []
    
    # Check the structure of the dataset
    if hasattr(dataset, 'features'):
        print(f"Dataset features: {dataset.features}")
    
    for item in dataset:
        # HC3 format: 'human_answers' and 'chatgpt_answers'
        if 'human_answers' in item and item['human_answers']:
            for answer in item['human_answers']:
                if answer and len(str(answer).strip()) > 20:
                    texts.append(str(answer).strip())
                    labels.append(0)  # Human
        
        if 'chatgpt_answers' in item and item['chatgpt_answers']:
            for answer in item['chatgpt_answers']:
                if answer and len(str(answer).strip()) > 20:
                    texts.append(str(answer).strip())
                    labels.append(1)  # AI
        
        # Alternative format: 'text' and 'label'
        if 'text' in item and 'label' in item:
            text = str(item['text']).strip()
            if len(text) > 20:
                texts.append(text)
                labels.append(int(item['label']))
        
        # GPT-wiki-intro format: 'wiki_intro' (human) and 'generated_intro' (AI)
        if 'wiki_intro' in item and item['wiki_intro']:
            text = str(item['wiki_intro']).strip()
            if len(text) > 20:
                texts.append(text)
                labels.append(0)  # Human
        
        if 'generated_intro' in item and item['generated_intro']:
            text = str(item['generated_intro']).strip()
            if len(text) > 20:
                texts.append(text)
                labels.append(1)  # AI
    
    return texts, labels

# Process data
if hc3:
    # Get the training split (handle different dataset structures)
    if 'train' in hc3:
        data_split = hc3['train']
    else:
        # If no train split, use the first available split
        data_split = hc3[list(hc3.keys())[0]]
    
    texts, labels = prepare_hc3_data(data_split)
    print(f"\nProcessed data:")
    print(f"  Total samples: {len(texts)}")
    print(f"  Human samples (label=0): {labels.count(0)}")
    print(f"  AI samples (label=1): {labels.count(1)}")
else:
    print("❌ No dataset loaded. Cannot continue.")
    texts, labels = [], []
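
To make the flattening step concrete, here is a minimal sketch of how one HC3-style record expands into labeled samples. The record is made up for illustration, and `flatten_record` is a hypothetical helper that mirrors only the `human_answers` / `chatgpt_answers` branch of `prepare_hc3_data`.

```python
# Hypothetical record in the HC3 layout: one question, several answers per source.
record = {
    "question": "Why is the sky blue?",
    "human_answers": ["Sunlight scatters off air molecules, and blue light scatters the most."],
    "chatgpt_answers": [
        "The sky appears blue because of a phenomenon known as Rayleigh scattering.",
        "ok",  # not longer than 20 characters, so it is dropped
    ],
}

MIN_CHARS = 20  # same threshold as prepare_hc3_data (text must be longer than this)

def flatten_record(item):
    """Return (text, label) pairs: 0 for human answers, 1 for ChatGPT answers."""
    pairs = []
    for key, label in (("human_answers", 0), ("chatgpt_answers", 1)):
        for answer in item.get(key) or []:
            text = str(answer).strip()
            if len(text) > MIN_CHARS:
                pairs.append((text, label))
    return pairs

pairs = flatten_record(record)
for text, label in pairs:
    print(label, text[:50])
```

One question can therefore contribute several samples, and the two classes need not come out equal in size, which is why a balancing step follows later.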

In [None]:
# Filter and truncate texts to tweet-like lengths
def filter_for_twitter(texts, labels, max_chars=500, min_chars=20):
    """
    Filter texts to be more Twitter-like (shorter texts).
    Also truncate longer texts to simulate tweet-length content.
    """
    filtered_texts = []
    filtered_labels = []
    
    for text, label in zip(texts, labels):
        # Clean text
        text = text.strip()
        
        # Skip very short texts
        if len(text) < min_chars:
            continue
        
        # For longer texts, take first portion (like a tweet thread opener)
        if len(text) > max_chars:
            # Find a good breaking point
            text = text[:max_chars]
            last_period = text.rfind('.')
            if last_period > max_chars // 2:
                text = text[:last_period + 1]
        
        filtered_texts.append(text)
        filtered_labels.append(label)
    
    return filtered_texts, filtered_labels

# Filter for Twitter-like content
texts, labels = filter_for_twitter(texts, labels)
print(f"\nAfter Twitter-like filtering:")
print(f"  Total samples: {len(texts)}")
print(f"  Human samples: {labels.count(0)}")
print(f"  AI samples: {labels.count(1)}")
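
The truncation rule is easier to see on a small input. The sketch below isolates it in a hypothetical helper, `truncate_like_tweet`, and uses a much smaller `max_chars` than the notebook's 500 so the effect is visible on a short string.

```python
def truncate_like_tweet(text, max_chars=500):
    """Cut text to max_chars, then back up to the last period if it lies in the second half."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_period = cut.rfind(".")
    if last_period > max_chars // 2:
        cut = cut[: last_period + 1]
    return cut

sample = "First sentence here. Second sentence follows. Third one is cut off midway"
short = truncate_like_tweet(sample, max_chars=50)
print(repr(short))
```

If the last period falls in the first half of the window, the hard cut at `max_chars` is kept instead, so a sample can still end mid-word.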

In [None]:
# Balance the dataset
import random

def balance_dataset(texts, labels, max_per_class=15000):
    """
    Balance dataset to have equal human and AI samples.
    """
    human_texts = [(t, l) for t, l in zip(texts, labels) if l == 0]
    ai_texts = [(t, l) for t, l in zip(texts, labels) if l == 1]
    
    # Shuffle
    random.seed(42)
    random.shuffle(human_texts)
    random.shuffle(ai_texts)
    
    # Balance
    min_samples = min(len(human_texts), len(ai_texts), max_per_class)
    
    balanced = human_texts[:min_samples] + ai_texts[:min_samples]
    random.shuffle(balanced)
    
    return [t for t, l in balanced], [l for t, l in balanced]

texts, labels = balance_dataset(texts, labels)
print(f"\nBalanced dataset:")
print(f"  Total samples: {len(texts)}")
print(f"  Human samples: {labels.count(0)}")
print(f"  AI samples: {labels.count(1)}")
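
As a sanity check on the balancing logic, the following sketch runs the same downsampling idea on toy data. The names `toy_texts`, `toy_labels`, and `downsample_to_balance` are illustrative only, and it uses a local `random.Random` instance rather than the global seed used above.

```python
import random
from collections import Counter

# Toy data: 7 human (0) and 3 AI (1) samples; the texts are placeholders.
toy_texts = [f"sample {i}" for i in range(10)]
toy_labels = [0] * 7 + [1] * 3

def downsample_to_balance(texts, labels, max_per_class=15000, seed=42):
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    by_class = {0: [], 1: []}
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    # The smaller class (or the cap) decides how many samples each class keeps.
    n = min(len(by_class[0]), len(by_class[1]), max_per_class)
    balanced = []
    for label, items in by_class.items():
        rng.shuffle(items)
        balanced += [(text, label) for text in items[:n]]
    rng.shuffle(balanced)
    return balanced

balanced = downsample_to_balance(toy_texts, toy_labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

Downsampling discards data from the larger class, so the final dataset can be much smaller than the filtered one when the source is heavily skewed.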

In [None]:
# Split into train/val/test
from sklearn.model_selection import train_test_split

# First split: 80% train, 20% temp
train_texts, temp_texts, train_labels, temp_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Second split: 50% val, 50% test from temp
val_texts, test_texts, val_labels, test_labels = train_test_split(
    temp_texts, temp_labels, test_size=0.5, random_state=42, stratify=temp_labels
)

print(f"Dataset splits:")
print(f"  Train: {len(train_texts)} samples")
print(f"  Validation: {len(val_texts)} samples")
print(f"  Test: {len(test_texts)} samples")
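
The two chained splits give an 80/10/10 partition. The sketch below reproduces that arithmetic, assuming a float `test_size` rounds the held-out count up and using 30,000 samples (the maximum after balancing at 15,000 per class) as an example. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, first_test_size=0.2, second_test_size=0.5):
    """Approximate sizes produced by two chained train_test_split calls."""
    n_temp = math.ceil(n_samples * first_test_size)   # held out by the first split
    n_train = n_samples - n_temp
    n_test = math.ceil(n_temp * second_test_size)     # second split halves the held-out part
    n_val = n_temp - n_test
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(30_000)
print(f"train={n_train}, val={n_val}, test={n_test}")
```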

## Step 3: Tokenization

Load DistilBERT tokenizer and prepare data for training.

In [None]:
# Load tokenizer
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)

print(f"Tokenizer loaded: {model_name}")
print(f"Vocab size: {tokenizer.vocab_size}")

In [None]:
# Create HuggingFace datasets
train_dataset = Dataset.from_dict({
    'text': train_texts,
    'label': train_labels
})

val_dataset = Dataset.from_dict({
    'text': val_texts,
    'label': val_labels
})

test_dataset = Dataset.from_dict({
    'text': test_texts,
    'label': test_labels
})

# Tokenize function
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=128  # Shorter for Twitter-like content
    )

# Tokenize datasets
print("Tokenizing datasets...")
train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)

# Set format for PyTorch
train_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
val_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
test_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

print("✅ Tokenization complete!")
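
To illustrate what `padding='max_length'` and `truncation=True` do to each example, here is a simplified sketch that works on already-tokenized IDs. `encode_fixed_length` is a hypothetical helper and the word-piece IDs are arbitrary; only the special-token IDs (0, 101, 102) match the BERT uncased vocabulary.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # [PAD], [CLS], [SEP] in the BERT uncased vocabulary

def encode_fixed_length(token_ids, max_length=128):
    """Add [CLS]/[SEP], truncate to max_length, then right-pad with [PAD]."""
    body = token_ids[: max_length - 2]           # leave room for the two special tokens
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)        # 1 = real token
    padding = max_length - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding  # 0 = padding

# Arbitrary word-piece IDs standing in for a short tweet.
ids, mask = encode_fixed_length([7592, 2088, 999], max_length=8)
print(ids)
print(mask)
```

Every example ends up with exactly `max_length` positions, and the attention mask tells the model which of them to ignore.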

## Step 4: Model Setup

Load DistilBERT and configure for binary classification.

In [None]:
# Load model
model = DistilBertForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label={0: "human", 1: "ai"},
    label2id={"human": 0, "ai": 1}
)

# Move to GPU if available
model = model.to(device)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Model loaded: {model_name}")
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Estimated model size: {total_params * 4 / 1e6:.1f} MB (FP32)")
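
The size estimate is just parameter count times bytes per parameter, which also shows what to expect from quantization later on. The sketch below uses 67 million as an approximate parameter count for DistilBERT with a two-class head; substitute the "Total parameters" value printed by the cell above for an exact figure.

```python
# Approximate parameter count (assumption); replace with the printed total for exact numbers.
approx_params = 67_000_000

bytes_per_param = {"float32": 4, "float16": 2, "uint8": 1}
size_mb = {name: approx_params * nbytes / 1e6 for name, nbytes in bytes_per_param.items()}

for name, mb in size_mb.items():
    print(f"{name}: ~{mb:.0f} MB")
```

These figures ignore file-format overhead and the tokenizer files, so the sizes on disk will differ slightly.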

## Step 5: Training Configuration

In [None]:
# Define metrics
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='binary'
    )
    accuracy = accuracy_score(labels, predictions)
    
    return {
        'accuracy': accuracy,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=100,
    eval_strategy='steps',
    eval_steps=500,
    save_strategy='steps',
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model='f1',
    greater_is_better=True,
    fp16=torch.cuda.is_available(),  # Use mixed precision on GPU
    dataloader_num_workers=2,
    report_to='none'  # Disable wandb/tensorboard
)

print("Training configuration:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Mixed precision (FP16): {training_args.fp16}")
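
To see how `warmup_steps`, `eval_steps`, and the batch size relate to the length of a run, the sketch below works out the schedule. It assumes 24,000 training samples, a single GPU, and no gradient accumulation; `schedule_summary` is a hypothetical helper.

```python
import math

def schedule_summary(n_train, batch_size=16, epochs=3, eval_steps=500, warmup_steps=500):
    steps_per_epoch = math.ceil(n_train / batch_size)   # a final partial batch still counts
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "evaluations": total_steps // eval_steps,        # one evaluation every eval_steps
        "warmup_fraction": warmup_steps / total_steps,
    }

summary = schedule_summary(n_train=24_000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With a much smaller dataset, 500 warmup steps can cover a large share of training, so it may be worth scaling `warmup_steps` with the total step count.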

In [None]:
# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
)

print("✅ Trainer initialized!")
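
With `early_stopping_patience=3`, training stops once the monitored metric (F1 here) has failed to improve for three consecutive evaluations. The following is a simplified model of that rule, not the callback's actual code; the F1 history is made up.

```python
def evaluations_until_stop(metric_history, patience=3):
    """Return the 1-based evaluation index at which training would stop, or None."""
    best = float("-inf")
    stale = 0                      # consecutive evaluations without improvement
    for i, value in enumerate(metric_history, start=1):
        if value > best:           # a tie does not count as an improvement
            best = value
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return i
    return None

f1_history = [0.70, 0.78, 0.80, 0.79, 0.80, 0.795]   # hypothetical validation F1 scores
stop_at = evaluations_until_stop(f1_history)
print(stop_at)
```

Because `load_best_model_at_end=True` is set, the weights restored after stopping are those from the best evaluation rather than the last one.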

## Step 6: Train the Model

⏱️ This will take approximately 2-4 hours on a free Colab GPU.

In [None]:
# Train!
print("🚀 Starting training...")
print("This may take 2-4 hours on a free GPU.")
print("-" * 50)

train_result = trainer.train()

print("-" * 50)
print("✅ Training complete!")
print(f"Training time: {train_result.metrics['train_runtime'] / 60:.1f} minutes")

## Step 7: Evaluate Model

In [None]:
# Evaluate on test set
print("Evaluating on test set...")
test_results = trainer.evaluate(test_dataset)

print("\n" + "="*50)
print("TEST RESULTS")
print("="*50)
print(f"Accuracy:  {test_results['eval_accuracy']*100:.2f}%")
print(f"F1 Score:  {test_results['eval_f1']*100:.2f}%")
print(f"Precision: {test_results['eval_precision']*100:.2f}%")
print(f"Recall:    {test_results['eval_recall']*100:.2f}%")
print("="*50)

In [None]:
# Confusion matrix
predictions = trainer.predict(test_dataset)
preds = np.argmax(predictions.predictions, axis=1)
true_labels = predictions.label_ids

cm = confusion_matrix(true_labels, preds)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Human', 'AI'],
            yticklabels=['Human', 'AI'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

print(f"\nConfusion Matrix Analysis:")
print(f"  True Negatives (Human→Human): {cm[0,0]}")
print(f"  False Positives (Human→AI): {cm[0,1]}")
print(f"  False Negatives (AI→Human): {cm[1,0]}")
print(f"  True Positives (AI→AI): {cm[1,1]}")
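
The four confusion-matrix cells determine every metric reported above. The sketch below derives them by hand from made-up counts; `metrics_from_confusion` is a hypothetical helper, and "positive" means the AI class (label 1), matching `average='binary'`.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Share of human-written texts that get flagged as AI.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Made-up counts in the order cm[0,0], cm[0,1], cm[1,0], cm[1,1].
m = metrics_from_confusion(tn=1200, fp=300, fn=150, tp=1350)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

For a detector that labels people's tweets, the false positive rate is often the number to watch, since it measures how often human authors are wrongly flagged.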

## Step 8: Test with Sample Texts

In [None]:
def predict_text(text):
    """Predict if text is human or AI generated."""
    inputs = tokenizer(
        text,
        return_tensors='pt',
        truncation=True,
        max_length=128,
        padding=True
    ).to(device)
    
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)
    
    human_prob = probs[0][0].item()
    ai_prob = probs[0][1].item()
    
    return {
        'text': text[:100] + '...' if len(text) > 100 else text,
        'human_prob': human_prob,
        'ai_prob': ai_prob,
        'prediction': 'AI' if ai_prob > 0.5 else 'Human',
        'confidence': max(human_prob, ai_prob)
    }

# Test samples
test_samples = [
    "Just had the best coffee at that new place downtown! Highly recommend their oat milk latte 🥤",
    "The implementation of machine learning algorithms in modern applications has revolutionized the way we process and analyze data, enabling unprecedented levels of efficiency and accuracy.",
    "lol cant believe what happened today 😂 my dog literally ate my homework no joke",
    "In conclusion, the systematic analysis of the aforementioned factors demonstrates a clear correlation between the variables, suggesting that further research is warranted to fully understand the implications.",
    "anyone else tired of these AI takes? like just let people enjoy things"
]

print("\n" + "="*60)
print("SAMPLE PREDICTIONS")
print("="*60)

for sample in test_samples:
    result = predict_text(sample)
    print(f"\nText: {result['text']}")
    print(f"Prediction: {result['prediction']} ({result['confidence']*100:.1f}% confidence)")
    print(f"Scores: Human={result['human_prob']*100:.1f}%, AI={result['ai_prob']*100:.1f}%")
    print("-"*60)
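
`predict_text` turns two logits into probabilities with softmax and labels a text as AI when that probability exceeds 0.5. The sketch below reproduces the calculation in plain Python and shows how a stricter threshold changes the label; the logits and the `ai_threshold` parameter are illustrative assumptions.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, ai_threshold=0.5):
    """Logits are in [human, ai] order, matching id2label in the model config."""
    human_prob, ai_prob = softmax(logits)
    return ("AI" if ai_prob > ai_threshold else "Human"), ai_prob

label_default, ai_prob = classify([0.2, 1.0])            # made-up logits
label_strict, _ = classify([0.2, 1.0], ai_threshold=0.8)
print(label_default, label_strict, round(ai_prob, 3))
```

Raising the threshold trades recall for fewer false accusations, which may suit a browser extension that annotates other people's posts.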

## Step 9: Save PyTorch Model

In [None]:
# Save the model
save_path = './tweetguard_model'

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print(f"✅ Model saved to {save_path}")

# Check size
import os
total_size = 0
for f in os.listdir(save_path):
    size = os.path.getsize(os.path.join(save_path, f))
    total_size += size
    print(f"  {f}: {size/1e6:.2f} MB")
print(f"\nTotal size: {total_size/1e6:.2f} MB")

## Step 10: Convert to TensorFlow.js

Convert the model for browser deployment.

In [None]:
# First convert to TensorFlow SavedModel format
from transformers import TFDistilBertForSequenceClassification

print("Converting to TensorFlow format...")

# Load the PyTorch model into TensorFlow
tf_model = TFDistilBertForSequenceClassification.from_pretrained(
    save_path,
    from_pt=True
)

# Save as TensorFlow SavedModel
tf_save_path = './tweetguard_tf_model'
tf_model.save_pretrained(tf_save_path, saved_model=True)

print(f"✅ TensorFlow model saved to {tf_save_path}")

In [None]:
# Convert to TensorFlow.js
import subprocess

tfjs_output_path = './tweetguard_tfjs_model'

# Find the saved_model directory
saved_model_path = f"{tf_save_path}/saved_model/1"

print("Converting to TensorFlow.js format...")
print("This may take a few minutes...")

# Convert with quantization for smaller size
result = subprocess.run([
    'tensorflowjs_converter',
    '--input_format=tf_saved_model',
    '--output_format=tfjs_graph_model',
    '--quantize_uint8',  # Quantize to reduce size
    '--skip_op_check',
    saved_model_path,
    tfjs_output_path
], capture_output=True, text=True)

if result.returncode == 0:
    print("✅ TensorFlow.js conversion successful!")
else:
    print(f"⚠️ Conversion output: {result.stdout}")
    print(f"⚠️ Conversion errors: {result.stderr}")
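
Quantizing to uint8 stores each weight in one byte instead of four, at the cost of a small rounding error. The sketch below illustrates the general idea with a per-tensor minimum and scale; it is a simplified illustration with made-up weights, not the converter's actual implementation.

```python
def quantize_uint8(weights):
    """Map floats onto the integers 0..255 using a per-tensor minimum and scale."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255 or 1.0   # fall back to 1.0 for constant tensors
    q = [round((w - w_min) / scale) for w in weights]
    return q, w_min, scale

def dequantize(q, w_min, scale):
    """Recover approximate float weights from the stored integers."""
    return [w_min + v * scale for v in q]

weights = [-0.51, -0.10, 0.0, 0.27, 0.51]   # made-up values
q, w_min, scale = quantize_uint8(weights)
restored = dequantize(q, w_min, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"max error: {max_error:.5f} (scale {scale:.5f})")
```

The reconstruction error per weight is bounded by half the scale, so tensors with a wide value range lose more precision. It is worth re-checking accuracy on the test set after conversion.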

In [None]:
# Check TensorFlow.js model size
import os

if os.path.exists(tfjs_output_path):
    total_size = 0
    print(f"\nTensorFlow.js model files:")
    for f in os.listdir(tfjs_output_path):
        filepath = os.path.join(tfjs_output_path, f)
        size = os.path.getsize(filepath)
        total_size += size
        print(f"  {f}: {size/1e6:.2f} MB")
    
    print(f"\n📦 Total TensorFlow.js model size: {total_size/1e6:.2f} MB")
    
    if total_size/1e6 < 50:
        print("✅ Model is suitable for browser deployment!")
    else:
        print("⚠️ Model may be too large for optimal browser performance.")

## Step 11: Export Tokenizer for JavaScript

In [None]:
import json
import os

# Make sure the output directory exists even if the conversion step failed
os.makedirs(tfjs_output_path, exist_ok=True)

# Export vocab for JavaScript tokenizer
vocab = tokenizer.get_vocab()

# Save vocab
vocab_path = f"{tfjs_output_path}/vocab.json"
with open(vocab_path, 'w') as f:
    json.dump(vocab, f)

print(f"✅ Vocabulary exported to {vocab_path}")
print(f"   Vocabulary size: {len(vocab)} tokens")

# Export tokenizer config
tokenizer_config = {
    'max_length': 128,
    'pad_token_id': tokenizer.pad_token_id,
    'cls_token_id': tokenizer.cls_token_id,
    'sep_token_id': tokenizer.sep_token_id,
    'unk_token_id': tokenizer.unk_token_id,
    'do_lower_case': True
}

config_path = f"{tfjs_output_path}/tokenizer_config.json"
with open(config_path, 'w') as f:
    json.dump(tokenizer_config, f, indent=2)

print(f"✅ Tokenizer config exported to {config_path}")
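
The exported `vocab.json` is only a lookup table, so the browser side still has to split words into word pieces itself. The sketch below shows the greedy longest-match-first WordPiece rule that such a tokenizer would need to follow. It is written in Python for consistency with the rest of the notebook, uses a toy vocabulary, and omits lowercasing and punctuation splitting.

```python
def wordpiece(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate   # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1                             # shrink the candidate from the right
        if match is None:
            return [unk_token]                   # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

toy_vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "tweet": 4, "##s": 5}
print(wordpiece("unaffable", toy_vocab))
print(wordpiece("tweets", toy_vocab))
print(wordpiece("xyz", toy_vocab))
```

The resulting pieces are then mapped to IDs through the vocabulary and wrapped with the `cls_token_id` and `sep_token_id` values from `tokenizer_config.json`.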

## Step 12: Download Model Files

Download the TensorFlow.js model files to use in the Chrome extension.

In [None]:
# Zip the model for download
import shutil

zip_path = './tweetguard_tfjs_model.zip'
shutil.make_archive(
    zip_path.replace('.zip', ''),
    'zip',
    tfjs_output_path
)

print(f"✅ Model zipped to {zip_path}")

# Download link (for Colab)
try:
    from google.colab import files
    print("\n📥 Downloading model...")
    files.download(zip_path)
except Exception:
    # Not running in Colab, or the browser download failed
    print(f"\n📁 Model ready at: {zip_path}")
    print("Download it manually or use the Files panel on the left.")

## 🎉 Training Complete!

### Next Steps:

1. **Download** the `tweetguard_tfjs_model.zip` file
2. **Extract** it to your Chrome extension's `model/` directory
3. **Update** your extension's `detector.js` to load the TensorFlow.js model

### Model Files You'll Need:
- `model.json` - Model architecture
- `group*.bin` - Model weights (sharded)
- `vocab.json` - Tokenizer vocabulary
- `tokenizer_config.json` - Tokenizer settings

### Expected Performance:
- **Accuracy**: ~75-82%
- **Model Size**: roughly 65-70 MB with uint8 quantization (about one byte per parameter)
- **Inference Time**: ~10-30ms per tweet

In [None]:
# Final summary
print("\n" + "="*60)
print("📊 TRAINING SUMMARY")
print("="*60)
print(f"Model: DistilBERT-base-uncased")
print(f"Task: Binary Classification (Human vs AI)")
print(f"Training samples: {len(train_texts)}")
print(f"Test Accuracy: {test_results['eval_accuracy']*100:.2f}%")
print(f"Test F1 Score: {test_results['eval_f1']*100:.2f}%")
print(f"")
print(f"Output files:")
print(f"  - PyTorch model: {save_path}")
print(f"  - TensorFlow model: {tf_save_path}")
print(f"  - TensorFlow.js model: {tfjs_output_path}")
print(f"  - Download: {zip_path}")
print("="*60)
print("\n✅ Ready for Chrome extension integration!")