# 🧠 AI Classifier - Annotating Human vs. AI Content
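This notebook builds a simple AI-text detector: it embeds each text with Sentence Transformers, then trains a Logistic Regression classifier on the embeddings to separate human-written from AI-generated text.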

In [None]:
!pip install -U sentence-transformers scikit-learn

In [1]:
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt
import numpy as np

In [2]:
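# Load sample human-written texts: the first 200 posts of the 20 Newsgroups training set
human_texts = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes')).data[:200]

# Simulate AI-generated texts.
# NOTE: this placeholder repeats one sentence 200 times, so the classifier can only
# learn to recognize that sentence. Swap in varied, real model outputs for a meaningful test.
ai_texts = ["Artificial intelligence is transforming industries through automation and prediction." for _ in range(200)]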

# Combine and label
texts = human_texts + ai_texts
labels = [0]*len(human_texts) + [1]*len(ai_texts)  # 0 = Human, 1 = AI
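
Before embedding, it can help to sanity-check the combined corpus. Stripping headers, footers, and quotes can leave some newsgroup posts empty, and a class built from one repeated sentence contains only one distinct text. The sketch below is one possible check and is not part of the original notebook: `summarize_corpus` and the toy `demo_texts` are hypothetical names, and in the notebook you would pass `texts` and `labels` instead.

```python
from collections import defaultdict


def summarize_corpus(texts, labels):
    """Return per-label counts of total, empty, and distinct non-empty texts."""
    stats = defaultdict(lambda: {"total": 0, "empty": 0, "distinct": set()})
    for text, label in zip(texts, labels):
        entry = stats[label]
        entry["total"] += 1
        cleaned = text.strip()
        if cleaned:
            entry["distinct"].add(cleaned)
        else:
            entry["empty"] += 1
    # Replace each set of distinct texts with its size so the result is easy to print.
    return {
        label: {
            "total": entry["total"],
            "empty": entry["empty"],
            "distinct": len(entry["distinct"]),
        }
        for label, entry in stats.items()
    }


# Toy stand-ins for the notebook's `texts` and `labels` (assumed data, not real posts).
demo_texts = ["First post.", "", "Second post.",
              "Same sentence.", "Same sentence.", "Same sentence."]
demo_labels = [0, 0, 0, 1, 1, 1]

summary = summarize_corpus(demo_texts, demo_labels)
for label, counts in sorted(summary.items()):
    print(label, counts)
```

A class whose `distinct` count is far below its `total` gives the classifier very little to learn from, whatever accuracy it later reports.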

In [3]:
# Load sentence transformer model and generate embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(texts, show_progress_bar=True)

In [4]:
# Stratified train-test split (keeps the Human/AI ratio equal in both sets), then train logistic regression
X_train, X_test, y_train, y_test = train_test_split(embeddings, labels, test_size=0.2, random_state=42, stratify=labels)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
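
A random split only gives an honest test score if the test texts are not also in the training set. The sketch below is a hypothetical helper (`leaked_fraction` is not part of the notebook) that measures how many test texts occur verbatim in the training texts. To use it here, one option is to pass `texts` as an extra array to `train_test_split`, which then returns a matching train/test pair for it as well.

```python
def leaked_fraction(train_texts, test_texts):
    """Fraction of test texts that also appear verbatim in the training texts."""
    if not test_texts:
        return 0.0
    seen = {text.strip() for text in train_texts}
    leaked = sum(1 for text in test_texts if text.strip() in seen)
    return leaked / len(test_texts)


# Toy stand-ins for the text side of a train/test split (assumed data).
demo_train = ["Post A", "Post B", "Same sentence.", "Same sentence."]
demo_test = ["Post C", "Same sentence."]

fraction = leaked_fraction(demo_train, demo_test)
print(f"{fraction:.0%} of test texts also occur in the training set")
```

A high fraction means the test score mostly measures memorization of repeated texts rather than generalization.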

In [None]:
# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=["Human", "AI"]))
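
`classification_report` prints per-class precision, recall, and F1. As a rough illustration of where the precision and recall figures for the "AI" class come from, the sketch below derives them from raw confusion counts. The helper names and the toy label lists are hypothetical, and returning `0.0` when a denominator is zero is a choice made here for simplicity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the `positive` class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1
        elif truth == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts):
    """Precision = tp / predicted positives; recall = tp / actual positives."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Toy labels standing in for the notebook's `y_test` and `y_pred` (0 = Human, 1 = AI).
demo_true = [0, 0, 0, 1, 1, 1, 1, 0]
demo_pred = [0, 1, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(demo_true, demo_pred)
precision, recall = precision_recall(counts)
print(counts)
print(f"AI precision: {precision:.2f}, AI recall: {recall:.2f}")
```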

In [6]:
# Try your own text
your_text = "Data science enables businesses to make better decisions."
your_embedding = model.encode([your_text])
prediction = clf.predict(your_embedding)[0]
print("Prediction:", "AI-Generated" if prediction == 1 else "Human-Written")

Prediction: Human-Written
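
A hard label hides how confident the classifier is. One possible refinement, sketched below, is to read the "AI" probability from `clf.predict_proba` and report "Uncertain" when it falls between two cut-offs. The function name `label_with_margin` and the thresholds `0.3` and `0.7` are assumptions chosen for illustration, not tuned values.

```python
def label_with_margin(p_ai, low=0.3, high=0.7):
    """Map the predicted probability of the AI class to a three-way label."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability between 0 and 1")
    if not low < high:
        raise ValueError("low must be smaller than high")
    if p_ai >= high:
        return "AI-Generated"
    if p_ai <= low:
        return "Human-Written"
    return "Uncertain"


# In the notebook, the probability could be obtained along these lines:
#   ai_column = list(clf.classes_).index(1)
#   p_ai = clf.predict_proba(your_embedding)[0][ai_column]
for p_ai in (0.05, 0.5, 0.92):
    print(f"p(AI) = {p_ai:.2f} -> {label_with_margin(p_ai)}")
```

Texts labeled "Uncertain" are natural candidates for manual review.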



# ⚠️ **CAUTION: Heuristic AI Content Classifier**

This method is **simple**, **interpretable**, and useful for **quick experimentation**, but keep its **limitations** in mind:

- ❌ **Not trained on real attribution data**: the "AI" class here is one placeholder sentence repeated 200 times, so any accuracy measured in this notebook says little about real AI-generated text  
- 🎯 **Not reliable at scale**: should not be used for high-stakes decisions  
- 🔁 **Vulnerable to paraphrasing or rewording tricks**  
- 🌍 **Language- and domain-specific**: the human examples are English newsgroup posts, so performance can vary across languages or domains

---

## ✅ **When It’s Useful**

- Quick **proof-of-concept** development  
- Demonstrating basic **AI vs. Human text differences**  
- **Educational use** for understanding attribution logic  
- Projects where **speed and transparency** matter more than precision  
- Serving as a **fallback or sanity check** alongside more advanced models

---

## ✅ **Suggested Best Practices**

- Use it as a **lightweight baseline**, not as a definitive judgment tool  
- Regularly **test against known edge cases** like paraphrased or translated inputs  
- Be cautious in **non-English** or domain-specific use cases — results may degrade  
- Always follow up with **human-in-the-loop review** before taking action
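
The edge-case testing suggested above can be sketched as a small harness that runs a predictor over hand-written cases and collects the mismatches for human review. Everything here is hypothetical: `run_edge_cases`, the `stub_predict` stand-in, and the three cases are illustrations. In the notebook, the predictor could be something like `lambda t: int(clf.predict(model.encode([t]))[0])`.

```python
def run_edge_cases(predict, cases):
    """Run `predict` on each (text, expected, note) case and collect mismatches."""
    failures = []
    for text, expected, note in cases:
        got = predict(text)
        if got != expected:
            failures.append(
                {"note": note, "expected": expected, "got": got, "text": text}
            )
    return failures


def stub_predict(text):
    """Hypothetical stand-in: flags only one exact sentence as AI (label 1)."""
    return 1 if text == "Same sentence." else 0


# Each case: (text, expected label, short note); 0 = Human, 1 = AI.
edge_cases = [
    ("Same sentence.", 1, "verbatim AI text"),
    ("The same sentence, reworded.", 1, "paraphrased AI text"),
    ("A casual forum reply.", 0, "human text"),
]

failures = run_edge_cases(stub_predict, edge_cases)
for failure in failures:
    print(
        f"NEEDS REVIEW [{failure['note']}]: "
        f"expected {failure['expected']}, got {failure['got']}"
    )
```

Keeping the cases in a plain list makes it easy to add paraphrased or translated inputs as new failure modes are found.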

