# TCS Embedding Adaptor Learning Experiment

## Goal: Learn Domain-Specific Embedding Adaptation

This experiment explores whether a simple transformation matrix can improve retrieval for TCS-specific queries.

**Key Learning Question**: Does adapting query embeddings help with domain-specific language?

## The Concept
- **Problem**: General embeddings may not capture TCS-specific relationships
- **Solution**: Learn a matrix that transforms queries: `adapted_query = matrix × original_query`
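- **Learning**: Understand lightweight adaptation, where only a small matrix is trained and the embedding model stays frozen (similar in spirit to LoRA, although the matrix here is full-rank)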
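
To make the idea concrete, here is a minimal sketch in plain Python; it is not the notebook's training code. It uses a toy 3-dimensional vector as a stand-in for a real query embedding, and `apply_matrix` is a hypothetical helper. The rank `r = 8` in the parameter count is an arbitrary illustrative value, not something this experiment uses.

```python
def apply_matrix(matrix, vector):
    """Return matrix × vector, with the matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Toy 3-dimensional "embedding"; the real ones here have 384 dimensions.
original_query = [0.2, -0.5, 0.8]

# The identity matrix leaves the query unchanged, so an adaptor that
# starts at identity reproduces baseline retrieval before any training.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adapted_query = apply_matrix(identity, original_query)

# Parameter counts at the real size: a full matrix versus a LoRA-style
# low-rank pair (dim x r and r x dim) with a hypothetical rank r.
dim, r = 384, 8
full_params = dim * dim
low_rank_params = 2 * dim * r
print(adapted_query, full_params, low_rank_params)
```

The full matrix is already small next to the embedding model itself, which is why this experiment can train all of its entries directly instead of using a low-rank factorization.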

## Experiment Flow
1. Generate 20 training + 20 test questions
2. Create training data with GPT-4.1 relevance labeling
3. Train simple 384×384 transformation matrix
4. Compare Simple RAG vs Adapted RAG
5. Analyze what we learned about domain adaptation

In [1]:
# Setup and Imports
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction
import os
from dotenv import load_dotenv
from openai import OpenAI
import pandas as pd
import time
import json
from sklearn.metrics.pairwise import cosine_similarity

# Load environment
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("✅ All imports successful!")
print("🎯 Ready to start embedding adaptation experiment")

✅ All imports successful!
🎯 Ready to start embedding adaptation experiment


In [2]:
# Connect to Existing ChromaDB Collection
CHROMA_DB_PATH = "./chroma_db"
COLLECTION_NAME = "tcs_annual_report_2024"

print(f"🗃️  Connecting to ChromaDB at {CHROMA_DB_PATH}...")

# Initialize ChromaDB client
chroma_client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
embedding_function = SentenceTransformerEmbeddingFunction()

# Get existing collection
chroma_collection = chroma_client.get_collection(
    COLLECTION_NAME,
    embedding_function=embedding_function
)

# Verify collection
count = chroma_collection.count()
print(f"✅ Connected to collection: {COLLECTION_NAME}")
print(f"📊 Total document chunks: {count:,}")

# Initialize SentenceTransformer model (all-MiniLM-L6-v2 is also the default for SentenceTransformerEmbeddingFunction)
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print(f"🤖 Embedding model loaded: {embedding_model.get_sentence_embedding_dimension()} dimensions")

print("\n🚀 Ready for embedding adaptation experiment!")

🗃️  Connecting to ChromaDB at ./chroma_db...
✅ Connected to collection: tcs_annual_report_2024
📊 Total document chunks: 1,324
🤖 Embedding model loaded: 384 dimensions

🚀 Ready for embedding adaptation experiment!


# Step 1: Generate Training and Test Questions

We'll create 40 questions total:
- 20 training questions focusing on TCS-specific terminology
- 20 test questions for comparison

Focus on terms that general embeddings might struggle with.

In [3]:
# Training Questions (20) - TCS-Specific Terms and Concepts
training_questions = [
    "What are the key features of TCS WisdomNext platform?",
    "How does TCS Pace Port accelerate innovation for clients?",
    "What is the role of TCS Co-innovation Network in client transformation?",
    "How does TCS BaNCS support banking operations?",
    "What capabilities does TCS Crystallus offer for telecommunications?",
    "How does TCS OptumeraTM enhance retail operations?",
    "What is the purpose of TCS OmniStore platform?",
    "How does TCS ignioTM transform IT operations?",
    "What role does TCS Clever Energy play in sustainability?",
    "How does TCS MasterCraft support software engineering?",
    "What is the function of TCS TwinX in digital transformation?",
    "How does TCS InTwin enable simulable enterprises?",
    "What services does TCS iON provide for learning ecosystems?",
    "How does TCS ADD platform support clinical trials?",
    "What is the impact of TCS CodeVita programming contest?",
    "How does TCS Bringing Life to Things Lab drive IoT innovation?",
    "What is the significance of TCS Quantum Diamond Microchip collaboration?",
    "How does TCS support the National Quantum Mission initiative?",
    "What role does TCS Pace Studio play in regional innovation?",
    "How does TCS All-women Digital Services Center empower talent?"
]

print(f"📋 Generated {len(training_questions)} training questions")
print("\nSample training questions:")
for i, q in enumerate(training_questions[:3], 1):
    print(f"{i}. {q}")
print("...")

print(f"\n✅ Training questions ready: {len(training_questions)} TCS-specific questions")

📋 Generated 20 training questions

Sample training questions:
1. What are the key features of TCS WisdomNext platform?
2. How does TCS Pace Port accelerate innovation for clients?
3. What is the role of TCS Co-innovation Network in client transformation?
...

✅ Training questions ready: 20 TCS-specific questions


In [4]:
# Test Questions (20) - Different TCS Topics for Comparison
test_questions = [
    "What are TCS's main research and development priorities?",
    "How does TCS approach digital transformation for manufacturing clients?",
    "What is TCS's strategy for artificial intelligence implementation?",
    "How does TCS support clients in cloud migration initiatives?",
    "What role does TCS play in cybersecurity solutions?",
    "How does TCS contribute to sustainability and green energy initiatives?",
    "What is TCS's approach to talent development and reskilling?",
    "How does TCS support government digital transformation projects?",
    "What are TCS's key partnerships with technology vendors?",
    "How does TCS approach innovation in financial services?",
    "What is TCS's methodology for enterprise modernization?",
    "How does TCS support healthcare and life sciences digitization?",
    "What are TCS's capabilities in data analytics and insights?",
    "How does TCS approach automation in business operations?",
    "What role does TCS play in telecommunications modernization?",
    "How does TCS support retail and consumer business transformation?",
    "What is TCS's approach to regulatory compliance solutions?",
    "How does TCS enable supply chain optimization for clients?",
    "What are TCS's key differentiators in the IT services market?",
    "How does TCS measure and ensure client satisfaction?"
]

print(f"📋 Generated {len(test_questions)} test questions")
print("\nSample test questions:")
for i, q in enumerate(test_questions[:3], 1):
    print(f"{i}. {q}")
print("...")

print(f"\n✅ Test questions ready: {len(test_questions)} questions for comparison")
print(f"\n🎯 Total questions generated: {len(training_questions) + len(test_questions)}")

📋 Generated 20 test questions

Sample test questions:
1. What are TCS's main research and development priorities?
2. How does TCS approach digital transformation for manufacturing clients?
3. What is TCS's strategy for artificial intelligence implementation?
...

✅ Test questions ready: 20 questions for comparison

🎯 Total questions generated: 40


# Step 2: Create Training Data

For each training question:
1. Retrieve top 10 chunks from ChromaDB
2. Use GPT-4.1 to label each (query, chunk) pair as relevant/irrelevant
3. Build training dataset of ~200 labeled examples
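
The labeling step depends on turning the model's free-text reply into a 0/1 label. The sketch below shows one stricter way to do that parsing. `parse_label` is a hypothetical helper, not part of this notebook; it returns `None` for replies it cannot interpret, so that those pairs could be retried or dropped instead of being counted as irrelevant.

```python
import re
from typing import Optional

def parse_label(reply: str) -> Optional[int]:
    """Map a model reply to 1, 0, or None when it is ambiguous."""
    cleaned = reply.strip().strip('"\'.')
    if cleaned in ("0", "1"):
        return int(cleaned)
    # Accept a longer reply only if every standalone 0/1 token agrees
    tokens = re.findall(r"\b[01]\b", cleaned)
    if len(set(tokens)) == 1:
        return int(tokens[0])
    return None

for reply in ["1", "Label: 0", "10", "1 or 0"]:
    print(repr(reply), "->", parse_label(reply))
```

With roughly 200 pairs, even a handful of mis-parsed replies shifts the label distribution noticeably, so it may be worth logging any `None` results for manual review.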

In [5]:
# Retrieve Chunks for Training Questions
def retrieve_chunks_for_question(question, collection, n_results=10):
    """Retrieve top chunks for a question from ChromaDB"""
    results = collection.query(
        query_texts=[question],
        n_results=n_results
    )
    
    return {
        'question': question,
        'chunks': results['documents'][0],
        'distances': results['distances'][0],
        'ids': results['ids'][0]
    }

print("🔍 Retrieving chunks for training questions...")

# Retrieve chunks for all training questions
training_data_raw = []

for i, question in enumerate(training_questions, 1):
    print(f"Retrieving for Q{i}/{len(training_questions)}: {question[:50]}...")
    
    result = retrieve_chunks_for_question(question, chroma_collection, n_results=10)
    training_data_raw.append(result)

print(f"\n✅ Retrieved chunks for {len(training_data_raw)} questions")

# Calculate total training pairs
total_pairs = sum(len(item['chunks']) for item in training_data_raw)
print(f"📊 Total (question, chunk) pairs: {total_pairs}")

print("\n🎯 Ready for relevance labeling with GPT-4.1")

🔍 Retrieving chunks for training questions...
Retrieving for Q1/20: What are the key features of TCS WisdomNext platfo...
Retrieving for Q2/20: How does TCS Pace Port accelerate innovation for c...
Retrieving for Q3/20: What is the role of TCS Co-innovation Network in c...
Retrieving for Q4/20: How does TCS BaNCS support banking operations?...
Retrieving for Q5/20: What capabilities does TCS Crystallus offer for te...
Retrieving for Q6/20: How does TCS OptumeraTM enhance retail operations?...
Retrieving for Q7/20: What is the purpose of TCS OmniStore platform?...
Retrieving for Q8/20: How does TCS ignioTM transform IT operations?...
Retrieving for Q9/20: What role does TCS Clever Energy play in sustainab...
Retrieving for Q10/20: How does TCS MasterCraft support software engineer...
Retrieving for Q11/20: What is the function of TCS TwinX in digital trans...
Retrieving for Q12/20: How does TCS InTwin enable simulable enterprises?...
Retrieving for Q13/20: What services does TCS iON pro

In [6]:
# Label Relevance with GPT-4.1
def label_relevance(question, chunk):
    """Use GPT-4.1 to label (question, chunk) pair relevance"""
    prompt = f"""You are evaluating document chunk relevance for a TCS Annual Report question.

Question: {question}

Document Chunk: {chunk}

Is this chunk relevant to answering the question?
- Reply "1" if the chunk contains information that helps answer the question
- Reply "0" if the chunk is not relevant or doesn't help answer the question

Only respond with "1" or "0"."""
    
    try:
        response = client.responses.create(
            model="gpt-4.1",
            input=prompt
        )
        
        # Extract response
        content = response.output_text.strip()
        
        # Parse label
        if content == "1":
            return 1
        elif content == "0":
            return 0
        else:
            # Fallback parsing
            if "1" in content:
                return 1
            else:
                return 0
                
    except Exception as e:
        print(f"Error labeling relevance: {e}")
        return 0  # Default to not relevant (API failures then look like true negatives)

print("🤖 Starting relevance labeling with GPT-4.1...")
print("This will take a few minutes...")

# Create labeled training dataset
labeled_training_data = []

for q_idx, question_data in enumerate(training_data_raw, 1):
    question = question_data['question']
    print(f"\nLabeling Q{q_idx}/{len(training_data_raw)}: {question[:40]}...")
    
    for chunk_idx, chunk in enumerate(question_data['chunks'], 1):
        print(f"  Chunk {chunk_idx}/{len(question_data['chunks'])}...", end="")
        
        # Get relevance label
        label = label_relevance(question, chunk)
        
        # Store labeled pair
        labeled_training_data.append({
            'question': question,
            'chunk': chunk,
            'label': label,
            'distance': question_data['distances'][chunk_idx-1],
            'chunk_id': question_data['ids'][chunk_idx-1]
        })
        
        print(f" Label: {label}")

print(f"\n✅ Labeling complete!")
print(f"📊 Total labeled pairs: {len(labeled_training_data)}")

# Show label distribution
relevant_count = sum(1 for item in labeled_training_data if item['label'] == 1)
irrelevant_count = len(labeled_training_data) - relevant_count

print(f"📈 Label distribution:")
print(f"   Relevant (1): {relevant_count} ({relevant_count/len(labeled_training_data)*100:.1f}%)")
print(f"   Irrelevant (0): {irrelevant_count} ({irrelevant_count/len(labeled_training_data)*100:.1f}%)")

print("\n🎯 Training data ready for adaptor matrix training!")

🤖 Starting relevance labeling with GPT-4.1...
This will take a few minutes...

Labeling Q1/20: What are the key features of TCS WisdomN...
  Chunk 1/10... Label: 1
  Chunk 2/10... Label: 1
  Chunk 3/10... Label: 0
  Chunk 4/10... Label: 0
  Chunk 5/10... Label: 0
  Chunk 6/10... Label: 0
  Chunk 7/10... Label: 0
  Chunk 8/10... Label: 0
  Chunk 9/10... Label: 1
  Chunk 10/10... Label: 0

Labeling Q2/20: How does TCS Pace Port accelerate innova...
  Chunk 1/10... Label: 1
  Chunk 2/10... Label: 1
  Chunk 3/10... Label: 1
  Chunk 4/10... Label: 1
  Chunk 5/10... Label: 0
  Chunk 6/10... Label: 0
  Chunk 7/10... Label: 0
  Chunk 8/10... Label: 1
  Chunk 9/10... Label: 0
  Chunk 10/10... Label: 0

Labeling Q3/20: What is the role of TCS Co-innovation Ne...
  Chunk 1/10... Label: 1
  Chunk 2/10... Label: 0
  Chunk 3/10... Label: 1
  Chunk 4/10... Label: 0
  Chunk 5/10... Label: 0
  Chunk 6/10... Label: 1
  Chunk 7/10... Label: 1
  Chunk 8/10... Label: 1
  Chunk 9/10... Label: 0
  Chunk 10/1

# Step 3: Train Simple Adaptor Matrix

Now we'll train a 384×384 transformation matrix to adapt query embeddings.

**Goal**: Learn `adapted_query = matrix × original_query` so that cosine similarity moves toward 1 for chunks labeled relevant and toward 0 for chunks labeled irrelevant.
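
As a worked example of this objective, the sketch below computes the loss by hand in plain Python: cosine similarity for each (query, chunk) pair, then mean squared error against the 0/1 labels. The 2-dimensional vectors are made-up toys, and `cosine` and `mse_loss` are hypothetical helper names used only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mse_loss(queries, chunks, labels):
    """Mean squared error between pairwise cosine similarity and labels."""
    sims = [cosine(q, c) for q, c in zip(queries, chunks)]
    return sum((s - y) ** 2 for s, y in zip(sims, labels)) / len(labels)

queries = [[1.0, 0.0], [1.0, 0.0]]
chunks = [[1.0, 0.0], [0.0, 1.0]]  # first is parallel, second is orthogonal

# Parallel pair labeled relevant, orthogonal pair labeled irrelevant
perfect = mse_loss(queries, chunks, [1, 0])
# The same pairs with the labels flipped
worst = mse_loss(queries, chunks, [0, 1])
print(perfect, worst)
```

Note that a target of 0 for irrelevant pairs asks the adapted query to be orthogonal to those chunks, not opposite to them; a target of -1 would be a different design choice.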

In [7]:
# Prepare Embeddings for Training
print("🔢 Preparing embeddings for training...")

# Get unique questions and chunks
unique_questions = list(set(item['question'] for item in labeled_training_data))
all_chunks = [item['chunk'] for item in labeled_training_data]

print(f"Generating embeddings for {len(unique_questions)} questions...")
question_embeddings = embedding_model.encode(unique_questions)

print(f"Generating embeddings for {len(all_chunks)} chunks...")
chunk_embeddings = embedding_model.encode(all_chunks)

# Create mapping from question to embedding
question_to_embedding = {q: emb for q, emb in zip(unique_questions, question_embeddings)}

print(f"✅ Embeddings ready:")
print(f"   Question embeddings: {question_embeddings.shape}")
print(f"   Chunk embeddings: {chunk_embeddings.shape}")
print(f"   Embedding dimension: {question_embeddings.shape[1]}")

# Prepare training data
training_pairs = []

for i, item in enumerate(labeled_training_data):
    question_emb = question_to_embedding[item['question']]
    chunk_emb = chunk_embeddings[i]
    label = item['label']
    
    training_pairs.append({
        'question_emb': question_emb,
        'chunk_emb': chunk_emb,
        'label': label
    })

print(f"\n🎯 Training pairs ready: {len(training_pairs)}")
print("Ready for matrix training!")

🔢 Preparing embeddings for training...
Generating embeddings for 20 questions...
Generating embeddings for 200 chunks...
✅ Embeddings ready:
   Question embeddings: (20, 384)
   Chunk embeddings: (200, 384)
   Embedding dimension: 384

🎯 Training pairs ready: 200
Ready for matrix training!


In [8]:
# Train Adaptor Matrix
class AdaptorMatrix(nn.Module):
    def __init__(self, embedding_dim):
        super().__init__()
        # Transformation matrix (a linear layer without bias)
        self.transform = nn.Linear(embedding_dim, embedding_dim, bias=False)
        
        # Initialize weights to the identity so training starts from the unadapted embeddings
        nn.init.eye_(self.transform.weight)
        
    def forward(self, query_embeddings):
        # Apply transformation
        return self.transform(query_embeddings)

# Setup training
embedding_dim = question_embeddings.shape[1]
adaptor = AdaptorMatrix(embedding_dim)
optimizer = optim.Adam(adaptor.parameters(), lr=0.001)
criterion = nn.MSELoss()

print(f"🤖 Adaptor Matrix initialized: {embedding_dim}×{embedding_dim}")
print(f"📊 Parameters: {sum(p.numel() for p in adaptor.parameters()):,}")

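# Convert to tensors (stack into one NumPy array first; building a tensor
# from a list of arrays is slow and triggers a PyTorch UserWarning)
question_tensors = torch.from_numpy(np.array([pair['question_emb'] for pair in training_pairs])).float()
chunk_tensors = torch.from_numpy(np.array([pair['chunk_emb'] for pair in training_pairs])).float()
label_tensors = torch.tensor([pair['label'] for pair in training_pairs], dtype=torch.float32)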

print(f"\n🔄 Starting training...")
print(f"Training data: {len(training_pairs)} pairs")

# Training loop
num_epochs = 100
losses = []

for epoch in range(num_epochs):
    optimizer.zero_grad()
    
    # Forward pass
    adapted_questions = adaptor(question_tensors)
    
    # Calculate cosine similarities between adapted queries and their chunks
    similarities = nn.functional.cosine_similarity(adapted_questions, chunk_tensors, dim=1)
    
    # Loss: MSE between similarity and labels
    loss = criterion(similarities, label_tensors)
    
    # Backward pass
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
    
    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}")

print(f"\n✅ Training complete!")
print(f"📉 Final loss: {losses[-1]:.4f}")
print(f"📈 Loss improvement: {losses[0]:.4f} → {losses[-1]:.4f}")

# Save the trained adaptor
torch.save(adaptor.state_dict(), 'tcs_adaptor_matrix.pth')
print(f"💾 Adaptor matrix saved to: tcs_adaptor_matrix.pth")

print("\n🎯 Adaptor matrix training complete! Ready for testing.")

🤖 Adaptor Matrix initialized: 384×384
📊 Parameters: 147,456

🔄 Starting training...
Training data: 200 pairs
Epoch 20/100, Loss: 0.0753
Epoch 40/100, Loss: 0.0536
Epoch 60/100, Loss: 0.0467
Epoch 80/100, Loss: 0.0442
Epoch 100/100, Loss: 0.0432

✅ Training complete!
📉 Final loss: 0.0432
📈 Loss improvement: 0.2251 → 0.0432
💾 Adaptor matrix saved to: tcs_adaptor_matrix.pth

🎯 Adaptor matrix training complete! Ready for testing.


# Step 4: Compare Simple RAG vs Adapted RAG

Now we'll run both approaches on the 20 test questions, then have GPT-4.1 judge which answer is better for each question.
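
Pairwise LLM judging can be sensitive to answer order and to labels that reveal which system produced which answer. One common mitigation, sketched here as an assumption and not as what this notebook does, is to shuffle the order per question, present the answers under neutral labels, and map the verdict back afterwards. `blind_judge` and the `judge_fn` callable (which takes the question, the first answer, and the second answer, and returns "A", "B", or "TIE") are hypothetical names.

```python
import random

def blind_judge(question, simple_answer, adapted_answer, judge_fn, rng=random):
    """Judge two answers in random order; return 'simple', 'adapted', or 'tie'."""
    swapped = rng.random() < 0.5
    if swapped:
        first, second = adapted_answer, simple_answer
    else:
        first, second = simple_answer, adapted_answer

    verdict = judge_fn(question, first, second)  # expected: "A", "B", or "TIE"
    if verdict not in ("A", "B"):
        return "tie"

    first_won = verdict == "A"
    # Undo the shuffle: the first slot held the adapted answer only if swapped
    return "adapted" if first_won == swapped else "simple"

# Stand-in judge for demonstration only: prefers the longer answer
def longer_wins(question, a, b):
    if len(a) == len(b):
        return "TIE"
    return "A" if len(a) > len(b) else "B"

print(blind_judge("q", "short", "a much longer answer", longer_wins))
```

Because the mapping back is deterministic, the win counts stay comparable across runs while the judge never sees a fixed position for either system.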

In [11]:
# Implement RAG Functions
def simple_rag(question, collection, client, n_chunks=3):
    """Standard RAG with original embeddings"""
    start_time = time.time()
    
    # Retrieve chunks
    results = collection.query(
        query_texts=[question],
        n_results=n_chunks
    )
    
    chunks = results['documents'][0]
    context = "\n\n".join(chunks)
    
    # Generate answer
    prompt = f"""Based on the TCS Annual Report information below, answer the question.

Context:
{context}

Question: {question}

Answer based only on the provided context:"""
    
    response = client.responses.create(
        model="gpt-4.1",
        input=prompt
    )
    
    runtime = time.time() - start_time
    
    return {
        'answer': response.output_text,
        'chunks': chunks,
        'runtime': round(runtime, 1)
    }

def adapted_rag(question, collection, client, adaptor_model, embedding_model, n_chunks=3):
    """RAG with adapted query embeddings"""
    start_time = time.time()
    
    # Get original query embedding
    original_embedding = embedding_model.encode([question])[0]
    
    # Apply adaptor transformation
    with torch.no_grad():
        original_tensor = torch.FloatTensor(original_embedding).unsqueeze(0)
        adapted_tensor = adaptor_model(original_tensor)
        adapted_embedding = adapted_tensor.squeeze(0).numpy()
    
    # Get all document embeddings from ChromaDB for similarity search
    all_results = collection.get(
        include=["embeddings", "documents"]
    )
    
    doc_embeddings = np.array(all_results['embeddings'])
    
    # Calculate similarities with adapted query
    similarities = cosine_similarity([adapted_embedding], doc_embeddings)[0]
    
    # Get top chunks
    top_indices = np.argsort(similarities)[::-1][:n_chunks]
    chunks = [all_results['documents'][i] for i in top_indices]
    
    context = "\n\n".join(chunks)
    
    # Generate answer
    prompt = f"""Based on the TCS Annual Report information below, answer the question.

Context:
{context}

Question: {question}

Answer based only on the provided context:"""
    
    response = client.responses.create(
        model="gpt-4.1",
        input=prompt
    )
    
    runtime = time.time() - start_time
    
    return {
        'answer': response.output_text,
        'chunks': chunks,
        'runtime': round(runtime, 1)
    }

print("✅ RAG functions implemented")
print("   • simple_rag: Standard retrieval")
print("   • adapted_rag: Uses trained adaptor matrix")
print("\n🎯 Ready to test on 20 questions!")

✅ RAG functions implemented
   • simple_rag: Standard retrieval
   • adapted_rag: Uses trained adaptor matrix

🎯 Ready to test on 20 questions!


In [12]:
# Run Comparison Test
print("🔬 TESTING: Simple RAG vs Adapted RAG")
print("=" * 50)

comparison_results = []

for i, question in enumerate(test_questions, 1):
    print(f"\nTesting Q{i}/20: {question[:60]}...")
    
    try:
        # Test Simple RAG
        print("  🔄 Simple RAG...", end="")
        simple_result = simple_rag(question, chroma_collection, client)
        print(f" ✅ ({simple_result['runtime']}s)")
        
        # Test Adapted RAG
        print("  🔄 Adapted RAG...", end="")
        adapted_result = adapted_rag(question, chroma_collection, client, adaptor, embedding_model)
        print(f" ✅ ({adapted_result['runtime']}s)")
        
        # Store results
        comparison_results.append({
            'question_id': i,
            'question': question,
            'simple_answer': simple_result['answer'],
            'simple_chunks': simple_result['chunks'],
            'simple_time': simple_result['runtime'],
            'adapted_answer': adapted_result['answer'],
            'adapted_chunks': adapted_result['chunks'],
            'adapted_time': adapted_result['runtime']
        })
        
    except Exception as e:
        print(f" ❌ Error: {e}")
        comparison_results.append({
            'question_id': i,
            'question': question,
            'simple_answer': f"Error: {e}",
            'simple_chunks': [],
            'simple_time': 0,
            'adapted_answer': f"Error: {e}",
            'adapted_chunks': [],
            'adapted_time': 0
        })

print(f"\n✅ Comparison testing complete!")
print(f"📊 Tested {len(comparison_results)} questions")
print(f"⏱️  Average Simple RAG time: {np.mean([r['simple_time'] for r in comparison_results if r['simple_time'] > 0]):.1f}s")
print(f"⏱️  Average Adapted RAG time: {np.mean([r['adapted_time'] for r in comparison_results if r['adapted_time'] > 0]):.1f}s")

print("\n🎯 Ready for judge evaluation!")

🔬 TESTING: Simple RAG vs Adapted RAG

Testing Q1/20: What are TCS's main research and development priorities?...
  🔄 Simple RAG... ✅ (7.0s)
  🔄 Adapted RAG... ✅ (3.5s)

Testing Q2/20: How does TCS approach digital transformation for manufacturi...
  🔄 Simple RAG... ✅ (5.2s)
  🔄 Adapted RAG... ✅ (6.4s)

Testing Q3/20: What is TCS's strategy for artificial intelligence implement...
  🔄 Simple RAG... ✅ (6.9s)
  🔄 Adapted RAG... ✅ (9.3s)

Testing Q4/20: How does TCS support clients in cloud migration initiatives?...
  🔄 Simple RAG... ✅ (6.8s)
  🔄 Adapted RAG... ✅ (7.5s)

Testing Q5/20: What role does TCS play in cybersecurity solutions?...
  🔄 Simple RAG... ✅ (3.3s)
  🔄 Adapted RAG... ✅ (3.6s)

Testing Q6/20: How does TCS contribute to sustainability and green energy i...
  🔄 Simple RAG... ✅ (6.0s)
  🔄 Adapted RAG... ✅ (5.9s)

Testing Q7/20: What is TCS's approach to talent development and reskilling?...
  🔄 Simple RAG... ✅ (4.0s)
  🔄 Adapted RAG... ✅ (5.0s)

Testing Q8/20: How does TCS su

In [13]:
# Judge Evaluation: Which RAG approach is better?
def judge_comparison(question, simple_answer, adapted_answer):
    """Use GPT-4.1 to judge which answer is better"""
    prompt = f"""Compare these two answers to a TCS Annual Report question.

Question: {question}

Answer A (Simple RAG): {simple_answer}

Answer B (Adapted RAG): {adapted_answer}

Which answer is better? Consider accuracy, completeness, and relevance.
Respond with exactly one of:
- "A" if Answer A is better
- "B" if Answer B is better  
- "TIE" if both answers are equally good

Only respond with A, B, or TIE."""
    
    try:
        response = client.responses.create(
            model="gpt-4.1",
            input=prompt
        )
        
        result = response.output_text.strip().upper()
        
        if result in ['A', 'B', 'TIE']:
            return result
        else:
            # Fallback parsing
            if 'A' in result and 'B' not in result:
                return 'A'
            elif 'B' in result and 'A' not in result:
                return 'B'
            else:
                return 'TIE'
                
    except Exception as e:
        print(f"Judge error: {e}")
        return 'TIE'

print("🤖 Starting judge evaluation...")
print("This will take a few minutes...")

# Evaluate each comparison
for i, result in enumerate(comparison_results, 1):
    question = result['question']
    simple_answer = result['simple_answer']
    adapted_answer = result['adapted_answer']
    
    print(f"Judging Q{i}/20: {question[:40]}...", end="")
    
    judgment = judge_comparison(question, simple_answer, adapted_answer)
    comparison_results[i-1]['judgment'] = judgment
    
    print(f" {judgment}")

print("\n✅ Judge evaluation complete!")

# Calculate results
judgments = [r['judgment'] for r in comparison_results]
simple_wins = judgments.count('A')
adapted_wins = judgments.count('B')
ties = judgments.count('TIE')

print(f"\n🏆 JUDGE RESULTS:")
print(f"   Simple RAG wins: {simple_wins}/{len(judgments)}")
print(f"   Adapted RAG wins: {adapted_wins}/{len(judgments)}")
print(f"   Ties: {ties}/{len(judgments)}")

print("\n🎯 Ready for final analysis!")

🤖 Starting judge evaluation...
This will take a few minutes...
Judging Q1/20: What are TCS's main research and develop... A
Judging Q2/20: How does TCS approach digital transforma... B
Judging Q3/20: What is TCS's strategy for artificial in... B
Judging Q4/20: How does TCS support clients in cloud mi... B
Judging Q5/20: What role does TCS play in cybersecurity... B
Judging Q6/20: How does TCS contribute to sustainabilit... B
Judging Q7/20: What is TCS's approach to talent develop... A
Judging Q8/20: How does TCS support government digital ... A
Judging Q9/20: What are TCS's key partnerships with tec... A
Judging Q10/20: How does TCS approach innovation in fina... B
Judging Q11/20: What is TCS's methodology for enterprise... B
Judging Q12/20: How does TCS support healthcare and life... B
Judging Q13/20: What are TCS's capabilities in data anal... B
Judging Q14/20: How does TCS approach automation in busi... B
Judging Q15/20: What role does TCS play in telecommunica... B
Judging Q16/20: 

# Step 5: Analysis and Learning Insights

Let's analyze what we learned from this embedding adaptation experiment.
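
With only 20 test questions, it helps to check whether a win/loss split could plausibly be chance. The sketch below is an illustrative addition, not part of the original analysis: it computes an exact two-sided sign test under the assumption that each non-tied judgment is an independent coin flip. `sign_test_p_value` is a hypothetical helper name.

```python
from math import comb

def sign_test_p_value(wins_a, wins_b):
    """Exact two-sided sign test p-value for a win/loss split, ignoring ties."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided in one direction
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# The 15 vs 5 split reported by the judge in this run
p = sign_test_p_value(15, 5)
print(f"p = {p:.4f}")
```

For the 15 vs 5 split this gives a p-value of about 0.041. The test says nothing about systematic judge bias, so it complements a careful evaluation design and does not replace one.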

In [14]:
# Final Analysis and Learning Insights
print("📊 EMBEDDING ADAPTOR EXPERIMENT - FINAL ANALYSIS")
print("=" * 60)

# Create results DataFrame
df_results = pd.DataFrame(comparison_results)

# Overall Performance Summary
print("\n🏆 OVERALL RESULTS:")
print("-" * 30)
print(f"Total questions tested: {len(comparison_results)}")
print(f"Simple RAG wins: {simple_wins} ({simple_wins/len(judgments)*100:.1f}%)")
print(f"Adapted RAG wins: {adapted_wins} ({adapted_wins/len(judgments)*100:.1f}%)")
print(f"Ties: {ties} ({ties/len(judgments)*100:.1f}%)")

# Performance Analysis
if adapted_wins > simple_wins:
    winner = "Adapted RAG"
    improvement = adapted_wins - simple_wins
elif simple_wins > adapted_wins:
    winner = "Simple RAG"
    improvement = simple_wins - adapted_wins
else:
    winner = "TIE"
    improvement = 0

print(f"\n🏅 Winner: {winner}")
if improvement > 0:
    print(f"   Advantage: +{improvement} questions ({improvement/len(judgments)*100:.1f}%)")

# Speed Analysis
avg_simple_time = np.mean([r['simple_time'] for r in comparison_results if r['simple_time'] > 0])
avg_adapted_time = np.mean([r['adapted_time'] for r in comparison_results if r['adapted_time'] > 0])

print(f"\n⏱️  SPEED ANALYSIS:")
print(f"   Simple RAG avg time: {avg_simple_time:.1f}s")
print(f"   Adapted RAG avg time: {avg_adapted_time:.1f}s")
print(f"   Speed difference: {((avg_adapted_time - avg_simple_time)/avg_simple_time*100):+.1f}%")

# Training Data Analysis
print(f"\n📈 TRAINING DATA INSIGHTS:")
print(f"   Training pairs used: {len(labeled_training_data)}")
print(f"   Relevant pairs: {relevant_count} ({relevant_count/len(labeled_training_data)*100:.1f}%)")
print(f"   Matrix parameters: {sum(p.numel() for p in adaptor.parameters()):,}")
print(f"   Training epochs: {num_epochs}")
print(f"   Final training loss: {losses[-1]:.4f}")

print(f"\n💡 KEY LEARNING INSIGHTS:")
print("-" * 30)

if adapted_wins > simple_wins:
    print("✅ Domain adaptation HELPED:")
    print(f"   • Adapted RAG won {adapted_wins}/{len(judgments)} comparisons")
    print("   • The matrix learned useful TCS-specific transformations")
    print("   • Parameter-efficient adaptation can improve retrieval")
else:
    print("❓ Domain adaptation had MIXED results:")
    print(f"   • Simple RAG still won {simple_wins}/{len(judgments)} comparisons")
    print("   • General embeddings may already capture TCS concepts well")
    print("   • More training data or different architecture might help")

print(f"\n🎯 METHODOLOGY LEARNINGS:")
print("   • Successfully implemented parameter-efficient embedding adaptation")
print("   • GPT-4.1 labeling created realistic training data")
print(f"   • Matrix training converged (loss: {losses[0]:.4f} → {losses[-1]:.4f})")
print("   • End-to-end evaluation pipeline worked")

print(f"\n🔍 WHAT THIS TEACHES US:")
print("   • Embedding spaces can be adapted with simple transformations")
print("   • Domain-specific improvements require careful evaluation")
print("   • Parameter efficiency: 147k params vs full model retraining")
print("   • Real-world adaptation benefits depend on domain gap")

# Save results
timestamp = time.strftime("%Y%m%d_%H%M%S")
results_file = f"embedding_adaptor_results_{timestamp}.csv"
df_results.to_csv(results_file, index=False)
print(f"\n💾 Results saved to: {results_file}")

print(f"\n🎉 EMBEDDING ADAPTOR EXPERIMENT COMPLETE!")
print("🎓 You've successfully learned about domain-specific embedding adaptation!")

📊 EMBEDDING ADAPTOR EXPERIMENT - FINAL ANALYSIS

🏆 OVERALL RESULTS:
------------------------------
Total questions tested: 20
Simple RAG wins: 5 (25.0%)
Adapted RAG wins: 15 (75.0%)
Ties: 0 (0.0%)

🏅 Winner: Adapted RAG
   Advantage: +10 questions (50.0%)

⏱️  SPEED ANALYSIS:
   Simple RAG avg time: 5.2s
   Adapted RAG avg time: 5.4s
   Speed difference: +4.4%

📈 TRAINING DATA INSIGHTS:
   Training pairs used: 200
   Relevant pairs: 47 (23.5%)
   Matrix parameters: 147,456
   Training epochs: 100
   Final training loss: 0.0432

💡 KEY LEARNING INSIGHTS:
------------------------------
✅ Domain adaptation HELPED:
   • Adapted RAG won 15/20 comparisons
   • The matrix learned useful TCS-specific transformations
   • Parameter-efficient adaptation can improve retrieval

🎯 METHODOLOGY LEARNINGS:
   • Successfully implemented parameter-efficient embedding adaptation
   • GPT-4.1 labeling created realistic training data
   • Matrix training converged (loss: 0.2251 → 0.0432)
   • End-to-end eva