# Personalization with Synthetic Pairs: Multi-Trait Steering via CLI

This notebook demonstrates how to use the **Wisent CLI** to:
1. **Generate synthetic contrastive pairs** for personality traits ("evil" and "Italian")
2. **Create steering vectors** from these pairs using `generate-vector-from-synthetic`
3. **Combine multiple steering vectors** using `multi-steer`
4. **Generate steered responses** with combined personality traits

The result is a model that responds with both an "evil villain" personality and an Italian accent and style!

## CLI Commands Used:
- `generate-vector-from-synthetic`: Generate steering vectors from trait descriptions
- `multi-steer`: Combine multiple steering vectors during generation

## 1. Setup and Configuration

In [None]:
import os
import json
from pathlib import Path

# Configuration
MODEL = "meta-llama/Llama-3.2-1B-Instruct"
NUM_PAIRS = 20
TARGET_LAYER = 8  # Middle layer for personality steering
OUTPUT_DIR = Path("./personalization_outputs")

# Create output directories
OUTPUT_DIR.mkdir(exist_ok=True)
(OUTPUT_DIR / "vectors").mkdir(exist_ok=True)

print(f"Model: {MODEL}")
print(f"Target layer: {TARGET_LAYER}")
print(f"Output directory: {OUTPUT_DIR.absolute()}")

## 2. Generate Steering Vector for "Evil Villain" Trait

Use `generate-vector-from-synthetic` to create contrastive pairs and a steering vector for an evil villain personality.
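
For intuition, contrastive activation addition (the `caa` method) is commonly described as the mean difference between activations of trait-exhibiting and neutral responses. The sketch below illustrates that idea on toy data. It is not Wisent's implementation: `caa_vector`, `mean_over_tokens`, the toy activations, and the reading of `--token-aggregation average` as a per-token mean and `--normalize` as L2 normalization are all assumptions.

```python
import math

def mean_over_tokens(token_activations):
    """Average per-token activation vectors into one vector (assumed meaning of 'average')."""
    n = len(token_activations)
    return [sum(column) / n for column in zip(*token_activations)]

def caa_vector(positive, negative, normalize=True):
    """Mean difference between paired positive and negative activations."""
    pos = [mean_over_tokens(p) for p in positive]
    neg = [mean_over_tokens(q) for q in negative]
    # One difference vector per contrastive pair.
    diffs = [[a - b for a, b in zip(p, q)] for p, q in zip(pos, neg)]
    vec = [sum(column) / len(diffs) for column in zip(*diffs)]
    if normalize:
        # L2 normalization (assumed behavior of --normalize).
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            vec = [x / norm for x in vec]
    return vec

# Toy data: 2 pairs, 2 tokens each, hidden size 3 (hypothetical values).
positive = [[[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]], [[2.0, 1.0, 2.0], [2.0, 1.0, 2.0]]]
negative = [[[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]], [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]]

steering = caa_vector(positive, negative)
print(steering)  # [1.0, 0.0, 0.0]
```

Only the first hidden dimension differs between the positive and negative examples, so the resulting direction points along that dimension.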

In [None]:
# Evil villain trait description
EVIL_TRAIT = "evil villain personality with dramatic monologues, world domination schemes, menacing laughter, and megalomaniacal tendencies"

print(f"Evil trait: {EVIL_TRAIT}")

In [None]:
# Generate evil villain steering vector
!python -m wisent.core.main generate-vector-from-synthetic \
    --trait "{EVIL_TRAIT}" \
    --output {OUTPUT_DIR}/vectors/evil_vector.json \
    --model {MODEL} \
    --num-pairs {NUM_PAIRS} \
    --layers {TARGET_LAYER} \
    --token-aggregation average \
    --method caa \
    --normalize \
    --keep-intermediate \
    --verbose \
    --timing

## 3. Generate Steering Vector for "Italian" Trait

In [None]:
# Italian personality trait description
ITALIAN_TRAIT = "passionate Italian personality with expressive hand gestures, Italian expressions like mamma mia and capisce, love for food and family, and dramatic emotional responses"

print(f"Italian trait: {ITALIAN_TRAIT}")

In [None]:
# Generate Italian personality steering vector
!python -m wisent.core.main generate-vector-from-synthetic \
    --trait "{ITALIAN_TRAIT}" \
    --output {OUTPUT_DIR}/vectors/italian_vector.json \
    --model {MODEL} \
    --num-pairs {NUM_PAIRS} \
    --layers {TARGET_LAYER} \
    --token-aggregation average \
    --method caa \
    --normalize \
    --keep-intermediate \
    --verbose \
    --timing

## 4. Verify Generated Vectors
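
One extra sanity check, not part of the CLI, is to compare the two trait vectors directly. A cosine similarity near 1 would suggest the traits overlap heavily at this layer, while a value near 0 suggests mostly independent directions. This is a minimal sketch, assuming each vector is a flat list of floats; the helper name `cosine_similarity` and the toy values are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized vectors."""
    if len(u) != len(v):
        raise ValueError(f"Dimension mismatch: {len(u)} vs {len(v)}")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy stand-ins for the two trait vectors (hypothetical values).
evil = [1.0, 0.0, 0.0]
italian = [0.0, 1.0, 0.0]

print(cosine_similarity(evil, italian))  # orthogonal directions
print(cosine_similarity(evil, evil))     # identical directions
```

With the real files, the inputs would be `evil_data['steering_vectors'][str(TARGET_LAYER)]` and the matching entry from `italian_data`, once the inspection cell has loaded them.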

In [None]:
import numpy as np

def inspect_vector(vector_path):
    """Inspect a generated steering vector."""
    with open(vector_path, 'r') as f:
        data = json.load(f)
    
    print(f"\nVector: {vector_path.name}")
    print(f"  Model: {data.get('model', 'N/A')}")
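    trait = data.get('trait_label') or data.get('trait') or 'N/A'
    # Only append an ellipsis when the description is actually truncated
    print(f"  Trait: {trait[:60]}{'...' if len(trait) > 60 else ''}")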
    print(f"  Layers: {list(data.get('steering_vectors', {}).keys())}")
    
    for layer, vector in data.get('steering_vectors', {}).items():
        v = np.array(vector)
        print(f"  Layer {layer}: dim={len(v)}, norm={np.linalg.norm(v):.4f}")
    
    return data

# Inspect both vectors
evil_vector_path = OUTPUT_DIR / "vectors" / "evil_vector.json"
italian_vector_path = OUTPUT_DIR / "vectors" / "italian_vector.json"

if evil_vector_path.exists():
    evil_data = inspect_vector(evil_vector_path)
else:
    print(f"Evil vector not found at {evil_vector_path}")

if italian_vector_path.exists():
    italian_data = inspect_vector(italian_vector_path)
else:
    print(f"Italian vector not found at {italian_vector_path}")

## 5. Convert JSON Vectors to .pt Format for Multi-Steer

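The `multi-steer` command expects vectors in `.pt` (PyTorch) format, so each JSON vector is re-saved with `torch.save` as a dictionary holding the vector for the target layer.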

In [None]:
import torch

def convert_json_to_pt(json_path, pt_path, target_layer):
    """
    Convert a JSON steering vector to .pt format for multi-steer.

    Raises ValueError if target_layer is not present in the JSON file.
    """
    with open(json_path, 'r') as f:
        data = json.load(f)
    
    layer_str = str(target_layer)
    if layer_str not in data['steering_vectors']:
        available = list(data['steering_vectors'].keys())
        raise ValueError(f"Layer {target_layer} not found. Available: {available}")
    
    vector = torch.tensor(data['steering_vectors'][layer_str], dtype=torch.float32)
    
    # Save in format expected by multi-steer
    torch.save({
        'steering_vector': vector,
        'layer_index': target_layer,
        'model': data.get('model', ''),
        'trait': data.get('trait_label', data.get('trait', '')),
    }, pt_path)
    
    print(f"Converted {json_path.name} -> {pt_path.name}")
    print(f"  Layer: {target_layer}, Dim: {vector.shape[0]}")
    return pt_path

# Convert both vectors
evil_pt_path = OUTPUT_DIR / "vectors" / "evil_vector.pt"
italian_pt_path = OUTPUT_DIR / "vectors" / "italian_vector.pt"

if evil_vector_path.exists():
    convert_json_to_pt(evil_vector_path, evil_pt_path, TARGET_LAYER)

if italian_vector_path.exists():
    convert_json_to_pt(italian_vector_path, italian_pt_path, TARGET_LAYER)

## 6. Multi-Steer: Combine Evil + Italian with Equal Weights

Use the `multi-steer` CLI command to combine both steering vectors and generate responses.
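
Conceptually, combining steering vectors amounts to a weighted sum that is then added to the model's hidden state at the chosen layer. The following sketch shows that arithmetic on toy vectors. `combine_vectors` and `apply_steering` are hypothetical helpers, not Wisent functions, and the actual command may scale or apply vectors differently.

```python
def combine_vectors(vectors, weights):
    """Weighted sum of equally sized vectors."""
    if len(vectors) != len(weights):
        raise ValueError("Need exactly one weight per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("All vectors must have the same dimension")
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]

def apply_steering(hidden_state, steering_vector, strength=1.0):
    """Shift a hidden state along the steering direction."""
    return [h + strength * s for h, s in zip(hidden_state, steering_vector)]

# Toy stand-ins for the two trait vectors and one hidden state (hypothetical values).
evil = [1.0, 0.0, 0.0]
italian = [0.0, 1.0, 0.0]

combined = combine_vectors([evil, italian], [0.5, 0.5])
steered = apply_steering([0.25, 0.25, 0.25], combined)

print(combined)  # [0.5, 0.5, 0.0]
print(steered)   # [0.75, 0.75, 0.25]
```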

In [None]:
# Test prompts for the combined personality
test_prompts = [
    "What's your favorite food?",
    "How do you spend your weekends?",
    "What motivates you in life?",
    "How do you handle setbacks?",
]

print(f"Test prompts: {len(test_prompts)}")

In [None]:
# Multi-steer with equal weights (0.5 each)
print("="*60)
print("Multi-Steer: Evil + Italian (Equal Weights)")
print("="*60)

!python -m wisent.core.main multi-steer \
    --vector {evil_pt_path}:0.5 \
    --vector {italian_pt_path}:0.5 \
    --model {MODEL} \
    --layer {TARGET_LAYER} \
    --method CAA \
    --prompt "What's your favorite food?" \
    --max-new-tokens 150 \
    --normalize-weights \
    --verbose

## 7. Run Multiple Prompts
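
The `!` shell escape pastes each prompt into a double-quoted string, so a prompt that itself contains a double quote or a `$` could break the command. One alternative is to build the same command as an argument list and run it with `subprocess.run`, which bypasses the shell. The sketch below reuses only flags that appear in this notebook; `build_multi_steer_cmd` and `run_multi_steer` are hypothetical helpers.

```python
import subprocess
import sys

def build_multi_steer_cmd(prompt, vectors, model, layer, max_new_tokens=150):
    """Build the multi-steer invocation as an argv list (no shell quoting needed)."""
    cmd = [sys.executable, "-m", "wisent.core.main", "multi-steer"]
    for path, weight in vectors:
        cmd += ["--vector", f"{path}:{weight}"]
    cmd += [
        "--model", model,
        "--layer", str(layer),
        "--method", "CAA",
        "--prompt", prompt,
        "--max-new-tokens", str(max_new_tokens),
        "--normalize-weights",
    ]
    return cmd

def run_multi_steer(cmd):
    """Run the command without a shell and return its stdout (requires Wisent)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout

# A prompt with embedded double quotes passes through unchanged.
cmd = build_multi_steer_cmd(
    'Say "hello" like a villain',
    [("evil_vector.pt", 0.5), ("italian_vector.pt", 0.5)],
    model="meta-llama/Llama-3.2-1B-Instruct",
    layer=8,
)
print(cmd)
```

Calling `run_multi_steer(cmd)` inside the prompt loop would replace the `!` line, with the output available as a Python string for later comparison.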

In [None]:
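# Run all test prompts with equal weights.
# Note: stderr is discarded and only the last 15 lines of output are shown;
# remove `2>/dev/null | tail -15` to see errors if a run prints nothing.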
print("="*80)
print("MULTI-STEER: Evil + Italian (Equal Weights)")
print("="*80)

for prompt in test_prompts:
    print(f"\n{'='*60}")
    print(f"PROMPT: {prompt}")
    print("="*60)
    
    !python -m wisent.core.main multi-steer \
        --vector {evil_pt_path}:0.5 \
        --vector {italian_pt_path}:0.5 \
        --model {MODEL} \
        --layer {TARGET_LAYER} \
        --method CAA \
        --prompt "{prompt}" \
        --max-new-tokens 150 \
        --normalize-weights 2>/dev/null | tail -15

## 8. Compare: Baseline vs Steered Responses

Generate an unsteered baseline response for comparison by calling `multi-steer` without any `--vector` arguments.

In [None]:
# Baseline (unsteered) response
print("="*60)
print("BASELINE (No Steering)")
print("="*60)

!python -m wisent.core.main multi-steer \
    --model {MODEL} \
    --prompt "What's your favorite food?" \
    --max-new-tokens 150 \
    --verbose

## 9. Experiment with Different Weight Combinations
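
One thing to watch with the weight pairs in this section: if `--normalize-weights` rescales the weights so they sum to 1 (an assumption about the flag, worth confirming against the CLI help), then pairs with the same ratio become identical after normalization. The sketch below shows the arithmetic with a hypothetical `normalize_weights` helper.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to 1 (assumed meaning of --normalize-weights)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("Weights must not sum to zero")
    return [w / total for w in weights]

# The (evil, italian) weight pairs used in this section.
experiments = [(0.8, 0.2), (0.2, 0.8), (1.0, 1.0), (0.3, 0.3)]

for pair in experiments:
    print(pair, "->", normalize_weights(pair))
```

Under that assumption, the "Strong Both" and "Subtle Both" runs would apply the same steering as the equal-weight run, so omitting the flag would be one way to compare absolute strengths.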

In [None]:
# Different weight combinations to try
weight_experiments = [
    (0.8, 0.2, "More Evil, Less Italian"),
    (0.2, 0.8, "Less Evil, More Italian"),
    (1.0, 1.0, "Strong Both"),
    (0.3, 0.3, "Subtle Both"),
]

test_prompt = "How do you feel about teamwork?"

print(f"Testing weight combinations on: '{test_prompt}'")
print("="*80)

for evil_w, italian_w, description in weight_experiments:
    print(f"\n{'='*60}")
    print(f"{description} (evil={evil_w}, italian={italian_w})")
    print("="*60)
    
    !python -m wisent.core.main multi-steer \
        --vector {evil_pt_path}:{evil_w} \
        --vector {italian_pt_path}:{italian_w} \
        --model {MODEL} \
        --layer {TARGET_LAYER} \
        --method CAA \
        --prompt "{test_prompt}" \
        --max-new-tokens 150 \
        --normalize-weights 2>/dev/null | tail -10

## 10. Save Combined Vector for Reuse

In [None]:
# Save combined vector for later use
print("Saving combined evil+italian vector...")

!python -m wisent.core.main multi-steer \
    --vector {evil_pt_path}:0.5 \
    --vector {italian_pt_path}:0.5 \
    --model {MODEL} \
    --layer {TARGET_LAYER} \
    --method CAA \
    --prompt "Test prompt" \
    --max-new-tokens 10 \
    --normalize-weights \
    --save-combined {OUTPUT_DIR}/vectors/evil_italian_combined.pt \
    --verbose

print(f"\nCombined vector saved to {OUTPUT_DIR}/vectors/evil_italian_combined.pt")

In [None]:
# Use the pre-combined vector (single vector, no combination needed)
combined_vector_path = OUTPUT_DIR / "vectors" / "evil_italian_combined.pt"

if combined_vector_path.exists():
    print("Using pre-combined evil+italian vector...")
    
    !python -m wisent.core.main multi-steer \
        --vector {combined_vector_path}:1.0 \
        --model {MODEL} \
        --layer {TARGET_LAYER} \
        --method CAA \
        --prompt "What's your opinion on world peace?" \
        --max-new-tokens 150 \
        --verbose
else:
    print(f"Combined vector not found at {combined_vector_path}")

## 11. Summary: CLI Commands Reference

### Generate Steering Vector for a Trait
```bash
python -m wisent.core.main generate-vector-from-synthetic \
    --trait "evil villain personality with dramatic monologues..." \
    --output ./vectors/evil_vector.json \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --num-pairs 20 \
    --layers 8 \
    --normalize \
    --verbose
```

### Multi-Steer with Equal Weights
```bash
python -m wisent.core.main multi-steer \
    --vector ./vectors/evil_vector.pt:0.5 \
    --vector ./vectors/italian_vector.pt:0.5 \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --layer 8 \
    --method CAA \
    --prompt "What's your favorite food?" \
    --max-new-tokens 150 \
    --normalize-weights \
    --verbose
```

### Multi-Steer with Custom Weights (More Evil)
```bash
python -m wisent.core.main multi-steer \
    --vector ./vectors/evil_vector.pt:0.8 \
    --vector ./vectors/italian_vector.pt:0.2 \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --layer 8 \
    --method CAA \
    --prompt "How do you handle setbacks?" \
    --max-new-tokens 150 \
    --normalize-weights
```

### Save Combined Vector for Reuse
```bash
python -m wisent.core.main multi-steer \
    --vector ./vectors/evil_vector.pt:0.5 \
    --vector ./vectors/italian_vector.pt:0.5 \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --layer 8 \
    --method CAA \
    --prompt "Test" \
    --normalize-weights \
    --save-combined ./vectors/evil_italian_combined.pt
```

### Generate Baseline (Unsteered)
```bash
python -m wisent.core.main multi-steer \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --prompt "What's your favorite food?" \
    --max-new-tokens 150
```

### Key Parameters:
- **`--trait`**: Natural language description of the personality trait
- **`--layers`**: Target layer(s) for activation collection
- **`--vector PATH:WEIGHT`**: Path to a `.pt` steering vector and the weight applied to it; repeat the flag once per vector
- **`--normalize-weights`**: Normalize the supplied weights before the vectors are combined
- **`--save-combined`**: Save the combined vector for later use
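
The `PATH:WEIGHT` form can be split on the last colon, so that paths containing colons (a Windows drive letter, for example) still parse. This is an illustrative parser, not the one Wisent uses; `parse_vector_spec` is a hypothetical name.

```python
from pathlib import Path

def parse_vector_spec(spec):
    """Split 'PATH:WEIGHT' on the last colon and return (Path, float)."""
    path_str, sep, weight_str = spec.rpartition(":")
    if not sep or not path_str:
        raise ValueError(f"Expected PATH:WEIGHT, got {spec!r}")
    try:
        weight = float(weight_str)
    except ValueError:
        raise ValueError(f"Weight is not a number in {spec!r}") from None
    return Path(path_str), weight

path, weight = parse_vector_spec("./vectors/evil_vector.pt:0.8")
print(path.name, weight)  # evil_vector.pt 0.8
```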

### Tips:
- Middle layers (layer 8 of the 16 in Llama-3.2-1B-Instruct, for example) tend to work well for personality steering
- Write specific, detailed trait descriptions for better pair generation
- Experiment with different weight ratios to balance traits
- Save combined vectors to avoid recomputation