# ASIDE Command Generation Notebook

This notebook generates SLURM batch scripts for running ASIDE (Architecturally Separated Instruction-Data Embeddings) experiments. It automates the creation of training and evaluation commands for multiple model configurations.

## Overview

The ASIDE method applies orthogonal rotations to data token embeddings while keeping instruction embeddings unchanged, improving instruction-data separation in language models. This notebook generates the necessary SLURM commands to:

1. **Train models** with different embedding configurations (vanilla, ISE, ASIDE)
2. **Evaluate models** on separation and safety benchmarks
3. **Run comprehensive evaluations** including AlpacaEval and prompt injection tests

## Key Embedding Types

- **`single_emb`**: Vanilla model with standard embeddings
- **`ise`**: Instructional Segment Embedding baseline
- **`forward_rot`**: ASIDE method with π/2 orthogonal rotation applied to data tokens

## Usage

1. Set `your_env_name` to the path of your virtual environment's `activate` script
2. Run the training cells to generate the hyperparameter-sweep job scripts
3. Run the evaluation cell to generate the evaluation job script
4. Submit the generated `.sh` files to the SLURM scheduler with `sbatch`

## Output Files

- `{model}_training_1.sh`, `{model}_training_2.sh`: Training job scripts (split for parallel execution)
- `{model}_evals.sh`: Evaluation job scripts


## SLURM Configuration

Define SLURM job templates for training and evaluation tasks. These templates specify resource requirements and environment setup.

In [2]:
# Configure your virtual environment path here
# TODO: Replace '...' with your actual environment path (e.g., '/path/to/your/venv/bin/activate')
your_env_name = '...'

# SLURM template for training jobs
# Requires 8 GPUs, 1TB memory, 72-hour time limit
slurm_prefix_train = f"""#!/bin/bash
#SBATCH --job-name=Training
#SBATCH --output=slurm_outputs/training_%j.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task 10
#SBATCH --time=72:00:00
#SBATCH --mem=1024G
###SBATCH --mail-user=...
###SBATCH --mail-type=ALL
#SBATCH --no-requeue
#SBATCH --gres=gpu:8
#SBATCH --partition=gpu100
#SBATCH --export=NONE
#unset SLURM_EXPORT_ENV
module load python/3.12.8
module load cuda/12.4
module load tmux
source ~/.bashrc
source {your_env_name}
export TRANSFORMERS_CACHE='./transformer_cache'
export WANDB_MODE=disabled
"""

# SLURM template for evaluation jobs
# Requires 1 GPU, 196GB memory (less than training)
slurm_prefix_evals = f"""#!/bin/bash
#SBATCH --job-name=Evaluation
#SBATCH --output=slurm_outputs/evals_%j.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task 10
#SBATCH --time=72:00:00
#SBATCH --mem=196G
###SBATCH --mail-user=...
###SBATCH --mail-type=ALL
#SBATCH --no-requeue
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu100
#SBATCH --export=NONE
#unset SLURM_EXPORT_ENV
module load python/3.12.8
module load cuda/12.4
module load tmux
source ~/.bashrc
source {your_env_name}
export TRANSFORMERS_CACHE='./transformer_cache'
export WANDB_MODE=disabled
"""
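Because `deepspeed --num_gpus=8` only works if SLURM actually grants eight GPUs, it can help to check a template's `#SBATCH` header against the launch command before submitting. The sketch below uses hypothetical helpers (`sbatch_directives`, `requested_gpus`, `deepspeed_gpus` are not part of the original notebook); it assumes the simple `--gres=gpu:<count>` form used here and does not handle typed GRES such as `gpu:a100:8`.

```python
import re

def sbatch_directives(template):
    """Return the active '#SBATCH' options of a job-script header as a dict.

    Lines starting with '###SBATCH' (disabled in the templates above) are skipped.
    Both '--opt=value' and '--opt value' forms are accepted.
    """
    opts = {}
    for line in template.splitlines():
        if not line.startswith("#SBATCH "):
            continue
        m = re.match(r"--([\w-]+)(?:[= ]\s*(.*))?$", line[len("#SBATCH "):].strip())
        if m:
            opts[m.group(1)] = (m.group(2) or "").strip()
    return opts

def requested_gpus(opts):
    # '--gres=gpu:8' -> 8; typed GRES such as 'gpu:a100:8' are not handled
    m = re.fullmatch(r"gpu:(\d+)", opts.get("gres", ""))
    return int(m.group(1)) if m else 0

def deepspeed_gpus(command):
    # 'deepspeed --num_gpus=8 ...' -> 8, or None if the flag is absent
    m = re.search(r"--num_gpus=(\d+)", command)
    return int(m.group(1)) if m else None

# Sample header that requests a single GPU
header = """#!/bin/bash
#SBATCH --job-name=Training
#SBATCH --cpus-per-task 10
#SBATCH --gres=gpu:1
###SBATCH --mail-type=ALL
"""
cmd = "srun --export=ALL deepspeed --num_gpus=8 --master_port=29509 fine-tune.py"
opts = sbatch_directives(header)
if requested_gpus(opts) != deepspeed_gpus(cmd):
    print(f"Mismatch: SLURM grants {requested_gpus(opts)} GPU(s), deepspeed expects {deepspeed_gpus(cmd)}")
```

In the notebook itself, `sbatch_directives(slurm_prefix_train)` can be passed instead of the sample header.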

## Example: Qwen 2.5 7B Training Commands

Generate hyperparameter sweep commands for Qwen 2.5 7B model training. This creates a comprehensive grid search across:

- **Embedding types**: ASIDE (`forward_rot`), vanilla (`single_emb`), ISE baseline (`ise`)
- **Learning rates**: 1e-6, 5e-6, 1e-5, 2e-5
- **Batch configurations**: per-device batch size 2 with 4 gradient accumulation steps, or 4 with 8
- **Warmup ratios**: 0 (no warmup) and 0.1 (10% warmup)
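The number of runs and the effective batch size of each configuration follow from simple arithmetic. This sketch assumes standard data parallelism across the 8 DeepSpeed ranks, so the effective batch is per-device batch × accumulation steps × number of GPUs; `effective_batch_size` is a hypothetical helper.

```python
import math

NUM_GPUS = 8  # matches deepspeed --num_gpus=8 in base_command

# Sizes of the non-singleton dimensions of the Qwen 2.5 7B grid
grid_sizes = {
    "--emb_type": 3,          # forward_rot, single_emb, ise
    "batch_and_accum": 2,     # (2, 4) and (4, 8)
    "--learning_rate": 4,
    "--warmup_ratio": 2,
}

def effective_batch_size(per_device, accum_steps, num_gpus=NUM_GPUS):
    # Examples per optimizer step under plain data parallelism
    return per_device * accum_steps * num_gpus

n_runs = math.prod(grid_sizes.values())
print(n_runs)                       # 48
print(effective_batch_size(2, 4))   # 64
print(effective_batch_size(4, 8))   # 256
```

Under the same assumption, the Llama 2 13B grid also has 48 runs, with effective batch sizes of 64 and 128.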

### Key ASIDE Parameters

- `--rotation_alpha 1.57079633`: π/2 radians (90-degree rotation)
- `--embedding_init rot_isoclinic`: Isoclinic rotation initialization
- `--learned_rotation False`: Fixed rotation (not learned during training)
- `--rotation_direction right`: Direction of rotation matrix application
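To make these parameters concrete, here is a minimal sketch of an isoclinic rotation, assuming it rotates every consecutive pair of embedding coordinates by the same angle `rotation_alpha` (a block-diagonal rotation with identical 2×2 blocks). It is illustrative only, not the code in `fine-tune.py`; in particular, the sign convention implied by `--rotation_direction right` is not reproduced, and `isoclinic_rotate` and `embed_token` are hypothetical names. With α = π/2, each rotated data-token embedding is orthogonal to its original while keeping the same norm.

```python
import math

ROTATION_ALPHA = 1.57079633  # value passed via --rotation_alpha (~pi/2)

def isoclinic_rotate(vec, alpha=ROTATION_ALPHA):
    """Rotate each coordinate pair (vec[2i], vec[2i+1]) by the same angle `alpha`.

    Assumes an even embedding dimension and a counter-clockwise rotation in
    every plane; the convention used by fine-tune.py may differ.
    """
    if len(vec) % 2 != 0:
        raise ValueError("embedding dimension must be even")
    c, s = math.cos(alpha), math.sin(alpha)
    rotated = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        rotated.extend([c * x - s * y, s * x + c * y])  # 2x2 rotation in one plane
    return rotated

def embed_token(emb, is_data):
    # Instruction tokens keep their embedding; data tokens get the fixed rotation
    return isoclinic_rotate(emb) if is_data else list(emb)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

emb = [0.3, -1.2, 0.8, 0.5]
data_emb = embed_token(emb, is_data=True)
print(abs(ROTATION_ALPHA - math.pi / 2) < 1e-8)               # True: the flag is pi/2
print(abs(dot(emb, data_emb)) < 1e-6)                         # True: orthogonal to the original
print(math.isclose(dot(emb, emb), dot(data_emb, data_emb)))   # True: norm preserved
```

Because `--learned_rotation False`, the angle stays fixed during training in this reading; only the rest of the model adapts to the rotated data embeddings.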

In [2]:
import itertools

# Base command template for distributed training
base_command = "srun --export=ALL deepspeed --num_gpus=8 --master_port=29509 fine-tune.py"

# Hyperparameter grid for Qwen 2.5 7B experiments
# Creates Cartesian product of all parameter combinations
params = {
    "--model_family": ["qwen_2.5_7b"],
    "--train_version": ["SFTv70"], # Training dataset version
    "--emb_type": ["forward_rot","single_emb", "ise"], # ASIDE, vanilla, ISE baseline
    "--model_ix": ["0"], # Model index for identification
    "--run_number": [None], # Auto-assigned sequential run number
    "--train_type": ["full"], # Full fine-tuning (not LoRA)
    "--num_train_epochs": ["3"], # Training epochs
    # Combined batch size and gradient accumulation parameters
    "batch_and_accum": ["--per_device_train_batch_size 2 --gradient_accumulation_steps 4",
                       "--per_device_train_batch_size 4 --gradient_accumulation_steps 8"],
    "--learning_rate": ["1e-6", "5e-6", "1e-5", "2e-5"], # Learning rate sweep
    "--lr_scheduler_type": ["cosine"], # Cosine annealing scheduler
    "--warmup_ratio": ["0","0.1"], # No warmup vs 10% warmup
    "--logging_steps": ["10"],
    "--evaluation_strategy": ["epoch"],
    "--save_strategy": ["epoch"],
    "--eval_steps": ["1"],
    "--save_steps": ["1"],
    "--save_total_limit": ["1"],
    "--load_best_model_at_end": ["True"],
    "--prediction_loss_only": ["True"],
    "--bf16": ["True"], # Mixed precision training
    # ASIDE-specific parameters
    "--embedding_init": ["rot_isoclinic"], # Isoclinic rotation initialization
    "--rotation_alpha": ["1.57079633"], # π/2 rotation angle
    "--learned_rotation": ["False"], # Fixed rotation matrix
    "--add_linear_shift": ["False"], # No additional linear transformation
    "--rotation_direction": ["right"], # Right multiplication direction
    "--gradual_rotation": ["False"] # Apply rotation immediately, not gradually
}

# Generate all parameter combinations
keys = list(params.keys())
values = list(params.values())

commands = []
command_num = 0

for combo in itertools.product(*values):
    command = base_command
    for key, value in zip(keys, combo):
        if key == "batch_and_accum":
            # Special handling for combined batch/accumulation parameters
            command += " " + value
        elif key == "--run_number":
            # Auto-assign sequential run numbers
            command += f" {key} {command_num}"
        else:
            command += f" {key} {value}"
    command_num += 1
    commands.append(command + "\n")

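# Split commands into two files for parallel execution
with open("qwen_2.5_7b_training_1.sh", "w") as file:
    file.write("\n".join([slurm_prefix_train] + commands[:len(commands)//2]))
with open("qwen_2.5_7b_training_2.sh", "w") as file:
    file.write("\n".join([slurm_prefix_train] + commands[len(commands)//2:]))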


print(f"{len(commands)} commands have been written")

48 commands have been written


## Llama 2 13B Training Commands

Generate training commands for the Llama 2 13B model. The hyperparameter grid matches the Qwen 2.5 7B sweep except for:

- **Batch configuration**: per-device batch size fixed at 2, with 4 or 8 gradient accumulation steps, adjusted for the 13B model size
- **Model-specific settings**: training data version `SFTv110` and model index `1`

The larger model requires more conservative batch sizes to fit in GPU memory.

In [None]:
import itertools

# Base command for Llama 2 13B training
base_command = "srun --export=ALL deepspeed --num_gpus=8 --master_port=29509 fine-tune.py"

# Hyperparameter configuration for Llama 2 13B
params = {
    "--model_family": ["llama_2_13b"],
    "--train_version": ["SFTv110"], # Different training dataset version
    "--emb_type": ["forward_rot","single_emb", "ise"], # Same embedding types
    "--model_ix": ["1"], # Different model index
    "--run_number": [None],
    "--train_type": ["full"],
    "--num_train_epochs": ["3"],
    # Adjusted batch sizes for larger 13B model
    "batch_and_accum": ["--per_device_train_batch_size 2 --gradient_accumulation_steps 4",
                       "--per_device_train_batch_size 2 --gradient_accumulation_steps 8"],
    "--learning_rate": ["1e-6", "5e-6", "1e-5", "2e-5"],
    "--lr_scheduler_type": ["cosine"],
    "--warmup_ratio": ["0","0.1"],
    "--logging_steps": ["10"],
    "--evaluation_strategy": ["epoch"],
    "--save_strategy": ["epoch"],
    "--eval_steps": ["1"],
    "--save_steps": ["1"],
    "--save_total_limit": ["1"],
    "--load_best_model_at_end": ["True"],
    "--prediction_loss_only": ["True"],
    "--bf16": ["True"],
    # Same ASIDE parameters as the Qwen 2.5 7B sweep
    "--embedding_init": ["rot_isoclinic"],
    "--rotation_alpha": ["1.57079633"],
    "--learned_rotation": ["False"],
    "--add_linear_shift": ["False"],
    "--rotation_direction": ["right"],
    "--gradual_rotation": ["False"]
}

keys = list(params.keys())
values = list(params.values())

commands = []
command_num = 0

for combo in itertools.product(*values):
    command = base_command
    for key, value in zip(keys, combo):
        if key == "batch_and_accum":
            command += " " + value
        elif key == "--run_number":
            command += f" {key} {command_num}"
        else:
            command += f" {key} {value}"
    command_num += 1
    commands.append(command + "\n")

# Write split training scripts
with open("llama_2_13b_training_1.sh", "w") as file:
    file.write("\n".join([slurm_prefix_train] + commands[:len(commands)//2]))
with open("llama_2_13b_training_2.sh", "w") as file:
    file.write("\n".join([slurm_prefix_train] + commands[len(commands)//2:]))

print(f"{len(commands)} commands have been written")

48 commands have been written
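The training cells split the commands into exactly two scripts. If a sweep should be spread over more jobs, a helper along these lines could generalize the split. `split_evenly` and `write_job_scripts` are hypothetical names, and the `_training_<i>.sh` naming simply follows the convention above.

```python
import os

def split_evenly(items, n_parts):
    """Split `items` into `n_parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def write_job_scripts(prefix, commands, stem, n_parts, out_dir="."):
    """Write <stem>_training_1.sh ... <stem>_training_<n_parts>.sh, each with the SLURM header."""
    paths = []
    for i, chunk in enumerate(split_evenly(commands, n_parts), start=1):
        path = os.path.join(out_dir, f"{stem}_training_{i}.sh")
        with open(path, "w") as f:
            f.write("\n".join([prefix] + chunk))
        paths.append(path)
    return paths
```

For an even number of commands, `write_job_scripts(slurm_prefix_train, commands, "llama_2_13b", n_parts=2)` produces the same two files as the cell above; a larger `n_parts` spreads the sweep over more jobs.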


## Evaluation Command Generation

Generate comprehensive evaluation commands for trained models. The evaluation pipeline includes:

1. **SEP Dataset Evaluation**: Instruction-data separation scoring
2. **AlpacaEval**: General capability assessment 
3. **Structured Query (StruQ)**: Prompt injection robustness testing

### Evaluation Pipeline Overview

Each model goes through four evaluation stages:
1. `get_model_outputs.py`: Extract model outputs for SEP dataset
2. `get_alpaca_outputs.py`: Generate outputs for AlpacaEval
3. `test_on_struq.py`: Test prompt injection robustness
4. `alpaca_eval`: Compute final AlpacaEval scores

### Model Mapping Format

The mapping dictionary specifies which trained models to evaluate:
```python
{
    "embedding_type": ("model_directory_name", "run_number"),
    ...
}
```
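Each mapping entry resolves to a checkpoint directory and, for AlpacaEval, to an output JSON file. The hypothetical helpers below reproduce the path patterns hard-coded in `generate_commands`, which can be handy for checking that a checkpoint exists before submitting. The run number is presumably the `--run_number` assigned during the training sweep.

```python
def checkpoint_dir(model_name, model_type, sft, run_number):
    # Same pattern as the --model argument in steps 2 and 3 (relative to evals/ or struq/)
    return (f"../models/{model_name}/{model_type}/train_checkpoints/"
            f"{sft}/from_base_run_{run_number}/last/")

def alpaca_output_json(model_name, embedding_type, model_type, sft, run_number):
    # Same pattern as the file passed to alpaca_eval in step 4
    directory = "alpaca_eval" if embedding_type == "single_emb" else "alpaca_farm"
    return (f"./data/tatsu-lab/{directory}/"
            f"{model_name}_{model_type}_train_checkpoints_{sft}_from_base_run_{run_number}_last__l-1_s42.json")

example_mapping = {
    "single_emb": ("pretrained_vanilla", "20"),
    "forward_rot": ("forward_rot", "15"),
}
for emb_type, (model_type, run_number) in example_mapping.items():
    print(emb_type, checkpoint_dir("llama_2_13b", model_type, "SFTv110", run_number))
```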

In [3]:
def generate_commands(mapping, model_name, sft):
    """
    Generate comprehensive evaluation commands for ASIDE experiments.
    
    This function creates a complete evaluation pipeline for trained models,
    including SEP dataset evaluation, AlpacaEval, and prompt injection testing.
    
    Args:
        mapping (dict): Dictionary mapping embedding types to model configurations.
            Format: {
                embedding_type_1: (model_type_1, run_number_1),
                embedding_type_2: (model_type_2, run_number_2), 
                ...
            }
            
        model_name (str): Base model name (e.g., "llama_3.1_8b", "mistral_7b")
        
        sft (str): Supervised fine-tuning version identifier (e.g., "SFTv110")
    
    Returns:
        list: List of command strings ready for SLURM execution
        
    Example:
        >>> mapping = {
        ...     "single_emb": ("pretrained_vanilla", "20"),
        ...     "forward_rot": ("forward_rot", "15")
        ... }
        >>> commands = generate_commands(mapping, "llama_2_13b", "SFTv110")
    """
    commands = []
    
    # ------------------------------------------------------------------
    # 1) SEP Dataset Evaluation
    #    Evaluates instruction-data separation using the SEP benchmark
    #    This is the core metric for ASIDE effectiveness
    # ------------------------------------------------------------------
    port = 29700
    for i, (embedding_type, (actual_model_type, run_number)) in enumerate(mapping.items()):
        port = port + i + 1  # Avoid port conflicts in parallel execution
        cmd = (
            f"srun --export=ALL torchrun --nproc_per_node=1 --master_port={port} "
            f"get_model_outputs.py {embedding_type} {model_name} 1 {sft} {actual_model_type} {run_number}"
        )
        commands.append(cmd)

    # ------------------------------------------------------------------
    # 2) AlpacaEval Output Generation
    #    Generates model outputs for general capability assessment
    #    Uses different datasets based on embedding type for historical reasons
    # ------------------------------------------------------------------
    for i, (embedding_type, (actual_model_type, run_number)) in enumerate(mapping.items()):
        port = port + i + 1

        # Dataset selection logic (legacy from original experiments)
        if embedding_type == "single_emb":
            data_path = "data/tatsu-lab/alpaca_eval/eval.json"
            use_input = False  # Vanilla models don't use input separation
        else:
            data_path = "data/tatsu-lab/alpaca_farm/eval.json" 
            use_input = True   # ASIDE/ISE models use input separation
            
        cmd = (
            f"srun --chdir=evals --export=ALL torchrun --nproc_per_node=1 --master_port={port} "
            f"get_alpaca_outputs.py --data-path {data_path} {'--use-input True ' if use_input else ''}"
            f"--model ../models/{model_name}/{actual_model_type}/train_checkpoints/{sft}/from_base_run_{run_number}/last/ "
            f"--embedding-type {embedding_type} --batch-size 32"
        )
        commands.append(cmd)

    # ------------------------------------------------------------------
    # 3) Structured Query (StruQ) Prompt Injection Testing
    #    Tests robustness against various prompt injection attacks
    #    Core safety evaluation for ASIDE method
    # ------------------------------------------------------------------
    for i, (embedding_type, (actual_model_type, run_number)) in enumerate(mapping.items()):
        port = port + i + 1
        
        cmd = (
            f"srun --chdir=struq --export=ALL torchrun --nproc_per_node=1 --master_port={port} "
            f"test_on_struq.py --domain all --attack all "
            f"--model ../models/{model_name}/{actual_model_type}/train_checkpoints/{sft}/from_base_run_{run_number}/last/ "
            f"--embedding_type {embedding_type} --batch_size 32"
        )
        commands.append(cmd)
        
    # ------------------------------------------------------------------
    # 4) AlpacaEval Score Computation
    #    Computes final capability scores using AlpacaEval framework
    #    Uses model outputs generated in step 2
    # ------------------------------------------------------------------
    for embedding_type, (actual_model_type, run_number) in mapping.items():
        # Directory selection matches step 2 logic
        if embedding_type == "single_emb":
            directory = "alpaca_eval"
        else:
            directory = "alpaca_farm"

        # Build path to generated output JSON file
        # Format follows the pattern from get_alpaca_outputs.py
        json_path = (
            f"./data/tatsu-lab/{directory}/"
            f"{model_name}_{actual_model_type}_train_checkpoints_{sft}_from_base_run_{run_number}_last__l-1_s42.json"
        )

        cmd = (
            f"IS_ALPACA_EVAL_2=False alpaca_eval --model_outputs {json_path}"
        )
        commands.append(cmd)

    return commands


# Example configuration for Llama 2 13B evaluation
# Maps embedding types to their corresponding trained model directories and run numbers
mapping = {
    "single_emb": ("pretrained_vanilla", "20"),    # Vanilla baseline
    "ise": ("ise", "36"),                         # ISE baseline
    "forward_rot": ("forward_rot", "15"),          # ASIDE method
}
model_name = "llama_2_13b"
sft = "SFTv110"

# Generate evaluation commands
all_commands = generate_commands(mapping, model_name, sft)

# Write evaluation script
with open("llama_2_13b_evals.sh", "w") as file:
    file.write("\n".join([slurm_prefix_evals] + all_commands))

print(f"{len(all_commands)} commands have been written")

12 commands have been written


## Usage Instructions

### 1. Training Phase
```bash
# Create the log directory first; SLURM does not create it for --output
mkdir -p slurm_outputs
# Submit training jobs
sbatch qwen_2.5_7b_training_1.sh
sbatch qwen_2.5_7b_training_2.sh
sbatch llama_2_13b_training_1.sh
sbatch llama_2_13b_training_2.sh
```

### 2. Evaluation Phase
```bash
# After training completes, submit the evaluation script
sbatch llama_2_13b_evals.sh
# (Add other model evaluation scripts as generated)
```
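The evaluation script can also be queued before training finishes by making it depend on the training jobs. This sketch uses the standard `sbatch` options `--parsable` (print only the job ID) and `--dependency=afterok:<id>[:<id>...]` (start only after every listed job completes successfully). It is only appropriate when the evaluation mapping, including run numbers, is fixed in advance; `sbatch_argv` and `submit` are hypothetical helpers and the job IDs shown are placeholders.

```python
import subprocess

def sbatch_argv(script, after=()):
    """Build an sbatch command line; `after` lists job IDs that must succeed first."""
    argv = ["sbatch", "--parsable"]
    if after:
        # afterok: start only once every listed job has completed with exit code 0
        argv.append("--dependency=afterok:" + ":".join(after))
    argv.append(script)
    return argv

def submit(script, after=()):
    """Submit `script` and return its job ID (requires a SLURM cluster)."""
    out = subprocess.run(sbatch_argv(script, after), check=True,
                         capture_output=True, text=True).stdout
    return out.strip().split(";")[0]  # --parsable prints "jobid" or "jobid;cluster"

# On a cluster:
#   train_ids = [submit(f"llama_2_13b_training_{i}.sh") for i in (1, 2)]
#   submit("llama_2_13b_evals.sh", after=train_ids)
print(" ".join(sbatch_argv("llama_2_13b_evals.sh", after=["1001", "1002"])))  # placeholder IDs
```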

### 3. Monitoring
- Check SLURM output logs in `slurm_outputs/` directory
- Monitor training progress and resource usage
- Evaluation results will be saved in respective output directories

### 4. Customization
To adapt for different models or experiments:
1. Modify parameter grids in the `params` dictionaries
2. Update model names and training versions
3. Adjust resource requirements in SLURM templates
4. Modify evaluation mappings for different trained models
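The two training cells repeat the same expansion loop. When customizing, a small reusable function like the hypothetical `build_commands` below could replace both. It follows the same rules as the cells above: `--run_number` is set to the combination index, other `--` keys are emitted as `<key> <value>`, and keys without a leading `--` (such as `batch_and_accum`) are inserted verbatim.

```python
import itertools

def build_commands(base_command, params):
    """Expand a parameter grid into one command string per combination.

    - '--run_number' receives the combination index (0, 1, 2, ...)
    - other '--' keys are emitted as '<key> <value>'
    - keys without '--' (e.g. 'batch_and_accum') are inserted verbatim
    """
    keys = list(params)
    commands = []
    for run_idx, combo in enumerate(itertools.product(*params.values())):
        parts = [base_command]
        for key, value in zip(keys, combo):
            if key == "--run_number":
                parts.append(f"{key} {run_idx}")
            elif key.startswith("--"):
                parts.append(f"{key} {value}")
            else:
                parts.append(value)
        # Trailing newline kept so the output matches the training cells above
        commands.append(" ".join(parts) + "\n")
    return commands

cmds = build_commands(
    "srun deepspeed --num_gpus=8 fine-tune.py",
    {"--model_family": ["llama_2_13b"], "--run_number": [None], "--learning_rate": ["1e-5", "2e-5"]},
)
print(len(cmds))  # 2
print(cmds[1], end="")
```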