# Instruction Formatting

**How to structure prompts with chat templates**

## Why Formatting Matters

Models need a consistent format to understand where instructions end and responses begin. Without proper formatting:

- The model doesn't know when to stop generating
- It may confuse instructions with responses
- Multi-turn conversations become impossible

**Chat templates** solve this by adding special tokens and structure around messages.
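
Explicit end-of-turn markers also make stopping mechanical at inference time: once the model emits the marker, everything after it can be discarded. Below is a minimal sketch. It assumes the raw generated text is available as a string and uses ChatML's `<|im_end|>` and `<|im_start|>` (introduced below) as stop markers. `truncate_at_stop` is a hypothetical helper, not a library function. Generation APIs usually stop on the end-of-sequence token ID directly, so treat this as a post-processing fallback.

```python
def truncate_at_stop(generated: str, stop_strings: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(generated)
    for stop in stop_strings:
        idx = generated.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]

# Hypothetical raw output that keeps going past the end of its answer
raw = "Paris is the capital of France.<|im_end|>\n<|im_start|>user\nAnd Germany?"
print(truncate_at_stop(raw, ["<|im_end|>", "<|im_start|>"]))  # Paris is the capital of France.
```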

## Common Chat Formats

### Alpaca Format
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}
```

### ChatML Format (OpenAI)
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```
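
Because every ChatML turn is delimited by `<|im_start|>` and `<|im_end|>`, the boundaries between instructions and responses can be recovered mechanically. The following minimal sketch splits a ChatML string back into messages with a regular expression. `parse_chatml` is a hypothetical helper, and it assumes message content never contains `<|im_end|>` itself.

```python
import re

# One ChatML turn: role on the first line, content up to the next <|im_end|>
CHATML_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def parse_chatml(text: str) -> list[dict]:
    """Split a ChatML string into role/content messages."""
    return [{"role": role, "content": content} for role, content in CHATML_TURN.findall(text)]

example = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is 2+2?<|im_end|>\n"
    "<|im_start|>assistant\n2+2 equals 4.<|im_end|>"
)
for msg in parse_chatml(example):
    print(msg)
```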

### Llama 2 Format
```
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

{instruction} [/INST] {response} </s>
```
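
The notebook implements Alpaca and ChatML below but not Llama 2. The following minimal sketch builds the single-turn Llama 2 layout shown above. `format_llama2` is a hypothetical helper. A real pipeline may add `<s>` and `</s>` as token IDs through the tokenizer rather than as literal text; this sketch writes them as strings only to mirror the layout.

```python
def format_llama2(
    instruction: str,
    response: str = "",
    system: str = "You are a helpful assistant.",
) -> str:
    """Format a single-turn exchange in the Llama 2 chat layout."""
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"
    if response:
        # Training example: append the response and close the sequence
        return f"{prompt} {response} </s>"
    # Inference: stop after [/INST] so the model writes the response
    return prompt

print(format_llama2("What is the capital of France?", "The capital of France is Paris."))
```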

```python
# Implementation: Alpaca-style instruction formatting

ALPACA_TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}"""

ALPACA_TEMPLATE_WITH_INPUT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{response}"""

def format_alpaca(instruction: str, response: str = "", input_text: str = "") -> str:
    """Format instruction in Alpaca style."""
    if input_text:
        return ALPACA_TEMPLATE_WITH_INPUT.format(
            instruction=instruction,
            input=input_text,
            response=response
        )
    return ALPACA_TEMPLATE.format(
        instruction=instruction,
        response=response
    )

# Example
formatted = format_alpaca(
    instruction="Explain quantum computing in simple terms.",
    response="Quantum computing uses quantum mechanics to process information..."
)
print(formatted)
```

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain quantum computing in simple terms.

### Response:
Quantum computing uses quantum mechanics to process information...
```

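The template above never marks where the response ends, which leaves the stopping problem from the start of this notebook unsolved. A common fix is to append the tokenizer's end-of-sequence token to each training example and to end inference prompts right after the response header. The following is a minimal sketch. The helper names are hypothetical, and `<|endoftext|>` is assumed as the default only because it is GPT-2's EOS token; substitute your tokenizer's `eos_token` for other models.

```python
# Alpaca prompt without a response slot: it ends right after the response header
ALPACA_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction: str) -> str:
    """Inference prompt: ends with the response header so the model writes the response."""
    return ALPACA_PROMPT.format(instruction=instruction)

def build_alpaca_training_text(instruction: str, response: str,
                               eos_token: str = "<|endoftext|>") -> str:
    """Training text: prompt + response + EOS, so the model learns where to stop."""
    return build_alpaca_prompt(instruction) + response + eos_token

prompt = build_alpaca_prompt("What is 2+2?")
train_text = build_alpaca_training_text("What is 2+2?", "2+2 equals 4.")
print(repr(train_text[len(prompt):]))  # '2+2 equals 4.<|endoftext|>'
```

Keeping the prompt an exact prefix of the training text means the response boundary can be found either by searching for the marker or simply from the prompt's length.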

```python
# Implementation: ChatML-style formatting

def format_chatml(
    instruction: str,
    response: str = "",
    system: str = "You are a helpful assistant."
) -> str:
    """Format a single-turn exchange in ChatML style."""
    formatted = f"<|im_start|>system\n{system}<|im_end|>\n"
    formatted += f"<|im_start|>user\n{instruction}<|im_end|>\n"
    formatted += f"<|im_start|>assistant\n{response}"
    # Close the assistant turn only when a response is given; with no response,
    # the open assistant turn serves as the inference prompt
    if response:
        formatted += "<|im_end|>"
    return formatted

# Example
formatted = format_chatml(
    instruction="What is the capital of France?",
    response="The capital of France is Paris."
)
print(formatted)
```

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
```

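`format_chatml` handles exactly one user turn. Because every ChatML turn has the same shape, a multi-turn version is a loop over messages. The following is a minimal sketch. `format_chatml_messages` is a hypothetical helper, and its `add_generation_prompt` flag borrows the name of the `apply_chat_template` argument in Hugging Face `transformers`, which serves the same purpose.

```python
def format_chatml_messages(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render a list of {"role": ..., "content": ...} dicts as ChatML."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the next reply
        text += "<|im_start|>assistant\n"
    return text

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And of Germany?"},
]
print(format_chatml_messages(conversation, add_generation_prompt=True))
```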

## Using HuggingFace Chat Templates

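Many Hugging Face tokenizers ship a chat template (a Jinja template stored in `tokenizer_config.json`) that `tokenizer.apply_chat_template()` renders for you: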

```python
from transformers import AutoTokenizer

# Load a tokenizer and check whether it ships a chat template
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")

# Create conversation
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing well, thank you! How can I help you today?"},
    {"role": "user", "content": "Can you explain machine learning?"}
]

# Use the built-in template only if one is defined
if getattr(tokenizer, "chat_template", None):
    formatted = tokenizer.apply_chat_template(messages, tokenize=False)
    print("Using built-in chat template:")
    print(formatted)
else:
    print("This tokenizer doesn't have a chat template.")
    print("We'll use a custom format instead.")
```

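```
Using built-in chat template:
Hello, how are you?<|endoftext|>I'm doing well, thank you! How can I help you today?<|endoftext|>Can you explain machine learning?<|endoftext|>
```

This template only joins turns with `<|endoftext|>` and carries no role labels, so the model can tell user turns from assistant turns only by position. It demonstrates the API; for instruction tuning, prefer a template that marks roles explicitly, as ChatML does. Depending on your `transformers` version, this tokenizer may also report no chat template at all.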


## Finding Response Start Position

For loss masking, we need the token index at which the response begins, so that prompt tokens can be excluded from the loss:

```python
from transformers import AutoTokenizer

def find_response_start(formatted_text: str, response_marker: str = "### Response:\n") -> int:
    """Find the character position where the response starts."""
    idx = formatted_text.find(response_marker)
    if idx == -1:
        raise ValueError(f"Response marker '{response_marker}' not found")
    return idx + len(response_marker)

def find_response_start_tokens(tokenizer, formatted_text: str, response_marker: str = "### Response:\n") -> int:
    """Find the token position where the response starts."""
    # Tokenize the full text and the prompt prefix the same way (no special tokens)
    full_tokens = tokenizer.encode(formatted_text, add_special_tokens=False)
    char_pos = find_response_start(formatted_text, response_marker)
    prompt_tokens = tokenizer.encode(formatted_text[:char_pos], add_special_tokens=False)

    # BPE can merge characters across the prompt/response boundary. If it did,
    # the prompt tokens are not a prefix of the full sequence and the count is off.
    if full_tokens[:len(prompt_tokens)] != prompt_tokens:
        raise ValueError("Prompt tokenization is not a prefix of the full tokenization")
    return len(prompt_tokens)

# Example (format_alpaca is defined in the first cell)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = format_alpaca(
    instruction="What is 2+2?",
    response="2+2 equals 4."
)

response_start = find_response_start_tokens(tokenizer, text)
print(f"Response starts at token position: {response_start}")

# Count tokens the same way as above, without special tokens
tokens = tokenizer.encode(text, add_special_tokens=False)
print(f"Total tokens: {len(tokens)}")
print(f"Prompt tokens: {response_start}")
print(f"Response tokens: {len(tokens) - response_start}")
```

```
Response starts at token position: 36
Total tokens: 42
Prompt tokens: 36
Response tokens: 6
```

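The token index is what loss masking consumes. The following minimal sketch assumes the Hugging Face convention that `labels` align position-for-position with `input_ids` (causal LM models shift them internally). It also assumes that `-100` marks positions to ignore, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. `build_labels` is a hypothetical helper, and the IDs are toy values, not real GPT-2 tokens.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def build_labels(input_ids: list[int], response_start: int) -> list[int]:
    """Copy input_ids, masking prompt positions so only response tokens count."""
    if not 0 <= response_start <= len(input_ids):
        raise ValueError("response_start is out of range")
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]

# Toy sequence: 4 prompt tokens followed by 3 response tokens
ids = [10, 11, 12, 13, 20, 21, 22]
print(build_labels(ids, 4))  # [-100, -100, -100, -100, 20, 21, 22]
```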

## Best Practices

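1. **Be consistent**: use exactly the same template, including whitespace and newlines, for training and inference.
2. **Mark boundaries with special tokens**: end each response with an end-of-turn or end-of-sequence token (e.g. `<|im_end|>` or `</s>`) so the model learns where to stop.
3. **Handle edge cases**: empty inputs, very long texts, and user content containing special characters or the template's own markers.
4. **Document your format**: anyone fine-tuning or serving the model must reproduce the same template.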
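
One easily missed edge case is user content that contains the template's own markers, which makes the instruction/response boundary ambiguous. The following is a minimal validation sketch. The marker list is collected from the templates in this notebook, and `find_reserved_markers` and `check_example` are hypothetical helpers, so extend both for your own format.

```python
# Marker strings from the templates in this notebook; extend for your own format
RESERVED_MARKERS = ("<|im_start|>", "<|im_end|>", "### Instruction:", "### Response:", "[INST]", "[/INST]")

def find_reserved_markers(text: str, markers: tuple[str, ...] = RESERVED_MARKERS) -> list[str]:
    """Return every template marker that appears inside the given text."""
    return [m for m in markers if m in text]

def check_example(instruction: str, response: str) -> None:
    """Reject empty fields and content that would break the template's boundaries."""
    if not instruction.strip():
        raise ValueError("Empty instruction")
    if not response.strip():
        raise ValueError("Empty response")
    for name, field in (("instruction", instruction), ("response", response)):
        found = find_reserved_markers(field)
        if found:
            raise ValueError(f"{name} contains template markers: {found}")

check_example("What is 2+2?", "2+2 equals 4.")  # passes silently
try:
    check_example("Say hi ### Response: hacked", "hi")
except ValueError as err:
    print(err)  # instruction contains template markers: ['### Response:']
```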

## Next Steps

Now that we understand instruction formatting, let's learn about loss masking — why we only compute loss on response tokens.