Issue Description
The current documentation for the is_decoder configuration parameter is misleading and causes confusion. The parameter name suggests it determines whether a model is a decoder (like GPT-2, Llama, etc.), but in reality it specifically controls whether a model functions as the decoder component within an encoder-decoder architecture.
This is confusing because autoregressive models like GPT-2 and Llama 3 have is_decoder=False by default, despite being decoder-only architectures.
Current Behavior
The documentation currently states:
is_decoder (bool, optional, defaults to False) — Whether the model is used as decoder or not (in which case it's used as an encoder).
This description doesn't clarify that this parameter is specifically for encoder-decoder architectures and cross-attention functionality.
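Reproduction
A minimal reproduction (a sketch, assuming transformers is installed and the gpt2 checkpoint can be downloaded from the Hub):

```python
from transformers import AutoConfig

# GPT-2 is architecturally a decoder-only model,
# yet its config reports is_decoder=False by default
config = AutoConfig.from_pretrained("gpt2")
print(config.is_decoder)
```

Output:

```
False
```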
Expected Behavior
The documentation should clearly explain that:
- is_decoder=True specifically enables cross-attention layers and related functionality, and is only meant for models used as decoders in encoder-decoder architectures.
- Standalone autoregressive models (like GPT-2, Llama) use is_decoder=False by default, despite being decoder-only architectures.
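To illustrate what the flag actually controls, here is a minimal sketch using BERT (assuming the current transformers implementation, where add_cross_attention requires is_decoder=True; the tiny config values are arbitrary and just keep the example cheap):

```python
from transformers import BertConfig, BertModel

# small config so the models are cheap to instantiate
tiny = dict(hidden_size=32, num_hidden_layers=1,
            num_attention_heads=2, intermediate_size=64)

# used as a standalone encoder: no cross-attention layers are created
encoder = BertModel(BertConfig(**tiny))
print(hasattr(encoder.encoder.layer[0], "crossattention"))  # False

# used as the decoder of an encoder-decoder setup:
# is_decoder=True (together with add_cross_attention=True) adds
# cross-attention layers that attend to the encoder's outputs
decoder = BertModel(BertConfig(is_decoder=True, add_cross_attention=True, **tiny))
print(hasattr(decoder.encoder.layer[0], "crossattention"))  # True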
Suggested Documentation Update
- Update the documentation as described in Expected Behavior (a possible wording is sketched after this list).
- Maybe remove the is_decoder param from the config settings of decoder-only and encoder-only transformer architectures?
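One possible rewording of the docstring (a sketch, not final wording):
is_decoder (bool, optional, defaults to False) — Whether the model is used as the decoder within an encoder-decoder architecture, in which case cross-attention layers are enabled so it can attend to encoder outputs. Standalone decoder-only models such as GPT-2 and Llama keep the default False.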
Hi @d-kleine, I think we'd definitely support a documentation update PR here! I'm less sure about removing the param - you can experiment with it, but I think that might be breaking for a lot of workflows. Would you be willing to make the documentation update PR first, and then we can think about future steps afterwards if you want?