
Misleading documentation for is_decoder configuration parameter #36482

Closed
d-kleine opened this issue Feb 28, 2025 · 1 comment · Fixed by #36724

Comments

@d-kleine
Contributor

Issue Description

The current documentation for the is_decoder configuration parameter is misleading and causes confusion. The parameter name suggests it determines whether a model is a decoder (like GPT-2, Llama, etc.), but in reality, it specifically controls whether a model functions as a decoder component within an encoder-decoder architecture.

This is confusing because autoregressive models like GPT-2 and Llama 3 have is_decoder=False by default, despite being decoder-only architectures.

Current Behavior

The documentation currently states:

is_decoder (bool, optional, defaults to False) — Whether the model is used as decoder or not (in which case it's used as an encoder).

This description doesn't clarify that this parameter is specifically for encoder-decoder architectures and cross-attention functionality.
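For context, a minimal sketch of where the flag actually changes model construction, using BERT repurposed as a decoder (this assumes BERT's internal layer attribute layout, which is an implementation detail and may differ between versions):

from transformers import BertConfig, BertModel

# Plain encoder config: no cross-attention layers are built.
encoder = BertModel(BertConfig())

# Decoder-style config, as used for the decoder half of an encoder-decoder setup:
# is_decoder=True enables causal masking, and add_cross_attention=True adds
# cross-attention layers that attend to the encoder's hidden states.
decoder = BertModel(BertConfig(is_decoder=True, add_cross_attention=True))

print(hasattr(encoder.encoder.layer[0], "crossattention"))  # False
print(hasattr(decoder.encoder.layer[0], "crossattention"))  # True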

Reproduction

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

seed = 42
torch.manual_seed(seed)

# GPT-2 has no dedicated pad token, so reuse the EOS token for padding.
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForTokenClassification.from_pretrained(
    "openai-community/gpt2",
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# GPT-2 is a decoder-only model, yet the flag defaults to False.
print("Default is_decoder:", model.config.is_decoder)

Output:

Default is_decoder: False

Expected Behavior

The documentation should clearly explain that:

  1. The config param is_decoder=True specifically enables cross-attention layers and related functionality for models used as the decoder component within an encoder-decoder architecture (see the sketch after this list).
  2. Standalone autoregressive models (like GPT-2, Llama) keep is_decoder=False by default despite being decoder-only architectures.
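A rough sketch illustrating both points, assuming that EncoderDecoderConfig.from_encoder_decoder_configs flips the flag on the decoder config automatically (my reading of the current behavior):

from transformers import BertConfig, EncoderDecoderConfig, GPT2Config

# Standalone decoder-only model: the flag stays False by default.
print(GPT2Config().is_decoder)  # False

# Composing an encoder-decoder model: the helper flips is_decoder
# (and add_cross_attention) on the decoder half only.
config = EncoderDecoderConfig.from_encoder_decoder_configs(BertConfig(), BertConfig())
print(config.encoder.is_decoder)  # False
print(config.decoder.is_decoder)  # True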

Suggested Documentation Update

  • Update documentation as described in Expected Behavior
  • Possibly remove the is_decoder param from the configs of decoder-only and encoder-only transformer architectures?
@Rocketknight1
Member

Hi @d-kleine, I think we'd definitely support a documentation update PR here! I'm less sure about removing the param - you can experiment with it, but I think that might be breaking for a lot of workflows. Would you be willing to make the documentation update PR first, and then we can think about future steps afterwards if you want?
