Issue Description
The current documentation for the is_decoder configuration parameter is misleading and causes confusion. The parameter name suggests it determines whether a model is a decoder (like GPT-2, Llama, etc.), but in reality it specifically controls whether a model functions as the decoder component within an encoder-decoder architecture.
This is confusing because autoregressive models like GPT-2 and Llama 3 have is_decoder=False by default, despite being decoder-only architectures.
Current Behavior
The documentation currently states:
is_decoder (bool, optional, defaults to False) — Whether the model is used as decoder or not (in which case it's used as an encoder).
This description doesn't clarify that this parameter is specifically for encoder-decoder architectures and cross-attention functionality.
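Reproduction
A minimal reproduction (a sketch, assuming transformers is installed and the gpt2 checkpoint can be downloaded from the Hub):

```python
from transformers import AutoConfig

# GPT-2 is architecturally a decoder-only model,
# yet its config reports is_decoder=False by default
config = AutoConfig.from_pretrained("gpt2")
print(config.is_decoder)
```

Output:

```
False
```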
Expected Behavior
The documentation should clearly explain that:
- is_decoder=True specifically enables cross-attention layers and related functionality, and is only meant for models used as decoders in encoder-decoder architectures.
- Standalone autoregressive models (like GPT-2, Llama) use is_decoder=False by default, despite being decoder-only architectures.
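To illustrate what the flag actually controls, here is a minimal sketch using BERT (assuming the current transformers implementation, where add_cross_attention requires is_decoder=True; the tiny config values are arbitrary and just keep the example cheap):

```python
from transformers import BertConfig, BertModel

# small config so the models are cheap to instantiate
tiny = dict(hidden_size=32, num_hidden_layers=1,
            num_attention_heads=2, intermediate_size=64)

# used as a standalone encoder: no cross-attention layers are created
encoder = BertModel(BertConfig(**tiny))
print(hasattr(encoder.encoder.layer[0], "crossattention"))  # False

# used as the decoder of an encoder-decoder setup:
# is_decoder=True (together with add_cross_attention=True) adds
# cross-attention layers that attend to the encoder's outputs
decoder = BertModel(BertConfig(is_decoder=True, add_cross_attention=True, **tiny))
print(hasattr(decoder.encoder.layer[0], "crossattention"))  # True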
Suggested Documentation Update
- Update the documentation as described in Expected Behavior (a possible wording is sketched after this list).
- Maybe remove the is_decoder param from the config settings of decoder-only and encoder-only transformer architectures?
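One possible rewording of the docstring (a sketch, not final wording):
is_decoder (bool, optional, defaults to False) — Whether the model is used as the decoder within an encoder-decoder architecture, in which case cross-attention layers are enabled so it can attend to encoder outputs. Standalone decoder-only models such as GPT-2 and Llama keep the default False.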
Hi @d-kleine, I think we'd definitely support a documentation update PR here! I'm less sure about removing the param - you can experiment with it, but I think that might be breaking for a lot of workflows. Would you be willing to make the documentation update PR first, and then we can think about future steps afterwards if you want?