
'Cache only has 0 layers' during generation after upgrading Transformers from 4.49 to 4.50 #36913

edwardlee4948 opened this issue Mar 24, 2025 · 2 comments


@edwardlee4948

System Info

After upgrading transformers from 4.49.0 to 4.50.0, calling .generate() on a PEFT model loaded through Unsloth raises a KeyError about cache layers:

File "//.venv/lib/python3.12/site-packages/peft/peft_model.py", line 1874, in generate outputs = self.base_model.generate(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1579, in unsloth_fast_generate output = self._old_generate(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2326, in generate result = self._sample( ^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3286, in _sample outputs = self(**model_inputs, return_dict=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1026, in _CausalLM_fast_forward outputs = fast_forward_inference( ^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 938, in LlamaModel_fast_forward_inference seq_len = past_key_values[0][0].shape[-2] ~~~~~~~~~~~~~~~^^^ File "//.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 387, in __getitem__ raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}") KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

Are there known changes to cache handling in 4.50 that could explain this?

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce the KeyError: 'Cache only has 0 layers' error using Unsloth with transformers==4.50.0, load a PEFT model through Unsloth and call .generate(); a minimal sketch of such a reproducer is below.
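Since the snippet itself is missing above, the following is a hypothetical reconstruction; the model name, LoRA settings, and prompt are illustrative assumptions. Any Llama-family model loaded through Unsloth's FastLanguageModel with a PEFT adapter should hit the same code path:

```python
# Hypothetical reproducer -- model name and hyperparameters are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed; any Llama-family checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
FastLanguageModel.for_inference(model)  # enables Unsloth's fast generate path

inputs = tokenizer("Hello, world!", return_tensors="pt").to("cuda")
# Under transformers==4.50.0 this raises:
# KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
output = model.generate(**inputs, max_new_tokens=32)
```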

Expected behavior

It should run .generate() without raising a KeyError, as it did under transformers==4.49.0.
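For what it's worth, a version-agnostic way to read the cached sequence length might look like the following sketch (an assumption about where a fix could go, not a confirmed patch; cached_seq_len is a hypothetical helper):

```python
# Sketch of a version-agnostic length check for the calling code.
from transformers.cache_utils import Cache

def cached_seq_len(past_key_values):
    if isinstance(past_key_values, Cache):       # new-style Cache objects
        return past_key_values.get_seq_length()  # returns 0 for an empty cache
    return past_key_values[0][0].shape[-2]       # legacy tuple-of-tuples format
```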

@zucchini-nlp
Member

Hey! We would need a minimal reproducer that doesn't rely on Unsloth so we can help you 🤗

cc @gante

@gante
Member

gante commented Mar 26, 2025

Possibly related to #37014
