'Cache only has 0 layers' during generation after upgrading Transformers from 4.49 to 4.50 #36913

Closed
@edwardlee4948

System Info

After updating transformers from version 4.49.0 to 4.50.0, running .generate() on a PEFT model throws a KeyError related to cache layers:

  File "//.venv/lib/python3.12/site-packages/peft/peft_model.py", line 1874, in generate
    outputs = self.base_model.generate(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1579, in unsloth_fast_generate
    output = self._old_generate(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2326, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3286, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1026, in _CausalLM_fast_forward
    outputs = fast_forward_inference(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 938, in LlamaModel_fast_forward_inference
    seq_len = past_key_values[0][0].shape[-2]
              ~~~~~~~~~~~~~~~^^^
  File "//.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 387, in __getitem__
    raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
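
The last two frames show the failure pattern: `past_key_values[0][0]` is indexed while the cache still holds zero layers. The sketch below reproduces that pattern without any external library. `TinyCache` and `cached_seq_len` are hypothetical names invented for illustration; they are not the transformers or Unsloth implementations. Keys are modeled as plain lists rather than tensors, so `len(key)` stands in for `.shape[-2]`.

```python
class TinyCache:
    """Hypothetical stand-in for a per-layer key/value cache (not the transformers class)."""

    def __init__(self):
        self.layers = []  # one (key, value) pair per layer

    def __len__(self):
        return len(self.layers)

    def __getitem__(self, layer_idx):
        if layer_idx < len(self):
            return self.layers[layer_idx]
        # Same message format as the raise statement quoted in the traceback.
        raise KeyError(
            f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}"
        )


def cached_seq_len(cache):
    """Return the cached sequence length, or 0 when no layer has been filled yet."""
    if cache is None or len(cache) == 0:
        return 0
    key, _value = cache[0]
    return len(key)  # assumption: a key is a list with one entry per cached token


# Indexing an empty cache directly raises, as in the report.
empty = TinyCache()
try:
    empty[0][0]
    message = None
except KeyError as err:
    message = err.args[0]

# A cache with one filled layer of three tokens.
filled = TinyCache()
filled.layers.append((["k0", "k1", "k2"], ["v0", "v1", "v2"]))
```

Checking `len(cache)` before indexing is one possible defensive pattern for code that reads the sequence length from the cache. Whether that is the appropriate fix for the code in the traceback is not established by this report.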

Let me know if you want me to cross-check for any known changes in 4.50 related to cache handling.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To reproduce the "KeyError: 'Cache only has 0 layers ...'" error using Unsloth with transformers==4.50.0, here is a minimal working example that triggers the issue during .generate():

Expected behavior

It should run .generate() without raising a KeyError, as it did under transformers==4.49.0.
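
When narrowing down a regression like this, it can help to record whether an environment is on the version line where generation reportedly worked (4.49.0) or at the one where the error first appeared (4.50.0). The following is a minimal sketch with hypothetical helper names. It compares only the leading numeric components, so a pre-release such as `4.50.0.dev0` is treated the same as `4.50.0`.

```python
import re

# First version at which this report observed the KeyError.
FIRST_REPORTED_FAILING = (4, 50, 0)


def version_tuple(version):
    """Extract the leading (major, minor, patch) integers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def at_or_after_reported_failure(version):
    """True if `version` is 4.50.0 or later, based on this report alone."""
    return version_tuple(version) >= FIRST_REPORTED_FAILING
```

In a real environment the version string could come from `importlib.metadata.version("transformers")`. The result only says which side of the reported boundary an install is on; it does not confirm that a given later release is affected.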
