Description
System Info
After updating transformers from version 4.49.0 to 4.50.0, running .generate() on a PEFT model throws a KeyError related to cache layers:
File "//.venv/lib/python3.12/site-packages/peft/peft_model.py", line 1874, in generate outputs = self.base_model.generate(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1579, in unsloth_fast_generate output = self._old_generate(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2326, in generate result = self._sample( ^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3286, in _sample outputs = self(**model_inputs, return_dict=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1026, in _CausalLM_fast_forward outputs = fast_forward_inference( ^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 938, in LlamaModel_fast_forward_inference seq_len = past_key_values[0][0].shape[-2] ~~~~~~~~~~~~~~~^^^ File "//.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 387, in __getitem__ raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}") KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
To reproduce the KeyError: 'Cache only has 0 layers' error with Unsloth and transformers==4.50.0, here is a minimal example that triggers the issue during .generate():
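The original snippet was not included in the report, so the following is a minimal sketch of the kind of Unsloth + PEFT workflow that reaches the failing code path, assuming the standard FastLanguageModel API. The checkpoint name, LoRA hyperparameters, and prompt are illustrative placeholders, not the reporter's actual values.

```python
# Minimal sketch (not the reporter's exact script): load a 4-bit model with
# Unsloth, attach a LoRA adapter, switch to fast inference, and call .generate().
# Model name and LoRA settings below are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Wrap the base model with a PEFT/LoRA adapter.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Enable Unsloth's fast inference path (this routes generation through
# unsloth_fast_generate / LlamaModel_fast_forward_inference in the traceback).
FastLanguageModel.for_inference(model)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

# With transformers==4.50.0 this raises:
#   KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
# With transformers==4.49.0 it generates text as expected.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```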
Expected behavior
.generate() should complete without raising a KeyError, as it did under transformers==4.49.0.