System Info
After updating transformers from version 4.49.0 to 4.50.0, running .generate() on a PEFT model throws a KeyError related to cache layers:

File "//.venv/lib/python3.12/site-packages/peft/peft_model.py", line 1874, in generate
    outputs = self.base_model.generate(*args, **kwargs)
File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1579, in unsloth_fast_generate
    output = self._old_generate(*args, **kwargs)
File "//.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2326, in generate
    result = self._sample(
File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3286, in _sample
    outputs = self(**model_inputs, return_dict=True)
File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1026, in _CausalLM_fast_forward
    outputs = fast_forward_inference(
File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 938, in LlamaModel_fast_forward_inference
    seq_len = past_key_values[0][0].shape[-2]
File "//.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 387, in __getitem__
    raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
Are there any known changes to cache handling in 4.50 that could explain this?
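For context, the failing Unsloth line reads the cached sequence length via tuple-style indexing (past_key_values[0][0].shape[-2]), but in 4.50 the object passed as past_key_values is a Cache instance, and indexing layer 0 of a still-empty cache raises exactly this KeyError. The sketch below reproduces the failure mode with a simplified stand-in class; MiniCache is hypothetical, not the real transformers.cache_utils.DynamicCache, though the real class does expose a get_seq_length() accessor that returns 0 for an empty cache instead of raising.

```python
# Sketch of the failure mode behind the traceback above.
# MiniCache is a simplified, hypothetical stand-in for the Cache object
# that transformers 4.50 passes as past_key_values; it is NOT the real class.

class MiniCache:
    def __init__(self):
        self.key_cache = []  # one entry per layer, filled during decoding

    def __len__(self):
        return len(self.key_cache)

    def __getitem__(self, layer_idx):
        # Mirrors transformers/cache_utils.py: indexing a layer that has
        # not been populated yet raises KeyError.
        if layer_idx < len(self.key_cache):
            return self.key_cache[layer_idx]
        raise KeyError(
            f"Cache only has {len(self)} layers, "
            f"attempted to access layer with index {layer_idx}"
        )

    def get_seq_length(self):
        # Guarded accessor: an empty cache simply reports length 0.
        return 0 if not self.key_cache else self.key_cache[0][0].shape[-2]


cache = MiniCache()  # state on the first decoding step: no layers yet

# The pattern at unsloth/models/llama.py line 938,
#     seq_len = past_key_values[0][0].shape[-2]
# raises on an empty 4.50-style cache:
try:
    seq_len = cache[0][0].shape[-2]
except KeyError as e:
    print(e)

# A guarded read works on both empty and populated caches:
print(cache.get_seq_length())  # → 0
```

This suggests the incompatibility is in Unsloth's assumption that the cache is a tuple of tuples, rather than in transformers itself.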
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
To reproduce the KeyError: 'Cache only has 0 layers' error using Unsloth with transformers==4.50.0, here is a minimal working example that triggers the issue during .generate().
Expected behavior
It should run .generate() without raising a KeyError, as it did under transformers==4.49.0.