Description
System Info
After updating transformers from version 4.49.0 to 4.50.0, running .generate() on a PEFT model throws a KeyError related to cache layers:
File "//.venv/lib/python3.12/site-packages/peft/peft_model.py", line 1874, in generate outputs = self.base_model.generate(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1579, in unsloth_fast_generate output = self._old_generate(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2326, in generate result = self._sample( ^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3286, in _sample outputs = self(**model_inputs, return_dict=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 1026, in _CausalLM_fast_forward outputs = fast_forward_inference( ^^^^^^^^^^^^^^^^^^^^^^^ File "//.venv/lib/python3.12/site-packages/unsloth/models/llama.py", line 938, in LlamaModel_fast_forward_inference seq_len = past_key_values[0][0].shape[-2] ~~~~~~~~~~~~~~~^^^ File "//.venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 387, in __getitem__ raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}") KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
To reproduce the KeyError: 'Cache only has 0 layers' error with Unsloth and transformers==4.50.0, here is a minimal example that triggers the issue during .generate():
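The original snippet was not included in the report, so the following is a minimal sketch of the kind of Unsloth + PEFT workflow that reaches the failing code path, assuming the standard FastLanguageModel API. The checkpoint name, LoRA hyperparameters, and prompt are illustrative placeholders, not the reporter's actual values.

```python
# Minimal sketch (not the reporter's exact script): load a 4-bit model with
# Unsloth, attach a LoRA adapter, switch to fast inference, and call .generate().
# Model name and LoRA settings below are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Wrap the base model with a PEFT/LoRA adapter.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Enable Unsloth's fast inference path (this routes generation through
# unsloth_fast_generate / LlamaModel_fast_forward_inference in the traceback).
FastLanguageModel.for_inference(model)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

# With transformers==4.50.0 this raises:
#   KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
# With transformers==4.49.0 it generates text as expected.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```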
Expected behavior
.generate() should complete without raising a KeyError, as it did under transformers==4.49.0.