convert.py - RuntimeError: CUDA error: invalid configuration argument

```
python convert.py -i ~/safetensor/model -o ~/EXL2/model_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 -l 4096 -ml 4096

 -- Beginning new job
 -- Input: /home/user/safetensor/model
 -- Output: /home/user/EXL2/model_4bit
 -- Calibration dataset: /home/user/EXL2/0000.parquet, 100 / 16 (16) rows, 4096 tokens per sample
 -- Target bits per weight: 4.0 (decoder), 6 (head)
 -- Tokenizing samples (measurement)...
/home/user/exllamav2/conversion/tokenize.py:16: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  tokens = tokenizer.encode(row[0])
 -- Token embeddings (measurement)...
 -- Measuring quantization impact...
 -- Layer: model.layers.0 (Attention)
Traceback (most recent call last):
  File "/home/user/exllamav2/convert.py", line 168, in <module>
    measure_quant(job, save_job, model)
  File "/home/user/exllamav2/conversion/quantize.py", line 184, in measure_quant
    outputs = module.forward(x, cache, attn_mask, intermediates = True)
  File "/home/user/exllamav2/exllamav2/attn.py", line 195, in forward
    return self.forward_torch(hidden_states, cache, attn_mask, past_len, intermediates)
  File "/home/user/exllamav2/exllamav2/attn.py", line 460, in forward_torch
    key_states = self.repeat_kv(key_states, self.model.config.num_key_value_groups)
  File "/home/user/exllamav2/exllamav2/attn.py", line 188, in repeat_kv
    hidden_states = hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

Any idea what could be the cause of this issue?

Edit: It seems to be caused by bad or unsupported safetensors files.
Edit 2: It wasn't bad safetensors files in the end, cf. the TL;DR.

---

TL;DR: The issue is likely due to a VRAM limitation: 24GB is not enough for -ml 4096 -l 4096 at the moment. Some VRAM optimization code is underway; until then, you must use the default -ml 2048 -l 2048.
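
For reference, this is the same job as in the original post with the two length options set back to the 2048 default mentioned in the TL;DR. The paths are the ones from the log above, so adjust them to your setup. The second command is optional and only follows the hint printed in the error message itself; I have not verified that it gives a more useful traceback in this case.

```shell
# Same conversion as above, but with the default 2048-token lengths
python convert.py -i ~/safetensor/model -o ~/EXL2/model_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 -l 2048 -ml 2048

# Optional: make CUDA errors surface at the call that caused them (slower),
# as suggested by the PyTorch error message
CUDA_LAUNCH_BLOCKING=1 python convert.py -i ~/safetensor/model -o ~/EXL2/model_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 -l 2048 -ml 2048
```

Since both options are at their defaults here, leaving out -l and -ml entirely should be equivalent.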
