```
python convert.py -i ~/safetensor/model -o ~/EXL2/model_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 -l 4096 -ml 4096
-- Beginning new job
-- Input: /home/user/safetensor/model
-- Output: /home/user/EXL2/model_4bit
-- Calibration dataset: /home/user/EXL2/0000.parquet, 100 / 16 (16) rows, 4096 tokens per sample
-- Target bits per weight: 4.0 (decoder), 6 (head)
-- Tokenizing samples (measurement)...
/home/user/exllamav2/conversion/tokenize.py:16: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  tokens = tokenizer.encode(row[0])
-- Token embeddings (measurement)...
-- Measuring quantization impact...
-- Layer: model.layers.0 (Attention)
Traceback (most recent call last):
  File "/home/user/exllamav2/convert.py", line 168, in <module>
    measure_quant(job, save_job, model)
  File "/home/user/exllamav2/conversion/quantize.py", line 184, in measure_quant
    outputs = module.forward(x, cache, attn_mask, intermediates = True)
  File "/home/user/exllamav2/exllamav2/attn.py", line 195, in forward
    return self.forward_torch(hidden_states, cache, attn_mask, past_len, intermediates)
  File "/home/user/exllamav2/exllamav2/attn.py", line 460, in forward_torch
    key_states = self.repeat_kv(key_states, self.model.config.num_key_value_groups)
  File "/home/user/exllamav2/exllamav2/attn.py", line 188, in repeat_kv
    hidden_states = hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
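For context, `repeat_kv` in the failing frame appears to implement the standard grouped-query attention expansion; the sketch below mirrors the HuggingFace `transformers` version of that pattern, so exllamav2's exact code may differ. The relevant detail is that the final `reshape` materializes a contiguous copy of the expanded key/value states, so its footprint grows with both `n_rep` and the sequence length:

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Expand (batch, num_kv_heads, seq_len, head_dim) to
    # (batch, num_kv_heads * n_rep, seq_len, head_dim) so each query head
    # has a matching key/value head.
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    # expand() is a free view, but reshape() cannot merge an expanded
    # (stride-0) dimension as a view, so it allocates a full copy.
    hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_kv_heads, n_rep, slen, head_dim)
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)
```

At `-l 4096` that copy is twice the size of the 2048-token default per sample, which is consistent with the VRAM explanation in the TL;DR below.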
Any idea what could be the cause of this issue?
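As an aside, the `FutureWarning` from `conversion/tokenize.py` is unrelated to the crash: pandas is deprecating positional integer indexing on a `Series` through `[]`. A minimal sketch of the fix pandas suggests, assuming `row` is a `Series` whose first element holds the sample text:

```python
# Positional access via .iloc; row[0] will become a label lookup
# in future pandas versions.
tokens = tokenizer.encode(row.iloc[0])
```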
Edit: Bad or unsupported safetensor files, it seems.
Edit 2: It wasn't bad safetensors in the end; cf. the TL;DR below.
TL;DR: The issue is most likely a VRAM limitation: 24 GB is not enough for `-ml 4096 -l 4096` at the moment. Some VRAM optimization code is underway; until then, you must use the defaults, `-ml 2048 -l 2048`.
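Until those optimizations land, a re-run with the default lengths (paths and bitrate flags unchanged from the command above) should stay within 24 GB:

```
python convert.py -i ~/safetensor/model -o ~/EXL2/model_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 -l 2048 -ml 2048
```

Since 2048 is the default for both `-l` and `-ml`, the two flags can also simply be omitted.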