Failed at model.save_pretrained_gguf #341

ch-tseng opened this issue Apr 16, 2024 · 2 comments


I used the model: to fine-tune, but I always get the error below. Training works fine, but model.save_pretrained_gguf fails.

==((====))== Unsloth: Fast Llama patching release 2024.4
\ /| GPU: NVIDIA GeForce RTX 3090. Max memory: 23.691 GB. Platform = Linux.
O^O/ _/ \ Pytorch: 2.2.2+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. Xformers = 0.0.25.post1. FA = False.
"--" Free Apache license:
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.02s/it]
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Setting pad_token_id to eos_token_id:2 for open-end generation.
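['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n\n\n### Input:\nThe week before last I kept having headaches, nausea, and dizziness; sometimes it still had not eased after I slept for a while and got up. The pain is mostly a little above the temples, and sometimes at the back of the head (less often). Dizziness when I get up from lying down is becoming more and more frequent (I have anemia, but recently whenever I change posture I get dizzy and my vision nearly goes black), and I tire easily. I would like to ask whether these symptoms mean I should go to the hospital for a checkup?\n\n### Response:\n\nHello:\nBased on your description, there are several possible causes:\n1. Anemia: anemia is a common problem; without regular checkups, it may cause dizziness, headaches, fatigue, and other symptoms.\n2. Inner-ear problems: the inner ear contains the organs of balance; if something is wrong with the inner ear, it may cause dizziness, headaches, nausea, and other symptoms.\n3. Other diseases: thyroid disease, heart disease, diabetes, high blood pressure, and so on can all cause dizziness, headaches, nausea, and other symptoms.\nI recommend that you go to a hospital and have a doctor examine you thoroughly, to determine the cause and receive appropriate treatment.\nWishing you good health! ']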
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 46.9 out of 62.57 RAM for saving.
100%|█████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 90.79it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Converting llama model. Can use fast conversion = True.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
\ /| [0] Installing llama.cpp will take 3 minutes.
O^O/ _/ \ [1] Converting HF to GUUF 16bits will take 3 minutes.
\ / [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
-" In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at ch_taide_medicine.gguf into f16 GGUF format.
The output location will be ./ch_taide_medicine.gguf-unsloth.F16.gguf
This will take 3 minutes...
Loading model file ch_taide_medicine.gguf/model-00001-of-00003.safetensors
Loading model file ch_taide_medicine.gguf/model-00001-of-00003.safetensors
Loading model file ch_taide_medicine.gguf/model-00002-of-00003.safetensors
Loading model file ch_taide_medicine.gguf/model-00003-of-00003.safetensors
params = Params(n_vocab=56064, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=10000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('ch_taide_medicine.gguf'))
Loaded vocab file PosixPath('ch_taide_medicine.gguf/tokenizer.json'), type 'hfft'
Vocab info: <LlamaHfVocab with 56020 base tokens and 0 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0, 'pad': 32000}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
Permuting layer 22
Permuting layer 23
Permuting layer 24
Permuting layer 25
Permuting layer 26
Permuting layer 27
Permuting layer 28
Permuting layer 29
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight -> token_embd.weight | BF16 | [56064, 4096]
model.layers.0.input_layernorm.weight -> blk.0.attn_norm.weight | BF16 | [4096]
model.layers.0.mlp.down_proj.weight -> blk.0.ffn_down.weight | BF16 | [4096, 11008]
model.layers.0.mlp.gate_proj.weight -> blk.0.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.0.mlp.up_proj.weight -> blk.0.ffn_up.weight | BF16 | [11008, 4096]
model.layers.0.post_attention_layernorm.weight -> blk.0.ffn_norm.weight | BF16 | [4096]
model.layers.0.self_attn.k_proj.weight -> blk.0.attn_k.weight | BF16 | [4096, 4096]
model.layers.0.self_attn.o_proj.weight -> blk.0.attn_output.weight | BF16 | [4096, 4096]
model.layers.0.self_attn.q_proj.weight -> blk.0.attn_q.weight | BF16 | [4096, 4096]
model.layers.0.self_attn.v_proj.weight -> blk.0.attn_v.weight | BF16 | [4096, 4096]
model.layers.1.input_layernorm.weight -> blk.1.attn_norm.weight | BF16 | [4096]
model.layers.1.mlp.down_proj.weight -> blk.1.ffn_down.weight | BF16 | [4096, 11008]
model.layers.1.mlp.gate_proj.weight -> blk.1.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.1.mlp.up_proj.weight -> blk.1.ffn_up.weight | BF16 | [11008, 4096]
model.layers.1.post_attention_layernorm.weight -> blk.1.ffn_norm.weight | BF16 | [4096]
model.layers.1.self_attn.k_proj.weight -> blk.1.attn_k.weight | BF16 | [4096, 4096]
model.layers.1.self_attn.o_proj.weight -> blk.1.attn_output.weight | BF16 | [4096, 4096]
model.layers.1.self_attn.q_proj.weight -> blk.1.attn_q.weight | BF16 | [4096, 4096]
model.layers.1.self_attn.v_proj.weight -> blk.1.attn_v.weight | BF16 | [4096, 4096]
model.layers.10.input_layernorm.weight -> blk.10.attn_norm.weight | BF16 | [4096]
model.layers.10.mlp.down_proj.weight -> blk.10.ffn_down.weight | BF16 | [4096, 11008]
model.layers.10.mlp.gate_proj.weight -> blk.10.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.10.mlp.up_proj.weight -> blk.10.ffn_up.weight | BF16 | [11008, 4096]
model.layers.10.post_attention_layernorm.weight -> blk.10.ffn_norm.weight | BF16 | [4096]
model.layers.10.self_attn.k_proj.weight -> blk.10.attn_k.weight | BF16 | [4096, 4096]
model.layers.10.self_attn.o_proj.weight -> blk.10.attn_output.weight | BF16 | [4096, 4096]
model.layers.10.self_attn.q_proj.weight -> blk.10.attn_q.weight | BF16 | [4096, 4096]
model.layers.10.self_attn.v_proj.weight -> blk.10.attn_v.weight | BF16 | [4096, 4096]
model.layers.11.self_attn.k_proj.weight -> blk.11.attn_k.weight | BF16 | [4096, 4096]
model.layers.11.self_attn.q_proj.weight -> blk.11.attn_q.weight | BF16 | [4096, 4096]
model.layers.2.input_layernorm.weight -> blk.2.attn_norm.weight | BF16 | [4096]
model.layers.2.mlp.down_proj.weight -> blk.2.ffn_down.weight | BF16 | [4096, 11008]
model.layers.2.mlp.gate_proj.weight -> blk.2.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.2.mlp.up_proj.weight -> blk.2.ffn_up.weight | BF16 | [11008, 4096]
model.layers.2.post_attention_layernorm.weight -> blk.2.ffn_norm.weight | BF16 | [4096]
model.layers.2.self_attn.k_proj.weight -> blk.2.attn_k.weight | BF16 | [4096, 4096]
model.layers.2.self_attn.o_proj.weight -> blk.2.attn_output.weight | BF16 | [4096, 4096]
model.layers.2.self_attn.q_proj.weight -> blk.2.attn_q.weight | BF16 | [4096, 4096]
model.layers.2.self_attn.v_proj.weight -> blk.2.attn_v.weight | BF16 | [4096, 4096]
model.layers.3.input_layernorm.weight -> blk.3.attn_norm.weight | BF16 | [4096]
model.layers.3.mlp.down_proj.weight -> blk.3.ffn_down.weight | BF16 | [4096, 11008]
model.layers.3.mlp.gate_proj.weight -> blk.3.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.3.mlp.up_proj.weight -> blk.3.ffn_up.weight | BF16 | [11008, 4096]
model.layers.3.post_attention_layernorm.weight -> blk.3.ffn_norm.weight | BF16 | [4096]
model.layers.3.self_attn.k_proj.weight -> blk.3.attn_k.weight | BF16 | [4096, 4096]
model.layers.3.self_attn.o_proj.weight -> blk.3.attn_output.weight | BF16 | [4096, 4096]
model.layers.3.self_attn.q_proj.weight -> blk.3.attn_q.weight | BF16 | [4096, 4096]
model.layers.3.self_attn.v_proj.weight -> blk.3.attn_v.weight | BF16 | [4096, 4096]
model.layers.4.input_layernorm.weight -> blk.4.attn_norm.weight | BF16 | [4096]
model.layers.4.mlp.down_proj.weight -> blk.4.ffn_down.weight | BF16 | [4096, 11008]
model.layers.4.mlp.gate_proj.weight -> blk.4.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.4.mlp.up_proj.weight -> blk.4.ffn_up.weight | BF16 | [11008, 4096]
model.layers.4.post_attention_layernorm.weight -> blk.4.ffn_norm.weight | BF16 | [4096]
model.layers.4.self_attn.k_proj.weight -> blk.4.attn_k.weight | BF16 | [4096, 4096]
model.layers.4.self_attn.o_proj.weight -> blk.4.attn_output.weight | BF16 | [4096, 4096]
model.layers.4.self_attn.q_proj.weight -> blk.4.attn_q.weight | BF16 | [4096, 4096]
model.layers.4.self_attn.v_proj.weight -> blk.4.attn_v.weight | BF16 | [4096, 4096]
model.layers.5.input_layernorm.weight -> blk.5.attn_norm.weight | BF16 | [4096]
model.layers.5.mlp.down_proj.weight -> blk.5.ffn_down.weight | BF16 | [4096, 11008]
model.layers.5.mlp.gate_proj.weight -> blk.5.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.5.mlp.up_proj.weight -> blk.5.ffn_up.weight | BF16 | [11008, 4096]
model.layers.5.post_attention_layernorm.weight -> blk.5.ffn_norm.weight | BF16 | [4096]
model.layers.5.self_attn.k_proj.weight -> blk.5.attn_k.weight | BF16 | [4096, 4096]
model.layers.5.self_attn.o_proj.weight -> blk.5.attn_output.weight | BF16 | [4096, 4096]
model.layers.5.self_attn.q_proj.weight -> blk.5.attn_q.weight | BF16 | [4096, 4096]
model.layers.5.self_attn.v_proj.weight -> blk.5.attn_v.weight | BF16 | [4096, 4096]
model.layers.6.input_layernorm.weight -> blk.6.attn_norm.weight | BF16 | [4096]
model.layers.6.mlp.down_proj.weight -> blk.6.ffn_down.weight | BF16 | [4096, 11008]
model.layers.6.mlp.gate_proj.weight -> blk.6.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.6.mlp.up_proj.weight -> blk.6.ffn_up.weight | BF16 | [11008, 4096]
model.layers.6.post_attention_layernorm.weight -> blk.6.ffn_norm.weight | BF16 | [4096]
model.layers.6.self_attn.k_proj.weight -> blk.6.attn_k.weight | BF16 | [4096, 4096]
model.layers.6.self_attn.o_proj.weight -> blk.6.attn_output.weight | BF16 | [4096, 4096]
model.layers.6.self_attn.q_proj.weight -> blk.6.attn_q.weight | BF16 | [4096, 4096]
model.layers.6.self_attn.v_proj.weight -> blk.6.attn_v.weight | BF16 | [4096, 4096]
model.layers.7.input_layernorm.weight -> blk.7.attn_norm.weight | BF16 | [4096]
model.layers.7.mlp.down_proj.weight -> blk.7.ffn_down.weight | BF16 | [4096, 11008]
model.layers.7.mlp.gate_proj.weight -> blk.7.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.7.mlp.up_proj.weight -> blk.7.ffn_up.weight | BF16 | [11008, 4096]
model.layers.7.post_attention_layernorm.weight -> blk.7.ffn_norm.weight | BF16 | [4096]
model.layers.7.self_attn.k_proj.weight -> blk.7.attn_k.weight | BF16 | [4096, 4096]
model.layers.7.self_attn.o_proj.weight -> blk.7.attn_output.weight | BF16 | [4096, 4096]
model.layers.7.self_attn.q_proj.weight -> blk.7.attn_q.weight | BF16 | [4096, 4096]
model.layers.7.self_attn.v_proj.weight -> blk.7.attn_v.weight | BF16 | [4096, 4096]
model.layers.8.input_layernorm.weight -> blk.8.attn_norm.weight | BF16 | [4096]
model.layers.8.mlp.down_proj.weight -> blk.8.ffn_down.weight | BF16 | [4096, 11008]
model.layers.8.mlp.gate_proj.weight -> blk.8.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.8.mlp.up_proj.weight -> blk.8.ffn_up.weight | BF16 | [11008, 4096]
model.layers.8.post_attention_layernorm.weight -> blk.8.ffn_norm.weight | BF16 | [4096]
model.layers.8.self_attn.k_proj.weight -> blk.8.attn_k.weight | BF16 | [4096, 4096]
model.layers.8.self_attn.o_proj.weight -> blk.8.attn_output.weight | BF16 | [4096, 4096]
model.layers.8.self_attn.q_proj.weight -> blk.8.attn_q.weight | BF16 | [4096, 4096]
model.layers.8.self_attn.v_proj.weight -> blk.8.attn_v.weight | BF16 | [4096, 4096]
model.layers.9.input_layernorm.weight -> blk.9.attn_norm.weight | BF16 | [4096]
model.layers.9.mlp.down_proj.weight -> blk.9.ffn_down.weight | BF16 | [4096, 11008]
model.layers.9.mlp.gate_proj.weight -> blk.9.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.9.mlp.up_proj.weight -> blk.9.ffn_up.weight | BF16 | [11008, 4096]
model.layers.9.post_attention_layernorm.weight -> blk.9.ffn_norm.weight | BF16 | [4096]
model.layers.9.self_attn.k_proj.weight -> blk.9.attn_k.weight | BF16 | [4096, 4096]
model.layers.9.self_attn.o_proj.weight -> blk.9.attn_output.weight | BF16 | [4096, 4096]
model.layers.9.self_attn.q_proj.weight -> blk.9.attn_q.weight | BF16 | [4096, 4096]
model.layers.9.self_attn.v_proj.weight -> blk.9.attn_v.weight | BF16 | [4096, 4096]
model.layers.11.input_layernorm.weight -> blk.11.attn_norm.weight | BF16 | [4096]
model.layers.11.mlp.down_proj.weight -> blk.11.ffn_down.weight | BF16 | [4096, 11008]
model.layers.11.mlp.gate_proj.weight -> blk.11.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.11.mlp.up_proj.weight -> blk.11.ffn_up.weight | BF16 | [11008, 4096]
model.layers.11.post_attention_layernorm.weight -> blk.11.ffn_norm.weight | BF16 | [4096]
model.layers.11.self_attn.o_proj.weight -> blk.11.attn_output.weight | BF16 | [4096, 4096]
model.layers.11.self_attn.v_proj.weight -> blk.11.attn_v.weight | BF16 | [4096, 4096]
model.layers.12.input_layernorm.weight -> blk.12.attn_norm.weight | BF16 | [4096]
model.layers.12.mlp.down_proj.weight -> blk.12.ffn_down.weight | BF16 | [4096, 11008]
model.layers.12.mlp.gate_proj.weight -> blk.12.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.12.mlp.up_proj.weight -> blk.12.ffn_up.weight | BF16 | [11008, 4096]
model.layers.12.post_attention_layernorm.weight -> blk.12.ffn_norm.weight | BF16 | [4096]
model.layers.12.self_attn.k_proj.weight -> blk.12.attn_k.weight | BF16 | [4096, 4096]
model.layers.12.self_attn.o_proj.weight -> blk.12.attn_output.weight | BF16 | [4096, 4096]
model.layers.12.self_attn.q_proj.weight -> blk.12.attn_q.weight | BF16 | [4096, 4096]
model.layers.12.self_attn.v_proj.weight -> blk.12.attn_v.weight | BF16 | [4096, 4096]
model.layers.13.input_layernorm.weight -> blk.13.attn_norm.weight | BF16 | [4096]
model.layers.13.mlp.down_proj.weight -> blk.13.ffn_down.weight | BF16 | [4096, 11008]
model.layers.13.mlp.gate_proj.weight -> blk.13.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.13.mlp.up_proj.weight -> blk.13.ffn_up.weight | BF16 | [11008, 4096]
model.layers.13.post_attention_layernorm.weight -> blk.13.ffn_norm.weight | BF16 | [4096]
model.layers.13.self_attn.k_proj.weight -> blk.13.attn_k.weight | BF16 | [4096, 4096]
model.layers.13.self_attn.o_proj.weight -> blk.13.attn_output.weight | BF16 | [4096, 4096]
model.layers.13.self_attn.q_proj.weight -> blk.13.attn_q.weight | BF16 | [4096, 4096]
model.layers.13.self_attn.v_proj.weight -> blk.13.attn_v.weight | BF16 | [4096, 4096]
model.layers.14.input_layernorm.weight -> blk.14.attn_norm.weight | BF16 | [4096]
model.layers.14.mlp.down_proj.weight -> blk.14.ffn_down.weight | BF16 | [4096, 11008]
model.layers.14.mlp.gate_proj.weight -> blk.14.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.14.mlp.up_proj.weight -> blk.14.ffn_up.weight | BF16 | [11008, 4096]
model.layers.14.post_attention_layernorm.weight -> blk.14.ffn_norm.weight | BF16 | [4096]
model.layers.14.self_attn.k_proj.weight -> blk.14.attn_k.weight | BF16 | [4096, 4096]
model.layers.14.self_attn.o_proj.weight -> blk.14.attn_output.weight | BF16 | [4096, 4096]
model.layers.14.self_attn.q_proj.weight -> blk.14.attn_q.weight | BF16 | [4096, 4096]
model.layers.14.self_attn.v_proj.weight -> blk.14.attn_v.weight | BF16 | [4096, 4096]
model.layers.15.input_layernorm.weight -> blk.15.attn_norm.weight | BF16 | [4096]
model.layers.15.mlp.down_proj.weight -> blk.15.ffn_down.weight | BF16 | [4096, 11008]
model.layers.15.mlp.gate_proj.weight -> blk.15.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.15.mlp.up_proj.weight -> blk.15.ffn_up.weight | BF16 | [11008, 4096]
model.layers.15.post_attention_layernorm.weight -> blk.15.ffn_norm.weight | BF16 | [4096]
model.layers.15.self_attn.k_proj.weight -> blk.15.attn_k.weight | BF16 | [4096, 4096]
model.layers.15.self_attn.o_proj.weight -> blk.15.attn_output.weight | BF16 | [4096, 4096]
model.layers.15.self_attn.q_proj.weight -> blk.15.attn_q.weight | BF16 | [4096, 4096]
model.layers.15.self_attn.v_proj.weight -> blk.15.attn_v.weight | BF16 | [4096, 4096]
model.layers.16.input_layernorm.weight -> blk.16.attn_norm.weight | BF16 | [4096]
model.layers.16.mlp.down_proj.weight -> blk.16.ffn_down.weight | BF16 | [4096, 11008]
model.layers.16.mlp.gate_proj.weight -> blk.16.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.16.mlp.up_proj.weight -> blk.16.ffn_up.weight | BF16 | [11008, 4096]
model.layers.16.post_attention_layernorm.weight -> blk.16.ffn_norm.weight | BF16 | [4096]
model.layers.16.self_attn.k_proj.weight -> blk.16.attn_k.weight | BF16 | [4096, 4096]
model.layers.16.self_attn.o_proj.weight -> blk.16.attn_output.weight | BF16 | [4096, 4096]
model.layers.16.self_attn.q_proj.weight -> blk.16.attn_q.weight | BF16 | [4096, 4096]
model.layers.16.self_attn.v_proj.weight -> blk.16.attn_v.weight | BF16 | [4096, 4096]
model.layers.17.input_layernorm.weight -> blk.17.attn_norm.weight | BF16 | [4096]
model.layers.17.mlp.down_proj.weight -> blk.17.ffn_down.weight | BF16 | [4096, 11008]
model.layers.17.mlp.gate_proj.weight -> blk.17.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.17.mlp.up_proj.weight -> blk.17.ffn_up.weight | BF16 | [11008, 4096]
model.layers.17.post_attention_layernorm.weight -> blk.17.ffn_norm.weight | BF16 | [4096]
model.layers.17.self_attn.k_proj.weight -> blk.17.attn_k.weight | BF16 | [4096, 4096]
model.layers.17.self_attn.o_proj.weight -> blk.17.attn_output.weight | BF16 | [4096, 4096]
model.layers.17.self_attn.q_proj.weight -> blk.17.attn_q.weight | BF16 | [4096, 4096]
model.layers.17.self_attn.v_proj.weight -> blk.17.attn_v.weight | BF16 | [4096, 4096]
model.layers.18.input_layernorm.weight -> blk.18.attn_norm.weight | BF16 | [4096]
model.layers.18.mlp.down_proj.weight -> blk.18.ffn_down.weight | BF16 | [4096, 11008]
model.layers.18.mlp.gate_proj.weight -> blk.18.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.18.mlp.up_proj.weight -> blk.18.ffn_up.weight | BF16 | [11008, 4096]
model.layers.18.post_attention_layernorm.weight -> blk.18.ffn_norm.weight | BF16 | [4096]
model.layers.18.self_attn.k_proj.weight -> blk.18.attn_k.weight | BF16 | [4096, 4096]
model.layers.18.self_attn.o_proj.weight -> blk.18.attn_output.weight | BF16 | [4096, 4096]
model.layers.18.self_attn.q_proj.weight -> blk.18.attn_q.weight | BF16 | [4096, 4096]
model.layers.18.self_attn.v_proj.weight -> blk.18.attn_v.weight | BF16 | [4096, 4096]
model.layers.19.input_layernorm.weight -> blk.19.attn_norm.weight | BF16 | [4096]
model.layers.19.mlp.down_proj.weight -> blk.19.ffn_down.weight | BF16 | [4096, 11008]
model.layers.19.mlp.gate_proj.weight -> blk.19.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.19.mlp.up_proj.weight -> blk.19.ffn_up.weight | BF16 | [11008, 4096]
model.layers.19.post_attention_layernorm.weight -> blk.19.ffn_norm.weight | BF16 | [4096]
model.layers.19.self_attn.k_proj.weight -> blk.19.attn_k.weight | BF16 | [4096, 4096]
model.layers.19.self_attn.o_proj.weight -> blk.19.attn_output.weight | BF16 | [4096, 4096]
model.layers.19.self_attn.q_proj.weight -> blk.19.attn_q.weight | BF16 | [4096, 4096]
model.layers.19.self_attn.v_proj.weight -> blk.19.attn_v.weight | BF16 | [4096, 4096]
model.layers.20.input_layernorm.weight -> blk.20.attn_norm.weight | BF16 | [4096]
model.layers.20.mlp.down_proj.weight -> blk.20.ffn_down.weight | BF16 | [4096, 11008]
model.layers.20.mlp.gate_proj.weight -> blk.20.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.20.mlp.up_proj.weight -> blk.20.ffn_up.weight | BF16 | [11008, 4096]
model.layers.20.post_attention_layernorm.weight -> blk.20.ffn_norm.weight | BF16 | [4096]
model.layers.20.self_attn.k_proj.weight -> blk.20.attn_k.weight | BF16 | [4096, 4096]
model.layers.20.self_attn.o_proj.weight -> blk.20.attn_output.weight | BF16 | [4096, 4096]
model.layers.20.self_attn.q_proj.weight -> blk.20.attn_q.weight | BF16 | [4096, 4096]
model.layers.20.self_attn.v_proj.weight -> blk.20.attn_v.weight | BF16 | [4096, 4096]
model.layers.21.input_layernorm.weight -> blk.21.attn_norm.weight | BF16 | [4096]
model.layers.21.mlp.down_proj.weight -> blk.21.ffn_down.weight | BF16 | [4096, 11008]
model.layers.21.mlp.gate_proj.weight -> blk.21.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.21.mlp.up_proj.weight -> blk.21.ffn_up.weight | BF16 | [11008, 4096]
model.layers.21.post_attention_layernorm.weight -> blk.21.ffn_norm.weight | BF16 | [4096]
model.layers.21.self_attn.k_proj.weight -> blk.21.attn_k.weight | BF16 | [4096, 4096]
model.layers.21.self_attn.o_proj.weight -> blk.21.attn_output.weight | BF16 | [4096, 4096]
model.layers.21.self_attn.q_proj.weight -> blk.21.attn_q.weight | BF16 | [4096, 4096]
model.layers.21.self_attn.v_proj.weight -> blk.21.attn_v.weight | BF16 | [4096, 4096]
model.layers.22.input_layernorm.weight -> blk.22.attn_norm.weight | BF16 | [4096]
model.layers.22.mlp.down_proj.weight -> blk.22.ffn_down.weight | BF16 | [4096, 11008]
model.layers.22.mlp.gate_proj.weight -> blk.22.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.22.mlp.up_proj.weight -> blk.22.ffn_up.weight | BF16 | [11008, 4096]
model.layers.22.post_attention_layernorm.weight -> blk.22.ffn_norm.weight | BF16 | [4096]
model.layers.22.self_attn.k_proj.weight -> blk.22.attn_k.weight | BF16 | [4096, 4096]
model.layers.22.self_attn.o_proj.weight -> blk.22.attn_output.weight | BF16 | [4096, 4096]
model.layers.22.self_attn.q_proj.weight -> blk.22.attn_q.weight | BF16 | [4096, 4096]
model.layers.22.self_attn.v_proj.weight -> blk.22.attn_v.weight | BF16 | [4096, 4096]
model.layers.23.self_attn.k_proj.weight -> blk.23.attn_k.weight | BF16 | [4096, 4096]
model.layers.23.self_attn.o_proj.weight -> blk.23.attn_output.weight | BF16 | [4096, 4096]
model.layers.23.self_attn.q_proj.weight -> blk.23.attn_q.weight | BF16 | [4096, 4096]
model.layers.23.self_attn.v_proj.weight -> blk.23.attn_v.weight | BF16 | [4096, 4096]
lm_head.weight -> output.weight | BF16 | [56064, 4096]
model.layers.23.input_layernorm.weight -> blk.23.attn_norm.weight | BF16 | [4096]
model.layers.23.mlp.down_proj.weight -> blk.23.ffn_down.weight | BF16 | [4096, 11008]
model.layers.23.mlp.gate_proj.weight -> blk.23.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.23.mlp.up_proj.weight -> blk.23.ffn_up.weight | BF16 | [11008, 4096]
model.layers.23.post_attention_layernorm.weight -> blk.23.ffn_norm.weight | BF16 | [4096]
model.layers.24.input_layernorm.weight -> blk.24.attn_norm.weight | BF16 | [4096]
model.layers.24.mlp.down_proj.weight -> blk.24.ffn_down.weight | BF16 | [4096, 11008]
model.layers.24.mlp.gate_proj.weight -> blk.24.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.24.mlp.up_proj.weight -> blk.24.ffn_up.weight | BF16 | [11008, 4096]
model.layers.24.post_attention_layernorm.weight -> blk.24.ffn_norm.weight | BF16 | [4096]
model.layers.24.self_attn.k_proj.weight -> blk.24.attn_k.weight | BF16 | [4096, 4096]
model.layers.24.self_attn.o_proj.weight -> blk.24.attn_output.weight | BF16 | [4096, 4096]
model.layers.24.self_attn.q_proj.weight -> blk.24.attn_q.weight | BF16 | [4096, 4096]
model.layers.24.self_attn.v_proj.weight -> blk.24.attn_v.weight | BF16 | [4096, 4096]
model.layers.25.input_layernorm.weight -> blk.25.attn_norm.weight | BF16 | [4096]
model.layers.25.mlp.down_proj.weight -> blk.25.ffn_down.weight | BF16 | [4096, 11008]
model.layers.25.mlp.gate_proj.weight -> blk.25.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.25.mlp.up_proj.weight -> blk.25.ffn_up.weight | BF16 | [11008, 4096]
model.layers.25.post_attention_layernorm.weight -> blk.25.ffn_norm.weight | BF16 | [4096]
model.layers.25.self_attn.k_proj.weight -> blk.25.attn_k.weight | BF16 | [4096, 4096]
model.layers.25.self_attn.o_proj.weight -> blk.25.attn_output.weight | BF16 | [4096, 4096]
model.layers.25.self_attn.q_proj.weight -> blk.25.attn_q.weight | BF16 | [4096, 4096]
model.layers.25.self_attn.v_proj.weight -> blk.25.attn_v.weight | BF16 | [4096, 4096]
model.layers.26.input_layernorm.weight -> blk.26.attn_norm.weight | BF16 | [4096]
model.layers.26.mlp.down_proj.weight -> blk.26.ffn_down.weight | BF16 | [4096, 11008]
model.layers.26.mlp.gate_proj.weight -> blk.26.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.26.mlp.up_proj.weight -> blk.26.ffn_up.weight | BF16 | [11008, 4096]
model.layers.26.post_attention_layernorm.weight -> blk.26.ffn_norm.weight | BF16 | [4096]
model.layers.26.self_attn.k_proj.weight -> blk.26.attn_k.weight | BF16 | [4096, 4096]
model.layers.26.self_attn.o_proj.weight -> blk.26.attn_output.weight | BF16 | [4096, 4096]
model.layers.26.self_attn.q_proj.weight -> blk.26.attn_q.weight | BF16 | [4096, 4096]
model.layers.26.self_attn.v_proj.weight -> blk.26.attn_v.weight | BF16 | [4096, 4096]
model.layers.27.input_layernorm.weight -> blk.27.attn_norm.weight | BF16 | [4096]
model.layers.27.mlp.down_proj.weight -> blk.27.ffn_down.weight | BF16 | [4096, 11008]
model.layers.27.mlp.gate_proj.weight -> blk.27.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.27.mlp.up_proj.weight -> blk.27.ffn_up.weight | BF16 | [11008, 4096]
model.layers.27.post_attention_layernorm.weight -> blk.27.ffn_norm.weight | BF16 | [4096]
model.layers.27.self_attn.k_proj.weight -> blk.27.attn_k.weight | BF16 | [4096, 4096]
model.layers.27.self_attn.o_proj.weight -> blk.27.attn_output.weight | BF16 | [4096, 4096]
model.layers.27.self_attn.q_proj.weight -> blk.27.attn_q.weight | BF16 | [4096, 4096]
model.layers.27.self_attn.v_proj.weight -> blk.27.attn_v.weight | BF16 | [4096, 4096]
model.layers.28.input_layernorm.weight -> blk.28.attn_norm.weight | BF16 | [4096]
model.layers.28.mlp.down_proj.weight -> blk.28.ffn_down.weight | BF16 | [4096, 11008]
model.layers.28.mlp.gate_proj.weight -> blk.28.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.28.mlp.up_proj.weight -> blk.28.ffn_up.weight | BF16 | [11008, 4096]
model.layers.28.post_attention_layernorm.weight -> blk.28.ffn_norm.weight | BF16 | [4096]
model.layers.28.self_attn.k_proj.weight -> blk.28.attn_k.weight | BF16 | [4096, 4096]
model.layers.28.self_attn.o_proj.weight -> blk.28.attn_output.weight | BF16 | [4096, 4096]
model.layers.28.self_attn.q_proj.weight -> blk.28.attn_q.weight | BF16 | [4096, 4096]
model.layers.28.self_attn.v_proj.weight -> blk.28.attn_v.weight | BF16 | [4096, 4096]
model.layers.29.input_layernorm.weight -> blk.29.attn_norm.weight | BF16 | [4096]
model.layers.29.mlp.down_proj.weight -> blk.29.ffn_down.weight | BF16 | [4096, 11008]
model.layers.29.mlp.gate_proj.weight -> blk.29.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.29.mlp.up_proj.weight -> blk.29.ffn_up.weight | BF16 | [11008, 4096]
model.layers.29.post_attention_layernorm.weight -> blk.29.ffn_norm.weight | BF16 | [4096]
model.layers.29.self_attn.k_proj.weight -> blk.29.attn_k.weight | BF16 | [4096, 4096]
model.layers.29.self_attn.o_proj.weight -> blk.29.attn_output.weight | BF16 | [4096, 4096]
model.layers.29.self_attn.q_proj.weight -> blk.29.attn_q.weight | BF16 | [4096, 4096]
model.layers.29.self_attn.v_proj.weight -> blk.29.attn_v.weight | BF16 | [4096, 4096]
model.layers.30.input_layernorm.weight -> blk.30.attn_norm.weight | BF16 | [4096]
model.layers.30.mlp.down_proj.weight -> blk.30.ffn_down.weight | BF16 | [4096, 11008]
model.layers.30.mlp.gate_proj.weight -> blk.30.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.30.mlp.up_proj.weight -> blk.30.ffn_up.weight | BF16 | [11008, 4096]
model.layers.30.post_attention_layernorm.weight -> blk.30.ffn_norm.weight | BF16 | [4096]
model.layers.30.self_attn.k_proj.weight -> blk.30.attn_k.weight | BF16 | [4096, 4096]
model.layers.30.self_attn.o_proj.weight -> blk.30.attn_output.weight | BF16 | [4096, 4096]
model.layers.30.self_attn.q_proj.weight -> blk.30.attn_q.weight | BF16 | [4096, 4096]
model.layers.30.self_attn.v_proj.weight -> blk.30.attn_v.weight | BF16 | [4096, 4096]
model.layers.31.input_layernorm.weight -> blk.31.attn_norm.weight | BF16 | [4096]
model.layers.31.mlp.down_proj.weight -> blk.31.ffn_down.weight | BF16 | [4096, 11008]
model.layers.31.mlp.gate_proj.weight -> blk.31.ffn_gate.weight | BF16 | [11008, 4096]
model.layers.31.mlp.up_proj.weight -> blk.31.ffn_up.weight | BF16 | [11008, 4096]
model.layers.31.post_attention_layernorm.weight -> blk.31.ffn_norm.weight | BF16 | [4096]
model.layers.31.self_attn.k_proj.weight -> blk.31.attn_k.weight | BF16 | [4096, 4096]
model.layers.31.self_attn.o_proj.weight -> blk.31.attn_output.weight | BF16 | [4096, 4096]
model.layers.31.self_attn.q_proj.weight -> blk.31.attn_q.weight | BF16 | [4096, 4096]
model.layers.31.self_attn.v_proj.weight -> blk.31.attn_v.weight | BF16 | [4096, 4096]
model.norm.weight -> output_norm.weight | BF16 | [4096]
Writing ch_taide_medicine.gguf-unsloth.F16.gguf, format 1
Traceback (most recent call last):
File "/GPUData/working/unsloth/", line 44, in
if True: model.save_pretrained_gguf("ch_taide_medicine.gguf", tokenizer, quantization_method = "quantized")
File "/home/chtseng/envs/LM2/lib/python3.10/site-packages/unsloth/", line 1333, in unsloth_save_pretrained_gguf
file_location = save_to_gguf(model_type, new_save_directory, quantization_method, first_conversion, makefile)
File "/home/chtseng/envs/LM2/lib/python3.10/site-packages/unsloth/", line 957, in save_to_gguf
raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for ./ch_taide_medicine.gguf-unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
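The error says quantization failed for the F16 file, but the log ends right after "Writing ch_taide_medicine.gguf-unsloth.F16.gguf", so before redoing the quantization it may be worth checking whether that F16 file exists at all and starts with a valid GGUF header. A minimal standard-library sketch, assuming only the published GGUF header layout (4-byte magic `GGUF`, a little-endian uint32 version, then the tensor count and metadata key/value count, stored as uint32 in v1 and uint64 from v2 on). The helper name `read_gguf_header` is hypothetical. It only reads the first 24 bytes, so a file cut off partway through the tensor data would still pass; a full parse (for example with the `gguf` Python package maintained in the llama.cpp repository) would be needed to catch that.

```python
import os
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Raises ValueError when the file is too short or does not start with the
    GGUF magic, e.g. because the converter stopped before writing it.
    """
    with open(path, "rb") as f:
        head = f.read(24)  # 4 magic + 4 version + 8 + 8 counts (v2/v3 layout)
    if len(head) < 8:
        raise ValueError(f"{path}: {len(head)} bytes, too short for a GGUF header")
    if head[:4] != GGUF_MAGIC:
        raise ValueError(f"{path}: magic {head[:4]!r}, expected {GGUF_MAGIC!r}")
    (version,) = struct.unpack_from("<I", head, 4)
    # GGUF v1 stored the two counts as uint32; v2 and later use uint64.
    fmt = "<II" if version == 1 else "<QQ"
    if len(head) < 8 + struct.calcsize(fmt):
        raise ValueError(f"{path}: header truncated after the version field")
    tensor_count, kv_count = struct.unpack_from(fmt, head, 8)
    return version, tensor_count, kv_count


if __name__ == "__main__":
    # Output path taken from the RuntimeError in the log.
    path = "./ch_taide_medicine.gguf-unsloth.F16.gguf"
    if not os.path.exists(path):
        print(f"{path} does not exist: the F16 conversion never produced it")
    else:
        try:
            version, n_tensors, n_kv = read_gguf_header(path)
            print(f"GGUF v{version}, {n_tensors} tensors, {n_kv} metadata keys, "
                  f"{os.path.getsize(path)} bytes on disk")
        except ValueError as err:
            print(err)
```

For this model, a complete file should report 291 tensors: the log lists 9 tensors for each of the 32 layers, plus the token embedding, the output norm, and the output head.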


@ch-tseng Sorry about the issue. Unfortunately, you'll have to convert it to GGUF manually via
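Converting manually amounts to the two llama.cpp steps Unsloth runs internally (steps [1] and [2] in the conversion banner in the log): HF safetensors to an F16 GGUF, then F16 to `q4_k_m`. A minimal Python sketch, assuming a llama.cpp checkout from around April 2024, built in a `llama.cpp/` folder next to the saved model, where the converter was `convert.py` and the quantizer binary was `quantize` (later llama.cpp releases renamed and moved both, so check your checkout). The helper names `build_commands` and `run`, and the Q4_K_M output file name, are hypothetical. The `"quantized"` to `q4_k_m` alias matches the banner; the other two aliases are our understanding of Unsloth's save code and should be treated as assumptions.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# Unsloth quantization_method aliases -> llama.cpp quantization type.
# "quantized" -> "q4_k_m" matches the log; the other entries are assumptions.
UNSLOTH_ALIASES = {
    "not_quantized": "f16",
    "fast_quantized": "q8_0",
    "quantized": "q4_k_m",
}


def build_commands(model_dir, llama_cpp_dir="llama.cpp", quant="quantized"):
    """Return the convert and quantize commands as argument lists."""
    quant_type = UNSLOTH_ALIASES.get(quant, quant)
    llama = Path(llama_cpp_dir)
    f16_path = f"{model_dir}-unsloth.F16.gguf"  # same name Unsloth used in the log
    out_path = f"{model_dir}-unsloth.{quant_type.upper()}.gguf"  # hypothetical name
    convert = [sys.executable, str(llama / "convert.py"), model_dir,
               "--outtype", "f16", "--outfile", f16_path]
    quantize = [str(llama / "quantize"), f16_path, out_path, quant_type]
    return convert, quantize


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # The merged 16-bit model was saved into the folder "ch_taide_medicine.gguf".
    cmds = build_commands("ch_taide_medicine.gguf")
    for cmd in cmds:
        print(shlex.join(cmd))
    # run(cmds)  # uncomment once llama.cpp has been built
```

Printing the commands before running them makes it easy to paste each step into a terminal and see which one actually fails.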


I'm not sure whether the latest release fixes this.
