GGUF model won't save out (tried multiple fixes) #3537

Joshua-Nolan · 2025-10-31T13:45:19Z

Hi everyone, I'm very new to Unsloth.

I took this notebook and edited the dataset loading to read in my custom dataset. That works fine, and inference works too.

But when I go to save the model out, I get the following error:

`Unsloth: Preparing converter script...
INFO:unsloth_zoo.llama_cpp: Unsloth: Identifying llama.cpp gguf supported architectures...
ERROR:unsloth_zoo.llama_cpp: Unsloth: Error during download or introspection of original script: Failed to execute module convert_hf_to_gguf_original_gguf_yaxzp8q5 from /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 490, in _load_module_from_path
spec.loader.exec_module(module)
File "", line 940, in exec_module
File "", line 241, in _call_with_frames_removed
File "/workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py", line 4157, in
class Qwen3VLTextModel(Qwen3Model):
File "/workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py", line 4158, in Qwen3VLTextModel
model_arch = gguf.MODEL_ARCH.QWEN3VL
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/enum.py", line 786, in getattr
raise AttributeError(name) from None
AttributeError: QWEN3VL

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 535, in _download_convert_hf_to_gguf
module = _load_module_from_path(temp_original_file_path, original_module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 494, in _load_module_from_path
raise ImportError(f"Failed to execute module {module_name} from {filepath}") from e
ImportError: Failed to execute module convert_hf_to_gguf_original_gguf_yaxzp8q5 from /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py

AttributeError Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py:490, in _load_module_from_path(filepath, module_name)
489 try:
--> 490 spec.loader.exec_module(module)
491 except Exception as e:
492 # Clean up registry if exec fails

File :940, in exec_module(self, module)

File :241, in _call_with_frames_removed(f, *args, **kwds)

File /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py:4157
4153 return super().modify_tensors(data_torch, name, bid)
4156 @ModelBase.register("Qwen3VLForConditionalGeneration")
-> 4157 class Qwen3VLTextModel(Qwen3Model):
4158 model_arch = gguf.MODEL_ARCH.QWEN3VL

File /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py:4158, in Qwen3VLTextModel()
4156 @ModelBase.register("Qwen3VLForConditionalGeneration")
4157 class Qwen3VLTextModel(Qwen3Model):
-> 4158 model_arch = gguf.MODEL_ARCH.QWEN3VL
4160 def set_gguf_parameters(self):

File /opt/conda/lib/python3.11/enum.py:786, in EnumType.getattr(cls, name)
785 except KeyError:
--> 786 raise AttributeError(name) from None

AttributeError: QWEN3VL

The above exception was the direct cause of the following exception:

ImportError Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py:535, in _download_convert_hf_to_gguf(name)
534 try:
--> 535 module = _load_module_from_path(temp_original_file_path, original_module_name)
536 finally:
537 # Restore environment

File /opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py:494, in _load_module_from_path(filepath, module_name)
493 del sys.modules[module_name]
--> 494 raise ImportError(f"Failed to execute module {module_name} from {filepath}") from e
495 return module

ImportError: Failed to execute module convert_hf_to_gguf_original_gguf_yaxzp8q5 from /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/unsloth/save.py:1835, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
1834 try:
-> 1835 all_file_locations, want_full_precision, is_vlm_update = save_to_gguf(
1836 model_name=model_name,
1837 model_type=model_type,
1838 model_dtype=model_dtype,
1839 is_sentencepiece=False,
1840 model_directory=save_directory,
1841 quantization_method=quantization_methods,
1842 first_conversion=first_conversion,
1843 is_vlm=is_vlm, # Pass VLM flag
1844 is_gpt_oss = is_gpt_oss, # Pass gpt_oss Flag
1845 )
1846 except Exception as e:

File /opt/conda/lib/python3.11/site-packages/unsloth/save.py:1093, in save_to_gguf(model_name, model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, is_vlm, is_gpt_oss)
1092 with use_local_gguf():
-> 1093 converter_path, supported_text_archs, supported_vision_archs = _download_convert_hf_to_gguf()
1095 # Step 3: Initial GGUF conversion

File /opt/conda/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py:598, in _download_convert_hf_to_gguf(name)
597 except OSError as remove_error: logger.warning(f"Could not remove temp file {temp_original_file_path}: {remove_error}")
--> 598 raise RuntimeError(f"Failed during download/introspection of original script: {e}") from e
599 finally:

RuntimeError: Failed during download/introspection of original script: Failed to execute module convert_hf_to_gguf_original_gguf_yaxzp8q5 from /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
Cell In[17], line 9
6 if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")
8 # Save to 16bit GGUF
----> 9 if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
10 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
11 print("Model Saved")

File /opt/conda/lib/python3.11/site-packages/unsloth/save.py:1855, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
1848 raise RuntimeError(
1849 f"Unsloth: GGUF conversion failed in Kaggle environment.\n"
1850 f"This is likely due to the 20GB disk space limit.\n"
1851 f"Try saving to /tmp directory or use a smaller model.\n"
1852 f"Error: {e}"
1853 )
1854 else:
-> 1855 raise RuntimeError(f"Unsloth: GGUF conversion failed: {e}")
1857 # Step 9: Create Ollama modelfile
1858 modelfile_location = None

RuntimeError: Unsloth: GGUF conversion failed: Failed during download/introspection of original script: Failed to execute module convert_hf_to_gguf_original_gguf_yaxzp8q5 from /workspace/unsloth-notebooks/llama.cpp/original_gguf_yaxzp8q5.py`

I'm using the Docker image on Windows 11 with WSL. I edited my Docker Engine settings to set defaultKeepStorage to 100 GB (it was 20 GB before). I'm really not sure why this is happening and I'm very stuck. I've tried other fixes from this discussions page, but none of them work. I would really appreciate any help with this; it's the last hurdle for me.

Santhoshty · 2025-11-04T06:23:19Z

I've been having this issue as well in Colab. Training finishes and produces output fine, but the GGUF file doesn't show up. It says something along the lines of "model not found".

3 replies

rolandtannous Nov 5, 2025

Which exact notebook?

Santhoshty Nov 6, 2025

https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Alpaca.ipynb#scrollTo=upcOlWe7A1vc

No changes were made to the notebook. It always stalls at "Unsloth: install GGUF and other packages".

rolandtannous Nov 6, 2025

Oh, the reason this is happening is that your Colab session is actually crashing because it's running out of memory, so the model object is no longer in memory. I am already investigating and troubleshooting that specific issue. Once I resolve it, you'll be good to go on Colab. I'll ping you once I do.
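One way to see how close the runtime is to running out of RAM before the GGUF step is to read `MemAvailable` from `/proc/meminfo`. A rough sketch, assuming a Linux runtime such as Colab; `mem_available_gib` is a hypothetical helper, not an Unsloth function:

```python
def mem_available_gib(meminfo_text):
    """Parse /proc/meminfo text and return MemAvailable in GiB, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the value is reported in kB
            return kib / (1024 ** 2)
    return None


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(mem_available_gib(f.read()))
    except OSError:
        print("/proc/meminfo is not available on this system")
```

Running this just before the export cell gives a rough idea of the headroom left; it does not by itself prevent the crash.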

rolandtannous · 2025-11-05T07:06:43Z

@nolan-josh apologies for the issues you're having.
I'd appreciate it if you could join our Discord https://discord.com/invite/unsloth and ping me there. My nickname is the same as on GitHub. We could try to troubleshoot and resolve your particular issue together, as I don't have access to a Windows machine to reproduce your error.

0 replies

altbodhi · 2025-11-25T03:02:28Z

I checked memory usage during the merge, and it does not grow. For example, Llama3_(8B)_Ollama with 2 NVIDIA GeForce RTX 4070 GPUs (2866MiB and 4260MiB). My dataset is very small (fewer than 400 items) and I use the default model.

[unsloth_zoo.llama_cpp|ERROR]Unsloth: Error during download or introspection of original script: Failed to execute module convert_hf_to_gguf_original_gguf_0pm8f04b

class RND1Model(Qwen2MoeModel):
File "llama.cpp/original_gguf_0pm8f04b.py", line 4188, in RND1Model
model_arch = gguf.MODEL_ARCH.RND1
^^^^^^^^^^^^^^^^^^^^

AttributeError: type object 'MODEL_ARCH' has no attribute 'RND1'

model_name = "unsloth/llama-3-8b-bnb-4bit", device_map="auto"

10 replies

altbodhi Nov 25, 2025

Yesterday I successfully trained unsloth/DeepSeek-R1-Distill-Llama-8B-bnb-4bit and exported it to Ollama (but without a Modelfile).
Now I have returned to this script and am running it with unsloth/llama-3-8b-bnb-4bit.

Could you tell me, please: do I need to take any action after trainer_stats = trainer.train() before exporting to Ollama format?

altbodhi Nov 25, 2025

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /home/user/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 4/4 [00:00<00:00, 9532.51it/s]
Note: tokenizer.model not found (this is OK for non-SentencePiece models)
Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [02:49<00:00, 42.31s/it]
Unsloth: Merge process complete. Saved to `/home/user/proba2/model`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF bf16 might take 3 minutes.
\        /    [2] Converting GGUF bf16 to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
[unsloth_zoo.llama_cpp|ERROR]Unsloth: Error during download or introspection of original script: Failed to execute module convert_hf_to_gguf_original_gguf_rr7dz6nu

I do not understand why, during export, the model is downloaded to ./model but is then looked for in the local cache.

altbodhi Nov 25, 2025

> do you mind manually removing the llama.cpp folder and trying again?

It has not finished yet, but I think it's OK. However, I made a mistake by installing Ollama from the script, because it was already installed from snap. How do I revert that installation?

altbodhi Nov 25, 2025

OK, removing llama.cpp from the notebook location resolved the export to Ollama. But the Modelfile is not created (this is my second export). Any idea why?
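The workaround reported here, deleting the `llama.cpp` folder that sits next to the notebook before exporting again, can be sketched as follows. `remove_stale_llama_cpp` is a hypothetical helper, and the default path assumes the folder is in the current working directory:

```python
import shutil
from pathlib import Path


def remove_stale_llama_cpp(base_dir="."):
    """Delete base_dir/llama.cpp if it exists; return True if it was removed."""
    target = Path(base_dir) / "llama.cpp"
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Re-running the export cell after the folder is gone is what this thread reports as resolving the converter error; whether it helps in other setups is not confirmed here.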

rolandtannous Nov 25, 2025

Oh, that's because we don't have an Ollama Modelfile template for DeepSeek in our modelfile mapper. I'll look into adding it.
