Ideogram
Quote: Ideogram 4 is Ideogram's first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch — not a fine-tune of any existing model. It introduces a structured JSON prompting interface, with multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images.
The original Ideogram-4 model was released only in pre-quantized FP8 and NF4 variants.
SD.Next ships Ideogram-4 with:
- BF16 weights so users can experiment with any type of quantization using native quantization-during-load features.
- SDNQ UINT4 pre-quantized weights for users who want to use it immediately without experimenting with quantization.
Both variants are available in SD.Next networks -> reference models for immediate use.
Warning
Ideogram-4 is a large model, and aggressive quantization is highly recommended.
Due to both RAM and VRAM requirements, it is not usable without quantization on most consumer hardware.
Ideogram-4 is trained on 2MP images and can generate native 2k resolution without tiling or upscaling, given sufficient VRAM.
Ideogram-4 consists of:
- text-encoder: qwen-3-vl-8b
- transformer: primary 9b transformer
- unconditional_transformer: second 9b transformer
- vae: flux-2-vae
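To put the quantization warning in numbers, here is a rough back-of-the-envelope estimate of weight storage at different precisions. It assumes that "8b" and "9b" in the component names mean roughly 8 and 9 billion parameters, and it leaves out the VAE, whose size is not stated here. `COMPONENT_PARAMS` and `weights_gb` are illustrative names, not part of SD.Next.

```python
# Nominal parameter counts read off the component names (assumption:
# "8b"/"9b" are approximate parameter counts; the VAE is omitted).
COMPONENT_PARAMS = {
    "text-encoder": 8e9,
    "transformer": 9e9,
    "unconditional_transformer": 9e9,
}


def weights_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


total_params = sum(COMPONENT_PARAMS.values())
for label, bits in (("BF16", 16), ("FP8", 8), ("4-bit", 4)):
    print(f"{label}: ~{weights_gb(total_params, bits):.0f} GB")
# BF16: ~52 GB
# FP8: ~26 GB
# 4-bit: ~13 GB
```

These figures cover weights only. Actual usage will be higher because of activations, quantization metadata such as scales, and the VAE.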
Note
A typical model uses the same UNet or transformer for both positive and negative guidance.
For Ideogram-4, there is no negative prompt as such. Instead, the second transformer denoises with zeroed text features. Using two separate transformers during each step enables stronger guidance and better prompt adherence, but it also raises VRAM requirements above typical models.
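The two-transformer step can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes the two outputs are combined with the commonly used classifier-free guidance formula, which this page does not spell out, and it uses plain Python lists in place of tensors. `guided_prediction` and the `Denoiser` signature are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (latents, text_features) -> predicted noise.
Denoiser = Callable[[Sequence[float], Sequence[float]], List[float]]


def guided_prediction(
    cond_model: Denoiser,
    uncond_model: Denoiser,
    latents: Sequence[float],
    text_features: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine two denoiser outputs for one step.

    Assumes the standard classifier-free guidance form:
    uncond + scale * (cond - uncond).
    """
    # The primary transformer sees the real text features.
    cond = cond_model(latents, text_features)
    # The second transformer sees zeroed text features instead of a negative prompt.
    zeroed = [0.0] * len(text_features)
    uncond = uncond_model(latents, zeroed)
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

Both models run on every step, which is why memory use is higher than for a model that reuses one transformer for both passes.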
Tip
This can cause VRAM swapping if both transformers are not pinned to VRAM. Pin both transformers to VRAM if you have enough memory.
Experimental: Settings -> Model options -> Ideogram 4 -> Pin transformers to VRAM
Tip
SD.Next provides an experimental option to disable the second transformer. This saves VRAM, but quality drops and noise increases, so keep it enabled if you can afford it.
Experimental: Settings -> Model options -> Ideogram 4 -> Enable conditional guidance
By default, Ideogram-4 uses adaptive guidance scheduling: a guidance scale of 7.0 for the first 90% of steps, then 3.0 for the remaining steps.
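As a minimal sketch of such a schedule, the function below returns the guidance scale for a given step. The function name, parameter names, and the way the 90% boundary is rounded (truncation to an integer step) are assumptions for illustration. Only the 7.0, 3.0, and 90% values come from the description above.

```python
def guidance_for_step(
    step: int,
    total_steps: int,
    high: float = 7.0,
    low: float = 3.0,
    switch_fraction: float = 0.9,
) -> float:
    """Return the guidance scale for a zero-based step index."""
    # Assumption: the switch point is truncated to a whole step.
    switch_step = int(total_steps * switch_fraction)
    return high if step < switch_step else low


# With 48 steps, the switch lands at step 43.
schedule = [guidance_for_step(i, 48) for i in range(48)]
print(schedule.count(7.0), schedule.count(3.0))  # 43 5
```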
Tip
If guidance scale is 0 or 1, negative guidance is not calculated and the second transformer is not used. This is equivalent to running a TURBO-style model.
Ideogram-4 requires a large number of steps to generate good results.
The authors' default is 48 steps.
Warning
Ideogram-4 is not usable without its specific JSON-based prompt format.
Structure:
{
  "high_level_description": "",
  "compositional_deconstruction": {
    "background": "",
    "elements": [
      {
        "type": "obj",
        "desc": ""
      }
    ]
  }
}
Tip
SD.Next provides an LLM-based prompt enhancer that can rewrite a normal text prompt into Ideogram-4 JSON format.
Alternatively, you can write the JSON prompt yourself and pass it directly.
Settings: Model options -> Ideogram 4 -> Enable prompt-enhance
This enables or disables the prompt enhancer. The enhancer uses the same Qwen-3-VL-8B model that Ideogram-4 uses for text encoding, plus an additional 1.1GB lm_head.
The enhancer expands and rewrites prompts into Ideogram-4 JSON format.
It is enabled by default because Ideogram-4 is not usable without a JSON prompt.
It is not fast: approximately 30 seconds on an RTX 4090.
If you pass a detailed JSON prompt, the enhancer detects it and skips enhancement.
If prompt enhance is disabled and you pass a non-JSON prompt, SD.Next converts it naively into Ideogram-4 JSON format.
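For illustration only, the sketch below shows one way a prompt could be checked for the Ideogram-4 JSON structure and, failing that, wrapped naively. The detection and conversion logic SD.Next actually uses is not described here, so `is_ideogram_json`, `naive_to_ideogram_json`, and the choice to put the whole text into `high_level_description` are assumptions.

```python
import json

# Top-level keys taken from the structure documented above.
REQUIRED_KEYS = ("high_level_description", "compositional_deconstruction")


def is_ideogram_json(prompt: str) -> bool:
    """Return True if the prompt parses as a JSON object with the expected keys."""
    try:
        data = json.loads(prompt)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(data, dict) and all(key in data for key in REQUIRED_KEYS)


def naive_to_ideogram_json(prompt: str) -> str:
    """Pass JSON prompts through unchanged; wrap plain text in a minimal structure."""
    if is_ideogram_json(prompt):
        return prompt
    wrapped = {
        # Assumption: the plain text becomes the high-level description as-is.
        "high_level_description": prompt.strip(),
        "compositional_deconstruction": {"background": "", "elements": []},
    }
    return json.dumps(wrapped, indent=2)


print(naive_to_ideogram_json("a red fox reading a newspaper"))
```

A wrapper like this leaves the background and element descriptions empty, so a hand-written or enhanced JSON prompt carries more detail than a naive conversion.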