Ideogram

Vladimir Mandic edited this page Jun 6, 2026 · 2 revisions

Ideogram-4

Quote: Ideogram 4 is Ideogram's first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch — not a fine-tune of any existing model. It introduces a structured JSON prompting interface, with multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images.

Variants

The original Ideogram-4 model was released in FP8 and NF4 variants. SD.Next ships it with BF16 weights so users can experiment with any type of quantization using native quantization-during-load features.

Warning

Ideogram-4 is a large model, and aggressive quantization is highly recommended.
Because of its combined RAM and VRAM requirements, it is not usable without quantization on most consumer hardware.

Components

Ideogram-4 consists of:

  • text-encoder: qwen-3-vl-8b
  • transformer: primary 9b transformer
  • unconditional_transformer: second 9b transformer
  • vae: flux-2-vae
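
As a rough illustration of why quantization matters here, the sketch below estimates weight memory from the nominal parameter counts in the component names. It is a back-of-the-envelope calculation only: the parameter counts are rounded, the VAE is left out because its size is not stated on this page, and the bytes-per-parameter figures ignore quantization metadata, activations, and framework overhead.

```python
# Nominal parameter counts in billions, read off the component names above.
# The VAE is omitted because its size is not stated on this page.
PARAMS_BILLIONS = {
    "text-encoder": 8,
    "transformer": 9,
    "unconditional_transformer": 9,
}

# Approximate bytes per parameter. 4-bit formats also store scales and
# other metadata, which this estimate ignores.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weight_gigabytes(dtype: str) -> float:
    """Return the estimated weight footprint in GB (1 GB = 1e9 bytes)."""
    total_params = sum(PARAMS_BILLIONS.values()) * 1e9
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_gigabytes(dtype):.0f} GB")
# bf16: ~52 GB
# fp8: ~26 GB
# nf4: ~13 GB
```

Under these assumptions, the BF16 weights alone exceed the VRAM of any single consumer GPU, which is consistent with the warning above.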

Note

A typical model uses the same UNet or transformer for both positive and negative guidance.

For Ideogram-4, there is no negative prompt as such. Instead, the second transformer denoises with zeroed text features. Using two separate transformers during each step enables stronger guidance and better prompt adherence, but it also raises VRAM requirements above typical models.
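
A minimal sketch of how the two predictions might be combined at each step, assuming the standard classifier-free guidance formula. Whether Ideogram-4 uses exactly this formula is an assumption; the function name and the toy values are hypothetical and plain lists stand in for tensors.

```python
def combine_guidance(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Standard classifier-free guidance (assumed, not confirmed for Ideogram-4).

    cond:   prediction from the primary transformer (prompt features)
    uncond: prediction from the second transformer (zeroed text features)
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy values standing in for one step's noise predictions.
cond = [1.0, 2.0]
uncond = [0.5, 1.0]

print(combine_guidance(cond, uncond, 7.0))  # [4.0, 8.0]
print(combine_guidance(cond, uncond, 1.0))  # [1.0, 2.0], identical to cond
```

With this formula, a scale of 1 returns the conditional prediction unchanged, so the second transformer's output contributes nothing at that setting.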

Tip

Running two transformers in every step can cause VRAM swapping if both are not pinned to VRAM. Pin both transformers to VRAM if you have enough memory.

Experimental: Settings -> Model options -> Ideogram 4 -> Pin transformers to VRAM

Tip

SD.Next provides an experimental option to disable the second transformer. This saves VRAM, but quality drops and noise increases, so keep it enabled if you can afford it.

Experimental: Settings -> Model options -> Ideogram 4 -> Enable conditional guidance

Guidance

By default, Ideogram-4 uses adaptive guidance scheduling: a guidance scale of 7.0 for the first 90% of steps, then 3.0 for the remaining steps.
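
The default schedule can be sketched as a simple step function. This is an illustration only: the function name is hypothetical, and the exact step at which the switch happens (here, the first 0-based step index at or beyond 90% of the total) is an assumption, not taken from the SD.Next source.

```python
def guidance_for_step(
    step: int,
    total_steps: int,
    high: float = 7.0,
    low: float = 3.0,
    switch_fraction: float = 0.9,
) -> float:
    """Return the guidance scale for a 0-based step index.

    The boundary handling is an assumption; the real implementation may
    round the switch point differently.
    """
    return high if step < switch_fraction * total_steps else low


schedule = [guidance_for_step(i, 48) for i in range(48)]
print(schedule.count(7.0), schedule.count(3.0))  # 44 4
```

With 48 steps, this sketch applies 7.0 to the first 44 steps and 3.0 to the last 4.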

Tip

If guidance scale is 0 or 1, negative guidance is not calculated and the second transformer is not used. This is equivalent to running a TURBO-style model.
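
To make the cost difference concrete, the sketch below counts transformer forward passes per image. It is a simplification that assumes one constant guidance scale for the whole run; the function name is hypothetical.

```python
def transformer_passes(steps: int, guidance_scale: float) -> int:
    """Count transformer forward passes for one image.

    Per the tip above, a guidance scale of 0 or 1 skips the second
    transformer, so each step costs one pass instead of two.
    """
    guided = guidance_scale not in (0, 1)
    return steps * (2 if guided else 1)


print(transformer_passes(48, 7.0))  # 96
print(transformer_passes(48, 1.0))  # 48
```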

Steps

Ideogram-4 requires a large number of steps to generate good results.
The authors' default is 48.

Prompts

Warning

Ideogram-4 is not usable without its specific JSON-based prompt format.

Structure:

{
  "high_level_description": "",
  "compositional_deconstruction": {
    "background": "",
    "elements": [
      {
        "type": "obj",
        "desc": ""
      }
    ]
  }
}
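
If you write prompts by hand, building the structure in code avoids quoting and escaping mistakes. The helper below is a hypothetical sketch that fills in only the fields shown above; the example values are invented, and "obj" is the only element type used because it is the only one shown on this page.

```python
import json


def build_prompt(description: str, background: str, elements: list[str]) -> str:
    """Serialize a prompt using the structure shown above."""
    prompt = {
        "high_level_description": description,
        "compositional_deconstruction": {
            "background": background,
            "elements": [{"type": "obj", "desc": desc} for desc in elements],
        },
    }
    # ensure_ascii=False keeps non-English text readable in the output.
    return json.dumps(prompt, indent=2, ensure_ascii=False)


print(build_prompt(
    "A red bicycle leaning against a brick wall",
    "weathered brick wall in soft morning light",
    ["red vintage bicycle", "wicker basket with flowers"],
))
```

The resulting string can be pasted into the prompt field as-is.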

Tip

SD.Next provides an LLM-based prompt enhancer that can rewrite a normal text prompt into Ideogram-4 JSON format.
Alternatively, you can write the JSON prompt yourself and pass it directly.

Settings: Model options -> Ideogram 4 -> Enable prompt-enhance

This enables or disables the prompt enhancer, which uses the same Qwen-3-VL-8B model that Ideogram-4 uses for text encoding. The enhancer expands and rewrites prompts into Ideogram-4 JSON format. It is enabled by default because Ideogram-4 is not usable without a JSON prompt. It is not fast: enhancement takes approximately 30 seconds on an RTX 4090.

If you pass a detailed JSON prompt, the enhancer detects it and skips enhancement. If prompt enhance is disabled and you pass a non-JSON prompt, SD.Next converts it naively into Ideogram-4 JSON format.
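
The decision logic described above can be sketched as follows. This is not SD.Next's actual code: all function names are hypothetical, the detection rule (valid JSON containing both top-level keys) is an assumption, and the naive conversion shown here, which places the whole prompt in high_level_description, is only one plausible interpretation.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"high_level_description", "compositional_deconstruction"}


def is_json_prompt(text: str) -> bool:
    """Assumed detection rule: a JSON object containing both top-level keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)


def naive_json_prompt(text: str) -> str:
    """Assumed fallback: wrap the plain prompt with no further structure."""
    return json.dumps({
        "high_level_description": text,
        "compositional_deconstruction": {"background": "", "elements": []},
    })


def prepare_prompt(
    text: str,
    enhance_enabled: bool,
    enhance: Optional[Callable[[str], str]] = None,
) -> str:
    if is_json_prompt(text):
        return text  # already structured: skip enhancement
    if enhance_enabled and enhance is not None:
        return enhance(text)  # slow path: LLM-based rewrite
    return naive_json_prompt(text)  # enhancer disabled: naive conversion


print(prepare_prompt("a cat on a sofa", enhance_enabled=False))
```

The enhancer is passed in as a callable so the sketch stays runnable without loading any model.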