
Ideogram

Vladimir Mandic edited this page Jun 8, 2026 · 2 revisions

Ideogram-4

Quote: Ideogram 4 is Ideogram's first open-weight text-to-image model. It is a state-of-the-art foundation model trained from scratch — not a fine-tune of any existing model. It introduces a structured JSON prompting interface, with multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images.

Variants

The original Ideogram-4 model was released only in pre-quantized FP8 and NF4 variants.
SD.Next ships Ideogram-4 with:

  • BF16 weights so users can experiment with any type of quantization using native quantization-during-load features.
  • SDNQ UINT4 pre-quantized weights for users who want to use it immediately without experimenting with quantization.

Both variants are available for immediate use in SD.Next under networks -> reference models.

Warning

Ideogram-4 is a large model, and aggressive quantization is highly recommended.
Because of its RAM and VRAM requirements, it is not usable without quantization on most consumer hardware.

Resolution

Ideogram-4 is trained on 2MP images and can generate at native 2k resolution without tiling or upscaling, given sufficient VRAM.

Components

Ideogram-4 consists of:

  • text-encoder: qwen-3-vl-8b
  • transformer: primary 9b transformer
  • unconditional_transformer: second 9b transformer
  • vae: flux-2-vae
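
To see why quantization matters for this component list, here is a rough back-of-the-envelope sketch of raw weight sizes. It assumes the parameter counts implied by the component names (8b for the text encoder, 9b for each transformer) and ignores the VAE, activations, and any quantization metadata, so treat the numbers as a lower bound rather than a measured requirement. The function names are illustrative only.

```python
# Approximate parameter counts in billions, read off the component names above.
# Assumption: the VAE is left out because its size is not stated on this page.
PARAMS_BILLIONS = {
    "text_encoder": 8.0,
    "transformer": 9.0,
    "unconditional_transformer": 9.0,
}


def weight_gib(params_billions: float, bits_per_weight: int) -> float:
    """Raw weight size in GiB, ignoring quantization scales and other overhead."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def total_weight_gib(bits_per_weight: int) -> float:
    """Sum of raw weight sizes for all listed components at one bit width."""
    return sum(weight_gib(p, bits_per_weight) for p in PARAMS_BILLIONS.values())


if __name__ == "__main__":
    for label, bits in (("BF16", 16), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label}: ~{total_weight_gib(bits):.1f} GiB of weights")
```

Under these assumptions, the weights alone come to roughly 48 GiB at BF16 and roughly 12 GiB at 4 bits.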

Note

A typical model uses the same UNet or transformer for both positive and negative guidance.

For Ideogram-4, there is no negative prompt as such. Instead, the second transformer denoises with zeroed text features. Using two separate transformers during each step enables stronger guidance and better prompt adherence, but it also raises VRAM requirements above typical models.
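
As an illustration of how the two predictions might be combined at each step, here is a minimal sketch that assumes the standard classifier-free guidance formula. This page does not state the exact formula Ideogram-4 uses, and `combine_guidance` is a hypothetical helper that works on plain lists of floats rather than tensors.

```python
from typing import Sequence


def combine_guidance(
    cond: Sequence[float], uncond: Sequence[float], scale: float
) -> list[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    cond   -- prediction from the primary (text-conditioned) transformer
    uncond -- prediction from the second transformer with zeroed text features
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]
    uncond = [0.0, 1.0]
    print(combine_guidance(cond, uncond, 7.0))
    # With scale 1.0 the result equals the conditional prediction,
    # so the unconditional pass contributes nothing.
    print(combine_guidance(cond, uncond, 1.0))
```

In this formula a scale of 1 returns the conditional prediction unchanged, which is why the second transformer can be skipped in that case.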

Tip

Running two transformers can cause VRAM swapping unless both are pinned to VRAM. Pin both transformers to VRAM if you have enough memory.

Experimental: Settings -> Model options -> Ideogram 4 -> Pin transformers to VRAM

Tip

SD.Next provides an experimental option to disable the second transformer. This saves VRAM, but quality drops and noise increases, so keep it enabled if you can afford it.

Experimental: Settings -> Model options -> Ideogram 4 -> Enable conditional guidance

Guidance

By default, Ideogram-4 uses adaptive guidance scheduling: a guidance scale of 7.0 for the first 90% of steps, then 3.0 for the remaining steps.

Tip

If guidance scale is 0 or 1, negative guidance is not calculated and the second transformer is not used. This is equivalent to running a TURBO-style model.
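
The default schedule and the skip rule above can be sketched as follows. The function names are hypothetical, and the exact step at which the switch happens is an assumption (here, a 0-based step uses the high scale while it is below 90% of the total); the real implementation may round differently.

```python
def guidance_for_step(
    step: int,
    total_steps: int,
    high: float = 7.0,
    low: float = 3.0,
    switch_fraction: float = 0.9,
) -> float:
    """Return the guidance scale for a 0-based step index."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in range [0, total_steps)")
    # Assumption: the boundary is a simple comparison against the fraction.
    return high if step < switch_fraction * total_steps else low


def needs_unconditional_pass(scale: float) -> bool:
    """A scale of 0 or 1 means no negative guidance is calculated."""
    return scale not in (0.0, 1.0)


if __name__ == "__main__":
    schedule = [guidance_for_step(i, 48) for i in range(48)]
    print(schedule.count(7.0), "steps at 7.0,", schedule.count(3.0), "steps at 3.0")
    print(needs_unconditional_pass(1.0))
```

With the authors' default of 48 steps, this particular boundary rule gives 44 steps at 7.0 and 4 steps at 3.0.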

Steps

Ideogram-4 requires a large number of steps to generate good results.
The authors' default is 48.

Prompts

Warning

Ideogram-4 is not usable without its specific JSON-based prompt format.

Structure:

{
  "high_level_description": "",
  "compositional_deconstruction": {
    "background": "",
    "elements": [
      {
        "type": "obj",
        "desc": ""
      }
    ]
  }
}
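
If you prefer to generate this structure programmatically, a minimal sketch in Python could look like the following. It uses only the keys shown above; `build_prompt` is a hypothetical helper, and any element types other than "obj" or any additional fields are not covered here because they are not listed on this page.

```python
import json


def build_prompt(description: str, background: str, element_descs: list[str]) -> str:
    """Serialize a prompt using only the keys shown in the structure above."""
    prompt = {
        "high_level_description": description,
        "compositional_deconstruction": {
            "background": background,
            # Every element uses the "obj" type from the template above.
            "elements": [{"type": "obj", "desc": d} for d in element_descs],
        },
    }
    # ensure_ascii=False keeps non-English text readable in the output.
    return json.dumps(prompt, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(
        build_prompt(
            "A red bicycle leaning against a brick wall",
            "weathered brick wall in soft morning light",
            ["red vintage bicycle", "wicker basket with flowers"],
        )
    )
```

The resulting string can be pasted into the prompt field as-is.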

Tip

SD.Next provides an LLM-based prompt enhancer that can rewrite a normal text prompt into Ideogram-4 JSON format.
Alternatively, you can write the JSON prompt yourself and pass it directly.

Settings: Model options -> Ideogram 4 -> Enable prompt-enhance

This setting enables or disables the prompt enhancer.
The enhancer reuses the Qwen-3-VL-8B model that Ideogram-4 already uses for text encoding, plus an additional 1.1GB lm_head.
It expands and rewrites prompts into Ideogram-4 JSON format.
It is enabled by default because Ideogram-4 is not usable without a JSON prompt.
It is not fast: enhancement takes approximately 30 seconds on an RTX 4090.

If you pass a detailed JSON prompt, the enhancer detects it and skips enhancement.
If the prompt enhancer is disabled and you pass a non-JSON prompt, SD.Next converts it naively into Ideogram-4 JSON format.
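
To make the two fallback behaviors concrete, here is a hypothetical sketch of what detection and naive conversion could look like. This page does not describe how SD.Next actually detects JSON or builds the naive prompt, so both functions below are assumptions for illustration only.

```python
import json


def looks_like_ideogram_json(prompt: str) -> bool:
    """Assumed check: valid JSON object that has the top-level description key."""
    try:
        data = json.loads(prompt)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "high_level_description" in data


def naive_wrap(prompt: str) -> str:
    """Assumed conversion: put plain text into the description, leave the rest empty."""
    if looks_like_ideogram_json(prompt):
        return prompt  # already structured, pass through unchanged
    return json.dumps(
        {
            "high_level_description": prompt,
            "compositional_deconstruction": {"background": "", "elements": []},
        }
    )


if __name__ == "__main__":
    print(naive_wrap("a lighthouse at dusk"))
```

A naive wrapper like this gives the model no layout or element detail, which is consistent with the enhancer being enabled by default.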
