
Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma #11120

Open · wants to merge 1 commit into main
Conversation


@josephrocca josephrocca commented Mar 20, 2025

What does this PR do?

Context:

Chroma is a large-scale Apache 2.0 fine-tune of FLUX.1 Schnell. It is currently one of the top trending text-to-image models, and has been for several days now:

image

Someone recently asked about diffusers support:

I've currently got it working in diffusers:

but as you can see from the comments at the top of that script, it requires a couple of changes to diffusers source code for it to work out of the box.

Changes:

One such change is due to Chroma requiring masking/truncation of prompts (all but the final padding token).
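For concreteness, a rough sketch of what that truncation looks like on the encoding side (illustrative only, not code from this PR; the Schnell repo is just a convenient source for the T5 tokenizer/encoder here, and Chroma's exact rule may differ in detail):

import torch
from transformers import T5EncoderModel, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="tokenizer_2")
text_encoder = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="text_encoder_2", torch_dtype=torch.bfloat16
)

def encode_truncated(prompt, max_sequence_length=512):
    inputs = tokenizer(
        prompt,
        padding="max_length",
        max_length=max_sequence_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        embeds = text_encoder(inputs.input_ids)[0]
    keep = int(inputs.attention_mask.sum()) + 1  # real tokens plus one padding token
    return embeds[:, :keep]

pos_embeds = encode_truncated("a courtroom sketch of a bored human judge")
neg_embeds = encode_truncated("low quality")
print(pos_embeds.shape, neg_embeds.shape)  # different sequence lengths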

Currently, diffusers requires that the positive and negative prompts have the same length, since it assumes the full 512 T5 tokens will be used for both.

So check_inputs blocks this, and if we remove that check, we get this error:

  File "/opt/conda/lib/python3.11/site-packages/diffusers/pipelines/flux/pipeline_flux.py", line 904, in __call__
    neg_noise_pred = self.transformer(
                     ^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/accelerate/hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 522, in forward
    encoder_hidden_states, hidden_states = block(
                                           ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 180, in forward
    attention_outputs = self.attn(
                        ^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 588, in forward
    return self.processor(
           ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 2318, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 1208, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~

So we need to pass the negative prompt's text IDs into the negative forward pass, instead of passing the positive prompt's text IDs into both.
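For illustration, the kind of change involved in the true-CFG branch of pipeline_flux.py looks roughly like this (a sketch, not the exact diff in this PR; negative_text_ids here stands for the text IDs returned by encode_prompt when encoding the negative prompt):

# use the text IDs from encoding the negative prompt for the negative
# forward pass, instead of reusing the positive prompt's text_ids
neg_noise_pred = self.transformer(
    hidden_states=latents,
    timestep=timestep / 1000,
    guidance=guidance,
    pooled_projections=negative_pooled_prompt_embeds,
    encoder_hidden_states=negative_prompt_embeds,
    txt_ids=negative_text_ids,  # was: txt_ids=text_ids
    img_ids=latent_image_ids,
    joint_attention_kwargs=self.joint_attention_kwargs,
    return_dict=False,
)[0]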

Who can review?

@yiyixuxu @sayakpaul @DN6

@sayakpaul (Member)

Thanks for your PR!

Before we get to reviewing the PR, could you please provide some side-by-side results with Schnell and Chroma on the same inputs (including the seeds)?

@sayakpaul requested a review from @yiyixuxu on March 20, 2025 at 06:39
@asomoza (Member) commented Mar 20, 2025

There's some more context in this issue: #11010

P.S.: I tested V13

@josephrocca (Author) commented Mar 20, 2025

could you please provide some side-by-side results with Schnell and Chroma on the same inputs (including the seeds)?

Oh, sure thing, see links below for some comparison grids - but some quick notes:

  • If you're wondering how different/trained/diverged Chroma is from Schnell, these pics may be useful.
  • If you're looking to understand how "high quality" or "aesthetic" Chroma is, then these will not be particularly useful, since Chroma is still training and has so far been trained almost entirely on 512×512px images, with no post-training preference tuning. I think it's not ready to be compared on aesthetics or fine details at this point.
  • I used ChatGPT 4.5 to generate the prompts used for these: https://chatgpt.com/share/67dc3cb0-fbb4-8007-a661-0184968418ad

Image Grids:

Also, if you skim the above grids, some of the images from Schnell look quite "clean" and coherent - that's definitely an advantage Schnell currently has - but in some cases Chroma should arguably win based on the style specified in the prompt ("courtroom sketch"):

image

Compared to Chroma:

image (1)

And Chroma with aesthetic keywords to try to emulate aesthetic tuning that Schnell has:

image (2)

You can see that although Schnell's is cleaner (and arguably slightly more coherent, though sample size is a bit small here), Chroma is definitely more faithful to the style specified in the prompt.

Also note that, as with other models that haven't had CFG baked in, you can get entirely different 'vibes' by tweaking Chroma's CFG scale - above I've used 5, with 20 steps (lodestone is currently doing some small-scale experiments as a precursor to a few-step LoRA for Chroma).
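(For reference, those settings map onto a Flux-style pipeline call roughly like this - an illustrative sketch only, with pipe and the embedding variables assumed to be prepared as in the script linked above:)

image = pipe(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    true_cfg_scale=5.0,      # the "CFG" value mentioned above
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]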

My experience so far with testing Chroma is that it has a lot more "soul" than Schnell and Dev - it's quite fun to play with.

@nitinmukesh
Awesome. I looked at some of the outputs; the Anubis one is amazing, among many others.

@josephrocca (Author) commented Mar 20, 2025

Side note: Playing with the official Chroma ComfyUI workflow just now with v15, I noticed that there are some potential differences in quality/coherence compared to my diffusers code which generated the above images - e.g. notice the alignment to the "bored human judge" in this seed=0 image with Chroma, which was less evident in the above examples:

So please take the above example images with a pinch of salt - Chroma quality may be better than what I've shown here. It could be due to quantization, or subtleties of ComfyUI's sampling. I'd need bigger sample sizes to know whether the ComfyUI outputs are actually better, but I'm going to sleep now :)

@hlky (Member) left a comment

Thanks @josephrocca

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@josephrocca (Author)

I'm not sure about the conventions in diffusers, but since prompt truncation is equivalent to prompt masking, I wonder whether it'd be worth also/instead supporting masking for flux?
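A quick standalone sanity check of the key/value side of that equivalence (illustrative, plain PyTorch, independent of the diffusers code below):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
heads, seq, keep, dim = 4, 16, 5, 8
q = torch.randn(1, heads, seq, dim)
k = torch.randn(1, heads, seq, dim)
v = torch.randn(1, heads, seq, dim)

# boolean mask: True = attend; only the first `keep` key positions are "real" tokens
mask = torch.zeros(1, 1, seq, seq, dtype=torch.bool)
mask[..., :keep] = True

masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
truncated = F.scaled_dot_product_attention(q, k[:, :, :keep, :], v[:, :, :keep, :])

print(torch.allclose(masked, truncated, atol=1e-5))  # True (up to numerical noise)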

This is working code, inserted here in transformer_flux.py:

# Hook for an optional T5 attention mask passed in via joint_attention_kwargs.
if joint_attention_kwargs is not None and "encoder_attention_mask" in joint_attention_kwargs and joint_attention_kwargs["encoder_attention_mask"] is not None:
    encoder_attention_mask = joint_attention_kwargs.pop("encoder_attention_mask")
    max_seq_length = encoder_hidden_states.shape[1]
    seq_length = encoder_attention_mask.sum(dim=-1)
    batch_size = encoder_attention_mask.shape[0]
    # Chroma-style masking: keep the real tokens plus one padding token per sample.
    encoder_attention_mask_with_padding = encoder_attention_mask.clone()
    for i in range(batch_size):
        current_seq_len = int(seq_length[i].item())
        if current_seq_len < max_seq_length:
            available_padding = max_seq_length - current_seq_len
            tokens_to_unmask = min(1, available_padding)  # unmask one of the padding tokens
            encoder_attention_mask_with_padding[i, current_seq_len : current_seq_len + tokens_to_unmask] = 1
    # Image tokens are always attended to, so append an all-ones mask for them.
    attention_mask = torch.cat(
        [
            encoder_attention_mask_with_padding,
            torch.ones([hidden_states.shape[0], hidden_states.shape[1]], device=encoder_attention_mask.device),
        ],
        dim=1,
    )
    # Outer product to build a 2D (query x key) mask; note this matmul collapses the
    # batch dimension, so it assumes batch size 1 (or identical masks across the batch).
    attention_mask = attention_mask.float().T @ attention_mask.float()
    # Expand to [batch, heads, seq, seq] and store it where the attention processor
    # will pick it up on this and subsequent forward passes.
    attention_mask = (
        attention_mask[None, None, ...]
        .repeat(encoder_hidden_states.shape[0], self.config.num_attention_heads, 1, 1)
        .int()
        .bool()
    )
    joint_attention_kwargs["attention_mask"] = attention_mask
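On the calling side, the mask could then be supplied roughly like this (a sketch; pipe is assumed to be an already-loaded Flux-style pipeline with the patch above applied):

prompt = "a courtroom sketch of a bored human judge"
text_inputs = pipe.tokenizer_2(  # the T5 tokenizer
    prompt,
    padding="max_length",
    max_length=512,
    truncation=True,
    return_tensors="pt",
)

image = pipe(
    prompt=prompt,
    num_inference_steps=20,
    joint_attention_kwargs={"encoder_attention_mask": text_inputs.attention_mask.to(pipe.device)},
).images[0]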

@bghira (Contributor) commented Mar 23, 2025

Can you demonstrate an example where zeroing the end of the prompt is equivalent to attention masking, where the softmax scores for the padding sequence are near -infinity?
