About stable-diffusion-2-1 #14

Cococyh · 2023-02-06T01:43:40Z

Hi,

I applied your method to stable-diffusion-1.5 and it works.
But when I load the pre-trained stable-diffusion-2-1 model, I get the error below; it seems the pipeline does not work.

Seed: 0
0%| | 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/run.py", line 90, in
main()
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/run.py", line 73, in main
image = run_on_prompt(prompt=config.prompt,
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/run.py", line 44, in run_on_prompt
outputs = model(prompt=prompt,
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/pipeline_attend_and_excite.py", line 506, in __call__
max_attention_per_index = self._aggregate_and_get_max_attention_per_token(
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/pipeline_attend_and_excite.py", line 224, in _aggregate_and_get_max_attention_per_token
attention_maps = aggregate_attention(
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/utils/ptp_utils.py", line 232, in aggregate_attention
out = torch.cat(out, dim=0)
RuntimeError: torch.cat(): expected a non-empty list of Tensors

AttendAndExcite · 2023-02-06T13:10:03Z

Hi @Cococyh, thanks for your interest!
It looks like the attention maps are not being saved by the controller.
We have a pending pull request to upgrade the version of diffusers that may help solve your issue.
If the issue persists after the PR is merged, we will try to help investigate it :)

Cococyh · 2023-02-07T01:12:00Z

Hi @AttendAndExcite, thanks for your reply!
I updated diffusers to 0.12.1 and found that I had been using the stabilityai/stable-diffusion-2-1-inpainting model; when I replaced it with stabilityai/stable-diffusion-2-1, it worked well! :)
When I set (width, height) to (256, 256) it still works, and the resulting picture is 256*256, but when I set (width, height) to (768, 768) I get this error, which seems a little different from the previous one.

Traceback (most recent call last):
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/run.py", line 90, in
main()
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/run.py", line 73, in main
image = run_on_prompt(prompt=config.prompt,
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/run.py", line 44, in run_on_prompt
outputs = model(prompt=prompt,
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/pipeline_attend_and_excite.py", line 506, in __call__
max_attention_per_index = self._aggregate_and_get_max_attention_per_token(
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/pipeline_attend_and_excite.py", line 224, in _aggregate_and_get_max_attention_per_token
attention_maps = aggregate_attention(
File "/home/ubuntu/12T_1/szs/code/stable_diffusion/stable-diffusion-webui/Attend-and-Excite-diffusers/utils/ptp_utils.py", line 232, in aggregate_attention
out = torch.cat(out, dim=0)
RuntimeError: torch.cat(): expected a non-empty list of Tensors

XavierXiao · 2023-02-07T05:08:30Z

I think this is because with a 768-sized image there is no attention map of dimension 16; you need to adjust the attention dimension correspondingly (to 24, I guess).

The forked code is runnable for SDv2, but it does not seem to be effective. I found that the iterative refinement always reaches the max iter and the loss does not change much. I am trying to investigate what's wrong.
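The arithmetic behind the "24" guess can be sketched as follows. This is a minimal illustration, not code from the repository: it assumes the VAE downsamples the image by a factor of 8 and that the UNet halves the latent resolution three times, and the function name `attention_resolutions` is hypothetical.

```python
def attention_resolutions(image_size, vae_factor=8, num_downsamples=3):
    """Side lengths of the square feature maps at each UNet level.

    Assumptions (not taken from the repository): the VAE shrinks the image
    by `vae_factor`, and the UNet halves the latent `num_downsamples` times.
    """
    latent = image_size // vae_factor
    return [latent // (2 ** level) for level in range(num_downsamples + 1)]


for size in (256, 512, 768):
    print(size, attention_resolutions(size))
```

Under these assumptions a 16x16 level exists for 256 and 512 images but not for 768, where the corresponding level is 24x24. That would be consistent with the 256 run working while the 768 run finds no maps to concatenate.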

AttendAndExcite · 2023-02-07T05:47:54Z

Hi @XavierXiao, and @Cococyh.
@Cococyh, as mentioned by @XavierXiao, if your model does not have 16x16 attention maps you would need to adjust to work on the most semantic attention resolution. I’d recommend visualizing all the attention resolutions separately and determining which gives the maps that correspond to the generated objects best.

@XavierXiao, which prompts did you work on? It could be an issue of expressiveness of Stable Diffusion, have you tried the prompts from the paper?

XavierXiao · 2023-02-07T06:12:58Z

I am on a smaller GPU, so I cannot run 768x768 generation. Instead I tried stable diffusion 2-base, which is a 512*512 model, with the prompt "A frog and a pink bench" from the paper. But I found the gradient to be very small, so the iterative refinement always reaches the max iteration limit, and the sample looks very similar to the original SD's sample.

Since it is still a 512 model, I use the default hyperparameters.

Here is a screenshot. As you can see, I print out the grad norm and it is small. The loss does not decrease much.

XavierXiao · 2023-02-07T06:16:54Z

And the samples with (top) and without (bottom) A&E.

Maybe some configs need to be changed? You can try it with the given SD version and prompt.

AttendAndExcite · 2023-02-07T06:28:03Z

Thanks for the additional information, @XavierXiao!
We will do our best to investigate the issue as well but in the meantime, here are some tips that may be useful in case you’d like to explore a solution:

Since this is a newly trained model, I’d take a look at the visualizations of the attention maps at different resolutions (to make sure the 16x16 maps are indeed still the most semantic ones).
I see that the model uses a different text encoder. We perform an additional Softmax operation to discard the <sot> token; it could be that this is not necessary with the ViT-H text encoder (not sure, but the attention visualization should answer this question).
Related to the previous point: the attention normalization should result in generated subject tokens obtaining a maximal attention value close to 1. If this is not the case for successful generations, it could indicate that the attention normalization does not fit the model.
This is less likely, but it could be that you need to tune the hyperparameters for this case separately (you can try increasing the scale factor).
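The "maximal attention value close to 1" check from the tips above can be sketched on toy data. This is only an illustration under stated assumptions, not the repository's implementation: the helper names, the `sharpen` multiplier, and all attention values are made up, and the maps are plain Python lists rather than tensors.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def max_normalized_attention(pixel_rows, token_index, sharpen=100.0):
    """Largest re-normalized attention a token receives over all locations.

    pixel_rows: one list per spatial location holding the attention over
    the text tokens, with <sot> first and the final token last.
    token_index: position of the subject token in the full sequence (>= 1).
    sharpen: hypothetical multiplier applied before the extra softmax.
    """
    best = 0.0
    for row in pixel_rows:
        trimmed = row[1:-1]  # discard <sot> and the final token
        probs = softmax([sharpen * v for v in trimmed])
        best = max(best, probs[token_index - 1])  # shift: <sot> was removed
    return best


# Toy maps: 3 locations, tokens = <sot>, "a", "frog", "and", <eot>
rows = [
    [0.90, 0.03, 0.02, 0.03, 0.02],
    [0.85, 0.01, 0.10, 0.02, 0.02],
    [0.92, 0.02, 0.03, 0.02, 0.01],
]
frog_max = max_normalized_attention(rows, token_index=2)
print(f"max normalized attention for 'frog': {frog_max:.4f}")
```

If a subject token in a visibly successful generation stays far below 1 under this kind of check, that would point at the normalization rather than the hyperparameters.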

Cococyh · 2023-02-07T07:30:48Z

@AttendAndExcite @XavierXiao, as you said, I set the attention dimension to 24 and it looks like I get a nice result; the resulting picture is 768*768.
The prompt is "a cat and a dog".

And this is the SD version result:

XavierXiao · 2023-02-07T07:58:38Z

@Cococyh Can you try a more complicated example like "a frog and a pink bench"? As I said above, the loss hardly decreases.

Cococyh · 2023-02-07T08:15:12Z

@AttendAndExcite, when using sd2.1-768, the max attention of a token such as "frog" is too small, whereas with sd1.5 the value is close to 1. I then changed scale_factor to 100, but it was not effective.

@XavierXiao

AttendAndExcite · 2023-02-07T10:14:17Z

@XavierXiao @Cococyh the loss appears to be quite high for the cat and the dog too; this indicates that it may be an issue with the normalization of the probabilities. We will do our best to look into this ASAP :)

XavierXiao · 2023-02-07T20:19:36Z

I found that changing this line to attention_for_text = attention_maps[:, :, 1:8] improves the convergence a lot on SD2.1. This is just a stupid way to hard-code it; ideally the 8 should be len(non_padded_token_ids). This makes sense because out of 77 tokens there is an sot, an eot, and the remaining ones are all padding, which should not be included in the computation. I think this should be done in SD v1 as well.
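A rough sketch of replacing the hard-coded 8 with a count of non-padded tokens, on toy numbers. Everything here is illustrative: `prompt_slice_end`, `normalize`, the `sharpen` multiplier, the attention mask, and the attention values are assumptions made up for the example, not the repository's code or real model outputs.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_slice_end(attention_mask):
    """Count of non-padded tokens (<sot>, prompt tokens, <eot>)."""
    return sum(attention_mask)


def normalize(row, end, sharpen=100.0):
    """Drop <sot>, keep tokens up to `end`, then re-apply softmax."""
    return softmax([sharpen * v for v in row[1:end]])


# Toy attention row for one spatial location over 77 tokens:
# <sot>, "a frog and a pink bench" (6 tokens), <eot>, then 69 padding slots.
prompt_values = [0.002, 0.04, 0.002, 0.002, 0.002, 0.002]
row = [0.319] + prompt_values + [0.01] + [0.009] * 69
attention_mask = [1] * 8 + [0] * 69  # hypothetical tokenizer mask

frog = 1  # position of "frog" inside the slice (token 2 in the full row)
with_padding = normalize(row, len(row) - 1)[frog]
without_padding = normalize(row, prompt_slice_end(attention_mask))[frog]
print(f"frog share with padding:    {with_padding:.3f}")
print(f"frog share without padding: {without_padding:.3f}")
```

In this toy row each padding slot carries little attention, but there are 68 of them inside the slice, so together they take most of the softmax mass and keep the subject token's normalized value well below 1.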

AttendAndExcite · 2023-02-07T20:25:19Z

Thanks @XavierXiao, this actually makes a lot of sense. As I mentioned above, since the model uses a different text encoder, the attention values may vary. It is entirely possible that the attention value of <eot> is high for SD 2.1, so removing it and normalizing without it brings the attention values closer to 1, which makes the optimization easier.
This was not the case for SD 1.4, so it didn’t matter.
I’m leaving this issue open and we will modify the code accordingly once we officially add support for SD 2.1.

AttendAndExcite · 2023-03-01T20:44:07Z

Hi @Cococyh, and @XavierXiao, thanks for the discussion! Our code now officially supports SD 2.1 via the sd_2_1 parameter (see our README and generate_images.ipynb notebook for details).

AttendAndExcite closed this as completed Mar 1, 2023
