Description
With the default VAE tiling, some image generations can seemingly be corrupted by previously generated images.
A test with plain text2img, model cyberrealisticPony_semiRealV30, DMD LoRA at 1.0, LCM 8 steps, CFG 1, Seed 1, 1024x640, VAE tiling on, in sequence (no batch, just clicking Generate repeatedly):
- prompt: "car", 5 generations:
- prompt: "forest", 1 generation:
- prompt: "forest", 2 generations:
- prompt: "forest", 1 generation:
These were generated on ROCm build a7706be, but I see the same behavior with Vulkan on the 1.93.2 build.
I can't reproduce it when disabling VAE tiling, so this could be related to leejet/stable-diffusion.cpp#588 (but I get a warning "Requested buffer size (4362076160) exceeds device memory allocation limit (4294967292)!" when disabling VAE tiling, so I don't know if I can really trust this test).
LostRuins commented on Jun 15, 2025
Yes, this is a known issue that seems hardware specific.
Goes back to at least 1.78, months ago. We were never able to find out why, but not everyone gets it.
What are your hardware and system specs?
wbruna commented on Jun 16, 2025
Yeah, I was able to hit this bug with @stduhpf's simple web server too, so those VAE fixes are not enough.
On the other hand, I need just three or four attempts to reproduce it with VAE (or TAESD) tiling, while I can't reproduce it at all with normal VAE.
On Vulkan:
AMD Radeon RX 7600 XT (RADV NAVI33) (radv) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
Happens on ROCm too. The specs:
Device 0: AMD Radeon RX 7600 XT, gfx1100 (0x1100), VMM: no, Wave Size: 32, running with HSA_OVERRIDE_GFX_VERSION=11.0.0.
16G VRAM, 40G RAM, Linux kernel 6.12.27, amdgpu from mainline. Nothing else running on the card (display is on the iGPU).
LostRuins commented on Jun 17, 2025
pararace was using a 4060Ti, so it's not AMD specific.
stduhpf commented on Jun 17, 2025
@wbruna Can you give me steps to reproduce it with my server? I can't make it happen.
Also does stduhpf/stable-diffusion.cpp@e201588 fix it?
wbruna commented on Jun 17, 2025
For me, it is (or was) enough to render a large-ish image (like 1024x576) repeatedly, with VAE or TAESD tiling, to eventually hit it.
Interesting. That may have fixed it, thanks! I didn't hit the bug again for some 30 or 40 renders. So, perhaps it's that zero-filling that's hardware or system specific? But why would that only affect tiling...
There may be another initialization bug lurking somewhere else. When running a generation a second time (same seed, and a non-random sampler), the second image changes very slightly, in a few details (easy to see on the simple server, because it keeps showing the previous image until the new one is displayed in its place). Afterwards, new renders are repetitions of that second one.
stduhpf commented on Jun 17, 2025
Tiling works by adding the decoded tiles to the output buffer one after the other, rather than completely overwriting the content. This is done to be able to smoothly blend between neighboring tiles. So it makes sense that if the output buffer is not empty at the beginning, whatever was in there will interfere with the decoded image.
What's stranger to me is why it isn't happening to everyone more often. I used my server a lot and never had this issue. I was even sure that creating a new tensor would automatically initialize it to 0, but maybe it's just undefined behavior.
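To make the accumulation effect concrete, here is a minimal 1D sketch in Python. It is not the sd.cpp code: `toy_decode`, `tiled_decode`, the tile sizes, and the uniform averaging of overlaps are all assumptions made up for illustration (the real blending uses smoother weights). It only shows that when tiles are added with `+=`, anything already in the output buffer ends up in the result.

```python
def toy_decode(x):
    # Hypothetical stand-in for the VAE decoder: any per-element function works.
    return 2.0 * x

def tiled_decode(latent, out, tile=4, step=2):
    """Add overlapping decoded tiles into `out` (toy model, uniform blend weights)."""
    n = len(latent)
    starts = range(0, n - tile + 1, step)
    # Count how many tiles cover each position, so overlapping contributions sum to 1.
    cover = [0] * n
    for s in starts:
        for i in range(s, s + tile):
            cover[i] += 1
    for s in starts:
        for i in range(s, s + tile):
            # Accumulate rather than overwrite: this is where stale data leaks in.
            out[i] += toy_decode(latent[i]) / cover[i]
    return out

latent = [float(i) for i in range(10)]

# Buffer explicitly zeroed before decoding: the result is the plain decode.
clean = tiled_decode(latent, [0.0] * 10)

# Buffer still holding a "previous image": the old content is blended in.
ghost = tiled_decode(latent, [100.0] * 10)
```

With the zeroed buffer the tiled result matches a non-tiled decode; with the stale buffer every output value is offset by the leftover content, which is roughly what a ghost of the previous image would look like. A non-tiled decode that overwrites the buffer would hide the same uninitialized memory.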
wbruna commented on Jun 17, 2025
Yeah, that makes sense.
Looking at ggml.c, the memory pool initialization ends up just calling malloc. So, at least on Linux, it depends on glibc's policy for that pool size: it could come directly from mmap (so the OS zeroes it out), or it could come from the process heap (so it may reuse a previously free'd area). If, for instance, the system is configured to return memory aggressively to the OS, it could completely avoid reusing the process heap for that memory pool.
wbruna commented on Jun 19, 2025
@LostRuins, I'm reusing this VAE-related issue to avoid polluting the Chroma PR.
Koboldcpp 924dfa7, with model Fluently V4 LCM, seed 2, prompt "clear blue sky, few clouds", width 960, height 640, 10 steps, VAE tiling:
That darker "band" (or a lighter one) should be very noticeable with pretty much any generation that has a 960-pixel side and areas of uniform color.
Same parameters on sd.cpp, with the proposed fix for leejet/stable-diffusion.cpp#588:
For comparison, the version without tiling:
Of course, the "fixed" image still has a few visible banded artifacts; I've chosen kind of a worst case for tiling, just to make the effect more evident.
merged leejet/stable-diffusion.cpp#588 to fix vae tiling, ref #1603
LostRuins commented on Jun 20, 2025
Fair enough, merged the fix. Tested, seems to work.
LostRuins commented on Jun 21, 2025
Please try v1.94 which should be fixed.
wbruna commented on Jun 21, 2025
1.94 seems to be working fine! No noisy or ghost images, no dark band, for some 30 generations. Tested with both Vulkan and ROCm. Thanks @LostRuins and @stduhpf!