# Phase 3: Image Generation
## Generate 80 images per variant: Base SDXL vs IRG (2, 3, 4 iterations)

**GPU Required:** T4 x2
**Internet:** ON
**Runtime:** ~5 hours (the run below averages roughly 215 s per prompt over 80 prompts)

**Comparing 4 variants:**
1. ⚪ Base SDXL (no reasoning)
2. 🟥 IRG Fine-tuned Qwen (2 iterations)
3. 🟥 IRG Fine-tuned Qwen (3 iterations)
4. 🟥 IRG Fine-tuned Qwen (4 iterations)

**Outputs:**
- 80 images × 4 variants = **320 images total**
- Uses incremental refinement (2→3→4) for efficiency

**Next step:** Run benchmarking notebook to evaluate all 5 metrics
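
The efficiency claim for incremental refinement can be made concrete by counting diffusion calls per prompt. The sketch below is a rough count only: it assumes an n-iteration IRG image costs one text-to-image call plus n - 1 img2img refinements, treats every call as equal cost, and leaves out the Base SDXL image. The function name `diffusion_calls` is hypothetical and not part of the notebook.

```python
def diffusion_calls(iteration_counts, incremental):
    """Count diffusion pipeline calls needed for one image per iteration count.

    Assumption: an n-iteration image = 1 text-to-image call + (n - 1) refinements.
    """
    if incremental:
        # One chain is run to the deepest iteration; the shallower
        # variants are saved as snapshots along the way.
        return max(iteration_counts)
    # Each variant is regenerated from scratch.
    return sum(iteration_counts)


counts = [2, 3, 4]
independent = diffusion_calls(counts, incremental=False)
chained = diffusion_calls(counts, incremental=True)
print(f"independent: {independent} calls, incremental: {chained} calls")
```

Under these assumptions the chained run needs 4 calls per prompt instead of 9. The same ratio applies to the LLM reasoning calls, since each diffusion call is paired with one.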

In [1]:
# ============================================================
# STEP 1: Install Dependencies
# ============================================================

!pip install -q ftfy regex
!pip install -q git+https://github.com/openai/CLIP.git
!pip install -q bitsandbytes

print("\n" + "="*60)
print("✅ Dependencies installed!")
print("="*60)

In [2]:
# ============================================================
# STEP 2: Check Environment
# ============================================================

import torch
import os

print("System Check:")
print("="*60)

if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✓ VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    for i in range(torch.cuda.device_count()):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("⚠️  No GPU - enable T4 x2 in Settings")

print("="*60)

System Check:
✓ GPU: Tesla T4
✓ VRAM: 15.8 GB
  GPU 0: Tesla T4
  GPU 1: Tesla T4


In [3]:
# ============================================================
# STEP 3: Define 80 Evaluation Prompts (20 each: single-object, multi-object, counting, spatial)
# ============================================================

prompts = [
    # 20 Single-object
    "a photo of a cat", "a photo of a dog", "a photo of a horse", "a photo of a laptop",
    "a photo of a chair", "a photo of a teddy bear", "a photo of a bottle", "a photo of a car",
    "a photo of a bird", "a photo of a couch", "a photo of a book", "a photo of a refrigerator",
    "a photo of a clock", "a photo of a bicycle", "a photo of a backpack", "a photo of an umbrella",
    "a photo of a pizza", "a photo of a cake", "a photo of a dog toy", "a photo of a traffic light",
    # 20 Multi-object
    "a cat and a dog", "a laptop and a cup", "a bicycle and a backpack", "a horse and a person",
    "a car and a traffic light", "a bird and a flower", "a teddy bear and a chair", "a pizza and a cake",
    "a dog and a frisbee", "a cat and a laptop", "a bicycle and a helmet", "a car and a backpack",
    "a dog and a skateboard", "a person and a skateboard", "a bottle and a glass", "a couch and a table",
    "a clock and a book", "a dog and a cat toy", "a microwave and a toaster", "a laptop and a mouse",
    # 20 Counting
    "a photo of two cats", "a photo of three dogs", "a photo of four horses", "a photo of two laptops",
    "a photo of three chairs", "a photo of four teddy bears", "a photo of two birds", "a photo of three bicycles",
    "a photo of four bottles", "a photo of two cars", "a photo of three pizzas", "a photo of four cakes",
    "a photo of two backpacks", "a photo of three umbrellas", "a photo of four flowers", "a photo of two televisions",
    "a photo of three computers", "a photo of four tables", "a photo of two hats", "a photo of three shoes",
    # 20 Spatial
    "a cat on top of a car", "a dog under a table", "a red apple to the left of a blue bowl",
    "a bird sitting above a yellow flower", "a laptop next to a cup on the desk", "a black dog behind a white cat",
    "a person standing in front of a bicycle", "a teddy bear on top of a chair", "a cat lying below a window",
    "a green backpack to the right of a red suitcase", "a bottle on top of a wooden table", "a skateboard under a desk",
    "a horse standing behind a fence", "a smartphone next to a laptop on a shelf", "a blue chair to the left of a couch",
    "a dog sitting above a colorful rug", "a tennis racket leaning against a wall", "a cat on top of a stack of books",
    "a yellow ball below a red frisbee", "a coffee cup to the right of a notebook",
]

print(f"✅ Generated {len(prompts)} diverse prompts")
print("\nSample prompts:")
for i, p in enumerate(prompts[:10], 1):
    print(f"  {i}. {p}")


✅ Generated 80 diverse prompts

Sample prompts:
  1. a photo of a cat
  2. a photo of a dog
  3. a photo of a horse
  4. a photo of a laptop
  5. a photo of a chair
  6. a photo of a teddy bear
  7. a photo of a bottle
  8. a photo of a car
  9. a photo of a bird
  10. a photo of a couch
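
The flat list above relies on comments to mark the four categories. A possible alternative, sketched here with two illustrative prompts per category instead of the notebook's 20, keys the prompts by category so the counts can be checked and each image index can be mapped back to its category later. The names `prompts_by_category`, `flat_prompts`, and `category_of` are hypothetical.

```python
from itertools import chain

# Two sample prompts per category (the notebook defines 20 per category).
prompts_by_category = {
    "single_object": ["a photo of a cat", "a photo of a dog"],
    "multi_object": ["a cat and a dog", "a laptop and a cup"],
    "counting": ["a photo of two cats", "a photo of three dogs"],
    "spatial": ["a cat on top of a car", "a dog under a table"],
}

EXPECTED_PER_CATEGORY = 2  # would be 20 for the full prompt set

# Fail early if a category is short or long.
for name, items in prompts_by_category.items():
    assert len(items) == EXPECTED_PER_CATEGORY, f"{name}: {len(items)} prompts"

# Flatten in category order; index i here matches the i used for seeds and filenames.
flat_prompts = list(chain.from_iterable(prompts_by_category.values()))
category_of = [name for name, items in prompts_by_category.items() for _ in items]

print(len(flat_prompts), category_of[2])
```

Keeping `category_of` alongside the prompts makes per-category breakdowns straightforward in the benchmarking step.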


In [4]:
# ============================================================
# STEP 4: Find Fine-tuned Qwen Model
# ============================================================

FINETUNED_QWEN_PATH = "/kaggle/input/irg-2-qwen-finetuning/qwen_irg_finetuned"

if os.path.exists(FINETUNED_QWEN_PATH):
    print(f"✓ Fine-tuned Qwen found: {FINETUNED_QWEN_PATH}")
else:
    print("⚠️  Fine-tuned Qwen not found!")
    raise FileNotFoundError("Fine-tuned Qwen model not found")

✓ Fine-tuned Qwen found: /kaggle/input/irg-2-qwen-finetuning/qwen_irg_finetuned


In [5]:
# ============================================================
# STEP 5: Load CLIP
# ============================================================

import clip

print("Loading CLIP...")
device = "cuda" if torch.cuda.is_available() else "cpu"

clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()
print("✓ CLIP loaded")

Loading CLIP...
✓ CLIP loaded


In [6]:
# ============================================================
# STEP 6: Load SDXL
# ============================================================

from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline
import gc

print("Loading SDXL...")
txt2img_pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
).to("cuda")

txt2img_pipe.enable_attention_slicing()
txt2img_pipe.enable_vae_slicing()
txt2img_pipe.enable_model_cpu_offload()
print("✓ SDXL text2img loaded")

# Create img2img by sharing components
img2img_pipe = StableDiffusionXLImg2ImgPipeline(
    vae=txt2img_pipe.vae,
    text_encoder=txt2img_pipe.text_encoder,
    text_encoder_2=txt2img_pipe.text_encoder_2,
    tokenizer=txt2img_pipe.tokenizer,
    tokenizer_2=txt2img_pipe.tokenizer_2,
    unet=txt2img_pipe.unet,
    scheduler=txt2img_pipe.scheduler,
)
img2img_pipe.enable_attention_slicing()
img2img_pipe.enable_vae_slicing()
print("✓ SDXL img2img created")

gc.collect()
torch.cuda.empty_cache()
print("\n✅ SDXL ready!")



AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'


Loading SDXL...



✓ SDXL text2img loaded
✓ SDXL img2img created

✅ SDXL ready!


In [7]:
# ============================================================
# STEP 7: Load Fine-tuned Qwen
# ============================================================

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

print("Loading Fine-tuned Qwen...")
qwen_tokenizer = AutoTokenizer.from_pretrained(
    FINETUNED_QWEN_PATH,
    trust_remote_code=True
)
qwen_model = AutoModelForCausalLM.from_pretrained(
    FINETUNED_QWEN_PATH,
    # Passing load_in_8bit directly is deprecated; use quantization_config instead
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True
)
qwen_model.eval()
print("✓ Fine-tuned Qwen loaded (8-bit)")

gc.collect()
torch.cuda.empty_cache()

Loading Fine-tuned Qwen...

✓ Fine-tuned Qwen loaded (8-bit)
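
The Step 8 helpers decode the full generated sequence and keep whatever follows the last occurrence of the word "assistant". That can cut a reply short when the reply itself contains that word. The sketch below reproduces the issue with a plain string and shows one common alternative: drop the prompt tokens before decoding. Plain Python lists stand in for token tensors, and `split_on_role_marker` and `new_token_ids` are hypothetical names.

```python
def split_on_role_marker(decoded: str) -> str:
    """Same logic as parse_llm_response: keep the text after the last 'assistant'."""
    if "assistant" in decoded:
        return decoded.split("assistant")[-1].strip()
    return decoded.strip()


def new_token_ids(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


# A decoded chat transcript whose reply happens to contain the word "assistant".
decoded = (
    "system\nYou are an expert visual reasoning assistant.\n"
    "user\nPrompt\n"
    "assistant\nUse soft light, as a photo assistant would."
)
print(repr(split_on_role_marker(decoded)))  # the reply is truncated

# Token-level alternative: 3 prompt tokens followed by 2 generated tokens.
output_ids = [101, 102, 103, 7, 8]
print(new_token_ids(output_ids, prompt_length=3))
# With real tensors the analogue would be:
#   tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

Slicing by prompt length does not depend on how the chat template renders role names, so it stays correct if the template or the system message changes.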


In [8]:
# ============================================================
# STEP 8: Define Helper Functions (Optimized)
# ============================================================

def parse_llm_response(response: str) -> str:
    if "assistant" in response:
        return response.split("assistant")[-1].strip()
    return response.strip()

@torch.no_grad()
def initial_reasoning(prompt: str, tokenizer, llm) -> str:
    system_message = (
        "You are an expert visual reasoning assistant. Analyze the prompt and provide "
        "detailed guidance for creating a high-quality image. Focus on:\n"
        "1. Composition and framing\n"
        "2. Lighting and atmosphere\n"
        "3. Color palette and mood\n"
        "4. Key details and textures\n"
        "5. Style and artistic direction\n"
        "Be specific and actionable."
    )
    
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": f'Prompt: "{prompt}"\n\nProvide your expert visual reasoning:'}
    ]
    
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(llm.device)
    outputs = llm.generate(**inputs, max_new_tokens=150, temperature=0.7, top_p=0.9, do_sample=True)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    del inputs, outputs
    gc.collect()
    torch.cuda.empty_cache()
    
    return parse_llm_response(response)

def generate_initial_image(prompt: str, reasoning: str, seed: int):
    enhanced_prompt = f"{prompt}. {reasoning[:100]}"
    generator = torch.Generator("cuda").manual_seed(seed)
    
    image = txt2img_pipe(
        prompt=enhanced_prompt,
        num_inference_steps=15,
        guidance_scale=7.5,
        generator=generator,
        height=512,
        width=512
    ).images[0]
    
    gc.collect()
    torch.cuda.empty_cache()
    return image

@torch.no_grad()
def encode_image_features(image):
    image_input = clip_preprocess(image).unsqueeze(0).to("cuda")
    image_features = clip_model.encode_image(image_input)
    return image_features

@torch.no_grad()
def reflection_reasoning(prompt: str, previous_reasoning: str, image_features: torch.Tensor, iteration: int, tokenizer, llm) -> str:
    feature_stats = f"mean={image_features.mean().item():.3f}, std={image_features.std().item():.3f}"
    
    system_message = (
        "You are a visual refinement expert. Analyze the current image state and provide "
        "specific, actionable improvements. Focus on:\n"
        "1. Fine-grained details and textures\n"
        "2. Lighting quality and shadows\n"
        "3. Color harmony and contrast\n"
        "4. Composition balance\n"
        "5. Overall realism and quality\n"
        "Be concrete and specific."
    )
    
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": (
            f'Original Prompt: "{prompt}"\n\n'
            f"Iteration: {iteration}\n\n"
            f"Previous Reasoning:\n{previous_reasoning}\n\n"
            f"Current Image: {feature_stats}\n\n"
            "Provide detailed reflection:"
        )}
    ]
    
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(llm.device)
    outputs = llm.generate(**inputs, max_new_tokens=150, temperature=0.7, top_p=0.9, do_sample=True)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    del inputs, outputs
    gc.collect()
    torch.cuda.empty_cache()
    
    return parse_llm_response(response)

def refine_image(prompt: str, reflection: str, current_image, iteration: int, total_iterations: int, seed: int):
    refinement_prompt = f"{prompt}. {reflection[:100]}"
    generator = torch.Generator("cuda").manual_seed(seed)
    
    strength = 0.7 - (iteration / total_iterations) * 0.4
    strength = max(0.3, min(0.7, strength))
    guidance_scale = 7.5 + (iteration / total_iterations) * 2.0
    
    refined_image = img2img_pipe(
        prompt=refinement_prompt,
        image=current_image,
        strength=strength,
        num_inference_steps=20,
        guidance_scale=guidance_scale,
        generator=generator
    ).images[0]
    
    gc.collect()
    torch.cuda.empty_cache()
    return refined_image

def generate_vanilla(prompt: str, seed: int):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = txt2img_pipe(
        prompt=prompt,
        num_inference_steps=15,
        guidance_scale=7.5,
        generator=generator,
        height=512,
        width=512
    ).images[0]
    
    gc.collect()
    torch.cuda.empty_cache()
    return image

def apply_single_refinement(prompt: str, current_image, reasoning: str, iteration: int, total_iterations: int, seed: int, tokenizer, llm):
    """Apply single refinement step (for incremental iteration)"""
    image_features = encode_image_features(current_image)
    reflection = reflection_reasoning(prompt, reasoning, image_features, iteration, tokenizer, llm)
    refined_image = refine_image(prompt, reflection, current_image, iteration, total_iterations, seed)
    return refined_image, reflection

def generate_irg_initial(prompt: str, seed: int, tokenizer, llm):
    """Generate initial IRG (iteration 1 + refinement to 2)"""
    reasoning = initial_reasoning(prompt, tokenizer, llm)
    current_image = generate_initial_image(prompt, reasoning, seed)
    
    # First refinement (iteration 2)
    image_features = encode_image_features(current_image)
    reflection = reflection_reasoning(prompt, reasoning, image_features, 2, tokenizer, llm)
    refined_image = refine_image(prompt, reflection, current_image, 2, 4, seed)
    
    return refined_image, reflection

print("✓ Helper functions defined (optimized for incremental refinement)")

✓ Helper functions defined (optimized for incremental refinement)
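
The schedule inside `refine_image` lowers `strength` and raises `guidance_scale` as the iteration index grows. The sketch below isolates that arithmetic so it can be inspected without loading any model. One assumption is labeled in the code: that the img2img pipeline runs `int(num_inference_steps * strength)` denoising steps. The function name `refinement_schedule` is hypothetical.

```python
def refinement_schedule(iteration, total_iterations, num_inference_steps=20):
    """Return (strength, guidance_scale, effective_steps) for one refinement."""
    # Same formulas as refine_image in the cell above.
    strength = 0.7 - (iteration / total_iterations) * 0.4
    strength = max(0.3, min(0.7, strength))
    guidance_scale = 7.5 + (iteration / total_iterations) * 2.0
    # Assumption: img2img denoises for int(steps * strength) steps.
    effective_steps = min(int(num_inference_steps * strength), num_inference_steps)
    return strength, guidance_scale, effective_steps


for iteration in (2, 3, 4):
    s, g, n = refinement_schedule(iteration, total_iterations=4)
    print(f"iteration {iteration}: strength={s:.2f}, guidance={g:.2f}, steps={n}")
```

Under that assumption the sketch gives 9, 7 and 6 steps for iterations 2, 3 and 4, which matches the progress-bar totals in the Step 9 log. The nominal values would be 10, 8 and 6; the first two drop by one because `0.7 - 0.2` and `0.7 - 0.3` evaluate to just below 0.5 and 0.4 in binary floating point, and `int()` truncates. Rounding the strength, for example with `round(strength, 2)`, would restore the nominal counts.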


In [9]:
# ============================================================
# STEP 9: Generate All Images (Optimized)
# ============================================================

from PIL import Image
import time

output_dir = "/kaggle/working/generated_images"
os.makedirs(output_dir, exist_ok=True)

# Create directories for all 4 variants
variants = ["base_sdxl", "irg_2iter", "irg_3iter", "irg_4iter"]
for variant in variants:
    os.makedirs(os.path.join(output_dir, variant), exist_ok=True)

print("="*60)
print("GENERATING IMAGES FOR ALL 4 VARIANTS")
print("="*60)
print(f"Variants: {len(variants)}")
print(f"Prompts: {len(prompts)}")
print(f"Total images: {len(variants) * len(prompts)}")
print(f"\nOptimization: Incremental refinement (2→3→4)")
print("="*60)

start_time = time.time()

for i, prompt in enumerate(prompts):
    print(f"\n[{i+1}/{len(prompts)}] {prompt}")
    print("="*60)
    
    seed = 42 + i
    
    # 1. Base SDXL (no reasoning)
    print("  [1/4] Base SDXL...")
    base_img = generate_vanilla(prompt, seed)
    base_img.save(os.path.join(output_dir, "base_sdxl", f"{i:03d}.png"))
    del base_img
    gc.collect()
    torch.cuda.empty_cache()
    
    # 2. IRG 2-iter (initial + 1 refinement)
    print("  [2/4] IRG 2-iter...")
    img_2iter, reasoning = generate_irg_initial(prompt, seed, qwen_tokenizer, qwen_model)
    img_2iter.save(os.path.join(output_dir, "irg_2iter", f"{i:03d}.png"))
    
    # 3. IRG 3-iter (continue from 2-iter + 1 refinement)
    print("  [3/4] IRG 3-iter (continuing from 2-iter)...")
    img_3iter, reasoning = apply_single_refinement(
        prompt, img_2iter, reasoning, 3, 4, seed, qwen_tokenizer, qwen_model
    )
    img_3iter.save(os.path.join(output_dir, "irg_3iter", f"{i:03d}.png"))
    
    # 4. IRG 4-iter (continue from 3-iter + 1 refinement)
    print("  [4/4] IRG 4-iter (continuing from 3-iter)...")
    img_4iter, reasoning = apply_single_refinement(
        prompt, img_3iter, reasoning, 4, 4, seed, qwen_tokenizer, qwen_model
    )
    img_4iter.save(os.path.join(output_dir, "irg_4iter", f"{i:03d}.png"))
    
    del img_2iter, img_3iter, img_4iter
    gc.collect()
    torch.cuda.empty_cache()
    
    # Progress update
    if (i + 1) % 10 == 0:
        elapsed = time.time() - start_time
        avg_time = elapsed / (i + 1)
        remaining = avg_time * (len(prompts) - i - 1)
        print(f"\n  Progress: {i+1}/{len(prompts)} ({(i+1)/len(prompts)*100:.1f}%)")
        print(f"  Avg time: {avg_time:.1f}s/prompt")
        print(f"  Remaining: {remaining/60:.1f} minutes")

elapsed = time.time() - start_time
print(f"\n" + "="*60)
print(f"✅ IMAGE GENERATION COMPLETE in {elapsed/60:.1f} minutes")
print(f"   (Saved ~{elapsed*0.33/60:.1f} minutes with incremental refinement!)")
print("="*60)
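
Because the loop above runs for hours, an interrupted session loses time unless finished prompts can be skipped on restart. The sketch below is one way to find the prompt indices that still need work, assuming the same directory layout and `{i:03d}.png` naming as the loop. The helper names `missing_variants` and `pending_indices` are hypothetical and not part of the notebook.

```python
import os


def missing_variants(output_dir, variants, index):
    """Return the variants whose image for this prompt index is not on disk."""
    filename = f"{index:03d}.png"
    return [
        v for v in variants
        if not os.path.exists(os.path.join(output_dir, v, filename))
    ]


def pending_indices(output_dir, variants, num_prompts):
    """Return prompt indices with at least one missing variant image."""
    return [
        i for i in range(num_prompts)
        if missing_variants(output_dir, variants, i)
    ]
```

A resumed run could then iterate over `pending_indices(output_dir, variants, len(prompts))` instead of every index. Since the 3- and 4-iteration images are built from the in-memory 2-iteration image, a prompt with any IRG variant missing would need its whole chain rerun, or the last saved image reloaded from disk.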

GENERATING IMAGES FOR ALL 4 VARIANTS
Variants: 4
Prompts: 80
Total images: 320

Optimization: Incremental refinement (2→3→4)

[1/80] a photo of a cat
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[2/80] a photo of a dog
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[3/80] a photo of a horse
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[4/80] a photo of a laptop
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[5/80] a photo of a chair
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[6/80] a photo of a teddy bear
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[7/80] a photo of a bottle
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[8/80] a photo of a car
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[9/80] a photo of a bird
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[10/80] a photo of a couch
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...


  Progress: 10/80 (12.5%)
  Avg time: 224.1s/prompt
  Remaining: 261.4 minutes

[11/80] a photo of a book
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[12/80] a photo of a refrigerator
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[13/80] a photo of a clock
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[14/80] a photo of a bicycle
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[15/80] a photo of a backpack
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[16/80] a photo of an umbrella
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[17/80] a photo of a pizza
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[18/80] a photo of a cake
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[19/80] a photo of a dog toy
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[20/80] a photo of a traffic light
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...


  Progress: 20/80 (25.0%)
  Avg time: 217.8s/prompt
  Remaining: 217.8 minutes

[21/80] a cat and a dog
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[22/80] a laptop and a cup
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[23/80] a bicycle and a backpack
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[24/80] a horse and a person
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[25/80] a car and a traffic light
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[26/80] a bird and a flower
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[27/80] a teddy bear and a chair
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[28/80] a pizza and a cake
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[29/80] a dog and a frisbee
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[30/80] a cat and a laptop
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...


  Progress: 30/80 (37.5%)
  Avg time: 212.9s/prompt
  Remaining: 177.4 minutes

[31/80] a bicycle and a helmet
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[32/80] a car and a backpack
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[33/80] a dog and a skateboard
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[34/80] a person and a skateboard
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[35/80] a bottle and a glass
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[36/80] a couch and a table
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[37/80] a clock and a book
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...
  [3/4] IRG 3-iter (continuing from 2-iter)...
  [4/4] IRG 4-iter (continuing from 3-iter)...

[38/80] a dog and a cat toy
  [1/4] Base SDXL...
  [2/4] IRG 2-iter...

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[39/80] a microwave and a toaster
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[40/80] a laptop and a mouse
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


  Progress: 40/80 (50.0%)
  Avg time: 211.0s/prompt
  Remaining: 140.7 minutes

[41/80] a photo of two cats
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[42/80] a photo of three dogs
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[43/80] a photo of four horses
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[44/80] a photo of two laptops
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[45/80] a photo of three chairs
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[46/80] a photo of four teddy bears
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[47/80] a photo of two birds
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[48/80] a photo of three bicycles
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[49/80] a photo of four bottles
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[50/80] a photo of two cars
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


  Progress: 50/80 (62.5%)
  Avg time: 209.6s/prompt
  Remaining: 104.8 minutes

[51/80] a photo of three pizzas
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[52/80] a photo of four cakes
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[53/80] a photo of two backpacks
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[54/80] a photo of three umbrellas
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[55/80] a photo of four flowers
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[56/80] a photo of two televisions
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[57/80] a photo of three computers
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[58/80] a photo of four tables
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[59/80] a photo of two hats
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[60/80] a photo of three shoes
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


  Progress: 60/80 (75.0%)
  Avg time: 209.3s/prompt
  Remaining: 69.8 minutes

[61/80] a cat on top of a car
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[62/80] a dog under a table
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[63/80] a red apple to the left of a blue bowl
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[64/80] a bird sitting above a yellow flower
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[65/80] a laptop next to a cup on the desk
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[66/80] a black dog behind a white cat
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[67/80] a person standing in front of a bicycle
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[68/80] a teddy bear on top of a chair
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[69/80] a cat lying below a window
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[70/80] a green backpack to the right of a red suitcase
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


  Progress: 70/80 (87.5%)
  Avg time: 208.3s/prompt
  Remaining: 34.7 minutes

[71/80] a bottle on top of a wooden table
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[72/80] a skateboard under a desk
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[73/80] a horse standing behind a fence
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[74/80] a smartphone next to a laptop on a shelf
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (174 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['方案如下 : composition and framing ( 构图与框架 ) • 主体 ( 智能手机 、 笔记本电脑 ) 位置遵循黄金分割或三分法则 • 中心']


  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[75/80] a blue chair to the left of a couch
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[76/80] a dog sitting above a colorful rug
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[77/80] a tennis racket leaning against a wall
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[78/80] a cat on top of a stack of books
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[79/80] a yellow ball below a red frisbee
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


[80/80] a coffee cup to the right of a notebook
  [1/4] Base SDXL...


  0%|          | 0/15 [00:00<?, ?it/s]

  [2/4] IRG 2-iter...


  0%|          | 0/15 [00:00<?, ?it/s]

  0%|          | 0/9 [00:00<?, ?it/s]

  [3/4] IRG 3-iter (continuing from 2-iter)...


  0%|          | 0/7 [00:00<?, ?it/s]

  [4/4] IRG 4-iter (continuing from 3-iter)...


  0%|          | 0/6 [00:00<?, ?it/s]


  Progress: 80/80 (100.0%)
  Avg time: 207.4s/prompt
  Remaining: 0.0 minutes

✅ IMAGE GENERATION COMPLETE in 276.5 minutes
   (Saved ~91.2 minutes with incremental refinement!)
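
The `Remaining` figures in the progress log above appear consistent with a simple estimate: prompts left multiplied by the running average time per prompt. A minimal sketch of that arithmetic follows; the function name `remaining_minutes` is hypothetical and is not taken from the notebook's own code.

```python
def remaining_minutes(done, total, avg_seconds_per_prompt):
    """Estimate minutes left: prompts still to run times the running average."""
    return (total - done) * avg_seconds_per_prompt / 60.0


# Values copied from the progress lines in the log above.
print(round(remaining_minutes(30, 80, 212.9), 1))  # log reports 177.4
print(round(remaining_minutes(40, 80, 211.0), 1))  # log reports 140.7

# The same formula with done=0 gives the total runtime implied by the final average.
print(round(remaining_minutes(0, 80, 207.4), 1))   # log reports 276.5 minutes overall
```

Because the average is a running mean over all finished prompts, the estimate shifts slowly as per-prompt time drifts (212.9s down to 207.4s in this run).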

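The warning logged for prompt 74 shows a refined prompt that drifted into Chinese and ran to 174 tokens, so CLIP truncated it at 77. A check along the following lines could flag such prompts before they reach the image pipeline. This is only a sketch: `check_refined_prompt` and its `count_tokens` parameter are hypothetical names, and the default whitespace word count is a crude stand-in for a real CLIP tokenizer, which usually reports higher counts.

```python
import re

# CJK Unified Ideographs range; enough to catch the Chinese text seen in the log.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def check_refined_prompt(prompt, count_tokens=None, max_tokens=77):
    """Return a list of issues found in a refined prompt (empty list if none)."""
    if count_tokens is None:
        # Assumption: whitespace word count as a rough proxy for token count.
        count_tokens = lambda text: len(text.split())

    issues = []
    if CJK_PATTERN.search(prompt):
        issues.append("contains CJK characters")

    n_tokens = count_tokens(prompt)
    if n_tokens > max_tokens:
        issues.append(f"{n_tokens} tokens exceeds limit of {max_tokens}")
    return issues


print(check_refined_prompt("a smartphone next to a laptop on a shelf"))
print(check_refined_prompt("方案如下: composition and framing"))
```

Passing the pipeline's own tokenizer as `count_tokens` would make the length check exact; what to do with a flagged prompt (regenerate, truncate, or fall back to the original prompt) is left open here.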

In [10]:
# ============================================================
# STEP 10: Save Prompts and Metadata
# ============================================================

import json
import os  # needed for os.path.join; harmless if already imported in an earlier cell

# Save prompts, one per line, in generation order
with open(os.path.join(output_dir, "prompts.txt"), "w", encoding="utf-8") as f:
    for p in prompts:
        f.write(p + "\n")

# Save run metadata alongside the images
metadata = {
    "num_prompts": len(prompts),
    "variants": variants,
    "comparison": "Base SDXL vs IRG Fine-tuned Qwen (2, 3, 4 iterations)",
    "iterations": [0, 2, 3, 4],  # 0 = Base SDXL (no reasoning)
    "optimization": "incremental_refinement",
    "seed_base": 42,
    "image_size": 512
}

with open(os.path.join(output_dir, "metadata.json"), "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)

print("="*60)
print("✅ ALL COMPLETE!")
print("="*60)
print(f"\nGenerated {len(prompts) * len(variants)} images:")
# One image is expected per prompt for each variant; counts are not read back from disk
for variant in variants:
    print(f"  - {variant}: {len(prompts)} images")
print(f"\nSaved to: {output_dir}/")
print("\nNext step: Run benchmarking notebook to evaluate all 5 metrics")
print("="*60)

✅ ALL COMPLETE!

Generated 320 images:
  - base_sdxl: 80 images
  - irg_2iter: 80 images
  - irg_3iter: 80 images
  - irg_4iter: 80 images

Saved to: /kaggle/working/generated_images/

Next step: Run benchmarking notebook to evaluate all 5 metrics
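
The summary above prints `len(prompts)` for each variant and does not inspect the output folder. Before benchmarking, it may be worth counting the files that actually exist. The sketch below assumes one subfolder per variant containing PNG files; that layout, and the names `count_images` and `find_shortfalls`, are assumptions and are not confirmed by the notebook.

```python
from pathlib import Path


def count_images(output_dir, variants, pattern="*.png"):
    """Count image files per variant, assuming the layout output_dir/<variant>/."""
    counts = {}
    for variant in variants:
        variant_dir = Path(output_dir) / variant
        # A missing folder counts as zero images.
        counts[variant] = len(list(variant_dir.glob(pattern))) if variant_dir.is_dir() else 0
    return counts


def find_shortfalls(counts, expected):
    """Return only the variants whose file count differs from the expected number."""
    return {variant: n for variant, n in counts.items() if n != expected}


# Example usage with the names printed in the summary above:
# counts = count_images("/kaggle/working/generated_images",
#                       ["base_sdxl", "irg_2iter", "irg_3iter", "irg_4iter"])
# print(find_shortfalls(counts, expected=80))
```

An empty dictionary from `find_shortfalls` would mean every variant has the expected 80 files under the assumed layout.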
