<a href="https://colab.research.google.com/github/vitchierath/Gen_Ai_miniprojects/blob/main/genproj.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install diffusers transformers accelerate scipy

In [3]:
import os
import time
from PIL import Image, ImageDraw, ImageFont
import cv2
import numpy as np
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
from transformers import pipeline as hf_pipeline, BlipProcessor, BlipForConditionalGeneration
import torch
import warnings

# Suppress warnings
warnings.filterwarnings("ignore")


use_cpu = True  # Set to False for GPU after debugging

# Initialize models
def load_model(model_class, model_id, pipeline_type="standard", retries=2):
    # Choose the device once; after a failed CUDA attempt, later attempts use the CPU
    device = "cpu" if use_cpu else ("cuda" if torch.cuda.is_available() else "cpu")
    for attempt in range(retries):
        try:
            pipe = model_class.from_pretrained(
                model_id,
                safety_checker=None,
                # Half precision on GPU; full precision on CPU, where float16 is slow or unsupported
                torch_dtype=torch.float32 if device == "cpu" else torch.float16
            )
            pipe = pipe.to(device)
            print(f"✅ {model_id} ({pipeline_type}) loaded on {device}.")
            return pipe
        except Exception as e:
            print(f"⚠️ Attempt {attempt+1}/{retries} failed: {e}")
            if device == "cuda":
                print("⚠️ Falling back to CPU.")
                device = "cpu"
            if attempt < retries - 1:
                time.sleep(2)
    raise RuntimeError(f"Failed to load {model_id} after {retries} attempts.")

# Use reliable publicly available models
try:
    # Try stabilityai/stable-diffusion-2-1 for good quality without requiring too much memory
    pipe_realistic = load_model(StableDiffusionPipeline, "stabilityai/stable-diffusion-2-1", "standard")
except Exception as e:
    print(f"⚠️ Failed to load SD 2.1 ({e}). Falling back to SD 1.5...")
    try:
        # Last resort: original SD 1.5
        pipe_realistic = load_model(StableDiffusionPipeline, "runwayml/stable-diffusion-v1-5", "standard")
    except Exception as e2:
        print(f"⚠️ Critical error loading any Stable Diffusion model: {e2}")
        raise  # re-raise so the cell stops here with a traceback

try:
    # Use the same model for img2img to maintain consistency
    model_id = pipe_realistic.config._name_or_path
    pipe_img2img = load_model(StableDiffusionImg2ImgPipeline, model_id, "img2img")
except Exception as e:
    print(f"⚠️ Failed to load matching img2img model ({e}). Trying SD 1.5...")
    try:
        pipe_img2img = load_model(StableDiffusionImg2ImgPipeline, "runwayml/stable-diffusion-v1-5", "img2img")
    except Exception as e2:
        print(f"⚠️ Critical error loading img2img model: {e2}")
        raise  # re-raise so the cell stops here with a traceback

try:
    generator = hf_pipeline("text-generation", model="gpt2")
    print("✅ GPT-2 loaded.")
except Exception as e:
    print(f"⚠️ Failed to load GPT-2: {e}")
    raise  # re-raise so the cell stops here with a traceback

try:
    blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    if not use_cpu and torch.cuda.is_available():
        blip_model = blip_model.to("cuda")
    print("✅ BLIP loaded.")
except Exception as e:
    print(f"⚠️ Failed to load BLIP: {e}")
    raise  # re-raise so the cell stops here with a traceback

# Generate prompt
def generate_prompt(input_text, is_initial=True, prev_image_filename=None):
    if is_initial:
        return f"A photorealistic scene depicting {input_text}, highly detailed, professional photography, sharp focus, 8k."
    try:
        if prev_image_filename and os.path.exists(prev_image_filename):
            image = Image.open(prev_image_filename).convert("RGB")
            inputs = blip_processor(images=image, return_tensors="pt")
            if not use_cpu and torch.cuda.is_available():
                inputs = {k: v.to("cuda") for k, v in inputs.items()}
            outputs = blip_model.generate(**inputs)
            prev_description = blip_processor.decode(outputs[0], skip_special_tokens=True)
            image.close()
        else:
            prev_description = "the previous scene"
    except Exception as e:
        print(f"⚠️ BLIP analysis error: {e}")
        prev_description = "the previous scene"
    return f"A photorealistic scene evolving from '{prev_description}', incorporating {input_text}, highly detailed, professional photography, sharp focus, 8k."

# Scene analysis with BLIP
def analyze_scene(image_filename):
    try:
        if os.path.exists(image_filename):
            image = Image.open(image_filename).convert("RGB")
            inputs = blip_processor(images=image, return_tensors="pt")
            if not use_cpu and torch.cuda.is_available():
                inputs = {k: v.to("cuda") for k, v in inputs.items()}
            outputs = blip_model.generate(**inputs)
            description = blip_processor.decode(outputs[0], skip_special_tokens=True)
            image.close()
            return description if description else "A realistic scene."
        return "A realistic scene."
    except Exception as e:
        print(f"⚠️ Scene analysis error: {e}")
        return "A realistic scene."

# Generate dialogue
def generate_dialogue(scene_analysis):
    try:
        prompt = (
            f"Scene: '{scene_analysis}'. "
            f"Create a 1-2 line dialogue for characters, capturing the mood."
        )
        dialogue = generator(
            prompt, max_new_tokens=30, do_sample=True, temperature=0.9
        )[0]['generated_text']
        dialogue = dialogue[len(prompt):].strip()
        dialogue = dialogue[:dialogue.rfind(".") + 1] if "." in dialogue else dialogue
        return dialogue if dialogue else "A quiet moment unfolds."
    except Exception as e:
        print(f"⚠️ Dialogue error: {e}")
        return "A quiet moment unfolds."

# Create comic panel
def create_comic_panel(image_filenames, dialogues):
    try:
        images = [Image.open(f).convert("RGB").resize((256, 256)) for f in image_filenames]
        num_images = len(images)
        cols = min(num_images, 4)  # Max 4 panels per row
        rows = (num_images + cols - 1) // cols  # Start a new row instead of overwriting panels
        cell_height = 320  # 256 px image + 64 px for dialogue
        comic = Image.new("RGB", (256 * cols, cell_height * rows), "white")
        draw = ImageDraw.Draw(comic)

        try:
            font = ImageFont.truetype("arial.ttf", 14)
        except OSError:  # arial.ttf is often missing on Linux
            font = ImageFont.load_default()

        for i, img in enumerate(images):
            x = (i % cols) * 256
            y = (i // cols) * cell_height
            comic.paste(img, (x, y))
            dialogue = dialogues[i][:40]  # Truncate for space
            draw.text((x + 10, y + 260), dialogue, fill="black", font=font)
            img.close()
            img.close()

        output_filename = f"comic_{int(time.time())}.png"
        comic.save(output_filename)
        comic.close()
        return output_filename
    except Exception as e:
        print(f"⚠️ Comic panel error: {e}")
        return None

# Create video
def create_video(image_filenames):
    try:
        output_filename = f"video_{int(time.time())}.mp4"
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
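        out = cv2.VideoWriter(output_filename, fourcc, 1.0, (256, 256))  # 1 fps, 256x256 frames
        if not out.isOpened():
            raise RuntimeError("Could not open the video writer for 'mp4v'.")

        for f in image_filenames:
            if os.path.exists(f):
                img = cv2.imread(f)
                if img is None:  # unreadable or non-image file
                    continue
                img = cv2.resize(img, (256, 256))
                for _ in range(3):  # 3 frames = 3 seconds per image at 1 fps
                    out.write(img)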

        out.release()
        return output_filename
    except Exception as e:
        print(f"⚠️ Video error: {e}")
        return None

# Main loop
history = []
session_log = []
last_image_filename = None
is_initial = True

print("🧠 Realistic Art Odyssey Generator Activated!")
print("→ Enter initial prompt (e.g., 'two rows of soldiers facing each other').")
print("→ Provide follow-up instructions (e.g., 'add a stormy sky').")
print("→ Type 'stop' to generate story and choose comic or video.\n")

while True:
    user_input = input("🌀 Input: ").strip()
    if user_input.lower() == "stop":
        break
    if not user_input:
        print("⚠️ Input cannot be empty.\n")
        continue

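    # Treat this turn as the initial prompt until an image has actually been saved
    is_initial = last_image_filename is None
    input_type = "initial prompt" if is_initial else "follow-up instruction"
    current_prompt = generate_prompt(user_input, is_initial, last_image_filename)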

    try:
        timestamp = int(time.time())
        image_filename = f"art_{timestamp}.png"

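        # 512x512 keeps CPU generation time manageable; note that stabilityai/stable-diffusion-2-1
        # was trained at 768x768, so lower resolutions may reduce quality with that checkpoint
        height = 512
        width = 512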

        if input_type == "initial prompt" or last_image_filename is None or not os.path.exists(last_image_filename):
            # Generate a new image with the text-to-image pipeline
            print(f"Generating new image for prompt: '{current_prompt}'")
            image = pipe_realistic(
                current_prompt,
                guidance_scale=7.5,
                height=height,
                width=width,
                num_inference_steps=30
            ).images[0]
        else:
            # Try image-to-image with explicit error handling
            try:
                print(f"Attempting img2img with prompt: '{current_prompt}'")
                # Open and convert the initial image
                init_image = Image.open(last_image_filename).convert("RGB")
                # Resize to expected dimensions
                init_image = init_image.resize((width, height), Image.LANCZOS)  # PIL expects (width, height)

                # Debug info
                print(f"Input image type: {type(init_image)}, size: {init_image.size}")

                # Run img2img pipeline with explicit parameters
                result = pipe_img2img(
                    prompt=current_prompt,
                    image=init_image,  # Use 'image' parameter name instead of 'init_image'
                    strength=0.75,     # How much to transform the image (0-1)
                    guidance_scale=7.5,
                    num_inference_steps=30
                )
                image = result.images[0]
                init_image.close()
                print("✅ Img2img successful")
            except Exception as e:
                print(f"⚠️ Img2img failed: {e}. Generating new image.")
                # Fallback to text2img if img2img fails
                image = pipe_realistic(
                    current_prompt,
                    guidance_scale=7.5,
                    height=height,
                    width=width,
                    num_inference_steps=30
                ).images[0]

        # Save at full resolution: the comic and video helpers resize to 256x256 themselves,
        # and the next img2img step starts from a sharper image
        image.save(image_filename)

        # Generate analysis and dialogue
        scene_analysis = analyze_scene(image_filename)
        dialogue = generate_dialogue(scene_analysis)

        # Add to history
        history.append((user_input, input_type, current_prompt, image_filename, scene_analysis, dialogue))
        session_log.append(
            f"Input: {user_input} ({input_type})\nImage: {image_filename}\nAnalysis: {scene_analysis}\nDialogue: {dialogue}\n"
        )

        last_image_filename = image_filename

        print(f"✅ Image saved: {image_filename}")
        print(f"📊 Analysis: {scene_analysis}")
        print(f"💬 Dialogue: {dialogue}\n")

    except Exception as e:
        print(f"⚠️ Generation error: {e}\n")
        continue

# Save session log
try:
    with open(f"session_log_{int(time.time())}.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(session_log))
    print("✅ Session log saved.")
except Exception as e:
    print(f"⚠️ Failed to save session log: {e}")

# Generate story
def generate_final_story(history):
    print("\n🧠 Generating Story...\n")
    intro = "A journey through realistic scenes:\n"
    for i, (input_text, input_type, prompt, filename, analysis, dialogue) in enumerate(history):
        intro += (
            f"\nScene {i+1} — Input: '{input_text}' ({input_type})\n"
            f"Image: {filename}\nAnalysis: {analysis}\nDialogue: {dialogue}\n"
        )

    intro += "\nAs the journey halted, the scenes formed a compelling narrative...\n"

    try:
        story = generator(
            intro, max_new_tokens=300, do_sample=True, temperature=0.9
        )[0]['generated_text']
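        # Cut at the last full sentence; keep everything if there is no period
        last_period = story.rfind(".")
        return story[:last_period + 1] if last_period != -1 else story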
    except Exception as e:
        return f"⚠️ Story generation failed: {e}"

final_story = generate_final_story(history)
print("📖 STORY OUTPUT:\n")
print(final_story)

# Choose output
if history:
    print("\n🎨 Choose output:")
    print("1. Comic panel (with dialogues)")
    print("2. Video (image sequence)")
    choice = input("Enter 1 or 2: ").strip()

    image_filenames = [h[3] for h in history]
    dialogues = [h[5] for h in history]

    if choice == "1":
        comic_filename = create_comic_panel(image_filenames, dialogues)
        if comic_filename:
            print(f"✅ Comic panel saved: {comic_filename}")
    elif choice == "2":
        video_filename = create_video(image_filenames)
        if video_filename:
            print(f"✅ Video saved: {video_filename}")
    else:
        print("⚠️ Invalid choice.")

if not use_cpu:
    try:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            print("✅ GPU memory cleared.")
    except Exception as e:
        print(f"⚠️ GPU memory cleanup error: {e}")
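`create_comic_panel` cuts every dialogue to 40 characters, which drops most of a GPT-2 line. Below is a stdlib sketch of the layout arithmetic for a grid of at most four columns (a new row every four panels), plus caption wrapping with `textwrap.wrap` instead of hard truncation. `canvas_size`, `panel_origins`, and `wrap_caption` are hypothetical helpers, and the 30-character line width is an assumption sized for the 14 px font:

```python
import textwrap

PANEL = 256      # panel edge in pixels, as in create_comic_panel
CAPTION_H = 64   # caption strip under each panel (320 - 256 in the notebook)
COLS = 4         # max panels per row


def canvas_size(n, cols=COLS):
    """(width, height) of a canvas holding n panels, `cols` per row."""
    used_cols = max(1, min(n, cols))
    rows = max(1, -(-n // cols))  # ceiling division
    return used_cols * PANEL, rows * (PANEL + CAPTION_H)


def panel_origins(n, cols=COLS):
    """Top-left (x, y) of each panel; a new row starts every `cols` panels."""
    return [((i % cols) * PANEL, (i // cols) * (PANEL + CAPTION_H)) for i in range(n)]


def wrap_caption(text, width=30, max_lines=3):
    """Wrap a caption onto at most max_lines lines, ending with ' ...' if it was cut."""
    return textwrap.wrap(text, width=width, max_lines=max_lines, placeholder=" ...")
```

Inside `create_comic_panel`, line `k` of a wrapped caption could be drawn at `(x + 10, y + 260 + 16 * k)`; the 16 px line step is also an assumption, chosen so three lines fit in the 64 px strip.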

Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]

✅ stabilityai/stable-diffusion-2-1 (standard) loaded on cpu.


Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]

✅ stabilityai/stable-diffusion-2-1 (img2img) loaded on cpu.


Device set to use cpu


✅ GPT-2 loaded.
✅ BLIP loaded.
🧠 Realistic Art Odyssey Generator Activated!
→ Enter initial prompt (e.g., 'two rows of soldiers facing each other').
→ Provide follow-up instructions (e.g., 'add a stormy sky').
→ Type 'stop' to generate story and choose comic or video.

🌀 Input: two rows of soldiers facing each other
Generating new image for prompt: 'A photorealistic scene depicting two rows of soldiers facing each other, highly detailed, professional photography, sharp focus, 8k.'


  0%|          | 0/30 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


✅ Image saved: art_1745738774.png
📊 Analysis: a group of police officers
💬 Dialogue: The story will follow the young police commissioner, his sister and his young wife's son, but with some added time in between.
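GPT-2 is a base language model: it continues text rather than following instructions, which is why the "dialogue" above reads like a plot synopsis. One common workaround is to write the prompt as the start of a script line, so that the likeliest continuation is speech, and then keep only the first spoken line. A minimal sketch; `build_dialogue_prompt`, `extract_first_line`, and the speaker name `ALEX` are hypothetical, and the framing only makes dialogue more likely, it does not guarantee it:

```python
import re


def build_dialogue_prompt(scene):
    """Frame the request as an open script line so a base LM continues with speech."""
    return f'Scene: {scene}.\nALEX: "'  # the open quote invites a spoken line


def extract_first_line(prompt, generated, fallback="A quiet moment unfolds."):
    """Drop the echoed prompt, then keep text up to the first closing quote or newline."""
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    match = re.match(r'([^"\n]+)', continuation)
    line = match.group(1).strip() if match else ""
    return f'ALEX: "{line}"' if line else fallback
```

In `generate_dialogue`, the value of `generator(prompt, ...)[0]['generated_text']` would be passed to `extract_first_line` together with the prompt it was generated from.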

🌀 Input: shooting circle in the wall araised. so do the inspection officer
Attempting img2img with prompt: 'A photorealistic scene evolving from 'a group of police officers', incorporating shooting circle in the wall araised. so do the inspection officer, highly detailed, professional photography, sharp focus, 8k.'
Input image type: <class 'PIL.Image.Image'>, size: (512, 512)


  0%|          | 0/22 [00:00<?, ?it/s]

✅ Img2img successful


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


✅ Image saved: art_1745739926.png
📊 Analysis: a group of people standing in a line
💬 Dialogue: This script is also useful for helping a character talk about an idea or topic.
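The img2img call requested `num_inference_steps=30`, yet its progress bar counted 22 steps. With `strength=0.75`, the img2img pipeline noises the input image only part of the way and runs just the final portion of the denoising schedule. The sketch below re-derives that arithmetic with the standard library (it mirrors the timestep selection in diffusers' Stable Diffusion img2img pipeline, but `img2img_steps` is a hypothetical helper, not a library call):

```python
def img2img_steps(num_inference_steps, strength):
    """Return (skipped, executed) denoising steps for an img2img run.

    Only the last int(num_inference_steps * strength) steps of the schedule
    are executed; the earlier ones are skipped because the input image
    already supplies that structure.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


# The notebook's settings: 30 requested steps at several strengths
for strength in (0.5, 0.75, 1.0):
    skipped, executed = img2img_steps(30, strength)
    print(f"strength={strength}: skip {skipped}, run {executed} of 30 steps")
```

Lowering `strength` keeps more of the previous image and shortens CPU runs proportionally; raising it toward 1.0 discards more of the previous image and behaves more like text-to-image.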

🌀 Input: stop


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


✅ Session log saved.

🧠 Generating Story...

📖 STORY OUTPUT:

A journey through realistic scenes:

Scene 1 — Input: 'two rows of soldiers facing each other' (initial prompt)
Image: art_1745738774.png
Analysis: a group of police officers
Dialogue: The story will follow the young police commissioner, his sister and his young wife's son, but with some added time in between.

Scene 2 — Input: 'shooting circle in the wall araised. so do the inspection officer' (follow-up instruction)
Image: art_1745739926.png
Analysis: a group of people standing in a line
Dialogue: This script is also useful for helping a character talk about an idea or topic.

As the journey halted, the scenes formed a compelling narrative...

Scene 3 — Input: 'dining, singing' (follow-up instruction)

Image: art_1745738774.png

Analysis: two guards

Scene 4 — Input: The scene that plays across the screen, with all of the characters acting as soldiers.

Image: art_1745739926.png

Analysis: two men standing in line at a res