# Chinese Classical Poetry Visualization System
This system generates visual interpretations of classical Chinese poems using AI models: GLM-4 analyzes each poem and builds the image prompts, and a modified SDXL pipeline (base model plus refiner) generates the images.

## Setup and Dependencies
The following cells mount Google Drive, install the required packages, and import the necessary libraries.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# Required pip installations
!pip install transformers
!pip install diffusers
!pip install accelerate
!pip install zhipuai
!pip install moviepy
!pip install bayesian-optimization
!pip install xformers
!pip install safetensors
!pip install triton

# Imports
import os
import gc
import json
import time
import torch
import numpy as np
import pandas as pd
from PIL import Image, ImageDraw, ImageFont
from tqdm import tqdm
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from moviepy.editor import ImageClip, concatenate_videoclips
import traceback
import torch.nn.functional as F
from transformers import CLIPProcessor, CLIPModel
from diffusers import DiffusionPipeline, EulerDiscreteScheduler
from bayes_opt import BayesianOptimization
from zhipuai import ZhipuAI
from concurrent.futures import ThreadPoolExecutor
import re
import triton
import subprocess

# Disable warnings
import warnings
warnings.filterwarnings('ignore')

# Enable cuda if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')



Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xformers/__init__.py", line 57, in _is_triton_available
    import triton  # noqa
ModuleNotFoundError: No module named 'triton'
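
The traceback above is raised while xformers probes for `triton`, which suggests the package was not importable when the cell ran. As a minimal sketch, assuming the goal is only to find out in advance which optional accelerator packages can be imported, the check below may help. The helper name `optional_packages_available` is hypothetical and not part of the notebook.

```python
import importlib.util

def optional_packages_available(names):
    """Map each top-level package name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Report which optional accelerators are present before enabling them.
status = optional_packages_available(["xformers", "triton"])
for name, ok in status.items():
    print(f"{name}: {'available' if ok else 'missing'}")
```

A missing package here is not fatal for the pipeline: the notebook already wraps `enable_xformers_memory_efficient_attention()` in a `try`/`except` and prints a warning instead.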


## Main System Implementation
This cell contains the complete implementation including:
- Utility functions (font management, GPU memory, Drive operations)
- EnhancedPoemAnalyzer class (GLM-4 based poem analysis and ontology)
- BayesianStableDiffusion class (Modified SDXL image generation)
- ModelComparisonExperiment class (experiment management)
- Main execution flow

Key Features:
- Automated poem analysis and understanding
- High-quality image generation with refinement
- CLIP-guided image selection
- Video generation with text overlays (currently disabled: the call in `main()` is commented out)
- Performance optimization and reporting
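
CLIP-guided selection, as done in `compute_clip_likelihoods` and `run_comparison`, rescales the cosine similarity between each image embedding and the prompt embedding to the range [0, 1] and keeps the highest-scoring image. The sketch below shows only that scoring and selection step in plain Python. The 3-dimensional vectors are toy stand-ins for CLIP embeddings, and `select_best` is a hypothetical name.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_best(image_embeds, text_embed):
    """Return (best_index, scores), with scores rescaled from [-1, 1] to [0, 1]."""
    scores = [(cosine_similarity(e, text_embed) + 1) / 2 for e in image_embeds]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores

# Toy vectors standing in for CLIP embeddings (illustrative values only).
text_embed = [1.0, 0.0, 0.0]
image_embeds = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
best_index, scores = select_best(image_embeds, text_embed)
print(best_index, scores)
```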

Usage:
1. Run the cell
2. Select a poem when prompted
3. Wait for processing (3-5 minutes per line)
4. View generated images and video
5. Check performance metrics
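
The "line" in step 3 is a segment obtained by splitting the poem on Chinese punctuation, as `run_comparison` does. The sketch below reproduces that split on the poem from the sample run. `split_poem` is a hypothetical wrapper around the same regular expression.

```python
import re

def split_poem(poem):
    """Split a poem at full-width commas, periods, question marks and exclamation marks."""
    return [chunk.strip() for chunk in re.split(r"[，。？！]", poem) if chunk.strip()]

poem = (
    "孤鸿海上来，池潢不敢顾。侧见双翠鸟，巢在三珠树。矫矫珍木巅，得无金丸惧？"
    "美服患人指，高明逼神恶？今我游冥冥，弋者何所慕！"
)
chunks = split_poem(poem)
print(len(chunks), chunks[0])
```

This poem yields 10 segments, so at the stated 3-5 minutes per segment a full run would take roughly 30 to 50 minutes.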

In [12]:
def find_available_font():
    """Find an available font for text rendering."""
    font_paths = [
        "/usr/share/fonts/truetype/noto/NotoSansCJK-Bold.ttc",
        "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
        "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf"
    ]

    # Return the first candidate font that exists on this machine.
    for path in font_paths:
        if os.path.exists(path):
            return path
    return None

def clear_gpu_memory():
    """Run garbage collection and release cached GPU memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()

def save_to_drive(video_path):
    """Copy the output video to Google Drive, mounting Drive first if needed."""
    import shutil
    try:
        from google.colab import drive  # available only inside Colab
        drive.mount('/content/drive')
        drive_path = f"/content/drive/MyDrive/Colab Notebooks/Capstone/Video Generated/{os.path.basename(video_path)}"
        os.makedirs(os.path.dirname(drive_path), exist_ok=True)
        shutil.copy(video_path, drive_path)
        print(f"Video saved to Drive: {drive_path}")
    except Exception as e:
        print(f"Could not save to Drive: {e}")

def load_poem_from_json(json_file_path, poem_title):
    """Load a specific poem from the JSON file."""
    try:
        with open(json_file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        for poem in data['poems']:
            if poem['title'] == poem_title:
                return poem

        print(f"Poem '{poem_title}' not found in the database.")
        return None

    except Exception as e:
        print(f"Error loading poem data: {str(e)}")
        return None

# Define models to compare
MODELS_TO_COMPARE = {
    "SDXL": "stabilityai/stable-diffusion-xl-base-1.0"
}

class EnhancedPoemAnalyzer:
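    def __init__(self, api_key=None):
        # Do not hard-code credentials in the notebook. The key is read from an
        # environment variable instead; the name ZHIPUAI_API_KEY is an assumption.
        api_key = api_key or os.environ.get("ZHIPUAI_API_KEY")
        if not api_key:
            raise ValueError("Set ZHIPUAI_API_KEY or pass api_key explicitly.")
        self.client = ZhipuAI(api_key=api_key)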
        self.chunk_cache = {}

    def translate_to_english(self, chinese_text):
        prompt = f"""Please translate the following Chinese text to English, maintaining the visual and descriptive nature of the content:

        Chinese text:
        {chinese_text}

        Requirements:
        1. Translate to natural, fluent English
        2. Preserve all visual descriptions and imagery
        3. Keep any technical or specific terms
        4. Maintain the original structure where applicable
        """

        response = self.client.chat.completions.create(
            model="glm-4",
            messages=[{"role": "user", "content": prompt}]
        )

        return response.choices[0].message.content.strip()

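    def interpret_cultural_terms(self, text, context):
        """Replace culturally specific terms in `text` with English visual descriptions.

        The prompt is written in Chinese for GLM-4. In English, it asks the model to
        identify proper nouns, forms of address and cultural imagery in the line
        (given the full poem as context) and to return one "term：visual description"
        pair per line, concrete rather than abstract, with the description in English.
        """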
        prompt = f"""请分析这句诗中的专有名词或文化意象，将其转换为具体的视觉描述：

        原诗上下文：
        {context}

        需要分析的句子：
        {text}

        请识别所有的专有名词、人物称谓或文化意象，并给出具体的视觉描述。
        格式要求：
        1. 每个词一行
        2. 用 "词：视觉描述" 的格式
        3. 描述必须是具体的、可视化的，避免抽象概念
        4. 描述要符合诗歌语境
        5. 请直接用英文描述

        示例：
        若"王孙"在此诗中表达隐居的文人，应描述为"a scholarly man in traditional robes meditating in nature"
        若"渔父"出现，应描述为"an old fisherman in simple clothes on a wooden boat"
        """

        response = self.client.chat.completions.create(
            model="glm-4",
            messages=[{"role": "user", "content": prompt}]
        )

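        interpretations = {}
        for line in response.choices[0].message.content.strip().split('\n'):
            # Accept the full-width colon requested in the prompt as well as an ASCII colon.
            parts = re.split(r'[：:]', line, maxsplit=1)
            if len(parts) == 2 and parts[0].strip() and parts[1].strip():
                interpretations[parts[0].strip()] = parts[1].strip()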

        interpreted_text = text
        for term, desc in interpretations.items():
            interpreted_text = interpreted_text.replace(term, desc)

        return interpreted_text

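    def get_poem_understanding(self, full_poem):
        """Ask GLM-4 for a visual reading of the whole poem.

        The Chinese prompt asks for an analysis of the poem's overall mood, scenes
        and emotion from a visual angle, covering (1) the main scenes and
        environment, (2) changes in light and time, (3) the actions and states of
        the figures, and (4) the overall atmosphere and emotional tone, answered
        in English with concrete visual language.
        """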
        study_prompt = f"""请从视觉角度分析这首诗的整体意境、场景和情感，重点描述：
        1. 主要场景和环境特征
        2. 光线和时间的变化
        3. 人物的动作和状态
        4. 整体氛围和情感基调

        诗文：
        {full_poem}

        请用英文回答，使用具体的视觉语言描述，避免抽象概念。"""

        response = self.client.chat.completions.create(
            model="glm-4",
            messages=[{"role": "user", "content": study_prompt}]
        )

        return response.choices[0].message.content

    def analyze_chunk_detail(self, chunk, category, context_dict):
        cache_key = f"{chunk}_{category}"
        if cache_key in self.chunk_cache:
            return self.chunk_cache[cache_key]

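        # previous_chunk is None for the first chunk; fall back to '' so the
        # prompt does not contain the literal string "None".
        previous_chunk = context_dict.get('previous_chunk') or ''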
        interpreted_chunk = self.interpret_cultural_terms(chunk, context_dict['full_poem'])

        prompts = {
            "subject_action": f"""
            Analyze the subjects (people/animal/living beings) and their actions in this line/segment of poetry.
            Original: "{chunk}"
            Previous line: "{previous_chunk}"
            Interpreted: "{interpreted_chunk}"

            Return in this exact format:
            subjects: [concrete description of each person/animal/living being, their clothing/appearance if mentioned]
            actions: [specific descriptions of actions]

            Requirements:
            - The output must be consistent with and referred to the poem understanding
            - For subjects, always include full description
            - If no explicit subject in current line, use the subject from previous line
            - If previous line has subject "孤鸿" (lonely goose), current line should use "the lonely goose" as subject
            - All actions must be visually concrete (e.g., for "不敢顾", show "lifting its wings away from")
            - Avoid literary descriptions and Chinese terms
            - If this is the first line ({context_dict['chunk_index'] == 0}), only include subjects explicitly mentioned
            - If subject is not further described in terms of appearance or clothing in this or previous line, do not mention "not further describe" or "not explicitly stated" in final output
            - Do not put explanations or interpretations in brackets
            - List each subject and action exactly once
            - Use concise, visual descriptions
            """,

            "scene_setting": f"""
            Analyze the scene and environmental elements in this line:
            Original: "{chunk}"
            Interpreted: "{interpreted_chunk}"

            Please return in this format:
            locations: [specific scene locations]
            objects: [specific physical objects, flora, fauna, celestial elements and architectural elements]
            Requirements:
            - The output must be consistent with and referred to the poem understanding
            - Descriptions must be concrete and visual
            - Avoid display of Chinese in final output
            - All physical elements must be included! (e.g. 举杯邀明月 will have cup and moon; 巢在三珠树 will have nests on branches, three pearl trees)
            - Take note of the amounts too (e.g. three birds)
            - Be extremely specific about objects (e.g., "three pearl trees" rather than just "trees")
            - No explanations in the brackets
            - List each element exactly once
            - Use concise, visual terms
            """,

            "time_weather": f"""
            Analyze the time and weather elements in this line:
            Original: "{chunk}"
            Interpreted: "{interpreted_chunk}"

            Please return in this format:
            time: [specific time, e.g., sunset, dawn]
            weather: [specific weather conditions]
            Requirements:
            - The output must be consistent with and referred to the poem understanding
            - Use commonly recognized natural phenomena
            - Avoid display of Chinese in final output
            - If there is a moon mentioned, set time to night
            - No explanations in the brackets
            - List each element exactly once
            - Use concise, visual terms
            """,

            "mood": f"""
            Analyze the visual atmosphere and emotional elements in this line:
            Original: "{chunk}"
            Interpreted: "{interpreted_chunk}"

            Please return in this format:
            lighting: [specific lighting effects]
            atmosphere: [specific visual mood]
            color_tone: [main color tones]
            Requirements:
            - The output must be consistent with and referred to the poem understanding
            - All descriptions should be directly usable for image generation
            - Avoid display of Chinese in final output
            - No explanations in the brackets
            - List each element exactly once
            - Use concise, visual terms
            """
        }

        response = self.client.chat.completions.create(
            model="glm-4",
            messages=[{"role": "user", "content": prompts[category]}]
        )

        result = response.choices[0].message.content.strip()
        self.chunk_cache[cache_key] = result
        return result

    def analyze_chunk_parallel(self, chunk, context):
        with ThreadPoolExecutor(max_workers=4) as executor:
            futures = {
                "subject_action": executor.submit(self.analyze_chunk_detail, chunk, "subject_action", context),
                "scene_setting": executor.submit(self.analyze_chunk_detail, chunk, "scene_setting", context),
                "time_weather": executor.submit(self.analyze_chunk_detail, chunk, "time_weather", context),
                "mood": executor.submit(self.analyze_chunk_detail, chunk, "mood", context)
            }

            return {
                "text": chunk,
                "subject_action": futures["subject_action"].result(),
                "scene_setting": futures["scene_setting"].result(),
                "time_weather": futures["time_weather"].result(),
                "mood": futures["mood"].result()
            }

    def pack_chunk_to_prompt(self, chunk_analysis, overall_understanding):
        try:
            elements = {
                'primary_subjects': [],
                'secondary_subjects': [],
                'actions': [],
                'objects': [],
                'environment': [],
                'lighting': [],
                'atmosphere': [],
                'style': ['RTX ultra detailed realism', 'traditional China']
            }

            # Extract subjects and their relationships
            if 'subject_action' in chunk_analysis:
                content = chunk_analysis['subject_action']
                if 'subjects: [' in content and 'actions: [' in content:
                    subjects = content.split('subjects: [')[1].split(']')[0].split(', ')
                    actions = content.split('actions: [')[1].split(']')[0].split(', ')

                    # Separate primary and secondary subjects
                    for subject in subjects:
                        if 'hunter' in subject.lower():
                            elements['secondary_subjects'].append(subject)
                        else:
                            elements['primary_subjects'].append(subject)
                    elements['actions'].extend(actions)

            # Build the prompt with proper relationships
            prompt_parts = []

            # Combine subjects and actions
            if elements['primary_subjects'] and elements['secondary_subjects']:
                subjects_str = f"{', '.join(elements['primary_subjects'])}, {', '.join(elements['secondary_subjects'])}"
                prompt_parts.append(f"featuring {subjects_str}")
            elif elements['primary_subjects']:
                prompt_parts.append(f"featuring {', '.join(elements['primary_subjects'])}")

            if elements['actions']:
                prompt_parts.append(f"with {', '.join(elements['actions'])}")

            # Add remaining elements
            if elements['objects']:
                prompt_parts.append(f"including {', '.join(elements['objects'])}")
            if elements['environment']:
                prompt_parts.append(f"in {', '.join(elements['environment'])}")
            if elements['lighting']:
                prompt_parts.append(f"with {', '.join(elements['lighting'])}")
            if elements['atmosphere']:
                prompt_parts.append(f"creating {', '.join(elements['atmosphere'])} atmosphere")

            prompt_parts.append(', '.join(elements['style']))

            return ' | '.join(filter(None, prompt_parts))

        except Exception as e:
            print(f"Error in pack_chunk_to_prompt: {e}")
            return "Traditional Chinese landscape painting in elegant style"

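    def get_contextual_analysis(self, full_poem, segment, context_dict):
        # The whole-poem understanding and translation do not depend on the
        # segment, so compute them once per poem rather than once per chunk.
        poem_key = f"poem::{full_poem}"
        if poem_key not in self.chunk_cache:
            self.chunk_cache[poem_key] = (
                self.get_poem_understanding(full_poem),
                self.translate_to_english(full_poem),
            )
        overall_understanding, translated_poem = self.chunk_cache[poem_key]

        detailed_analysis = self.analyze_chunk_parallel(segment, context_dict)
        compact_prompt = self.pack_chunk_to_prompt(detailed_analysis, overall_understanding)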

        return {
            "original_poem": full_poem,
            "translated_poem": translated_poem,
            "overall_understanding": overall_understanding,
            "detailed_analysis": detailed_analysis,
            "compact_prompt": compact_prompt
        }

class BayesianStableDiffusion:
    def __init__(self, model_id="stabilityai/stable-diffusion-xl-base-1.0", num_inference_steps=50,
                 clip_model_name="openai/clip-vit-base-patch32"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model_id = model_id
        self.refiner_id = "stabilityai/stable-diffusion-xl-refiner-1.0"

        print(f"Initializing models on device: {self.device}")
        if torch.cuda.is_available():
            print(f"Available CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
            clear_gpu_memory()

        try:
            # Load base model
            print(f"Loading base model {model_id}...")
            self.base = DiffusionPipeline.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                variant="fp16",
                use_safetensors=True
            ).to(self.device)

            # Load refiner model
            print("Loading refiner model...")
            self.refiner = DiffusionPipeline.from_pretrained(
                self.refiner_id,
                torch_dtype=torch.float16,
                variant="fp16",
                use_safetensors=True,
                text_encoder_2=self.base.text_encoder_2,
                vae=self.base.vae,
            ).to(self.device)

            # Configure schedulers
            self.base.scheduler = EulerDiscreteScheduler.from_config(
                self.base.scheduler.config,
                use_karras_sigmas=True
            )
            self.refiner.scheduler = EulerDiscreteScheduler.from_config(
                self.refiner.scheduler.config,
                use_karras_sigmas=True
            )

            # Enable optimizations for both models
            for pipe in [self.base, self.refiner]:
                try:
                    pipe.enable_attention_slicing(slice_size="auto")
                    pipe.enable_vae_slicing()
                    pipe.enable_xformers_memory_efficient_attention()
                except Exception as e:
                    print(f"Warning: Could not enable some optimizations: {e}")

            print("Loading CLIP model...")
            self.num_inference_steps = num_inference_steps
            self.clip_processor = CLIPProcessor.from_pretrained(clip_model_name)
            self.clip_model = CLIPModel.from_pretrained(clip_model_name).to(self.device)
            self.clip_model.eval()

            print("Model initialization completed")

        except Exception as e:
            print(f"Error initializing model: {str(e)}")
            traceback.print_exc()
            raise

    def generate_images(self, prompt, negative_prompt="", num_samples=5, guidance_scale=7.5, temperature=1.0):
        try:
            clear_gpu_memory()

            print(f"Generating {num_samples} images with prompt: {prompt}")

            # First pass with base model
            base_images = self.base(
                prompt=[prompt] * num_samples,
                negative_prompt=[negative_prompt] * num_samples,
                num_inference_steps=30,
                denoising_end=0.8,
                guidance_scale=guidance_scale,
                width=1024,
                height=1024,
            ).images

            # Second pass with refiner
            refined_images = []
            for base_image in base_images:
                refined = self.refiner(
                    prompt=prompt,
                    negative_prompt=negative_prompt,
                    image=base_image,
                    num_inference_steps=20,
                    denoising_start=0.8,
                    guidance_scale=guidance_scale,
                ).images[0]
                refined_images.append(refined)

            if not refined_images:
                raise ValueError("No images were generated")

            # Ensure all images are in RGB mode
            images = [img.convert('RGB') if isinstance(img, Image.Image) else Image.fromarray(img).convert('RGB')
                    for img in refined_images]

            # Compute CLIP scores
            likelihoods = self.compute_clip_likelihoods(images, prompt)

            clear_gpu_memory()
            return images, likelihoods

        except Exception as e:
            print(f"Error in generate_images: {str(e)}")
            traceback.print_exc()
            return [], np.array([])

    def compute_clip_likelihoods(self, images, prompt):
        try:
            inputs = self.clip_processor(
                text=[prompt] * len(images),
                images=images,
                return_tensors="pt",
                padding=True
            ).to(self.device)

            with torch.no_grad():
                outputs = self.clip_model(**inputs)
                image_embeds = F.normalize(outputs.image_embeds, p=2, dim=1)
                text_embeds = F.normalize(outputs.text_embeds, p=2, dim=1)
                cosine_similarity = F.cosine_similarity(image_embeds, text_embeds, dim=1)
                likelihoods = (cosine_similarity + 1) / 2
            return likelihoods.cpu().numpy()

        except Exception as e:
            print(f"Error in compute_clip_likelihoods: {str(e)}")
            traceback.print_exc()
            return np.array([0.0] * len(images))

    def compute_mean_and_variance(self, images):
        if isinstance(images[0], Image.Image):
            images = [np.array(img) for img in images]
        images_array = np.array(images) / 255.0
        mean_image = np.mean(images_array, axis=0)
        variance_image = np.var(images_array, axis=0)
        return mean_image, variance_image

class ModelComparisonExperiment:
    def __init__(self):
        self.models = {}
        self.results = {}
        self.best_images_sequence = {}
        self.poem_analyzer = EnhancedPoemAnalyzer()
        self.load_models()

    def load_models(self):
        for model_name, model_id in MODELS_TO_COMPARE.items():
            print(f"Loading {model_name}...")
            try:
                self.models[model_name] = BayesianStableDiffusion(
                    model_id=model_id,
                    num_inference_steps=50
                )
                print(f"Successfully loaded {model_name}")
            except Exception as e:
                print(f"Error loading {model_name}: {str(e)}")

    def run_comparison(self, poem):
        print("\nStarting poem analysis...")

        # Get overall poem understanding first
        overall_analysis = self.poem_analyzer.get_poem_understanding(poem)

        # Split poem into chunks and keep track of their order
        chunks = [chunk.strip() for chunk in re.split('[，。？！]', poem) if chunk.strip()]

        results = {
            model_name: {
                'images': [],
                'scores': [],
                'generation_times': [],
                'clip_scores': [],
                'optimization_results': [],
                'best_images': []
            } for model_name in self.models.keys()
        }

        # Create context dictionary for each chunk
        chunk_contexts = {}
        for i, chunk in enumerate(chunks):
            chunk_contexts[chunk] = {
                'full_poem': poem,
                'previous_chunk': chunks[i-1] if i > 0 else None,
                'chunk_index': i
            }

        for chunk in tqdm(chunks, desc="Processing chunks"):
            print(f"\nProcessing chunk: {chunk}")

            # Pass the context dictionary to get_contextual_analysis
            chunk_analysis = self.poem_analyzer.get_contextual_analysis(
                poem,
                chunk,
                chunk_contexts[chunk]
            )


            for model_name, model in self.models.items():
                print(f"\nUsing model: {model_name}")

                try:
                    start_time = time.time()

                    # Use the enhanced prompt generation
                    main_prompt = chunk_analysis["compact_prompt"]
                    negative_prompt = "low quality, blurry, bad anatomy, bad composition, deformed"

                    print(f"Generated prompt: {main_prompt}")

                    optimal_scale = optimize_guidance_scale(
                        model,
                        main_prompt,
                        negative_prompt,
                        num_samples=5
                    )

                    images, likelihoods = model.generate_images(
                        main_prompt,
                        negative_prompt=negative_prompt,
                        num_samples=5,
                        guidance_scale=optimal_scale
                    )

                    if images and len(images) > 0 and len(likelihoods) > 0:
                        generation_time = time.time() - start_time
                        best_idx = np.argmax(likelihoods)
                        best_image = images[best_idx]

                        results[model_name]['best_images'].append({
                            'image': best_image,
                            'text': chunk,
                            'prompt': main_prompt,
                            'likelihood': likelihoods[best_idx],
                            'analysis': chunk_analysis
                        })

                        results[model_name]['images'].append(images[best_idx])
                        results[model_name]['scores'].append(likelihoods[best_idx])
                        results[model_name]['generation_times'].append(generation_time)
                        results[model_name]['clip_scores'].append(np.mean(likelihoods))
                        results[model_name]['optimization_results'].append(optimal_scale)

                        self.display_model_comparison(
                            images,
                            likelihoods,
                            model_name,
                            main_prompt,
                            generation_time,
                            optimal_scale,
                            chunk_analysis
                        )
                    else:
                        print(f"No valid images generated for {model_name}")

                except Exception as e:
                    print(f"Error processing chunk with {model_name}: {str(e)}")
                    traceback.print_exc()
                    continue

        self.results = results
        return results

    def display_model_comparison(self, images, likelihoods, model_name, prompt, generation_time, guidance_scale, analysis):
        mean_image, variance_image = self.models[model_name].compute_mean_and_variance(images)

        n = len(images) + 2
        fig = plt.figure(figsize=(5*n, 12))
        gs = gridspec.GridSpec(4, n, height_ratios=[1, 1, 8, 1])

        prompt_ax = plt.subplot(gs[1, :])
        prompt_ax.axis('off')
        prompt_ax.text(0.5, 0.5, f"Model: {model_name}\nPrompt: {prompt}",
                      ha='center', va='center', wrap=True,
                      fontsize=12)

        axes = [plt.subplot(gs[2, i]) for i in range(n)]
        best_idx = np.argmax(likelihoods)

        for i, (ax, img) in enumerate(zip(axes[:len(images)], images)):
            ax.imshow(img)
            ax.axis('off')

            if i == best_idx:
                title = f"Selected Image\nLikelihood: {likelihoods[i]:.3f}"
                ax.set_title(title, color='green', fontweight='bold')
            else:
                title = f"Sample {i+1}\nLikelihood: {likelihoods[i]:.3f}"
                ax.set_title(title)

        axes[-2].imshow(mean_image)
        axes[-2].axis('off')
        axes[-2].set_title("Mean Image")

        axes[-1].imshow(variance_image, cmap='viridis')
        axes[-1].axis('off')
        axes[-1].set_title("Variance Image")

        metrics_ax = plt.subplot(gs[3, :])
        metrics_ax.axis('off')
        metrics_text = f"Generation Time: {generation_time:.2f}s | "
        metrics_text += f"Mean CLIP Score: {np.mean(likelihoods):.3f} | "
        metrics_text += f"Optimal Guidance Scale: {guidance_scale:.2f}"
        metrics_ax.text(0.5, 0.5, metrics_text,
                       ha='center', va='center',
                       fontsize=10)

        plt.tight_layout()
        plt.show()

def optimize_guidance_scale(model, prompt, negative_prompt="", num_samples=5):
    def objective(guidance_scale):
        try:
            images, likelihoods = model.generate_images(
                prompt,
                negative_prompt=negative_prompt,
                num_samples=num_samples,
                guidance_scale=guidance_scale
            )
            return np.mean(likelihoods) if len(likelihoods) > 0 else 0.0
        except Exception as e:
            print(f"Error in objective function: {str(e)}")
            return 0.0

    try:
        optimizer = BayesianOptimization(
            f=objective,
            pbounds={"guidance_scale": (7.0, 12.0)},
            random_state=42,
            verbose=0
        )

        optimizer.maximize(
            init_points=2,
            n_iter=5
        )
        return optimizer.max['params']['guidance_scale']
    except Exception as e:
        print(f"Error in optimization: {str(e)}")
        return 7.5

def main():
    # Load available poems and let user choose
    json_file_path = '/content/drive/MyDrive/Colab Notebooks/Capstone/Poem Database/poem_database.json'  # Replace with your JSON file path
    try:
        with open(json_file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
            print("\nAvailable poems:")
            for poem in data['poems']:
                print(f"- {poem['title']}")
    except Exception as e:
        print(f"Error loading poems file: {e}")
        return

    # Get user input
    poem_title = input("\nWhat poem would you like to visualize? ")

    # Load the selected poem
    poem_data = load_poem_from_json(json_file_path, poem_title)
    if not poem_data:
        print("Failed to load poem data.")
        return

    poem = poem_data['content']
    print(f"\nLoaded poem: {poem_data['title']}")
    print(f"Author: {poem_data['author']}")
    print(f"Content: {poem}")

    # Create and run the experiment
    experiment = ModelComparisonExperiment()
    results = experiment.run_comparison(poem)

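    # Generate visualization video with poem data.
    # Currently disabled: ModelComparisonExperiment, as defined in this cell,
    # has no create_visualization_video method.
    # experiment.create_visualization_video(poem, poem_data)  # Pass both poem content and metadata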

    # Create report
    # `stats` is used as the loop variable so it does not shadow the `data`
    # dict loaded from the JSON file above.
    report = pd.DataFrame({
        model_name: {
            'Mean CLIP Score': np.mean(stats['clip_scores']) if stats['clip_scores'] else 0.0,
            'Mean Generation Time': np.mean(stats['generation_times']) if stats['generation_times'] else 0.0,
            'Mean Optimal Scale': np.mean(stats['optimization_results']) if stats['optimization_results'] else 0.0,
            'Best Score': max(stats['scores']) if stats['scores'] else 0.0,
            'Worst Score': min(stats['scores']) if stats['scores'] else 0.0
        }
        for model_name, stats in results.items()
    }).T

    print("\nModel Comparison Report:")
    print(report)

    plt.figure(figsize=(15, 5))
    metrics = ['Mean CLIP Score', 'Mean Generation Time', 'Mean Optimal Scale']
    for i, metric in enumerate(metrics, 1):
        plt.subplot(1, 3, i)
        report[metric].plot(kind='bar')
        plt.title(metric)
        plt.xticks(rotation=45)

    plt.tight_layout()
    plt.show()

if __name__ == "__main__":
    main()


Available poems:
- 感遇四首其一
- 感遇四首其二
- 感遇四首其三
- 感遇四首其四
- 下终南山过斛斯山人宿置酒
- 月下独酌
- 春思
- 望岳
- 赠卫八处士
- 佳人
- 梦李白二首·其一
- 梦李白二首·其二
- 送别
- 送綦毋潜落第还乡
- 青溪
- 渭川田家
- 西施咏
- 同从弟南斋玩月忆山阴崔少府
- 郡斋雨中与诸文士燕集
- 初发扬子寄元大校书
- 寄全椒山中道士
- 长安遇冯著
- 夕次盱眙县
- 东郊
- 送杨氏女
- 晨诣超师院读禅经
- 溪居
- 塞上曲·其一
- 塞下曲
- 关山月
- 子夜四时歌·春歌
- 子夜四时歌·夏歌
- 子夜四时歌·秋歌
- 子夜四时歌·冬歌
- 烈女操
- 游子吟
- 登幽州台歌
- 古意
- 送陈章甫
- 琴歌
- 听董大弹胡笳弄兼寄语房给事
- 听安万善吹觱篥歌
- 夜归鹿门山歌
- 庐山谣寄卢侍御虚舟
- 梦游天姥吟留别 
- 金陵酒肆留别 

What poem would you like to visualize? 感遇四首其一

Loaded poem: 感遇四首其一
Author: 张九龄
Content: 孤鸿海上来，池潢不敢顾。侧见双翠鸟，巢在三珠树。矫矫珍木巅，得无金丸惧？美服患人指，高明逼神恶？今我游冥冥，弋者何所慕！
Loading SDXL...
Initializing models on device: cuda
Available CUDA memory: 42.48 GB
Loading base model stabilityai/stable-diffusion-xl-base-1.0...


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

Loading refiner model...


Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]

Loading CLIP model...
Model initialization completed
Successfully loaded SDXL

Starting poem analysis...


Processing chunks:   0%|          | 0/10 [00:00<?, ?it/s]


Processing chunk: 孤鸿海上来

Using model: SDXL
Generated prompt: featuring the lonely goose | with coming from across the sea | RTX ultra detailed realism, traditional China
Generating 5 images with prompt: featuring the lonely goose | with coming from across the sea | RTX ultra detailed realism, traditional China


  0%|          | 0/19 [00:00<?, ?it/s]

Processing chunks:   0%|          | 0/10 [00:49<?, ?it/s]


KeyboardInterrupt: 
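
The prompt logged in the run above contains only subject and action phrases. This is because `pack_chunk_to_prompt` parses the `subject_action` result and leaves `objects`, `environment`, `lighting` and `atmosphere` empty. The sketch below shows one possible way to extract every `key: [a, b]` field from an analysis string so the other categories could be used as well. `parse_bracket_fields` is a hypothetical helper, and the sample string is written by hand to match the format the prompts request; it is not real GLM-4 output.

```python
import re

def parse_bracket_fields(text):
    """Extract 'key: [item, item]' fields into a dict mapping key to a list of items."""
    fields = {}
    for key, body in re.findall(r"(\w+):\s*\[([^\]]*)\]", text):
        fields[key] = [item.strip() for item in body.split(",") if item.strip()]
    return fields

# Hand-written sample in the requested "key: [values]" format (not model output).
sample = "locations: [open sea]\nobjects: [three pearl trees, nests on branches]"
parsed = parse_bracket_fields(sample)
print(parsed)
```

The resulting lists could then be appended to the matching entries of `elements` before the prompt parts are joined. Real model replies may deviate from the requested format, so a fallback for unparsed text would still be needed.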

## Results and Output
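The system will:
- Display the generated images for each poem segment, with the CLIP-selected image highlighted
- Show performance metrics and comparison charts

No video is produced by the code as shown, because the `create_visualization_video` call in `main()` is commented out. When a video is generated, `save_to_drive` copies it to:
`/content/drive/MyDrive/Colab Notebooks/Capstone/Video Generated/`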