# Text-to-Video

The pipeline has two steps.

1) text-to-image: an API call to `OpenAI` (DALL-E 3) creates an image according to the user's prompt.
2) image-to-video: StabilityAI's Stable Video Diffusion model, downloaded from the Hugging Face Hub and run locally with the `diffusers` library, turns that image into a short video.
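
The two steps compose into a single text-to-video function. The sketch below is a minimal illustration, assuming each stage is passed in as a callable. `text_to_video`, `TextToImage`, and `ImageToVideo` are hypothetical names that belong to neither library, and the stub lambdas stand in for the DALL-E and Stable Video Diffusion calls.

```python
from typing import Any, Callable, List

# Hypothetical stage signatures: prompt -> image, image -> list of frames
TextToImage = Callable[[str], Any]
ImageToVideo = Callable[[Any], List[Any]]


def text_to_video(prompt: str, text_to_image: TextToImage, image_to_video: ImageToVideo) -> List[Any]:
    """Chain the two stages: generate an image from text, then animate it."""
    image = text_to_image(prompt)   # step 1: text-to-image (DALL-E 3 in this notebook)
    return image_to_video(image)    # step 2: image-to-video (Stable Video Diffusion in this notebook)


# Stand-in stages so the control flow can be checked without an API key or a GPU
frames = text_to_video(
    "a white siamese cat",
    text_to_image=lambda p: f"<image of {p}>",
    image_to_video=lambda img: [f"{img} frame {i}" for i in range(3)],
)
print(frames)
```

Keeping the stages injectable makes it easy to swap either model without touching the glue code.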

**Warning**: Run this notebook locally to avoid timeouts.

## Explore Text-to-Image

Here we use the `OpenAI` Python client.

In [None]:
! pip install openai

### Exploration

In [None]:
from google.colab import userdata
from openai import OpenAI

In [None]:
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=OPENAI_API_KEY)
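
`google.colab` only exists inside Google Colab, so the cells above fail with an `ImportError` when the notebook is run locally. A minimal sketch of a portable loader, assuming the key lives either in Colab secrets or in an `OPENAI_API_KEY` environment variable (`get_openai_api_key` is a hypothetical helper):

```python
import os
from typing import Optional


def get_openai_api_key(name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the API key from Colab secrets when running in Colab, else from the environment."""
    try:
        from google.colab import userdata  # importable only inside Google Colab
        return userdata.get(name)
    except ImportError:
        # Local run: the key is expected in the environment, e.g. `export OPENAI_API_KEY=...`
        return os.environ.get(name)
```

Usage: `client = OpenAI(api_key=get_openai_api_key())`. The `OpenAI()` client also reads the `OPENAI_API_KEY` environment variable on its own when `api_key` is not passed.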

In [None]:
response = client.images.generate(
  model="dall-e-3",
  prompt="a white siamese cat",
  size="1024x1024",
  quality="standard",
  n=1,
)

image_url = response.data[0].url
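
Instead of downloading from `image_url`, the Images API can return the image inline: passing `response_format="b64_json"` to `client.images.generate` makes `response.data[0].b64_json` hold a base64-encoded image. The helper below is a sketch that decodes such a string into a Pillow image. `decode_b64_image` is a hypothetical name, and the example round-trips a locally created PNG rather than a real API response.

```python
import base64
from io import BytesIO

from PIL import Image


def decode_b64_image(b64_data: str) -> Image.Image:
    """Decode a base64-encoded image string (e.g. `response.data[0].b64_json`) into a Pillow image."""
    image = Image.open(BytesIO(base64.b64decode(b64_data)))
    image.load()  # decode the pixel data now rather than lazily on first access
    return image


# Round-trip check with a locally generated image (no API call)
buffer = BytesIO()
Image.new("RGB", (64, 32), color=(255, 255, 255)).save(buffer, format="PNG")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")

decoded = decode_b64_image(encoded)
print(decoded.size, decoded.mode)  # (64, 32) RGB
```

With a live client this would be used as `decode_b64_image(response.data[0].b64_json)`, which saves the second HTTP request to the image URL.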

In [None]:
import requests
from PIL import Image
from io import BytesIO

In [None]:
# Send a GET request to the image URL
response = requests.get(image_url)

# Open the image using Pillow
image = Image.open(BytesIO(response.content))

In [None]:
image

### Wrap the Code in a Function

In [None]:
import requests
from PIL import Image
from io import BytesIO
import numpy as np
from typing import Tuple, Dict

def generate_image(prompt: str) -> Dict[int, Tuple[Image.Image, np.ndarray]]:
    """
    Generates an image based on the given prompt using OpenAI's DALL-E 3 model, fetches the image,
    converts it to Pillow format and a numpy array, and stores them in a dictionary.

    Args:
    prompt (str): The prompt to generate an image from.

    Returns:
    Dict[int, Tuple[Image.Image, np.ndarray]]: A dictionary with a single key-value pair where the key is 0,
    and the value is a tuple of the Pillow image and its numpy array representation.
    """
    # Initialize the dictionary to store the result
    result = {}

    # Assumes `client` is an authenticated OpenAI client defined earlier in the notebook
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="standard",
        n=1
    )

    # Fetch the image using the URL provided in the response, failing loudly on HTTP errors
    image_response = requests.get(response.data[0].url, timeout=60)
    image_response.raise_for_status()

    # Open the image using Pillow
    pillow_image = Image.open(BytesIO(image_response.content))

    # Convert the Pillow image to a numpy array
    numpy_image = np.array(pillow_image)

    # Store the Pillow image and numpy array in the result dictionary under index 0
    result[0] = (pillow_image, numpy_image)

    return result


In [None]:
%%time

output_objects = generate_image("a white siamese cat")

In [None]:
type(output_objects[0][0]), output_objects[0][0]
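
`output_objects[0]` holds the same picture in two layouts whose dimension orders differ: Pillow's `size` is `(width, height)`, while the NumPy array's `shape` is `(height, width, channels)`, with 3 channels for RGB (4 if the PNG carries an alpha channel). The sketch below shows this on a synthetic, non-square RGB image, chosen as an assumption because a square 1024x1024 image would hide the swap.

```python
import numpy as np
from PIL import Image

# Synthetic 300x200 (width x height) RGB image standing in for a generated picture
pillow_image = Image.new("RGB", (300, 200), color=(200, 180, 160))
numpy_image = np.array(pillow_image)

print(pillow_image.size)   # (300, 200)     -> (width, height)
print(numpy_image.shape)   # (200, 300, 3)  -> (height, width, channels)
print(numpy_image.dtype)   # uint8

# Converting the array back to Pillow preserves the pixels
restored = Image.fromarray(numpy_image)
print(restored.size == pillow_image.size)  # True
```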

## Image-to-Video

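We use Hugging Face's `diffusers` library: `DiffusionPipeline` loads StabilityAI's `stable-video-diffusion-img2vid-xt` checkpoint and runs it locally.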

In [None]:
! pip install diffusers

In [None]:
! pip install accelerate transformers imageio imageio-ffmpeg

In [None]:
from PIL import Image
from diffusers import DiffusionPipeline
import imageio
import numpy as np

In [None]:
# Load the pretrained model
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-video-diffusion-img2vid-xt")


def create_video_from_image(pipeline: DiffusionPipeline, pillow_image: Image.Image) -> None:
    """
    Generates a video from a static Pillow image using a provided pretrained video diffusion model.

    Args:
    pipeline (DiffusionPipeline): The pretrained video diffusion model pipeline.
    pillow_image (Image.Image): The Pillow image object.

    Returns:
    None: The function saves the generated video locally.
    """
    # Generate the video; `.frames` holds one list of PIL frames per input image, so take the first
    frames = pipeline(pillow_image).frames[0]

    # Save the frames as MP4 (imageio needs the imageio-ffmpeg backend for this).
    # 7 fps matches the pipeline's default `fps` conditioning value.
    with imageio.get_writer('generated_video.mp4', fps=7) as writer:
        for frame in frames:
            writer.append_data(np.array(frame))

    print("Video has been saved as 'generated_video.mp4'.")

In [None]:
%%time

# Animate the image generated in the text-to-image section
create_video_from_image(pipeline, pillow_image=output_objects[0][0])
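
The length of the saved clip is the number of frames divided by the writer's frame rate. Stable Video Diffusion XT generates 25 frames by default, so the `fps` passed to `imageio.get_writer` largely decides how long the result plays. A small helper, hypothetical and not part of `imageio` or `diffusers`, makes the trade-off explicit:

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip with `num_frames` frames written at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# 25 frames is assumed to be SVD-XT's default output length (see `pipeline.unet.config.num_frames`)
for fps in (7, 30):
    print(fps, round(clip_duration_seconds(25, fps), 2))
# 7 3.57
# 30 0.83
```

A lower frame rate gives a longer clip with slower motion; generating more frames lengthens the clip without slowing it, at the cost of more compute.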