### 🖼️➡️📝 Image-to-Text with Gradio and a Vision LLM

In this notebook, you'll build a Gradio app that lets you upload an image and ask a question about it.  
The model will analyze the image and respond using natural language.

You'll learn how to:
- Upload an image in Gradio
- Pass both image and text to a vision-enabled language model
- Display the model's response in the interface

**🛠️ TODO**

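The model only sees the image if the image is attached to the user message. In the starter code, the `"images"` entry of that message is just a placeholder.

Find the line with `"images": [...]` and, if it still holds the placeholder, replace it with `["image.jpg"]` (the path that `analyze_image` saves the upload to) so that the model actually receives the image when generating its response.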
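The `ollama` Python client accepts a file path in `"images"`, then reads the file and base64-encodes it before sending. To see what actually travels to the server, the sketch below builds a comparable request by hand against Ollama's REST `/api/chat` endpoint, where images are sent as base64-encoded strings. It assumes a local Ollama server on the default port `11434`. The helper names `encode_image_file`, `build_chat_payload` and `send_chat` are hypothetical illustrations and are not part of the `ollama` package.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def encode_image_file(path: str) -> str:
    """Hypothetical helper: read an image file and return it as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_chat_payload(model: str, prompt: str, image_path: str) -> dict:
    """Hypothetical helper: build a non-streaming /api/chat body with one image."""
    return {
        "model": model,
        "stream": False,  # ask for a single JSON object instead of streamed chunks
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # The REST API expects base64-encoded image data, not file paths
                "images": [encode_image_file(image_path)],
            }
        ],
    }


def send_chat(payload: dict) -> str:
    """Hypothetical helper: POST the payload and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["message"]["content"]
```

With the server running and `image.jpg` saved by `analyze_image`, a call might look like `send_chat(build_chat_payload("gemma3:4b-it-qat", "What is in this picture?", "image.jpg"))`. In the notebook itself, passing the path to `ollama.chat` is simpler; this version only makes the encoding step visible.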

In [None]:
import gradio as gr
import ollama
from PIL import Image

MODEL = "gemma3:4b-it-qat"

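def analyze_image(image, prompt):
    if image is None:
        return "Please upload an image."

    # The question is optional in the UI, so fall back to a generic request
    if not prompt or not prompt.strip():
        prompt = "Describe this image."

    # JPEG cannot store an alpha channel, so convert to RGB before saving.
    # The saved file's path is what gets passed to ollama below.
    image = image.convert("RGB")
    image.save("image.jpg")

    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a helpful and kind wizard who gives his opinion about images."},
            {"role": "user", "content": prompt, "images": ["image.jpg"]}
        ],
    )
    return response["message"]["content"]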

gr.Interface(
    fn=analyze_image,
    inputs=[
        gr.Image(type="pil", label="Upload Image"),
        gr.Textbox(label="What do you want to know about this image? (optional)")
    ],
    outputs=gr.Textbox(label="Vision Model Output"),
    title="Image-to-Text (Vision LLM)",
    description="Ask a question about the uploaded image."
).launch(server_name="0.0.0.0", server_port=8080)