# How to pass multimodal data directly to models

Here we demonstrate how to pass [multimodal](/docs/concepts/multimodality/) input directly to models.
We currently expect all input to be passed in the same format that [OpenAI expects](https://platform.openai.com/docs/guides/vision).
For other model providers that support multimodal input, the integration class contains logic to convert the input to the format that provider expects.

In this example we will ask a [model](/docs/concepts/chat_models/#multimodality) to describe an image.

In [1]:
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

In [4]:
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

In [5]:
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")

The most widely supported way to pass in an image is as a base64-encoded string embedded in a data URL.
This should work for most model integrations.

In [6]:
import base64

import httpx

image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

In [7]:
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = model.invoke([message])
print(response.content)

The weather in the image appears to be clear and sunny. The sky is mostly blue with some scattered white clouds. The sunlight suggests it's a bright day with good visibility. Ideal conditions for a pleasant outdoor stroll.
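The data URL in the cell above hardcodes `image/jpeg`, which matches the example image. For other image types, the MIME type in the URL should match the data. The following is a minimal sketch, not part of LangChain: `to_data_url` and `image_block` are hypothetical helper names, and the MIME type is guessed from a filename that you supply, falling back to `image/jpeg`.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str = "image.jpg") -> str:
    """Encode raw image bytes as a base64 data URL.

    The MIME type is guessed from the filename extension (an assumption of
    this sketch); anything that is not recognized as an image falls back
    to image/jpeg.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_block(image_bytes: bytes, filename: str = "image.jpg") -> dict:
    """Wrap the data URL in an OpenAI-style "image_url" content block."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_url(image_bytes, filename)},
    }
```

The returned dictionary has the same shape as the image block used above, so it could be placed in the `content` list of a `HumanMessage` alongside the text block.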


We can feed the image URL directly in a content block of type "image_url". Note that only some model providers support this.

In [8]:
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)

The weather in the image appears clear and sunny with a bright blue sky and a few scattered clouds. The lighting suggests it might be late afternoon, with the sun casting warm, golden tones on the landscape. The overall scene looks calm and pleasant, indicative of a mild and fair weather day.


We can also pass in multiple images.

In [9]:
message = HumanMessage(
    content=[
        {"type": "text", "text": "are these two images the same?"},
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)

The two images are the same.
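When the number of images varies, the content list can be built programmatically instead of written out by hand. This is a small sketch using plain dictionaries; `build_image_content` is a hypothetical helper name, not a LangChain function.

```python
def build_image_content(text: str, image_urls: list[str]) -> list[dict]:
    """Build a content list with one text block followed by one
    "image_url" block per URL, in the order given."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content
```

The result can be passed as `HumanMessage(content=build_image_content(...))`. Each URL may be either a regular image URL or a base64 data URL, subject to what the model provider supports.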


## Tool calls

Some multimodal models support [tool calling](/docs/concepts/tool_calling) features as well. To call tools using such models, simply bind tools to them in the [usual way](/docs/how_to/tool_calling), and invoke the model using content blocks of the desired type (e.g., containing image data).

In [10]:
from typing import Literal

from langchain_core.tools import tool


@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass


model_with_tools = model.bind_tools([weather_tool])

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model_with_tools.invoke([message])
print(response.tool_calls)

[]


In [11]:
print(response.content)

The weather in the image appears sunny with a clear blue sky and a few scattered clouds.
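In the run above, `response.tool_calls` is an empty list and the answer arrives in `response.content`, so the model replied in text rather than calling the tool. Code that consumes the response should handle both cases. The sketch below assumes each tool call is a dictionary with `name` and `args` keys; `summarize_response` is a hypothetical helper that works on plain values, so it can be tried without calling a model.

```python
def summarize_response(tool_calls: list[dict], content: str) -> str:
    """Describe a model response, covering both the tool-call case
    and the plain-text case."""
    if not tool_calls:
        # No tool was called: fall back to the text answer.
        return f"text answer: {content}"
    # One entry per tool call, formatted as name(args).
    parts = [f"{call['name']}({call['args']})" for call in tool_calls]
    return "tool calls: " + ", ".join(parts)
```

With the response above, this would be called as `summarize_response(response.tool_calls, response.content)`.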


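## Prompt templates

Multimodal content blocks can also be used inside a `ChatPromptTemplate`. In the examples below, the base64-encoded image data is a template variable that is filled in when the chain is invoked.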

In [12]:
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

In [13]:
from langchain_core.prompts import ChatPromptTemplate

In [14]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Describe the image provided"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
                }
            ],
        ),
    ]
)

In [15]:
chain = prompt | model

In [16]:
response = chain.invoke({"image_data": image_data})
print(response.content)

The image shows a serene landscape with a wooden boardwalk path cutting through a lush green field. The grass is vibrant, and the scene extends to the horizon with small bushes and trees scattered in the background. The sky is a bright blue with scattered white clouds, creating a peaceful and open atmosphere.


In [19]:
prompt2 = ChatPromptTemplate.from_messages(
    [
        ("system", "compare the two pictures provided"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data1}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data2}"},
                },
            ],
        ),
    ]
)

In [20]:
chain2 = prompt2 | model

In [22]:
response = chain2.invoke({"image_data1": image_data, "image_data2": image_data})
print(response.content)

The two images are identical. They both show a wooden path through a grassy field with a blue sky overhead, featuring wispy clouds. The images have the same composition, colors, and elements.
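The two-image template above follows a regular pattern: one block per image, with variables named `image_data1`, `image_data2`, and so on. As a sketch, the blocks can be generated for any number of images. `image_template_blocks` is a hypothetical helper, and the variable naming simply mirrors the example above.

```python
def image_template_blocks(n_images: int, mime_type: str = "image/jpeg") -> list[dict]:
    """Return n_images "image_url" blocks whose URLs contain the template
    placeholders {image_data1}, {image_data2}, ..."""
    blocks = []
    for i in range(1, n_images + 1):
        # Doubled braces produce literal braces, so the URL ends with a
        # placeholder such as {image_data1} for the template to fill in.
        url_template = f"data:{mime_type};base64,{{image_data{i}}}"
        blocks.append({"type": "image_url", "image_url": {"url": url_template}})
    return blocks
```

The returned list can stand in for the hand-written list in the `"user"` entry of `ChatPromptTemplate.from_messages`, and the chain is then invoked with one `image_dataN` key per image.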
