# How to call tools with multimodal data

:::info Prerequisites

This guide assumes familiarity with the following concepts:

- [Chat models](/docs/concepts/chat_models)
- [LangChain Tools](/docs/concepts/tools)

:::

Here we demonstrate how to call tools with multimodal data, such as images.

Some multimodal models, such as those that can reason over images or audio, also support [tool calling](/docs/concepts/#tool-calling).

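To call tools with such a model, bind the tools to it in the [usual way](/docs/how_to/tool_calling), then invoke the model with content blocks of the desired type (for example, a block containing image data).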
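As a rough illustration of what "content blocks" means here, the sketch below models a text block and an image block as plain objects. The type names and the `buildImageQuestion` helper are hypothetical and exist only for this example; LangChain ships its own message content types, and the URL is a placeholder.

```typescript
// Illustrative types only; these are not LangChain's own definitions.
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type ContentBlock = TextBlock | ImageUrlBlock;

// Hypothetical helper: pair a text instruction with a single image.
function buildImageQuestion(question: string, url: string): ContentBlock[] {
  return [
    { type: "text", text: question },
    { type: "image_url", image_url: { url } },
  ];
}

const content = buildImageQuestion(
  "describe the weather in this image",
  "https://example.com/photo.jpg", // placeholder URL
);
console.log(content.map((block) => block.type)); // [ 'text', 'image_url' ]
```

An array like this is what goes into the `content` field of a `HumanMessage` in the provider examples.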

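Below, we demonstrate examples using [OpenAI](/docs/integrations/platforms/openai), [Anthropic](/docs/integrations/platforms/anthropic), and Google Generative AI. The image examples all use the same tool. First, select an image and build a placeholder tool that expects one of the strings "sunny", "cloudy", or "rainy" as input. We will ask the models to describe the weather in the image.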

:::note
The `tool` function is available in `@langchain/core` version 0.2.7 and above.

If you are on an older version of core, you should instantiate and use [`DynamicStructuredTool`](https://api.js.langchain.com/classes/langchain_core.tools.DynamicStructuredTool.html) instead.
:::

import { tool } from "@langchain/core/tools";
import { z } from "zod";

const imageUrl = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg";

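// Placeholder tool: it logs and returns the weather value chosen by the model.
const weatherTool = tool(async ({ weather }) => {
  console.log(weather);
  return weather;
}, {
  name: "multiply",
  description: "Describe the weather",
  // The schema restricts the argument to one of three strings.
  schema: z.object({
    weather: z.enum(["sunny", "cloudy", "rainy"])
  }),
});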

## OpenAI

For OpenAI, we can pass the image URL directly in a content block of type "image_url":

import { HumanMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o",
}).bindTools([weatherTool]);

const message = new HumanMessage({
  content: [
    {
      type: "text",
      text: "describe the weather in this image"
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl
      }
    }
  ],
});

const response = await model.invoke([message]);

console.log(response.tool_calls);

[
  {
    name: "multiply",
    args: { weather: "sunny" },
    id: "call_ZaBYUggmrTSuDjcuZpMVKpMR"
  }
]


Note that the model response gives us back tool calls with parsed arguments in LangChain's [standard format](/docs/how_to/tool_calling).
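Because the arguments arrive already parsed, downstream code can read them as ordinary objects. The following is a minimal sketch of that idea, assuming tool calls shaped like the output above. The `ToolCall` type and the `extractWeather` helper are hypothetical names for this example, not part of LangChain.

```typescript
// Shape of a parsed tool call, mirroring the fields printed above.
type ToolCall = { name: string; args: Record<string, unknown>; id?: string };

const WEATHER_VALUES = ["sunny", "cloudy", "rainy"] as const;
type Weather = (typeof WEATHER_VALUES)[number];

// Hypothetical helper: return the first valid weather value found in
// calls to the named tool, or undefined if there is none.
function extractWeather(toolCalls: ToolCall[], toolName: string): Weather | undefined {
  for (const call of toolCalls) {
    if (call.name !== toolName) continue;
    const value = call.args["weather"];
    if (typeof value === "string" && (WEATHER_VALUES as readonly string[]).includes(value)) {
      return value as Weather;
    }
  }
  return undefined;
}

// Sample data copied from the output above.
const sampleCalls: ToolCall[] = [
  { name: "multiply", args: { weather: "sunny" }, id: "call_ZaBYUggmrTSuDjcuZpMVKpMR" },
];
console.log(extractWeather(sampleCalls, "multiply")); // sunny
```

Re-checking the value against the allowed strings is a defensive choice for this sketch. It costs little and guards against a call whose arguments do not match the schema.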

## Anthropic

For Anthropic, we can read the image from disk, base64-encode it, and pass it as a data URL in a content block of type "image_url", as below:

import * as fs from "node:fs/promises";

import { ChatAnthropic } from "@langchain/anthropic";
import { HumanMessage } from "@langchain/core/messages";

const imageData = await fs.readFile("../../data/sunny_day.jpeg");

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
}).bindTools([weatherTool]);

const message = new HumanMessage({
  content: [
    {
      type: "text",
      text: "describe the weather in this image",
    },
    {
      type: "image_url",
      image_url: {
        url: `data:image/jpeg;base64,${imageData.toString("base64")}`,
      },
    },
  ],
});

const response = await model.invoke([message]);

console.log(response.tool_calls);

[
  {
    name: "multiply",
    args: { weather: "sunny" },
    id: "toolu_01HLY1KmXZkKMn7Ar4ZtFuAM"
  }
]
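The data URL in the example above follows the pattern `data:<mime type>;base64,<payload>`. As a small sketch of that step in isolation, the hypothetical `toDataUrl` helper below builds such a string from raw bytes using Node's `Buffer`. Two placeholder bytes stand in for real image data.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: wrap raw bytes in a base64 data URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// The two bytes of "hi" stand in for real image data.
console.log(toDataUrl(Buffer.from("hi"), "image/jpeg")); // data:image/jpeg;base64,aGk=
```

The MIME type should match the actual file format, so a PNG file would need "image/png" rather than "image/jpeg".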


## Google Generative AI

For Google Generative AI (GenAI), we can pass a base64-encoded image in a content block of type "media", together with its MIME type, as below:

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import axios from "axios";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { HumanMessage } from "@langchain/core/messages";

const axiosRes = await axios.get(imageUrl, { responseType: "arraybuffer" });
const base64 = btoa(
  new Uint8Array(axiosRes.data).reduce(
    (data, byte) => data + String.fromCharCode(byte),
    ''
  )
);

const model = new ChatGoogleGenerativeAI({ model: "gemini-1.5-pro-latest" }).bindTools([weatherTool]);

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "describe the weather in this image"],
  new MessagesPlaceholder("message")
]);

const response = await prompt.pipe(model).invoke({
  message: new HumanMessage({
    content: [{
      type: "media",
      mimeType: "image/jpeg",
      data: base64,
    }]
  })
});
console.log(response.tool_calls);

[ { name: 'multiply', args: { weather: 'sunny' } } ]
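The base64 step above builds a binary string one byte at a time and then calls `btoa`. In Node, `Buffer` can do the same conversion in one call. The sketch below puts the two side by side on a three-byte sample; the function names are hypothetical and it assumes a Node runtime where both `btoa` and `Buffer` are available.

```typescript
import { Buffer } from "node:buffer";

// Approach used above: build a binary string byte by byte, then encode it.
function base64ViaBtoa(bytes: Uint8Array): string {
  const binary = bytes.reduce(
    (acc, byte) => acc + String.fromCharCode(byte),
    "",
  );
  return btoa(binary);
}

// Node alternative: let Buffer do the encoding directly.
function base64ViaBuffer(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// The first three bytes of a JPEG file serve as sample data.
const sample = new Uint8Array([0xff, 0xd8, 0xff]);
console.log(base64ViaBtoa(sample) === base64ViaBuffer(sample)); // true
```

Either result can be used as the `data` field of a "media" block. The `btoa` version also works in runtimes without `Buffer`, which may be why the example uses it.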


### Audio input

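Google's Gemini also supports audio input. In this next example, we'll see how to pass an audio file to the model and get back a summary in a structured format.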

import { SystemMessage } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";

const summaryTool = tool((input) => {
  return input.summary;
}, {
  name: "summary_tool",
  description: "Log the summary of the content",
  schema: z.object({
    summary: z.string().describe("The summary of the content to log")
  }),
});

const audioUrl = "https://www.pacdv.com/sounds/people_sound_effects/applause-1.wav";

const axiosRes = await axios.get(audioUrl, { responseType: "arraybuffer" });
const base64 = btoa(
  new Uint8Array(axiosRes.data).reduce(
    (data, byte) => data + String.fromCharCode(byte),
    ''
  )
);

const model = new ChatGoogleGenerativeAI({ model: "gemini-1.5-pro-latest" }).bindTools([summaryTool]);

const response = await model.invoke([
  new SystemMessage("Summarize this content. always use the summary_tool in your response"),
  new HumanMessage({
    content: [{
      type: "media",
      mimeType: "audio/wav",
      data: base64,
    }]
  })
]);

console.log(response.tool_calls);

[
  {
    name: 'summary_tool',
    args: { summary: 'The video shows a person clapping their hands.' }
  }
]
