Commit 6fc4f53

Authored by SongChiYoung and ekzhu
FIX: MultiModalMessage in Gemini with the OpenAI SDK raises an error (#6440)

## Why are these changes needed?

Multimodal messages fill their content through a separate routine, but the current Gemini pipeline still applied `_set_empty_to_whitespace` to them, which overwrote that content and caused an error. Inspecting `multimodal_user_transformer_funcs` shows that content produced by this routine can never be empty, so `_set_empty_to_whitespace` is unnecessary for multimodal messages and is now removed from that pipeline.

## Related issue number

Closes #6439

## Checks

- [ ] I've included any doc changes needed for <https://microsoft.github.io/autogen/>. See <https://github.com/microsoft/autogen/blob/main/CONTRIBUTING.md> to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
1 parent 7c29704 commit 6fc4f53

File tree

2 files changed: +27 -1 lines changed


python/packages/autogen-ext/src/autogen_ext/models/openai/_message_transform.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -379,7 +379,7 @@ def assistant_condition(message: LLMMessage, context: Dict[str, Any]) -> str:

 user_transformer_funcs_gemini: Dict[str, List[Callable[[LLMMessage, Dict[str, Any]], Dict[str, Any]]]] = {
     "text": single_user_transformer_funcs + [_set_empty_to_whitespace],
-    "multimodal": multimodal_user_transformer_funcs + [_set_empty_to_whitespace],
+    "multimodal": multimodal_user_transformer_funcs,
 }
```
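For context, the per-type transformer lists above are presumably folded over each message when the provider payload is built. A minimal sketch of that dispatch, with hypothetical simplified types (`run_pipeline`, `Message`, and the toy transformers are illustrative names, not the real API):

```python
from typing import Any, Callable, Dict, List

# Simplified stand-ins for the real types in _message_transform.py.
Message = Dict[str, Any]
Transformer = Callable[[Message, Dict[str, Any]], Dict[str, Any]]


def run_pipeline(message: Message, funcs: List[Transformer]) -> Dict[str, Any]:
    # Hypothetical driver: each transformer reads the message and the context
    # accumulated so far, and returns fields merged into the result. Every
    # function in the list runs unconditionally, which is why appending
    # _set_empty_to_whitespace to the multimodal list ran it on content it
    # was never meant to see.
    context: Dict[str, Any] = {}
    for func in funcs:
        context.update(func(message, context))
    return context


# Example: two toy transformers composing a payload.
set_role: Transformer = lambda msg, ctx: {"role": msg.get("role", "user")}
set_content: Transformer = lambda msg, ctx: {"content": msg["content"]}

payload = run_pipeline({"role": "user", "content": "hi"}, [set_role, set_content])
assert payload == {"role": "user", "content": "hi"}
```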

python/packages/autogen-ext/tests/models/test_openai_model_client.py

Lines changed: 26 additions & 0 deletions
```diff
@@ -7,6 +7,8 @@
 import httpx
 import pytest
+from autogen_agentchat.agents import AssistantAgent
+from autogen_agentchat.messages import MultiModalMessage
 from autogen_core import CancellationToken, FunctionCall, Image
 from autogen_core.models import (
     AssistantMessage,
@@ -2459,4 +2461,28 @@ def test_find_model_family() -> None:
     assert _find_model_family("openai", "error") == ModelFamily.UNKNOWN


+@pytest.mark.asyncio
+@pytest.mark.parametrize(
+    "model",
+    [
+        "gpt-4.1-nano",
+        "gemini-1.5-flash",
+        "claude-3-5-haiku-20241022",
+    ],
+)
+async def test_multimodal_message_test(
+    model: str, openai_client: OpenAIChatCompletionClient, monkeypatch: pytest.MonkeyPatch
+) -> None:
+    # Test that the multimodal message is converted to the correct format
+    img = Image.from_base64(
+        "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAADElEQVR4nGP4z8AAAAMBAQDJ/pLvAAAAAElFTkSuQmCC"
+    )
+    multi_modal_message = MultiModalMessage(content=["Can you describe the content of this image?", img], source="user")
+
+    ocr_agent = AssistantAgent(
+        name="ocr_agent", model_client=openai_client, system_message="""You are a helpful agent."""
+    )
+    _ = await ocr_agent.run(task=multi_modal_message)
+
+
 # TODO: add integration tests for Azure OpenAI using AAD token.
```
