[BugFix][Refactor] Modular Transformer Pipeline and Fix Gemini/Anthropic Empty Content Handling #6063
Conversation
While preparing this fix, I noticed we currently don't have explicit tests validating empty-string handling for the Gemini API. Would it be helpful if I added a dedicated test case directly in this PR? Or should we handle the test separately in a follow-up PR/issue? Please let me know what you prefer—happy to help either way!
Yes. It would be good to include the test as part of this PR. Also, we can use the model family to check whether we need to make the whitespace adjustment, rather than doing this for all model APIs.
Another thing: instead of modifying the returned content, we should be modifying the messages that get sent to the model API. So we need to modify the to_oai_type function and enable an option to use whitespace instead of an empty string for empty content, if the model family is Gemini.
@ekzhu Therefore, I'll implement a closure-based approach along those lines.
However, I still see some merit in the original approach (modifying responses), as it keeps the changes minimal and simple. If you prefer simplicity and minimal impact on existing code over the extensibility provided by the closure approach, I'd be happy to revert to the original implementation. I'll proceed with the closure-based solution for now, but please let me know if you'd prefer the simpler response-handling option instead. Happy to accommodate your preference!
Here is sample code for the closure case:
# Define closure once at the top level
def create_content_transformer(model_family: ModelFamily):
if model_family == ModelFamily.GEMINI:
return lambda c: " " if not c else c
else:
return lambda c: c
# Usage at API request call-site
content_transformer = create_content_transformer(model_family)
oai_messages = [
to_oai_type(msg, content_transformer=content_transformer)
for msg in messages
]
# Usage at token counting
token_count_messages = [
to_oai_type(msg, content_transformer=content_transformer)
for msg in token_count_messages
]
# at to_oai_type
def to_oai_type(
message: LLMMessage,
prepend_name: bool = False,
content_transformer: Callable[[str], str] = lambda x: x
) -> Sequence[ChatCompletionMessageParam]:
if isinstance(message, SystemMessage):
return [system_message_to_oai(message, content_transformer)]
elif isinstance(message, UserMessage):
return [user_message_to_oai(message, prepend_name, content_transformer)]
elif isinstance(message, AssistantMessage):
return [assistant_message_to_oai(message, content_transformer)]
else:
return tool_message_to_oai(message, content_transformer)
@SongChiYoung I think what you are doing is a useful step toward a more modular way to manage different message transformation logic for each model family. I think your new approach is in the right direction. I would consider making it more modular and easier to maintain by having a global dictionary of separate transformation functions, one per model family, that goes from a list of LLMMessage to the model API's message types. We can then refactor the model client to use this dictionary. This will make it easier to solve other problems like #6034. As a next step in a different PR, we can address the related parsing issue.
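To illustrate the suggestion, a minimal sketch of such a per-model-family dictionary might look like this (names such as MESSAGE_TRANSFORMERS, register_transformer, and get_transformer are hypothetical, not the final API):

from typing import Callable, Dict, Sequence

from autogen_core.models import LLMMessage
from openai.types.chat import ChatCompletionMessageParam

# One transformation function per model family: it converts a list of AutoGen
# LLMMessages into OpenAI-style message params for that family.
TransformerFunc = Callable[[Sequence[LLMMessage]], Sequence[ChatCompletionMessageParam]]

MESSAGE_TRANSFORMERS: Dict[str, TransformerFunc] = {}


def register_transformer(model_family: str, func: TransformerFunc) -> None:
    """Register the transformation function for a model family."""
    MESSAGE_TRANSFORMERS[model_family] = func


def get_transformer(model_family: str) -> TransformerFunc:
    """Look up the transformer for a model family, falling back to a default entry."""
    return MESSAGE_TRANSFORMERS.get(model_family, MESSAGE_TRANSFORMERS["default"])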
@ekzhu Thanks for your valuable feedback! Motivation:
Additionally, I'll place this global registry within a dedicated module. Based on your previous comment, it seems like we're on the same page regarding this structural direction, but I just wanted to confirm once more before proceeding. Let me know if you agree or have additional suggestions—once confirmed, I'll try the implementation!
I would keep this within the OpenAI namespace for now.
@ekzhu Thanks again for your thoughtful feedback! Please check my new code changes.
Details
Before finalizing the change, I just wanted to briefly share why I initially designed the global transformer registry at the top-level namespace.
I fully respect your initial suggestion and I'm happy to move forward by placing it under the OpenAI namespace if you prefer. Could you please let me know your final thoughts on this? Thanks again for your time and patience!
Updated the PR title and description to better reflect the scope of changes. Let me know if anything else needs clarification!
@ekzhu As I mentioned in the recent issue I filed (#6083), Anthropic models also raise errors when given empty content — similar to the Gemini case. This applies to both the OpenAI-compatible and native Anthropic SDKs. To address this properly, I realized that the transformer logic needs to apply more broadly — not just under the OpenAI namespace. I'm truly sorry to repeat this suggestion again, as I know I previously proposed a similar idea. However, this additional context (supporting Anthropic and potentially other models) makes it clearer that placing the transformer registry under a top-level namespace is worthwhile. Please let me know what you think — and I really appreciate your patience reviewing this again.
Hey @SongChiYoung, thanks for the update. I understand that the new approach is much more modular, with potential added benefits, but the PR is getting quite large, and it introduces a new architecture that we as maintainers may not be extremely familiar with. Consider this: you may stop contributing to the project after a few months for very good reasons, as you start working on other things that interest you. But we will still be left with code we may not understand very well. So my suggestion is to make the changes incrementally and think about the minimal design needed to solve the problems at hand. Right now, I'd say beefing up the integration test cases for OpenAIChatCompletionClient is more important than creating a new architecture for message transformation. Right now, the tests only use OpenAI and Gemini models, but we hope to increase the modularity and expand to more providers.
Thank you @ekzhu — I really appreciate your thoughtful feedback and your team's dedication to maintaining the quality and long-term sustainability of AutoGen. Your suggestion makes perfect sense, and I'll proceed as you advised.
In addition, since the Anthropic SDK is specific to Anthropic models only, I'll go ahead and apply the simple fix there as well — without needing any model check logic — and include that in this PR. Thanks again, and I'll update the PR shortly!
Thank you very much for your work!
@ekzhu Let me know what you think—appreciate your time and thoughtful guidance throughout this process!
@@ -142,20 +143,33 @@ def get_mime_type_from_image(image: Image) -> Literal["image/jpeg", "image/png",
    return "image/jpeg"

def __empty_content_to_whitespace(
Instead of using a single function with multiple return types, I think you can just split this into two functions with one return type each. That way you don't need to do a cast, which can be a source of bugs since it bypasses type checking.
@ekzhu Thanks for the suggestion! I agree that avoiding cast() is the safer and more maintainable approach.
Instead of splitting the logic into two separate functions, I'm planning to update the implementation using @overload to preserve a unified structure while keeping it type-safe.
I'll adjust the return and argument types accordingly — likely using something like str | Iterable[Any] — in a way that works cleanly with the rest of the codebase.
Appreciate the feedback, and I'll follow up with the revision soon!
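For reference, a rough sketch of what the @overload-based version could look like (the exact types and body here are assumptions, not the final implementation):

from typing import Any, Iterable, Union, overload


@overload
def __empty_content_to_whitespace(content: str) -> str: ...
@overload
def __empty_content_to_whitespace(content: Iterable[Any]) -> Iterable[Any]: ...


def __empty_content_to_whitespace(content: Union[str, Iterable[Any]]) -> Union[str, Iterable[Any]]:
    # Gemini rejects empty string content, so substitute a single whitespace.
    if isinstance(content, str) and not content:
        return " "
    return content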
@ekzhu Changed. Please check it.
return {"content": prefix + message.content} | ||
|
||
|
||
def _set_multimodal_content(message: LLMMessage, context: Dict[str, Any]) -> Dict[str, Any]: |
Since we are now exclusively dealing with OpenAI types, we should be using the types from the OpenAI library. This helps us make sure things are validated via type checking and avoids the use of cast.
# === Base Transformers list ===
base_system_message_transformers: List[Callable[[LLMMessage, Dict[str, Any]], Dict[str, Any]]] = [ |
Same here and everywhere else: use types from the OpenAI library to ensure we are covered by type checking.
Refactored to avoid cast() by enforcing TransformerFunc to return Sequence[MessageParam]. This preserves type safety while keeping the registry extensible beyond OpenAI.
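For illustration, the aliases could look roughly like this (shown with OpenAI's ChatCompletionMessageParam; the PR's actual definitions may differ):

from typing import Any, Callable, Dict, Sequence, Type

from autogen_core.models import LLMMessage
from openai.types.chat import ChatCompletionMessageParam

# Each transformer takes an AutoGen message plus a context dict and returns fully
# typed OpenAI message params, so no cast() is needed at the call site.
TransformerFunc = Callable[[LLMMessage, Dict[str, Any]], Sequence[ChatCompletionMessageParam]]
TransformerMap = Dict[Type[LLMMessage], TransformerFunc]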
FunctionExecutionResultMessage: function_execution_result_message,
}
__CLAUDE_TRANSFORMER_MAP: TransformerMap = { |
I thought Claude isn't OpenAI compatible?
Claude now has an OpenAI-compatible API: https://docs.anthropic.com/en/api/openai-sdk
And we currently use it (without PR #6120).
anthropic_openai_client = lambda: OpenAIChatCompletionClient(
model="claude-3-7-sonnet-20250219",
base_url="https://api.anthropic.com/v1/",
api_key="temp-api-key",
max_tokens=21333,
temperature=1,
top_p=1,
top_k=-1,
model_info={
"vision": True,
"function_calling": True,
"json_output": False,
"family": "claude-3.7-sonnet",
}
)
# ===Mini Transformers===
def _assert_valid_name(message: LLMMessage, context: Dict[str, Any]) -> Dict[str, Any]: |
It would be great if we could have a typed dict for the context instead of a generic dict.
Used explicit return type hints (e.g., Dict[str, str]) in each transformer to ensure key-level type safety and avoid cast().
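For illustration, the reviewer's typed-dict suggestion could look roughly like this (TransformerContext and its fields are hypothetical, based on the values passed through the pipeline in this PR):

from typing import TypedDict


class TransformerContext(TypedDict, total=False):
    prepend_name: bool
    model_family: str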
I believe I've addressed all of your requested changes.
elif isinstance(part, Image):
    # TODO: support url based images
    # TODO: support specifying details
    parts.append(cast(ChatCompletionContentPartImageParam, part.to_openai_format()))
FYI: that cast was already in the previous code, before my change.
Previous code:

def user_message_to_oai(message: UserMessage, prepend_name: bool = False) -> ChatCompletionUserMessageParam:
    assert_valid_name(message.source)
    if isinstance(message.content, str):
        return ChatCompletionUserMessageParam(
            content=(f"{message.source} said:\n" if prepend_name else "") + message.content,
            role="user",
            name=message.source,
        )
    else:
        parts: List[ChatCompletionContentPartParam] = []
        for part in message.content:
            if isinstance(part, str):
                if prepend_name:
                    # Append the name to the first text part
                    oai_part = ChatCompletionContentPartTextParam(
                        text=f"{message.source} said:\n" + part,
                        type="text",
                    )
                    prepend_name = False
                else:
                    oai_part = ChatCompletionContentPartTextParam(
                        text=part,
                        type="text",
                    )
                parts.append(oai_part)
            elif isinstance(part, Image):
                # TODO: support url based images
                # TODO: support specifying details
                parts.append(cast(ChatCompletionContentPartImageParam, part.to_openai_format()))
            else:
                raise ValueError(f"Unknown content type: {part}")
        return ChatCompletionUserMessageParam(
            content=parts,
            role="user",
            name=message.source,
        )

New code:

def to_oai_type(
    message: LLMMessage, prepend_name: bool = False, model_family: str = "gpt-4o"
) -> Sequence[ChatCompletionMessageParam]:
    context = {
        "prepend_name": prepend_name,
    }
    transformers = get_transformer("openai", model_family)
    ...
@ekzhu Done.
One potential issue identified during review: it might be on purpose, just pointing it out in case it isn't.
Why are these changes needed?
This change addresses a compatibility issue when using Google Gemini models with AutoGen. Specifically, Gemini returns a 400 INVALID_ARGUMENT error when a request contains an empty "text" parameter.
The root cause is that Gemini does not accept empty string values (e.g., "") as valid inputs in the history of the conversation.
To fix this, if the content field is falsy (e.g., None, "", etc.), it is explicitly replaced with a single whitespace (" "), which prevents the Gemini model from rejecting the request.
""
), causing runtime errors. This PR ensures such messages are safely replaced with whitespace where appropriate.Summary
This PR introduces a modular transformer pipeline for message conversion and fixes a Gemini-specific bug related to empty assistant message content.
Key Changes
[Refactor] Extracted message transformation logic into a unified pipeline to improve reusability, extensibility, and maintainability across model families.
[BugFix] Gemini models do not accept empty assistant message content. Added a _set_empty_to_whitespace transformer to replace empty strings with " " only where needed; it is applied to "text" and "thought" message types, but not to "tools", to avoid serialization errors (see the sketch after this list).
Improved structure for model-specific handling
Test coverage added
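As referenced in the list above, here is a minimal sketch of the whitespace substitution (the actual _set_empty_to_whitespace transformer in this PR may differ in details):

from typing import Any, Dict

from autogen_core.models import LLMMessage


def _set_empty_to_whitespace(message: LLMMessage, context: Dict[str, Any]) -> Dict[str, Any]:
    # Gemini rejects "" as message content, so substitute a single space.
    content = message.content
    if isinstance(content, str) and not content:
        content = " "
    return {"content": content}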
Motivation
Originally, Gemini-compatible endpoints would fail when receiving assistant messages with empty content (""). This issue required special handling without introducing brittle, ad-hoc patches.
In addressing this, I also saw an opportunity to modularize the message transformation logic across models.
This improves clarity, avoids duplication, and simplifies future adaptations (e.g., different constraints across model families).
📘 AutoGen Modular Message Transformer: Design & Usage Guide
This document introduces the new modular transformer system used in AutoGen for converting LLMMessage instances to SDK-specific message formats (e.g., OpenAI-style ChatCompletionMessageParam). The design improves reusability, extensibility, and maintainability across different model families.
🚀 Overview
Instead of scattering model-specific message conversion logic across the codebase, the new design introduces small transformer functions, pipelines composed from them, and a per-model-family registry that maps each LLMMessage type to a builder for the corresponding SDK message type (e.g., ChatCompletionUserMessageParam).
🧱 1. Define Transform Functions
Each transformer function takes:
LLMMessage: a structured AutoGen message
context: dict: metadata passed through the builder pipeline
And returns a partial dict (e.g., {"content": ..., "name": ..., "role": ...}).
🪢 2. Compose Transformer Pipelines
Multiple transformer functions are composed into a pipeline using build_transformer_func(), where message_param_func is the actual constructor for the target message class (usually from the SDK).
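A rough sketch of how build_transformer_func could compose these pieces (the real helper in the PR may differ in signature and details):

from typing import Any, Callable, Dict, List

from autogen_core.models import LLMMessage
from openai.types.chat import ChatCompletionUserMessageParam


def build_transformer_func(
    funcs: List[Callable[[LLMMessage, Dict[str, Any]], Dict[str, Any]]],
    message_param_func: Callable[..., Any],
) -> Callable[[LLMMessage, Dict[str, Any]], Any]:
    def transform(message: LLMMessage, context: Dict[str, Any]) -> Any:
        kwargs: Dict[str, Any] = {}
        for func in funcs:
            kwargs.update(func(message, context))  # each transformer contributes partial kwargs
        return message_param_func(**kwargs)  # the SDK constructor builds the final message

    return transform


# Usage: compose small transformers into one builder for user messages.
user_transformer = build_transformer_func(
    funcs=[lambda m, ctx: {"role": "user"}, lambda m, ctx: {"content": m.content}],
    message_param_func=ChatCompletionUserMessageParam,
)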
🗂️ 3. Register Transformer Map
Each model family maintains a TransformerMap, which maps LLMMessage types to transformers:
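An illustrative TransformerMap and registry (the registry layout and names here are assumptions, kept consistent with the get_transformer("openai", model_family) call shown earlier):

from typing import Any, Callable, Dict, Type

from autogen_core.models import AssistantMessage, LLMMessage, SystemMessage, UserMessage

TransformerFunc = Callable[[Any, Dict[str, Any]], Any]
TransformerMap = Dict[Type[LLMMessage], TransformerFunc]


def _simple_transformer(message: Any, context: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder; real entries would be pipelines built with build_transformer_func.
    return {"content": message.content}


__GEMINI_TRANSFORMER_MAP: TransformerMap = {
    SystemMessage: _simple_transformer,
    UserMessage: _simple_transformer,
    AssistantMessage: _simple_transformer,
}

# Registry keyed by API family first, then model family.
MESSAGE_TRANSFORMER_REGISTRY: Dict[str, Dict[str, TransformerMap]] = {
    "openai": {"gemini-2.0-flash": __GEMINI_TRANSFORMER_MAP},
}


def get_transformer(api_family: str, model_family: str) -> TransformerMap:
    return MESSAGE_TRANSFORMER_REGISTRY[api_family][model_family]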
"openai" is currently required (as only the OpenAI-compatible format is supported now).
🔁 4. Conditional Transformers (Optional)
When message construction depends on runtime conditions (e.g., "text" vs. "multimodal"), use a conditional transformer builder, where:
funcs_map: maps condition label → list of transformer functions
message_param_func_map: maps condition label → message builder
condition_func: determines which transformer to apply at runtime
🧪 Example Flow
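An illustrative sketch combining the conditional builder with an end-to-end flow (helper names such as build_conditional_transformer_func follow the description above but are assumptions, not verified against the final code):

from typing import Any, Dict

from autogen_core.models import UserMessage


def condition_func(message: Any, context: Dict[str, Any]) -> str:
    # Pick "multimodal" when the content is a list of parts, otherwise "text".
    return "text" if isinstance(message.content, str) else "multimodal"


def build_conditional_transformer_func(funcs_map, message_param_func_map, condition_func):
    def transform(message: Any, context: Dict[str, Any]) -> Any:
        condition = condition_func(message, context)
        kwargs: Dict[str, Any] = {}
        for func in funcs_map[condition]:
            kwargs.update(func(message, context))
        return message_param_func_map[condition](**kwargs)

    return transform


# Example flow: LLMMessage -> transformer lookup -> SDK message param.
user_transformer = build_conditional_transformer_func(
    funcs_map={"text": [lambda m, ctx: {"role": "user", "content": m.content, "name": m.source}]},
    message_param_func_map={"text": dict},
    condition_func=condition_func,
)
message = UserMessage(content="Hello!", source="user")
print(user_transformer(message, {"prepend_name": False}))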
🎯 Design Benefits
Shared logic (e.g., _set_name) is DRY and reused across transformers.
🔮 Future Direction
Extend beyond the currently "openai"-scoped registry to other provider SDKs.
Related issue number
Closes #5762
Checks