Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FP termination in step3_azure_ai_agent_group_chat.py #10771

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

thecsw
Copy link

@thecsw thecsw commented Mar 3, 2025

Motivation and Context

The example script will terminate prematurely because the agent responding "Not approved." will wrongfully trigger the termination condition, since "approved" substring will trigger.

Description

Exclude case-insensitive "not approved" from the termination criteria.

Contribution Checklist

@thecsw thecsw requested a review from a team as a code owner March 3, 2025 19:45
@markwallace-microsoft markwallace-microsoft added the python Pull requests for the Python Semantic Kernel label Mar 3, 2025
@github-actions github-actions bot changed the title Sandy/step3 azure ai agent group chat termination Python: Sandy/step3 azure ai agent group chat termination Mar 3, 2025
@thecsw thecsw changed the title Python: Sandy/step3 azure ai agent group chat termination Fix FP termination in step3_azure_ai_agent_group_chat.py Mar 3, 2025
# The agent would sometimes respond with simple "Not approved," which
# would trigger the termination. Even if the prompt clearly states not
# to use the word, it fails on 4o. This is a simple check to avoid that.
return "approved" in resp and not "not approved" in resp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably won't work for all cases. For example, this will trigger the termination too: "It wasn't approved."

I wonder if tuning the REVIEWER INSTRUCTIONS prompt will be a better solution.

Copy link
Author

@thecsw thecsw Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good note, I tried doing that first with updating the instruction blob with,

If not, provide insight on how to refine suggested copy without example, do NOT say it wasn't simply not approved.

LLMs being LLMs, they don't take commands and would still produce consistently,

# User: a slogan for a new line of electric cars.
# CopyWriter: "Shockingly Smooth. Quietly Powerful."
# ArtDirector: Not approved. While "Shockingly Smooth. Quietly Powerful." plays with the electric theme and contrasts the smoothness and quietness, it feels expected and perhaps too tame for a groundbreaking electric car line. A slogan should reflect the unique essence of the brand—why these cars matter within a crowded market.

etc.

I haven't seen it saying something with "It wasn't approved", not putting it behind the model to generate, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can adjust the termination keyword to something like TERMINATE. This can depend on the model used, too -- for example - gpt-4o-mini may handle it differently compared to gpt-4o.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to what @TaoChenOSU said, I'm more keen on adjusting the Reviewer's instructions to better communicate the termination criteria -- that's either telling it to switch to using TERMINATE if approved. Or respond with APPROVED and instructing it to use all caps.

I understand we should have samples that work, but this is a sample. :) It should guide users towards what's possible in applications and it doesn't need to be the end-all-be-all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Pull requests for the Python Semantic Kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants