
Agentic Workflow Evaluation: Text Summarization Agent

Note: this README is a work in progress and will be revised.

Overview πŸ“–

This project contains a simple workflow for evaluating AI agents. The goal is to systematically assess and refine an AI agent's performance by testing, analyzing outputs, adjusting parameters, and retesting. This example implements a text summarization agent using:

  • OpenAI API for text summarization
  • Transformers library for semantic similarity analysis
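As a rough illustration of how such an agent can be wired up, a minimal summarization call might look like the sketch below. The model name, prompt, and function name are placeholders for illustration, not necessarily what agent.py uses.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(text: str, temperature: float = 0.7, max_tokens: int = 150) -> str:
    """Ask the model for a short summary of the input text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarize the user's text in a few clear sentences."},
            {"role": "user", "content": text},
        ],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content.strip()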

Project Structure

  • agent.py β†’ Initial implementation of the text summarization agent
  • test_workflow.py β†’ First round of testing with sample inputs
  • metrics.py β†’ Evaluation metrics, including semantic similarity and readability
  • readability.py β†’ Calculates readability scores
  • semantic_similarity.py β†’ Computes semantic similarity between original text and summaries
  • edited_parameters.py β†’ Adjusted agent settings after analyzing initial results
  • edited_eval.py β†’ Retesting after modifying agent behavior

Step 1: Initial Test & Results

Input Text:

Climate change is a major contemporary challenge, characterized by rising global temperatures that cause extreme weather, melting ice, and ecosystem disruptions. Human activities like deforestation and industrial pollution exacerbate these effects. Scientists stress the urgency of reducing greenhouse gas emissions to mitigate environmental impacts.

Initial Summary Output:

Climate change leads to extreme weather, melting ice, and ecosystem disruptions. Human activities worsen the problem. Scientists urge reducing emissions.

Evaluation Metrics:

  • Semantic Similarity Score: 0.90
  • Flesch Reading Ease: -2.68 (very difficult; higher scores mean easier reading)
  • SMOG Index: 16.30 (roughly graduate-level reading difficulty)
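Scores like these can be computed with off-the-shelf tools. The snippet below is one possible way to do it; the sentence-transformers package (built on Transformers), the embedding model, and the textstat library are assumptions for illustration, not necessarily what metrics.py, readability.py, and semantic_similarity.py use.

from sentence_transformers import SentenceTransformer, util
import textstat

def evaluate(original: str, summary: str) -> dict:
    # Semantic similarity: cosine similarity between sentence embeddings
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = model.encode([original, summary], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

    # Readability of the summary
    return {
        "semantic_similarity": round(similarity, 2),
        "flesch_reading_ease": textstat.flesch_reading_ease(summary),
        "smog_index": textstat.smog_index(summary),
    }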

Observations:

  • The summary retained key points but was still complex
  • The readability score indicated high difficulty
  • The agent needed adjustments for more accessible summaries

Step 2: Adjusting Parameters & Retesting

Modifications:

  1. Adjusted temperature and max_tokens to simplify language
  2. Applied post-processing for clarity
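For reference, the kind of change made in edited_parameters.py might look like the following, reusing the illustrative summarize() helper sketched earlier. The specific values are examples only; see the file for the actual settings.

# Lower temperature and a tighter token budget tend to produce plainer wording.
summary = summarize(
    text,
    temperature=0.3,   # less variation, more direct phrasing
    max_tokens=120,    # keeps the summary short
)

# Simple post-processing for clarity: trim whitespace and drop a trailing fragment.
summary = summary.strip()
if not summary.endswith("."):
    summary = summary.rsplit(".", 1)[0] + "."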

New Summary Output:

Climate change is a big problem today. It causes higher temperatures, extreme weather, and melting ice. This affects nature and wildlife. Human actions like cutting down trees and pollution make it worse. Scientists say we must act now to cut down on greenhouse gases.

New Evaluation Metrics:

  • Semantic Similarity Score: 0.88
  • Flesch Reading Ease: 71.00 (easy to read)
  • SMOG Index: 7.60 (much simpler language)

Observations:

  • The summary remained accurate while becoming more readable
  • Better balance between information retention and accessibility
  • The agent was successfully refined through iterative testing

Key Takeaways

This project demonstrates how to evaluate an agentic AI workflow using a structured testing process:

  1. Generate initial outputs β†’ Assess AI performance
  2. Measure metrics β†’ Semantic similarity, readability, etc.
  3. Identify areas for improvement β†’ Adjust prompts, parameters, or processing
  4. Retest and compare β†’ Observe performance changes

This approach is useful for any AI-driven agent, from summarization to decision-making systems, ensuring continuous improvement and alignment with intended objectives.
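Put together, the loop can be expressed in a few lines, again using the illustrative summarize() and evaluate() helpers sketched above; the readability target is an assumed threshold, not a value taken from this repository.

TARGET_READING_EASE = 60  # assumed readability goal

summary = summarize(text)
scores = evaluate(text, summary)

# If the summary reads as too complex, retry with simpler-language settings.
if scores["flesch_reading_ease"] < TARGET_READING_EASE:
    summary = summarize(text, temperature=0.3, max_tokens=120)
    scores = evaluate(text, summary)

print(summary, scores)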

How to Use This Project

Clone the Repository

git clone https://github.com/ashleysally00/agent_eval_testing_workflow.git
cd agent_eval_testing_workflow
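Set Up Your Environment

The agent calls the OpenAI API, so an API key is required. Assuming the scripts read it from the environment, set it before running the tests; the package list below is inferred from the libraries mentioned above and may differ from the repository's actual requirements.

pip install openai sentence-transformers textstat
export OPENAI_API_KEY="your-key-here"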

Run Initial Test

python test_workflow.py

Evaluate Outputs

python metrics.py

Modify Parameters & Retest

python edited_parameters.py
python edited_eval.py

Conclusion

This workflow offers a clear method for testing and refining AI agents. By using evaluation metrics and making iterative improvements, we can enhance their performance and create more user-friendly AI outputs.
