wonderfull/testpilot

LangGraph Multi-Agent Requirement System

A sophisticated multi-agent system built with LangGraph and Python that analyzes software requirements and automatically generates high-quality Gherkin BDD features and Playwright automation.

🎯 Overview

This system takes requirements from multiple sources (Jira, files, or raw text), orchestrates an intelligent multi-agent pipeline, and produces:

  • ✅ High-quality Gherkin feature files with proper BDD structure and semantic tags
  • ✅ Test coverage matrices in Markdown format with manual test identification
  • ✅ Playwright BDD automation with Page Object Model (for frontend requirements)
  • ✅ Semantic change detection to avoid unnecessary regeneration (wording-only vs. real changes)
  • ✅ Persistent state & traceability via SQLite (MCP-based)
  • ✅ LangSmith integration for full pipeline observability

🏗️ Architecture

High-Level Flow

Input (Jira/File/Text)
        ↓
[Ingestion Agent] ← fetch via MCP HTTP/Filesystem
        ↓
[Change Detection Agent] ← semantic diff via Groq Mixtral
        ↓ (skip if wording-only)
[Classification Agent] ← OpenAI classification
        ↓
[Feature Gen Agent] ← generate Gherkin via OpenAI
        ↓
[Coverage Eval Agent] ← Groq Mixtral evaluation
        ↓
[Router Agent] ← decide specialist path
        ├── frontend? → [Playwright Agent]
        └── other    → [Post-Eval Agent]
        ↓
[Post-Eval Agent] ← final QA via Groq Mixtral
        ↓
Output (Features, Coverage, Automation)
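
The flow above is a linear pipeline with two decision points: an early exit for wording-only changes and a branch for frontend requirements. A minimal plain-Python sketch of that ordering follows. The function name and stage names are illustrative assumptions; the project itself wires these stages as LangGraph nodes in graph.py, which is not reproduced here.

```python
from typing import List


def plan_stages(change_type: str, project_area: str) -> List[str]:
    """Return the ordered stage names a single run would visit.

    change_type and project_area stand in for the outputs of the change
    detection and classification agents (hypothetical parameter names).
    """
    stages = ["ingestion", "change_detection"]
    if change_type == "wording_only":
        # Wording-only edits stop here; nothing is regenerated.
        return stages
    stages += ["classification", "feature_gen", "coverage_eval", "router"]
    if project_area == "frontend":
        # Only frontend requirements get Playwright automation.
        stages.append("playwright")
    stages.append("post_eval")
    return stages


if __name__ == "__main__":
    for change, area in [("semantic_change", "frontend"),
                         ("semantic_change", "api"),
                         ("wording_only", "frontend")]:
        print(change, area, "->", plan_stages(change, area))
```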

🛰️ Observability & LangSmith

  • All CLI and Gradio runs create a root pipeline span plus child spans for every stage (ingestion, classification, feature generation, etc.) using services/observability.py.

  • Set these environment variables (preferably in .env) to enable tracing:

    LANGSMITH_API_KEY=lsv2_...
    LANGCHAIN_PROJECT=qa-e2e          # or any project name you prefer
    LANGCHAIN_TRACING_V2=true
  • Every run appears inside the configured LangSmith project with:

    • Inputs (source type, reference)
    • Nested spans per agent with status/errors
    • Final outputs (feature path, coverage report path, specialist notes)
  • The same observability hooks are reused by python main.py ... and the Gradio UI because both call the shared pipeline entry point.

MCP Tool Integration

All IO operations delegate to MCP tools (not direct calls):

| Operation | MCP Tool | Usage |
|-----------|----------|-------|
| File I/O | filesystem | Create/read/write feature files, markdown, raw requirements |
| Database | sqlite | All SQL operations on db/context.db for persistence |
| Jira API | http / jira | Fetch requirements from Jira via REST API |
| Shell commands | shell / terminal | Bootstrap Playwright, run npm install |
| PDF/Docx extraction | pdf_extract / docx_extract | Parse document requirements |

Agents

1. Ingestion Agent (OpenAI)

  • Fetches requirements from Jira, file, or raw text
  • Normalizes and stores in SQLite via MCP
  • Creates initial RunState

2. Change Detection Agent (Groq Mixtral)

  • Compares current vs. previous requirement text
  • Uses semantic diff + Groq to determine if change is meaningful
  • Skips pipeline if wording-only (idempotency)
  • Records change type in SQLite
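
A cheap pre-check can settle the trivial cases before any model is called. The sketch below is a stand-in, not the project's method: the agent described above delegates the semantic judgment to a Groq-hosted model, and the helper names and the "needs_llm_review" label are assumptions made for illustration. It uses only the standard library.

```python
import difflib
import re
from typing import List


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def precheck(old_text: str, new_text: str) -> str:
    """Return "wording_only" when the texts match after normalization.

    Anything else is handed on for review, because lexical similarity
    cannot tell that "must" -> "must not" flips the meaning.
    """
    if normalize(old_text) == normalize(new_text):
        return "wording_only"
    return "needs_llm_review"


def changed_words(old_text: str, new_text: str) -> List[str]:
    """Word-level additions and removals, e.g. as context for the model."""
    diff = difflib.ndiff(normalize(old_text).split(), normalize(new_text).split())
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    print(precheck("User can log in.", "user can  log in"))
    print(precheck("User must log in", "User must not log in"))
    print(changed_words("User must log in", "User must not log in"))
```

Keeping the pre-check strict means a model call is skipped only when the two versions are identical apart from case, punctuation, and spacing.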

3. Classification Agent (OpenAI)

  • Classifies work area: frontend | api | data | load | security | other
  • Updates RunState and SQLite

4. Feature Generation Agent (OpenAI)

  • Generates high-quality Gherkin with:
    • Multiple scenarios (happy, edge, negative paths)
    • Proper tags (@area-*, @priority-*, @req-*, @edge, @negative)
    • Clear Given/When/Then steps
  • Saves to playwright/ui/features/{requirement_id}.feature
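
The tag scheme and output location listed above can be expressed as two small helpers. This is a sketch under assumptions: the function names are hypothetical, and only the tag prefixes and the path template come from this README.

```python
from pathlib import PurePosixPath

# Directory taken from the path template above.
FEATURE_DIR = PurePosixPath("playwright/ui/features")


def feature_path(requirement_id: str) -> PurePosixPath:
    """Build playwright/ui/features/{requirement_id}.feature."""
    return FEATURE_DIR / f"{requirement_id}.feature"


def feature_tags(requirement_id: str, area: str, priority: str) -> str:
    """Build the feature-level tag line from the documented prefixes."""
    return f"@area-{area} @priority-{priority} @req-{requirement_id}"


if __name__ == "__main__":
    print(feature_path("JIRA-123"))
    print(feature_tags("JIRA-123", "frontend", "p0"))
```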

5. Coverage Evaluation Agent (Groq Mixtral)

  • Evaluates test coverage against requirements
  • Generates coverage matrix in Markdown
  • Identifies manual test gaps
  • Saves to coverage/{requirement_id}-coverage.md

6. Router Agent (deterministic)

  • Routes frontend → Playwright Agent
  • Routes others → Post-Eval Agent (stub for future)

7. Playwright Agent (OpenAI)

  • Bootstraps Playwright + Cucumber project structure
  • Generates TypeScript step definitions
  • Generates Page Object Model classes
  • Saves files to playwright/ui/step-definitions/ and playwright/ui/pages/

8. Post-Evaluation Agent (Groq Mixtral)

  • Final QA: checks alignment between requirements, Gherkin, and automation
  • Appends findings to coverage markdown
  • Updates final status in SQLite

🎛️ Web UI (Gradio)

You can run a lightweight Gradio UI to interact with the pipeline locally. The UI lets you:

  • Paste requirement text
  • Upload a requirement file (pdf/docx/txt)
  • Provide a Jira key
  • Run the pipeline and stream logs
  • Preview generated feature + coverage artifact paths

Run the UI:

# Activate the project's venv (.venv if you used uv)
source .venv/bin/activate

# Install Gradio if not already installed
uv sync   # or pip install gradio

# Start the Gradio app
python ui/gradio_app.py

Visit http://localhost:7860 in your browser.

📊 Database Schema (SQLite, via MCP)

All database operations go through the MCP SQLite tool.

Tables

requirements

  • id (PK): Unique identifier
  • requirement_id: Jira key or synthetic ID
  • source_type: "jira" | "file" | "text"
  • source_reference: Jira key / file path / NULL
  • raw_text: Original requirement text
  • normalized_text: Normalized/cleaned text
  • project_area: Classification
  • last_run_id: Most recent run
  • created_at, updated_at: Timestamps

runs

  • run_id (PK): Unique run identifier
  • requirement_id (FK): Links to requirements
  • status: Pipeline status
  • started_at, finished_at: Timestamps
  • notes: Human-readable notes

artifacts

  • id (PK): Artifact identifier
  • run_id (FK): Links to runs
  • type: "feature" | "coverage" | "playwright"
  • path: File system path
  • hash: Content hash for integrity
  • created_at: Timestamp

locks

  • requirement_id (PK): Prevents concurrent modification
  • locked_at: Lock acquisition time
  • locked_by: Who locked it

changes

  • id (PK): Change identifier
  • requirement_id (FK): Links to requirements
  • old_text: Previous normalized text
  • new_text: New normalized text
  • change_type: "semantic_change" | "wording_only"
  • created_at: Timestamp
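
The five tables can be sketched as SQLite DDL. Only the table and column names and the enumerated values come from the lists above; the column types, defaults, and constraints are assumptions. The sketch also opens the database with the standard sqlite3 module for illustration, whereas the project routes SQL through the MCP sqlite tool.

```python
import sqlite3
from typing import List

# Column types and constraints below are assumed, not taken from the project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requirements (
    id               INTEGER PRIMARY KEY,
    requirement_id   TEXT NOT NULL UNIQUE,
    source_type      TEXT NOT NULL CHECK (source_type IN ('jira', 'file', 'text')),
    source_reference TEXT,
    raw_text         TEXT NOT NULL,
    normalized_text  TEXT,
    project_area     TEXT,
    last_run_id      TEXT,
    created_at       TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at       TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS runs (
    run_id         TEXT PRIMARY KEY,
    requirement_id TEXT NOT NULL REFERENCES requirements(requirement_id),
    status         TEXT NOT NULL,
    started_at     TEXT DEFAULT CURRENT_TIMESTAMP,
    finished_at    TEXT,
    notes          TEXT
);
CREATE TABLE IF NOT EXISTS artifacts (
    id         INTEGER PRIMARY KEY,
    run_id     TEXT NOT NULL REFERENCES runs(run_id),
    type       TEXT NOT NULL CHECK (type IN ('feature', 'coverage', 'playwright')),
    path       TEXT NOT NULL,
    hash       TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS locks (
    requirement_id TEXT PRIMARY KEY,
    locked_at      TEXT DEFAULT CURRENT_TIMESTAMP,
    locked_by      TEXT
);
CREATE TABLE IF NOT EXISTS changes (
    id             INTEGER PRIMARY KEY,
    requirement_id TEXT NOT NULL REFERENCES requirements(requirement_id),
    old_text       TEXT,
    new_text       TEXT,
    change_type    TEXT NOT NULL CHECK (change_type IN ('semantic_change', 'wording_only')),
    created_at     TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables; safe to call more than once."""
    conn.executescript(SCHEMA)


def table_names(conn: sqlite3.Connection) -> List[str]:
    """List user tables in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print(table_names(connection))
```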

🔄 Concurrency & Locks

  • Uses locks table to prevent concurrent modification of same requirement
  • Acquire lock before processing, release after completion
  • Supports idempotent re-runs (wording-only changes are skipped)
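
One way to get acquire-before, release-after behaviour from the locks table is to rely on its primary key: inserting a row succeeds for the first run and fails for any run that arrives while the row exists. The sketch below assumes that approach and uses hypothetical function names. It talks to SQLite through the standard sqlite3 module, while the project issues its SQL through the MCP sqlite tool.

```python
import sqlite3
from contextlib import contextmanager

# Minimal copy of the locks table so the sketch is self-contained (types assumed).
LOCKS_DDL = """
CREATE TABLE IF NOT EXISTS locks (
    requirement_id TEXT PRIMARY KEY,
    locked_at      TEXT,
    locked_by      TEXT
)
"""


def acquire_lock(conn: sqlite3.Connection, requirement_id: str, owner: str) -> bool:
    """Try to take the lock; return False if another run already holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (requirement_id, locked_at, locked_by) "
                "VALUES (?, datetime('now'), ?)",
                (requirement_id, owner),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key conflict: the requirement is already locked.
        return False


def release_lock(conn: sqlite3.Connection, requirement_id: str, owner: str) -> None:
    """Release the lock, but only if this owner holds it."""
    with conn:
        conn.execute(
            "DELETE FROM locks WHERE requirement_id = ? AND locked_by = ?",
            (requirement_id, owner),
        )


@contextmanager
def requirement_lock(conn: sqlite3.Connection, requirement_id: str, owner: str):
    """Hold the lock for the duration of a with-block."""
    if not acquire_lock(conn, requirement_id, owner):
        raise RuntimeError(f"{requirement_id} is locked by another run")
    try:
        yield
    finally:
        release_lock(conn, requirement_id, owner)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute(LOCKS_DDL)
    with requirement_lock(connection, "JIRA-123", "run-1"):
        print("second run can lock:", acquire_lock(connection, "JIRA-123", "run-2"))
```

Releasing in a finally block keeps a failed run from leaving a stale lock behind, although a crashed process would still need a timeout or manual cleanup.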

📁 Project Structure

QAe2e/
├── main.py                      # CLI entrypoint (Typer)
├── graph.py                     # LangGraph workflow definition
├── state.py                     # Pydantic state models
│
├── agents/                      # Agent nodes
│   ├── __init__.py
│   ├── ingestion_agent.py
│   ├── change_detection_agent.py
│   ├── classification_agent.py
│   ├── feature_gen_agent.py
│   ├── coverage_eval_agent.py
│   ├── router_agent.py
│   ├── playwright_agent.py
│   └── post_eval_agent.py
│
├── services/                    # Service layer (MCP abstractions)
│   ├── __init__.py
│   ├── db.py                    # MCP SQLite wrapper
│   ├── jira_client.py           # MCP HTTP/Jira wrapper
│   ├── nlp_utils.py             # Text processing
│   ├── diff_utils.py            # Semantic diff
│   ├── logging_utils.py         # Structured logging + LangSmith env hooks
│   └── observability.py         # LangSmith span manager for CLI + Gradio runs
│
├── playwright/
│   └── ui/
│       ├── features/            # Generated .feature files
│       ├── step-definitions/    # Generated .ts step defs
│       ├── pages/               # Page object models
│       ├── support/             # Playwright hooks/support code
│       └── config/              # Playwright config
│
├── coverage/                    # Generated coverage reports (.md)
├── requirements_raw/            # Raw requirement snapshots
├── parsed/                      # Parsed/processed requirements
├── db/                          # SQLite database
│   └── context.db               # Created/managed via MCP
├── logs/                        # Application logs
│   └── app.log
│
├── README.md                    # This file
├── README-usage.md              # Usage guide
├── pyproject.toml               # Python dependencies
├── .env.example                 # Environment template
└── .github/
    └── copilot-instructions.md  # VS Code Copilot context

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Node.js 16+ (for Playwright)
  • uv (recommended, fast Python package installer) - install from https://docs.astral.sh/uv/
  • API Keys:
    • OpenAI (GPT-4)
    • Groq Mixtral
    • (Optional) Jira API token
    • (Optional) LangSmith API key

Installation

  1. Clone/create workspace:

    cd QAe2e
  2. Set up Python environment with uv (recommended):

    uv venv .venv
    source .venv/bin/activate  # macOS/Linux
    # or
    .venv\Scripts\activate     # Windows

    Or with standard pip:

    python -m venv venv
    source venv/bin/activate  # macOS/Linux
    # or
    venv\Scripts\activate     # Windows
  3. Install dependencies:

    uv sync              # Recommended (fast)
    # or
    pip install -e .     # Alternative
  4. Configure environment:

    cp .env.example .env
    # Edit .env with your API keys
  5. Initialize system:

    python main.py init

Quick Start

# Analyze a requirement from text
python main.py run --text "As a user I want to log in with Google SSO"

# Analyze from Jira
python main.py run --jira-key JIRA-123

# Analyze from file
python main.py run --file requirements.pdf

# Check status
python main.py status --requirement-id JIRA-123

# Verbose mode (debug logging)
python main.py run --jira-key JIRA-123 --verbose

📖 Usage Guide

See README-usage.md for detailed usage documentation, including:

  • CLI commands and examples
  • Interpreting coverage reports
  • Understanding manual test notes
  • Extending for new specialist agents
  • Troubleshooting

🔍 Observability

LangSmith Integration

Enable LangSmith tracing to inspect all agent executions:

# Set in .env
LANGSMITH_ENABLED=true
LANGCHAIN_API_KEY=ls_...

# Then inspect at https://smith.langchain.com

Each agent call appears as a span with metadata:

  • requirement_id
  • run_id
  • project_area
  • source_type

Logging

Structured JSON logging to logs/app.log:

  • Each log entry includes:
    • timestamp, level, logger
    • run_id, requirement_id (if available)
    • agent_name, metadata (structured context)
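
A formatter along the following lines would produce entries with the fields listed above. It is a sketch built on the standard logging module, not the contents of services/logging_utils.py; the class name is hypothetical and the extra "message" key is an addition for readability.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Optional context fields, supplied through logging's `extra` argument.
    CONTEXT_FIELDS = ("run_id", "requirement_id", "agent_name", "metadata")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:  # include context only when available
                entry[field] = value
        return json.dumps(entry, default=str)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("agents.ingestion")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "requirement fetched",
        extra={"run_id": "run-1", "requirement_id": "JIRA-123", "agent_name": "ingestion"},
    )
```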

💡 Key Features

✅ MCP-Native IO

  • All file operations use filesystem MCP tool
  • All database ops use sqlite MCP tool
  • All external API calls use http MCP tool
  • All shell commands use shell MCP tool

✅ Semantic Change Detection

  • Avoids unnecessary regeneration of tests when requirements are just reworded
  • Uses Groq Mixtral for intelligent semantic understanding
  • Records change history in SQLite

✅ Idempotency

  • Wording-only changes skip regeneration
  • Locks prevent concurrent modification
  • Previous versions stored for comparison

✅ High-Quality Gherkin

  • Multiple scenarios per feature (happy, edge, negative)
  • Proper BDD language (Given/When/Then)
  • Comprehensive tagging system
  • Generated from requirements using LLM

✅ Test Automation Coverage

  • Coverage matrix with happy/edge/negative/manual assessment
  • Playwright BDD with Page Object Model (for frontend)
  • Step definitions and page objects generated automatically
  • Ready to run: npx playwright test

✅ LLM Flexibility

  • OpenAI GPT-4 for main agents (thinking, generation)
  • Groq Mixtral for evaluators (cost-effective critic)
  • Easy to swap LLM providers

✅ Observability & Tracing

  • LangSmith integration for full pipeline visibility
  • Structured logging with context
  • Human-readable CLI with progress updates

🔧 Extending the System

Adding a New Specialist Agent

  1. Create agent module in agents/new_area_agent.py
  2. Implement processing logic with MCP tool calls
  3. Add node to graph.py
  4. Update router logic in router_agent.py
  5. Document in README-usage.md
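
Steps 1, 2, and 4 can be pictured with a small, framework-free sketch. Everything named here (api_agent, SPECIALISTS, register_specialist, the node names) is hypothetical; the real node signature and router live in agents/ and graph.py, and a real agent would perform its IO through MCP tools.

```python
from typing import Any, Dict

State = Dict[str, Any]


def api_agent(state: State) -> State:
    """Hypothetical specialist for the "api" work area.

    Returns a new state with a note appended instead of mutating the input.
    """
    notes = list(state.get("specialist_notes", []))
    notes.append(f"api checks planned for {state['requirement_id']}")
    return {**state, "specialist_notes": notes}


# Work area -> node name. Unlisted areas fall through to post-evaluation.
SPECIALISTS: Dict[str, str] = {"frontend": "playwright_agent"}


def register_specialist(area: str, node_name: str) -> None:
    """Make the router send a work area to a new specialist node."""
    SPECIALISTS[area] = node_name


def route(state: State) -> str:
    """Deterministic routing on the classified project area."""
    return SPECIALISTS.get(str(state.get("project_area")), "post_eval_agent")


if __name__ == "__main__":
    print(route({"project_area": "api"}))      # before registration
    register_specialist("api", "api_agent")
    print(route({"project_area": "api"}))      # after registration
```

A lookup table keeps the router deterministic and means a new work area needs one registration line rather than another branch.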

Customizing LLM Models

Edit the model names in state.py and in the agent implementations:

OPENAI_MODEL = "gpt-4-turbo"      # or "gpt-3.5-turbo"
GROQ_MODEL = "mixtral-8x7b-32768" # or other Groq model

Adding Manual Test Identification

Edit coverage_eval_agent.py to enhance the manual test detection prompt.

📝 Example Output

Feature File (playwright/ui/features/JIRA-123.feature)

@area-frontend @priority-p0 @req-JIRA-123 @smoke
Feature: Google SSO Login Integration

  @happy-path
  Scenario: Successful login with Google SSO
    Given I am on the login page
    When I click the "Sign in with Google" button
    And I authenticate with valid Google credentials
    Then I should be redirected to the dashboard
    And my user profile should be loaded

  @edge-case
  Scenario: User already logged in via SSO
    Given I am already logged in via Google SSO
    When I navigate to the login page
    Then I should be redirected to the dashboard

  @negative
  Scenario: Login cancelled by user
    Given I am on the login page
    When I click the "Sign in with Google" button
    And I cancel the Google authentication dialog
    Then I should remain on the login page
    And see a message "Login cancelled"

Coverage Report (coverage/JIRA-123-coverage.md)

# Coverage Report: JIRA-123

## Requirement Summary
"As a user I want to log in with Google SSO so that I can access without managing passwords"

## Gherkin Scenarios
Total scenarios: 3

| Scenario ID | Covers Happy Path | Covers Edge Cases | Covers Negative | Automated | Manual Needed | Notes |
|-------------|-------------------|-------------------|-----------------|-----------|---------------|-------|
| Scenario 1  | Yes               | No                | No              | Yes       | No            | -     |
| Scenario 2  | No                | Yes               | No              | Yes       | Yes           | Test Google UI flow |
| Scenario 3  | No                | No                | Yes             | Yes       | No            | -     |

## Coverage Analysis

- Happy path: 100% covered
- Edge cases: 67% covered
- Negative cases: 100% covered
- Overall: 89% coverage

### Manual Tests Needed

1. Test with Google account email not registered in system
2. Test SSO flow on mobile browsers
3. Test error handling for Google API outages

## Post-Generation Review

- ✅ Gherkin accurately represents requirements
- ✅ Acceptance criteria fully covered
- ⚠️ Consider adding scenario for permission denial
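
The sample report does not say how the overall figure is derived. The numbers shown are consistent with an unweighted mean of the three category percentages, so the following sketch assumes that rule; the function name is hypothetical.

```python
def overall_coverage(happy: float, edge: float, negative: float) -> int:
    """Unweighted mean of the three category percentages, rounded."""
    return round((happy + edge + negative) / 3)


if __name__ == "__main__":
    # Category figures from the sample report above.
    print(overall_coverage(100, 67, 100))
```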

🐛 Troubleshooting

See README-usage.md for common issues and solutions.

📄 License

MIT License - See LICENSE file for details

🤝 Contributing

Contributions welcome! Please ensure:

  • All new agents follow the LoggingMixin pattern
  • MCP tools are used for all IO (no direct file/db access)
  • Type hints are complete
  • Docstrings are comprehensive
  • Tests are included

📞 Support

For issues, questions, or feedback:

  • Check logs in logs/app.log
  • Enable verbose mode: --verbose
  • Review LangSmith traces at https://smith.langchain.com
  • Check README-usage.md for common patterns

Built with ❤️ using LangGraph, OpenAI, Groq, and MCP tools
