Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI — one Makefile target per feature.
-
Updated
Jun 3, 2026 - Python