harness-ai

Here are 3 public repositories matching this topic.

sunilp303 / deepeval-evaluation-harness

Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI — one Makefile target per feature.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation ragas deepeval harness-ai deepeval-metrics

Updated Jun 3, 2026
Python

sunilp303 / trulens-agent-starter

Drop-in TruLens evaluation harness for tool-calling LangGraph agents. Swap LLM providers (OpenAI, Anthropic via LiteLLM, Bedrock, Cortex, Gemini, Ollama) with a single env var. Ships with the RAG Triad plus Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency metrics.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation trulens ragas deepeval harness-ai deepeval-metrics

Updated Jun 3, 2026
Python

sunilp303 / ragas-evaluation-harness

Provider-agnostic RAG evaluation harness powered by RAGAS with pluggable LLM and embedding backends.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation ragas harness-ai

Updated Jun 2, 2026
Python