The LLM Evaluation Framework
🐢 Open-Source Evaluation & Testing for ML models & LLMs
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment.
A practical guide to LLMs: from the fundamentals to deploying advanced LLM and RAG apps on AWS using LLMOps best practices
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.
The official evaluation suite and dynamic data release for MixEval.
Data-Driven Evaluation for LLM-Powered Applications
Python SDK for running evaluations on LLM generated responses
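As a rough sketch of how SDKs of this kind are typically driven (every name below, such as EvalCase and run_eval, is hypothetical and not the API of any repo listed here):

```python
# Hypothetical evaluation-SDK usage: score model responses against references.
# All names (EvalCase, exact_match, run_eval) are illustrative, not a real API.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    response: str       # LLM-generated answer under test
    reference: str      # gold answer to compare against

def exact_match(case: EvalCase) -> float:
    """1.0 if the response matches the reference (case-insensitive), else 0.0."""
    return float(case.response.strip().lower() == case.reference.strip().lower())

def run_eval(cases: list[EvalCase], metric) -> float:
    """Average a per-case metric over the whole test set."""
    return sum(metric(c) for c in cases) / len(cases)

cases = [
    EvalCase("Capital of France?", "Paris", "Paris"),
    EvalCase("2 + 2?", "5", "4"),
]
print(f"exact_match: {run_eval(cases, exact_match):.2f}")  # -> 0.50
```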
Connect agents to live web environments for evaluation.
Framework for LLM evaluation, guardrails and security
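To make the guardrail idea concrete, here is a deliberately toy input filter; real guardrail frameworks use trained classifiers and configurable policies, and the blocklist here is purely illustrative:

```python
# Toy input guardrail: reject suspicious messages before they reach the model.
# Entirely illustrative; not the interface of any framework listed on this page.
BLOCKLIST = {"ignore previous instructions", "reveal your system prompt"}

def guard_input(user_message: str) -> str:
    """Raise before the message is ever sent to the LLM."""
    lowered = user_message.lower()
    if any(phrase in lowered for phrase in BLOCKLIST):
        raise ValueError("potential prompt-injection attempt blocked")
    return user_message

print(guard_input("Summarize this article for me."))  # passes through
```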
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
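A minimal LLM-as-judge sketch of multi-aspect scoring, assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; the rubric aspects, prompt wording, and model name are assumptions, not this tool's actual interface:

```python
# Minimal multi-aspect LLM-as-judge sketch (not the tool's actual code).
import json
from openai import OpenAI

client = OpenAI()
ASPECTS = ["fluency", "relevance", "factuality"]  # assumed rubric

def judge(question: str, answer: str) -> dict:
    prompt = (
        "Rate the answer to the question on each aspect from 1 (poor) to 5 (excellent).\n"
        f"Question: {question}\nAnswer: {answer}\n"
        f"Reply with a JSON object whose keys are: {', '.join(ASPECTS)}."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",          # assumed model; any chat model works
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

print(judge("What causes ocean tides?", "Mostly the Moon's gravitational pull."))
```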
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts
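As an illustration of the underlying idea (hypothetical scores and schema, not FM-Leaderboard-er's own format), ranking models by mean score over your tasks reduces to a small pandas aggregation:

```python
# Turn per-example eval scores into a leaderboard; data and schema are made up.
import pandas as pd

scores = pd.DataFrame({
    "model": ["gpt-x", "gpt-x", "llama-y", "llama-y", "mistral-z", "mistral-z"],
    "task":  ["qa", "summarize"] * 3,
    "score": [0.82, 0.74, 0.79, 0.81, 0.70, 0.68],
})

# Rank models by mean score across tasks, best first.
leaderboard = (
    scores.groupby("model")["score"]
    .mean()
    .sort_values(ascending=False)
    .rename("mean_score")
    .reset_index()
)
print(leaderboard)
```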
A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks
Evaluating LLMs with CommonGen-Lite
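CommonGen-style tasks ask a model to weave a given set of concepts into one coherent sentence, so a basic sanity metric is concept coverage; this sketch checks only surface word presence, whereas the actual benchmark's scoring is more careful:

```python
# Illustrative concept-coverage check in the spirit of CommonGen-style evaluation.
import string

def concept_coverage(concepts: list[str], generation: str) -> float:
    """Fraction of required concepts that appear as words in the generation."""
    words = {w.strip(string.punctuation) for w in generation.lower().split()}
    present = sum(1 for c in concepts if c.lower() in words)
    return present / len(concepts)

print(concept_coverage(["dog", "frisbee", "catch"],
                       "A dog leaps to catch the frisbee."))  # -> 1.0
```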
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The truth is rarely pure and never simple.
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
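The divide-conquer-reasoning idea can be sketched as: split a long answer into sentences (divide), judge each sentence against a reference (conquer), then aggregate the verdicts (reason). The is_consistent callback below stands in for an LLM judge, and the toy word-overlap judge is purely illustrative, not DCR-Consistency's actual pipeline:

```python
# Sketch of divide-conquer-reasoning for consistency scoring.
import re
from typing import Callable

def dcr_consistency(answer: str, reference: str,
                    is_consistent: Callable[[str, str], bool]) -> float:
    # Divide: naive sentence split on ., !, ? boundaries.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    # Conquer: judge each sentence in isolation against the reference.
    verdicts = [is_consistent(s, reference) for s in sentences]
    # Reason: aggregate into the fraction of consistent sentences.
    return sum(verdicts) / len(verdicts)

# Toy judge based on word overlap; a real judge would be an LLM call.
def toy_judge(sentence: str, ref: str) -> bool:
    words = lambda t: {w.strip(".,!?").lower() for w in t.split()}
    return bool(words(sentence) & words(ref))

print(dcr_consistency(
    "Paris is in France. It hosts the Olympics every year.",
    "Paris, the capital of France, hosted the 2024 Olympics.",
    toy_judge,
))
```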
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models