🐢 Open-Source Evaluation & Testing for LLMs and ML models
-
Updated
Jun 13, 2024 - Python
🐢 Open-Source Evaluation & Testing for LLMs and ML models
The LLM Evaluation Framework
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
Open-Source Evaluation for GenAI Application Pipelines
Python SDK for running evaluations on LLM generated responses
The official evaluation suite and dynamic data release for MixEval.
Superpipe - optimized LLM pipelines for structured data
Evaluating LLMs with CommonGen-Lite
Framework for LLM evaluation, guardrails and security
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
The implementation for EMNLP 2023 paper ”Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators“
Find better generation parameters for your LLM
A framework to build scenario simulation projects where human and LLM based agents can participant in, with a user-friendly web UI to visualize simulation, support automatically evaluation on agent action level.
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
FM-Leaderboard-er allows you to create leaderboard to find the best LLM/prompt for your own business use case based on your data, task, prompts
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
Large Model Evaluation Experiments
Add a description, image, and links to the llm-evaluation topic page so that developers can more easily learn about it.
To associate your repository with the llm-evaluation topic, visit your repo's landing page and select "manage topics."