A framework for few-shot evaluation of language models.
🐢 Open-Source Evaluation & Testing for ML & LLM systems
The LLM Evaluation Framework
Repository for our RecSys 2019 article "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and several follow-up studies.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Data-Driven Evaluation for LLM-Powered Applications
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
The official evaluation suite and dynamic data release for MixEval.
Python SDK for running evaluations on LLM-generated responses
A research library for automating experiments on Deep Graph Networks
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Evaluation suite for large-scale language models.
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Multilingual Large Language Models Evaluation Benchmark
BIRL: Benchmark on Image Registration methods with Landmark validations
LiDAR SLAM comparison and evaluation framework
Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Industrial-level evaluation benchmarks for Coding LLMs across the full life-cycle of AI-native software development (an enterprise-grade code LLM evaluation suite, continuously being expanded)
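These projects differ in domain (LLMs, recommenders, SLAM, vision), but most share the same core loop: run a system over a task set, score each output with a metric, and aggregate the scores into a report. The sketch below illustrates that shape in Python; it is a minimal, hypothetical example, and every name in it (EvalSample, exact_match, run_eval) is illustrative rather than the API of any project listed above.

```python
# Minimal, hypothetical sketch of the common evaluation-framework pattern:
# run a model over a small task set, score each response, aggregate a report.
# Names here are illustrative only, not tied to any listed project.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalSample:
    prompt: str
    reference: str


def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the normalized prediction equals the reference."""
    return float(prediction.strip().lower() == reference.strip().lower())


def run_eval(
    model: Callable[[str], str],
    samples: Iterable[EvalSample],
    metric: Callable[[str, str], float] = exact_match,
) -> dict:
    """Run the model over every sample and aggregate the metric."""
    scores = [metric(model(s.prompt), s.reference) for s in samples]
    return {"n": len(scores), "mean_score": sum(scores) / len(scores) if scores else 0.0}


if __name__ == "__main__":
    # Stand-in "model": a lookup table; a real harness would call an LLM backend.
    canned = {"Capital of France?": "Paris", "2 + 2 =": "4"}
    report = run_eval(
        model=lambda prompt: canned.get(prompt, ""),
        samples=[
            EvalSample("Capital of France?", "Paris"),
            EvalSample("2 + 2 =", "5"),  # mismatched reference to show a scored miss
        ],
    )
    print(report)  # {'n': 2, 'mean_score': 0.5}
```

Real frameworks layer task registries, model backends, batching, and richer metrics (LLM-as-judge, faithfulness, landmark error, trajectory drift) on top of this same loop.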