llm-benchmarking

Here are 13 public repositories matching this topic...

robertvacareanu / llm4regression

Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter update

linear-regression sklearn regression regression-models large-language-models llm llms llm-inference llm-benchmarking

Updated Sep 10, 2024
Python

lakeraai / pint-benchmark

Star

A benchmark for prompt injection detection systems.

benchmark llm prompt-injection llm-security llm-benchmarking

Updated Sep 10, 2024
Jupyter Notebook

MJ-Bench / MJ-Bench

Star

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

reward-models multimodal-foundation-model llm-benchmarking llm-as-a-judge multimodal-judge

Updated Jul 12, 2024
Jupyter Notebook

asimsinan / LLM-Research

Star

A collection of LLM related papers, thesis, tools, datasets, courses, open source models, benchmarks

arxiv-papers large-language-models llm llms llm-datasets llm-tools buyuk-dil-modelleri llm-research llm-theses llm-benchmarking llm-frameworks

Updated Oct 8, 2024
Python

aws-samples / fm-leaderboarder

Star

FM-Leaderboard-er allows you to create leaderboard to find the best LLM/prompt for your own business use case based on your data, task, prompts

llm-evaluation llm-evaluation-framework llm-benchmarking

Updated Jul 21, 2024
Python

AKSW / LLM-KG-Bench

Star

LLM-KG-Bench is a Framework and task collection for automated benchmarking of Large Language Models (LLMs) on Knowledge Graph (KG) related tasks.

sparql rdf knowledge-graph large-language-models llm llm-benchmarking

Updated Oct 19, 2024
Python

An app and set of methodologies designed to evaluate the performance of various Large Language Models (LLMs) on the text-to-SQL task. Our goal is to offer a standardized way to measure how well these models can generate SQL queries from natural language descriptions

benchmark text-to-sql llm-benchmarking