Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.
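A minimal sketch of the extraction pattern such a template relies on, using LangChain's structured-output wrapper over OpenAI function calling (the JobPosting schema and its field names are illustrative, not the template's actual schema):

```python
# Hypothetical sketch: structured extraction with OpenAI function calling via
# LangChain. The JobPosting schema below is an invented example, not the
# template's real one. Assumes langchain-openai is installed and
# OPENAI_API_KEY is set in the environment.
from typing import List, Optional

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class JobPosting(BaseModel):
    """Structured fields to pull out of a free-text job description."""
    title: str = Field(description="Job title")
    company: Optional[str] = Field(None, description="Hiring company, if stated")
    location: Optional[str] = Field(None, description="Work location or 'remote'")
    skills: List[str] = Field(default_factory=list, description="Required skills")


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# with_structured_output drives OpenAI function/tool calling under the hood
# and parses the model's response back into the Pydantic model.
extractor = llm.with_structured_output(JobPosting)

job = extractor.invoke(
    "Acme Corp is hiring a Senior Python Engineer in Berlin. "
    "Must know Python, SQL, and Kubernetes."
)
print(job.title, job.skills)
```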
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
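Judge-style tools like this typically prompt a GPT model with a rubric and request per-aspect scores plus justifications; a minimal sketch with the OpenAI SDK (the aspect list, prompt wording, and JSON shape are assumptions, not this tool's actual interface):

```python
# Illustrative multi-aspect GPT-as-judge scoring; the rubric, aspects, and
# response schema are invented for this sketch.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASPECTS = ["fluency", "coherence", "factuality", "helpfulness"]


def judge(question: str, answer: str) -> dict:
    prompt = (
        "Rate the answer on each aspect from 1 (poor) to 5 (excellent) and "
        "give a one-sentence justification per aspect. Respond as JSON: "
        '{"<aspect>": {"score": int, "reason": str}, ...}\n'
        f"Aspects: {', '.join(ASPECTS)}\n\nQuestion: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)


scores = judge("What causes tides?", "Mostly the Moon's gravity, plus the Sun's.")
print(scores)
```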
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple.
Evaluating LLMs with CommonGen-Lite
Link your OpenAI Assistants to a custom store + Evaluate Assistant responses
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
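The metric itself is a simple aggregation: the share of a generation's atomic facts that the knowledge source supports. A toy illustration of that step (the fact decomposition and support checks, which the package performs with LLM calls, are hard-coded stand-ins here):

```python
# Toy illustration of the FactScore aggregation:
#   score = supported atomic facts / all atomic facts.
# In practice decomposition and support judgments come from LLM calls;
# here they are stand-ins for clarity.
def fact_score(atomic_facts: list[str], is_supported) -> float:
    """Fraction of atomic facts judged supported by the knowledge source."""
    if not atomic_facts:
        return 0.0
    return sum(is_supported(f) for f in atomic_facts) / len(atomic_facts)


facts = [
    "Marie Curie won two Nobel Prizes.",
    "Marie Curie was born in Berlin.",  # unsupported: she was born in Warsaw
]
knowledge = {"Marie Curie won two Nobel Prizes."}
print(fact_score(facts, lambda f: f in knowledge))  # 0.5
```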
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: the Wahl-O-Mat and the Political Compass Test.
EnsembleX utilizes the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization.
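The quality-cost trade-off maps onto the classic 0/1 knapsack problem: choose models that maximize aggregate quality subject to a cost budget. A minimal dynamic-programming sketch of that idea (model names, quality scores, and costs are invented):

```python
# Toy 0/1 knapsack over candidate models: maximize total quality score within
# a cost budget. All names and numbers below are invented for illustration.
def best_ensemble(models, budget):
    # models: list of (name, quality, cost); costs as integers, e.g. cents/1k calls
    n = len(models)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, quality, cost) in enumerate(models, start=1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if cost <= b:
                dp[i][b] = max(dp[i][b], dp[i - 1][b - cost] + quality)
    # Backtrack to recover the chosen subset.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            name, _, cost = models[i - 1]
            chosen.append(name)
            b -= cost
    return dp[n][budget], chosen


models = [("gpt-4o", 0.90, 500), ("claude-haiku", 0.78, 80), ("llama-8b", 0.70, 20)]
print(best_ensemble(models, budget=120))  # highest-quality subset within budget
```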
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Code for "Prediction-Powered Ranking of Large Language Models", Arxiv 2024.
Large Model Evaluation Experiments
Framework for LLM evaluation, guardrails and security
Summary Evaluation Tool
Find better generation parameters for your LLM
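Concretely, that usually means scoring candidate decoding settings against a small eval set; a hedged grid-search sketch over temperature and top_p (the generate and score callables are placeholders you supply, not part of any specific library):

```python
# Illustrative grid search over decoding parameters. `generate` and `score`
# are hypothetical callables you provide: generate(prompt, temperature, top_p)
# returns model output, score(prompt, output) returns a task metric.
import itertools


def tune(prompts, generate, score,
         temperatures=(0.0, 0.3, 0.7, 1.0), top_ps=(0.8, 0.95, 1.0)):
    """Return the (temperature, top_p) pair with the best mean score."""
    best_params, best_score = None, float("-inf")
    for t, p in itertools.product(temperatures, top_ps):
        avg = sum(score(q, generate(q, t, p)) for q in prompts) / len(prompts)
        if avg > best_score:
            best_params, best_score = (t, p), avg
    return best_params, best_score
```

Any LLM client fits this shape; swapping the grid for random or Bayesian search changes only the loop.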
A framework for building scenario-simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Superpipe - optimized LLM pipelines for structured data