Python SDK for running evaluations on LLM generated responses
Data from the paper BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
The LLM Evaluation Framework
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
Python client for Kolena's machine learning testing platform
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
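As a rough illustration of what such RAG metrics look like, here is a toy, purely lexical stand-in for a context-recall style score. The function name, tokenizer, and scoring rule are hypothetical; real RAG evaluation metrics are typically LLM-judged rather than token-overlap based.

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately simple tokenizer for illustration."""
    return set(re.findall(r"\w+", text.lower()))

def context_recall(ground_truth: str, retrieved_contexts: list[str]) -> float:
    """Toy stand-in for context recall: fraction of ground-truth answer tokens
    that appear anywhere in the retrieved contexts."""
    truth = _tokens(ground_truth)
    context = _tokens(" ".join(retrieved_contexts))
    return len(truth & context) / len(truth) if truth else 0.0

print(context_recall(
    "The Eiffel Tower is in Paris",
    ["Paris is home to the Eiffel Tower.", "It was completed in 1889."],
))  # 1.0 — every answer token is covered by the retrieved contexts
```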
TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation (ECCV 2022)
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Utilities for easy use of custom losses in CatBoost, LightGBM, XGBoost.
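For context, supplying a custom loss to these libraries usually means providing the per-row gradient and hessian of the objective. A minimal sketch using XGBoost's standard custom-objective hook; the toy data and the squared-error objective are illustrative only, not this package's API:

```python
import numpy as np
import xgboost as xgb

def squared_error_obj(predt, dtrain):
    """Custom objective: gradient and hessian of 0.5 * (pred - y)^2 per row."""
    y = dtrain.get_label()
    grad = predt - y
    hess = np.ones_like(predt)
    return grad, hess

# Hypothetical toy regression data; any DMatrix works the same way.
X = np.random.rand(100, 5)
y = X @ np.array([1.0, 2.0, 0.5, 0.0, -1.0])
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"tree_method": "hist"}, dtrain,
                    num_boost_round=20, obj=squared_error_obj)
```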
GREEN: n-gram F-score for Grammatical Error Correction
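As a rough sketch of what an n-gram F-score looks like in general (not necessarily GREEN's exact definition; the whitespace tokenization and beta value below are assumptions), a precision-weighted F-beta over n-gram overlap can be computed like this:

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_fscore(hypothesis: str, reference: str, n: int = 2, beta: float = 0.5) -> float:
    """Generic n-gram F-beta between a corrected sentence and a reference.
    beta < 1 weights precision more heavily, as is common in GEC scoring."""
    hyp, ref = ngrams(hypothesis.split(), n), ngrams(reference.split(), n)
    overlap = sum((hyp & ref).values())
    p = overlap / max(sum(hyp.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(ngram_fscore("he go to school yesterday", "he went to school yesterday"))
```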
Awesome diffusion Video-to-Video (V2V): a collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus video editing benchmark code.
A conversational agent for customer support queries, built with a React.js frontend and a Python (Flask) backend communicating over a RESTful API. The OpenAI Assistant API manages multiple conversations, using file search over stored FAQs, function calls, and a NoSQL database for order statuses. Includes evaluation scripts.
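A minimal sketch of what such a Flask relay to the OpenAI Assistants API could look like, assuming an assistant with FAQ files has already been created. The /chat route, the ASSISTANT_ID variable, and the polling loop are illustrative assumptions, not the project's actual code, and function-call handling is omitted:

```python
import os
import time

from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment
ASSISTANT_ID = os.environ["ASSISTANT_ID"]  # assistant created beforehand, with FAQ files attached

@app.post("/chat")  # hypothetical route name
def chat():
    payload = request.get_json()
    user_message = payload["message"]
    # Reuse an existing thread id so each customer keeps one conversation.
    thread_id = payload.get("thread_id") or client.beta.threads.create().id

    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=user_message
    )
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=ASSISTANT_ID
    )
    # Poll until the run finishes (tool/function-call handling omitted for brevity).
    while run.status in ("queued", "in_progress"):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)

    latest = client.beta.threads.messages.list(thread_id=thread_id).data[0]
    return jsonify({"thread_id": thread_id, "reply": latest.content[0].text.value})
```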
Official code for TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models (NeurIPS 2023)
The most comprehensive Python package for evaluating survival analysis models.
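For reference, the concordance index is one of the standard metrics such packages report. A small, self-contained version (not this package's implementation; the toy data is hypothetical) might look like:

```python
import numpy as np

def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs where the model assigns the
    higher risk to the subject whose event occurs earlier.
    times: observed times; events: 1 if the event occurred, 0 if censored."""
    concordant, ties, comparable = 0.0, 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if subject i has an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

times = np.array([5.0, 8.0, 3.0, 10.0])
events = np.array([1, 0, 1, 1])
risk = np.array([0.9, 0.2, 0.8, 0.1])
print(concordance_index(times, events, risk))  # 0.8 on this toy example
```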
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
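Two of these metrics, RMSE and PSNR, are simple enough to sketch directly with NumPy; the helper names below are illustrative, not this package's API:

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two images of equal shape; lower is more similar."""
    return float(np.sqrt(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)))

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher is more similar."""
    err = rmse(a, b)
    return float("inf") if err == 0 else 20.0 * np.log10(max_val / err)

img1 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
img2 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(rmse(img1, img2), psnr(img1, img2))
```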
Production-Grade Evaluation for LLM-Powered Applications
Evaluates neuron segmentations in terms of statistics related to the number of splits and merges
VELOCITI Benchmark Evaluation and Visualisation Code