Benchmark for assessing contextual-semantic sentence models in Brazilian legal domain.
Code, model and data for our paper: K. Tsigos, E. Apostolidis, S. Baxevanakis, S. Papadopoulos, V. Mezaris, "Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection", Proc. ACM Int. Workshop on Multimedia AI against Disinformation (MAD’24) at the ACM Int. Conf. on Multimedia Retrieval (ICMR’24), Thailand, June 2024.
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Official implementation of the ACL 2024 paper "Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs" (https://arxiv.org/abs/2402.11199).
Web-Interface for the evaluation of the different GDSC entries.
Evaluate open-source language models on agent use, formatted output, instruction following, long-text, multilingual, coding, and custom task capabilities.
Activity and Sequence Detection Performance Measures: a package to evaluate activity detection results, including the sequence of events, given multiple activity types.
A tool to perform functional testing and performance testing of the Dhruva Platform
"Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning" by Chongyu Fan*, Jiancheng Liu*, Alfred Hero, Sijia Liu
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration
CHECKLIST-style test cases and the testing of three Hungarian Named Entity Recognition tools.
Framework to evaluate Trajectory Classification Algorithms
Most popular metrics used to evaluate object detection algorithms.
A program to automate testing of open-source LLMs for their political compass scores
Fine-grained evaluation of pedestrian detectors based on 8 error-categories
This repository contains the code for our published work in IEEE JBHI. Our main objective was to demonstrate the feasibility of using synthetic data to effectively train machine learning algorithms, proving that it benefits classification performance in most cases.
Evaluate the quality of SRT files using the multilingual multimodal SONAR model.