Benchmark for assessing contextual-semantic sentence models in Brazilian legal domain.
Code, model and data for our paper: K. Tsigos, E. Apostolidis, S. Baxevanakis, S. Papadopoulos, V. Mezaris, "Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection", Proc. ACM Int. Workshop on Multimedia AI against Disinformation (MAD’24) at the ACM Int. Conf. on Multimedia Retrieval (ICMR’24), Thailand, June 2024.
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Official implementation of the ACL 2024 paper "Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs" (https://arxiv.org/abs/2402.11199).
Web-Interface for the evaluation of the different GDSC entries.
Evaluate open-source language models on agent use, formatted output, instruction following, long-text, multilingual, coding, and custom task capabilities.
Activity and Sequence Detection Performance Measures: a package to evaluate activity detection results, including the sequence of events, given multiple activity types.
A tool to perform functional testing and performance testing of the Dhruva Platform
"Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning" by Chongyu Fan*, Jiancheng Liu*, Alfred Hero, Sijia Liu
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration
CHECKLIST-style test cases and the testing of three Hungarian Named Entity Recognition tools.
Framework to evaluate Trajectory Classification Algorithms
Most popular metrics used to evaluate object detection algorithms.
A program to automate testing of open-source LLMs for their political compass scores
Fine-grained evaluation of pedestrian detectors based on 8 error-categories
This repository contains the code for our published work in IEEE JBHI. Our main objective was to demonstrate the feasibility of using synthetic data to effectively train machine learning algorithms, proving that it benefits classification performance in most cases.
Evaluate the quality of SRT files using the multilingual multimodal SONAR model.