An MNIST-like fashion product database. Benchmark 👇
OpenMMLab Pose Estimation Toolbox and Benchmark.
Benchmarks of approximate nearest neighbor libraries in Python
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A series of large language models developed by Baichuan Intelligent Technology
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Python package for the evaluation of odometry and SLAM
A 13B large language model developed by Baichuan Intelligent Technology
A unified evaluation framework for large language models
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Reference implementations of MLPerf™ training benchmarks
A machine learning toolkit for log parsing [ICSE'19, DSN'16]
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
📊 Benchmark multiple object trackers (MOT) in Python
Efficient Retrieval Augmentation and Generation Framework
⚡FlashRAG: A Python Toolkit for Efficient RAG Research
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".