Repositories
- onebench Public
[ACL'25] The official code for "ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities"
- model-vs-human Public
Benchmark your model on out-of-distribution datasets with carefully collected human comparison data (NeurIPS 2021 Oral)
- CiteME Public
CiteME is a benchmark that tests how well language models can find the papers cited in scientific texts.
- sort-and-search Public
Code for the paper: "Efficient Lifelong Model Evaluation in an Era of Rapid Progress" [NeurIPS'24]
- frequency_determines_performance Public
Code for the paper: "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurIPS'24]
- foolbox Public
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
- DataTypeIdentification Public
Code for the ICLR'24 paper: "Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models"