A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics

Xiangru Zhu¹, Penglei Sun², Chengyu Wang³, Jingping Liu⁴, Zhixu Li¹, Yanghua Xiao¹, Jun Huang³

¹Fudan University, ²The Hong Kong University of Science and Technology (Guangzhou), ³Alibaba Group, ⁴East China University of Science and Technology

Updates

✅ Winoground-T2I Dataset and Templates
⬜ Images Generated (7 Benchmarks) and T2I Fidelity Metric Results (9 Metrics)
⬜ Code for Data Collection
⬜ Code for Evaluating the Reliability of Metrics from 4 Perspectives
⬜ Results of Human Evaluation and Code for the Annotation Interface
⬜ Code for the improved version of LLMScore with self-verification

Dataset

Winoground-T2I Dataset: data/dataset/

Templates: data/template/
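
The two folders above can be inspected with a short script. This is a minimal sketch, not part of the repository: it assumes the script is run from the repository root, and it makes no assumption about file names or formats inside `data/dataset/` and `data/template/`. The helper name `list_files` is hypothetical.

```python
from pathlib import Path


def list_files(root):
    """Return sorted relative paths of all regular files under root.

    Returns an empty list if root does not exist or is not a directory.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    # as_posix() keeps the separator consistent across operating systems.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Folder names are taken from this README; the contents are not assumed.
    for folder in ("data/dataset", "data/template"):
        files = list_files(folder)
        print(f"{folder}: {len(files)} file(s)")
        for name in files:
            print("  " + name)
```

Listing the files first is a cheap way to check which formats are present before writing any loading code.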

Acknowledgments

We make use of several T2I fidelity metrics to evaluate T2I synthesis models.