[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Dataset collection and preprocessing framework for NLP extreme multitask learning
Efficient LLM inference on Slurm clusters using vLLM.
Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives
Learning to route instances for Human vs AI Feedback
[ACL 2024 Findings] DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
The code used in the paper "DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging"
Building an LLM with RLHF involves fine-tuning on human-labeled preferences. Based on Learning to Summarize from Human Feedback, this project uses supervised fine-tuning, reward modeling, and PPO to improve response quality and alignment.
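The reward-modeling step shared by several of these projects is typically a pairwise, Bradley-Terry style objective: the model assigns a scalar score to each response, and the loss pushes the score of the preferred response above the rejected one. The following is a minimal sketch of that idea (not taken from any listed repository); the `RewardModel` class, its feature dimensions, and the random toy inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Placeholder encoder; in practice this would be a pretrained LM backbone.
        self.encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.score_head = nn.Linear(hidden_size, 1)  # scalar reward per response

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Return one scalar score per (prompt, response) feature vector.
        return self.score_head(self.encoder(features)).squeeze(-1)

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # P(chosen > rejected) = sigmoid(r_chosen - r_rejected); minimize the negative log-likelihood.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random features stand in for encoded (prompt, response) pairs.
model = RewardModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
chosen_feats, rejected_feats = torch.randn(8, 768), torch.randn(8, 768)
loss = bradley_terry_loss(model(chosen_feats), model(rejected_feats))
loss.backward()
optimizer.step()
```

The trained scorer can then serve as the reward signal for a PPO stage, where the policy LM is optimized to produce responses with higher predicted reward while staying close to the supervised model.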