Code for NeurIPS 2022 paper Exploiting Reward Shifting in Value-Based Deep RL
A gymnasium-compatible framework to create reinforcement learning (RL) environments for solving the optimal power flow (OPF) problem. Contains five OPF benchmark environments for comparable research.
GRPO to train long-form QA and instruction following with a long-form reward model
Common repository for our readings and discussions
Socio-Emotional Reward Design for Intrinsically-Motivated Reinforcement Learning Agents
Code for the paper "Reward Design for Justifiable Sequential Decision-Making"; ICLR 2024
The course teaches how to fine-tune LLMs using Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves model reasoning with minimal data. Learn reinforcement fine-tuning (RFT) concepts, reward design, and LLM-as-a-judge evaluation, and deploy jobs on the Predibase platform.