Code for NeurIPS 2022 paper Exploiting Reward Shifting in Value-Based Deep RL
A gymnasium-compatible framework to create reinforcement learning (RL) environments for solving the optimal power flow (OPF) problem. Contains five OPF benchmark environments for comparable research.
GRPO to train long-form QA and instruction following with a long-form reward model
Common repository for our readings and discussions
Socio-Emotional Reward Design for Intrinsically-Motivated Reinforcement Learning Agents
Code for the paper "Reward Design for Justifiable Sequential Decision-Making"; ICLR 2024
The course teaches how to fine-tune LLMs using Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves model reasoning with minimal data. Learn reinforcement fine-tuning (RFT) concepts, reward design, and LLM-as-a-judge evaluation, and deploy jobs on the Predibase platform.