Multi-Agent Constrained Policy Optimisation (MACPO; MAPPO-L).
Updated Apr 17, 2024 - Python
Policy Optimization with Penalized Point Probability Distance: an Alternative to Proximal Policy Optimization
Implementation of Proximal Policy Optimization (SOTA), a deep reinforcement learning algorithm, on a continuous-action-space OpenAI Gym environment (Box2D CarRacing-v0)
Mirror Descent Policy Optimization
Model-based Policy Gradients
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
Code for Policy Optimization as Online Learning with Mediator Feedback
An implementation of reinforcement learning for CartPole-v0 using policy optimization
This repo implements the REINFORCE algorithm to solve the CartPole-v1 environment from the Gymnasium library, using Python 3.8 and PyTorch 2.0.1.