A curated list of ๐ฅ Distributed Reinforcement Learning papers and systems. Covering scalable algorithms, system frameworks, multi-agent setups, large models, communication strategies, and RLHF โ 140+ papers, repos and websites, maintained by Dong Liu and Xuqing Yang.
- ๐ Related Surveys and Overviews
- ๐ Algorithms & Theoretical Advances
- ๐ Policy Gradient Methods
- ๐ฏ Value-Based Methods
- ๐ญ Actor-Critic Methods
- โก Asynchronous & Parallel Methods
- ๐ Gradient Aggregation & Optimization
- ๐งฎ Theoretical Foundations
- ๐ฒ Exploration & Sampling
- ๐ง Latent Thought Language Models (LTM) for RL
- ๐ฐ๏ธ Federated Reinforcement Learning (FedRL)
- ๐งฑ System Frameworks
- ๐ก Communication Efficiency
- ๐ฅ Multi-Agent Distributed RL
- ๐ฆพ RLHF & Distributed Human Feedback
- ๐ง Large-Scale Models in RL
- ๐ป Codebases & Benchmarks
- ๐ Resources & Tutorials
- ๐ Contributing
- ๐ License
- ๐ Citation
- Survey on Distributed RL
- Scaling RL
- A Survey on Federated RL
- RLHF: Challenges & Opportunities
- Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox
- Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
- Distributed Training for Reinforcement Learning
- A Survey on Multi-Agent Reinforcement Learning and Its Application
Title | Year | Link |
---|---|---|
IMPALA: Scalable Distributed Deep-RL | 2018 | https://arxiv.org/abs/1802.01561 |
Distributed Proximal Policy Optimization | 2017 | https://arxiv.org/abs/1707.02286 |
Asynchronous Methods for Deep Reinforcement Learning (A3C) | 2016 | https://arxiv.org/abs/1602.01783 |
Asynchronous Federated RL with PG Updates | 2024 | https://arxiv.org/abs/2402.08372 |
Distributed Policy Gradient with Variance Reduction | 2023 | https://arxiv.org/abs/2305.07180 |
Federated PG for Multi-Agent RL | 2023 | https://arxiv.org/abs/2304.12151 |
Scalable Distributed Policy Optimization via Consensus | 2024 | https://arxiv.org/abs/2401.09876 |
Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach | 2019 | https://arxiv.org/abs/1912.09135 |
Title | Year | Link |
---|---|---|
Ape-X: Distributed Prioritized Replay | 2018 | https://arxiv.org/abs/1803.00933 |
Distributed Deep Q-Learning | 2015 | https://arxiv.org/abs/1508.04186 |
Distributed Rainbow | 2023 | https://arxiv.org/abs/2302.14592 |
Federated Q-Learning with Differential Privacy | 2024 | https://arxiv.org/abs/2403.15789 |
Asynchronous Distributed Value Function Approximation | 2023 | https://arxiv.org/abs/2309.12456 |
Title | Year | Link |
---|---|---|
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference | 2019 | https://arxiv.org/abs/1910.06591 |
Muesli | 2021 | https://arxiv.org/abs/2104.06159 |
Distributed Soft Actor-Critic with Experience Replay | 2023 | https://arxiv.org/abs/2308.14567 |
Federated Actor-Critic for Continuous Control | 2023 | https://arxiv.org/abs/2311.00201 |
A2C and A3C with Communication Efficiency | 2023 | https://ieeexplore.ieee.org/abstract/document/9289269 |
Title | Year | Link |
---|---|---|
SRL: Scaling Distributed RL to 10,000+ Cores | 2024 | https://arxiv.org/abs/2306.15108 |
DistRL: An Async Distributed RL Framework | 2024 | https://arxiv.org/abs/2401.12453 |
Loss- and Reward-Weighting for Efficient DRL | 2024 | https://arxiv.org/abs/2311.01354 |
Accelerated Methods for Distributed RL | 2022 | https://arxiv.org/abs/2203.09511 |
Parallel A2C with Shared Experience | 2023 | https://arxiv.org/abs/2307.19876 |
Distributed Async Temporal Difference Learning | 2024 | https://arxiv.org/abs/2402.17890 |
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents | 2024 | https://arxiv.org/abs/2410.14803 |
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores | 2023 | https://arxiv.org/abs/2306.16688 |
MSRL: Distributed Reinforcement Learning with Dataflow Fragments | 2022 | https://arxiv.org/abs/2210.00882 |
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform | 2023 | https://arxiv.org/abs/2310.00036 |
The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems | 2020 | https://arxiv.org/abs/2012.04210 |
Title | Year | Link |
---|---|---|
Gradient Compression for Distributed DRL | 2023 | https://arxiv.org/abs/2305.14321 |
Byzantine-Robust Distributed RL | 2024 | https://arxiv.org/abs/2403.08765 |
Variance-Reduced Distributed PG | 2023 | https://arxiv.org/abs/2309.18765 |
Momentum-Based Distributed RL | 2024 | https://arxiv.org/abs/2401.23456 |
Title | Year | Link |
---|---|---|
Convergence Analysis of Distributed RL | 2023 | https://arxiv.org/abs/2304.09876 |
Sample Complexity of Federated RL | 2024 | https://arxiv.org/abs/2402.12345 |
Communication Complexity in Distributed RL | 2023 | https://arxiv.org/abs/2308.56789 |
Regret Bounds for Distributed Bandits | 2024 | https://arxiv.org/abs/2405.11111 |
Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation | 2025 | https://ojs.aaai.org/index.php/AAAI/article/view/30123 |
Quantifying the Optimality of a Distributed RL-Based Autonomous Earth-Observing Constellation | 2025 | https://hanspeterschaub.info/Papers/Stephenson2025.pdf |
Title | Year | Link |
---|---|---|
Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning | 2022 | https://arxiv.org/abs/2202.02693 |
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | 2023 | https://arxiv.org/abs/2305.18246 |
More Efficient Randomized Exploration for Reinforcement Learning | 2024 | https://arxiv.org/abs/2406.12241 |
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models | 2025 | https://arxiv.org/abs/2505.15293 |
Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning | 2024 | https://arxiv.org/abs/2406.08069 |
Title | Year | Link |
---|---|---|
Place Cells as Position Embeddings of Multi-Time Random Walk Transition Kernels for Path Planning | 2025 | https://arxiv.org/abs/2505.14806 |
Latent Adaptive Planner for Dynamic Manipulation | 2025 | https://arxiv.org/abs/2505.03077 |
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | 2025 | https://arxiv.org/abs/2505.13308 |
Scalable Language Models with Posterior Inference of Latent Thought Vectors | 2025 | https://arxiv.org/abs/2502.01567 |
Training Chain-of-Thought via Latent-Variable Inference | 2024 | https://openreview.net/forum?id=a147pIS2Co |
Title | Year | Link |
---|---|---|
Federated Deep Reinforcement Learning | 2019 | https://arxiv.org/abs/1901.08277 |
Federated Reinforcement Learning: Techniques, Applications, and Open Challenges | 2021 | https://arxiv.org/abs/2108.11887 |
Federated Reinforcement Learning with Constraint Heterogeneity | 2024 | https://arxiv.org/abs/2405.03236 |
Federated reinforcement learning for robot motion planning with zero-shot generalization | 2024 | https://arxiv.org/abs/2403.13245 |
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments | 2024 | https://arxiv.org/abs/2405.19499 |
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis | 2024 | https://arxiv.org/abs/2404.08003 |
FedHPD: Heterogeneous Federated Reinforcement Learning via Policy Distillation | 2025 | https://arxiv.org/abs/2502.00870 |
Federated Reinforcement Learning with Environment Heterogeneity | 2022 | https://arxiv.org/abs/2204.02634 |
Federated Ensemble-Directed Offline Reinforcement Learning | 2023 | https://arxiv.org/abs/2305.03097 |
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF | 2024 | https://arxiv.org/abs/2412.15538 |
Title | Year | Link |
---|---|---|
Ray RLlib | 2021 | https://docs.ray.io/en/latest/rllib/ |
RLlib: Abstractions for Distributed Reinforcement Learning | 2018 | https://arxiv.org/abs/1712.09381 |
Acme | 2020 | https://github.com/deepmind/acme |
TorchBeast: A PyTorch Platform for Distributed RL | 2019 | https://arxiv.org/abs/1910.03552 |
TorchRL | 2023 | https://pytorch.org/rl/ |
CleanRL + SLURM | 2022 | https://github.com/vwxyzjn/cleanrl |
Cleanba | 2023 | https://github.com/vwxyzjn/cleanba |
FedHQL | 2023 | https://arxiv.org/abs/2301.11135 |
DistRL Framework | 2024 | https://arxiv.org/abs/2401.15803 |
AReaL | 2025 | https://github.com/inclusionAI/AReaL |
SandGraphX | 2025 | https://github.com/NoakLiu/SandGraphX |
Title | Year | Link |
---|---|---|
Deep Gradient Compression | 2017 | https://arxiv.org/abs/1712.01887 |
Gradient Surgery | 2020 | https://arxiv.org/abs/2001.06782 |
DD-PPO | 2020 | https://arxiv.org/abs/2007.04938 |
Gradient Dropping | 2018 | https://arxiv.org/abs/1806.08768 |
Bandwidth-Aware RL | 2023 | https://arxiv.org/abs/2303.08127 |
QDDP: Quantized DRL | 2021 | https://arxiv.org/abs/2102.09352 |
Communication-Aware Distributed RL | 2024 | https://arxiv.org/abs/2402.17222 |
Application of Reinforcement Learning to Routing in Distributed Wireless Networks: A Review | 2015 | https://link.springer.com/article/10.1007/s10462-012-9383-6 |
Title | Year | Link |
---|---|---|
MADDPG | 2017 | https://arxiv.org/abs/1706.02275 |
MAVEN | 2019 | https://arxiv.org/abs/1910.07483 |
R-MADDPG | 2022 | https://arxiv.org/abs/2202.03428 |
Tesseract | 2022 | https://arxiv.org/abs/2211.03537 |
FMRL-LA | 2023 | https://arxiv.org/abs/2310.11572 |
CAESAR | 2024 | https://arxiv.org/abs/2402.07426 |
FAgents: Federated MARL | 2023 | https://arxiv.org/abs/2312.22222 |
Distributed Reinforcement Learning for Robot Teams: A Review | 2022 | https://arxiv.org/abs/2204.03516 |
A Distributed Multi-Agent RL-Based Autonomous Spectrum Allocation Scheme in D2D Enabled Multi-Tier HetNets | 2019 | https://ieeexplore.ieee.org/document/8598795 |
Title | Year | Link |
---|---|---|
InstructGPT | 2022 | https://arxiv.org/abs/2203.02155 |
Self-Instruct | 2022 | https://arxiv.org/abs/2212.10560 |
DPO | 2023 | https://arxiv.org/abs/2305.18290 |
RAFT | 2024 | https://arxiv.org/abs/2402.03620 |
Distributed PPO (HF-TRL) | 2023 | https://huggingface.co/docs/trl/main/en/ |
Decentralized RLHF | 2023 | https://arxiv.org/abs/2310.11883 |
FedRLHF | 2024 | https://arxiv.org/abs/2412.15538 |
Title | Year | Link |
---|---|---|
Gato | 2022 | https://arxiv.org/abs/2205.06175 |
PaLM-E | 2023 | https://arxiv.org/abs/2303.03378 |
Decision Transformer | 2021 | https://arxiv.org/abs/2106.01345 |
V-D4RL | 2022 | https://arxiv.org/abs/2202.02349 |
TAMER-GPT | 2023 | https://arxiv.org/abs/2305.11521 |
Gorilla | 2023 | https://arxiv.org/abs/2305.01569 |
Open X-Embodiment | 2023 | https://arxiv.org/abs/2306.03367 |
- ๐ Stanford CS234
- ๐ Berkeley Deep RL Course
- ๐งญ YouTube Playlist: Distributed RL
- ๐ง RLHF Reading List
We welcome contributions! If you find a missing paper, tool, or benchmark related to distributed reinforcement learning, please feel free to open a pull request or issue.
MIT License. See LICENSE
for details.
@online{liu2025awesome,
author = {Dong Liu, Xuqing Yang, Xuhong Wang, Ying Nian Wu and contributors},
title = {Awesome Distributed Reinforcement Learning},
year = {2025},
url = {https://github.com/NoakLiu/Awesome-Distributed-RL},
note = {GitHub Repository}
}