Clean baseline implementation of PPO using an episodic TransformerXL memory
deep-reinforcement-learning
pytorch
transformer
policy-gradient
pomdp
actor-critic
proximal-policy-optimization
ppo
on-policy
episodic-memory
transformer-xl
gtrxl
trxl
gated-transformer-xl
memory-gym
Updated Jun 18, 2024 - Python