Official codebase for the preprint: "Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO".
reinforcement-learning deep-learning deep-reinforcement-learning pytorch reinforcement actor-critic credit-assignment deep-rl proximal-policy-optimization ppo ppo-agent multi-timescale surrogate-hacking target-decoupling
Updated Apr 16, 2026 - Python