QWOP AI

This is an experiment in training an RL agent to play the famous game QWOP.

This might seem like nothing new: I've trained plenty of RL agents before, and I've even turned HTML5 games into RL environments before. So, rather than rehashing those aspects, this project focuses on infrastructure and scalability. In particular, I am playing with the following ideas and technologies:

  • Redis Pub/Sub
  • Kubernetes Deployments and Services
  • Remote environments running on CPU-only machines
  • Asynchronous policy stepping (sketched below)
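
To make the last point concrete, here is a minimal sketch (not this repository's actual code) of what asynchronous policy stepping can look like on the master: rather than waiting for every environment to report in, the policy is stepped on whatever batch of observations has arrived so far.

```python
import asyncio

# Hypothetical sketch of asynchronous policy stepping. `policy` is assumed
# to map a list of observations to a list of actions; none of these names
# come from the actual repository.
class AsyncPolicyStepper:
    def __init__(self, policy, max_batch=64):
        self.policy = policy
        self.max_batch = max_batch
        self.queue = asyncio.Queue()

    async def request_action(self, obs):
        # Called once per environment timestep; resolves once the policy
        # has been stepped on a batch containing this observation.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((obs, fut))
        return await fut

    async def run(self):
        while True:
            # Block until at least one observation is pending...
            batch = [await self.queue.get()]
            # ...then drain whatever else has arrived, up to max_batch.
            while len(batch) < self.max_batch and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            obses, futs = zip(*batch)
            for fut, action in zip(futs, self.policy(list(obses))):
                fut.set_result(action)
```

The point of batching is to amortize the GPU's fixed per-call cost across many environments while never stalling on a slow one.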

Here are the components of the training system:

  • Redis - used for communication between the CPU and GPU machines.
  • Master - a GPU machine that chooses actions and trains the agent.
  • Workers - a set of CPU instances that each asynchronously run multiple environments and ask the master for actions at every timestep (see the sketch after this list).
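
For illustration, here is roughly what one remote step could look like on the worker side, assuming redis-py and made-up channel names ("qwop:obs" for observations, "qwop:act:<env_id>" for replies); the repository's actual wire format may differ.

```python
import json
import redis

# Sketch of one remote step over Redis Pub/Sub with hypothetical channel
# names. A real worker would keep the reply subscription open across
# timesteps instead of re-subscribing on every step.
r = redis.Redis(host="redis", port=6379)

def step_remotely(env_id, obs):
    pubsub = r.pubsub()
    pubsub.subscribe("qwop:act:" + env_id)   # subscribe to the reply first
    r.publish("qwop:obs", json.dumps({"env": env_id, "obs": obs}))
    for msg in pubsub.listen():              # blocks until the master replies
        if msg["type"] == "message":
            pubsub.close()
            return json.loads(msg["data"])["action"]
```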

This setup has a nice consequence: it is really easy to monitor and debug. For example, if every worker sends an environment's frames to a different Redis channel, then a third party can hook into one of those Redis channels and passively watch the agent play.
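
As a sketch of that monitoring idea (the channel name and raw-PNG frame encoding are assumptions), a bystander script only needs a Redis subscription; since Pub/Sub is fire-and-forget, watching never slows down training.

```python
import redis

# Passive monitor: subscribe to one worker's (hypothetical) frame channel
# and dump the frames to disk, without touching the training loop.
r = redis.Redis(host="redis", port=6379)
pubsub = r.pubsub()
pubsub.subscribe("qwop:frames:worker-0")

for i, msg in enumerate(pubsub.listen()):
    if msg["type"] != "message":
        continue
    with open("frame_%05d.png" % i, "wb") as f:
        f.write(msg["data"])   # assuming each message is one encoded frame
```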

Results

The agent learns the optimal "kneeling" gait, which looks lame but is, I'm told, the best you can do. In the interest of getting something cooler-looking, I tried adding a bonus that rewards the agent for keeping its knees off the ground. Here is a video of the agent playing without a standing bonus. Here is a video of the agent playing with a standing bonus of 0.05. The difference is clear: the first agent is much faster, but it relies on its knees contacting the ground.
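
For reference, the shaping described above amounts to something like the following; the function and signal names are illustrative, not the repository's.

```python
def shaped_reward(distance_delta, knees_on_ground, standing_bonus=0.05):
    # Base reward: forward progress this timestep. The optional bonus is
    # paid on every step where the runner's knees stay off the ground.
    reward = distance_delta
    if not knees_on_ground:
        reward += standing_bonus
    return reward
```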
