Neural.Swarms

Python swarm simulation with neural net based agents. It is mainly focused on the NaviGame in my fork of Python.Swarms, but the way I extend the game to allow a neural net is applicable to all of the games. You will need that repo, so clone it before continuing. I am exploring both supervised and reinforcement learning for the agent. For a nice visual display, see:

/NaviGame/Reinforcement Model Training.ipynb
/NaviGame/Supervised Model Training.ipynb

If you just want to train models, or want to display results some way other than with a Jupyter notebook, use these files:

/NaviGame/reinforcement_training.py
/NaviGame/supervised_training.py

tl;dr

  • To train a model, run one of the two .py files above
  • They will step you through setting up the basic variables
  • That code is locked to a 40 by 30 game board, 3-layer MLPs, and training on an empty field
  • I recommend training a model through the terminal, launching a Jupyter Notebook, and playing with it there
  • Then, use the Jupyter Notebook to run the more advanced training methods

Known issues

  • Figure placement can create recursion crashes
  • Not enough comments anywhere
  • Requires Jupyter for visualization
    • Add direct GIF export
    • Consider potential pygame interfaces

Initial Goals

The goals are behavioral in nature, rather than statistical.

  • Navigation: demonstrate reinforcement learning for simple navigation
  • Cooperation: multiple agents move a target
  • Model Extension: find a way to build a model which can be trained to arbitrary performance levels, then still train further on new environments.

Agents, Environments, and Reinforcement Learning

  • Agents, in the context of machine learning, are a class of algorithms which make choices once deployed. Agents may be anything from humble vacuum cleaners to stock-picking algorithms. We generally say the agent exists in an environment, be it virtual or physical.
  • Supervised Learning is the standard method of many statistical models and neural networks. It requires an (X, y) style training set, with inputs and desired outputs. The training set becomes a limitation, as the agent will only perform as well as the data it learns from. For the simple task of reaching a goal position in a deterministic environment, it performs very well after a short training period. We obtain data for this training from a deterministic strategy, so the neural network is limited to that performance level on the game. Simply adding a small barrier is enough to completely halt the network's strategy.
  • Reinforcement Learning allows us to train our algorithms with rewards. Rather than learning from an (X, y) training set, it learns from experience. Each experience comes with certain rewards, and each time a reward is received, the algorithm can learn.
  • Deep-Q Networks are a way to apply reinforcement learning to neural networks. The network predicts a Q-value for each action the agent is allowed. A Q-value is the quality of an action in a state: the expected sum of rewards as we play the game from that state after taking that action. We (almost) always select the action with the max predicted Q-value.
  • RL Data: Initially, the agent has absolutely no knowledge of the environment, so Q-values are effectively random. At each step, it updates the Q-value using the actual reward, plus the predicted Q-value of the next step it plans on taking. So, our model trains on (X, y) data, but each y is actually self-generated and often very inaccurate. Since part of each y is ground truth, though, the model eventually learns something close enough to the real Q-values to function; see the sketch after this list.
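To make that self-generated target concrete, here is a minimal sketch of the update described above. The function names, the `predict` callable, and the discount factor are my own illustrative choices, not code from this repo:

```python
import numpy as np

GAMMA = 0.9  # discount on future rewards; an assumed value, not from the repo

def q_target(predict, state, action, reward, next_state):
    """Build one (X, y) pair for a Deep-Q update.

    `predict` maps a state to a vector of Q-values, one per action.
    The chosen action's target is the observed reward plus the
    discounted best Q-value of the next state; every other action
    keeps the network's current (often inaccurate) estimate.
    """
    y = np.array(predict(state), dtype=float)  # current Q estimates
    y[action] = reward + GAMMA * np.max(predict(next_state))
    return state, y  # fit the network on (state, y)
```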

Deterministic and Supervised Examples

Here we see how quickly the supervised learner can perform well on the simple task. In contrast, the reinforcement learner struggles to perform well, but it does show potential. Here are some examples of simulation performance:

[Simulation GIFs: deterministic strategy · almost-trained supervised model · fully trained supervised model]

The supervised network learns from the deterministic strategy on the left, and eventually learns to mimic it perfectly.
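As an illustration of how such an (X, y) set can come from a deterministic strategy, here is a toy sketch. The repo's real model takes the whole game screen as input and its strategy lives in Python.Swarms; the coordinate inputs and helper names below are hypothetical:

```python
import numpy as np

def deterministic_action(pos, goal):
    """Toy stand-in for a deterministic strategy: step one cell toward
    the goal along the larger gap (0=up, 1=down, 2=left, 3=right, 4=stay)."""
    dy, dx = goal[0] - pos[0], goal[1] - pos[1]
    if dy == 0 and dx == 0:
        return 4
    if abs(dy) >= abs(dx):
        return 0 if dy < 0 else 1
    return 2 if dx < 0 else 3

def make_training_set(n, height=30, width=40, seed=0):
    """Sample random (position, goal) pairs on the 40x30 board and label
    each one with the deterministic strategy's move, giving an (X, y) set."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n):
        pos = rng.integers(0, [height, width])
        goal = rng.integers(0, [height, width])
        X.append(np.concatenate([pos, goal]))
        y.append(deterministic_action(pos, goal))
    return np.array(X), np.array(y)
```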

This also means that the supervised learner is limited by the strategy it learns from. So, enter reinforcement learning!

Reinforcement Examples

Reinforcement learning allows the agent to explore strategies on its own, and by receiving rewards from its environment, learns which are better.

When the DQN agent is initialized, its output values are effectively random numbers, and training is very susceptible to local minima. So, we train using an explore/exploit ratio that decreases throughout the training session, typically starting at 0.9 and ending at 0.1. Additionally, I can make some of the choices come from our deterministic strategy, to focus training on the "correct" routes; we already know that strategy works, so why not use it? Finally, a tolerance function can make the game easier or harder, so we start with an easier game, then make it harder once the agent is doing well.
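Here is a minimal sketch of that annealed explore/exploit choice. Linear annealing and the function names are my assumptions; the repo's schedule may differ:

```python
import random

def epsilon(step, total_steps, start=0.9, end=0.1):
    """Linearly anneal the explore ratio from `start` to `end`."""
    frac = min(step / float(total_steps), 1.0)
    return start + frac * (end - start)

def choose_action(q_values, step, total_steps):
    """Explore with probability epsilon, otherwise exploit the max Q-value."""
    if random.random() < epsilon(step, total_steps):
        return random.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```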

With all this in mind, I built a new model. It takes the usual input, the whole game screen. As outputs, it has the five usual actions (up, down, left, right, and stay) plus a new addition: use the deterministic strategy. So, for the simple games, all our DQN agent has to do is learn to always pick the deterministic strategy. Once it learns this, we can start exploring more complex problems. Meet Larry, the simple bundle of neurons:

[Larry's training GIFs: break-in · more training · trained with harder game · non-optimal paths · doesn't like obstacles]
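As a rough sketch of what a Larry-style network could look like, here is a three-layer MLP with the six outputs described above. Keras, the layer sizes, and the flattened-screen input shape are my assumptions, not necessarily the repo's configuration:

```python
from keras.models import Sequential
from keras.layers import Dense

ACTIONS = ["up", "down", "left", "right", "stay", "use_deterministic"]

def build_larry(board_cells=40 * 30, hidden=64):
    """Three-layer MLP from a flattened game screen to one Q-value per
    action, including the extra 'defer to the deterministic strategy' output."""
    model = Sequential([
        Dense(hidden, activation="relu", input_shape=(board_cells,)),
        Dense(hidden, activation="relu"),
        Dense(len(ACTIONS), activation="linear"),  # linear outputs for Q-values
    ])
    model.compile(optimizer="rmsprop", loss="mse")
    return model
```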

To conclude, reinforcement learning clearly works, and leaves the flexibility to take on new challenges. I've built new training systems for training with a variety of obstacles. I'm currently in the process of documenting the code here and in my Sphero DQN project, and will continue testing models once that's finished.

