A solution to the classic control problems that combines a neural network with Evolution Strategies.
While these tasks are usually solved with Reinforcement Learning algorithms, here we solve them with an Evolution Strategy.
We built a policy neural network that chooses the next action to play.
The weights of the network are trained using Evolution Strategies.
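The repository trains the network with DEAP, but the core idea is simple. Below is a minimal, illustrative sketch of an evolution strategy (an OpenAI-style ES with a small linear policy standing in for the repo's actual network; the classic gym reset/step API and all hyperparameters are assumptions for the example):

```python
# Illustrative sketch only: an OpenAI-style evolution strategy on CartPole-v0.
# The repo's real training uses DEAP and a neural network policy; this linear
# policy and the hyperparameters below are assumptions for the example.
import numpy as np
import gym

env = gym.make("CartPole-v0")
obs_dim = env.observation_space.shape[0]   # 4 state values for CartPole
n_actions = env.action_space.n             # 2 discrete actions

def policy(weights, state):
    """Tiny linear policy: score each action and pick the best one."""
    w = weights.reshape(obs_dim, n_actions)
    return int(np.argmax(state @ w))

def episode_reward(weights):
    """Total reward collected by one rollout of the policy (its fitness)."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done, _ = env.step(policy(weights, state))
        total += reward
    return total

pop_size, sigma, lr = 50, 0.1, 0.05     # population size, noise scale, step size
theta = np.zeros(obs_dim * n_actions)   # flat vector of policy weights
for generation in range(100):
    # Evaluate a population of Gaussian perturbations of the current weights.
    noise = np.random.randn(pop_size, theta.size)
    rewards = np.array([episode_reward(theta + sigma * eps) for eps in noise])
    # Move the weights toward perturbations that scored above average.
    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += lr / (pop_size * sigma) * noise.T @ advantage
    print(generation, rewards.mean())
```

No gradient of the reward is ever computed; the population of noisy rollouts estimates the search direction, which is what makes the same loop reusable across different tasks.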
The tasks are control theory problems from the classic Reinforcement Learning literature, simulated with Gym, a framework built by OpenAI.
For each task, the environment provides an initial state drawn from a distribution (so each episode can differ slightly), accepts actions, and, given an action, returns the next state and a reward.
The states and actions simulate real physical behavior.
Typical state values are position, velocity, joint angle, angular velocity, and momentum; actions are usually the force to apply to an object.
The goal is to build a policy model that, given a state, returns the action that leads to the highest reward.
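To make the interaction concrete, a minimal loop against this API might look like the sketch below (classic gym return signatures assumed; the random action is a stand-in for a trained policy):

```python
# Minimal Gym interaction loop (classic `gym` API assumed).
import gym

env = gym.make("MountainCar-v0")
state = env.reset()                 # initial state, sampled from a distribution
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()            # random stand-in for a policy
    state, reward, done, info = env.step(action)  # next state and reward
    total_reward += reward
print("episode reward:", total_reward)
```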
We solved three different tasks using the same training code.
CartPole-v0: A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart.
The pendulum starts upright, and the goal is to prevent it from falling over.
A reward of +1 is provided for every timestep that the pole remains upright.
The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
Acrobot-v1: Acrobot is a 2-link pendulum with only the second joint actuated. Initially, both links point downwards. The goal is to swing the end-effector to a height at least the length of one link above the base. Both links can swing freely and can pass by each other, i.e., they don't collide when they have the same angle.
MountainCar-v0: A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum.
In all three tasks, the goal is to achieve the highest possible score.
1. CartPole-v0
Solved requirement - average reward of 195.0 over 100 consecutive trials
Our score - average reward of 200
Solved!
2. Acrobot-v1
Solved requirement - no specified threshold, only a leaderboard ranking
Our score - average reward of -80.73
13th place on the leaderboard!
3. MountainCar-v0
Solved requirement - average reward of -110.0 over 100 consecutive trials
Our score - average reward of -106.26
Solved!
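All scores above are average total rewards over repeated episodes. As a hedged sketch of that metric (the repo's test_model methods do the real evaluation; the `policy` argument here is a hypothetical stand-in for a trained model):

```python
# Sketch of the 'average reward' metric used for the scores above.
# `policy` is a hypothetical callable mapping a state to an action.
import numpy as np
import gym

def average_reward(env_name, policy, episodes=100):
    """Mean total reward over `episodes` rollouts (classic gym API assumed)."""
    env = gym.make(env_name)
    totals = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done, _ = env.step(policy(state))
            total += reward
        totals.append(total)
    return float(np.mean(totals))
```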
Training and testing methods for all three tasks are available in Main.py.
To test the pretrained models in this repo, uncomment and run the test_model methods.
To train new models, uncomment and run the train_model methods.
Requirements: deap, numpy, matplotlib, gym
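The dependencies can be installed with pip, for example:

```
pip install deap numpy matplotlib gym
```

Depending on the gym release installed, the classic reset/step API shown in the sketches above may require an older gym version.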