### Experimenting with the NeuroEvolution of Augmenting Topologies (NEAT) Algorithm

#### Objective
In this Jupyter Notebook, we will solve simple simulated environments with a reinforcement learning agent trained with the NEAT algorithm.

#### Problem
Reinforcement learning and evolutionary algorithms are both studied across multiple disciplines. I'd like to research and learn interesting techniques such as NEAT, which can solve reinforcement learning tasks like video game AIs, robots learning how to walk, etc.

#### What is NEAT
See the presentation on NEAT at https://docs.google.com/presentation/d/1H4W0TBSQHH-FQ18fmvH-Qv1MRaqOpOf0MI1bV-UftOM/edit#slide=id.g1096e8bacce_0_157

Read the paper by the authors who introduced NEAT: http://nn.cs.utexas.edu/downloads/papers/stanley.cec02.pdf

#### Requirements
* The `neat-python` library installed with `pip install neat-python`
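* The OpenAI Gym library installed with `pip install gym`. The code in this notebook uses the classic Gym API, in which `env.reset()` returns only the observation and `env.step()` returns four values; Gym 0.26 and later changed both signatures, so an older release may be needed.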

#### How to Formulate a problem for NEAT (or RL tasks)
* Define the inputs (observations) and outputs (action)
* Define the fitness function
* Define the hyperparameters in a config file (e.g. population size, bias, etc.)

#### Simulations for this project:
* XOR problem 
* Cart Pole Balancing
* Mountain Car Climbing

### XOR Problem (Very Basic NEAT Problem)

| A | B | O |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

#### Inputs and Outputs
For the XOR problem, the inputs are simply `A` and `B`, and the output is `O`.

#### Fitness Function
Since we know the labels for the XOR gate, we can use the fitness function `4 - sum_i((ei - ai)^2)`, where `ei` is the expected output and `ai` is the actual output for row `i`. A perfect network scores 4, one point for each of the four rows.
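
As a quick sanity check on the scale of this fitness, here is a minimal worked example in plain Python. It mirrors the training code in this notebook, which starts each genome at 4.0 and subtracts the squared error of each of the four rows. The helper name `xor_fitness` and the sample outputs are made up for illustration.

```python
# Expected XOR outputs for the rows (0,0), (0,1), (1,0), (1,1).
EXPECTED = [0.0, 1.0, 1.0, 0.0]


def xor_fitness(actual_outputs):
    """Return 4 minus the summed squared error over the four XOR rows."""
    fitness = 4.0
    for expected, actual in zip(EXPECTED, actual_outputs):
        fitness -= (actual - expected) ** 2
    return fitness


# A perfect network loses nothing; a network that always answers 0.5
# loses 0.25 on every row.
print(xor_fitness([0.0, 1.0, 1.0, 0.0]))  # 4.0
print(xor_fitness([0.5, 0.5, 0.5, 0.5]))  # 3.0
```

On this scale, a `fitness_threshold` of 3.9 means the summed squared error over the four rows must drop below 0.1.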

#### Hyperparameters 
Optimizing training for any NEAT problem takes a lot of tweaking, but the most important settings are listed below:
* fitness_threshold = 3.9
* pop_size = 150
* feed_forward = True
* species_fitness_func = max
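
For orientation, these settings live in different sections of a `neat-python` config file. The fragment below is only a sketch, not the actual `config-xor` (which is not shown here and needs many more genome, species and reproduction parameters). `fitness_criterion` and `reset_on_extinction` are assumed values, added only so the `[NEAT]` section looks complete. The fragment is parsed with the standard-library `configparser` purely to show the layout.

```python
import configparser

# Illustrative fragment only; a real neat-python config needs more parameters.
CONFIG_FRAGMENT = """
[NEAT]
# assumed value, not taken from the notebook
fitness_criterion = max
fitness_threshold = 3.9
pop_size = 150
# assumed value, not taken from the notebook
reset_on_extinction = False

[DefaultGenome]
feed_forward = True

[DefaultStagnation]
species_fitness_func = max
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_FRAGMENT)

# Collect every section into a plain dict of strings for inspection.
settings = {section: dict(parser[section]) for section in parser.sections()}
for section, values in settings.items():
    print(section, values)
```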

In [10]:
# Make sure to install all these modules correctly
import sys 
import os
import neat
import gym
import numpy as np
import pickle

In [11]:
# 2-input XOR inputs and expected outputs.
xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_outputs = [   (0.0,),     (1.0,),     (1.0,),     (0.0,)]

class Xor:
    @staticmethod
    def eval_genomes(genomes, config):
        for genome_id, genome in genomes:
            genome.fitness = 4.0
            net = neat.nn.FeedForwardNetwork.create(genome, config)
            for xi, xo in zip(xor_inputs, xor_outputs):
                output = net.activate(xi)
                genome.fitness -= (output[0] - xo[0]) ** 2

    @staticmethod
    def run(config_file):
        # Load configuration.
        config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                            neat.DefaultSpeciesSet, neat.DefaultStagnation,
                            config_file)

        # Create the population, which is the top-level object for a NEAT run.
        p = neat.Population(config)

        # Add a stdout reporter to report progress each generation.
        p.add_reporter(neat.StdOutReporter(True))

        # Redirect stdout to a text file while training, and always restore
        # it afterwards, even if training raises an exception.
        actual_stdout = sys.stdout
        try:
            with open('xor-output.txt', 'w') as log_file:
                sys.stdout = log_file
                # Run for up to 300 generations.
                winner = p.run(Xor.eval_genomes, 300)
        finally:
            sys.stdout = actual_stdout

        return winner, config

    @staticmethod
    def eval_winner(winner, config):
        # Display the winning genome.
        print('\nBest genome:\n{!s}'.format(winner))

        # Show output of the most fit genome against training data.
        print('\nOutput:')
        winner_net = neat.nn.FeedForwardNetwork.create(winner, config)
        for xi, xo in zip(xor_inputs, xor_outputs):
            output = winner_net.activate(xi)
            print("input {!r}, expected output {!r}, got {!r}".format(xi, xo, output))


In [12]:
def run_xor():
    # Determine the path to the configuration file. In a notebook,
    # os.path.abspath('') resolves to the current working directory,
    # so 'config-xor' is expected to be in that directory.
    local_dir = os.path.abspath('')
    config_path = os.path.join(local_dir, 'config-xor')
    winner, config = Xor.run(config_path)
    Xor.eval_winner(winner, config)
    # Visualize the resulting neural network if possible
    try:
        import visualize
        node_names = {-1:'A', -2: 'B', 0:'A XOR B'}
        visualize.draw_net(config, winner, filename='xor-winner-genome', node_names=node_names)
    except Exception:
        pass
run_xor()


Best genome:
Key: 6912
Fitness: 3.93413315555832
Nodes:
	0 DefaultNodeGene(key=0, bias=-2.5053889851167845, response=1.0, activation=sigmoid, aggregation=sum)
	714 DefaultNodeGene(key=714, bias=-1.8540309741834953, response=1.0, activation=sigmoid, aggregation=sum)
	1211 DefaultNodeGene(key=1211, bias=-1.127904750342468, response=1.0, activation=sigmoid, aggregation=sum)
Connections:
	DefaultConnectionGene(key=(-2, 714), weight=4.614806302099371, enabled=True)
	DefaultConnectionGene(key=(-2, 1211), weight=-0.14491657968845467, enabled=True)
	DefaultConnectionGene(key=(-1, 0), weight=0.3884322061108544, enabled=False)
	DefaultConnectionGene(key=(-1, 714), weight=3.7211666625869557, enabled=True)
	DefaultConnectionGene(key=(-1, 1211), weight=-0.760805411682394, enabled=True)
	DefaultConnectionGene(key=(714, 0), weight=-0.43187673076799654, enabled=False)
	DefaultConnectionGene(key=(714, 1211), weight=2.091788797171861, enabled=True)
	DefaultConnectionGene(key=(1211, 0), weight=3.7823247

#### Visualize Neural Network
If you have Graphviz installed, you can view the resulting genome in `xor-winner-genome.svg`.

### Cart Pole Balancing Problem
We use the OpenAI Gym `CartPole-v1` environment for our next experiment. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 12 degrees from vertical, when the cart moves more than 2.4 units from the center, or when the 500-step limit of `CartPole-v1` is reached. More about this environment can be explored at https://gym.openai.com/envs/CartPole-v1/

#### Inputs and Outputs 

Inputs/observations are:
* Cart Position
* Cart Velocity
* Pole Angle
* Pole Angular Velocity

Outputs/actions are:
* Push cart to the left (`0`)
* Push cart to the right (`1`)

Fitness Function/Reward:
* Number of steps/frames before termination

Hyperparameters:
* Again, optimizing training for any NEAT problem takes a lot of tweaking
* I tested both the `clamped` and `sigmoid` activations and found that they produce similar results
* Fitness threshold of 200, because that is the goal set by OpenAI
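
With two output nodes, one per action, a simple way to pick a discrete action is to take the index of the larger activation, which is what the test cell in this notebook does with `np.argmax`. Here is a minimal sketch in plain Python, where `choose_action` and the sample activations are illustrative only:

```python
def choose_action(outputs):
    """Return the index of the largest activation (0 = left, 1 = right)."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Hypothetical activations for the two output nodes.
print(choose_action([0.2, 0.9]))  # 1, push cart to the right
print(choose_action([0.7, 0.1]))  # 0, push cart to the left
```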

#### Check out the environment 
Let's take a look at the environment first with random inputs. We can run the next cell multiple times to see that randomly choosing actions fails very quickly.

In [1]:
import gym
# Create the environment
env = gym.make('CartPole-v1')
# Initialize the environment
observation = env.reset()
# Loop until we reach termination
done = False
fitness = 0
while not done:
    observation, reward, done, info = env.step(env.action_space.sample()) # take a random action
    env.render()
    fitness += reward
env.close()
print('The score/fitness for this random sample run is', fitness) # Number of frames/steps this run lasted

The score/fitness for this random sample run is 32.0


#### Cart Pole Balancing Code
Now let's train our agent with NEAT. Everything is similar to the XOR problem, with a couple of differences:
* We can utilize the `multiprocessing` library to use multiple CPUs and speed up training
* We have another function, `eval_genome`, to evaluate individual genomes

Because Jupyter Notebook does not work well with `multiprocessing`, the training code lives in the file `train_pole_balancing.py`.
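
The contents of `train_pole_balancing.py` are not shown in this notebook, so the following is only a sketch of how such a file could be laid out, not the actual training code. It assumes `neat-python`'s `neat.ParallelEvaluator` and the classic Gym API. The function names, the default of 300 generations, and the use of module-level functions (the notebook calls a `PoleBalance.run` static method instead) are all my own choices for illustration.

```python
import multiprocessing


def episode_fitness(net, env):
    """Run one episode with the classic Gym API and return the total reward."""
    observation = env.reset()
    fitness = 0.0
    done = False
    while not done:
        outputs = net.activate(observation)
        # Pick the action whose output node has the largest activation.
        action = max(range(len(outputs)), key=lambda i: outputs[i])
        observation, reward, done, _info = env.step(action)
        fitness += reward
    return fitness


def eval_genome(genome, config):
    """Fitness of a single genome; this runs inside a worker process."""
    # Imported here so episode_fitness above can be tried without these packages.
    import gym
    import neat

    net = neat.nn.FeedForwardNetwork.create(genome, config)
    env = gym.make('CartPole-v1')
    try:
        return episode_fitness(net, env)
    finally:
        env.close()


def run(config_file, generations=300):
    """Evolve a population, evaluating genomes in parallel worker processes."""
    import neat

    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         config_file)
    population = neat.Population(config)
    population.add_reporter(neat.StdOutReporter(True))
    # One worker per CPU core; each worker calls eval_genome(genome, config).
    evaluator = neat.ParallelEvaluator(multiprocessing.cpu_count(), eval_genome)
    winner = population.run(evaluator.evaluate, generations)
    return winner, config
```

Unlike `eval_genomes` in the XOR example, `eval_genome` receives one genome and returns its fitness instead of assigning `genome.fitness`, because the evaluator collects the results from the worker processes. Scoring each genome on a single episode is noisy, so averaging over several episodes may be worth trying.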

In [14]:
def pole_balancing():
    from train_pole_balancing import PoleBalance
    local_dir = os.path.abspath('')
    config_path = os.path.join(local_dir, 'config-pole-balance')
    winner, config = PoleBalance.run(config_path)
    print(winner)

    # Save the winner.
    with open('winner-pole-balance', 'wb') as f:
        pickle.dump(winner, f)

    # Visualize the resulting neural network if possible
    try:
        import visualize
        node_names = {-1: 'Cart Position', -2: 'Cart Velocity', 
        -3: 'Pole Angle', -4: 'Pole Angular Velocity', 0: 'Push cart to left', 1: 'Push cart to right'}
        visualize.draw_net(config, winner, filename='pole-balancing-winner-genome', node_names=node_names)
    except Exception:
        pass
pole_balancing()

Key: 314
Fitness: 500.0
Nodes:
	0 DefaultNodeGene(key=0, bias=0.14381512908471206, response=1.0, activation=clamped, aggregation=sum)
	1 DefaultNodeGene(key=1, bias=-0.4701595934607206, response=1.0, activation=clamped, aggregation=sum)
	62 DefaultNodeGene(key=62, bias=-1.3185657977706486, response=1.0, activation=clamped, aggregation=sum)
	63 DefaultNodeGene(key=63, bias=-0.01867436303910147, response=1.0, activation=clamped, aggregation=sum)
	514 DefaultNodeGene(key=514, bias=1.4469735377372763, response=1.0, activation=clamped, aggregation=sum)
Connections:
	DefaultConnectionGene(key=(-4, 0), weight=-2.361693229075551, enabled=True)
	DefaultConnectionGene(key=(-4, 62), weight=-0.2955341683285919, enabled=True)
	DefaultConnectionGene(key=(-3, 1), weight=-0.8631475492114329, enabled=False)
	DefaultConnectionGene(key=(-3, 63), weight=-1.958387045250396, enabled=True)
	DefaultConnectionGene(key=(-3, 514), weight=0.8864010788492427, enabled=True)
	DefaultConnectionGene(key=(-2, 62), weig

#### Visualize Neural Network
If you have Graphviz installed, you can view the resulting genome in `pole-balancing-winner-genome.svg`.

### Testing the performance of our agent
Let's use our winner stored with `pickle` to run the OpenAI gym environment again

In [15]:
def test_pole_balancing():
    # load the winner
    with open('winner-pole-balance', 'rb') as f:
        winner = pickle.load(f)


    # Load the config file, which is assumed to live in
    # the same directory as this script.
    local_dir = os.path.abspath('')
    config_path = os.path.join(local_dir, 'config-pole-balance')
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                        neat.DefaultSpeciesSet, neat.DefaultStagnation,
                        config_path)

    net = neat.nn.FeedForwardNetwork.create(winner, config)
    env = gym.make('CartPole-v1')
    observation = env.reset()
    fitness = 0.0
    done = False
    while not done:
        action = np.argmax(net.activate(observation)) 
        observation, reward, done, info = env.step(action)
        env.render()
        fitness += reward
    env.close()
    print('The score/fitness for this trained agent run is', fitness) # Number of frames/steps this run lasted
test_pole_balancing()

The score/fitness for this trained agent run is 500.0


### Mountain Car Problem
A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum. More about this environment can be explored at https://gym.openai.com/envs/MountainCarContinuous-v0/

#### Inputs and Outputs 

Inputs/observations are:
* Cart Position
* Cart Velocity

Outputs/actions are:
* Power Coefficient between -1 and 1

Fitness Function/Reward:
* A reward of 100 is awarded if the agent reaches the flag (position = 0.45) on top of the mountain. The reward is decreased based on the amount of energy consumed at each step.

Termination:
* The car position is more than 0.45
* Episode length is greater than 200
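
To make the reward concrete, here is a small sketch of the per-step reward as I understand Gym's `MountainCarContinuous-v0` to compute it: an energy penalty of 0.1 times the squared action on every step, plus 100 on the step that reaches the flag. The function name `step_reward` is made up for illustration, and the 0.1 coefficient is worth checking against the Gym source for your installed version.

```python
def step_reward(action, reached_flag):
    """Reward for one step: an energy penalty, plus 100 on reaching the flag."""
    reward = 100.0 if reached_flag else 0.0
    reward -= 0.1 * action ** 2
    return reward


print(step_reward(0.0, True))   # 100.0: reached the flag while coasting
print(step_reward(1.0, False))  # -0.1: full power, flag not reached yet

# Total for a hypothetical episode: 50 full-power steps, the last reaching the flag.
total = sum(step_reward(1.0, step == 49) for step in range(50))
print(total)
```

Under this formula, a total fitness of about 99 means that the summed energy penalty over the whole episode stayed a little under 1.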

Hyperparameters:
* Again, optimizing training for any NEAT problem takes a lot of tweaking
* However, our activation must be `clamped` because we want an output between -1 and 1
* Sigmoid will not work for this problem, because its output is always between 0 and 1 and the car could never push in the negative direction
* Fitness threshold of 0, because the only time fitness is greater than 0 is when the car reaches the flag
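
The difference between the two activations is easy to see by comparing their output ranges. The sketch below uses a plain clamp and a plain logistic sigmoid written from scratch. `neat-python`'s built-in versions may scale their inputs differently, so treat the exact numbers as illustrative and only the ranges as the point.

```python
import math


def clamped(z):
    """Clip z to the range [-1, 1]."""
    return max(-1.0, min(1.0, z))


def sigmoid(z):
    """Plain logistic function; its output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Clamped can produce negative power (push left); sigmoid never can.
for z in (-3.0, -0.4, 0.0, 0.4, 3.0):
    print(f"z={z:+.1f}  clamped={clamped(z):+.2f}  sigmoid={sigmoid(z):.2f}")
```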

#### Check out the environment 
Let's take a look at the environment first with random inputs. We can run the next cell multiple times to see that randomly chosen actions rarely get the car to the flag.

In [2]:
import gym
# Create the environment
env = gym.make('MountainCarContinuous-v0')
# Initialize the environment
observation = env.reset()
# Loop until we reach termination
done = False
fitness = 0
while not done:
    observation, reward, done, info = env.step(env.action_space.sample()) # take a random action
    env.render()
    fitness += reward
env.close()
print('The score/fitness for this random sample run is', fitness) # Total reward for this run

As with the previous problem, we train our agent with NEAT. This takes a lot more time than the pole balancing problem.

In [17]:
def mountain_climbing():
    from train_mountain_car import MountainCar
    local_dir = os.path.abspath('')
    config_path = os.path.join(local_dir, 'config-mountain-car')
    winner, config = MountainCar.run(config_path)
    print(winner)

    # Save the winner.
    with open('winner-mountain-car', 'wb') as f:
        pickle.dump(winner, f)

    # Visualize the resulting neural network if possible
    try:
        import visualize
        node_names = {-1: 'Cart Position', -2: 'Cart Velocity', 0: 'Power Coefficient'}
        visualize.draw_net(config, winner, filename='mountaincar-winner-genome', node_names=node_names)
    except Exception:
        pass
mountain_climbing()

Key: 10584
Fitness: 99.1209235786025
Nodes:
	0 DefaultNodeGene(key=0, bias=0.2808552807809455, response=1.0, activation=clamped, aggregation=sum)
	228 DefaultNodeGene(key=228, bias=0.1735738010381113, response=1.0, activation=clamped, aggregation=sum)
	2236 DefaultNodeGene(key=2236, bias=0.7021845169629942, response=1.0, activation=clamped, aggregation=sum)
Connections:
	DefaultConnectionGene(key=(-2, 0), weight=5.980447707587803, enabled=True)
	DefaultConnectionGene(key=(-2, 228), weight=-0.6500711894609923, enabled=True)
	DefaultConnectionGene(key=(-2, 2236), weight=-0.3022207361846107, enabled=True)
	DefaultConnectionGene(key=(228, 2236), weight=-0.6273478660315946, enabled=True)
	DefaultConnectionGene(key=(2236, 0), weight=-0.4132314387970404, enabled=True)


#### Visualize Neural Network
If you have Graphviz installed, you can view the resulting genome in `mountaincar-winner-genome.svg`.

### Testing the performance of our agent
Let's use our winner stored with `pickle` to run the OpenAI gym environment again

In [19]:
def test_mountain_climbing():
    # load the winner
    with open('winner-mountain-car', 'rb') as f:
        winner = pickle.load(f)


    # Load the config file, which is assumed to live in
    # the same directory as this script.
    local_dir = os.path.abspath('')
    config_path = os.path.join(local_dir, 'config-mountain-car')
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                        neat.DefaultSpeciesSet, neat.DefaultStagnation,
                        config_path)

    net = neat.nn.FeedForwardNetwork.create(winner, config)
    env = gym.make('MountainCarContinuous-v0')
    observation = env.reset()
    fitness = 0.0
    done = False
    while not done:
        action = net.activate(observation)
        observation, reward, done, info = env.step(action)
        env.render()
        fitness += reward
    env.close()
    print('The score/fitness for this trained agent run is', fitness) # Total reward for this run
test_mountain_climbing()

The score/fitness for this trained agent run is 98.97010562724965
