In [None]:
import numpy as np

# PyBeast Tutorial 3: Evolving agents

In this tutorial, you will learn how to use a genetic algorithm in BEAST to train agents to perform tasks. A genetic algorithm (GA) is a method for solving optimization problems that mimics the natural selection processes of biological evolution.

To effectively train a population of agents, we need to define a performance metric which allows us to compare the performance of different agents. The performance metric is commonly referred to as the agent's fitness. Recall that in BEAST, simulations consist of runs, generations and assessments. After each generation, the genetic algorithm randomly selects pairs of individuals called parents. Agents that achieve higher fitness scores during the assessments are selected with a higher probability. Once a pair of parents is selected, the genetic algorithm combines their genomes to generate a new pair of agents referred to as children. This combination process involves random crossovers and mutations of the parent genomes, which will be explained later. Typically, the GA generates as many children as there were agents in the parent generation, i.e. it conserves the number of agents in the population.
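The generational loop described above can be sketched in a few lines of NumPy. This is a toy illustration on a made-up optimization problem, not BEAST's implementation: the function `evolve`, its parameter names and the quadratic fitness function are assumptions chosen for the example.

```python
import numpy as np

def evolve(fitness_fn, genome_length, pop_size=30, generations=40,
           crossover_prob=0.25, mutation_prob=0.1, sigma=0.2, seed=0):
    """Toy generational GA (hypothetical helper); needs genome_length >= 2."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(-1.0, 1.0, size=(pop_size, genome_length))
    history = []
    for _ in range(generations):
        fitness = np.array([fitness_fn(genome) for genome in population])
        history.append(fitness.mean())
        # Shift the scores so that they are positive selection weights
        weights = fitness - fitness.min() + 1e-9
        probs = weights / weights.sum()
        children = []
        while len(children) < pop_size:
            # Fitter agents are more likely to be picked as parents
            i, j = rng.choice(pop_size, size=2, p=probs)
            a, b = population[i].copy(), population[j].copy()
            if rng.random() < crossover_prob:
                # Single-point crossover: swap the genome tails
                cut = rng.integers(1, genome_length)
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for child in (a, b):
                # Perturb a random subset of genes with Gaussian noise
                mask = rng.random(genome_length) < mutation_prob
                child[mask] += rng.normal(0.0, sigma, size=mask.sum())
                children.append(child)
        # Keep the population size constant from one generation to the next
        population = np.array(children[:pop_size])
    return population, history

# Toy task: the fitness is highest when all genes are zero
population, history = evolve(lambda g: -np.sum(g ** 2), genome_length=5)
```

The list `history` holds the mean fitness of each generation, which is the kind of curve one would plot to check whether a population is improving.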

At this point, you might wonder what we mean by "genome". Typically, the genome of an agent is a list of hyperparameters that determine the control logic of the agent and, thereby, how it responds to its environment. The GA aims to find values for these hyperparameters that maximize the agent's fitness. To design an evolvable agent in BEAST, the user must specify the list of hyperparameters that constitute its genome. This should be done by inheriting from the `core.evolver.Evolver` class and implementing the `Evolver.GetGenotype` and `Evolver.SetGenotype` methods.
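To make the idea of a genome concrete, here is a minimal standalone sketch of the two methods. The class `ToyEvolvable` and its attributes `speed` and `turn_gain` are hypothetical; the class deliberately does not inherit from BEAST's `Evolver`, and the real interface may expect different signatures or return types.

```python
class ToyEvolvable:
    """Standalone stand-in for an evolvable agent; does not import PyBeast."""

    def __init__(self, speed=0.5, turn_gain=1.0):
        # Two made-up hyperparameters that shape the agent's behaviour
        self.speed = speed
        self.turn_gain = turn_gain

    def GetGenotype(self):
        # Flatten the tunable parameters into a plain list of floats
        return [self.speed, self.turn_gain]

    def SetGenotype(self, genotype):
        # Restore the parameters in the same order used by GetGenotype
        self.speed, self.turn_gain = genotype

# Copy the genome of one agent into another
parent = ToyEvolvable(speed=0.3, turn_gain=2.0)
child = ToyEvolvable()
child.SetGenotype(parent.GetGenotype())
```

The important design point is that both methods agree on the order of the genes, so that a genome produced by one agent can be loaded into another.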

BEAST predefines several evolvable agents, which can be found in the `core.agents.neuralanimat` module. The agent classes in the ```neuralanimat``` module implement a neural network, referred to as the agent's brain, that controls the agent's behaviour. To be more precise, the neural network takes the outputs of the agent's sensors as input and generates a vector-valued output. This output vector can then be used as the agent's control values. For instance, in the case of a Braitenberg vehicle, two control values are required to actuate the vehicle's left and right wheels. You can refer to the first tutorial for more details.

The output values generated by the agent's brain (neural network) for a given sensor input depend on the network's weights and biases. Hence, BEAST defines the agent's genome as the list of all the weights and biases that constitute the agent's brain. This means that the genetic algorithm aims to optimize the connections in the agent's brain to achieve optimal control for a given task.
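As a rough illustration of how weights and biases turn a sensor reading into control values, consider the following NumPy sketch. `TinyFeedForwardNet` is a hypothetical class written for this example, not BEAST's network; the layer sizes and the use of `tanh` as activation function are assumptions.

```python
import numpy as np

class TinyFeedForwardNet:
    """Illustrative one-hidden-layer network; not PyBeast's FeedForwardNet."""

    def __init__(self, inputs=1, hidden=4, outputs=2, seed=0):
        rng = np.random.default_rng(seed)
        # Weights and biases start out random in the range [-1, 1]
        self.w_hidden = rng.uniform(-1.0, 1.0, size=(hidden, inputs))
        self.b_hidden = rng.uniform(-1.0, 1.0, size=hidden)
        self.w_out = rng.uniform(-1.0, 1.0, size=(outputs, hidden))
        self.b_out = rng.uniform(-1.0, 1.0, size=outputs)

    def fire(self, sensor_values):
        # tanh keeps every activation between -1 and 1
        h = np.tanh(self.w_hidden @ sensor_values + self.b_hidden)
        return np.tanh(self.w_out @ h + self.b_out)

    def genome(self):
        # The genome is every weight and bias, flattened into one vector
        return np.concatenate([self.w_hidden.ravel(), self.b_hidden,
                               self.w_out.ravel(), self.b_out])

net = TinyFeedForwardNet()
controls = net.fire(np.array([0.3]))  # one sensor reading in, two control values out
genes = net.genome()
```

Changing any entry of `genes` changes the mapping from sensor input to controls, which is what the GA exploits when it mutates and recombines genomes.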

Let's explore how we can use the concepts we have discussed so far in a simple example. We use the ```Mouse``` class from the previous tutorial as a template and make it evolvable. Our goal is to use a GA to train a population of mice to efficiently locate the cheese in their environment.

## Evolvable mouse

To equip the mouse with a brain and make it evolvable, we implement an ```EvoMouse``` class that inherits from ```core.agents.neuralanimat.EvoFFNAnimat```.

In [None]:
from pybeast.core.world.worldobject import WorldObject
from pybeast.core.sensors.sensor import NearestAngleSensor
from pybeast.core.utils.colours import ColourPalette, ColourType
from pybeast.core.agents.neuralanimat import EvoFFNAnimat

class EvoMouse(EvoFFNAnimat):
    """
    Evolvable mouse with a feed-forward neural network as its brain.
    """

    def __init__(self, hidden = 4):
        
        super().__init__()

        self.cheesesFound = 0
        self.SetSolid(False)
        self.SetRadius(10.0)
        
        sensorRange = 400
        self.AddSensor("angle", NearestAngleSensor(Cheese, range=sensorRange))
        self.SetInteractionRange(sensorRange)
        self.AddFFNBrain(hidden = hidden)
    
    def Control(self):
        """
        Override the EvoFFNAnimat control method.
        """
        # Fire the brain; its outputs lie in the range [-1.0, 1.0]
        super().Control()

        # Rescale each control value from [-1.0, 1.0] to [0.0, 1.0]
        for k in self.controls:
            self.controls[k] = 0.5 * (self.controls[k] + 1.0)

    def Reset(self):
        """
        Resets mouse after assessment
        """
        self.cheesesFound = 0
        super().Reset()

    def OnCollision(self, obj):
        """
        This is identical to the OnCollision method for Mouse, except here we are also recording the number of
        cheeses eaten.
        """
        if isinstance(obj, Cheese):
            self.cheesesFound += 1
            obj.Eaten()

    def GetFitness(self) -> float:
        """
        The EvoMouse's fitness is the number of cheeses collected, divided by
        the base-10 logarithm of the distance travelled, so a mouse is penalised
        for simply charging around as fast as possible and thereby randomly
        collecting cheese. It needs to find the cheese efficiently.
        """
        # Note: log10 is zero or negative if distanceTravelled <= 1
        return self.cheesesFound / np.log10(self.distanceTravelled)

class Cheese(WorldObject):
    """Represents a cheese object."""

    def __init__(self):
        """Initialize a new Cheese object."""
        super().__init__()
        self.SetRadius(5.0)
        self.SetColour(*ColourPalette[ColourType.COLOUR_YELLOW])

    def Eaten(self):
        """Respawn cheese at random location after it's eaten."""
        self.location = self.myWorld.RandomLocation()


Let's create an instance of ```EvoMouse``` and go through its implementation step by step.

In [None]:
myEvoMouse = EvoMouse()
myEvoMouse.Init()

In the constructor ```EvoMouse.__init__```, we add a nearest angle sensor and a brain to the mouse using the ```Animat.AddSensor``` and the ```EvoFFNAnimat.AddFFNBrain``` methods. The brain (neural network) can be accessed via the ```myBrain``` attribute

In [None]:
myEvoMouse.myBrain

which is an instance of `core.control.feedforwardnet.FeedForwardNet`. In addition to simple feed-forward networks, BEAST also supports dynamic brains with recurrent connections, which can be used by inheriting from the ```core.agents.neuralanimat.EvoDNNAnimat``` class.

A ```FeedForwardNet``` consists of an input, a hidden and an output layer. The number of neurons within each layer can be accessed via the attributes

In [None]:
myEvoMouse.myBrain.inputs, myEvoMouse.myBrain.hidden, myEvoMouse.myBrain.outputs

By default, the number of inputs to the network is equal to the number of the animat's sensors. The number of neurons in the hidden layer can be controlled by passing the ```hidden``` argument to the ```EvoFFNAnimat.AddFFNBrain``` method, which defaults to 4. The number of neurons in the output layer defaults to the number of the agent's control values, which is two

In [None]:
myEvoMouse.controls

The network's ```Fire``` method   

In [None]:
myEvoMouse.myBrain.Fire()

takes the angle sensor's output

In [None]:
myEvoMouse.sensors['angle'].GetOutput()

as an input and generates a two-dimensional output vector. This output vector can be retrieved using the ```FeedForwardNet.GetOutputs``` method

In [None]:
myEvoMouse.myBrain.GetOutputs()

The output values that the agent's brain generates for a given input depend on the neural network's weights, biases and activation function. By default, the network uses a sigmoidal activation function which constrains the output values to the range from -1.0 to 1.0. Weights and biases are initialized randomly within the range -1.0 to 1.0. The current values of the weights and biases can be accessed using the ```FeedForwardNet.GetConfiguration``` method

In [None]:
configuration = myEvoMouse.myBrain.GetConfiguration()

which returns the configuration dictionary. The weights and biases of the hidden and output layers can be accessed via

In [None]:
configuration['hidden']

and

In [None]:
configuration['output']

In our example, there are four hidden neurons, i.e.

In [None]:
len(configuration['hidden'])

Each hidden neuron receives the same input value from the angle sensor, i.e. each hidden neuron has one input weight and one bias. To access the weight and bias of the first and last hidden neuron, we use

In [None]:
configuration['hidden'][0], configuration['hidden'][3]

Furthermore, the network has two output neurons, i.e.

In [None]:
len(configuration['output'])

Each output neuron receives inputs from the four hidden neurons, i.e. each output neuron has four weights and one bias. The weights and bias of both output neurons can be accessed via

In [None]:
configuration['output'][0], configuration['output'][1]

As we explained in the introduction, to make an agent evolvable, we need to specify its genome. The ```EvoFFNAnimat``` class defines the genome as the list of weights and biases that constitute its brain. The genome can be accessed using the ```GetGenotype``` method

In [None]:
myEvoMouse.GetGenotype()

For a feed-forward network with one input, four hidden neurons and two output neurons, we expect a total of 18 weights and biases: 4 × (1 + 1) = 8 in the hidden layer plus 2 × (4 + 1) = 10 in the output layer

In [None]:
len(myEvoMouse.GetGenotype())
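
The expected genome length follows from simple counting: every neuron has one weight per incoming connection plus one bias. The helper `count_parameters` below is a hypothetical function written for this check and is not part of BEAST.

```python
def count_parameters(inputs, hidden, outputs):
    """Number of weights and biases in a network with one hidden layer."""
    # Each hidden neuron: one weight per input plus one bias
    hidden_params = hidden * (inputs + 1)
    # Each output neuron: one weight per hidden neuron plus one bias
    output_params = outputs * (hidden + 1)
    return hidden_params + output_params

n_genes = count_parameters(inputs=1, hidden=4, outputs=2)
print(n_genes)  # 18
```

Adding a second sensor would raise the count to `count_parameters(2, 4, 2)`, i.e. 22, because each hidden neuron gains one weight.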

In BEAST, the GA uses the agent's ```GetGenotype``` and ```SetGenotype``` methods to retrieve, set and optimize the agent's hyperparameters. In our example, the hyperparameters are the weights and biases of the agent's brain. The GA uses the agent's ```GetFitness``` method to retrieve its fitness score. In our example, the ```EvoMouse.GetFitness``` method returns the number of cheeses found by the mouse during the assessment, divided by the base-10 logarithm of the distance it travelled.
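To get a feeling for the fitness expression used in `EvoMouse.GetFitness`, the following standalone sketch evaluates it for two made-up mice. The function name `efficiency_fitness` and the numbers are illustrative assumptions; the formula itself is copied from the class above and is only meaningful for distances greater than 1.

```python
import numpy as np

def efficiency_fitness(cheeses_found, distance_travelled):
    # Same expression as in EvoMouse.GetFitness; assumes distance_travelled > 1
    return cheeses_found / np.log10(distance_travelled)

# Both mice find five cheeses, but the second one travels 100 times further
careful = efficiency_fitness(5, 1_000.0)      # 5 / 3
reckless = efficiency_fitness(5, 100_000.0)   # 5 / 5
```

Because of the logarithm, the penalty grows slowly: travelling 100 times further reduces the fitness from about 1.67 to 1.0 rather than by a factor of 100.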

To test our implementation, let's create a simulation class that uses a genetic algorithm to evolve a population of mice that hunt for cheese.  

In [None]:
from pybeast.core.simulation import Simulation
from pybeast.core.evolve.geneticalgorithm import GeneticAlgorithm, GASelectionType
from pybeast.core.evolve.population import Group, Population

class EvoMouseSimulation(Simulation):
    """Represents a simulation with mice and cheese."""

    def __init__(self):
        """Initialize a new EvoMouseSimulation."""
        super().__init__('Mouse')

        self.SetRuns(1)
        self.SetGenerations(10)        
        self.SetTimeSteps(500)
        self.theGA = GeneticAlgorithm(crossover = 0.25, mutation = 0.1, selection = GASelectionType.GA_ROULETTE)

        mice = Population(30, EvoMouse, self.theGA)
        cheese = Group(30, Cheese)
        # Adds population of mice and group of cheese to the simulation 
        self.Add('theMice', mice)
        self.Add('thecheese', cheese)
        
        self.whatToLog['Simulation'] = self.whatToLog['Run'] = self.whatToLog['Generation'] = True
        self.whatToSave['Simulation'] = self.whatToSave['Run'] = self.whatToSave['Generation'] = True

    def LogEndGeneration(self):
        super().LogEndGeneration()
        self.logger.info(f'Average fitness {self.avgFitness:.5f}')

    def CreateDataStructSimulation(self):
        self.data = {}

    def CreateDataStructRun(self):
        self.averageFitness = []

    def SaveGeneration(self):
        # Record the population's mean fitness for this generation
        self.avgFitness = np.mean(self.contents['theMice'].AverageFitnessScoreOfMembers())
        self.averageFitness.append(self.avgFitness)

    def SaveRun(self):
        self.data[f'Run{self.Run}'] = self.averageFitness


In the constructor `EvoMouseSimulation.__init__`, we set the number of runs, generations per run, and time steps, and create an instance of the `core.evolve.geneticalgorithm.GeneticAlgorithm` class, specifying the crossover probability, mutation probability and selection method of the GA. The `GeneticAlgorithm` class supports three selection methods: roulette, rank, and tournament. Here is a brief overview of each:

- Roulette selection: An agent's selection probability is calculated directly from its fitness score, so fitter agents are selected more often.
- Rank selection: An agent's selection probability is determined by its rank, which in turn is determined by its fitness relative to the other agents.
- Tournament selection: A subset of agents is selected at random, and the fittest among the selected agents is chosen.
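
The three selection schemes can be sketched as follows. These are textbook versions written for illustration, assuming non-negative fitness scores, a linear rank weighting and a tournament size of three; the function names are hypothetical and BEAST's `GeneticAlgorithm` may differ in its details.

```python
import numpy as np

def roulette_probabilities(fitness):
    # Selection probability proportional to the (non-negative) fitness
    fitness = np.asarray(fitness, dtype=float)
    return fitness / fitness.sum()

def rank_probabilities(fitness):
    # Rank 1 is the least fit agent, rank n the fittest
    ranks = np.argsort(np.argsort(fitness)) + 1
    return ranks / ranks.sum()

def tournament_select(fitness, rng, size=3):
    # Draw `size` distinct contenders and return the index of the fittest
    contenders = rng.choice(len(fitness), size=size, replace=False)
    return contenders[np.argmax(np.asarray(fitness)[contenders])]

fitness = [1.0, 3.0, 6.0]
roulette = roulette_probabilities(fitness)  # [0.1, 0.3, 0.6]
rank = rank_probabilities(fitness)          # [1/6, 2/6, 3/6]
# With three agents and three contenders, the fittest agent always wins
winner = tournament_select(fitness, np.random.default_rng(0), size=3)
```

Note how rank selection flattens the differences: the fittest agent is six times fitter than the weakest, but only three times as likely to be selected.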

Once the genetic algorithm has selected two parent agents, there is a chance, given by the crossover probability, that their genomes will be crossed over. After the crossover, each gene (hyperparameter) has a chance, given by the mutation probability, of being mutated (changed in value). By default, the genetic algorithm uses Gaussian noise with zero mean and standard deviation 0.2 for mutations. Note that the size of the variations introduced by the mutations needs to be of a similar range as the hyperparameters. In our example, weights and biases are initialized within the range -1.0 to 1.0, i.e. a standard deviation of 0.2 is a reasonable choice. `GeneticAlgorithm` has additional parameters that control the selection and the creation of the next generation. Have a look at the source code to find more information.
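The two operators can be written down in isolation as follows. This is a sketch under assumptions: it uses single-point crossover and adds the Gaussian noise to the existing gene value, which may not match what `GeneticAlgorithm` does internally; the function names are hypothetical, while the default values mirror the ones quoted in this tutorial.

```python
import numpy as np

def crossover(parent_a, parent_b, rng, crossover_prob=0.25):
    """Return two children; with probability crossover_prob their tails are swapped."""
    a, b = parent_a.copy(), parent_b.copy()
    if rng.random() < crossover_prob:
        cut = rng.integers(1, len(a))  # cut point strictly inside the genome
        a[cut:], b[cut:] = parent_b[cut:], parent_a[cut:]
    return a, b

def mutate(genome, rng, mutation_prob=0.1, sigma=0.2):
    """Perturb each gene independently with probability mutation_prob."""
    mutated = genome.copy()
    mask = rng.random(len(genome)) < mutation_prob
    mutated[mask] += rng.normal(0.0, sigma, size=mask.sum())
    return mutated

rng = np.random.default_rng(0)
mum, dad = np.zeros(18), np.ones(18)
# Force a crossover so that the effect is visible
child_a, child_b = crossover(mum, dad, rng, crossover_prob=1.0)
child_a = mutate(child_a, rng)
```

With genes in the range -1.0 to 1.0, a noise level of 0.2 nudges a gene without overwriting it, which is the sense in which the mutation size should match the scale of the hyperparameters.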

To add an evolvable population of `EvoMouse` agents to our simulation, we first create an instance of the `core.evolve.population.Population` class, specifying the population size and agent type, and passing a reference to the `GeneticAlgorithm` instance. `Population` implements the ```EndGeneration``` method, which is called at the end of each generation and takes care of evolving the agents and creating the next generation. Similar to groups, populations can be added to the simulation using the `Simulation.Add` method. To analyse how well the mice learn their task, we create data structures and store the average fitness of each generation during the separate runs. Let's run and render the simulation to see what it looks like

In [None]:
import matplotlib.pyplot as plt

simulation = EvoMouseSimulation()
simulation.RunSimulation(render=True)

To determine whether the mice are improving at finding the cheese, we can plot the population's fitness across generations. Let's rerun the simulation with more generations and for multiple runs to observe the trend more accurately. To accelerate the simulation, we'll disable rendering.

In [None]:
simulation.SetRuns(2)
simulation.SetGenerations(50)
simulation.RunSimulation(render=False)

for i in range(simulation.Runs):
    averageFitness = simulation.data[f'Run{i}']
    # Pass the label as a string so that it shows up in the legend
    plt.plot(averageFitness, label=f'Run={i}')

plt.xlabel('generation')
plt.ylabel('average fitness')
plt.legend()

plt.show()