# CS486 - Artificial Intelligence
## Lesson 9 - Genetic Algorithms

A genetic algorithm is another stochastic search method: it iteratively improves a population of candidate solutions toward a goal. We'll use the AIMA library to implement a genetic algorithm that matches a target string.

In [1]:
from helpers import romania
from aima.search import *
from aima.notebook import psource

# we'll use pandas to nicely display and filter data
from pandas import DataFrame

## String Matching

We are going to design a genetic algorithm that starts from a population of random strings and, over many generations, produces a string that matches some target. First, let's choose a target string to match:

In [2]:
target = "Azure FTW!" # start with around 10 characters

Next, we'll use the AIMA **`init_population`** function to generate our initial population:

```python
init_population(pop_size, gene_pool, ind_size)
```

Population size is a tradeoff: a larger population samples more of the search space in each generation, but costs more computation and memory. For this problem, `100` works well. **`gene_pool`** contains all the possible values that can be used to generate a member of the population. Our gene pool will be the printable ASCII characters with codes 32 through 122:

In [3]:
pop_size = 100
ind_size = len(target)
gene_pool = [chr(x) for x in range(32, 123)]

print("Gene pool:", ''.join(gene_pool))

Gene pool:  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz


Finally, `ind_size` is the number of genes in each individual, which here is the length of the target string.

In [4]:
population = init_population(pop_size,gene_pool,ind_size)

# we'll use a DataFrame to generate a nice table
DataFrame.from_dict({'Individual': population})

| Index | Individual |
| --- | --- |
| 0 | `[2, D, x, 9, ^, r, s, J, W, >]` |
| 1 | `[y, 2, 2, v, v, 7, v, v, 7, 0]` |
| 2 | `[s, 1, \, f, A, -, *, a, g, U]` |
| 3 | `[], ], j, P, &, 2, G, ", e, Z]` |
| 4 | `[>, h, >, N, b, d, @, c, ", ;]` |
| 5 | `[C, _, y, v, p, \, #, :, ;, 1]` |
| 6 | `[9, <, M, :, 7, +, /, ., V, b]` |
| 7 | `[R, s, ;, r, (, y, 5, @, y, _]` |
| 8 | `[j, ], A, U, 4, ^, $, l, x, S]` |
| 9 | `[B, G, -, *, H, h, R, !, k, g]` |
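To make the three parameters concrete, here is a minimal sketch of what a population initializer of this shape could look like. The name `init_population_sketch` and the optional `rng` parameter are hypothetical and are not part of the AIMA library; the sketch only assumes that each gene is drawn uniformly at random from the gene pool.

```python
import random

def init_population_sketch(pop_size, gene_pool, ind_size, rng=random):
    """Build pop_size individuals, each a list of ind_size genes.

    Illustrative stand-in only: every gene is drawn uniformly from gene_pool.
    """
    return [[rng.choice(gene_pool) for _ in range(ind_size)]
            for _ in range(pop_size)]

# Same ASCII range as the notebook's gene pool (codes 32 through 122).
gene_pool = [chr(x) for x in range(32, 123)]

sample = init_population_sketch(pop_size=5, gene_pool=gene_pool, ind_size=10)
for individual in sample:
    print(''.join(individual))
```

Each individual is a list of single characters rather than a string, which matches the rows shown in the table above.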


## Fitness 

Next, we need to define a function that quantifies the fitness of individuals in the population. The more fit an individual, the more likely it is to combine genes with other members of the population, i.e. "reproduce".

The fitness function takes an individual - a word in our case - and returns a number. The larger the number returned by the fitness function, the more fit the individual. 

In [5]:
def fitness(word):
    return 0

In [6]:
def fitness(word):
    score = 0
    for i in range(len(word)):
        if word[i] == target[i]:
            score += 1
    return score
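As a quick sanity check, the same position-by-position count can be applied to a hand-made candidate. This is a small standalone sketch: it redefines `target` and an equivalent `fitness` using `zip` so that it runs on its own, and the candidate string is made up for illustration.

```python
target = "Azure FTW!"

def fitness(word):
    # Count the positions where the candidate agrees with the target.
    return sum(1 for got, want in zip(word, target) if got == want)

candidate = list("Azure XYZ!")   # hypothetical individual: 7 of 10 positions match
print(fitness(candidate))        # 7
print(fitness(list(target)))     # 10, the maximum possible score
```

A perfect match scores `len(target)`, which is the stopping condition used later in the lesson.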

The table below shows the individuals in the initial population that have non-zero fitness, sorted from most to least fit. These are the individuals that will contribute the most to the next generation.

In [7]:
df = DataFrame.from_dict({
    'Individual': population, 
    'Fitness': [fitness(i) for i in population]
}).sort_values(by=['Fitness'], ascending=False)

df[df['Fitness'] != 0]

| Index | Individual | Fitness |
| --- | --- | --- |
| 47 | `[&, =, #, r, %, W, ), T, \, *]` | 2 |
| 0 | `[2, D, x, 9, ^, r, s, J, W, >]` | 1 |
| 64 | `[p, z, d, Z, t, N, q, 6, /, <]` | 1 |
| 33 | `[T, i, C, p, ., %, n, l, W, 1]` | 1 |
| 21 | `[P, K, u, -, _, J, y, M, l, A]` | 1 |
| 20 | `[w, $, F, 7, B, 6, B, e, +, !]` | 1 |
| 19 | `[9, $, X, 4, Y, L, V, T, $, Q]` | 1 |
| 39 | `[k, r, G, (, m, (, +, *, +, !]` | 1 |
| 15 | `[7, >, I, P, W, , Q, 7, y, ;]` | 1 |
| 42 | `[U, 6, q, t, p, , N, n, C, 9]` | 1 |


The **`select`** and **`recombine`** functions in the AIMA library select individuals from the population and recombine their genes.
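The sketch below illustrates one common way to implement these two steps: fitness-proportional selection followed by single-point crossover. It is an illustration under stated assumptions, not the AIMA source. The names `select_sketch` and `recombine_sketch`, the `rng` parameter, the uniform fallback when every fitness is zero, and the three hand-made individuals are all hypothetical.

```python
import random

target = "Azure FTW!"

def fitness(word):
    # Count the positions where the candidate agrees with the target.
    return sum(1 for got, want in zip(word, target) if got == want)

def select_sketch(r, population, fitness_fn, rng=random):
    """Pick r parents, each with probability proportional to its fitness."""
    weights = [fitness_fn(ind) for ind in population]
    if sum(weights) == 0:
        # Assumption: if nobody scores above zero, fall back to a uniform draw.
        return rng.choices(population, k=r)
    return rng.choices(population, weights=weights, k=r)

def recombine_sketch(x, y, rng=random):
    """Single-point crossover: the head of x joined to the tail of y."""
    c = rng.randrange(0, len(x))
    return x[:c] + y[c:]

population = [list("Azure#####"), list("##### FTW!"), list("##########")]
mother, father = select_sketch(2, population, fitness)
print(''.join(recombine_sketch(mother, father)))
```

Because the child takes every gene from one parent or the other at the same position, crossover can only rearrange genes that already exist in the population.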

## Mutation

It's unlikely that our population will converge on the goal strictly through reproduction. An important gene may not be represented in our population at all. New genes are introduced into each generation through *mutation*: a gene in a randomly selected individual is replaced with a random gene from the gene pool.

Here is the mutation function used by the AIMA library: 

In [8]:
psource(mutate)
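For readers without the library source at hand, a mutation step in this style can be sketched as follows. This is a hedged illustration rather than the AIMA implementation: `mutate_sketch` and its `rng` parameter are hypothetical names, and the sketch assumes that at most one gene is replaced per call.

```python
import random

def mutate_sketch(x, gene_pool, pmut, rng=random):
    """With probability pmut, replace one randomly chosen gene of x.

    Returns x unchanged otherwise. The input list is never modified in place.
    """
    if rng.random() >= pmut:
        return x
    position = rng.randrange(len(x))
    return x[:position] + [rng.choice(gene_pool)] + x[position + 1:]

gene_pool = [chr(c) for c in range(32, 123)]
original = list("Azure FTW?")
print(''.join(mutate_sketch(original, gene_pool, pmut=1.0)))  # always attempts a mutation
```

Note that the replacement gene is drawn from the whole pool, so a mutation can occasionally reproduce the gene that was already there.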

Longer strings take longer to converge, so we'll make the mutation rate a function of string size:

In [9]:
mutation_rate = 1 / ind_size
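A quick back-of-the-envelope calculation shows what this rate means in practice. The sketch assumes the mutation step is applied once per child and fires with probability `mutation_rate`; the variable `expected_mutants` is introduced here only for illustration.

```python
pop_size = 100
ind_size = len("Azure FTW!")        # 10 genes per individual
mutation_rate = 1 / ind_size        # probability that a given child is mutated

# Expected number of mutated children per generation under the assumption above.
expected_mutants = pop_size * mutation_rate
print(mutation_rate, expected_mutants)
```

With a 10-character target, roughly one child in ten is mutated, so most of each generation is produced by recombination alone.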

## Re-populating

The following code uses the fitness function to select pairs of parents, recombines each pair, and mutates the result to produce the next generation.

In [10]:
# one child per member of the current population
next_gen = [mutate(recombine(*select(2, population, fitness)), gene_pool, mutation_rate)
            for _ in population]

On average, the next generation should be more fit than the previous one:

In [11]:
DataFrame.from_dict({
    'Individual': next_gen, 
    'Fitness': [fitness(i) for i in next_gen]
}).sort_values(by=['Fitness'], ascending=False)

| Index | Individual | Fitness |
| --- | --- | --- |
| 1 | `[A, b, >, r, %, W, ), T, \, *]` | 3 |
| 50 | `[9, $, X, r, %, W, ), T, \, *]` | 2 |
| 66 | `[&, =, #, r, %, W, ), T, $, Q]` | 2 |
| 75 | `[A, b, >, Z, j, x, 1, T, $, Q]` | 2 |
| 25 | `[&, =, #, r, %, W, ), T, h, x]` | 2 |
| 53 | `[P, K, u, -, _, L, V, T, $, Q]` | 2 |
| 73 | `[&, =, #, r, %, W, ), T, \, _]` | 2 |
| 30 | `[7, >, I, P, W, , Q, 7, y, !]` | 2 |
| 69 | `[A, b, >, Z, j, L, V, T, $, Q]` | 2 |
| 33 | `[p, z, d, Z, t, N, n, l, W, 1]` | 2 |


Let's iterate and watch the algorithm progress:

In [12]:
import time
from IPython.display import HTML, clear_output, display

gen_num = 0
next_gen = population[:]
fittest = max(next_gen,key=fitness)

while (fitness(fittest) != ind_size):
    next_gen = [mutate(recombine(*select(2, next_gen, fitness)), gene_pool, mutation_rate)
        for i in range(len(population))]

    gen_num += 1
    fittest = max(next_gen,key=fitness)
    
    if gen_num % 10 == 0:
        display(HTML("<h1>{}: {}</h1>".format(gen_num, ''.join(fittest))))
        clear_output(wait=True)
        
display(HTML("<h1>{}: {}</h1>".format(gen_num, ''.join(fittest))))

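We can do all of that in one step using AIMA's **`genetic_algorithm`**. The fourth argument is the fitness threshold at which the search stops; passing `ind_size` means it stops once every character matches: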

In [13]:
%%time
genetic_algorithm(population,fitness,gene_pool,ind_size)

Wall time: 8.75 s


['A', 'z', 'u', 'r', 'e', ' ', 'F', 'T', 'W', '!']
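To see how the pieces fit together without the library, here is a self-contained sketch of the whole loop in plain Python. It is an illustration under stated assumptions, not a reproduction of AIMA's `genetic_algorithm`: the function name `run_ga_sketch`, its parameters, the uniform fallback when every fitness is zero, and the choice to remember the best individual seen so far are all hypothetical. It assumes a non-empty target.

```python
import random

def run_ga_sketch(target, pop_size=100, ngen=1000, seed=None):
    """Evolve lists of characters toward target.

    Returns (best individual seen, number of generations run).
    """
    rng = random.Random(seed)
    gene_pool = [chr(x) for x in range(32, 123)]
    ind_size = len(target)            # assumes a non-empty target
    pmut = 1 / ind_size               # same mutation rate as the lesson

    def fitness(word):
        return sum(1 for got, want in zip(word, target) if got == want)

    population = [[rng.choice(gene_pool) for _ in range(ind_size)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    generations = 0

    while fitness(best) < ind_size and generations < ngen:
        weights = [fitness(ind) for ind in population]
        if sum(weights) == 0:
            weights = None            # assumption: uniform selection if all scores are zero
        next_gen = []
        for _ in range(pop_size):
            # Fitness-proportional selection of two parents.
            x, y = rng.choices(population, weights=weights, k=2)
            # Single-point crossover builds a new list, so parents stay intact.
            c = rng.randrange(ind_size)
            child = x[:c] + y[c:]
            # Occasionally replace one gene with a random one from the pool.
            if rng.random() < pmut:
                child[rng.randrange(ind_size)] = rng.choice(gene_pool)
            next_gen.append(child)
        population = next_gen
        generations += 1
        # Keep the best individual seen so far, even if it was not passed on.
        best = max(population + [best], key=fitness)
    return best, generations

best, generations = run_ga_sketch("Azure FTW!", seed=0)
print(generations, ''.join(best))
```

Unlike the loop earlier in this lesson, which reports the fittest member of each generation, this sketch tracks the best individual across generations and caps the run at `ngen` generations, so it always terminates even if the target is never matched.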

# Writing a Blokus Agent

How can you write an agent using genetic algorithms?