In [1]:
from pprint import pprint
import random
import math

## Local Search - Genetic Algorithm

There are some key ideas in the Genetic Algorithm.

First, there is a problem of some kind that either *is* an optimization problem or the solution can be expressed in terms of an optimization problem.
For example, if we wanted to minimize the function

$$f(x) = \sum (x_i - 0.5)^2$$

where $n = 10$.
This *is* an optimization problem. Normally, optimization problems are much, much harder.

![Eggholder](http://www.sfu.ca/~ssurjano/egg.png)!

The function we wish to optimize is often called the **objective function**.
The objective function is closely related to the **fitness** function in the GA.
If we have a **maximization** problem, then we can use the objective function directly as a fitness function.
If we have a **minimization** problem, then we need to convert the objective function into a suitable fitness function, since fitness functions must always mean "more is better".

Second, we need to *encode* candidate solutions using an "alphabet" analogous to G, A, T, C in DNA.
This encoding can be quite abstract.
You saw this in the Self Check.
There a floating point number was encoded as bits, just as in a computer and a sophisticated decoding scheme was then required.

Sometimes, the encoding need not be very complicated at all.
For example, in the real-valued GA, discussed in the Lectures, we could represent 2.73 as....2.73.
This is similarly true for a string matching problem.
We *could* encode "a" as "a", 97, or '01100001'.
And then "hello" would be:

```
["h", "e", "l", "l", "o"]
```

or

```
[104, 101, 108, 108, 111]
```

or

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

In Genetics terminology, this is the **chromosome** of the individual. And if this individual had the **phenotype** "h" for the first character then they would have the **genotype** for "h" (either as "h", 104, or 01101000).

To keep it straight, think **geno**type is **genes** and **pheno**type is **phenomenon**, the actual thing that the genes express.
So while we might encode a number as 10110110 (genotype), the number itself, 182, is what goes into the fitness function.
The environment operates on zebras, not the genes for stripes.

## String Matching

You are going to write a Genetic Algorithm that will solve the problem of matching a target string.
Now, this is kind of silly because in order for this to work, you need to know the target string and if you know the target string, why are you trying to do it?
Well, the problem is *pedagogical*.
It's a fun way of visualizing the GA at work, because as the GA finds better and better candidates, they make more and more sense.

Now, string matching is not *directly* an optimization problem so this falls under the general category of "if we convert the problem into an optimization problem we can solve it with an optimization algorithm" approach to problem solving.
This happens all the time.
We have a problem.
We can't solve it.
We convert it to a problem we *can* solve.
In this case, we're using the GA to solve the optimization part.

And all we need is some sort of measure of the difference between two strings.
We can use that measure as a **loss function**.
A loss function gives us a score tells us how similar two strings are.
The loss function becomes our objective function and we use the GA to minimize it by converting the objective function to a fitness function.
So that's the first step, come up with the loss/objective function.
The only stipulation is that it must calculate the score based on element to element (character to character) comparisons with no global transformations of the candidate or target strings.

And since this is a GA, we need a **genotype**.
The genotype for this problem is a list of "characters" (individual letters aren't special in Python like they are in some other languages):

```
["h", "e", "l", "l", "o"]
```

and the **phenotype** is the resulting string:

```
"hello"
```

In addition to the generic code and problem specific loss function, you'll need to pick parameters for the run.
These parameters include:

1. population size
2. number of generations
3. probability of crossover
4. probability of mutation

You will also need to pick a selection algorithm, either roulette wheel or tournament selection.
In the later case, you will need a tournament size.
This is all part of the problem.

Every **ten** (10) generations, you should print out the fitness, genotype, and phenotype of the best individual in the population for the specific generation.
The function should return the best individual *of the entire run*, using the same format.

In [2]:
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

## get_parent

`get_parent` gets random parents based on the size of the target. **Used by**: [genetic_algorithm](#genetic_algorithm)

* **gene_set** string: possible values for the genes i.e. ALPHABET

* **target** string: the target in question

**returns** List: list of strings of randomly generated parent based on the target's length

In [3]:
def get_parent(gene_set: str, target: str)-> list:
    genotype = []
    while len(genotype) < len(target):
        size = min(len(target) - len(genotype), len(gene_set))
        genotype.extend(random.sample(gene_set, size))
    return genotype

In [4]:
# assertions/unit tests
test_target_1 = "hello"
assert get_parent(ALPHABET, test_target_1)

test_target_2 = "hello world"
assert get_parent(ALPHABET, test_target_2)

test_target_3 = "good night world"
assert get_parent(ALPHABET, test_target_3)

## initialize_population

`get_parent` gets random parents based on the size of the target. **Used by**: [genetic_algorithm](#genetic_algorithm)

* **gene_set** string: possible values for the genes i.e. ALPHABET

* **target** string: the target in question

**returns** List: list of lists of strings of randomly generated parent based on the target's length

In [5]:
def initialize_population(gene_set: str, target: str, population_size: int) ->list[list[str]]:
    population =[]
    for i in range(population_size):
        individual = get_parent(gene_set, target)
        population.append(individual)
    return population

In [6]:
# assertions/unit tests
test_target_1 = "hello"
assert initialize_population(ALPHABET, test_target_1, population_size = 2)

test_target_2 = "hello world"
assert initialize_population(ALPHABET, test_target_2, population_size = 5)

test_target_3 = "good night world"
assert initialize_population(ALPHABET, test_target_3, population_size = 10)

## fitness_evaluation

`fitness_evaluation` examines the child and compares it to the target to determine its fitness. **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population** list: list of lists that have singular string values that form the prediction
* **target**: string: the target in question

**returns** list: list of floats that form the fitness scores

In [7]:
def fitness_evaluation(population: list[list[str]], target: str) -> list[float]: 
    fitness_scores = []
    for i in range(len(population)):
        prediction_formatted = ''.join(population[i])
        score = sum(1 for expected, actual in zip(target, prediction_formatted) if expected == actual)
        score /= max(len(target),len(population[i]))
        fitness_scores.append(score)
    return fitness_scores

In [8]:
# assertions/unit tests
test_target_1 = "hello"
test_population_1 = [['h', 'e', 'k', 'l', ' '],
                     ['a', 'k', 'c', 'b', 'q']]
assert fitness_evaluation(test_population_1, test_target_1)

test_target_2 = "hello world"
test_population_2 = [['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                ['r', 'u', 'b', 'o', 'y', 'p', 'l', 'x', 'w', 'g', 'k'], 
                ['s', 'o', 'p', 'a', 'w', 'i', 'x', 'k', 'h', 'm', 'f']]
assert fitness_evaluation(test_population_2, test_target_2)

test_target_3 = "good night world"
test_population_3 = [['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                ['w', 'f', ' ', 'q', 'v', 'o', 't', 'd', 'i', 'c', 'a', 'm', 'h', 'l', 'j', 'n'],
                ['v', 's', 'q', 'f', 'i', 'x', 'p', ' ', 'u', 'c', 'e', 'h', 'y', 'd', 'n', 'o'],
                ['a', 'i', 'l', 'd', 'w', 'x', 'b', 'f', 'y', ' ', 'e', 'u', 'q', 'h', 'v', 'c']]
assert fitness_evaluation(test_population_3, test_target_3)

## survival_of_fittest

`survival_of_fittes` examines the child and compares it to the target to determine its fitness. **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population** list: list of list that contains singular string values that form the prediction
* **fitness_scores**: list: list of the fitness scores

**returns** list: list of lists that contain the remaining fittest populants

In [9]:
def survival_of_fittest(population, fitness_scores):
    for i in range(len(population)-1,-1,-1):
        if fitness_scores[i] == 0:
            del population[i]
            del fitness_scores[i]
    return (population, fitness_scores)

In [10]:
# assertions/unit tests
test_population_1 = [['h', 'e', 'k', 'l', ' '],
                     ['a', 'k', 'c', 'b', 'q']]
test_fitness_1 = [0.6, 0.0]
assert survival_of_fittest(test_population_1, test_fitness_1)

test_population_2 = [['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                ['r', 'u', 'b', 'o', 'y', 'p', 'l', 'x', 'w', 'g', 'k'], 
                ['s', 'o', 'p', 'a', 'w', 'i', 'x', 'k', 'h', 'm', 'f']]
test_fitness_2 = [0.5454545454545454, 0.0, 0.0]

assert survival_of_fittest(test_population_2, test_fitness_2)


test_population_3 = [['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                ['w', 'f', ' ', 'q', 'v', 'o', 't', 'd', 'i', 'c', 'a', 'm', 'h', 'l', 'j', 'n'],
                ['v', 's', 'q', 'f', 'i', 'x', 'p', ' ', 'u', 'c', 'e', 'h', 'y', 'd', 'n', 'o'],
                ['a', 'i', 'l', 'd', 'w', 'x', 'b', 'f', 'y', ' ', 'e', 'u', 'q', 'h', 'v', 'c']]
test_fitness_3 = [1.0, 0.0, 0.0, 0.0625]

assert survival_of_fittest(test_population_3, test_fitness_3)

## sort

`sort` sorts the list of offsprings by highest fitness score first to lowest **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population** list: list of list that contains singular string values that form the prediction
* **fitness_scores**: list: list of the fitness scores

**returns** list: list of lists that contain the sorted populants

In [11]:
def sort(population, fitness):
    to_sort_list = list(zip(fitness,population))
    sorted_list = sorted(to_sort_list, reverse = True)
    final_list = list(zip(*sorted_list))
    fitness_scores = final_list[0]
    population = list(final_list[1])
    return population, fitness

In [12]:
# assertions/unit tests
test_population_1 = [['h', 'e', 'k', 'l', ' '], ['h', 'e', 'l', 'l', 'o']]
test_fitness_1 = [0.6, 1]
assert sort(test_population_1, test_fitness_1)

test_population_2 = [['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd']]
test_fitness_2 = [0.5454545454545454]

assert sort(test_population_2, test_fitness_2)

test_population_3 =[['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                    ['a', 'i', 'l', 'd', 'w', 'x', 'b', 'f', 'y', ' ', 'e', 'u', 'q', 'h', 'v', 'c']]
test_fitness_3 = [1.0, 0.0625]

assert sort(test_population_3, test_fitness_3)

## Crossover helper functions

In [13]:
def front_split(genotype):
    length = len(genotype)
    half = round(length/2)
    first_half = genotype[0 : half]
    return first_half

In [14]:
test_genotype_1 = ['h', 'e', 'k', 'l', ' ']
assert front_split(test_genotype_1)

test_genotype_2 = ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd']
assert front_split(test_genotype_2)

test_genotype_3 = ['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd']
assert front_split(test_genotype_3)

In [15]:
def end_split(genotype):
    length = len(genotype)
    half = round(length/2)
    second_half = genotype[half : length]
    return second_half

In [16]:
test_genotype_1 = ['h', 'e', 'k', 'l', ' ']
assert end_split(test_genotype_1)

test_genotype_2 = ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd']
assert end_split(test_genotype_2)

test_genotype_3 = ['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd']
assert end_split(test_genotype_3)

## crossover

`crossover` makes childs from the parents generated by [get_parent] **Used by**: [genetic_algorithm](#genetic_algorithm)

* **parent_1** list: list of singular string values that is the genotype of parent 1
* **parent_2**: list: list of singular string values that is the genotype of parent 2

**returns** list: the offspring of parent 1 and 2

In [17]:
def crossover(population, crossover_rate): 
    if random.uniform(0,1) < crossover_rate:
        new_generation = []
        for i in range(random.randint(0,len(population))):
            new_generation.append(population[i])
            
        for i in range(len(population)):
            first_half = front_split(population[i])
            for j in range(len(population)):
                if i != j:
                    second_half = end_split(population[j])
                    new_generation.append(first_half + second_half)
        return new_generation
    else:
        return population

In [18]:
test_population_1 = [['h', 'e', 'k', 'l', ' '],['h', 'a', ' ', 'l', ' '],
                    ['h', 'e', 'k', 'l', ' '],['h', 'a', ' ', 'l', ' '],
                    ['h', 'e', 'k', 'l', ' '],['h', 'a', ' ', 'l', ' '],
                    ['h', 'e', 'k', 'l', ' '],['h', 'a', ' ', 'l', ' '],
                    ['h', 'e', 'k', 'l', ' '],['h', 'a', ' ', 'l', ' '],
                    ['h', 'e', 'k', 'l', ' '],['h', 'a', ' ', 'l', ' ']]
assert crossover(test_population_1, crossover_rate = 0.8)

test_population_2 = [['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'], 
                     ['h', 'a', 'k', ' ', ' ', 't', 'p', 'o', 'r', 'l', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd'],
                    ['h', 'e', 'k', 'l', ' ', 't', 'w', 'o', 't', 'f', 'd']]
assert crossover(test_population_2, crossover_rate = 0.1)

test_population_3 = [['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                     ['r', 'o', ' ', 'd', ' ', 'm', 'i', 'h', 'h', 't', ' ', 'p', 'o', 'l', 'l', 'd'],
                     ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd'],
                    ['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                    ['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                    ['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'],
                    ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd'],
                    ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd'],
                    ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd'],
                    ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd']]
assert crossover(test_population_3, crossover_rate = 0.8)

## mutate

`mutate` examines the child determines if one of the indexes needs to be mutated **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population** list: list of the populants
* **gene_set** string: possible values for the genes i.e. ALPHABET
* **mutation_rate** float : the mutation rate

**returns**  list: list of the populants if they are mutated

In [19]:
def mutate (population, gene_set, mutation_rate):
    for i in range(len(population)):
        for j in range(len(population[0])):
            if random.uniform(0,1) < mutation_rate:
                gene_index = random.randint(0, len(population[0]))
                mutation = random.sample(gene_set, 1)
                population[i][j] = mutation[0]
    return population

In [20]:
test_new_gen_1 = [['h', 'e', ' ', 'l', ' '], 
                  ['h', 'a', 'k', 'l', ' ']]
assert mutate(test_new_gen_1, ALPHABET, mutation_rate = 0.5)

test_new_gen_2 = [['h', 'e', 'k', 'l', ' ', 't', 'p', 'o', 'r', 'l', 'd'], 
                  ['h', 'a', 'k', ' ', ' ', 't', 'w', 'o', 't', 'f', 'd']]
assert mutate(test_new_gen_2, ALPHABET, mutation_rate = 0.5)

test_new_gen_3 = [['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'h', 't', ' ', 'p', 'o', 'l', 'l', 'd'], 
                  ['g', 'o', 'o', 'd', ' ', 'n', 'i', 'g', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd'], 
                  ['r', 'o', ' ', 'd', ' ', 'm', 'i', 'h', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'], 
                  ['r', 'o', ' ', 'd', ' ', 'm', 'i', 'h', 'l', 't', ' ', 'j', 'o', 'e', 'l', 'd'], 
                  ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'h', 't', ' ', 'w', 'o', 'r', 'l', 'd'], 
                  ['f', 'a', 'o', ' ', ' ', 't', 'i', 'g', 'h', 't', ' ', 'p', 'o', 'l', 'l', 'd']]

assert mutate(test_new_gen_3, ALPHABET, mutation_rate = 0.5)

## make_children

`make_children` helper function to reproduce offpsrings from the parents **Used by**: [genetic_algorithm](#genetic_algorithm)

* **population** list: list of the populants
* **gene_set** string: possible values for the genes i.e. ALPHABET
* **mutation_rate** float : the mutation rate
* **crossover_rate** float : the crossover rate
* **population_size** int : the population_size

**returns**  list: list of the new generation

In [21]:
def make_children(population, gene_set, population_size, crossover_rate, mutation_rate):
    new_generation = crossover(population, crossover_rate)
    new_generation = mutate(population, gene_set, mutation_rate)
    return new_generation

<a id="genetic_algorithm"></a>
### genetic_algorithm

* **gene_set** string: possible values for the genes i.e. ALPHABET
* **target**: string: the target in question
* **population** list: list of the populants
* **gene_set** string: possible values for the genes i.e. ALPHABET
* **mutation_rate** float : the mutation rate
* **crossover_rate** float : the crossover rate
* **population_size** int : the population_size

**returns**  list: returns the fittest member after x generations

In [22]:
def genetic_algorithm(gene_set, target, population_size, num_generations, crossover_rate, mutation_rate): # add your formal parameters
    generation = 0
    population = initialize_population(gene_set, target, population_size)
    
    while generation != num_generations:
        generation = generation + 1 
        fitness = fitness_evaluation(population, target)
        #(population, fitness) = survival_of_fittest(population, fitness)
        (population, fitness) = sort(population, fitness)
        population = population[0:population_size]
        fittest = ''.join(population[0])
        if generation % 10 == 0:
            print('Generation = {}\n'.format(generation),'Genotype = {}'.format(population[0]))
            print('Phenotype = {}\n'.format(fittest),'Fitness = {}\n'.format(fitness[0]))
        if fitness[0] == 1.0:
            break 
            
        population = make_children(population, gene_set, population_size, crossover_rate, mutation_rate)
    result = ''.join(population[0])
    print('Overall:\n','Genotype = {}\n'.format(population[0]),
          'Phenotype = {}\n'.format(result),'Fitness = {}\n'.format(fitness[0]))
    return population[0] 

## Problem 1

The target is the string "this is so much fun".
The challenge, aside from implementing the basic algorithm, is deriving a fitness function based on "b" - "p" (for example).
The fitness function should come up with a fitness score based on element to element comparisons between target v. phenotype.

In [23]:
target1 = "this is so much fun"

In [24]:
result1 = genetic_algorithm(ALPHABET, target1, population_size = 50, num_generations = 500,
                            crossover_rate = 0.8, mutation_rate = 0.05)

Generation = 10
 Genotype = ['l', 'p', 'o', 'm', 'n', 'e', 'y', 'b', 'c', 's', 't', 'q', 'v', 'a', 'x', 'd', 'f', 'u', 'n']
Phenotype = lpomneybcstqvaxdfun
 Fitness = 0.15789473684210525

Generation = 20
 Genotype = ['f', 'i', 'o', 'g', 'n', 'e', 'y', 'b', 't', 's', 'q', 'q', 'i', 'a', 'x', 'd', 'f', 'u', 'n']
Phenotype = fiogneybtsqqiaxdfun
 Fitness = 0.15789473684210525

Generation = 30
 Genotype = ['m', ' ', 'u', 'e', 'c', 'd', 'y', ' ', 'z', 'o', 'q', 'z', 'q', 'w', 'p', 'g', 'p', 'c', 'n']
Phenotype = m uecdy zoqzqwpgpcn
 Fitness = 0.10526315789473684

Generation = 40
 Genotype = ['m', ' ', 'y', 'c', 'c', 'd', 'y', ' ', 'm', 'o', 'q', 'z', 'q', 'w', 'p', 'v', 'p', 'c', 'n']
Phenotype = m yccdy moqzqwpvpcn
 Fitness = 0.15789473684210525

Generation = 50
 Genotype = ['x', 'd', 'g', 't', ' ', 'y', 'j', 'y', 'e', 'x', 'd', 'n', 'r', 'c', 'q', 'x', 'h', 'a', 'p']
Phenotype = xdgt yjyexdnrcqxhap
 Fitness = 0.10526315789473684

Generation = 60
 Genotype = ['t', 'u', 'i', 'l', 'j', 'c', '

In [25]:
pprint(result1, compact=True)

['t', 'p', 'c', ' ', 'q', 'u', 'j', 'n', 'o', 'o', 'n', 't', 'n', 'u', 'q', ' ',
 'y', 'k', 'w']


## Problem 2

You should have working code now.
The goal here is to think a bit more about fitness functions.
The target string is, 'nuf hcum os si siht'.
This is obviously target #1 but reversed.
The goal is then to derive a "gene vs gene" fitness function (although I am not specifying which gene against which gene).
(You may not reverse the target or the candidate, either before fitness evaluation or afterwards).

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not reverse an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene (one letter against one letter).
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual in the population is the one who expresses this string *forwards*.

In [26]:
target2 = "nuf hcum os si siht"

In [27]:
result2 = genetic_algorithm(ALPHABET, target1, population_size = 50, num_generations = 500,
                            crossover_rate = 0.8, mutation_rate = 0.05)

Generation = 10
 Genotype = ['z', 'h', 'p', 'b', 'g', 'f', 'd', 'l', 'q', 'c', ' ', 'o', 'u', 'c', 'n', 'j', 's', 'n', 't']
Phenotype = zhpbgfdlqc oucnjsnt
 Fitness = 0.21052631578947367

Generation = 20
 Genotype = ['z', 'h', 'p', 'b', 'g', 's', 'b', 'j', 'q', 'c', ' ', 'n', 'y', 'c', 'f', 'j', 's', 's', 't']
Phenotype = zhpbgsbjqc nycfjsst
 Fitness = 0.15789473684210525

Generation = 30
 Genotype = ['z', 'h', 'p', 'b', 'g', 's', 'd', 'j', 'b', 'c', ' ', 'n', 'r', 'c', 'y', 'u', 'h', 's', 'b']
Phenotype = zhpbgsdjbc nrcyuhsb
 Fitness = 0.15789473684210525

Generation = 40
 Genotype = ['z', 'p', 'u', 'y', 'd', 'p', 'e', 'y', 'v', 'a', 'g', 'm', 'u', 'l', 'h', 'f', 'g', 'o', 'w']
Phenotype = zpuydpeyvagmulhfgow
 Fitness = 0.15789473684210525

Generation = 50
 Genotype = [' ', 'r', 'j', 'g', ' ', 'i', 'f', 'y', 'n', 'm', ' ', 'i', 'u', 'n', 'o', 'p', 'p', 'o', 'g']
Phenotype =  rjg ifynm iunoppog
 Fitness = 0.21052631578947367

Generation = 60
 Genotype = ['z', 's', 'q', 'c', ' ', 's', '

In [28]:
pprint(result2, compact=True)

['t', 'a', 'm', 'o', 'c', 'p', 'v', ' ', 'z', 'b', 'v', 'e', ' ', 'l', 'd', 'd',
 'i', 'z', 'a']


## Problem 3

This is a variation on a theme.
The Caeser Cypher replaces each letter of a string with the letter 13 characters down alphabet (rotating from "z" back to "a" as needed).
This is also known as ROT13 (for "rotate 13").
Latin did not have spaces (and the space is not continguous with the letters a-z) so we'll remove them from our alphabet.
Again, the goal is to derive a "gene vs gene" fitness function, without global transformations.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        You may not apply ROT13 to an entire string (either target or candidate) at any time.
        Everything must be a computation of one gene against one gene.
        Failure to follow these directions will result in 0 points for the problem.
    </p>
</div>

The best individual will express the target *decoded*.

In [29]:
ALPHABET3 = "abcdefghijklmnopqrstuvwxyz"

In [30]:
target3 = "guvfvffbzhpusha"

In [31]:
result3 = genetic_algorithm(ALPHABET3, target1, population_size = 50, num_generations = 500,
                            crossover_rate = 0.8, mutation_rate = 0.05)

Generation = 10
 Genotype = ['p', 'v', 'k', 'd', 'q', 'i', 'v', 's', 'z', 's', 'd', 'm', 'i', 'l', 'e', 'a', 'f', 'd', 'k']
Phenotype = pvkdqivszsdmileafdk
 Fitness = 0.15789473684210525

Generation = 20
 Genotype = ['t', 'n', 's', 'p', 'q', 'l', 'n', 'a', 'n', 'q', 'y', 'r', 'o', 'h', 'v', 'u', 'f', 'p', 'f']
Phenotype = tnspqlnanqyrohvufpf
 Fitness = 0.10526315789473684

Generation = 30
 Genotype = ['m', 'e', 't', 'h', 'w', 'g', 'i', 'p', 'p', 'x', 'y', 'r', 'q', 'c', 'k', 'v', 'a', 'u', 'v']
Phenotype = methwgippxyrqckvauv
 Fitness = 0.10526315789473684

Generation = 40
 Genotype = ['p', 'n', 'h', 's', 'v', 'y', 'r', 'l', 'n', 'o', 'v', 'h', 'o', 'm', 'f', 'n', 'n', 'w', 'y']
Phenotype = pnhsvyrlnovhomfnnwy
 Fitness = 0.10526315789473684

Generation = 50
 Genotype = ['w', 'h', 'i', 'v', 'k', 'y', 'm', 'z', 'n', 'x', 'u', 'a', 't', 'z', 'i', 'o', 'f', 'j', 'd']
Phenotype = whivkymznxuatziofjd
 Fitness = 0.10526315789473684

Generation = 60
 Genotype = ['w', 'h', 'i', 'v', 'o', 'y', '

In [32]:
pprint(result3, compact=True)

['e', 'a', 'i', 'f', 'j', 'd', 'e', 'e', 't', 'g', 'e', 't', 'j', 'l', 'u', 'w',
 'j', 'u', 'n']


## Challenge

**You do not need to do this problem and it won't be graded if you do. It's just here if you want to push your understanding.**

The original GA used binary encodings for everything.
We're basically using a Base 27 encoding.
You could, however, write a version of the algorithm that uses an 8 bit encoding for each letter (ignore spaces as they're a bit of a bother).
That is, a 4 letter candidate looks like this:

```
0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1
```

If you wrote your `genetic_algorithm` code general enough, with higher order functions, you should be able to implement it using bit strings instead of latin strings.