In [1]:
from typing import Callable
from lab9_lib import make_problem
from src.genome import Genome
from src.population import Population
from src.chromosome import Chromosome, Mask
from random import shuffle
from itertools import product

## Working with chromosomes

The idea is to understand which parts of a good genome make it good and which parts must be improved. If we have two good genomes and a technique for identifying which part of each one is good, then we can improve the average fitness of the recombination (xover) outcome.

Unfortunately, the proposed technique requires extra calls to the fitness function, but it may still give an advantage overall.

The method to understand which part of a genome is good involves the following:
1. Generate a random genome (a typical one, whose fitness is close to the expected fitness of a random genome).
2. Select some genes from the good genome with a bitmask (like a subnet mask in networking); the selected genes compose the chromosome being evaluated.
3. Create a new genome that is a clone of the random one, then insert the chromosome from the good one.
4. Measure the difference between `random_with_chromosome.fitness` and `random.fitness`; that is the fitness gain of the chromosome.
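
The four steps can be sketched on plain Python lists. This is only an illustration under stated assumptions: `onemax`, `insert_chromosome` and `fitness_gain` are hypothetical names, the toy fitness (fraction of ones) stands in for the real `make_problem` fitness, and the notebook's own `Genome` and `Chromosome` classes are not used.

```python
import random

def onemax(genome: list[int]) -> float:
    """Toy fitness (assumption): fraction of loci set to 1."""
    return sum(genome) / len(genome)

def insert_chromosome(target: list[int], donor: list[int], mask: tuple[bool, ...]) -> list[int]:
    """Clone `target`, overwriting the masked loci with the donor's genes (step 3)."""
    return [d if selected else t for t, d, selected in zip(target, donor, mask)]

def fitness_gain(donor, reference, mask, fitness) -> float:
    """Fitness of the reference carrying the chromosome minus the plain reference (step 4)."""
    return fitness(insert_chromosome(reference, donor, mask)) - fitness(reference)

rng = random.Random(0)
loci = 20
good = [1] * loci                                    # stand-in for a good genome
reference = [rng.randint(0, 1) for _ in range(loci)] # step 1: random genome
mask = [i < 5 for i in range(loci)]                  # step 2: pick 5 loci...
rng.shuffle(mask)                                    # ...at random positions
mask = tuple(mask)

gain = fitness_gain(good, reference, mask, onemax)
print(f"fitness gain of the chromosome: {gain:+.2f}")
```

With this toy fitness the gain can never be negative, because the donor is all ones; with a real fitness function the sign is exactly what the method is meant to reveal.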

If the fitness gain of the chromosome is negative, the chromosome is a promising target for mutation to further improve the fitness of the good genome.

The following code tries to exemplify the application of this idea but falls short in some ways:
- Fitness gain should be measured against a plethora of random genomes (this involves calling the fitness function many more times).
- The possible chromosomes for each genome form the powerset of its genes, so this space clearly cannot be explored exhaustively, and no technique for selecting the most promising candidates is proposed. (Maybe it can be used to verify differences among well-fitting genomes?)
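
To give a sense of scale for the second point, the number of possible masks can be counted directly. This is a small sketch, assuming 100 loci and masks that select 13 of them, as in the cells below; the variable names are illustrative.

```python
from math import comb

loci_count = 100   # genome length used in this notebook
selected = 13      # loci selected by the example mask

all_masks = 2 ** loci_count                # every subset of loci (the powerset)
masks_with_13 = comb(loci_count, selected) # subsets with exactly 13 loci

print(f"all masks:           {all_masks:.3e}")
print(f"masks with 13 loci:  {masks_with_13:.3e}")
```

Even restricted to masks of a fixed size, the count is far beyond what can be evaluated when every candidate costs at least one fitness call.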

In [2]:
fitness_fn : Callable[[list[int]], float] = make_problem(10)
loci_count = 100
initial_population = Population.initial(loci_count, fitness_fn, population_size = 100)
random_reference_pop = Population.random(loci_count, fitness_fn, population_size = 100)
chromosome_fitness_samples : dict[Chromosome, list[float]] = dict()

In [3]:
mask_seed = [i < 13 for i in range(loci_count)]  # exactly 13 loci selected, shuffled below
shuffle(mask_seed)
mask: Mask = tuple(mask_seed)
print(''.join(['1' if bit else '0' for bit in mask]))

0010000001000001000000000001000100000101010000000000100001000000100000000000000000000000100000001000


In [4]:
# One chromosome per genome of the initial population, all extracted with the same mask
chromosome_pool : list[Chromosome] = [genome.extract_chromosome(mask) for genome in initial_population.genomes]

In [5]:
def chromosome_fitness_sample(c: Chromosome, random_genome: Genome) -> float:
    """Return one sample of the chromosome's fitness gain.

    The fitness of the chromosome would be the expectation of this differential over an
    infinite number of random genomes; this is a single sample from that distribution."""

    mutated_random = random_genome.assign_chromosome(c)
    return mutated_random.fitness - random_genome.fitness

In [6]:
for chromosome in chromosome_pool:
    chromosome_fitness_samples[chromosome] = []

for random_genome, chromosome in product(random_reference_pop.genomes, chromosome_pool):
    sample = chromosome_fitness_sample(chromosome, random_genome)
    chromosome_fitness_samples[chromosome].append(sample)

chromosome_fitness_approx = dict()
for chromosome, samples in chromosome_fitness_samples.items():
    avg = sum(samples) / len(samples)
    chromosome_fitness_approx[chromosome] = avg

best_chromosome : Chromosome = max(chromosome_fitness_approx.items(), key = lambda entry : entry[1])[0]

In [7]:
initial_with_chromosome = []
for genome in initial_population.genomes:
    mutated = genome.assign_chromosome(best_chromosome)
    initial_with_chromosome.append(mutated)

mutated_with_chromosome = Population(initial_with_chromosome)

print(f'{initial_population.average_fitness = :.2%}')
print(f'{mutated_with_chromosome.average_fitness = :.2%}')

initial_population.average_fitness = 25.70%
mutated_with_chromosome.average_fitness = 15.37%


## Recap

From an initial optimized population of 100 genomes, a chromosome (subset of loci) is extracted from each genome; these 100 extracted chromosomes form the `chromosome_pool`.
For each chromosome in the pool, the fitness of the chromosome is estimated (this is the tricky part, see "Chromosome fitness" below).
The 'best fit' chromosome is then applied to all the individuals in the initial population to form a new mutated population.
The average fitness of the initial and mutated populations is then compared.

## Conclusions

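While this approach seems to work with instance 1, it does not work with instance 10: in the run above, the average fitness drops from 25.70% to 15.37% after the best chromosome is applied.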

## Chromosome fitness

Assigning a fitness value to a chromosome is tricky and dangerous. The idea is to force this subset of loci onto a set of random genomes and measure the fitness differential between each random genome and the same genome carrying the chromosome; each such differential is one fitness sample.

The samples are then averaged to estimate the chromosome's fitness.
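
Since the average is only an estimate, it can help to look at its spread as well. The sketch below adds the standard error of the mean next to the average; this is not part of the notebook, `summarize_samples` is a hypothetical helper, and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_samples(samples: list[float]) -> tuple[float, float]:
    """Return (average, standard error of the mean) for a list of fitness samples."""
    avg = mean(samples)
    # stdev needs at least two samples; with fewer, the spread is undefined
    stderr = stdev(samples) / sqrt(len(samples)) if len(samples) > 1 else float("nan")
    return avg, stderr

# Made-up fitness differentials for one chromosome (illustrative only)
samples = [0.02, -0.01, 0.03, 0.00, 0.01]
avg, stderr = summarize_samples(samples)
print(f"estimated chromosome fitness: {avg:+.3f} ± {stderr:.3f}")
```

An average whose magnitude is small compared with its standard error suggests that the ranking of chromosomes by average gain may be mostly noise.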