- When compressing data, the order of the items can affect the compression ratio, depending on the compression algorithm used. This is true of the __compress()__ function from the __zlib__ module.
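
A quick way to see the effect is to compress the same names in many random orders and collect the distinct compressed lengths. This is only a sketch: `NAMES`, `compressed_len`, the seed, and the sample count are illustrative choices, and it measures `len()` of the compressed bytes rather than the `getsizeof()` figure used in the cells below.

```python
from pickle import dumps, loads
from random import Random
from zlib import compress, decompress

# Same names as the PEOPLE list used later in this notebook.
NAMES = ['Michael', 'Sarah', 'Joshua', 'Narine', 'David', 'Sajid',
         'Melanie', 'Daniel', 'Wei', 'Dean', 'Brian', 'Murat', 'Lisa']

def compressed_len(items):
    """Length in bytes of the zlib-compressed pickle of items."""
    return len(compress(dumps(items)))

# Compression is lossless: the original list comes back unchanged.
assert loads(decompress(compress(dumps(NAMES)))) == NAMES

rng = Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
sizes = set()
for _ in range(500):
    order = NAMES[:]
    rng.shuffle(order)
    sizes.add(compressed_len(order))

# Distinct compressed lengths observed across the sampled orderings.
print(sorted(sizes))
```

The exact numbers depend on the Python version and pickle protocol, so none are quoted here.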

In [1]:
# import modules and libraries
from __future__ import annotations
from typing import Tuple, List, Any
from chromosome import Chromosome
from genetic_algorithm import GeneticAlgorithm
from random import shuffle, sample
from copy import deepcopy
from zlib import compress
from sys import getsizeof
from pickle import dumps
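
The `chromosome` module imported above is not shown in this notebook. Judging only from the methods that `ListCompression` implements below, its interface is probably close to the following abstract base class. Treat this as an assumed sketch, not the actual module.

```python
from abc import ABC, abstractmethod
from typing import Tuple, Type, TypeVar

T = TypeVar('T', bound='Chromosome')

class Chromosome(ABC):
    """Assumed interface: the four methods a genetic algorithm needs."""

    @abstractmethod
    def fitness(self) -> float:
        """Higher is better."""

    @classmethod
    @abstractmethod
    def random_instance(cls: Type[T]) -> T:
        """Create a random individual for the initial population."""

    @abstractmethod
    def crossover(self: T, other: T) -> Tuple[T, T]:
        """Combine two parents into two children."""

    @abstractmethod
    def mutate(self) -> None:
        """Make a small random change in place."""
```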

The list to compress:

In [2]:
# 165 bytes compressed
PEOPLE: List[str] = ['Michael', 'Sarah', 'Joshua', 'Narine', 'David',
     'Sajid', 'Melanie', 'Daniel', 'Wei', 'Dean', 'Brian', 'Murat', 'Lisa'] 

List compression chromosome:

In [3]:
class ListCompression(Chromosome):
    def __init__(self, lst: List[Any]) -> None:
        self.lst: List[Any] = lst
            
    @property
    def bytes_compressed(self) -> int:
        return getsizeof(compress(dumps(self.lst)))
    
    def fitness(self) -> float:
        return 1/self.bytes_compressed
    
    @classmethod
    def random_instance(cls) -> ListCompression:
        mylst: List[str] = deepcopy(PEOPLE)
        shuffle(mylst)
        return ListCompression(mylst)
    
    def crossover(self, other: ListCompression) -> Tuple[ListCompression,
                                                        ListCompression]:
        child1: ListCompression = deepcopy(self)
        child2: ListCompression = deepcopy(other)
        idx1, idx2 = sample(range(len(self.lst)), k = 2)
        l1, l2 = child1.lst[idx1], child2.lst[idx2]
        child1.lst[child1.lst.index(l2)], child1.lst[idx2] = child1.lst[idx2], l2
        child2.lst[child2.lst.index(l1)], child2.lst[idx1] = child2.lst[idx1], l1
        
        return child1, child2
    
    def mutate(self) -> None: # swap two locations
        idx1, idx2 = sample(range(len(self.lst)), k = 2)
        self.lst[idx1], self.lst[idx2] = self.lst[idx2], self.lst[idx1]
        
    def __str__(self) -> str:
        return f'Order: {self.lst} Bytes: {self.bytes_compressed}'
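
The crossover above is easier to follow on a small example. The standalone function below mirrors its logic on plain lists, with the two indices passed in explicitly instead of sampled. `swap_crossover` and the four-letter lists are illustrative names, not part of the notebook. The point to notice is that each child stays a permutation of its parent: an item taken from the other parent is moved into place by a swap, so nothing is duplicated or lost.

```python
from typing import List, Tuple

def swap_crossover(parent1: List[str], parent2: List[str],
                   idx1: int, idx2: int) -> Tuple[List[str], List[str]]:
    # Work on copies so the parents are left untouched.
    child1, child2 = list(parent1), list(parent2)
    l1, l2 = child1[idx1], child2[idx2]
    # Put parent2's item l2 at position idx2 of child1 by swapping it
    # with whatever currently sits there.
    pos = child1.index(l2)
    child1[pos], child1[idx2] = child1[idx2], l2
    # Put parent1's item l1 at position idx1 of child2 the same way.
    pos = child2.index(l1)
    child2[pos], child2[idx1] = child2[idx1], l1
    return child1, child2

p1 = ['A', 'B', 'C', 'D']
p2 = ['B', 'D', 'A', 'C']
c1, c2 = swap_crossover(p1, p2, idx1=0, idx2=3)
print(c1)  # ['A', 'B', 'D', 'C']
print(c2)  # ['A', 'D', 'B', 'C']
```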

Run the algorithm:

In [4]:
initial_population: List[ListCompression] = [ListCompression.random_instance() for _ in range(1000)]

ga: GeneticAlgorithm[ListCompression] = GeneticAlgorithm(initial_population=initial_population,
                                        threshold=1.0, max_generations = 1000,
                                        mutation_chance = 0.2, crossover_chance = 0.7,
                                        selection_type=GeneticAlgorithm.SelectionType.TOURNAMENT)

result: ListCompression = ga.run()
print(result)

Generation 0 Best 0.006172839506172839 Avg                   0.006051017898040181
Generation 1 Best 0.006211180124223602 Avg                   0.006097888379233678
Generation 2 Best 0.006211180124223602 Avg                   0.006126513157555196
Generation 3 Best 0.006211180124223602 Avg                   0.006171814999109769
Generation 4 Best 0.006211180124223602 Avg                   0.006179975045019609
Generation 5 Best 0.006211180124223602 Avg                   0.006184372338019521
Generation 6 Best 0.006211180124223602 Avg                   0.006188856381293523
Generation 7 Best 0.006211180124223602 Avg                   0.006189069634937285
Generation 8 Best 0.006211180124223602 Avg                   0.006186912325452493
Generation 9 Best 0.006211180124223602 Avg                   0.006190870476244157
Generation 10 Best 0.006211180124223602 Avg                   0.006188835492910958
Generation 11 Best 0.00625 Avg                   0.006189496111942145
Generation 12 Best 0.00625 ...

Generation 115 Best 0.00625 Avg                   0.006231157233318561
Generation 116 Best 0.00625 Avg                   0.006229148443581561
Generation 117 Best 0.00625 Avg                   0.006221326260267143
Generation 118 Best 0.00625 Avg                   0.006225492507536378
Generation 119 Best 0.00625 Avg                   0.006226328603869978
Generation 120 Best 0.00625 Avg                   0.006229836728234843
Generation 121 Best 0.00625 Avg                   0.006228903335340165
Generation 122 Best 0.00625 Avg                   0.006228196905742829
Generation 123 Best 0.00625 Avg                   0.006228851407625517
Generation 124 Best 0.00625 Avg                   0.006226981080650095
Generation 125 Best 0.00625 Avg                   0.00622901370192759
Generation 126 Best 0.00625 Avg                   0.006227609493259385
Generation 127 Best 0.00625 Avg                   0.00622774347055749
Generation 128 Best 0.00625 Avg                   0.006227361953883146
...

Generation 231 Best 0.00625 Avg                   0.006231075419253039
Generation 232 Best 0.00625 Avg                   0.006227673854495451
Generation 233 Best 0.00625 Avg                   0.006226946454215854
Generation 234 Best 0.00625 Avg                   0.0062262740282000755
Generation 235 Best 0.00625 Avg                   0.006226442035529019
Generation 236 Best 0.00625 Avg                   0.0062259586772748276
Generation 237 Best 0.00625 Avg                   0.006227732897987202
Generation 238 Best 0.00625 Avg                   0.006225120185278694
Generation 239 Best 0.00625 Avg                   0.006230977265141606
Generation 240 Best 0.00625 Avg                   0.0062249068552345495
Generation 241 Best 0.00625 Avg                   0.006228895942672326
Generation 242 Best 0.00625 Avg                   0.006229086270366642
Generation 243 Best 0.00625 Avg                   0.0062276211954326935
Generation 244 Best 0.00625 Avg                   0.006228100319336128
...

Generation 347 Best 0.00625 Avg                   0.0062233529123988905
Generation 348 Best 0.00625 Avg                   0.006225061460649557
Generation 349 Best 0.00625 Avg                   0.006229538804107009
Generation 350 Best 0.00625 Avg                   0.0062287503303633624
Generation 351 Best 0.00625 Avg                   0.006229801125024187
Generation 352 Best 0.00625 Avg                   0.006227758424874748
Generation 353 Best 0.00625 Avg                   0.006226537881089845
Generation 354 Best 0.00625 Avg                   0.00622874710530759
Generation 355 Best 0.00625 Avg                   0.006226829823070913
Generation 356 Best 0.00625 Avg                   0.006224214356376277
Generation 357 Best 0.00625 Avg                   0.006225780568615452
Generation 358 Best 0.00625 Avg                   0.006228235330930491
Generation 359 Best 0.00625 Avg                   0.006227840760612717
Generation 360 Best 0.00625 Avg                   0.00623078162892834
...

Generation 463 Best 0.00625 Avg                   0.006226578135406227
Generation 464 Best 0.00625 Avg                   0.006222799263975477
Generation 465 Best 0.00625 Avg                   0.006225125199925388
Generation 466 Best 0.00625 Avg                   0.006226954204125544
Generation 467 Best 0.00625 Avg                   0.006223912096033861
Generation 468 Best 0.00625 Avg                   0.006228518443787267
Generation 469 Best 0.00625 Avg                   0.006227259751589926
Generation 470 Best 0.00625 Avg                   0.0062264855087869565
Generation 471 Best 0.00625 Avg                   0.006227211679586465
Generation 472 Best 0.00625 Avg                   0.006226355095355067
Generation 473 Best 0.00625 Avg                   0.006227188533863102
Generation 474 Best 0.00625 Avg                   0.006226947333781853
Generation 475 Best 0.00625 Avg                   0.006228079635140867
Generation 476 Best 0.00625 Avg                   0.006225316544059434
...

Generation 579 Best 0.00625 Avg                   0.006228632724945913
Generation 580 Best 0.00625 Avg                   0.006228139420072461
Generation 581 Best 0.00625 Avg                   0.006228318345758556
Generation 582 Best 0.00625 Avg                   0.0062304700427387285
Generation 583 Best 0.00625 Avg                   0.006228610329489843
Generation 584 Best 0.00625 Avg                   0.0062286942478541815
Generation 585 Best 0.00625 Avg                   0.00622443028481597
Generation 586 Best 0.00625 Avg                   0.006227771401909177
Generation 587 Best 0.00625 Avg                   0.0062276991624056225
Generation 588 Best 0.00625 Avg                   0.006228160592757098
Generation 589 Best 0.00625 Avg                   0.006226189013611605
Generation 590 Best 0.00625 Avg                   0.006228641297807007
Generation 591 Best 0.00625 Avg                   0.006227485669924157
Generation 592 Best 0.00625 Avg                   0.0062260675414500906
...

Generation 695 Best 0.00625 Avg                   0.006227205641563912
Generation 696 Best 0.00625 Avg                   0.006227593212826738
Generation 697 Best 0.00625 Avg                   0.00622840792452384
Generation 698 Best 0.00625 Avg                   0.0062296299959068435
Generation 699 Best 0.00625 Avg                   0.006228035716662312
Generation 700 Best 0.00625 Avg                   0.006230425313146984
Generation 701 Best 0.00625 Avg                   0.006228210261069888
Generation 702 Best 0.00625 Avg                   0.006228177024510849
Generation 703 Best 0.00625 Avg                   0.006228649944127781
Generation 704 Best 0.00625 Avg                   0.006226786972007673
Generation 705 Best 0.00625 Avg                   0.006222946819132246
Generation 706 Best 0.00625 Avg                   0.006222151733655359
Generation 707 Best 0.00625 Avg                   0.00622812380894875
Generation 708 Best 0.00625 Avg                   0.006224113217780452
...

Generation 811 Best 0.00625 Avg                   0.00622737660229388
Generation 812 Best 0.00625 Avg                   0.006226698923445405
Generation 813 Best 0.00625 Avg                   0.006226053803202834
Generation 814 Best 0.00625 Avg                   0.006225515705101928
Generation 815 Best 0.00625 Avg                   0.006226922594884267
Generation 816 Best 0.00625 Avg                   0.006224955715709463
Generation 817 Best 0.00625 Avg                   0.006223667517261362
Generation 818 Best 0.00625 Avg                   0.006226282196730216
Generation 819 Best 0.00625 Avg                   0.006225104584398311
Generation 820 Best 0.00625 Avg                   0.0062261147267411564
Generation 821 Best 0.00625 Avg                   0.0062268962904840795
Generation 822 Best 0.00625 Avg                   0.006226405850824267
Generation 823 Best 0.00625 Avg                   0.006226736865512641
Generation 824 Best 0.00625 Avg                   0.0062262396200535725
...

Generation 927 Best 0.00625 Avg                   0.006228826122580393
Generation 928 Best 0.00625 Avg                   0.006228485797226821
Generation 929 Best 0.00625 Avg                   0.006226660453907243
Generation 930 Best 0.00625 Avg                   0.006224425949734344
Generation 931 Best 0.00625 Avg                   0.006228993850313535
Generation 932 Best 0.00625 Avg                   0.006226521609117914
Generation 933 Best 0.00625 Avg                   0.006225282386360907
Generation 934 Best 0.00625 Avg                   0.0062264194668789205
Generation 935 Best 0.00625 Avg                   0.00622617536017447
Generation 936 Best 0.00625 Avg                   0.006223485843185341
Generation 937 Best 0.00625 Avg                   0.00622867454904741
Generation 938 Best 0.00625 Avg                   0.006223027249170667
Generation 939 Best 0.00625 Avg                   0.00622793243856776
Generation 940 Best 0.00625 Avg                   0.006226729574335158
...
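
Because `fitness()` returns `1/bytes_compressed`, each "Best" value in the log can be turned back into a byte count by taking its reciprocal. The short sketch below does that for the three distinct "Best" values that appear in the output and compares the last one with the 165-byte figure from the comment on the `PEOPLE` cell, assuming that figure was measured the same way. The variable names are illustrative.

```python
# The three distinct "Best" fitness values that appear in the log above.
log_best = [0.006172839506172839, 0.006211180124223602, 0.00625]

# fitness = 1 / bytes, so bytes = 1 / fitness (rounded to undo float error).
best_bytes = [round(1 / f) for f in log_best]
print(best_bytes)  # [162, 161, 160]

baseline = 165  # from the "# 165 bytes compressed" comment on PEOPLE
saved = baseline - best_bytes[-1]
print(saved, f"{saved / baseline:.1%}")  # 5 3.0%
```

So the plateau at 0.00625 corresponds to 160 bytes, 5 bytes fewer than the original order.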

"Genetic algorithms are not a panacea. In fact, they are not suitable for most problems. For any problem in which a fast deterministic algorithm exists, a genetic algorithm approach does not make sense. Their inherently stochastic nature makes their runtimes unpredictable. To solve this problem, they can be cut off after a certain number of generations. But then it is not clear if a truly optimal solution has been found."

"Another, more specific issue worth mentioning is challenges related to the roulette-wheel selection method described in this chapter. Roulette-wheel selection, sometimes referred to as fitness proportional selection, can lead to a lack of diversity in a population due to the dominance of relatively fit individuals each time selection is run. On the other hand, if fitness values are close together, roulette-wheel selection can lead to a lack of selection pressure.[5] Further, roulette-wheel selection, as constructed in this chapter, does not work for problems in which fitness can be measured with negative values, as in our simple equation example in section 5.3."
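
Both roulette-wheel problems show up in a few lines of arithmetic. In fitness proportional selection, an individual's chance of being picked is its fitness divided by the population total. The sketch below applies that formula to fitness values like those in this notebook (160, 161, and 162 bytes) and to a made-up list containing a negative fitness. `roulette_probabilities` is an illustrative helper, not part of the `genetic_algorithm` module.

```python
from typing import List

def roulette_probabilities(fitnesses: List[float]) -> List[float]:
    """Selection share of each individual: fitness / total fitness."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

# Fitness values that are close together, as in the list compression run.
close = [1 / 160, 1 / 161, 1 / 162]
close_probs = roulette_probabilities(close)
# All three shares are close to one third: the best individual
# is barely favored over the worst.
print([round(p, 4) for p in close_probs])

# A negative fitness yields a negative "probability", which is meaningless.
mixed = [-1.0, 2.0, 3.0]
mixed_probs = roulette_probabilities(mixed)
print(mixed_probs)  # [-0.25, 0.5, 0.75]
```

Tournament selection, which the run above uses, compares individuals only by rank, so it is not affected by either issue.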

- Genetic algorithms are often used on problems that do not require a perfectly optimal solution, such as complex scheduling and protein and drug design.