# KE5207 CA1 Genetic Algorithm Modelling - Data Exploration & Transformation

## Load libraries

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import os
import pandas as pd
from sklearn.model_selection import train_test_split
import myUtilities as mu

## Load data

In [2]:
gifts_df = pd.read_csv(os.path.join('data', 'gifts.csv'), header=0)
gifts_df.head()

Unnamed: 0,GiftId,Latitude,Longitude,Weight
0,1,16.345769,6.303545,1.0
1,2,12.494749,28.626396,15.52448
2,3,27.794615,60.032495,8.058499
3,4,44.426992,110.114216,1.0
4,5,-69.854088,87.946878,25.088892


In [3]:
gifts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
GiftId       100000 non-null int64
Latitude     100000 non-null float64
Longitude    100000 non-null float64
Weight       100000 non-null float64
dtypes: float64(3), int64(1)
memory usage: 3.1 MB


In [4]:
# Keep a small random sample (0.1% of the gifts) as the working subset.
X_train, X_test = train_test_split(gifts_df.values, test_size=.999, shuffle=True, random_state=42)
X_train[:5]

array([[  4.07750000e+04,   4.56667256e+01,   1.31115021e+02,
          1.00000000e+00],
       [  4.89850000e+04,   4.78515402e+01,   1.16923136e+02,
          1.00000000e+00],
       [  6.12290000e+04,  -7.65615108e+01,  -7.84217433e+01,
          1.00000000e+00],
       [  5.12150000e+04,   6.36793270e+01,   1.06490055e+02,
          3.81441072e+01],
       [  3.80450000e+04,   3.00427648e+01,   4.12529144e+01,
          3.99471623e+01]])

In [5]:
len(X_train)

100

Only this 100-gift subset is used in the rest of the notebook, to keep testing fast.

In [6]:
# Calculate the total weight of the gifts in the training split.
X_train[:, 3].sum()

1451.1812270955897

## Model the problem for Genetic Algorithm

In this first phase, the following representation is used:
* a chromosome is one possible trip, which delivers some gifts and starts and ends at the depot
* the hard constraints are:
	* each trip has a total weight less than the weight limit (1000)
	* each gift can be delivered only once
* an individual is a set of trips (chromosomes) that together deliver all the gifts
* the fitness function is the total weighted reindeer weariness, which we seek to minimise

* Given the fitness function, some soft constraints to consider are:
	* long-distance trips should have low weight
	* short-distance trips should have high weight

* Other considerations:
	* The gifts within a trip can be arranged in the order that minimises the fitness function
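The fitness function described above can be sketched as follows. This is only an illustration, not the code in myUtilities.py. It assumes that the depot is the North Pole, that leg distances are haversine distances in km, and that the empty sleigh weighs 10; none of these values is stated in this notebook. The names `haversine`, `trip_weariness`, `fitness`, `location` and `weight` are hypothetical.

```python
import math

# Assumptions (not stated in this notebook): depot at the North Pole,
# haversine distances in km, empty sleigh weight of 10.
DEPOT = (90.0, 0.0)
SLEIGH_WEIGHT = 10.0
EARTH_RADIUS_KM = 6371.0


def haversine(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def trip_weariness(trip, location, weight):
    """Sum over the legs of one trip of (leg distance x weight carried)."""
    carried = SLEIGH_WEIGHT + sum(weight[g] for g in trip)
    position = DEPOT
    total = 0.0
    for g in trip:
        total += haversine(position, location[g]) * carried
        carried -= weight[g]      # the gift is dropped off here
        position = location[g]
    # Return leg: only the empty sleigh travels back to the depot.
    return total + haversine(position, DEPOT) * carried


def fitness(individual, location, weight):
    """Total weariness of an individual, i.e. of all its trips."""
    return sum(trip_weariness(trip, location, weight) for trip in individual)
```

Because the carried weight shrinks after every delivery, the order of gifts within a trip changes the result, which is why the delivery order is listed as a separate consideration.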

## Create an individual for the initial population using a sequential insertion heuristic

The weight limit in myUtilities.py is set to 100 for testing.

In [7]:
indiv = mu.create_indiv(X_train[:,0].tolist())

In [8]:
len(indiv)

17

There are 17 trips in this individual.
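The internals of mu.create_indiv are not shown in this notebook. As a rough idea of what a sequential insertion heuristic can look like, the sketch below shuffles the gifts and fills one trip at a time, opening a new trip whenever the next gift would push the load over the limit. The function `create_individual` and its parameters are hypothetical stand-ins, and the real implementation may differ.

```python
import random


def create_individual(gift_ids, weight, weight_limit, seed=None):
    """Greedy sequential fill (hypothetical stand-in for mu.create_indiv)."""
    rng = random.Random(seed)
    pending = list(gift_ids)
    rng.shuffle(pending)          # random order gives varied individuals

    trips, current, load = [], [], 0.0
    for g in pending:
        if weight[g] > weight_limit:
            raise ValueError(f"gift {g} alone exceeds the weight limit")
        # Close the current trip when the next gift would not fit.
        if current and load + weight[g] > weight_limit:
            trips.append(current)
            current, load = [], 0.0
        current.append(g)
        load += weight[g]
    if current:
        trips.append(current)
    return trips
```

Calling it with different seeds would produce different individuals, which is one simple way to build a varied initial population.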

In [9]:
indiv[0]

[18432.0,
 17160.0,
 12186.0,
 71933.0,
 770.0,
 87314.0,
 86780.0,
 66558.0,
 9269.0,
 11535.0,
 82799.0,
 43002.0,
 64821.0,
 64926.0,
 31552.0,
 45759.0,
 48985.0,
 60264.0,
 24301.0,
 67122.0,
 76553.0,
 5312.0,
 1017.0,
 41607.0,
 87499.0,
 56887.0,
 68149.0,
 67970.0,
 61229.0,
 44132.0,
 15796.0,
 48556.0]

In [10]:
total_gift_weight = 0
for i in indiv[0]:
    total_gift_weight += mu.gift_weight[i]
total_gift_weight

89.74603067518999

The weight limit (100) is respected. Check some other trips.

In [11]:
indiv[1]

[861.0,
 11395.0,
 78954.0,
 93017.0,
 80078.0,
 89476.0,
 92094.0,
 37195.0,
 77190.0,
 91388.0,
 94664.0,
 23248.0,
 59151.0,
 39100.0,
 40775.0]

In [12]:
total_gift_weight = 0
for i in indiv[1]:
    total_gift_weight += mu.gift_weight[i]
total_gift_weight

87.56021789493

In [13]:
indiv[7]

[84655.0, 8572.0, 23484.0, 69093.0, 41091.0]

In [14]:
total_gift_weight = 0
for i in indiv[7]:
    total_gift_weight += mu.gift_weight[i]
total_gift_weight

83.97955320653

In [15]:
indiv[16]

[16024.0, 52996.0]

In [16]:
total_gift_weight = 0
for i in indiv[16]:
    total_gift_weight += mu.gift_weight[i]
total_gift_weight

87.6090942406
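The per-trip checks above repeat the same loop for a few hand-picked trips. A small helper could run the check over every trip at once. The sketch below assumes a mapping from gift id to weight (such as mu.gift_weight appears to be); the function names and the `weight_limit` parameter are hypothetical.

```python
def trip_weights(individual, weight):
    """Total gift weight of each trip, in trip order."""
    return [sum(weight[g] for g in trip) for trip in individual]


def overweight_trips(individual, weight, weight_limit):
    """Indices of trips whose total gift weight exceeds the limit."""
    return [i for i, w in enumerate(trip_weights(individual, weight))
            if w > weight_limit]
```

An empty list returned by `overweight_trips` would mean that every trip respects the limit.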

In [17]:
total_gifts = 0
all_gifts = []
for i in indiv:
    total_gifts += len(i)
    all_gifts = all_gifts + i

In [18]:
total_gifts

100

In [19]:
len(all_gifts)

100

In [20]:
set(all_gifts)^set(X_train[:,0].tolist())

set()

The symmetric difference is an empty set, so every gift in the training subset has been allocated to a trip in the individual and no trip contains an unknown gift.
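A set comparison on its own cannot reveal a gift that appears in two trips; here that case is ruled out by the gift count of 100. The sketch below combines both checks in one pass using `collections.Counter`. The function name `allocation_errors` is hypothetical.

```python
from collections import Counter


def allocation_errors(individual, gift_ids):
    """Report gifts that are missing, unexpected, or delivered more than once."""
    delivered = Counter(g for trip in individual for g in trip)
    seen = set(delivered)
    expected = set(gift_ids)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
        "duplicated": sorted(g for g, n in delivered.items() if n > 1),
    }
```

For a valid individual all three lists would be empty, for example when called as `allocation_errors(indiv, X_train[:, 0].tolist())`.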