# mlrose_hiive Generator and Runner Usage Examples - Andrew Rollings

## Overview

These examples will not solve assignment 2 for you, but they will give you
some idea of how to use the problem generator and runner classes.

Hopefully this will result in slightly fewer
"How do I <insert basic usage here>?" questions every semester...

### Import Libraries

In [103]:
import mlrose_hiive
import numpy as np
import logging

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.metrics import accuracy_score

# switch off the chatter
logging.basicConfig(level=logging.WARNING)

### Example 1: Generating and Running 8-Queens

In [104]:
from mlrose_hiive import QueensGenerator, SARunner

# Generate a new 8-Queens problem using a fixed seed.
problem = QueensGenerator().generate(seed=123456, size=8)

In [105]:
# create a runner class and solve the problem
sa = SARunner(problem=problem,
              experiment_name='queens8',
              output_directory=None, # note: specify an output directory to have results saved to disk
              seed=123456,
              iteration_list=2 ** np.arange(11),
              max_attempts=500,
              temperature_list=[0.1, 0.5, 0.75, 1.0, 2.0, 5.0],
              decay_list=[mlrose_hiive.GeomDecay])
# the two data frames will contain the results
df_run_stats, df_run_curves = sa.run()

The preceding code runs the `SA` algorithm six times, once for each initial temperature in `temperature_list`
(each run using a `GeomDecay` schedule), for at most 1024 iterations per run, which is the largest value
in `iteration_list`.
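The run count and iteration cap follow directly from the arguments passed to `SARunner`. Here is a minimal pure-Python sketch of that arithmetic. It assumes the runner tries every combination of `temperature_list` and `decay_list`, which is consistent with the six runs described here. The string `'GeomDecay'` is only a placeholder for the decay class.

```python
from itertools import product

# Same checkpoints as 2 ** np.arange(11), written without NumPy.
iteration_list = [2 ** k for k in range(11)]
temperature_list = [0.1, 0.5, 0.75, 1.0, 2.0, 5.0]
decay_list = ['GeomDecay']  # placeholder for mlrose_hiive.GeomDecay

# Assumption: one run per (temperature, decay) combination.
configurations = list(product(temperature_list, decay_list))

print(iteration_list)       # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
print(len(configurations))  # 6
print(max(iteration_list))  # 1024
```

Adding a second decay class to `decay_list` would, under this assumption, double the number of runs.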

If the algorithm makes no progress for `max_attempts` consecutive attempts (500 here), that run terminates early.
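The early-stopping rule can be shown with a generic simulated annealing loop. This is a minimal sketch, not the `mlrose_hiive` implementation. It assumes a geometric temperature schedule and treats each rejected candidate move as a failed attempt, resetting the counter whenever a move is accepted. The toy problem (minimizing the number of 1s in a list of bits) and the names `anneal` and `flip_one_bit` are hypothetical.

```python
import math
import random


def anneal(fitness, neighbor, state, init_temp=1.0, decay=0.99,
           min_temp=0.001, max_attempts=500, max_iters=1024, seed=0):
    """Minimize `fitness`, stopping after `max_attempts` consecutive rejected moves."""
    rng = random.Random(seed)
    current = fitness(state)
    attempts = iters = 0
    while attempts < max_attempts and iters < max_iters:
        # Geometric decay schedule, floored at min_temp.
        temp = max(init_temp * decay ** iters, min_temp)
        iters += 1
        candidate = neighbor(state, rng)
        delta = fitness(candidate) - current  # negative means an improvement
        # Accept improvements and sideways moves; accept worse moves
        # with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state, current = candidate, current + delta
            attempts = 0  # move accepted: reset the counter
        else:
            attempts += 1  # rejected move counts as a failed attempt
    return state, current, iters


def flip_one_bit(bits, rng):
    """Hypothetical neighbor function: flip one randomly chosen bit."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


# Toy problem: minimize the number of 1s in an 8-bit list.
state, fitness_value, used_iters = anneal(sum, flip_one_bit, [1] * 8, seed=123456)
print(state, fitness_value, used_iters)

# Starting at the optimum with a tiny temperature, every move is rejected,
# so the loop stops after exactly max_attempts iterations.
print(anneal(sum, flip_one_bit, [0] * 8, init_temp=0.001, max_attempts=5)[2])  # 5
```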

Note that the parameter values used here are just toy values picked specifically
for this example. You will have to choose your own range of values for your
assignment. I strongly recommend you don't just copy these, or you will find
that the grading is unlikely to go the way you would like.

The `df_run_stats` dataframe contains snapshots of the algorithm's state at the iterations
specified in the `iteration_list` passed into the runner class.

The first 12 rows (corresponding to the first run of this algorithm) are as follows:

In [106]:
print(df_run_stats[['Iteration', 'Fitness', 'Time', 'State']][0:12])

    Iteration  Fitness      Time                     State
0           0     11.0  0.000697  [1, 2, 2, 1, 0, 3, 7, 3]
1           1      9.0  0.005295  [1, 2, 2, 0, 0, 3, 7, 3]
2           2      8.0  0.009110  [1, 2, 2, 0, 0, 3, 7, 5]
3           4      8.0  0.014962  [1, 2, 2, 5, 0, 3, 7, 5]
4           8      5.0  0.019447  [1, 2, 7, 5, 0, 3, 5, 5]
5          16      4.0  0.026150  [1, 2, 7, 5, 3, 0, 5, 5]
6          32      4.0  0.036980  [1, 5, 7, 5, 0, 0, 3, 4]
7          64      1.0  0.052932  [1, 5, 2, 6, 3, 0, 7, 4]
8         128      1.0  0.076807  [1, 5, 2, 6, 3, 0, 4, 7]
9         256      1.0  0.126790  [1, 7, 2, 6, 3, 5, 0, 4]
10        512      0.0  0.187672  [1, 5, 0, 6, 3, 7, 2, 4]
11       1024      0.0  0.187672  [1, 5, 0, 6, 3, 7, 2, 4]


The previous output shows only a subset of the columns. `df_run_stats` also records the internal state
of the algorithm, which here is the state of the `GeomDecay` schedule.

A sample of this state is shown below:

In [107]:
state_sample = df_run_stats[['schedule_current_value', 'schedule_init_temp', 'schedule_min_temp']][:1]
print(state_sample)


   schedule_current_value  schedule_init_temp  schedule_min_temp
0                0.099999                 0.1              0.001
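These columns describe a geometric decay schedule. The mlrose documentation defines `GeomDecay` as `T(t) = max(T0 * r^t, Tmin)`, with a default decay rate `r = 0.99` and a default `min_temp` of 0.001. That default matches `schedule_min_temp` above. Below is a minimal pure-Python sketch of that formula, assuming those defaults. The function name `geom_decay` is hypothetical.

```python
def geom_decay(t, init_temp=1.0, decay=0.99, min_temp=0.001):
    """Temperature at time t under geometric decay, floored at min_temp."""
    return max(init_temp * decay ** t, min_temp)


print(geom_decay(0, init_temp=0.1))     # 0.1: starts at the initial temperature
print(geom_decay(100, init_temp=2.0))   # about 0.732
print(geom_decay(1000, init_temp=0.1))  # 0.001: clamped to min_temp
```

The floor at `min_temp` keeps the temperature from collapsing to zero. Without it, the acceptance probability for worse moves would be undefined.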


So, to pick out the most performant run from the dataframe, you need to find the row with the best fitness.
As 8-Queens is a minimization problem, you'd pick the row with the minimum fitness.
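8-Queens is minimized because its fitness counts pairs of queens that attack each other, so 0 means solved. Below is a minimal pure-Python sketch of that count. It assumes `state[i]` is the row of the queen in column `i`. The function name is hypothetical, but it reproduces the `Fitness` values of the states shown in the first `df_run_stats` output.

```python
def queens_conflicts(state):
    """Count pairs of queens that share a row or a diagonal (0 = solved)."""
    n = len(state)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        # Columns are distinct by construction; check rows and diagonals.
        if state[i] == state[j] or abs(state[i] - state[j]) == j - i
    )


print(queens_conflicts([1, 2, 2, 1, 0, 3, 7, 3]))  # 11 (iteration 0)
print(queens_conflicts([1, 2, 2, 0, 0, 3, 7, 3]))  # 9 (iteration 1)
print(queens_conflicts([1, 5, 0, 6, 3, 7, 2, 4]))  # 0 (a solution)
```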

However, I'm going to look in `df_run_curves` (which stores a minimal set of information for every iteration) to
find out which configuration achieved the best fitness in the fewest iterations.

In [108]:
best_fitness = df_run_curves['Fitness'].min()
best_runs = df_run_curves[df_run_curves['Fitness'] == best_fitness]

print(best_runs)

      Iteration      Time  Fitness Temperature  max_iters
414         414  0.187672      0.0         0.1       1024
829         414  0.187672      0.0         0.5       1024
2416        561  0.078745      0.0         1.0       1024
2739        322  0.153166      0.0         2.0       1024
3164        424  0.016058      0.0         5.0       1024


This gives us five candidates for the best run. We are going to pick the one
that reached the best fitness value in the fewest iterations.

In [109]:
minimum_iterations = best_runs['Iteration'].min()

best_curve_run = best_runs[best_runs['Iteration'] == minimum_iterations]
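The same two-step selection (lowest fitness first, then fewest iterations) can be written as a single lexicographic minimum. Below is a minimal pure-Python sketch that uses the five candidate rows printed above, copied as plain tuples. One difference: `min` keeps only the first row on a tie, whereas the boolean filter in the cell above returns every tied row.

```python
# (row index, Iteration, Fitness, initial temperature) copied from the best_runs output.
candidates = [
    (414, 414, 0.0, 0.1),
    (829, 414, 0.0, 0.5),
    (2416, 561, 0.0, 1.0),
    (2739, 322, 0.0, 2.0),
    (3164, 424, 0.0, 5.0),
]

# Sort key: fitness first (minimization), then iteration count.
best = min(candidates, key=lambda row: (row[2], row[1]))
print(best)  # (2739, 322, 0.0, 2.0)
```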

The best run using these criteria is as follows:

In [110]:
print(best_curve_run)

      Iteration      Time  Fitness Temperature  max_iters
2739        322  0.153166      0.0         2.0       1024


This run has the following identifying information:

In [111]:
# The Temperature column holds the decay schedule object; init_temp is its initial temperature.
best_init_temperature = best_curve_run['Temperature'].iloc[0].init_temp

print(f'Best initial temperature: {best_init_temperature}')


Best initial temperature: 2.0


To map this back to `df_run_stats`, we use the configuration data included in
the curve data. Every row of `df_run_curves` carries at least enough identifying information
to determine which run it came from.

In this case, the value we need is in the `Temperature` column. It holds the `GeomDecay` object used
for the run, and that object's `init_temp` attribute is the initial temperature.

So, in this case, we are looking for all rows in `df_run_stats` where `schedule_init_temp` is equal to 2.0.

In [112]:
run_stats_best_run = df_run_stats[df_run_stats['schedule_init_temp'] == best_init_temperature]
print(run_stats_best_run[['Iteration', 'Fitness', 'Time', 'State']])

    Iteration  Fitness      Time                     State
48          0     11.0  0.000240  [1, 2, 2, 1, 0, 3, 7, 3]
49          1      9.0  0.004562  [1, 2, 2, 0, 0, 3, 7, 3]
50          2      8.0  0.008853  [1, 2, 2, 0, 0, 3, 7, 5]
51          4      8.0  0.013515  [1, 2, 2, 5, 0, 3, 7, 5]
52          8      7.0  0.019255  [1, 2, 2, 5, 0, 3, 5, 5]
53         16      6.0  0.025720  [3, 2, 3, 5, 0, 1, 5, 5]
54         32      4.0  0.036483  [3, 5, 6, 5, 5, 0, 4, 7]
55         64      5.0  0.052563  [2, 0, 3, 6, 1, 2, 1, 7]
56        128      3.0  0.076311  [2, 0, 6, 3, 5, 0, 4, 3]
57        256      2.0  0.118043  [7, 1, 3, 6, 6, 4, 0, 5]
58        512      0.0  0.140145  [7, 1, 3, 0, 6, 4, 2, 5]
59       1024      0.0  0.140145  [7, 1, 3, 0, 6, 4, 2, 5]
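The row labels 48 to 59 follow from how the results are laid out. Each run contributes one row for iteration 0 plus one row per entry in `iteration_list`, and the runs appear in `temperature_list` order. Below is a minimal sketch of that arithmetic. It assumes this layout, which matches both `df_run_stats` outputs shown but is an observation rather than documented behavior.

```python
iteration_list = [2 ** k for k in range(11)]
temperature_list = [0.1, 0.5, 0.75, 1.0, 2.0, 5.0]

# One row for iteration 0, plus one snapshot per checkpoint.
rows_per_run = 1 + len(iteration_list)

# Assumption: runs are stored in the same order as temperature_list.
run_number = temperature_list.index(2.0)
first_row = run_number * rows_per_run
last_row = first_row + rows_per_run - 1
print(rows_per_run, first_row, last_row)  # 12 48 59
```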


And the `GeomDecay` schedule state recorded at the start of this run is:

In [114]:
best_state = run_stats_best_run[['schedule_current_value', 'schedule_init_temp', 'schedule_min_temp']][:1]
print(best_state)

    schedule_current_value  schedule_init_temp  schedule_min_temp
48                1.999995                 2.0              0.001
