Python Library Usage

Version 1.2.9

Help

Documentation can be displayed within Python:

import xcsf
help(xcsf.xcsf)

Constructor

Example:

import xcsf

xcs = xcsf.XCS(
    x_dim=8,  # number of input feature variables
    y_dim=1,  # number of predicted target variables (1 for reinforcement learning)
    n_actions=2  # number of actions or classes (1 for supervised learning)
)

Library Stub:

def __init__(self, x_dim: int, y_dim: int, n_actions: int) -> None: ...

Initialising General Parameters

Default parameter values are hard-coded within XCSF. At run-time, the values may be overridden within Python by using the following properties:

# General XCSF
xcs.OMP_NUM_THREADS = 8  # number of CPU cores to use 
xcs.POP_INIT = True  # whether to seed the population with random rules
xcs.POP_SIZE = 200  # maximum population size
xcs.MAX_TRIALS = 1000  # number of trials to execute for each xcs.fit()
xcs.PERF_TRIALS = 1000  # number of trials to average performance output
xcs.LOSS_FUNC = "mae"  # mean absolute error
xcs.LOSS_FUNC = "mse"  # mean squared error
xcs.LOSS_FUNC = "rmse"  # root mean squared error
xcs.LOSS_FUNC = "log"  # log loss (cross-entropy)
xcs.LOSS_FUNC = "binary_log"  # binary log loss
xcs.LOSS_FUNC = "onehot"  # one-hot encoding classification error
xcs.LOSS_FUNC = "huber"  # Huber error
xcs.HUBER_DELTA = 1  # delta parameter for Huber error calculation
xcs.seed(seed)  # sets the random number seed; uses the current time if not set

# General Classifier
xcs.E0 = 0.01  # target error, under which accuracy is set to 1
xcs.ALPHA = 0.1  # accuracy offset for rules above E0 (1=disabled)
xcs.NU = 5  # accuracy slope for rules with error above E0
xcs.BETA = 0.1  # learning rate for updating error, fitness, and set size
xcs.DELTA = 0.1  # fraction of least fit classifiers to increase deletion vote
xcs.THETA_DEL = 20  # min experience before fitness used in probability of deletion
xcs.INIT_FITNESS = 0.01  # initial classifier fitness
xcs.INIT_ERROR = 0  # initial classifier error
xcs.M_PROBATION = 10000  # trials since creation within which a rule must match at least 1 input or be deleted
xcs.STATEFUL = True  # whether classifiers should retain state across trials
xcs.SET_SUBSUMPTION = False  # whether to perform set subsumption
xcs.THETA_SUB = 100  # minimum experience of a classifier to become a subsumer
xcs.COMPACTION = False  # if enabled and system error < E0, the larger of two roulette spins is deleted

# Multi-step Problems
xcs.TELETRANSPORTATION = 50  # num steps to reset a multistep problem if goal not found
xcs.GAMMA = 0.95  # discount factor in calculating the reward for multistep problems
xcs.P_EXPLORE = 0.9  # probability of exploring vs. exploiting in a multistep trial

# Evolutionary Algorithm
xcs.EA_SELECT_TYPE = "roulette"  # roulette wheel parental selection
xcs.EA_SELECT_TYPE = "tournament"  # tournament parental selection
xcs.EA_SELECT_SIZE = 0.4  # fraction of set size for tournament parental selection
xcs.THETA_EA = 50  # average set time between EA invocations
xcs.LAMBDA = 2  # number of offspring to create each EA invocation (use multiples of 2)
xcs.P_CROSSOVER = 0.8  # probability of applying crossover
xcs.ERR_REDUC = 1.0  # amount to reduce an offspring error (1=disabled)
xcs.FIT_REDUC = 0.1  # amount to reduce an offspring fitness (1=disabled)
xcs.EA_SUBSUMPTION = False  # whether to try and subsume offspring classifiers
xcs.EA_PRED_RESET = False  # whether to reset offspring predictions instead of copying

Please note that the default parameters are not intended as general values suitable for all problems and must be set appropriately for the specific learning task.


Initialising Conditions

Always match (dummy)

The use of always-matching conditions results in the match set being equal to the population set, i.e., [M] = [P]. The evolutionary algorithm and classifier updates are therefore performed within [P], and global models (e.g., neural networks) are designed to cover the entire state space. This configuration operates as a more traditional evolutionary algorithm, which can be useful for debugging and benchmarking.

Additionally, a single global model (e.g., a linear regression) can be fit by also setting POP_SIZE = 1 and disabling the evolutionary algorithm by setting the invocation frequency to a larger number than will ever be executed, e.g., THETA_EA = 5000000. This can also be useful for debugging and benchmarking.

xcs.condition("dummy")

Ternary Bitstrings

With ternary bitstrings, each classifier's condition is represented as $cl.C \in \{0,1,\#\}^L$ where the length of the string $L$ is equal to the x_dim multiplied by the number of encoding bits.

For binary problems, the number of encoding bits is simply: bits = 1. For real-valued inputs, the values are binarised to the specified number of bits with the assumption that the inputs are in the range [0,1]. For example with bits = 2, an input vector [0.23,0.76,0.45,0.5] will be converted to [0,0,1,1,0,1,0,1] before being tested for matching with the ternary bitstring using the alphabet {0,1,#} where the don't care symbol # matches either bit.

Uniform crossover is applied with probability P_CROSSOVER and a single self-adaptive mutation rate (log normal) is used.

args = {
    "bits": 2, # number of bits per float to binarise inputs
    "p_dontcare": 0.5, # don't care probability during covering
}
xcs.condition("ternary", args)

Related Literature:

Hyperrectangles and Hyperellipsoids

Hyperellipsoids currently use the center-spread representation (axis rotation is not yet implemented).

Hyperrectangles currently implement the center-spread and unordered-bound representations.

With the hyperrectangle center-spread representation, each classifier condition is represented as a concatenation of interval predicates, $cl.C = (c_i, s_i)^L$ where $L$ is equal to the x_dim and $c_i, s_i \in \mathbb{R}$. $c_i$ encodes the center of the interval and $s_i$ encodes the spread (or width). A classifier matches an input $x$ with attributes $x_i$ if and only if $(c_i - s_i) \le x_i \le (c_i + s_i)$ for all $x_i$.

With the hyperrectangle unordered-bound representation, each classifier condition is represented as a concatenation of interval predicates, $cl.C = (p_i, q_i)^L$ where $L$ is equal to the x_dim and $p_i, q_i \in \mathbb{R}$. A classifier matches an input $x$ with attributes $x_i$ if and only if $min(p_i, q_i) \le x_i \le max(p_i, q_i)$ for all $x_i$.

Uniform crossover is applied with probability P_CROSSOVER. A single self-adaptive mutation rate (log normal) specifies the standard deviation used to sample a random Gaussian (with zero mean) which is added to each center and spread value (or bound for unordered-bounds).

For center-spread representations, if eta > 0 each classifier's centers are adjusted at rate $\eta$ towards the mean of the observed inputs during each update (see Tamee et al., 2007). That is, $c_i \leftarrow c_i + \eta (x_i - c_i)$.

args = {
    "min": 0, # minimum value of a center/bound
    "max": 1, # maximum value of a center/bound
    "min_spread": 0.1, # minimum initial spread
    "eta": 0, # gradient descent rate for moving centers to mean inputs matched
}
xcs.condition("hyperrectangle_csr", args)  # center-spread
xcs.condition("hyperrectangle_ubr", args)  # unordered-bound
xcs.condition("hyperellipsoid", args)  # center-spread

Related Literature:

GP Trees

GP trees currently use arithmetic operators from the set {+,-,/,*}. Return values from each node are clamped [-1000,1000]. The rule matches if the output node is greater than 0.5. Subsumption is not implemented.

Sub-tree crossover is applied with probability P_CROSSOVER. A single self-adaptive mutation rate (rate selection) is used to specify the per allele probability of performing mutation where terminals are randomly replaced with other terminals and functions randomly replaced with other functions.

args = {
    "min_constant": 0, # minimum value of a constant
    "max_constant": 1, # maximum value of a constant
    "n_constants": 100, # number of (global) constants available
    "init_depth": 5, # initial depth of a tree
    "max_len": 10000, # maximum initial length of a tree
}
xcs.condition("tree_gp", args)

See also: Visualising GP Trees.

Related Literature:

DGP Graphs

Temporally dynamic graphs with fuzzy symbolic functions selected from the CFMQVS set: {fuzzy NOT, fuzzy AND, fuzzy OR}. Each graph is initialised with a randomly selected function assigned to each node and random connectivity (including recurrent connections) and is synchronously updated in parallel for T cycles before sampling the output node(s). These graphs can exhibit inherent memory by retaining state across inputs. Inputs must be in the range [0,1].

Currently implements a fixed number of nodes with the connectivity and update cycles evolved along with the function for each node. Log normal self-adaptive mutation is used for node function and connectivity and uniform self-adaptive mutation for the number of update cycles.

When used as conditions, the number of nodes n must be at least 1 and the rule matches a given input if the state of that node is greater than 0.5 after updating the graph T times. When used as condition + action rules, the action is encoded as binary (discretising the node outputs with threshold 0.5); for example, with 8 actions a minimum of 3 additional nodes is required. Subsumption is not implemented.

args = {
    "max_k": 2, # number of connections per node
    "max_t": 10, # maximum number of cycles to update graphs
    "n": 20, # number of nodes in the graph
    "evolve_cycles": True, # whether to evolve the number of update cycles
}
xcs.condition("dgp", args)
xcs.condition("rule_dgp", args) # conditions + actions in single DGP graphs

See also: Visualising DGP Graphs.

Related Literature:

Neural Networks

Condition output layers should be set to a single neuron, i.e., "n_init": 1. A classifier matches an input if this output neuron is greater than 0.5.

When used to represent conditions and actions within a single network ("rules") the output layers should be "n_init": 1 + binary where binary is the number of outputs required to output binary actions. For example, for 8 actions, 3 binary outputs are required and the output layer should contain 4 neurons. Again, the neuron states of the action outputs are discretised with threshold 0.5. Subsumption is not implemented.

See Neural Network Initialisation.

xcs.condition("neural", layer_args)
xcs.condition("rule_neural", layer_args) # conditions + actions in single neural nets

Related Literature:


Initialising Actions

Integers

A constant integer value. A single self-adaptive mutation rate (log normal) specifies the probability of randomly reselecting the value.

xcs.action("integer")

Related Literature:

Neural Networks

Output layer should be a softmax. See Neural Network Initialisation.

xcs.action("neural", layer_args)

Related Literature:


Initialising Predictions

Constant

Original XCS behaviour can be specified with piece-wise constant predictions. These are updated with the (reward or payoff) target $y$ and learning rate $\beta$ as follows:

  • if $exp_j < 1 / \beta$:
    • $p_j \leftarrow (p_j \times (exp_j - 1) + y) / exp_j$
  • otherwise:
    • $p_j \leftarrow p_j + \beta (y - p_j)$

xcs.BETA = 0.1  # classifier update rate (also applies to constant predictions)
xcs.prediction("constant")

Related Literature:

Normalised Least Mean Squares

If eta is evolved, the rate is initialised uniformly random [eta_min, eta]. Offspring inherit the rate and a single (log normal) self-adaptive mutation rate specifies the standard deviation used to sample a random Gaussian (with zero mean) which is added to eta (similar to evolution strategies).

args = {
    "x0": 1, # offset value
    "eta": 0.1, # gradient descent update rate (maximum value, if evolved)
    "eta_min": 0.0001, # minimum gradient descent update rate (if evolved)
    "evolve_eta": True, # whether to evolve the gradient descent rate
}
xcs.prediction("nlms_linear", args)
xcs.prediction("nlms_quadratic", args)

Related Literature:

Recursive Least Mean Squares

args = {
    "x0": 1, # offset value
    "scale_factor": 1000, # initial diagonal values of the gain-matrix
    "lambda": 1, # forget rate (small values may be unstable)
}
xcs.prediction("rls_linear", args)
xcs.prediction("rls_quadratic", args)

Related Literature:

Neural Networks

Output layer should be "n_init": y_dim. See Neural Network Initialisation.

xcs.prediction("neural", layer_args)

Related Literature:


Neural Network Initialisation

General Network Specification

layer_args = {
    "layer_0": { # first hidden layer
        "type": "connected", # layer type
        ..., # layer specific parameters
    },
    ..., # as many layers as desired
    "layer_n": { # output layer
        "type": "connected", # layer type
        ..., # layer specific parameters
    },          
}

Activation Functions

Note: Neuron states are clamped [-100,100] before activations are applied. Weights are clamped [-10,10].

"logistic", # logistic [0,1]
"relu", # rectified linear unit [0,inf]
"tanh", # tanh [-1,1]
"linear", # linear [-inf,inf]
"gaussian", # Gaussian (0,1]
"sin", # sine [-1,1]
"cos", # cosine [-1,1]
"softplus", # soft plus [0,inf]
"leaky", # leaky rectified linear unit [-inf,inf]
"selu", # scaled exponential linear unit [-1.7581,inf]
"loggy", # logistic [-1,1]

Connected Layers

layer_args = {
    "layer_0": {
        "type": "connected", # layer type
        "activation": "relu", # activation function
        "evolve_weights": True, # whether to evolve weights
        "evolve_connect": True, # whether to evolve connectivity
        "evolve_functions": True, # whether to evolve activation function
        "evolve_neurons": True, # whether to evolve the number of neurons
        "max_neuron_grow": 5, # maximum number of neurons to add or remove per mut
        "n_init": 10, # initial number of neurons
        "n_max": 100, # maximum number of neurons (if evolved)
        "sgd_weights": True, # whether to use gradient descent (only for predictions)
        "evolve_eta": True, # whether to evolve the gradient descent rate   
        "eta": 0.1, # gradient descent update rate (maximum value, if evolved)
        "eta_min": 0.0001, # minimum gradient descent update rate (if evolved)
        "momentum": 0.9, # momentum for gradient descent update
        "decay": 0, # weight decay during gradient descent update
    },       
}

Recurrent Layers

layer_args = {
    "layer_0": {
        "type": "recurrent",
        ..., # other parameters same as for connected layers
    }
}

LSTM Layers

layer_args = {
    "layer_0": {
        "type": "lstm",
        "activation": "tanh", # activation function
        "recurrent_activation": "logistic", # recurrent activation function
        ..., # other parameters same as for connected layers
    }
}

Softmax Layers

Softmax layers can be composed of a linear connected layer and softmax:

layer_args = {
    "layer_0": {
        "type": "connected",
        "activation": "linear",
        "n_init": N_ACTIONS, # number of (softmax) outputs
        ..., # other parameters same as for connected layers
    },       
    "layer_1": {
        "type": "softmax",
        "scale": 1, # softmax temperature
    },       
}

Dropout Layers

layer_args = {
    "layer_0": {
        "type": "dropout",
        "probability": 0.2, # probability of dropping an input
    }
}

Noise Layers

Layers that add Gaussian noise to their inputs.

layer_args = {
    "layer_0": {
        "type": "noise",
        "probability": 0.2, # probability of adding noise to an input
        "scale": 1.0, # standard deviation of Gaussian noise added
    }
}

Convolutional Layers

Convolutional layers require image inputs and produce image outputs. If used as the first layer, the width, height, and number of channels must be specified. If "evolve_neurons": True the number of filters will be evolved using an initial number of filters "n_init" and maximum number "n_max".

layer_args = {
    "layer_0": {
        "type": "convolutional",
        "activation": "relu", # activation function
        "height": 16, # input height
        "width": 16, # input width
        "channels": 1, # number of input channels
        "n_init": 6, # number of convolutional kernel filters
        "size": 3, # the size of the convolution window
        "stride": 1, # the stride of the convolution window
        "pad": 1, # the padding of the convolution window
        ..., # other parameters same as for connected layers
    },       
    "layer_1": {
        "type": "convolutional",
        ..., # parameters same as above; height, width, channels not needed
    },       
}

Max-pooling Layers

Max-pooling layers require image inputs and produce image outputs. If used as the first layer, the width, height, and number of channels must be specified.

layer_args = {
    "layer_0": {
        "type": "maxpool",
        "height": 16, # input height
        "width": 16, # input width
        "channels": 1, # number of input channels
        "size": 2, # the size of the maxpooling operation
        "stride": 2, # the stride of the maxpooling operation
        "pad": 0, # the padding of the maxpooling operation
    },       
    "layer_1": {
        "type": "maxpool",
        "size": 2,
        "stride": 2,
        "pad": 0,
    },       
}

Average-pooling Layers

Average-pooling layers require image inputs and output a single average for each input channel. If used as the first layer, the width, height, and number of channels must be specified.

layer_args = {
    "layer_0": {
        "type": "avgpool",
        "height": 16, # input height
        "width": 16, # input width
        "channels": 1, # number of input channels
    },       
    "layer_1": {
        "type": "avgpool",
    },       
}

Upsampling Layers

Upsampling layers require image inputs and produce image outputs. If used as the first layer, the width, height, and number of channels must be specified.

layer_args = {
    "layer_0": {
        "type": "upsample",
        "height": 16, # input height
        "width": 16, # input width
        "channels": 1, # number of input channels
        "stride": 2, # the stride of the upsampling operation
    },       
    "layer_1": {
        "type": "upsample",
        "stride": 2,
    },       
}

Saving and Loading XCSF

XCSF supports pickling and also provides the following functions for serialising the entire state to a binary file.

Example saving the entire current state of XCSF to a binary file:

xcs.save("saved_name.bin")

Example loading the entire state of XCSF from a binary file:

xcs.load("saved_name.bin")

Functions return the total number of elements written or read.

Library Stub:

def save(self, filename: str) -> int: ...
def load(self, filename: str) -> int: ...

Storing and Retrieving XCSF

Example storing the current XCSF population in memory for later retrieval, overwriting any previously stored population:

xcs.store()

Example retrieving the previously stored XCSF population from memory:

xcs.retrieve()
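
For example, a hedged sketch of using store() and retrieve() to checkpoint the best-performing population during training (X_train and y_train are assumed to be existing training arrays):

best_error = float("inf")
for _ in range(50):
    error = xcs.fit(X_train, y_train, shuffle=True)  # one block of MAX_TRIALS updates
    if error < best_error:
        best_error = error
        xcs.store()  # keep the best population in memory
xcs.retrieve()  # restore the best population before final evaluation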

Library Stub:

def store(self) -> None: ...
def retrieve(self) -> None: ...

Printing XCSF

Example printing the current XCSF parameters:

xcs.print_params()

Example printing the current XCSF population:

xcs.print_pset()

Library Stub:

def print_params(self) -> None: ...
def print_pset(self, condition: bool = True, action: bool = True, prediction: bool = True) -> None: ...

XCSF Getters

Values for all general parameters are directly accessible via their properties. Specific getter functions:

# General
xcs.pset_size() # returns the mean population size
xcs.pset_num() # returns the mean population numerosity
xcs.mset_size() # returns the mean match set size
xcs.aset_size() # returns the mean action set size
xcs.mfrac() # returns the mean fraction of inputs matched by the best rule
xcs.time() # returns the current EA time
xcs.version_major() # returns the XCSF major version number
xcs.version_minor() # returns the XCSF minor version number
xcs.version_build() # returns the XCSF build version number
xcs.pset_mean_cond_size() # returns the mean condition size
xcs.pset_mean_pred_size() # returns the mean prediction size

# Neural network specific - population set averages
# "layer" argument is an integer specifying the location of a layer: first layer=0
xcs.pset_mean_pred_eta(layer) # returns the mean eta for a prediction layer
xcs.pset_mean_pred_neurons(layer) # returns the mean number of neurons for a prediction layer
xcs.pset_mean_pred_layers() # returns the mean number of layers in the prediction networks
xcs.pset_mean_pred_connections(layer) # returns the mean number of active connections for a prediction layer
xcs.pset_mean_cond_neurons(layer) # returns the mean number of neurons for a condition layer
xcs.pset_mean_cond_layers() # returns the mean number of layers in the condition networks
xcs.pset_mean_cond_connections(layer) # returns the mean number of active connections for a condition layer

Library Stub:

def aset_size(self) -> float: ...
def mfrac(self) -> float: ...
def mset_size(self) -> float: ...
def pset_mean_cond_connections(self, layer: int) -> float: ...
def pset_mean_cond_layers(self) -> float: ...
def pset_mean_cond_neurons(self, layer: int) -> float: ...
def pset_mean_cond_size(self) -> float: ...
def pset_mean_pred_connections(self, layer: int) -> float: ...
def pset_mean_pred_eta(self, layer: int) -> float: ...
def pset_mean_pred_layers(self) -> float: ...
def pset_mean_pred_neurons(self, layer: int) -> float: ...
def pset_mean_pred_size(self) -> float: ...
def pset_num(self) -> int: ...
def pset_size(self) -> int: ...
def time(self) -> int: ...
def version_build(self) -> int: ...
def version_major(self) -> int: ...
def version_minor(self) -> int: ...

Getting the Population as JSON

import json
json_string = xcs.json()
parsed = json.loads(json_string)

Then to print the current population:

print(json.dumps(parsed, indent=4))

Example printing ternary conditions, integer actions, and fitnesses:

fitness = [cl["fitness"] for cl in parsed["classifiers"]]
ternary = [cl["condition"]["string"] for cl in parsed["classifiers"]]
actions = [cl["action"]["action"] for cl in parsed["classifiers"]]
for i in range(len(fitness)):
    print("%s %d %.5f" % (ternary[i], actions[i], fitness[i]))

Printing and returning the individual weights from neural networks is disabled by default. To enable, change the flags in the neural_json_export() functions in cond_neural.c, pred_neural.c, etc.

Library Stub:

def json(self, condition: bool = True, action: bool = True, prediction: bool = True) -> str: ...

Getting the Parameters as JSON

Example getting and printing the current parameters:

import json
json_params = xcs.json_parameters()
parsed_args = json.loads(json_params)
print(json.dumps(parsed_args, indent=4))

Library Stub:

def json_parameters(self) -> str: ...

Seeding the Population

Classifiers can be inserted into the population in a number of ways.

The json_insert_cl() function can be used to insert a single new classifier into the population. The new classifier is initialised with a random condition, action, prediction, and then any supplied properties overwrite these values. This means that all properties are optional. If the population set numerosity exceeds xcs.POP_SIZE after inserting the rule, the standard roulette wheel deletion mechanism will be invoked to maintain the population limit.

GP trees and neural networks are not yet implemented.

Example inserting a rule with specified hyperrectangle condition and integer action, while the prediction is initialised as normal. See notebook example.

import json
import xcsf
xcs = xcsf.XCS(x_dim=8, y_dim=1, n_actions=2)
xcs.condition("hyperrectangle_ubr")
xcs.action("integer")
xcs.prediction("nlms_linear")

cl_dict = {
    "error": 10, # each of these properties are optional
    "fitness": 1.01,
    "accuracy": 2,
    "set_size": 100,
    "numerosity": 2,
    "experience": 3,
    "time": 3,
    "samples_seen": 2,
    "samples_matched": 1,
    "condition": {
        "type": "hyperrectangle_ubr",
        "bound1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "bound2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "mutation": [0.2] # this parameter still self-adapts
    },
    "action": {
        "type": "integer",
        "action": 1,
        "mutation": [0.28]
    }
}

json_str = json.dumps(cl_dict) # dictionary to JSON
xcs.json_insert_cl(json_str)
xcs.print_pset()

Note: when manually adding classifiers, be careful that the keys are correct: any key that does not exactly match a known property is silently ignored.

Multiple classifiers can be added through the same mechanism as a single JSON string with json_insert().

Additionally, the entire population set can be written in JSON format to a plain text file:

xcs.json_write("pset.json")

And read into the population with:

xcs.json_read("pset.json")

Note that this is not the recommended way to back up the system to persistent storage since temporary memory buffers (e.g., update matrices) and parameters are not saved and reloaded. For this purpose, see Saving and Loading XCSF.

Library Stub:

def json_insert(self, clset_json: str) -> None: ...
def json_insert_cl(self, cl_json: str) -> None: ...
def json_read(self, filename: str) -> None: ...
def json_write(self, filename: str) -> None: ...

Visualising GP Trees

The TreeViz class from viz.py generates a tree visualisation with Graphviz. The first argument must be the tree array and the second the filename used to save the output as a PDF. It optionally accepts a list of strings feature_names and a string note, which adds a caption at the bottom.

Example plotting the first classifier condition:

import json
from xcsf.utils.viz import TreeViz
parsed = json.loads(xcs.json())
trees = [cl["condition"]["tree"]["array"] for cl in parsed["classifiers"]]
TreeViz(trees[0], "test")

Note that this requires the graphviz package, which can be installed with:

$ pip install graphviz

TreeViz Stub:

def __init__(self, 
    tree: list[str], 
    filename: str, 
    note: str | None = None, 
    feature_names: list[str] | None = None,
) -> None: ...

Visualising DGP Graphs

The DGPViz class from viz.py generates a graph visualisation with Graphviz. The first argument must be the graph and the second the filename used to save the output as a PDF. It optionally accepts a list of strings feature_names and a string note, which adds a caption at the bottom.

Example plotting the first classifier condition and passing the error as a note:

import json
from xcsf.utils.viz import DGPViz
parsed = json.loads(xcs.json())
errors = [cl["error"] for cl in parsed["classifiers"]]
graphs = [cl["condition"]["graph"] for cl in parsed["classifiers"]]
note = "Error = %.5f" % errors[0]
DGPViz(graphs[0], "test", note=note)

DGPViz Stub:

def __init__(self, 
    graph: dict, 
    filename: str, 
    note: str | None = None, 
    feature_names: list[str] | None = None,
) -> None: ...

Reinforcement Learning

Initialisation

Initialise XCSF with y_dim = 1 for predictions to estimate the scalar reward.

import xcsf
xcs = xcsf.XCS(x_dim=X_DIM, y_dim=1, n_actions=N_ACTIONS)

Method 1

The standard method involves the basic loop as shown below. state must be a 1-D numpy array representing the feature values of a single instance; reward must be a scalar value representing the current environmental reward for having performed the action; and done must be a boolean value representing whether the environment is currently in a terminal state.

err = 0  # accumulated system prediction error
state = env.reset()
xcs.init_trial()
for cnt in range(xcs.TELETRANSPORTATION):
    xcs.init_step()
    action = xcs.decision(state, explore)  # explore specifies whether to explore/exploit
    next_state, reward, done = env.step(action)
    xcs.update(reward, done)  # update the current action set and/or previous action set
    err += xcs.error(reward, done, env.max_payoff())  # system prediction error
    xcs.end_step()
    if done:
        break
    state = next_state
cnt += 1  # number of steps performed this trial
xcs.end_trial()

See notebook example.

Library Stub:

def init_step(self) -> None: ...
def init_trial(self) -> None: ...
def end_step(self) -> None: ...
def end_trial(self) -> None: ...
@typing.overload
def error(self) -> float: ...
@typing.overload
def error(self, reward: float, done: bool, max_payoff: float) -> float: ...
def update(self, reward: float, done: bool) -> None: ...
def decision(
    self,
    state: np.ndarray[Any, np.dtype[np.float64]],  # shape = (x_dim, )
    explore: bool,
) -> int: ...

Method 2

The fit() function may be used as below to execute one single-step learning trial, i.e., creation of the match and action sets, updating the action set and running the EA as appropriate. The vector state must be a 1-D numpy array representing the feature values of a single instance; action must be an integer representing the selected action (and therefore the action set to update); and reward must be a scalar value representing the current environmental reward for having performed the action.

xcs.fit(state, action, reward)

The entire prediction array for a given state can be returned using the supervised predict() function, which must receive a 2-D numpy array. For example:

prediction_array = xcs.predict(state.reshape(1,-1))[0]

See notebook example.

Library Stub:

@typing.overload
def fit(
    self,
    state: np.ndarray[Any, np.dtype[np.float64]],  # shape = (x_dim, )
    action: int,
    reward: float,
) -> float: ...

def predict(
    self,
    X_predict: np.ndarray[Any, np.dtype[np.float64]],  # shape = (n_samples, x_dim)
) -> np.ndarray[Any, np.dtype[np.float64]]: ...   # shape = (n_samples, y_dim)

Method 3

The supervised fit() and predict() functions can be used for reinforcement learning without action sets, i.e., [A] = [M].

See notebook example using experience replay.
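
As a loosely hedged sketch of what such a loop might look like: assume XCSF was initialised in the supervised configuration with y_dim equal to the number of actions and a single dummy integer action, xcs.MAX_TRIALS = 1 so that each fit() call performs a single update, and that state, action, reward, done, and next_state come from an environment step (the notebook example may differ):

import numpy as np

q_next = xcs.predict(next_state.reshape(1, -1))[0]  # predicted payoffs for the next state
target = xcs.predict(state.reshape(1, -1))[0]  # current predicted payoffs
target[action] = reward if done else reward + xcs.GAMMA * np.max(q_next)  # bootstrapped target (GAMMA used as the discount)
xcs.fit(state.reshape(1, -1), target.reshape(1, -1), shuffle=False)  # update [M] towards the target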

Related Literature:


Supervised Learning

Initialisation

Initialise XCSF with a single (dummy) integer action. Set conditions and predictions as desired.

import xcsf
xcs = xcsf.XCS(x_dim, y_dim, 1)  # single action
xcs.action("integer")  # dummy integer actions

Fitting

The fit() function may be used as below to execute xcs.MAX_TRIALS number of learning iterations (i.e., single-step trials) using a supplied training set. The input arrays X_train and y_train must be 2-D numpy arrays of the shape (n_samples, x_dim) and (n_samples, y_dim). The third parameter specifies whether to randomly shuffle the training data. The function will return a scalar representing the training prediction error using the loss function as specified by xcs.LOSS_FUNC.

Note that while the training data is supplied as a batch, learning proceeds in the usual online way: one sample at a time. To execute a single trial, simply pass a batch size of one by reshaping the data and set xcs.MAX_TRIALS = 1 (a sketch follows the example below).

train_error = xcs.fit(X_train, y_train, shuffle=True)
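
For example, a sketch of a single online update with one sample (x_sample and y_sample are assumed to be existing 1-D arrays):

xcs.MAX_TRIALS = 1  # one learning trial per fit() call
train_error = xcs.fit(x_sample.reshape(1, -1), y_sample.reshape(1, -1), shuffle=False)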

Library Stub:

@typing.overload
def fit(
    self,
    X_train: np.ndarray[Any, np.dtype[np.float64]],  # shape = (n_samples, x_dim)
    y_train: np.ndarray[Any, np.dtype[np.float64]],  # shape = (n_samples, y_dim)
    shuffle: bool = True,
) -> float: ...

Scoring

The score() function may be used as below to calculate the prediction error over a single pass of a supplied data set without updates or the EA being invoked (e.g., for scoring a validation set). An argument N may be supplied that specifies the maximum number of iterations performed; if this value is less than the number of instances supplied, samples will be drawn randomly. Returns a scalar representing the error. 2-D numpy arrays are expected as inputs.

Note that if the match set is empty for a given sample then covering will be invoked and this may alter the population set. If this behaviour is undesirable, an optional argument cover can be used to specify the values to use as system output instead of invoking covering. cover must be an array of length y_dim.

val_error = xcs.score(X_val, y_val)

val_error = xcs.score(X_val, y_val, N=1000, cover=[0.1])

Library Stub:

def score(
    self,
    X_val: np.ndarray[Any, np.dtype[np.float64]],  # shape = (n_samples, x_dim)
    y_val: np.ndarray[Any, np.dtype[np.float64]],  # shape = (n_samples, y_dim)
    N: int = 0,  # max number of samples to use
    cover: Optional[np.ndarray[Any, np.dtype[np.float64]]] = None,  # shape = (y_dim, )
) -> float: ...

Predicting

The predict() function may be used as below to calculate the XCSF predictions for a supplied data set. No updates or EA invocations are performed. The input vector must be a 2-D numpy array of the shape (n_samples, x_dim). Returns a 2-D numpy array of shape (n_samples, y_dim).

Note that similar to score(), if the match set is empty for a given sample then covering will be invoked and this may alter the population set. If this behaviour is undesirable, an optional argument cover can be used to specify the values to use as system output instead of invoking covering. cover must be an array of length y_dim.

predictions = xcs.predict(X_test)

predictions = xcs.predict(X_test, cover=[0.1])

Library Stub:

def predict(
    self, 
    X_test: np.ndarray[Any, np.dtype[np.float64]],  # shape = (n_samples, x_dim)
    cover: Optional[np.ndarray[Any, np.dtype[np.float64]]] = None,  # shape = (y_dim, )
) -> np.ndarray[Any, np.dtype[np.float64]]: ...   # shape = (n_samples, y_dim)

Notebook Examples


Notes

Self-adaptive mutation

Currently three self-adaptive mutation methods are implemented; their use is defined within the various implementations of conditions, actions, and predictions. The smallest allowable mutation rate is MU_EPSILON = 0.0005.

  • Uniform adaptation: selects rates from a uniform random distribution. Initially the rate is drawn at random ~U[MU_EPSILON,1]. Offspring inherit the parent's rate, but with 10% probability the rate is randomly redrawn.
  • Log normal adaptation: selects rates using a log normal method (similar to evolution strategies). Initially the rate is selected at random from a uniform distribution ~U[MU_EPSILON,1]. Offspring inherit the parent's rate, before applying log normal adaptation: $\mu \leftarrow \mu e^{\mathcal{N}(0,1)}$ (a sketch follows this list).
  • Rate selection adaptation: selects rates from the following set of 10 values: {0.0005, 0.001, 0.002, 0.003, 0.005, 0.01, 0.015, 0.02, 0.05, 0.1}. Initially the rate is selected at random. Offspring inherit the parent's rate, but with 10% probability the rate is randomly reselected.
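
For illustration, a minimal sketch of the log normal scheme (assuming, as with initialisation, that rates are clamped to [MU_EPSILON, 1]; not necessarily the library's exact implementation):

import numpy as np

MU_EPSILON = 0.0005  # smallest allowable mutation rate

def log_normal_adapt(mu: float) -> float:
    """Self-adapt a mutation rate: mu <- mu * exp(N(0,1)), clamped to [MU_EPSILON, 1]."""
    mu *= np.exp(np.random.normal())
    return float(np.clip(mu, MU_EPSILON, 1.0))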

Related Literature:
