# Conditional Parameter Grids

This example shows the usage of `PyExperimenter` with a conditional parameter grid. We will programmatically define the parameter combinations of a support vector machine, instead of generating the entire cartesian product from the parameters defined in the config file.  

To execute this notebook you need to install:
```
pip install py_experimenter
pip install scikit-learn
```

## Experiment Configuration File
This notebook is based on the execution of the `PyExperimenter` based on a configuration file. Different aspects of this file are explained in the `README` file in the [repository](https://github.com/tornede/py_experimenter). Here, we do not set the parameter values in the config file, as we will create the parameter grid programmatically.


In [1]:
import os

content = """
[PY_EXPERIMENTER]
provider = sqlite 
database = py_experimenter
table = svm_experiment_example

number_parallel_experiments = 5 

keyfields = dataset, cross_validation_splits:int, seed:int, kernel, gamma:DECIMAL, degree:int, coef0:DECIMAL

resultfields = train_f1:DECIMAL, train_accuracy:DECIMAL, test_f1:DECIMAL, test_accuracy:DECIMAL
resultfields.timestamps = false

[CUSTOM] 
path = sample_data
"""

with open(os.path.join('config', 'configuration_cond.cfg'), "w") as f: 
  f.write(content)

## Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this is a dummy example, which contains limited reasonable code. It is meant to show the core functionality of the PyExperimenter. 

The method is called with the parameters, i.e. `keyfields`, of a database entry. The results are meant to be processed to be written into the database, i.e. as `resultfields`. 

In [2]:
import os
import random

import numpy as np
import pandas
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor

from time import sleep
from random import randint

def run_svm(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    sleep(randint(0,5))
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()

    X = data.data
    y = data.target

    # Create Support Vector Machine with parameters dependent on the kernel
    kernel = parameters['kernel']
    if kernel == 'linear':
        svc = SVC(kernel=parameters['kernel'])
    elif kernel == 'poly':
        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'], coef0=parameters['coef0'], degree=parameters['degree'])
    elif kernel == 'rbf':
        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'])

    svc = SVC()

    model = make_pipeline(StandardScaler(), svc)  

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y, cv=parameters['cross_validation_splits'],
        scoring=('accuracy', 'f1_micro'),
        return_train_score=True)
    
    resultfields = {'train_f1': np.mean(scores['train_f1_micro']),
                'train_accuracy': np.mean(scores['train_accuracy'])}
    result_processor.process_results(resultfields)

    resultfields = {'test_f1': np.mean(scores['test_f1_micro']),
                'test_accuracy': np.mean(scores['test_accuracy'])}
    result_processor.process_results(resultfields)

## Executing PyExperimenter

The actual execution of the PyExperimenter is done in multiple steps. 

### Initialize PyExperimenter
The PyExperimenter is initialized with the previously created configuration file. Additionally, the `PyExperimenter` is given a `name`, i.e. job id, which is especially useful for parallel executions of multiple experiments on HPC. 

In [3]:
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=os.path.join('config', 'configuration_cond.cfg'), name="SVM_experimenter_01")

### Fill Table

The table is filled programmatically using the  `fill_table_from_combination()` method. We first generate the fixed parameter combinations for each kernel of the SVM in the first three lines.
* For the `rbf` kernel, we need to set values for the `gamma` parameter. The degree and `coef0` parameter are not present in this kernel, so we set these to `'nan'`.
* For the `poly` kernel, we need to set the `gamma`, the degree as well as the `coef0` parameter.
* For the `linear` kernel, we do not need to set any parameters, so all of them are set to `'nan'`.

Afterwards, we combine these with the seed, the dataset and the cross_validation_splits parameters, which are present for all experiment runs. Thus, these are not set unconditionally.

 

Note that the table can easily be obtained as `pandas.Dataframe` via `experimenter.get_table()`.

In [4]:
# Create parameter configurations for each kernel
combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree':'nan', 'coef0':'nan'} for gamma in ['0.1','0.3']]
combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0} for gamma in ['0.1','0.3'] for degree in ['3','4'] for coef0 in ['0.0', '0.1']]
combinations += [{'kernel': 'linear','gamma': 'nan', 'degree':'nan', 'coef0':'nan'}]

# Fill experimenter
experimenter.fill_table_from_combination(parameters={'seed': ['1', '2', '3', '4', '5'], 
'dataset': ['iris'],
'cross_validation_splits': ['5'] },
fixed_parameter_combinations=combinations)

# showing database table
experimenter.get_table()

Unnamed: 0,ID,dataset,cross_validation_splits,seed,kernel,gamma,degree,coef0,creation_date,status,start_date,name,machine,train_f1,train_accuracy,test_f1,test_accuracy,end_date,error
0,1,iris,5,1,rbf,0.1,,,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:41",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:45",
1,2,iris,5,1,rbf,0.3,,,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:41",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:42",
2,3,iris,5,1,poly,0.1,3,0,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:38",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:39",
3,4,iris,5,1,poly,0.1,3,0.1,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:38",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:38",
4,5,iris,5,1,poly,0.1,4,0,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:46",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:47",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70,71,iris,5,4,poly,0.3,4,0,"11/12/2022, 11:08:10",created,,,,,,,,,
71,72,iris,5,5,poly,0.1,3,0,"11/12/2022, 11:08:10",created,,,,,,,,,
72,73,iris,5,5,poly,0.1,4,0,"11/12/2022, 11:08:10",created,,,,,,,,,
73,74,iris,5,5,poly,0.3,3,0,"11/12/2022, 11:08:10",created,,,,,,,,,


### Execute PyExperimenter
All experiments are executed one after the other by the same `PyExperimenter` due to `max_number_experiments_to_execute=-1`. If just a single one or a predifined number of experiments should be executed, the `-1` has to be replaced by the according amount. The `random_order` is especially important in case of parallel execution of multiple `PyExperimenter`, e.g. when doing it on a HPC, to avoid collusions of accessing the same row of the table. 

The first parameter, i.e. `run_svm`, relates to the actual method that should be executed with the given keyfields of the table. 

In [5]:
experimenter.execute(run_svm, max_number_experiments_to_execute=-1, random_order=True)

# showing database table
experimenter.get_table() 

Unnamed: 0,ID,dataset,cross_validation_splits,seed,kernel,gamma,degree,coef0,creation_date,status,start_date,name,machine,train_f1,train_accuracy,test_f1,test_accuracy,end_date,error
0,1,iris,5,1,rbf,0.1,,,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:41",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:45",
1,2,iris,5,1,rbf,0.3,,,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:41",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:42",
2,3,iris,5,1,poly,0.1,3,0,"11/12/2022, 11:03:35",done,"11/12/2022, 11:08:14",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:08:15",
3,4,iris,5,1,poly,0.1,3,0.1,"11/12/2022, 11:03:35",done,"11/12/2022, 11:03:38",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:03:38",
4,5,iris,5,1,poly,0.1,4,0,"11/12/2022, 11:03:35",done,"11/12/2022, 11:08:12",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:08:12",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70,71,iris,5,4,poly,0.3,4,0,"11/12/2022, 11:08:10",done,"11/12/2022, 11:08:14",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:08:15",
71,72,iris,5,5,poly,0.1,3,0,"11/12/2022, 11:08:10",done,"11/12/2022, 11:08:14",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:08:15",
72,73,iris,5,5,poly,0.1,4,0,"11/12/2022, 11:08:10",done,"11/12/2022, 11:08:10",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:08:15",
73,74,iris,5,5,poly,0.3,3,0,"11/12/2022, 11:08:10",done,"11/12/2022, 11:08:14",SVM_experimenter_01,vm-tornede4,0.975,0.975,0.966667,0.966667,"11/12/2022, 11:08:15",
