# Example: Conditional Parameter Grids

This example shows how to use `PyExperimenter` with a conditional parameter grid. Instead of generating the entire Cartesian product of the parameters defined in the configuration file, we programmatically define the parameter combinations of a support vector machine (SVM).

To execute this notebook you need to install:
```
pip install py_experimenter
pip install scikit-learn
```

## Experiment Configuration File
This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation about the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html). Here, we only define keyfields and resultfields in the experiment configuration file and do not set any parameter values there, as the parameter grid is created programmatically.

```python
import os

content = """
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: mysql
    database: py_experimenter
    table:
      name: example_conditional_grid
      keyfields:
        dataset:
          type: VARCHAR(50)
        cross_validation_splits:
          type: int
        seed:
          type: int
        kernel:
          type: VARCHAR(50)
        gamma:
          type: VARCHAR(50)
        degree:
          type: VARCHAR(50)
        coef0:
          type: VARCHAR(50)
      result_timestamps: false
      resultfields:
        train_f1: DECIMAL
        train_accuracy: DECIMAL
        test_f1: DECIMAL
        test_accuracy: DECIMAL

  CUSTOM:
    path: sample_data
"""

# Create config directory if it does not exist
os.makedirs('config', exist_ok=True)

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_conditional_grid.yml')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)
```

## Defining the execution function

Next, the execution of a single experiment has to be defined. Note that this is a toy example containing only a small amount of meaningful code; it is meant to show the core functionality of `PyExperimenter`.

The function is called with the parameters, i.e. the `keyfields`, of a database entry. Its results are passed to the `result_processor`, which writes them into the database as `resultfields`.
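To try an execution function without a database, one option is to pass a stand-in object that only mimics `process_results`. The sketch below assumes that `process_results` is the only `ResultProcessor` method the function calls (which holds for `run_svm` below); `RecordingResultProcessor` and `run_dummy` are hypothetical names used only for illustration, not part of `PyExperimenter`.

```python
from typing import Any, Dict


class RecordingResultProcessor:
    """Hypothetical stand-in for ResultProcessor that keeps results in memory."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def process_results(self, results: Dict[str, Any]) -> None:
        # Merge results from repeated calls, since run_svm reports in two calls
        self.results.update(results)


def run_dummy(parameters: dict, result_processor, custom_config: dict) -> None:
    """Hypothetical execution function with the same signature as run_svm."""
    # Keyfields arrive as a dict keyed by column name
    seed = int(parameters['seed'])
    result_processor.process_results({'train_accuracy': 1.0})
    result_processor.process_results({'test_accuracy': seed / 10})


processor = RecordingResultProcessor()
run_dummy({'seed': '3', 'kernel': 'linear'}, processor, custom_config={})
print(processor.results)  # {'train_accuracy': 1.0, 'test_accuracy': 0.3}
```

The same stub could be handed to `run_svm` for a quick local check before any table is filled.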

```python
import random
from random import randint
from time import sleep

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.result_processor import ResultProcessor


def run_svm(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    # Random delay to simulate a longer-running experiment
    sleep(randint(0, 5))
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()

    X = data.data
    y = data.target

    # Create Support Vector Machine with parameters dependent on the kernel.
    # The conditional parameters are stored as VARCHAR, so they are converted to numbers here.
    kernel = parameters['kernel']
    if kernel == 'linear':
        svc = SVC(kernel=kernel)
    elif kernel == 'poly':
        svc = SVC(kernel=kernel, gamma=float(parameters['gamma']), coef0=float(parameters['coef0']),
                  degree=int(float(parameters['degree'])))
    elif kernel == 'rbf':
        svc = SVC(kernel=kernel, gamma=float(parameters['gamma']))
    else:
        raise ValueError(f"Unknown kernel: {kernel}")

    model = make_pipeline(StandardScaler(), svc)

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(
        model, X, y,
        cv=int(parameters['cross_validation_splits']),
        scoring=('accuracy', 'f1_micro'),
        return_train_score=True,
    )

    # Results can be written in several calls; each call fills the given resultfields
    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy']),
    })
    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy']),
    })
```
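The `if`/`elif` chain in `run_svm` can also be written as a lookup table that lists, per kernel, which hyperparameters are active and how to convert their VARCHAR values. This is a hedged alternative sketch, not part of `PyExperimenter`; `KERNEL_HYPERPARAMETERS` and `svc_kwargs` are hypothetical names. It only builds the keyword arguments, so it runs without scikit-learn; the result would then be passed on as `SVC(**kwargs)`.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping: active hyperparameters per kernel and how to parse them
KERNEL_HYPERPARAMETERS: Dict[str, Dict[str, Callable[[str], object]]] = {
    'linear': {},
    'rbf': {'gamma': float},
    'poly': {'gamma': float, 'degree': lambda value: int(float(value)), 'coef0': float},
}


def svc_kwargs(parameters: Dict[str, Optional[str]]) -> dict:
    """Build SVC keyword arguments from one row of keyfields."""
    kernel = parameters['kernel']
    if kernel not in KERNEL_HYPERPARAMETERS:
        raise ValueError(f"Unknown kernel: {kernel}")
    kwargs = {'kernel': kernel}
    for name, parse in KERNEL_HYPERPARAMETERS[kernel].items():
        # Inactive hyperparameters are stored as None and never read here
        kwargs[name] = parse(parameters[name])
    return kwargs


print(svc_kwargs({'kernel': 'poly', 'gamma': '0.3', 'degree': '4', 'coef0': '0.1'}))
# {'kernel': 'poly', 'gamma': 0.3, 'degree': 4, 'coef0': 0.1}
print(svc_kwargs({'kernel': 'rbf', 'gamma': '0.1', 'degree': None, 'coef0': None}))
# {'kernel': 'rbf', 'gamma': 0.1}
```

With this layout, supporting another kernel only needs a new entry in the mapping plus matching entries in the fixed parameter combinations.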

## Executing PyExperimenter

Executing `PyExperimenter` consists of three steps: initializing it, filling the table, and executing the experiments.

### Initialize PyExperimenter
`PyExperimenter` is initialized with the previously created configuration file. Additionally, it is given a `name`, i.e. a job ID, which is especially useful when multiple experiments are executed in parallel on an HPC cluster.
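On a cluster, one way to give each parallel job a distinct `name` is to derive it from the scheduler's job ID. The sketch assumes a Slurm cluster, where the `SLURM_JOB_ID` environment variable is set inside a job; `experimenter_name` is a hypothetical helper, not part of `PyExperimenter`.

```python
import os


def experimenter_name(prefix: str) -> str:
    """Hypothetical helper: derive a unique experimenter name from the Slurm job ID."""
    # SLURM_JOB_ID is set by Slurm inside a job; fall back to "local" elsewhere
    job_id = os.environ.get('SLURM_JOB_ID', 'local')
    return f"{prefix}_{job_id}"


print(experimenter_name("SVM_experimenter"))
```

The result would then be passed as `PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name=experimenter_name("SVM_experimenter"))`.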

```python
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name="SVM_experimenter_01")
```

2024-02-19 17:11:53,187  | py-experimenter - INFO     | Found 7 keyfields
2024-02-19 17:11:53,249  | py-experimenter - INFO     | Initialized and connected to database


### Fill Table

The table is filled programmatically using the `fill_table_from_combination()` method. We first generate the fixed parameter combinations for each kernel of the SVM in the first three lines.
* For the `rbf` kernel, we need to set values for the `gamma` parameter. The `degree` and `coef0` parameters are not used by this kernel, so we set them to `None`.
* For the `poly` kernel, we need to set the `gamma`, `degree`, and `coef0` parameters.
* For the `linear` kernel, we do not need to set any of these parameters, so all of them are set to `None`.

Afterwards, we combine these with the `seed`, `dataset`, and `cross_validation_splits` parameters, which are present in every experiment run. These are therefore set unconditionally: each of their combinations is paired with every kernel-specific combination.
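The 55 rows reported in the log can be checked with plain Python. This sketch assumes that `fill_table_from_combination()` pairs every element of the Cartesian product of `parameters` with every entry of `fixed_parameter_combinations`; it is our simplified model, not the library's implementation, and `expand_rows` is a hypothetical name. For comparison, it also counts the full Cartesian product over all four SVM parameters.

```python
from itertools import product

# Fixed, kernel-dependent combinations (same values as in the notebook)
combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None} for gamma in ['0.1', '0.3']]
combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0}
                 for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

# Unconditional parameters, present in every experiment
parameters = {'seed': ['1', '2', '3', '4', '5'], 'dataset': ['iris'], 'cross_validation_splits': ['5']}


def expand_rows(parameters, fixed_parameter_combinations):
    """Hypothetical model: Cartesian product of `parameters` times each fixed combination."""
    names = list(parameters)
    rows = []
    for values in product(*parameters.values()):
        for fixed in fixed_parameter_combinations:
            rows.append({**dict(zip(names, values)), **fixed})
    return rows


rows = expand_rows(parameters, combinations)
print(len(rows))  # 55 = 5 seeds x 11 kernel configurations

# Full Cartesian product: every kernel with every gamma, degree and coef0 value
full_grid = list(product(['rbf', 'poly', 'linear'], ['0.1', '0.3'], ['3', '4'], ['0.0', '0.1'], parameters['seed']))
print(len(full_grid))  # 120 = 3 x 2 x 2 x 2 x 5
```

In the full product, the `linear` kernel would appear with 8 settings per seed that all train the same model, and each `rbf` gamma value with 4 identical settings, since these kernels ignore the remaining parameters.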

 

Note that the table can easily be obtained as a `pandas.DataFrame` via `experimenter.get_table()`.

```python
# Create parameter configurations for each kernel
combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None} for gamma in ['0.1', '0.3']]
combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0} for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

# Fill experimenter
experimenter.fill_table_from_combination(
    parameters={
        'seed': ['1', '2', '3', '4', '5'],
        'dataset': ['iris'],
        'cross_validation_splits': ['5'],
    },
    fixed_parameter_combinations=combinations,
)

# Show database table
experimenter.get_table()
```

2024-02-19 17:11:53,387  | py-experimenter - INFO     | 55 rows successfully added to database. 0 rows were skipped.


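| | ID | dataset | cross_validation_splits | seed | kernel | gamma | degree | coef0 | creation_date | status | start_date | name | machine | train_f1 | train_accuracy | test_f1 | test_accuracy | end_date | error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | iris | 5 | 1 | rbf | 0.1 | | | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 1 | 2 | iris | 5 | 1 | rbf | 0.3 | | | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 2 | 3 | iris | 5 | 1 | poly | 0.1 | 3.0 | 0.0 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 3 | 4 | iris | 5 | 1 | poly | 0.1 | 3.0 | 0.1 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 4 | 5 | iris | 5 | 1 | poly | 0.1 | 4.0 | 0.0 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 5 | 6 | iris | 5 | 1 | poly | 0.1 | 4.0 | 0.1 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 6 | 7 | iris | 5 | 1 | poly | 0.3 | 3.0 | 0.0 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 7 | 8 | iris | 5 | 1 | poly | 0.3 | 3.0 | 0.1 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 8 | 9 | iris | 5 | 1 | poly | 0.3 | 4.0 | 0.0 | 2024-02-19 17:11:53 | created | | | | | | | | | |
| 9 | 10 | iris | 5 | 1 | poly | 0.3 | 4.0 | 0.1 | 2024-02-19 17:11:53 | created | | | | | | | | | |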


### Execute PyExperimenter
All experiments are executed one after the other by the same `PyExperimenter` because of `max_experiments=-1`. If only a single experiment or a predefined number of experiments should be executed, replace `-1` with the corresponding number.
The first parameter, `run_svm`, is the function that is executed with the keyfields of each table row.
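When the table is processed by several jobs, one way to give each job a bounded share is to compute `max_experiments` from the number of rows. This is a sketch under the assumption that each job picks up open rows independently from the shared table; `experiments_per_job` is a hypothetical helper, not part of `PyExperimenter`.

```python
import math


def experiments_per_job(total_experiments: int, n_jobs: int) -> int:
    """Hypothetical helper: number of experiments each of n_jobs workers should run."""
    # Round up so that the jobs together cover every row
    return math.ceil(total_experiments / n_jobs)


print(experiments_per_job(55, 4))  # 14
```

Each of the four jobs would then call `experimenter.execute(run_svm, max_experiments=experiments_per_job(55, 4))`; rounding up means the last job may find fewer open rows than its share.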

```python
experimenter.execute(run_svm, max_experiments=-1)

# Show database table
experimenter.get_table()
```

[codecarbon INFO @ 17:11:53] [setup] RAM Tracking...
[codecarbon INFO @ 17:11:53] [setup] GPU Tracking...
[codecarbon INFO @ 17:11:53] No GPU found.
[codecarbon INFO @ 17:11:53] [setup] CPU Tracking...


[codecarbon INFO @ 17:11:55] CPU Model on constant consumption mode: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 17:11:55] >>> Tracker's metadata:
[codecarbon INFO @ 17:11:55]   Platform system: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
[codecarbon INFO @ 17:11:55]   Python version: 3.9.0
[codecarbon INFO @ 17:11:55]   CodeCarbon version: 2.3.4
[codecarbon INFO @ 17:11:55]   Available RAM : 15.475 GB
[codecarbon INFO @ 17:11:55]   CPU count: 16
[codecarbon INFO @ 17:11:55]   CPU model: 12th Gen Intel(R) Core(TM) i7-1260P
[codecarbon INFO @ 17:11:55]   GPU count: None
[codecarbon INFO @ 17:11:55]   GPU model: None
[codecarbon INFO @ 17:12:02] Energy consumed for RAM : 0.000007 kWh. RAM Power : 5.803094387054443 W
[codecarbon INFO @ 17:12:02] Energy consumed for all CPUs : 0.000048 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 17:12:02] 0.000055 kWh of electricity used since the beginning.
[codecarbon INFO @ 17:12:02] [setup] RAM Tracking...

| | ID | dataset | cross_validation_splits | seed | kernel | gamma | degree | coef0 | creation_date | status | start_date | name | machine | train_f1 | train_accuracy | test_f1 | test_accuracy | end_date | error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | iris | 5 | 1 | rbf | 0.1 | | | 2024-02-19 17:11:53 | done | 2024-02-19 17:11:53 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:02 | |
| 1 | 2 | iris | 5 | 1 | rbf | 0.3 | | | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:02 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:08 | |
| 2 | 3 | iris | 5 | 1 | poly | 0.1 | 3.0 | 0.0 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:08 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:14 | |
| 3 | 4 | iris | 5 | 1 | poly | 0.1 | 3.0 | 0.1 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:14 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:20 | |
| 4 | 5 | iris | 5 | 1 | poly | 0.1 | 4.0 | 0.0 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:20 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:25 | |
| 5 | 6 | iris | 5 | 1 | poly | 0.1 | 4.0 | 0.1 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:26 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:31 | |
| 6 | 7 | iris | 5 | 1 | poly | 0.3 | 3.0 | 0.0 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:31 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:37 | |
| 7 | 8 | iris | 5 | 1 | poly | 0.3 | 3.0 | 0.1 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:37 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:43 | |
| 8 | 9 | iris | 5 | 1 | poly | 0.3 | 4.0 | 0.0 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:43 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:48 | |
| 9 | 10 | iris | 5 | 1 | poly | 0.3 | 4.0 | 0.1 | 2024-02-19 17:11:53 | done | 2024-02-19 17:12:48 | SVM_experimenter_01 | Worklaptop | 1.0 | 1.0 | 1.0 | 1.0 | 2024-02-19 17:12:54 | |
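Because `get_table()` returns a pandas DataFrame, finished results can be aggregated directly, for example per kernel. This sketch assumes the column names shown in the table above; `summarize_by_kernel` is a hypothetical helper, and the numeric conversion is a precaution in case the DECIMAL result columns arrive with a non-float dtype.

```python
import pandas as pd


def summarize_by_kernel(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean test metrics per kernel over all finished experiments."""
    done = table[table['status'] == 'done'].copy()
    # Convert result columns explicitly, since DECIMAL values may not arrive as floats
    for column in ['test_accuracy', 'test_f1']:
        done[column] = pd.to_numeric(done[column])
    return done.groupby('kernel')[['test_accuracy', 'test_f1']].mean()
```

It would be called as `summarize_by_kernel(experimenter.get_table())`; rows that are still `created` or ended with an error are left out of the averages.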


### CodeCarbon
Note that `CodeCarbon` is activated by default and collects information about the carbon emissions of each experiment. Have a look at our [general usage example](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) and the corresponding [documentation of CodeCarbon fields](https://tornede.github.io/py_experimenter/usage.html#codecarbon-fields) for more information.