# Usage of Logtables

This example shows how to use logtables: how to define them in the configuration file and how to fill them during an experiment. It assumes that you already understand the basic functionality of `PyExperimenter`. Note that this notebook contains only a small amount of meaningful code.

To execute this notebook you need to install:
```
pip install py_experimenter
pip install scikit-learn
```

# Experiment Configuration File
This notebook shows an example execution of `PyExperimenter` based on the configuration file that is also used in the [general usage](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) notebook. However, the file is slightly adapted to show the usage of logtables. The goal of this small example is to find the best kernel for a dataset and to log the performance of the different kernels. To that end, we show two ways of defining logtables: standard notation

`logtables = train_scores:log_train_scores... `  
`log_train_scores = f1:DOUBLE, accuracy:DOUBLE, kernel:str`

and shorthand notation, in which each entry defines a logtable with a single column of the same name:

`logtables = ..., test_f1:DOUBLE, test_accuracy:DOUBLE`

In addition to the main table `logtable_example`, this configuration specifies three logtables:

`logtable_example__train_scores`,  
`logtable_example__test_f1`,  
`logtable_example__test_accuracy`
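
The naming scheme above can be sketched in a few lines of plain Python. The helper `expand_logtables` below is hypothetical and is not part of `PyExperimenter`; it only mimics, under the assumptions visible in this example, how the two notations map to table names and columns. Standard-notation entries refer to a separate key that lists the columns, while shorthand entries yield a logtable with a single column of the same name.

```python
def expand_logtables(table, logtables, definitions):
    """Hypothetical helper: map a logtables entry to {table_name: {column: type}}.

    table       -- name of the main table, e.g. 'logtable_example'
    logtables   -- value of the 'logtables' key from the config
    definitions -- other config keys that a standard-notation entry may refer to
    """
    result = {}
    for entry in logtables.split(','):
        name, value = (part.strip() for part in entry.split(':', 1))
        if value in definitions:
            # Standard notation: the value names another key that lists the columns.
            columns = dict(
                tuple(part.strip() for part in column.split(':', 1))
                for column in definitions[value].split(',')
            )
        else:
            # Shorthand notation: one column that shares the logtable's name.
            columns = {name: value}
        # Assumption taken from the table names listed above: '<table>__<logtable>'.
        result[f'{table}__{name}'] = columns
    return result


tables = expand_logtables(
    table='logtable_example',
    logtables='train_scores:log_train_scores, test_f1:DOUBLE, test_accuracy:DOUBLE',
    definitions={'log_train_scores': 'f1:DOUBLE, accuracy:DOUBLE, kernel:str'},
)
for table_name, columns in tables.items():
    print(table_name, columns)
```

The sketch splits on commas, so it would not handle column types that themselves contain commas; it is meant only to illustrate the mapping.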

In [None]:
import os

content = """
[PY_EXPERIMENTER]
provider = sqlite 
database = automl_conf_2023
table = logtable_example

keyfields = dataset, cross_validation_splits:int, seed:int
dataset = iris
cross_validation_splits = 5
seed = 1,2,3,4,5
logtables = train_scores:log_train_scores, test_f1:DOUBLE, test_accuracy:DOUBLE 
log_train_scores = f1:DOUBLE, accuracy:DOUBLE, kernel:str

resultfields = best_kernel_f1:VARCHAR(50), best_kernel_accuracy:VARCHAR(50)
resultfields.timestamps = false

[CUSTOM] 
path = sample_data
"""

# Create config directory if it does not exist
os.makedirs('config', exist_ok=True)

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_general_usage.cfg')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)

# Defining the execution function
Next, the execution of a single experiment has to be defined. Note that this dummy example is a slightly modified version of the one in the [general usage](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) notebook. Instead of running a single kernel, we iterate over several kernels to find the best one. Additionally, the results of each kernel are logged.

In [None]:
import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']

    # Initialize variables
    performance_f1 = 0
    best_kernel_f1 = ''
    performance_accuracy = 0
    best_kernel_accuracy = ''
    for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
        # Set seed for reproducibility
        random.seed(seed)
        np.random.seed(seed)

        data = load_iris()

        X = data.data
        y = data.target

        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma='auto'))
        scores = cross_validate(model, X, y,
                                cv=parameters['cross_validation_splits'],
                                scoring=('accuracy', 'f1_micro'),
                                return_train_score=True
                                )

        # Log scores to logtables
        result_processor.process_logs(
            {
                'train_scores': {
                    'f1': np.mean(scores['train_f1_micro']),
                    'accuracy': np.mean(scores['train_accuracy']),
                    'kernel': "'" + kernel + "'"
                },
                'test_f1': {
                    'test_f1': np.mean(scores['test_f1_micro'])},
                'test_accuracy': {
                    'test_accuracy': np.mean(scores['test_accuracy'])},
            }
        )

        if np.mean(scores['test_f1_micro']) > performance_f1:
            performance_f1 = np.mean(scores['test_f1_micro'])
            best_kernel_f1 = kernel
        if np.mean(scores['test_accuracy']) > performance_accuracy:
            performance_accuracy = np.mean(scores['test_accuracy'])
            best_kernel_accuracy = kernel

    result_processor.process_results({
        'best_kernel_f1': best_kernel_f1,
        'best_kernel_accuracy': best_kernel_accuracy
    })
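
The two pieces of logic in `run_ml` that are specific to this example, the nested dictionary passed to `process_logs` and the selection of the best kernel, can be looked at in isolation. The following is a minimal sketch without `PyExperimenter` or scikit-learn; the helpers `build_log_entry` and `best_kernel` are hypothetical names introduced here, and the numbers are placeholders rather than measured results.

```python
def build_log_entry(kernel, train_f1, train_accuracy, test_f1, test_accuracy):
    """Hypothetical helper: build the nested dict that run_ml passes to process_logs."""
    return {
        # Logtable name -> {column name: value}
        'train_scores': {
            'f1': train_f1,
            'accuracy': train_accuracy,
            'kernel': "'" + kernel + "'",  # quoted by hand, as in run_ml
        },
        'test_f1': {'test_f1': test_f1},
        'test_accuracy': {'test_accuracy': test_accuracy},
    }


def best_kernel(scores_per_kernel):
    """Return the kernel with the highest score; ties keep the earlier kernel."""
    best, best_score = '', 0
    for kernel, score in scores_per_kernel.items():
        if score > best_score:  # strict '>' as in run_ml
            best, best_score = kernel, score
    return best


# Placeholder numbers for illustration only, not measured results.
placeholder_test_f1 = {'linear': 0.9, 'poly': 0.8, 'rbf': 0.9, 'sigmoid': 0.7}

entry = build_log_entry('linear', 0.95, 0.95, placeholder_test_f1['linear'], 0.9)
print(sorted(entry))
print(best_kernel(placeholder_test_f1))
```

Because the comparison is strict, a kernel that merely ties with the current best does not replace it, so the order of the kernel list decides ties.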

# Executing PyExperimenter
Now we create a `PyExperimenter` object that uses the configuration file written above. We also fill the database table with the keyfield values from that configuration file.

In [None]:
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
experimenter.fill_table_from_config()

experimenter.get_table()
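
To get a feeling for how many rows to expect in the table, the following sketch enumerates the keyfield combinations from the configuration above. It assumes that `fill_table_from_config` creates one experiment per combination of keyfield values; the code itself uses only the standard library and does not call `PyExperimenter`.

```python
from itertools import product

# Keyfield values as given in the configuration file above.
keyfields = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    'seed': [1, 2, 3, 4, 5],
}

# Assumption: one experiment per combination of keyfield values.
experiments = [dict(zip(keyfields, values)) for values in product(*keyfields.values())]

print(len(experiments))
print(experiments[0])
```

With one dataset, one value for the number of splits, and five seeds, this yields five combinations.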

In [None]:
# Read one of the logtables
experimenter.get_logtable('train_scores')

## Run Experiments

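We then use the experimenter to execute the `run_ml` function. The second argument, `-1`, places no limit on the number of experiments, so all open experiments are executed.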

In [None]:
experimenter.execute(run_ml, -1)

## Check Results
Finally, we display the contents of the main table and of all three logtables.

In [None]:
experimenter.get_table()

In [None]:
experimenter.get_logtable('train_scores')

In [None]:
experimenter.get_logtable('test_f1')

In [None]:
experimenter.get_logtable('test_accuracy')
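
Since `run_ml` logs once per kernel, each experiment is expected to contribute four rows to every logtable. The sketch below illustrates how log entries could be related back to their experiment. It runs against an in-memory SQLite database with placeholder rows and simplified schemas. The `experiment_id` column linking a logtable row to the `ID` of the main table is an assumption about the table layout, so check the columns returned by `get_logtable` before relying on it.

```python
import sqlite3

# In-memory stand-in for the experiment database; the real database file is not touched.
connection = sqlite3.connect(':memory:')
connection.executescript("""
    -- Assumed, simplified schemas: only the columns needed for the join.
    CREATE TABLE logtable_example (ID INTEGER PRIMARY KEY, seed INTEGER);
    CREATE TABLE logtable_example__train_scores (
        ID INTEGER PRIMARY KEY,
        experiment_id INTEGER,  -- assumed reference to logtable_example.ID
        f1 DOUBLE, accuracy DOUBLE, kernel TEXT
    );
    -- Placeholder rows for illustration only, not measured results.
    INSERT INTO logtable_example (ID, seed) VALUES (1, 1), (2, 2);
    INSERT INTO logtable_example__train_scores (experiment_id, f1, accuracy, kernel)
    VALUES (1, 0.9, 0.9, 'linear'), (1, 0.8, 0.8, 'poly'), (2, 0.7, 0.7, 'linear');
""")

# Count the log entries written by each experiment, together with its seed.
rows = connection.execute("""
    SELECT e.ID, e.seed, COUNT(l.ID)
    FROM logtable_example AS e
    JOIN logtable_example__train_scores AS l ON l.experiment_id = e.ID
    GROUP BY e.ID, e.seed
    ORDER BY e.ID
""").fetchall()
print(rows)
connection.close()
```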