# Example: Usage of Logtables

This example shows how to define and fill logtables with `PyExperimenter`. It assumes that you already understand the basic functionality of `PyExperimenter`. Note that the purpose of this notebook is to demonstrate logtables, not to provide meaningful experiments.

To execute this notebook you need to install:
```
pip install py_experimenter
pip install scikit-learn
```

## Experiment Configuration File
This notebook shows an example execution of `PyExperimenter` based on the configuration file used in the [general usage](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) notebook, slightly adapted to show the usage of logtables. The goal of this small example is to find the best kernel for an SVM on a dataset using grid search, and to log the performance of SVMs initialized with different kernels. Further explanation of logtables can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html#logtables).

In [1]:
import os

content = """
[PY_EXPERIMENTER]
provider = sqlite 
database = py_experimenter
table = example_logtables

keyfields = dataset, cross_validation_splits:int, seed:int
dataset = iris
cross_validation_splits = 5
seed = 1,2,3,4,5

resultfields = best_kernel_f1:VARCHAR(50), best_kernel_accuracy:VARCHAR(50)
resultfields.timestamps = false

logtables = train_scores:log_train_scores, test_f1:DOUBLE, test_accuracy:DOUBLE 
log_train_scores = f1:DOUBLE, accuracy:DOUBLE, kernel:str

[CUSTOM] 
path = sample_data
"""

# Create config directory if it does not exist
os.makedirs('config', exist_ok=True)

# Create config file
experiment_configuration_file_path = os.path.join('config', 'example_logtables.cfg')
with open(experiment_configuration_file_path, "w") as f:
    f.write(content)
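
To make the `logtables` syntax easier to read, the following sketch parses the two logtable lines of the configuration with Python's standard `configparser` and lists the columns of each logtable. It is only an illustration and not how `PyExperimenter` reads the file internally; the helper `parse_logtables` is hypothetical. It assumes, consistent with the outputs later in this notebook, that an entry such as `test_f1:DOUBLE` yields a logtable with a single column of the same name, while an entry such as `train_scores:log_train_scores` refers to a separate line that lists the columns.

```python
import configparser

# Only the logtable-related lines of the configuration above
config_text = """
[PY_EXPERIMENTER]
logtables = train_scores:log_train_scores, test_f1:DOUBLE, test_accuracy:DOUBLE
log_train_scores = f1:DOUBLE, accuracy:DOUBLE, kernel:str
"""


def parse_logtables(text):
    """Hypothetical helper: map each logtable name to its {column: type} dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["PY_EXPERIMENTER"]

    tables = {}
    for entry in section["logtables"].split(","):
        name, spec = (part.strip() for part in entry.split(":", 1))
        if spec in section:
            # The spec names another option that lists the columns
            columns = dict(
                tuple(part.strip() for part in column.split(":", 1))
                for column in section[spec].split(",")
            )
        else:
            # Assumption: the spec is a type, giving one column named like the table
            columns = {name: spec}
        tables[name] = columns
    return tables


logtable_columns = parse_logtables(config_text)
for table_name, columns in logtable_columns.items():
    print(table_name, columns)
```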

## Defining the execution function
Next, the execution of a single experiment has to be defined. Note that this dummy example is a slightly modified version of the [general usage](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) notebook. Instead of running with a single kernel, we iterate over several kernels to find the best one. Additionally, the scores of every kernel are written to the logtables.

In [2]:
import random
import numpy as np

from py_experimenter.result_processor import ResultProcessor

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']

    # Initialize variables
    performance_f1 = 0
    best_kernel_f1 = ''
    performance_accuracy = 0
    best_kernel_accuracy = ''
    
    for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
        # Set seed for reproducibility
        random.seed(seed)
        np.random.seed(seed)

        data = load_iris()
        X = data.data
        y = data.target

        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, gamma='auto'))
        scores = cross_validate(model, X, y,
                                cv=parameters['cross_validation_splits'],
                                scoring=('accuracy', 'f1_micro'),
                                return_train_score=True
                                )

        # Log scores to logtables
        result_processor.process_logs(
            {
                'train_scores': {
                    'f1': np.mean(scores['train_f1_micro']),
                    'accuracy': np.mean(scores['train_accuracy']),
                    'kernel': "'" + kernel + "'"
                },
                'test_f1': {
                    'test_f1': np.mean(scores['test_f1_micro'])},
                'test_accuracy': {
                    'test_accuracy': np.mean(scores['test_accuracy'])},
            }
        )

        if np.mean(scores['test_f1_micro']) > performance_f1:
            performance_f1 = np.mean(scores['test_f1_micro'])
            best_kernel_f1 = kernel
        if np.mean(scores['test_accuracy']) > performance_accuracy:
            performance_accuracy = np.mean(scores['test_accuracy'])
            best_kernel_accuracy = kernel

    result_processor.process_results({
        'best_kernel_f1': best_kernel_f1,
        'best_kernel_accuracy': best_kernel_accuracy
    })
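
The dictionary passed to `process_logs` above is nested: the outer keys are logtable names and each inner dictionary maps column names to values. As a minimal sketch, the hypothetical helper `check_log_payload` below compares such a payload with the columns declared in the configuration before it is logged. The helper is not part of `PyExperimenter`, and it assumes that every row supplies all declared columns, as `run_ml` does.

```python
# Columns declared for each logtable in the configuration file
EXPECTED_COLUMNS = {
    "train_scores": {"f1", "accuracy", "kernel"},
    "test_f1": {"test_f1"},
    "test_accuracy": {"test_accuracy"},
}


def check_log_payload(payload, expected=EXPECTED_COLUMNS):
    """Hypothetical helper: return a list of problems; an empty list means a match."""
    problems = []
    for table, row in payload.items():
        if table not in expected:
            problems.append(f"unknown logtable: {table}")
            continue
        missing = sorted(expected[table] - set(row))
        extra = sorted(set(row) - expected[table])
        if missing:
            problems.append(f"{table}: missing columns {missing}")
        if extra:
            problems.append(f"{table}: unexpected columns {extra}")
    return problems


# Same shape as the payload built in run_ml, with placeholder values
good_payload = {
    "train_scores": {"f1": 0.97, "accuracy": 0.97, "kernel": "'linear'"},
    "test_f1": {"test_f1": 0.96},
    "test_accuracy": {"test_accuracy": 0.96},
}
# Deliberately wrong: a missing column and a misnamed column
bad_payload = {
    "train_scores": {"f1": 0.97},
    "test_f1": {"f1": 0.96},
}

print(check_log_payload(good_payload))
print(check_log_payload(bad_payload))
```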

## Executing PyExperimenter
Now we create a `PyExperimenter` object with the experiment configuration above. We also fill the database with values from that experiment configuration file.

In [3]:
from py_experimenter.experimenter import PyExperimenter

experimenter = PyExperimenter(experiment_configuration_file_path=experiment_configuration_file_path, name='example_notebook')
experimenter.fill_table_from_config()

experimenter.get_table()

|   | ID | dataset | cross_validation_splits | seed | creation_date | status | start_date | name | machine | best_kernel_f1 | best_kernel_accuracy | end_date | error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | iris | 5 | 1 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 1 | 2 | iris | 5 | 2 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 2 | 3 | iris | 5 | 3 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 3 | 4 | iris | 5 | 4 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 4 | 5 | iris | 5 | 5 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
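
The table has one row per combination of keyfield values. With one dataset, one number of cross-validation splits and five seeds, this gives the five rows shown. The following sketch reproduces that expansion with `itertools.product`. It only illustrates the resulting combinations and is not the library's implementation; the helper `expand_keyfields` is hypothetical.

```python
from itertools import product

# Keyfield values as given in the configuration file
keyfields = {
    "dataset": ["iris"],
    "cross_validation_splits": [5],
    "seed": [1, 2, 3, 4, 5],
}


def expand_keyfields(fields):
    """Hypothetical helper: one dict per combination of keyfield values."""
    names = list(fields)
    return [dict(zip(names, combination)) for combination in product(*fields.values())]


rows = expand_keyfields(keyfields)
for row in rows:
    print(row)
```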


In [4]:
# Read one of the logtables
experimenter.get_logtable('train_scores')

|   | ID | experiment_id | timestamp | f1 | accuracy | kernel |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2023-04-01 17:47:43 | 0.971667 | 0.971667 | linear |
| 1 | 2 | 1 | 2023-04-01 17:47:43 | 0.936667 | 0.936667 | poly |
| 2 | 3 | 1 | 2023-04-01 17:47:43 | 0.975 | 0.975 | rbf |
| 3 | 4 | 1 | 2023-04-01 17:47:43 | 0.896667 | 0.896667 | sigmoid |
| 4 | 5 | 2 | 2023-04-01 17:47:43 | 0.971667 | 0.971667 | linear |
| 5 | 6 | 2 | 2023-04-01 17:47:43 | 0.936667 | 0.936667 | poly |
| 6 | 7 | 2 | 2023-04-01 17:47:43 | 0.975 | 0.975 | rbf |
| 7 | 8 | 2 | 2023-04-01 17:47:43 | 0.896667 | 0.896667 | sigmoid |
| 8 | 9 | 3 | 2023-04-01 17:47:43 | 0.971667 | 0.971667 | linear |
| 9 | 10 | 3 | 2023-04-01 17:47:43 | 0.936667 | 0.936667 | poly |


## Run Experiments

All experiments are executed sequentially by the same `PyExperimenter` because of `max_experiments=-1` and the implicit `n_jobs=1`, since no number of jobs is specified in the configuration file. To execute only a single experiment or a predefined number of experiments, replace `-1` with the corresponding number.

The first parameter, `run_ml`, is the function that is executed for each experiment with the keyfield values of the corresponding table row.

In [5]:
experimenter.execute(run_ml, max_experiments=-1)
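
The meaning of `max_experiments` described above can be illustrated with a small stand-alone sketch. The function `select_experiments` is hypothetical and only mimics the described behaviour on a plain list. `PyExperimenter` itself fetches open experiments from the database, which this sketch does not attempt to reproduce.

```python
def select_experiments(pending, max_experiments=-1):
    """Hypothetical helper: -1 selects every pending experiment, n selects at most n."""
    pending = list(pending)
    if max_experiments == -1:
        return pending
    return pending[:max_experiments]


pending_seeds = [1, 2, 3, 4, 5]
print(select_experiments(pending_seeds))                      # all five
print(select_experiments(pending_seeds, max_experiments=2))   # only the first two
```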

## Check results
The content of the main table, which holds the keyfields and resultfields, and of every logtable can be retrieved easily.

In [6]:
experimenter.get_table()

|   | ID | dataset | cross_validation_splits | seed | creation_date | status | start_date | name | machine | best_kernel_f1 | best_kernel_accuracy | end_date | error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | iris | 5 | 1 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 1 | 2 | iris | 5 | 2 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 2 | 3 | iris | 5 | 3 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 3 | 4 | iris | 5 | 4 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |
| 4 | 5 | iris | 5 | 5 | 2023-04-01 17:47:43 | done | 2023-04-01 17:47:43 | example_notebook | MacBook-Pro-von-Tanja.local | linear | linear | 2023-04-01 17:47:43 |  |


In [7]:
experimenter.get_logtable('train_scores')

|   | ID | experiment_id | timestamp | f1 | accuracy | kernel |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2023-04-01 17:47:43 | 0.971667 | 0.971667 | linear |
| 1 | 2 | 1 | 2023-04-01 17:47:43 | 0.936667 | 0.936667 | poly |
| 2 | 3 | 1 | 2023-04-01 17:47:43 | 0.975 | 0.975 | rbf |
| 3 | 4 | 1 | 2023-04-01 17:47:43 | 0.896667 | 0.896667 | sigmoid |
| 4 | 5 | 2 | 2023-04-01 17:47:43 | 0.971667 | 0.971667 | linear |
| 5 | 6 | 2 | 2023-04-01 17:47:43 | 0.936667 | 0.936667 | poly |
| 6 | 7 | 2 | 2023-04-01 17:47:43 | 0.975 | 0.975 | rbf |
| 7 | 8 | 2 | 2023-04-01 17:47:43 | 0.896667 | 0.896667 | sigmoid |
| 8 | 9 | 3 | 2023-04-01 17:47:43 | 0.971667 | 0.971667 | linear |
| 9 | 10 | 3 | 2023-04-01 17:47:43 | 0.936667 | 0.936667 | poly |
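
Each logtable row refers to its experiment through the `experiment_id` column, which matches the `ID` column of the main table. The sketch below rebuilds a small part of both tables in an in-memory SQLite database and joins them to find the kernel with the highest training F1 score for seed 1. The table names `example_logtables` and `example_logtables__train_scores` are assumptions made for this illustration, and only a subset of the columns is recreated.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Assumed, simplified schema that mirrors the columns shown above
connection.executescript("""
CREATE TABLE example_logtables (ID INTEGER PRIMARY KEY, seed INTEGER);
CREATE TABLE example_logtables__train_scores (
    ID INTEGER PRIMARY KEY,
    experiment_id INTEGER,
    f1 REAL,
    accuracy REAL,
    kernel TEXT
);
""")

connection.executemany(
    "INSERT INTO example_logtables (ID, seed) VALUES (?, ?)",
    [(1, 1), (2, 2)],
)
# Rows of experiment 1, copied from the train_scores output above
connection.executemany(
    "INSERT INTO example_logtables__train_scores (experiment_id, f1, accuracy, kernel) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 0.971667, 0.971667, "linear"),
        (1, 0.936667, 0.936667, "poly"),
        (1, 0.975, 0.975, "rbf"),
        (1, 0.896667, 0.896667, "sigmoid"),
    ],
)

# Join the logtable to the main table through experiment_id
best_train = connection.execute("""
    SELECT e.seed, l.kernel, l.f1
    FROM example_logtables AS e
    JOIN example_logtables__train_scores AS l ON l.experiment_id = e.ID
    WHERE e.seed = 1
    ORDER BY l.f1 DESC
    LIMIT 1
""").fetchone()

print(best_train)  # (1, 'rbf', 0.975)
connection.close()
```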


In [8]:
experimenter.get_logtable('test_f1')

|   | ID | experiment_id | timestamp | test_f1 |
|---|---|---|---|---|
| 0 | 1 | 1 | 2023-04-01 17:47:43 | 0.966667 |
| 1 | 2 | 1 | 2023-04-01 17:47:43 | 0.933333 |
| 2 | 3 | 1 | 2023-04-01 17:47:43 | 0.966667 |
| 3 | 4 | 1 | 2023-04-01 17:47:43 | 0.893333 |
| 4 | 5 | 2 | 2023-04-01 17:47:43 | 0.966667 |
| 5 | 6 | 2 | 2023-04-01 17:47:43 | 0.933333 |
| 6 | 7 | 2 | 2023-04-01 17:47:43 | 0.966667 |
| 7 | 8 | 2 | 2023-04-01 17:47:43 | 0.893333 |
| 8 | 9 | 3 | 2023-04-01 17:47:43 | 0.966667 |
| 9 | 10 | 3 | 2023-04-01 17:47:43 | 0.933333 |


In [9]:
experimenter.get_logtable('test_accuracy')

|   | ID | experiment_id | timestamp | test_accuracy |
|---|---|---|---|---|
| 0 | 1 | 1 | 2023-04-01 17:47:43 | 0.966667 |
| 1 | 2 | 1 | 2023-04-01 17:47:43 | 0.933333 |
| 2 | 3 | 1 | 2023-04-01 17:47:43 | 0.966667 |
| 3 | 4 | 1 | 2023-04-01 17:47:43 | 0.893333 |
| 4 | 5 | 2 | 2023-04-01 17:47:43 | 0.966667 |
| 5 | 6 | 2 | 2023-04-01 17:47:43 | 0.933333 |
| 6 | 7 | 2 | 2023-04-01 17:47:43 | 0.966667 |
| 7 | 8 | 2 | 2023-04-01 17:47:43 | 0.893333 |
| 8 | 9 | 3 | 2023-04-01 17:47:43 | 0.966667 |
| 9 | 10 | 3 | 2023-04-01 17:47:43 | 0.933333 |
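
The test scores also suggest why `linear` is reported as the best kernel in both result columns even though `rbf` shows the same rounded score. `run_ml` replaces the current best only on a strictly greater score, so on a tie the kernel tried first is kept. The sketch below replays that selection with the rounded `test_f1` values of experiment 1. It assumes that the four rows are in the order in which `run_ml` iterates over the kernels. The unrounded scores may differ slightly, so this is a plausible reading rather than a verified explanation.

```python
# Rounded test_f1 values of experiment 1, assumed to be in the kernel loop order
test_f1 = {
    "linear": 0.966667,
    "poly": 0.933333,
    "rbf": 0.966667,
    "sigmoid": 0.893333,
}


def pick_best(scores):
    """Same selection rule as in run_ml: update only on a strictly greater score."""
    best_score = 0
    best_name = ''
    for name, score in scores.items():
        if score > best_score:
            best_score = score
            best_name = name
    return best_name


print(pick_best(test_f1))  # linear
```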
