# Research template

This short notebook shows how to use the **BatchFlow** research module with everything packed into one callable: it performs all the computations you need and returns the values that are saved into the `results` dataframe.

In [None]:
# Necessary imports
import os
import sys
import shutil
from tqdm.auto import tqdm

sys.path.append('../seismiqb')
from seismiqb.batchflow import Pipeline, Dataset
from seismiqb.batchflow.research import Research, Option, Domain, Results, FileLogger
from seismiqb.batchflow.research import RP, RC, KV

We create a domain of (hyper)parameters to explore by defining multiple options. In our case, we create one option for cube and horizon locations, and one for an auxiliary number:

In [None]:
# Research options
cubes = ['A', 'B', 'C']
horizons = ['d', 'e']

options = [KV((cube, horizon), '+'.join((cube, horizon)))
           for horizon in horizons for cube in cubes]
domain = (Option('cube_and_horizon', options) * Option('number', [10, 100]))

list(domain.iterator)
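
To see what the product of the two options amounts to, here is a minimal sketch in plain Python that enumerates the same combinations with `itertools.product`. It does not use BatchFlow: the dictionaries below are only an illustration, and the actual objects and ordering produced by `domain.iterator` may differ.

```python
from itertools import product

cubes = ['A', 'B', 'C']
horizons = ['d', 'e']
numbers = [10, 100]

# Same nesting as in the cell above: the outer loop runs over horizons
pairs = [(cube, horizon) for horizon in horizons for cube in cubes]

# Every (cube, horizon) pair is combined with every auxiliary number.
# Plain dicts are an assumption made for illustration only.
configs = [
    {'cube_and_horizon': pair, 'number': number}
    for pair, number in product(pairs, numbers)
]

print(len(configs))  # 6 pairs x 2 numbers
for config in configs[:3]:
    print(config)
```

Multiplying options therefore grows the number of experiments quickly, which is worth keeping in mind when each experiment trains a model.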

The function below receives `config` and a pipeline (`ppl`), both passed from the research run. Config contains everything domain-related, so we can get any of the previously defined options from it. Pipeline is used to transport internal parameters, like the `device` number to use for model training: we can retrieve those parameters too.

The following things deserve special mention:
- our `perform_one_experiment` just gets all the parameters without doing much with them: in your research, this function can do any computations to produce results

- use `device` to train one model per GPU at a time: otherwise, you might run into resource exhaustion

- all the returned values are stored in the dataframe with research results under the names listed in `returns`

- it might be a good idea to log steps and intermediate results of this function

In [None]:
def perform_one_experiment(config, ppl):
    config = config.config()
    cube, horizon = config['cube_and_horizon']
    number, n_rep = config['number'], config['repetition']
    
    device = ppl.config['device']
    result = ord(cube) + ord(horizon) + number
    return result, device


def clear_previous_results(res_name):
    if os.path.exists(res_name):
        shutil.rmtree(res_name)
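
As a sketch of the logging advice above, the variant below records the start and the end of one experiment with Python's standard `logging` module. The function name `perform_one_experiment_logged`, the plain-dict `config` and the explicit `device` argument are assumptions made to keep the example self-contained; in the notebook these values come from the research config and the pipeline, and the `FileLogger` added to `Research` is a separate mechanism not covered here.

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
logger = logging.getLogger('research_template')


def perform_one_experiment_logged(config, device):
    """Hypothetical stand-in for `perform_one_experiment` with logging added."""
    cube, horizon = config['cube_and_horizon']
    number = config['number']
    logger.info('Start: cube=%s, horizon=%s, number=%s, device=%s',
                cube, horizon, number, device)

    # Same toy computation as in the notebook's callable
    result = ord(cube) + ord(horizon) + number

    logger.info('Done: result=%s', result)
    return result, device


# Plain dict instead of a BatchFlow config: an assumption for illustration
example = perform_one_experiment_logged(
    {'cube_and_horizon': ('A', 'd'), 'number': 10}, device=0
)
```

With long-running experiments, such messages make it much easier to find out which configuration was being processed when something went wrong.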

Most of the following code is used to trick the `Research` object into just calling our function; we do so by creating a fake pipeline with 1 iteration that does nothing and serves merely as transport for our parameters.

In your research, you might need to change:
- `res_name` to change the destination of logs and the resulting dataframe
- `n_reps` to explore the robustness of your models
- `returns` to list the names of the returned values
- `workers` and `devices` to set up multi-GPU training with the desired number of accelerators

Note the `timeout` argument of the research run: it tells `research` that our callable can take up to 1000 minutes to run. The default value is 10 minutes, which is definitely not enough for a cube train/inference/evaluation combo.

In [None]:
# Name of the directory to save logs and results in
res_name = 'research_template'
clear_previous_results(res_name)

# Fake pipeline is needed to pass parameters around
fake_ppl = Pipeline().set_dataset(Dataset(10)).run_later(1, n_iters=1)

research = (
    Research()
    .add_logger(FileLogger)
    .init_domain(domain, n_reps=2)
    .add_pipeline(fake_ppl, run=True, name='fake')
    .add_callable(
        perform_one_experiment,                         # Callable to run
        returns=['result', 'device'],                   # Names of returned results
        execute='#0',                                   # Execute immediately
        config=RC('fake'),                              # Pass config to the callable
        ppl=RP('fake'),                                 # Pass pipeline to the callable
        name='perform_one_experiment'                   # Name to be shown in the dataframe
    )
)

research.run(
    n_iters=1,
    name=res_name,
    bar=True,
    workers=6,
    devices=[0, 1, 2, 3, 4, 5],
    timeout=1000
)

The results are stored in a regular dataframe that can be manipulated to display them in a suitable manner:

In [None]:
results = Results(res_name)
results.df
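
As one example of such manipulation, the sketch below averages the result over repetitions and arranges it as a table with one row per cube/horizon pair and one column per number. It works on a toy dataframe built by hand with the same formula as the callable above; the column names (`cube_and_horizon`, `number`, `repetition`, `result`) are assumptions, so check the actual columns of `results.df` before reusing it.

```python
from itertools import product

import pandas as pd

cubes = ['A', 'B', 'C']
horizons = ['d', 'e']
numbers = [10, 100]
n_reps = 2

# Toy stand-in for `results.df`: column names are assumed, not taken from BatchFlow
rows = [
    {
        'cube_and_horizon': f'{cube}+{horizon}',
        'number': number,
        'repetition': repetition,
        'result': ord(cube) + ord(horizon) + number,
    }
    for horizon, cube, number, repetition
    in product(horizons, cubes, numbers, range(n_reps))
]
toy_df = pd.DataFrame(rows)

# Average over repetitions: rows are cube/horizon pairs, columns are numbers
summary = toy_df.pivot_table(index='cube_and_horizon', columns='number',
                             values='result', aggfunc='mean')
print(summary)
```

The same pattern applies to real metrics: replace `'mean'` with another aggregation (for example `'std'`) to inspect how much results vary between repetitions.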