# Installation

In [None]:
! pip install flordb

# Getting Started

We start by selecting (or creating) a `git` repository to save our model training code as we iterate and experiment. Flor automatically commits your changes on every run, so no change is lost. Below we provide a sample repository you can use to follow along:

In [None]:
import os
!git clone git@github.com:ucbepic/ml_tutorial ../ml_tutorial
os.chdir('../ml_tutorial/')

Run the `train.py` script to train a small linear model and test your `flordb` installation.

In [None]:
! python train.py --flor myFirstRun

Flor will manage checkpoints, logs, command-line arguments, code changes, and other experiment metadata on each run (more details [below](#storage--data-layout)). All of this data is then exposed to the user via SQL or Pandas queries.


# View your experiment history
From the same directory where you ran the examples above, open an IPython terminal, then load and pivot the log records.


In [None]:
from flor import full_pivot, log_records
df = full_pivot(log_records())

df.head()
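
To make the word "pivot" concrete, the sketch below reshapes a few long-format records (one row per logged name/value pair) into a wide table with one column per logged name. It uses plain pandas on made-up rows. The input column names `name` and `value` are assumptions for illustration, and this is not Flor's implementation of `full_pivot`.

```python
import pandas as pd

# Made-up long-format records: one row per logged (name, value) pair.
records = pd.DataFrame(
    {
        "vid": ["a1", "a1", "b2", "b2"],
        "name": ["hidden", "loss", "hidden", "loss"],
        "value": [500, 0.42, 75, 0.51],
    }
)

# Pivot so that each logged name becomes its own column, one row per version.
wide = records.pivot(index="vid", columns="name", values="value").reset_index()
wide.columns.name = None  # drop the leftover "name" label on the column axis
print(wide)
```

Each distinct `name` ends up as a column, which is the shape the pivoted view takes as more names are logged.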

# Run some more experiments
The `train.py` script has been prepared in advance to define and manage four different hyper-parameters:

In [None]:
%cat train.py | grep flor.arg

You can control any of the hyper-parameters (e.g. `hidden`) using Flor's command-line interface:

In [None]:
! python train.py --flor mySecondRun --hidden 75
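
The snippet below is a minimal sketch of the idea behind a tunable hyper-parameter with a default: read `--name value` from the command line if present, otherwise fall back to the default. The helper `arg_sketch` is hypothetical and uses only the standard library, and the default values are made up. It is not Flor's implementation of `flor.arg`; it only illustrates how a flag such as `--hidden 75` can override a value in the script.

```python
import sys


def arg_sketch(name, default=None, argv=None):
    """Return the value passed as `--name value`, or `default` if absent.

    Hypothetical helper for illustration; not part of the flor API.
    """
    argv = sys.argv[1:] if argv is None else argv
    flag = f"--{name}"
    # Stop one token early so a trailing flag with no value is ignored.
    for i, token in enumerate(argv[:-1]):
        if token == flag:
            raw = argv[i + 1]
            # Cast to the type of the default so "75" becomes the int 75.
            return raw if default is None else type(default)(raw)
    return default


# Simulate `python train.py --flor mySecondRun --hidden 75`.
cli = ["--flor", "mySecondRun", "--hidden", "75"]
hidden = arg_sketch("hidden", 500, cli)  # overridden on the command line
lr = arg_sketch("lr", 1e-3, cli)         # not passed, so the default is kept
print(hidden, lr)
```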

### Advanced (Optional): Batch Processing
Alternatively, we can call `flor.batch()` from an interactive environment
inside our model training repository to dispatch a group of jobs that may be long-running:

In [None]:
import flor

# 5 values of `hidden` x 2 values of `lr` = 10 jobs
jobs = flor.cross_prod(hidden=[i * 100 for i in range(1, 6)], lr=(1e-4, 1e-3))
assert jobs is not None

flor.batch(jobs)
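
For intuition about the size of the batch, the sketch below enumerates a Cartesian product of the same two value lists with `itertools.product`. It assumes that `flor.cross_prod` pairs every `hidden` value with every `lr` value, which is consistent with the 10 jobs reported later in this section. The helper `cross_product_sketch` is hypothetical, and the dictionaries it returns are not Flor's internal job format.

```python
from itertools import product


def cross_product_sketch(**grid):
    """Return one dict per combination of the given hyper-parameter values."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*grid.values())]


combos = cross_product_sketch(
    hidden=[i * 100 for i in range(1, 6)],  # 100, 200, ..., 500
    lr=(1e-4, 1e-3),
)
print(len(combos))
print(combos[0])
```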

Then, using a new console or terminal, we start a `flordb` server to process the batch jobs:
```bash
$ python -m flor serve
```

or, if we want to allocate a GPU to the flor server:
```bash
$ python -m flor serve 0 
```
(where 0 is replaced by the GPU id).

You can check the progress of your jobs with the following query:

In [None]:
!sqlite3 ~/.flor/main.db -header 'select done, path, count(*) from jobs group by done, path;'

When all jobs have finished, the query will report 10 jobs with `done` = 1:

```
done|path|count(*)
1|/Users/rogarcia/git/ml_tutorial|10
```
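
If you prefer to poll from Python rather than the `sqlite3` command line, the same query can be issued with the standard-library `sqlite3` module. This is a sketch: the helper name `job_progress` is hypothetical, and it assumes only what the command above shows, namely a `jobs` table with `done` and `path` columns in `~/.flor/main.db`.

```python
import sqlite3
from pathlib import Path


def job_progress(db_path):
    """Return (done, path, count) rows, mirroring the CLI query above."""
    query = "SELECT done, path, COUNT(*) FROM jobs GROUP BY done, path;"
    conn = sqlite3.connect(str(db_path))
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


db = Path.home() / ".flor" / "main.db"
if db.exists():  # skip quietly when the database has not been created yet
    for done, path, count in job_progress(db):
        print(done, path, count)
```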

You can view the updated pivot view as follows:

In [None]:
df = full_pivot(log_records())

print(df['vid'].drop_duplicates().count(), 'versions')
df.head()

# Model Training Kit (MTK)
The Model Training Kit (MTK) includes utilities for serializing and checkpointing PyTorch state,
and utilities for resuming, auto-parallelizing, and memoizing executions from checkpoint.

In this context, `Flor` is an alias for `MTK`. The model developer passes objects for checkpointing to `Flor.checkpoints(*args)`,
and gives it control over loop iterators by 
calling `Flor.loop(iterator)` as follows:

In [None]:
!cat train.py | grep -B 3 -A 25 Flor.checkpoints 

As shown, we wrap both the nested training loop and the main loop with `Flor.loop` so that Flor can manage their state. Flor uses loop iteration boundaries to store selected checkpoints adaptively and, at replay time, uses those same checkpoints to resume training from the appropriate epoch.
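
To illustrate the idea of checkpointing at loop-iteration boundaries and resuming from the latest checkpoint, here is a minimal, framework-free sketch. The generator `resumable_loop`, the `state` dictionary, and the pickle file are all hypothetical stand-ins chosen for illustration. Flor's own checkpointing is adaptive and handles PyTorch state, neither of which this sketch attempts.

```python
import os
import pickle
import tempfile


def resumable_loop(iterable, state, ckpt_path):
    """Yield items, saving `state` after each finished iteration and
    skipping iterations already covered by a checkpoint at `ckpt_path`."""
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            start, saved = pickle.load(f)
        state.update(saved)  # restore the saved state in place
    for i, item in enumerate(iterable):
        if i < start:
            continue  # this iteration finished in an earlier run
        yield item
        # Iteration boundary: the loop body has run, so persist the state.
        with open(ckpt_path, "wb") as f:
            pickle.dump((i + 1, dict(state)), f)


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")

# First run: simulate an interruption before epoch 2 does any work.
state = {"total": 0}
for epoch in resumable_loop(range(5), state, ckpt):
    if epoch == 2:
        break
    state["total"] += epoch

# Second run: a fresh process would start with empty state and resume.
resumed = {"total": 0}
replayed_epochs = []
for epoch in resumable_loop(range(5), resumed, ckpt):
    replayed_epochs.append(epoch)
    resumed["total"] += epoch
print(replayed_epochs, resumed["total"])
```

The second run skips the epochs that already completed and ends with the same total an uninterrupted run would produce, which is the property a resume-from-checkpoint mechanism needs to preserve.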


### Logging API

Call `flor.log(name, value)` to log metrics and `flor.arg(name, default=None)` to register tunable hyper-parameters.

In [None]:
%cat train.py | grep -C 3 -e 'flor.arg' -e 'flor.log'

Each `name` you pass to `flor.log` or `flor.arg` becomes a column (measure) in the full pivoted view (see [View your experiment history](#view-your-experiment-history)).


# Hindsight Logging


Suppose you wanted to start logging the `device`
identifier where the model is run, as well as the
final `accuracy` after training.
You would add the corresponding logging statements
to `train.py`, for example:

In [None]:
%cat train.py | grep -C 4 flor.log

In [None]:
! echo $(pwd)
! git commit -am "hindsight logging stmts added."

Typically, when you add a logging statement, logging 
begins "from now on", and you have no visibility into the past.
With hindsight logging, the aim is to allow model developers to send
new logging statements back in time, and replay the past 
efficiently from checkpoint.

To do that, we open an interactive environment from within the `ml_tutorial` directory and call `flor.replay()`, asking Flor to apply the logging statements named `device` and `accuracy` to all previous versions (to target all versions, leave the `where_clause` argument of `flor.replay()` unset):

In [None]:
import flor

flor.replay(['device', 'accuracy'])

Then, as with batch processing, we use a new console or terminal to start a `flordb` server that processes the replay jobs:
```bash
$ python -m flor serve
```

or, if we want to allocate a GPU to the flor server:
```bash
$ python -m flor serve 0 
```
(where 0 is replaced by the GPU id).

You can check the progress of your jobs with the following query:

In [None]:
!sqlite3 ~/.flor/main.db -header 'select done, path, appvars, count(*) from replay group by done, path, appvars;'

When the process is finished, you will be able to view the values for `device` and `accuracy` for historical executions, and they will continue to be logged in subsequent iterations:

In [None]:
from flor import full_pivot, log_records
df = full_pivot(log_records())
df[list(flor.DATA_PREP) + ['device', 'accuracy']].drop_duplicates()

Note the new columns `device` and `accuracy` that are backfilled.
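
One quick way to confirm a backfill is to count the missing values per column. The sketch below defines a hypothetical helper, `count_missing`, and runs it on a small made-up frame. In the notebook you would pass the pivoted `df` and the names you replayed instead.

```python
import pandas as pd


def count_missing(frame, columns):
    """Return the number of missing values in each of `columns`."""
    return frame[list(columns)].isna().sum().to_dict()


# Made-up example in which `accuracy` is still missing for one version.
example = pd.DataFrame(
    {
        "vid": ["a1", "b2", "c3"],
        "device": ["cpu", "cpu", "cuda:0"],
        "accuracy": [0.91, None, 0.93],
    }
)
print(count_missing(example, ["device", "accuracy"]))
```

A count of zero for every replayed name suggests that the replay has covered all versions in the frame.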