This tutorial is based on https://www.tensorflow.org/tutorials/quickstart/beginner

In [1]:
from functions import LoadData, FitModel, EvaluateModel

1. Initialize Git and DVC
```
git init
dvc init
```

# Content of functions.py:LoadData
```python
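import numpy as np
import tensorflow as tf

# DVCOp and DVCParams come from the DVC wrapper library used in this tutorial (import not shown in the original)


class LoadData(DVCOp):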
    def config(self):
        self.dvc = DVCParams(
            outs=['x_train.npy', 'y_train.npy', 'x_test.npy', 'y_test.npy']
        )

    def __call__(self, dataset: str, exec_: bool = False, slurm: bool = False, force: bool = False,
                 always_changed: bool = False):
        self.parameters = {"dataset": dataset}
        self.post_call(exec_=exec_, slurm=slurm, force=force, always_changed=always_changed)

    def run_dvc(self, id_=0):
        self.pre_run(id_)

        if self.parameters['dataset'] == "mnist":
            mnist = tf.keras.datasets.mnist
            (x_train, y_train), (x_test, y_test) = mnist.load_data()
            x_train, x_test = x_train / 255.0, x_test / 255.0

            with open(self.files.outs[0], "wb") as f:
                np.save(f, x_train)
            with open(self.files.outs[1], "wb") as f:
                np.save(f, y_train)
            with open(self.files.outs[2], "wb") as f:
                np.save(f, x_test)
            with open(self.files.outs[3], "wb") as f:
                np.save(f, y_test)

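            self.results = {"shape": x_train.shape, "targets": len(np.unique(y_train))}
        else:
            # Fail loudly instead of silently producing no outputs
            raise ValueError(f"Unsupported dataset: {self.parameters['dataset']}")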
```
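Stages exchange data only through the files declared in `outs`. The following standalone sketch mimics what `LoadData.run_dvc` does, using random arrays as a hypothetical stand-in for MNIST so that it runs without TensorFlow or the DVC wrapper. It also hints at why `results["shape"]` later appears as a list: assuming the results are stored as JSON (as the `outs/0_LoadData.json` entry in `dvc.yaml` suggests), the tuple returned by `.shape` comes back as a JSON array.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_split(out_dir: Path) -> dict:
    """Hypothetical stand-in for LoadData.run_dvc using random data instead of MNIST."""
    rng = np.random.default_rng(0)
    x_train = rng.integers(0, 256, size=(100, 28, 28)) / 255.0  # scale pixels to [0, 1]
    y_train = rng.integers(0, 10, size=100)
    # Write each array to its own .npy file, as the stage does for every entry in outs
    np.save(out_dir / "0_x_train.npy", x_train)
    np.save(out_dir / "0_y_train.npy", y_train)
    results = {"shape": x_train.shape, "targets": len(np.unique(y_train))}
    # Persist the results as JSON; the shape tuple is serialized as a JSON array
    (out_dir / "0_LoadData.json").write_text(json.dumps(results))
    return results


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    save_split(out_dir)
    # A downstream stage reads the same files back
    x_train = np.load(out_dir / "0_x_train.npy")
    loaded = json.loads((out_dir / "0_LoadData.json").read_text())

print(loaded["shape"], loaded["targets"])
```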

2. Create an instance of the first stage.

Calling the instance runs the DVC command that creates the stage. Because `--no-exec` is used by default (`exec_=False`), the stage is only created, not executed.
For simplicity, this stage has only a single parameter, `dataset`.

In [2]:
load_data = LoadData()
load_data(dataset="mnist")

The DVC stage and the parameter configuration have now been created. The same information is also available through the class instance.
For example, the output files can be listed as follows:


In [3]:
load_data.files.outs

[WindowsPath('outs/0_x_train.npy'),
 WindowsPath('outs/0_y_train.npy'),
 WindowsPath('outs/0_x_test.npy'),
 WindowsPath('outs/0_y_test.npy')]

Note: An additional file containing the results is also created; its path is available as `load_data.files.json_file`.
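Judging from the listing above and the generated `dvc.yaml`, each file in `outs` is placed in an `outs/` directory and prefixed with the stage id, and the results file is named after the stage class. The helper below is a hypothetical sketch that reproduces this observed naming scheme (using POSIX-style paths for portability); it is not the library's actual implementation.

```python
from pathlib import PurePosixPath


def stage_paths(cls_name: str, outs: list[str], id_: int = 0) -> dict:
    """Hypothetical sketch of the output naming scheme observed in this tutorial."""
    out_dir = PurePosixPath("outs")
    return {
        # Each declared output gets the stage id as a prefix
        "outs": [out_dir / f"{id_}_{name}" for name in outs],
        # The results file is named after the stage class
        "json_file": out_dir / f"{id_}_{cls_name}.json",
    }


paths = stage_paths("LoadData", ["x_train.npy", "y_train.npy", "x_test.npy", "y_test.npy"])
print([str(p) for p in paths["outs"]])
print(paths["json_file"])
```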

The stage has not been executed yet, so no results are present. We can change that by running `dvc repro`.

In [4]:
load_data.results

No results found!


{}

Now run `dvc repro` from the command line and query the results again:

In [5]:
load_data.results

{'shape': [60000, 28, 28], 'targets': 10}
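The behaviour of the two cells above (an empty dictionary plus a "No results found!" message before `dvc repro`, and the stored values afterwards) can be sketched as a function that reads the results JSON file if it exists. `load_results` is a hypothetical name, and the assumption that `results` is backed by this file is ours, based on the `outs/0_LoadData.json` output.

```python
import json
import tempfile
from pathlib import Path


def load_results(json_file: Path) -> dict:
    """Hypothetical sketch: return stored results, or {} if the stage has not run yet."""
    if not json_file.exists():
        print("No results found!")
        return {}
    return json.loads(json_file.read_text())


with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "0_LoadData.json"
    before = load_results(json_file)  # prints the message and returns {}
    # Simulate what running the stage via `dvc repro` would write
    json_file.write_text(json.dumps({"shape": [60000, 28, 28], "targets": 10}))
    after = load_results(json_file)

print(before, after)
```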

Let us now create further stages that depend on this stage. 
By default, every stage is unique and has `id_=0`. In some cases, however, it can be useful to use the same stage multiple times.
This can be achieved with `DVCParams(multi_use=True)`, which allows ids > 0.
This is useful, for example, to combine multiple data sources, but it should not be used for multiple models, because comparing models is what DVC itself should be used for.
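The generated `dvc.yaml` later in this tutorial names each stage `<ClassName>_<id>` and runs it with a `python -c` one-liner that calls `run_dvc(id_=...)`. The sketch below generates these names and commands for several ids. `stage_entry` is a hypothetical helper, and the naming for ids > 0 is an assumption extrapolated from the `id_=0` case.

```python
def stage_entry(cls_name: str, id_: int, module: str = "functions") -> tuple[str, str]:
    """Hypothetical sketch of the stage name and command seen in dvc.yaml."""
    name = f"{cls_name}_{id_}"
    cmd = f'python -c "from {module} import {cls_name}; {cls_name}().run_dvc(id_={id_})"'
    return name, cmd


# Two uses of the same stage class, as allowed by DVCParams(multi_use=True)
for id_ in (0, 1):
    name, cmd = stage_entry("LoadData", id_)
    print(name, "->", cmd)
```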

# Content of functions.py:FitModel
```python
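import numpy as np
import tensorflow as tf

# DVCOp and DVCParams come from the DVC wrapper library used in this tutorial (import not shown in the original)
# LoadData is defined in the same functions.py module


class FitModel(DVCOp):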
    def config(self):
        self.dvc = DVCParams(
            outs=['model']
        )
        self.json_file = False

    def __call__(self, exec_: bool = False, slurm: bool = False, force: bool = False,
                 always_changed: bool = False):
        self.dvc.deps = LoadData(id_=0).files.outs[:2]

        self.parameters = {"layer": 128}
        self.post_call(exec_=exec_, slurm=slurm, force=force, always_changed=always_changed)

    def run_dvc(self, id_=0):
        self.pre_run(id_)

        load_data = LoadData(id_=0)

        input_shape = load_data.results['shape'][1:]
        target_size = load_data.results['targets']

        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=input_shape),
            tf.keras.layers.Dense(self.parameters['layer'], activation='relu'),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(target_size)
        ])

        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        model.compile(optimizer='adam',
                      loss=loss_fn,
                      metrics=['accuracy'])

        with open(load_data.files.outs[0], 'rb') as f:
            x_train = np.load(f)
        with open(load_data.files.outs[1], 'rb') as f:
            y_train = np.load(f)

        model.fit(x_train, y_train, epochs=5)
        model.save(str(self.files.outs[0]))
```

In [6]:
fit_model = FitModel()
fit_model()

# Content of functions.py:EvaluateModel
```python
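import json

import numpy as np
import tensorflow as tf

# DVCOp and DVCParams come from the DVC wrapper library used in this tutorial (import not shown in the original)
# LoadData and FitModel are defined in the same functions.py module


class EvaluateModel(DVCOp):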
    def config(self):
        self.dvc = DVCParams(
            metrics_no_cache=['metrics.json']
        )
        self.json_file = False

    def __call__(self, exec_: bool = False, slurm: bool = False, force: bool = False,
                 always_changed: bool = False):
        self.parameters = {'verbose': 2}
        self.dvc.deps = FitModel(id_=0).files.outs
        self.dvc.deps += LoadData(id_=0).files.outs[2:]
        self.post_call(exec_=exec_, slurm=slurm, force=force, always_changed=always_changed)

    def run_dvc(self, id_=0):
        self.pre_run(id_)

        fit_model = FitModel(id_=0)

        model = tf.keras.models.load_model(str(fit_model.files.outs[0]))

        load_data = LoadData(id_=0)

        with open(load_data.files.outs[2], 'rb') as f:
            x_test = np.load(f)
        with open(load_data.files.outs[3], 'rb') as f:
            y_test = np.load(f)

        out = model.evaluate(x_test, y_test, verbose=self.parameters['verbose'])

        with open(self.files.metrics_no_cache[0], "w") as f:
            json.dump(out, f)
```
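`model.evaluate` returns a plain list of scalars in the order of `model.metrics_names` (here the loss followed by the accuracy), so `metrics.json` ends up containing an unlabeled JSON array. A variation worth considering is to store the values under their names, which keeps the metrics file self-describing. The sketch below uses placeholder numbers instead of a real evaluation so it runs without TensorFlow; `write_named_metrics` is a hypothetical helper, and inside the stage the names could be taken from `model.metrics_names`.

```python
import json
import tempfile
from pathlib import Path


def write_named_metrics(path: Path, names: list[str], values: list[float]) -> None:
    """Hypothetical helper: store evaluation scalars as a {name: value} mapping."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    path.write_text(json.dumps(dict(zip(names, values)), indent=2))


with tempfile.TemporaryDirectory() as tmp:
    metrics_file = Path(tmp) / "0_metrics.json"
    # Placeholder values standing in for model.evaluate(...); not real results
    write_named_metrics(metrics_file, ["loss", "accuracy"], [0.5, 0.9])
    metrics = json.loads(metrics_file.read_text())

print(metrics)
```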

In [7]:
eval_model = EvaluateModel()
eval_model()

We have now created a `dvc.yaml` file containing the definitions of all three stages:

```yaml
stages:
  LoadData_0:
    cmd: python -c "from functions import LoadData; LoadData().run_dvc(id_=0)"
    params:
    - config/params.json:
      - LoadData.0
    outs:
    - outs\0_LoadData.json
    - outs\0_x_test.npy
    - outs\0_x_train.npy
    - outs\0_y_test.npy
    - outs\0_y_train.npy
  FitModel_0:
    cmd: python -c "from functions import FitModel; FitModel().run_dvc(id_=0)"
    deps:
    - outs\0_x_train.npy
    - outs\0_y_train.npy
    params:
    - config/params.json:
      - FitModel.0
    outs:
    - outs\0_model
  EvaluateModel_0:
    cmd: python -c "from functions import EvaluateModel; EvaluateModel().run_dvc(id_=0)"
    deps:
    - outs\0_model
    - outs\0_x_test.npy
    - outs\0_y_test.npy
    params:
    - config/params.json:
      - EvaluateModel.0
    metrics:
    - metrics\0_metrics.json:
        cache: false
```
The corresponding parameters are stored in `config/params.json`:
```json
{
  "LoadData": {
    "0": {
      "dataset": "mnist"
    }
  },
  "FitModel": {
    "0": {
      "layer": 128
    }
  },
  "EvaluateModel": {
    "0": {
      "verbose": 2
    }
  }
}
```
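The parameters are nested first by stage class and then by id, with the id stored as a string because JSON object keys are strings. A minimal sketch of writing and reading this layout with the standard library; `set_params` and `get_params` are hypothetical helpers, not part of the tutorial's library.

```python
import json


def set_params(config: dict, cls_name: str, id_: int, parameters: dict) -> None:
    """Store parameters under config[cls_name][str(id_)], as in config/params.json."""
    config.setdefault(cls_name, {})[str(id_)] = parameters


def get_params(config: dict, cls_name: str, id_: int = 0) -> dict:
    """Look up the parameters of one stage instance."""
    return config[cls_name][str(id_)]


config: dict = {}
set_params(config, "LoadData", 0, {"dataset": "mnist"})
set_params(config, "FitModel", 0, {"layer": 128})
set_params(config, "EvaluateModel", 0, {"verbose": 2})

text = json.dumps(config, indent=2)  # same layout as the JSON shown above
print(get_params(json.loads(text), "FitModel")["layer"])
```

This nesting is what the `LoadData.0`-style keys under `params:` in `dvc.yaml` refer to: DVC addresses nested parameters with dot-separated keys.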

The dependency graph is also available via `dvc dag`:
```
              +------------+
              | LoadData_0 |
              +------------+
             ***           ***
           **                 **
         **                     **
+------------+                    **
| FitModel_0 |                  **
+------------+                **
             ***           ***
                **       **
                  **   **
            +-----------------+
            | EvaluateModel_0 |
            +-----------------+
```
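`dvc dag` derives these edges from the `deps` and `outs` of each stage: a stage depends on another if one of its dependencies is an output of that stage. The sketch below reproduces that reasoning for the three stages, with the paths copied from the `dvc.yaml` above (written with forward slashes), and uses Python's `graphlib` (3.9+) to obtain an execution order. It is an illustration of the idea, not how DVC is implemented.

```python
from graphlib import TopologicalSorter

# deps and outs per stage, copied from dvc.yaml (params and metrics omitted)
stages = {
    "LoadData_0": {
        "deps": [],
        "outs": ["outs/0_LoadData.json", "outs/0_x_test.npy", "outs/0_x_train.npy",
                 "outs/0_y_test.npy", "outs/0_y_train.npy"],
    },
    "FitModel_0": {
        "deps": ["outs/0_x_train.npy", "outs/0_y_train.npy"],
        "outs": ["outs/0_model"],
    },
    "EvaluateModel_0": {
        "deps": ["outs/0_model", "outs/0_x_test.npy", "outs/0_y_test.npy"],
        "outs": [],
    },
}

# Map every output path to the stage that produces it
producer = {out: name for name, stage in stages.items() for out in stage["outs"]}

# Each stage's predecessors are the stages that produce its deps
graph = {
    name: {producer[dep] for dep in stage["deps"] if dep in producer}
    for name, stage in stages.items()
}

order = list(TopologicalSorter(graph).static_order())
print(graph)
print(order)
```

This matches the diagram: `EvaluateModel_0` has incoming edges from both `LoadData_0` and `FitModel_0`, and the chain admits only one valid execution order.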

We can now use `dvc repro` to run all the stages we have created and inspect the results in the respective folders (`outs/` and `metrics/`).