# TensorFlow Example

This tutorial is based on https://www.tensorflow.org/tutorials/quickstart/beginner

In [1]:
from ExampleTensorflow01 import LoadData, FitModel, EvaluateModel
from tempfile import TemporaryDirectory
import shutil
import os

In [2]:
temp_dir = TemporaryDirectory()
shutil.copy('ExampleTensorflow01.py', temp_dir.name)
os.chdir(temp_dir.name)

1. Initialize Git and DVC

We use a temporary directory for testing purposes and delete it at the end.

In [3]:
!git init
!dvc init

Initialized empty Git repository in C:/Users/fabia/AppData/Local/Temp/tmp41lt62w5/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>


*Content of ExampleTensorflow01.py:LoadData*
```python
from pytrack import PyTrack, DVCParams
import tensorflow as tf
import numpy as np

class LoadData(PyTrack):
    def __init__(self, id_=None, filter_=None):
        super().__init__()
        self.dvc = DVCParams(
            outs=['x_train.npy', 'y_train.npy', 'x_test.npy', 'y_test.npy']
        )
        self.post_init(id_, filter_)

    def __call__(self, dataset: str, exec_: bool = False, slurm: bool = False, force: bool = False,
                 always_changed: bool = False):
        self.parameters = {"dataset": dataset}
        self.post_call(exec_=exec_, slurm=slurm, force=force, always_changed=always_changed)

    def run(self):
        self.pre_run()

        if self.parameters['dataset'] == "mnist":
            mnist = tf.keras.datasets.mnist
            (x_train, y_train), (x_test, y_test) = mnist.load_data()
            # Scale pixel values from [0, 255] to [0, 1]
            x_train, x_test = x_train / 255.0, x_test / 255.0

            # Save each array to its output file (same order as declared in DVCParams)
            arrays = (x_train, y_train, x_test, y_test)
            for path, array in zip(self.files.outs, arrays):
                with open(path, "wb") as f:
                    np.save(f, array)

            self.results = {"shape": x_train.shape, "targets": len(np.unique(y_train))}
```

2. Create an instance of the first stage.

Calling the stage runs the corresponding dvc command. Because it uses `--no-exec` by default, this only creates the stage and does not execute it.
For simplicity, this stage has only a single parameter.

In [4]:
load_data = LoadData()
load_data(dataset="mnist")

The DVC stage and its configuration have now been created. The same values are also available through the class.
For example, the output files can be inspected as follows:


In [5]:
load_data.files.outs

[WindowsPath('outs/0_x_train.npy'),
 WindowsPath('outs/0_y_train.npy'),
 WindowsPath('outs/0_x_test.npy'),
 WindowsPath('outs/0_y_test.npy')]

Note: An additional file containing the results is also created; its path is available as `load_data.files.json_file`.
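
The paths listed above suggest a naming scheme in which every declared output is placed in an `outs` directory and prefixed with the stage id. The following is a minimal sketch of that scheme as inferred from the output, not PyTrack's actual implementation; `prefixed_outs` is a hypothetical helper introduced only for illustration.

```python
from pathlib import PurePosixPath


def prefixed_outs(outs, id_=0, directory="outs"):
    """Hypothetical helper: mimic the '<directory>/<id>_<name>' pattern seen above."""
    return [PurePosixPath(directory) / f"{id_}_{name}" for name in outs]


paths = prefixed_outs(["x_train.npy", "y_train.npy", "x_test.npy", "y_test.npy"])
print([str(p) for p in paths])
```

Under this assumed scheme, a second instance of the same stage with `id_=1` would write to differently named files and would not overwrite the first one.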

The stage has not been executed yet, so no results are present. Running `dvc repro` changes that.

In [6]:
load_data.results

No results found!


{}

In [7]:
!dvc repro

Running stage 'LoadData_0':
> python -c "from functions import LoadData; LoadData(id_=0).run()"
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'



2021-06-09 11:54:51.227324: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll


To track the changes with git, run:

	git add dvc.lock
Use `dvc push` to send your updates to remote storage.


In [8]:
load_data.results

{'shape': [60000, 28, 28], 'targets': 10}
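
Note that `shape` comes back as a list, although `run` stored the tuple `x_train.shape`. This is what one would expect if the results are serialized to JSON, which has no tuple type. That the results file is plain JSON is an assumption based on the attribute name `json_file`. A short sketch of the round trip using only the standard library:

```python
import json

# Values as they are assigned to self.results inside LoadData.run
results = {"shape": (60000, 28, 28), "targets": 10}

# Serializing to JSON and reading it back turns the tuple into a list
restored = json.loads(json.dumps(results))
print(restored)
```

This is why `FitModel.run` can slice `load_data.results['shape'][1:]` but receives a list rather than a tuple.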

Let us now create further stages that depend on this one.
By default, every stage is unique and has `id = 0`. In some cases, however, it can be handy to use the same stage multiple times.
This can be enabled with `DVCParams(multi_use=True)`, which allows ids > 0.
This is useful for combining multiple data sources, but it should not be used to manage multiple models, because that is what we want to use DVC for.

*Content of ExampleTensorflow01.py:FitModel*
```python
from pytrack import PyTrack, DVCParams
import tensorflow as tf
import numpy as np

class FitModel(PyTrack):
    def __init__(self, id_=None, filter_=None):
        super().__init__()
        self.dvc = DVCParams(
            outs=['model']
        )
        self.json_file = False
        self.post_init(id_, filter_)

    def __call__(self, exec_: bool = False, slurm: bool = False, force: bool = False,
                 always_changed: bool = False):
        self.dvc.deps = LoadData(id_=0).files.outs[:2]

        self.parameters = {"layer": 128}
        self.post_call(exec_=exec_, slurm=slurm, force=force, always_changed=always_changed)

    def run(self):
        self.pre_run()

        load_data = LoadData(id_=0)

        input_shape = load_data.results['shape'][1:]
        target_size = load_data.results['targets']

        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=input_shape),
            tf.keras.layers.Dense(self.parameters['layer'], activation='relu'),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(target_size)
        ])

        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        model.compile(optimizer='adam',
                      loss=loss_fn,
                      metrics=['accuracy'])

        with open(load_data.files.outs[0], 'rb') as f:
            x_train = np.load(f)
        with open(load_data.files.outs[1], 'rb') as f:
            y_train = np.load(f)

        model.fit(x_train, y_train, epochs=5)
        model.save(str(self.files.outs[0]))
```

In [9]:
fit_model = FitModel()
fit_model()
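
Because the final `Dense` layer has no activation and the loss is configured with `from_logits=True`, the saved model outputs raw logits rather than probabilities. A minimal sketch of the softmax conversion in plain Python follows; the logits are made-up values for illustration and do not come from this model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three classes, for illustration only
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The predicted class is the index of the largest probability, which is also the index of the largest logit.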

*Content of ExampleTensorflow01.py:EvaluateModel*
```python
from pytrack import PyTrack, DVCParams
import tensorflow as tf
import numpy as np
import json

class EvaluateModel(PyTrack):

    def __init__(self, id_=None, filter_=None):
        super().__init__()
        self.dvc = DVCParams(
            metrics_no_cache=['metrics.json']
        )
        self.json_file = False
        self.post_init(id_, filter_)

    def __call__(self, exec_: bool = False, slurm: bool = False, force: bool = False,
                 always_changed: bool = False):
        self.parameters = {'verbose': 2}
        self.dvc.deps = FitModel(id_=0).files.outs
        self.dvc.deps += LoadData(id_=0).files.outs[2:]
        self.post_call(exec_=exec_, slurm=slurm, force=force, always_changed=always_changed)

    def run(self):
        self.pre_run()

        fit_model = FitModel(id_=0)

        model = tf.keras.models.load_model(str(fit_model.files.outs[0]))

        load_data = LoadData(id_=0)

        with open(load_data.files.outs[2], 'rb') as f:
            x_test = np.load(f)
        with open(load_data.files.outs[3], 'rb') as f:
            y_test = np.load(f)

        out = model.evaluate(x_test, y_test, verbose=self.parameters['verbose'])

        with open(self.files.metrics_no_cache[0], "w") as f:
            json.dump(out, f)
```

In [10]:
eval_model = EvaluateModel()
eval_model()
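
`model.evaluate` returns the loss followed by the compiled metrics, so with `metrics=['accuracy']` the `metrics.json` file holds a plain list of two numbers. A minimal sketch for attaching names to these values once the stage has run; `label_metrics` is a hypothetical helper, and the numbers below are placeholders rather than results from this tutorial.

```python
import json


def label_metrics(values, names=("loss", "accuracy")):
    """Hypothetical helper: pair the values from model.evaluate with names."""
    if len(values) != len(names):
        raise ValueError("expected one value per metric name")
    return dict(zip(names, values))


# Placeholder numbers for illustration only
raw = json.loads("[0.5, 0.75]")
print(label_metrics(raw))
```

With the real file, `raw` would instead come from `json.load` applied to `eval_model.files.metrics_no_cache[0]`.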

We have now created a `dvc.yaml` file that describes all three stages:

```yaml
stages:
  LoadData_0:
    cmd: python -c "from functions import LoadData; LoadData().run_dvc(id_=0)"
    params:
    - config/params.json:
      - LoadData.0
    outs:
    - outs\0_LoadData.json
    - outs\0_x_test.npy
    - outs\0_x_train.npy
    - outs\0_y_test.npy
    - outs\0_y_train.npy
  FitModel_0:
    cmd: python -c "from functions import FitModel; FitModel().run_dvc(id_=0)"
    deps:
    - outs\0_x_train.npy
    - outs\0_y_train.npy
    params:
    - config/params.json:
      - FitModel.0
    outs:
    - outs\0_model
  EvaluateModel_0:
    cmd: python -c "from functions import EvaluateModel; EvaluateModel().run_dvc(id_=0)"
    deps:
    - outs\0_model
    - outs\0_x_test.npy
    - outs\0_y_test.npy
    params:
    - config/params.json:
      - EvaluateModel.0
    metrics:
    - metrics\0_metrics.json:
        cache: false
```
The respective parameters are stored in `config/params.json`:
```json
{
  "LoadData": {
    "0": {
      "dataset": "mnist"
    }
  },
  "FitModel": {
    "0": {
      "layer": 128
    }
  },
  "EvaluateModel": {
    "0": {
      "verbose": 2
    }
  }
}
```
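
The parameter file is nested by stage name and then by stage id, which matches `params` entries such as `LoadData.0` in `dvc.yaml`. A minimal sketch of resolving such a dotted key against the file content; `lookup` is a hypothetical helper, and the dictionary is copied from the JSON above.

```python
import json

params = json.loads("""
{
  "LoadData": {"0": {"dataset": "mnist"}},
  "FitModel": {"0": {"layer": 128}},
  "EvaluateModel": {"0": {"verbose": 2}}
}
""")


def lookup(params, dotted_key):
    """Hypothetical helper: walk the nested dict along a key like 'FitModel.0'."""
    node = params
    for part in dotted_key.split("."):
        node = node[part]
    return node


print(lookup(params, "FitModel.0"))
```

Note that the stage id is a string key (`"0"`), because JSON object keys are always strings.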

The dependency graph is also available via `dvc dag`:
```
              +------------+
              | LoadData_0 |
              +------------+
             ***           ***
           **                 **
         **                     **
+------------+                    **
| FitModel_0 |                  **
+------------+                **
             ***           ***
                **       **
                  **   **
            +-----------------+
            | EvaluateModel_0 |
            +-----------------+
```
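
The graph follows from the `deps` and `outs` entries in `dvc.yaml`: a stage depends on whichever stage produces one of its dependencies. The sketch below illustrates how such a graph and a valid execution order could be derived. It is not a description of DVC's internals; the stage data is copied from the `dvc.yaml` above with forward slashes, and the metrics file is omitted because no stage depends on it.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+

stages = {
    "LoadData_0": {
        "deps": [],
        "outs": ["outs/0_LoadData.json", "outs/0_x_test.npy", "outs/0_x_train.npy",
                 "outs/0_y_test.npy", "outs/0_y_train.npy"],
    },
    "FitModel_0": {
        "deps": ["outs/0_x_train.npy", "outs/0_y_train.npy"],
        "outs": ["outs/0_model"],
    },
    "EvaluateModel_0": {
        "deps": ["outs/0_model", "outs/0_x_test.npy", "outs/0_y_test.npy"],
        "outs": [],
    },
}

# Map every output file to the stage that produces it
producer = {out: name for name, stage in stages.items() for out in stage["outs"]}

# A stage depends on every stage that produces one of its deps
graph = {
    name: {producer[dep] for dep in stage["deps"] if dep in producer}
    for name, stage in stages.items()
}

# Predecessors come first in the resulting order
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The resulting order places `LoadData_0` first and `EvaluateModel_0` last, matching the diagram.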

We can now use `dvc repro` to run all the stages we have created and then inspect the results in the respective folders.

In [11]:
!dvc repro

Stage 'LoadData_0' didn't change, skipping
Running stage 'FitModel_0':
> python -c "from functions import FitModel; FitModel(id_=0).run()"
Epoch 1/5

   1/1875 [..............................] - ETA: 10:05 - loss: 2.3944 - accuracy: 0.1250
  67/1875 [>.............................] - ETA: 1s - loss: 1.1767 - accuracy: 0.6600   
 129/1875 [=>............................] - ETA: 1s - loss: 0.8790 - accuracy: 0.7469
 191/1875 [==>...........................] - ETA: 1s - loss: 0.7380 - accuracy: 0.7888
 254/1875 [===>..........................] - ETA: 1s - loss: 0.6568 - accuracy: 0.8134
 305/1875 [===>..........................] - ETA: 1s - loss: 0.6146 - accuracy: 0.8251
 343/1875 [====>.........................] - ETA: 1s - loss: 0.5833 - accuracy: 0.8335
 392/1875 [=====>........................] - ETA: 1s - loss: 0.5506 - accuracy: 0.8430
Epoch 2/5

   1/1875 [..............................] - ETA: 1s - loss: 0.0864 - accuracy: 0.9688
  64/1875 [>.............................] - ETA: 

2021-06-09 11:55:02.414290: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-06-09 11:55:04.724504: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-06-09 11:55:04.749461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:09:00.0 name: NVIDIA GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.43GHz coreCount: 6 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2021-06-09 11:55:04.749488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-06-09 11:55:04.757393: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-06-09 11:55:04.757423: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt6


Epoch 3/5

   1/1875 [..............................] - ETA: 1s - loss: 0.0274 - accuracy: 1.0000
  69/1875 [>.............................] - ETA: 1s - loss: 0.1082 - accuracy: 0.9669
 137/1875 [=>............................] - ETA: 1s - loss: 0.1114 - accuracy: 0.9656
 204/1875 [==>...........................] - ETA: 1s - loss: 0.1115 - accuracy: 0.9660
 271/1875 [===>..........................] - ETA: 1s - loss: 0.1105 - accuracy: 0.9655
 338/1875 [====>.........................] - ETA: 1s - loss: 0.1126 - accuracy: 0.9649
 408/1875 [=====>........................] - ETA: 1s - loss: 0.1138 - accuracy: 0.9655
Epoch 4/5

   1/1875 [..............................] - ETA: 1s - loss: 0.0094 - accuracy: 1.0000
  67/1875 [>.............................] - ETA: 1s - loss: 0.0975 - accuracy: 0.9725
 134/1875 [=>............................] - ETA: 1s - loss: 0.0897 - accuracy: 0.9748
 203/1875 [==>...........................] - ETA: 1s - loss: 0.0903 - accuracy: 0.9734
 270/1875 [===>.....

In [12]:
!dvc destroy -f

In [13]:
os.chdir('..')
temp_dir.cleanup()