# Abalone

In this example, we'll demonstrate how to use [dataduit](https://github.com/JackBurdick/dataduit) to create TensorFlow datasets from a pandas DataFrame by specifying a config.

We'll then demonstrate how to use yeahml to create/build/evaluate a model on the created data.

#### Note:
> The model for this project likely doesn't make sense. I am not personally familiar with the dataset/problem; I was simply interested in showing an example.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import tensorflow as tf
import dataduit as dd
import yeahml as yml

## Create Datasets

In [3]:
# Reading a file from online
# more information can be found here:
# > https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
h = ["sex",
"length",
"diameter",
"height",
"whole_weight",
"shucked_weight",
"viscera_weight",
"shell_weight",
"rings"]
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data',names=h)

In [4]:
# only use 2 of the features
dd_dict = {
    "meta": {
        "name": "abalone",
        "logging": {"log_stream_level": "INFO"},
        "in": {"from": "memory", "type": "pandas"},
    },
    "read": {
        "split_percents": [75, 15, 10],
        "split_names": ["train", "val", "test"],
        "iterate": {
            "return_type": "tuple",
            "schema": {
                "x": {
                    "length": {
                        "indicator": "length",
                        "datatype": {
                            "in": {"options": {"dtype": "float64", "shape": 1}},
                            "out": {},
                        },
                        "special": "decode",
                    },
                    "diameter": {
                        "indicator": "diameter",
                        "datatype": {
                            "in": {"options": {"dtype": "float64", "shape": 1}},
                            "out": {},
                        },
                        "special": "decode",
                    },
                },
                "y": {
                    "rings": {
                        "datatype": {
                            "in": {"options": {"dtype": "int64", "shape": 1}},
                            "out": {},
                        }
                    }
                },
            },
        },
    },
}
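The two feature entries under `"x"` are identical apart from the column name, so they could be generated rather than written by hand. The following is a minimal sketch, not part of dataduit: `feature_entry` is a hypothetical helper that only reproduces the dictionary layout shown above, and it assumes other float columns would be described the same way.

```python
def feature_entry(name, dtype="float64", shape=1):
    """Hypothetical helper: build one "x" schema entry in the layout used above."""
    return {
        "indicator": name,
        "datatype": {
            "in": {"options": {"dtype": dtype, "shape": shape}},
            "out": {},
        },
        "special": "decode",
    }


# Same two features as the hand-written config.
feature_names = ["length", "diameter"]
x_schema = {name: feature_entry(name) for name in feature_names}
```

The result could then be assigned with `dd_dict["read"]["iterate"]["schema"]["x"] = x_schema` before calling `dd.read`.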

In [5]:
# create the datasets based on the names/splits/data specified above
ds_dict = dd.read(dd_dict, df)

`ds_dict` is a dictionary containing the TensorFlow datasets (as specified above), keyed by split name. Each split can be accessed like this:

```python
ds_val = ds_dict["val"]
```
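To get a rough feel for how many rows land in each split, the percentages in `split_percents` can be turned into row counts. This is only a sketch of the arithmetic: the rounding rule (floor each split, give the remainder to the last one) is an assumption and may not match what dataduit does internally. The row count of 4177 is the size the UCI repository reports for the Abalone dataset.

```python
def split_sizes(n_rows, percents):
    """Turn percentages into row counts (assumed rounding: floor, remainder to last split)."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    sizes = [n_rows * p // 100 for p in percents[:-1]]
    sizes.append(n_rows - sum(sizes))  # whatever is left goes to the final split
    return sizes


split_names = ["train", "val", "test"]
sizes = split_sizes(4177, [75, 15, 10])
print(dict(zip(split_names, sizes)))
# {'train': 3132, 'val': 626, 'test': 419}
```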

## Specify the Model

In [6]:
# %load_ext autoreload
# %autoreload 2
# import yeahml as yml
example = "./main_config.yml"
yml_dict = yml.create_configs(example)

In [7]:
import pprint
pprint.pprint(yml_dict)

{'data': {'in_dim': [None, 2, 1],
          'in_dtype': 'float64',
          'input_layer_dim': [2, 1],
          'label_dtype': 'int32',
          'label_one_hot': False,
          'output_dim': [None, 1],
          'reshape_in_to': None},
 'hyper_parameters': {'dataset': {'batch': 16, 'shuffle_buffer': 128},
                      'early_stopping': {'epochs': None, 'warm_up': None},
                      'epochs': 30,
                      'optimizer': {'options': {'beta_1': 0.91,
                                                'learning_rate': 0.0001},
                                    'type': 'adam'}},
 'logging': {'console': {'format_str': '%(name)-12s: %(levelname)-8s '
                                       '%(message)s',
                         'level': 'info'},
             'file': {'format_str': '%(filename)s:%(lineno)s - '
                                    '%(funcName)20s()][%(levelname)-8s]: '
                                    '%(message)s',
                      'lev

## Build the model

In [8]:
# If you receive an error:
# AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute 'append'
# I personally used `pip install -U protobuf==3.8.0` to resolve
# per https://github.com/tensorflow/tensorflow/issues/33348
model = yml.build_model(yml_dict)

build_logger: INFO     -> START building graph
build_logger: INFO     -> START building hidden block
graph_logger: INFO     | dense_1         | no_shape
graph_logger: INFO     | dense_2         | no_shape
graph_logger: INFO     | dense_3_output  | no_shape
build_logger: INFO     [END] building hidden block
build_logger: INFO     information json file created


In [9]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 2, 1)]            0         
_________________________________________________________________
dense_1 (Dense)              (None, 2, 16)             32        
_________________________________________________________________
dense_2 (Dense)              (None, 2, 8)              136       
_________________________________________________________________
dense_3_output (Dense)       (None, 2, 1)              9         
Total params: 177
Trainable params: 177
Non-trainable params: 0
_________________________________________________________________
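The parameter counts in the summary can be checked by hand. A Keras `Dense` layer acts on the last axis of its input and has `n_in * n_out` weights plus `n_out` biases. The sketch below reads the layer widths (1 → 16 → 8 → 1) off the output shapes above; the helper name `dense_params` is made up for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out


# (layer name, size of last input axis, units), read from the summary above
layers = [
    ("dense_1", 1, 16),
    ("dense_2", 16, 8),
    ("dense_3_output", 8, 1),
]

counts = {name: dense_params(n_in, n_out) for name, n_in, n_out in layers}
total = sum(counts.values())
print(counts)  # {'dense_1': 32, 'dense_2': 136, 'dense_3_output': 9}
print(total)   # 177
```

Because the layers only act on the last axis, the size-2 feature axis is carried through unchanged, which is why the final output shape is `(None, 2, 1)` rather than `(None, 1)`.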


## Train the Model

Notice here that we're using the created training and validation sets from `ds_dict`

In [10]:
train_dict = yml.train_model(model, yml_dict, (ds_dict["train"], ds_dict["val"]))

train_logger: INFO     -> START training graph
W0202 17:52:35.153477 140451729581888 base_layer.py:1790] Layer dense_1 is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

train_logger: INFO     start creating train_dict
train_logger: INFO     [END] creating train_dict


## Evaluate the Model

In [11]:
eval_dict = yml.eval_model(
    model,
    yml_dict,
    dataset=ds_dict["test"]
)
print(eval_dict)

eval_logger : INFO     params loaded from yeahml/abalone/trial_00/save/params/run_2020_02_02-17_52_34/best_params.h5
eval_logger : INFO     -> START evaluating model
eval_logger : INFO     [END] evaluating model
eval_logger : INFO     -> START creating eval_dict
eval_logger : INFO     [END] creating eval_dict


{'meansquarederror': 6.541078, 'meanabsoluteerror': 1.9089029}
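To put these numbers in context, here is a small sketch of how the two metrics are conventionally computed, assuming the keys `meansquarederror` and `meanabsoluteerror` follow the standard definitions. The labels and predictions are made up for illustration and are not taken from the dataset or the model.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ring counts and predictions, for illustration only.
y_true = [15, 7, 9, 10]
y_pred = [12.0, 8.0, 9.5, 10.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)  # 2.5625 1.125

# The square root of the reported MSE is in the same unit as the label (rings).
reported_rmse = math.sqrt(6.541078)
```

Taking the square root of the reported mean squared error gives roughly 2.56 rings, alongside a mean absolute error of about 1.9 rings.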


## Inspect the Model in TensorBoard

From the command line, navigate to the `abalone` directory and run the following (provided TensorBoard is installed in your environment):

```bash
tensorboard --logdir model_a/
```