# Abalone

In this example, we'll demonstrate how to use [dataduit](https://github.com/JackBurdick/dataduit) to create TensorFlow datasets from a pandas DataFrame by describing the data in a config (defined here as a Python dictionary).

We'll then demonstrate how to use yeahml to specify, build, train, and evaluate a model on the created datasets.

#### Note:
> The model for this project likely doesn't make sense. I'm not personally familiar with the dataset/problem; I was interested in showing an example.

```python
%load_ext autoreload
%autoreload 2
```

```python
import pandas as pd
import tensorflow as tf
import dataduit as dd
import yeahml as yml
```

## Create Datasets

```python
# Read the data directly from the UCI Machine Learning Repository.
# The file has no header row, so the column names are supplied via `names`.
# More information can be found here:
# > https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
h = [
    "sex",
    "length",
    "diameter",
    "height",
    "whole_weight",
    "shucked_weight",
    "viscera_weight",
    "shell_weight",
    "rings",
]
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
    names=h,
)
```
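Passing `names=h` matters because `abalone.data` has no header row. The sketch below uses two made-up rows in the same 9-column layout (not real abalone records) to show what pandas does with and without `names`; it only uses standard `pd.read_csv` behavior.

```python
import io

import pandas as pd

h = ["sex", "length", "diameter", "height", "whole_weight",
     "shucked_weight", "viscera_weight", "shell_weight", "rings"]

# Two made-up rows in the same 9-column, header-less layout as abalone.data
raw = (
    "M,0.41,0.32,0.10,0.45,0.20,0.09,0.13,9\n"
    "F,0.52,0.40,0.14,0.61,0.25,0.12,0.21,11\n"
)

with_names = pd.read_csv(io.StringIO(raw), names=h)
without_names = pd.read_csv(io.StringIO(raw))

print(with_names.shape)     # (2, 9): both rows are kept as data
print(without_names.shape)  # (1, 9): the first row was consumed as the header
```

Without `names`, the first abalone record would silently become the column labels and be dropped from the data.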

```python
# Only use 2 of the features (length and diameter) as inputs, with rings as the target
dd_dict = {
    "meta": {
        "name": "abalone",
        "logging": {"log_stream_level": "INFO"},
        "in": {"from": "memory", "type": "pandas"},
    },
    "read": {
        "split_percents": [75, 15, 10],
        "split_names": ["train", "val", "test"],
        "iterate": {
            "return_type": "tuple",
            "schema": {
                "x": {
                    "length": {
                        "indicator": "length",
                        "datatype": {
                            "in": {"options": {"dtype": "float64", "shape": 1}},
                            "out": {},
                        },
                        "special": "decode",
                    },
                    "diameter": {
                        "indicator": "diameter",
                        "datatype": {
                            "in": {"options": {"dtype": "float64", "shape": 1}},
                            "out": {},
                        },
                        "special": "decode",
                    },
                },
                "y": {
                    "rings": {
                        "datatype": {
                            "in": {"options": {"dtype": "int64", "shape": 1}},
                            "out": {},
                        }
                    }
                },
            },
        },
    },
}
```
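The `split_percents` / `split_names` pair asks for a 75/15/10 train/val/test split. The sketch below is a hypothetical pandas helper (`split_by_percents` is not part of dataduit) showing one plausible way such a split can work: shuffle, then cut consecutive chunks at cumulative percentage boundaries. dataduit's actual shuffling and rounding may differ.

```python
import numpy as np
import pandas as pd


def split_by_percents(df, percents, names, seed=0):
    """Hypothetical helper: shuffle the rows, then cut them into consecutive chunks."""
    if sum(percents) != 100:
        raise ValueError("split percentages must sum to 100")
    if len(percents) != len(names):
        raise ValueError("need one name per split")
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    # Cumulative row boundaries, e.g. [75, 15, 10] -> [0, 75, 90, 100] percent of the rows
    bounds = np.cumsum([0, *percents]) * len(shuffled) // 100
    return {
        name: shuffled.iloc[start:stop]
        for name, start, stop in zip(names, bounds[:-1], bounds[1:])
    }


# A small made-up frame with the same three columns used in dd_dict
toy = pd.DataFrame({
    "length": np.linspace(0.1, 0.8, 20),
    "diameter": np.linspace(0.05, 0.6, 20),
    "rings": np.arange(1, 21),
})
splits = split_by_percents(toy, [75, 15, 10], ["train", "val", "test"])
print({name: len(part) for name, part in splits.items()})  # {'train': 15, 'val': 3, 'test': 2}
```

Flooring the boundaries means any leftover rows from rounding end up in the last split, and every row lands in exactly one split.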

```python
# create the datasets based on the names/splits/data specified above
ds_dict = dd.read(dd_dict, df)
```

`ds_dict` is a dictionary of TensorFlow datasets, keyed by the split names defined above (`"train"`, `"val"`, `"test"`). Each split can be accessed by name:

```python
ds_val = ds_dict["val"]
```

## Specify the Model

```python
example = "./main_config.yml"
yml_dict = yml.create_configs(example)
```

```python
import pprint
pprint.pprint(yml_dict["subgraphs"])
```

```
{'dense_out': {'sequence': [((2,
                              [[((2,
                                  [['feature_a',
                                    'flatten_1',
                                    'dense_1',
                                    'dense_2'],
                                   ['feature_a',
                                    'flatten_1',
                                    'dense_1',
                                    'dense_2b']]),
                                 'concat_1'),
                                'concat_1',
                                'dense_3a'],
                               ['feature_a',
                                'flatten_1',
                                'dense_1',
                                'dense_2',
                                'dense_3b']]),
                             'concat_3'),
                            'concat_3',
                            'dense_out']}}
```


## Build the Model

```python
# If you receive an error:
# AttributeError: 'google.protobuf.pyext._message.RepeatedCompositeCo' object has no attribute 'append'
# I personally used `pip install -U protobuf==3.8.0` to resolve it
# per https://github.com/tensorflow/tensorflow/issues/33348
model = yml.build_model(yml_dict)
```

```
build_logger: INFO     -> START building graph
W0311 17:03:05.084430 140657664378688 base_layer.py:1790] Layer flatten_1 is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

build_logger: INFO     information json file created
```
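The warning says `flatten_1` casts the `float64` inputs (the dtype requested for `length` and `diameter` in `dd_dict`) down to `float32`. As a rough illustration only, on made-up values of a similar magnitude and not something yeahml or dataduit runs, a numpy round trip shows how little precision that cast discards for numbers below 1:

```python
import numpy as np

# Made-up values with a few decimal places, similar in scale to the measurement columns
x64 = np.array([0.412, 0.337, 0.2181, 0.0975, 0.561], dtype=np.float64)

# The autocast described in the warning converts float64 inputs to float32
x32 = x64.astype(np.float32)

# Largest absolute error introduced by the float64 -> float32 -> float64 round trip
max_err = float(np.max(np.abs(x32.astype(np.float64) - x64)))
print(max_err)                   # well below 1e-7
print(np.finfo(np.float32).eps)  # float32 machine epsilon, about 1.19e-07
```

If full `float64` precision is actually needed, the warning text itself lists the options (`set_floatx` or a per-layer `dtype`).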


```python
model.summary()
```

```
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
feature_a (InputLayer)          [(None, 2, 1)]       0                                            
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 2)            0           feature_a[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 16)           48          flatten_1[0][0]                  
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 8)            136         dense_1[0][0]                    
______________________________________________________________________________________________
```

## Train the Model

Notice that we pass the training and validation splits from `ds_dict` to `train_model` as a `(train, val)` tuple.

```python
train_dict = yml.train_model(model, yml_dict, (ds_dict["train"], ds_dict["val"]))
```

```
train_logger: INFO     -> START training graph
train_logger: INFO     start creating train_dict
train_logger: INFO     [END] creating train_dict
```


```python
print(train_dict)
```

```
{'train_losses': [98.86153, 82.09828, 61.93428, 41.781635, 25.87617, 16.368034, 12.445256, 11.432039, 11.273733, 11.2540865, 11.245658, 11.23651, 11.226537, 11.21602, 11.205081, 11.1938, 11.182244, 11.170487, 11.158597, 11.146648, 11.134712, 11.1228485, 11.111124, 11.09959, 11.088282, 11.077237, 11.066508, 11.056091, 11.0460205, 11.036299], 'val_losses': [90.79514, 71.904106, 50.739506, 31.959602, 18.997303, 12.570291, 10.521917, 10.129151, 10.078239, 10.06737, 10.058612, 10.049304, 10.039502, 10.02932, 10.018826, 10.008089, 9.997165, 9.986119, 9.975015, 9.963911, 9.952863, 9.941925, 9.931147, 9.920564, 9.910216, 9.900133, 9.890341, 9.880853, 9.871686, 9.862852], 'epochs': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], 'meansquarederror': 9.004641, 'val_meansquarederror': 8.18518, 'meanabsoluteerror': 2.1914637, 'val_meanabsoluteerror': 2.1028595}
```
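The printed dictionary is hard to scan. A few lines of plain Python can pull out the epoch with the lowest validation loss and check whether the loss was still falling when training stopped. The values below are copied from the `val_losses` list printed above; nothing else about yeahml's API is assumed.

```python
# Validation losses copied from the `train_dict` printed above
val_losses = [
    90.79514, 71.904106, 50.739506, 31.959602, 18.997303, 12.570291,
    10.521917, 10.129151, 10.078239, 10.06737, 10.058612, 10.049304,
    10.039502, 10.02932, 10.018826, 10.008089, 9.997165, 9.986119,
    9.975015, 9.963911, 9.952863, 9.941925, 9.931147, 9.920564,
    9.910216, 9.900133, 9.890341, 9.880853, 9.871686, 9.862852,
]
epochs = list(range(len(val_losses)))  # matches the 'epochs' entry (0..29)

# Index of the smallest validation loss
best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
print("best epoch:", epochs[best_idx], "val loss:", val_losses[best_idx])
# best epoch: 29 val loss: 9.862852

# True if every epoch improved on the previous one
still_improving = all(later < earlier for earlier, later in zip(val_losses, val_losses[1:]))
print("monotonically decreasing:", still_improving)
# monotonically decreasing: True
```

Here the best validation loss is at the final epoch, and it is still decreasing (by less than 0.01 per epoch at the end), so more epochs might lower it slightly further.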


## Evaluate the Model

```python
eval_dict = yml.eval_model(
    model,
    yml_dict,
    dataset=ds_dict["test"]
)
print(eval_dict)
```

```
eval_logger : INFO     params loaded from yeahml/abalone/trial_00/model/run_2020_03_11-17_03_10/save/params/best_params.h5
eval_logger : INFO     -> START evaluating model
eval_logger : INFO     [END] evaluating model
eval_logger : INFO     -> START creating eval_dict
eval_logger : INFO     [END] creating eval_dict

{'meansquarederror': 6.8508453, 'meanabsoluteerror': 1.9398348}
```
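The test metrics are easier to interpret in the units of the target. Because the model regresses `rings`, the MSE is in rings squared, and its square root (the RMSE) is back in rings. A small sketch using the numbers printed above:

```python
import math

# Test-set metrics copied from `eval_dict` above; the target is `rings`
eval_metrics = {"meansquarederror": 6.8508453, "meanabsoluteerror": 1.9398348}

# MSE is in rings squared; its square root (RMSE) is back in rings
rmse = math.sqrt(eval_metrics["meansquarederror"])
mae = eval_metrics["meanabsoluteerror"]
print(f"test RMSE ~ {rmse:.2f} rings, test MAE ~ {mae:.2f} rings")
# test RMSE ~ 2.62 rings, test MAE ~ 1.94 rings
```

RMSE is always at least as large as MAE; a gap between the two indicates the absolute errors are not all the same size, since squaring weights the larger errors more heavily.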


## Inspect the Model in TensorBoard

From the command line (provided TensorBoard is installed in your environment), navigate to the `abalone` directory and run:

```bash
tensorboard --logdir model_a/
```
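`--logdir` needs to point at a directory that contains TensorBoard event files somewhere beneath it (TensorBoard searches subdirectories recursively). If you're unsure which directory that is for your run, the small stdlib-only helper below can search for them. `find_event_dirs` is hypothetical, not part of yeahml; it relies only on TensorFlow's `events.out.tfevents.*` file-name prefix.

```python
from pathlib import Path


def find_event_dirs(root="."):
    """Return the sorted directories under `root` that contain TensorBoard event files."""
    root = Path(root)
    if not root.is_dir():
        return []
    # TensorFlow summary writers name their files "events.out.tfevents.<...>"
    return sorted({p.parent for p in root.rglob("events.out.tfevents.*")})


if __name__ == "__main__":
    # Print each candidate directory; any of them (or a common parent) can be passed to --logdir
    for d in find_event_dirs("."):
        print(d)
```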