# Train two image classification models in one pipeline
In this sample, we train two simple CNN image classification models.

First, we build a custom environment from scratch to use for training. Then we write a pipeline that prepares two datasets from the CIFAR10 image dataset, trains the models with Keras, and then tests and scores them.

We deploy these models in [Deploy two models in one online deployment](). 

# Build the environment
To build the environment, we start from the Azure ML minimal Ubuntu 18.04 inference image. We install `p7zip-full` and `libpng-dev` via apt, and the packages `tensorflow-gpu`, `keras`, `pillow`, and `keras-preprocessing` via pip.
## Inputs
### Dockerfile
```dockerfile
FROM "mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cpu-inference:latest"

USER root
# Update and install in one layer so a cached package index is never reused
RUN apt-get update && apt-get install -y p7zip-full libpng-dev

USER dockeruser

COPY requirements.txt /tmp/requirements.txt

RUN pip install -r /tmp/requirements.txt
```
### requirements.txt
```
tensorflow-gpu>=2.8,<2.9
keras>=2.8,<2.9
pillow>=9.1,<9.2
keras-preprocessing>=1.1,<1.2
``` 

## Build
To build the environment, we call `az` with the environment file `environment.yaml`, which contains the name of the environment, `dualdeployment`, and the directory in which the Dockerfile and `requirements.txt` file are located.

```yml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: dualdeployment
build:
  path: ./deploy
```
Build the environment with the following command:

```bash
az ml environment create --file environment.yaml
```

# Model Pipeline
Next, we assemble the code and YAML files to prepare the dataset, train the models, and test and score them. The complete pipeline job YAML file is reproduced at the end of this section. 
## Data Prep
Our data comes from the [CIFAR10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), which consists of 32x32 images divided into 10 classes. We use the [cifar10](https://pypi.org/project/cifar10/) Python wrapper to download the images and generate two balanced datasets, one for cats and one for horses, each with a train/test split. Each dataset pairs the images of the target class (label 1) with an equal number of randomly sampled images from the other classes (label 0). The preprocessing lives in `prep.py`, which is launched by the command job `prep-job` in `pipeline.yaml`.

### prep.py (snippet)
```python
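import itertools
import os
import pickle
import random
import shutil
from pathlib import Path

import cifar10
import numpy as np

# Default parameters
output_path = "prep"
categories = {'cats' : 3, 'horses' : 7}  # CIFAR10 class ids
train_size = .7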

# Generate dataset from CIFAR10 with test train split for one category
def make_dataset(category_id, train_size=.7):
    target = [(x[0], 1) for x in cifar10.data_batch_generator() if x[1] == category_id]
    not_target = [x for x in cifar10.data_batch_generator() if x[1] != category_id]
    ds = target + [(x[0], 0) for x in random.sample(not_target, len(target))] 
    random.shuffle(ds)
    train_idxs = np.array(random.sample(range(len(ds)), int(len(ds)*train_size)))
    ds_x = np.stack([x[0] for x in ds])
    ds_y = np.array([x[1] for x in ds])
    train = {'x': ds_x[train_idxs], 'y': ds_y[train_idxs]}
    test = {'x': np.delete(ds_x, train_idxs, 0), 'y': np.delete(ds_y, train_idxs, 0)}
    return {'train': train, 'test': test}

datasets = {name: make_dataset(category_id, train_size) for name, category_id in categories.items()}
output_path = Path(output_path)
# Start from a clean output directory
if output_path.exists():
    shutil.rmtree(output_path)
for category, split in itertools.product(datasets.keys(), ('train', 'test')):
    pth = output_path / category
    pth.mkdir(parents=True, exist_ok=True)
    # Write each split to <output_path>/<category>/<split> and close the file afterwards
    with open(pth / split, 'wb') as f:
        pickle.dump(datasets[category][split], f)
```
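To illustrate the on-disk layout that the snippet above produces (`<output_path>/<category>/<split>`, each file a pickled dict with `x` and `y` arrays), here is a minimal sketch that writes a small synthetic split and reads it back. The helpers `load_split`, `class_balance`, and `write_demo_split` are hypothetical names that are not part of `prep.py`, and the synthetic data assumes 32x32 RGB images.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_split(root, category, split):
    """Load one pickled split stored as <root>/<category>/<split>."""
    with open(Path(root) / category / split, "rb") as f:
        return pickle.load(f)


def class_balance(split):
    """Return the fraction of positive (label 1) examples in a split."""
    return float(np.mean(split["y"]))


def write_demo_split(root, category="cats", split="train", n=8):
    """Write a synthetic stand-in for a CIFAR10 split (assumption: 32x32 RGB)."""
    rng = np.random.default_rng(0)
    data = {
        "x": rng.integers(0, 256, size=(n, 32, 32, 3), dtype=np.uint8),
        "y": np.array([0, 1] * (n // 2)),  # alternating labels, so balanced
    }
    path = Path(root) / category
    path.mkdir(parents=True, exist_ok=True)
    with open(path / split, "wb") as f:
        pickle.dump(data, f)


with tempfile.TemporaryDirectory() as tmp:
    write_demo_split(tmp)
    demo = load_split(tmp, "cats", "train")
    demo_shape = demo["x"].shape
    demo_balance = class_balance(demo)
```

A balance close to 0.5 is what we would expect from the real datasets, since `make_dataset` samples as many non-target images as there are target images.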
### pipeline.yaml (snippet)
The pipeline defines two uploaded outputs, `train_data` and `test_data`, for the training and test data, and passes their paths to `prep.py` as command-line arguments.
```yaml
  prep-job:
    type: command
    outputs:
      train_data: 
        mode: upload
      test_data: 
        mode: upload
    code: src/prep
    environment: azureml:dualdeployment:1
    compute: azureml:dualdeploymentcompute
    command: >-
      python prep.py 
      --training_data ${{outputs.train_data}}
      --test_data ${{outputs.test_data}}
```
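The `prep.py` snippet shown earlier does not include its argument handling. As a rough sketch only, the two flags used in the command above could be read with `argparse` as follows; the `parse_args` function and the help strings are assumptions, not the actual contents of `prep.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the output locations passed by the pipeline (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Prepare CIFAR10 datasets")
    parser.add_argument("--training_data", required=True,
                        help="Directory to write the training splits to")
    parser.add_argument("--test_data", required=True,
                        help="Directory to write the test splits to")
    return parser.parse_args(argv)


# Example invocation with placeholder paths; in the pipeline these values
# come from ${{outputs.train_data}} and ${{outputs.test_data}}.
args = parse_args(["--training_data", "/tmp/train", "--test_data", "/tmp/test"])
```

Passing `argv` explicitly keeps the function easy to exercise outside the pipeline; with `argv=None`, `argparse` falls back to the process command line.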


## Train
In the `train.py` file, we build a simple CNN in Keras and save the model to the `model` output directory. 
### train.py (snippet)
```python
def make_model():
    resize_rescale = k.Sequential([
        k.layers.Resizing(32, 32),
        k.layers.Rescaling(1./255)
    ])
    data_aug = k.Sequential([
        k.layers.RandomFlip(),
        k.layers.RandomContrast(.2)
    ])
    model = k.Sequential([
        resize_rescale, 
        data_aug,
        k.layers.Conv2D(32, 3, activation='relu', padding='same'),
        k.layers.Conv2D(32, 3, activation='relu', padding='same'), 
        k.layers.MaxPooling2D(2),
        k.layers.BatchNormalization(),
        k.layers.Conv2D(64, 3, activation='relu', padding='same'),
        k.layers.Conv2D(64, 3, activation='relu', padding='same'), 
        k.layers.MaxPooling2D(2),
        k.layers.BatchNormalization(),
        k.layers.Conv2D(128, 3, activation='relu', padding='same'),
        k.layers.Conv2D(128, 3, activation='relu', padding='same'),
        k.layers.MaxPooling2D(2),
        k.layers.BatchNormalization(),
        k.layers.Flatten(),
        k.layers.Dense(128, activation = 'relu'),
        k.layers.Dropout(0.1),
        k.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(loss=k.losses.binary_crossentropy, 
                  optimizer=k.optimizers.Adam())
    return model

def run():
    # `dataset`, `batch_size`, and `epochs` are defined elsewhere in train.py
    tf.keras.backend.clear_session()
    tn_X = tf.convert_to_tensor(dataset['x'])
    tn_Y = tf.convert_to_tensor(dataset['y'])
    model = make_model()
    model.fit(tn_X, tn_Y, batch_size=batch_size, epochs=epochs)
    return model
```
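To make the architecture above easier to follow, the sketch below traces the feature-map shape through the three convolution blocks and tallies the parameters by hand. It assumes a 32x32 input with 3 color channels, and `trace_cnn` is a hypothetical helper rather than part of `train.py`. The resizing, rescaling, augmentation, and dropout layers hold no weights, so they are omitted.

```python
def trace_cnn(height=32, width=32, channels=3):
    """Trace shapes and parameter counts for the three conv blocks and the head."""
    params = 0
    for filters in (32, 64, 128):
        for _ in range(2):
            # 3x3 Conv2D with 'same' padding keeps height and width unchanged;
            # weights are 3*3*in_channels*filters plus one bias per filter
            params += 3 * 3 * channels * filters + filters
            channels = filters
        # MaxPooling2D(2) halves each spatial dimension
        height, width = height // 2, width // 2
        # BatchNormalization stores 4 values per channel
        # (gamma, beta, moving mean, moving variance)
        params += 4 * channels
    flat = height * width * channels  # size after Flatten
    params += flat * 128 + 128        # Dense(128)
    params += 128 * 1 + 1             # Dense(1)
    return (height, width, channels), flat, params


final_shape, flat_features, total_params = trace_cnn()
print(final_shape, flat_features, total_params)
```

Under these assumptions the spatial size shrinks from 32 to 16 to 8 to 4, so `Flatten` yields 4 x 4 x 128 = 2048 features, and the first dense layer accounts for nearly half of the parameters.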
### pipeline.yaml (snippet)
The `train.py` file operates on one model at a time, so we include separate command jobs for the horse and cat models.
```yaml
  cats-train-job:
    type: command
    inputs:
      train_data: ${{parent.jobs.prep-job.outputs.train_data}}
    outputs:
      model: 
        mode: upload
    code: src/prep
    environment: azureml:dualdeployment:1
    compute: azureml:dualdeploymentcompute
    command: >-
      python train.py 
      --train_data ${{inputs.train_data}}
      --model ${{outputs.model}}
  horses-train-job:
    type: command
    inputs:
      train_data: ${{parent.jobs.prep-job.outputs.train_data}}
    outputs:
      model: 
        mode: upload
    code: src/prep
    environment: azureml:dualdeployment:1
    compute: azureml:dualdeploymentcompute
    command: >-
      python train.py 
      --train_data ${{inputs.train_data}}
      --model ${{outputs.model}}
```

## Predict

## Score

# Full pipeline

# Run
To build and run the pipeline, run the following command: 

```bash
az ml job create --file pipeline.yaml
```