# Your First Image Classifier: Using MLP to Classify Images
# Train

The goal with this dataset is to classify each image as containing a dog, a cat, or a panda.
With only 3,000 images, the Animals dataset is another **introductory** dataset
on which we can quickly train an MLP model and obtain results we can compare against other models.


Let's take the following steps:

1. Encode the target variable
2. Train the MLP model
3. Export the model and the encoder object

<center><img width="900" src="https://drive.google.com/uc?export=view&id=1haMB_Zt6Et9q9sPHxfuR4g3FT5QRXlTI"></center>


## Step 01: Setup

Start out by installing the experiment tracking library and setting up your free W&B account:


*   **pip install wandb** – Install the W&B library
*   **import wandb** – Import the wandb library
*   **wandb login** – Log in to your W&B account so you can log all your metrics in one place

In [1]:
!pip install wandb -qU

In [2]:
# a Python package for tracking the carbon emissions produced by various
# kinds of computer programs, from straightforward algorithms to deep neural networks.
!pip install codecarbon



### Import Packages

In [3]:
# import the necessary packages
from imutils import paths
import logging
import os
import cv2
import numpy as np
import joblib
from codecarbon import EmissionsTracker
from sklearn.preprocessing import LabelEncoder
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import fbeta_score, precision_score, recall_score, accuracy_score
import wandb

In [4]:
wandb.login()

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: thaisaraujom. Use `wandb login --relogin` to force relogin


True

In [5]:
# configure logging
# reference for a logging obj
logger = logging.getLogger()

# set level of logging
logger.setLevel(logging.INFO)

# create handlers
c_handler = logging.StreamHandler()
c_format = logging.Formatter(fmt="%(asctime)s %(message)s",datefmt='%d-%m-%Y %H:%M:%S')
c_handler.setFormatter(c_format)

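# install our handler as the only one on the root logger
# (indexing handlers[0] raises an IndexError when no handler exists yet)
logger.handlers = [c_handler]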

## Step 02: Basic configuration and download artifacts

In [6]:
# since we are using Jupyter Notebooks we can replace our argument
# parsing code with *hard coded* arguments and values
args = {
  "project_name": "mlp_classifier",
  "train_feature_artifact": "train_x:latest",
  "train_target_artifact": "train_y:latest",
  "val_feature_artifact": "val_x:latest",
  "val_target_artifact": "val_y:latest",
  "neighbors": 1,
  "jobs": -1,
  "encoder": "target_encoder",
  "inference_model": "model"
}

In [7]:
# open the W&B project created in the Fetch step
run = wandb.init(entity="thaisaraujom",project=args["project_name"], job_type="Train")

logger.info("Downloading the train and validation data")
# train x
train_x_artifact = run.use_artifact(args["train_feature_artifact"])
train_x_path = train_x_artifact.file()

# train y
train_y_artifact = run.use_artifact(args["train_target_artifact"])
train_y_path = train_y_artifact.file()

# validation x
val_x_artifact = run.use_artifact(args["val_feature_artifact"])
val_x_path = val_x_artifact.file()

# validation y
val_y_artifact = run.use_artifact(args["val_target_artifact"])
val_y_path = val_y_artifact.file()

# unpacking the artifacts
train_x = joblib.load(train_x_path)
train_y = joblib.load(train_y_path)
val_x = joblib.load(val_x_path)
val_y = joblib.load(val_y_path)

17-10-2022 01:11:56 Downloading the train and validation data


In [8]:
logger.info("Train x: {}".format(train_x.shape))
logger.info("Train y: {}".format(train_y.shape))
logger.info("Validation x: {}".format(val_x.shape))
logger.info("Validation y: {}".format(val_y.shape))

17-10-2022 01:11:58 Train x: (1687, 3072)
17-10-2022 01:11:58 Train y: (1687,)
17-10-2022 01:11:58 Validation x: (563, 3072)
17-10-2022 01:11:58 Validation y: (563,)
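
The shapes above can be sanity-checked with a little arithmetic. The sketch below assumes the images were resized to 32x32 RGB and flattened in an earlier step; that size is an assumption, chosen only because it yields 3,072 values per image. It also computes the share of the downloaded samples that ended up in the validation split.

```python
import numpy as np

# Assumption: 32x32 RGB images; the real size is set in the preprocessing step.
height, width, channels = 32, 32, 3
image = np.zeros((height, width, channels), dtype=np.uint8)

# flattening turns one image into one row of the feature matrix
n_features = image.flatten().shape[0]

# split sizes taken from the log output above
n_train, n_val = 1687, 563
val_fraction = n_val / (n_train + n_val)

print(n_features, round(val_fraction, 3))
```
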


## Step 03: Encoder

In [9]:
# encode the labels as integers
le = LabelEncoder()
train_y = le.fit_transform(train_y)

val_y = le.transform(val_y)
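
To make the encoding step concrete, here is a minimal sketch on a toy label list (the labels are made up for illustration). `LabelEncoder` sorts the class names and maps each one to its index, and `inverse_transform` maps the integers back to names, which is why the fitted encoder is exported alongside the model.

```python
from sklearn.preprocessing import LabelEncoder

# toy labels, not the real training targets
labels = ["dog", "cat", "panda", "cat", "dog"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the sorted class names; a class's position is its integer code
classes = list(encoder.classes_)

# map the integer codes back to the original names
decoded = list(encoder.inverse_transform(encoded))

print(classes)
print(list(encoded))
print(decoded)
```
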

In [10]:
# train an MLP classifier
logger.info("[INFO] training MLP classifier...")
model = MLPClassifier(hidden_layer_sizes=(100,100), activation='relu', solver='adam')
model.fit(train_x, train_y)

17-10-2022 01:11:58 [INFO] training MLP classifier...


MLPClassifier(hidden_layer_sizes=(100, 100))

In [11]:
logger.info("Dumping the model and encoder artifacts to the disk")

# Save the artifacts using joblib
joblib.dump(le, args["encoder"])
joblib.dump(model, args["inference_model"])

17-10-2022 01:12:33 Dumping the model and encoder artifacts to the disk


['model']
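
The two dumped files are meant to be used together at inference time. The following is a minimal sketch of that round trip on synthetic data; the random features, the tiny network, and the temporary directory are assumptions for illustration, not the notebook's real data or settings.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

# synthetic stand-in for the image features and labels
rng = np.random.default_rng(0)
X = rng.random((30, 8))
y_text = np.array(["cat", "dog", "panda"] * 10)

encoder = LabelEncoder()
y = encoder.fit_transform(y_text)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=200, random_state=0).fit(X, y)

# dump both objects, then load them back as an inference step would
with tempfile.TemporaryDirectory() as tmp:
    encoder_path = os.path.join(tmp, "target_encoder")
    model_path = os.path.join(tmp, "model")
    joblib.dump(encoder, encoder_path)
    joblib.dump(model, model_path)
    loaded_encoder = joblib.load(encoder_path)
    loaded_model = joblib.load(model_path)

# predict integer class ids, then translate them back to class names
predicted_ids = loaded_model.predict(X[:5])
predicted_names = loaded_encoder.inverse_transform(predicted_ids)
print(predicted_names)
```
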

In [12]:
# encoder artifact
artifact = wandb.Artifact(args["encoder"],
                          type="INFERENCE_MODEL",
                          description="A joblib file containing the fitted target encoder"
                          )

logger.info("Logging the target encoder artifact")
artifact.add_file(args["encoder"])
run.log_artifact(artifact)

17-10-2022 01:12:33 Logging the target encoder artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f32f5ab1850>

In [13]:
# inference model artifact
artifact = wandb.Artifact(args["inference_model"],
                          type="INFERENCE_MODEL",
                          description="A joblib file containing the trained inference model"
                          )

logger.info("Logging the inference model artifact")
artifact.add_file(args["inference_model"])
run.log_artifact(artifact)

17-10-2022 01:12:33 Logging the inference model artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f32f21f9890>

## Step 04: Sweep (hyperparameter tuning)

### Sweep setup

ℹ️ [Reference](https://docs.wandb.ai/guides/sweeps/define-sweep-configuration)

**Sweep configuration structure**

Sweep configurations are nested; keys can have, as their values, further keys. The top-level keys are listed and briefly described below, and then detailed in the following section.

| Top-Level Key | Description |
|---------------|-------------|
| **program**   | Training script to run. Required when agents are launched from the command line; not needed when a Python function is passed to `wandb.agent`, as in this notebook. |
| **method**    | (required) Specify the search strategy. |
| **parameters** | (required) Specify the parameter bounds to search. |

<br>

**Search type methods**

The following list describes hyperparameter search methods. Specify the search strategy with the **method**:

- **grid**  – Iterate over every combination of hyperparameter values. Can be computationally costly.
- **random**  – Choose a random set of hyperparameter values on each iteration based on provided distributions.
- **bayes** – Create a probabilistic model of a metric score as a function of the hyperparameters, and choose parameters with high probability of improving the metric. 
<br>
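
For comparison with a grid search, a random search could be configured roughly as below. This is a sketch under assumptions: the sweep name and the `alpha` and `learning_rate_init` ranges are hypothetical values picked for illustration, and the training function would need to pass those two parameters on to `MLPClassifier`.

```python
# hypothetical random-search configuration; names and ranges are illustrative
random_sweep_config = {
    "name": "sweep_mlp_random",
    "method": "random",
    "metric": {"name": "Acc", "goal": "maximize"},
    "parameters": {
        # discrete choices are listed under "values"
        "hidden_layer_sizes": {"values": [(100, 100), (200, 200)]},
        # continuous parameters are sampled from a distribution with bounds
        "alpha": {"distribution": "uniform", "min": 0.0001, "max": 0.01},
        "learning_rate_init": {"distribution": "uniform", "min": 0.0005, "max": 0.005},
    },
}

for name, spec in random_sweep_config["parameters"].items():
    print(name, spec)
```

Unlike a grid search, a random search has no natural end, so the number of runs is usually capped, for example with the `count` argument of `wandb.agent`.
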

**Metric**

Describes the metric to optimize. This metric should be logged **explicitly** to W&B by your training script.

| Key    | Description |
|--------|-------------|
| **name**   | Name of the metric to optimize. |
| **goal**   | Either minimize or maximize (default is minimize). |
| **target** | Goal value for the metric you're optimizing. When any run in the sweep achieves that target value, the sweep's state is set to finished. All agents with active runs finish those jobs, but no new runs are launched in the sweep. |
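
The interplay of **goal** and **target** can be illustrated with a small helper. This is only a sketch of the rule described in the table, not W&B's internal logic; the function name and the target value of 0.60 are hypothetical, and `Acc` is the key that the training function later writes to `run.summary`.

```python
# hypothetical metric block with a target value
metric = {"name": "Acc", "goal": "maximize", "target": 0.60}


def target_reached(metric, value):
    """Return True when value meets the target in the direction of the goal."""
    if "target" not in metric:
        return False
    # the goal defaults to minimize when it is not given
    if metric.get("goal", "minimize") == "maximize":
        return value >= metric["target"]
    return value <= metric["target"]


print(target_reached(metric, 0.563))
print(target_reached(metric, 0.61))
```
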

In [14]:
# Configure the sweep 
sweep_config = {
    'name': 'sweep_mlp',
    'method': 'grid', 
    'metric': {
      # must match the key that train() writes to run.summary
      'name': 'Acc',
      'goal': 'maximize'   
    },
    'parameters': {
        'hidden_layer_sizes': {
            "values":[(100,100), (200, 200)],
        },
        'activation': {
            'values': ['relu']
        },
        'solver': {
            'values': ['adam']
        },
        'learning_rate' : {
            'values': ['constant','adaptive']
        }
    }
}
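
A grid search runs one experiment per combination of the listed values, so the size of the sweep can be worked out before launching it. The sketch below enumerates the combinations of the values used in this configuration with `itertools.product`; the order in which W&B schedules them is not something this code asserts.

```python
from itertools import product

# the candidate values from the sweep configuration
parameters = {
    "hidden_layer_sizes": [(100, 100), (200, 200)],
    "activation": ["relu"],
    "solver": ["adam"],
    "learning_rate": ["constant", "adaptive"],
}

# the Cartesian product of all value lists: 2 * 1 * 1 * 2 combinations
names = list(parameters)
combinations = [dict(zip(names, values)) for values in product(*parameters.values())]

for combo in combinations:
    print(combo)
print(len(combinations))
```
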

In [15]:
sweep_id = wandb.sweep(sweep_config, project=args['project_name'])

Create sweep with ID: 13ezjnkq
Sweep URL: https://wandb.ai/thaisaraujom/mlp_classifier/sweeps/13ezjnkq


### Training

In [16]:
def train():
    with wandb.init() as run:
        # create codecarbon tracker
        # codecarbon is very verbose by default; change log_level to see more output
        # tracker = EmissionsTracker(log_level="critical")
        # tracker.start()
        model = MLPClassifier(hidden_layer_sizes=run.config.hidden_layer_sizes,
                              activation=run.config.activation,
                              solver=run.config.solver, 
                              learning_rate=run.config.learning_rate)
        # training
        logger.info("Training")
        model.fit(train_x,train_y)

        # inference on the validation set
        logger.info("Infering")
        predict = model.predict(val_x)

        # get co2 emissions from tracker
        # "CO2 emission (in Kg)"
        # emissions = tracker.stop()

        # Evaluation metrics, computed on the validation split
        # (the log messages below label them "Test")
        logger.info("Evaluation metrics")
        fbeta = fbeta_score(val_y, 
                            predict, 
                            beta=1, 
                            zero_division=1,
                            average='weighted')
        precision = precision_score(val_y, 
                                    predict, 
                                    zero_division=1,
                                    average='weighted')
        recall = recall_score(val_y, 
                              predict, 
                              zero_division=1,
                              average='weighted')

        acc = accuracy_score(val_y, predict)

        logger.info("Test Accuracy: {}".format(acc))
        logger.info("Test Precision: {}".format(precision))
        logger.info("Test Recall: {}".format(recall))
        logger.info("Test F1: {}".format(fbeta))

        run.summary["Acc"] = acc
        run.summary["Precision"] = precision
        run.summary["Recall"] = recall
        run.summary["F1"] = fbeta

        # energy unit is kWh
        # run.summary["Energy_Consumed"] = tracker.final_emissions_data.energy_consumed
        # run.summary["Energy_RAM"] = tracker.final_emissions_data.ram_energy
        # run.summary["Energy_GPU"] = tracker.final_emissions_data.gpu_energy
        # run.summary["Energy_CPU"] = tracker.final_emissions_data.cpu_energy
        # # kg
        # run.summary["CO2_Emissions"] = tracker.final_emissions_data.emissions
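
One consequence of `average='weighted'` is worth knowing when reading the results: support-weighted recall is always equal to accuracy, which is why the Recall and Acc values coincide in every run's summary. The sketch below checks this on made-up labels using plain Python.

```python
from collections import Counter

# made-up labels for three classes
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 0, 2, 2, 1]

n = len(y_true)
support = Counter(y_true)  # number of true samples per class
correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

# recall of a class = correctly predicted samples / true samples of that class
per_class_recall = {c: correct[c] / support[c] for c in support}

# weighting each recall by the class share cancels the per-class denominator
weighted_recall = sum(support[c] / n * per_class_recall[c] for c in support)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

print(weighted_recall, accuracy)
```

Weighted precision and F1 do not collapse in this way, so they still add information beyond accuracy.
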


In [17]:
# Initialize a new sweep
# Arguments:
#     – sweep_id: the sweep_id to run - this was returned above by wandb.sweep()
#     – function: function that defines your model architecture and trains it
wandb.agent(sweep_id=sweep_id, function=train)

wandb: Agent Starting Run: gql8t2iw with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: constant
wandb: 	solver: adam


17-10-2022 01:12:44 Training
17-10-2022 01:12:57 Infering
17-10-2022 01:12:57 Evaluation metrics
17-10-2022 01:12:57 Test Accuracy: 0.5062166962699822
17-10-2022 01:12:57 Test Precision: 0.5330622895729342
17-10-2022 01:12:57 Test Recall: 0.5062166962699822
17-10-2022 01:12:57 Test F1: 0.48593099954641217


| Metric | Value |
|--------|-------|
| Acc | 0.50622 |
| F1 | 0.48593 |
| Precision | 0.53306 |
| Recall | 0.50622 |


wandb: Agent Starting Run: bhf46haj with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


17-10-2022 01:13:09 Training
17-10-2022 01:13:23 Infering
17-10-2022 01:13:23 Evaluation metrics
17-10-2022 01:13:23 Test Accuracy: 0.5523978685612788
17-10-2022 01:13:23 Test Precision: 0.5609992494978712
17-10-2022 01:13:23 Test Recall: 0.5523978685612788
17-10-2022 01:13:23 Test F1: 0.5495677769739736


| Metric | Value |
|--------|-------|
| Acc | 0.5524 |
| F1 | 0.54957 |
| Precision | 0.561 |
| Recall | 0.5524 |


wandb: Agent Starting Run: s3iru5ro with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [200, 200]
wandb: 	learning_rate: constant
wandb: 	solver: adam


17-10-2022 01:13:35 Training
17-10-2022 01:13:56 Infering
17-10-2022 01:13:56 Evaluation metrics
17-10-2022 01:13:56 Test Accuracy: 0.5630550621669627
17-10-2022 01:13:56 Test Precision: 0.5615217916491777
17-10-2022 01:13:56 Test Recall: 0.5630550621669627
17-10-2022 01:13:56 Test F1: 0.5392526841521542


| Metric | Value |
|--------|-------|
| Acc | 0.56306 |
| F1 | 0.53925 |
| Precision | 0.56152 |
| Recall | 0.56306 |


wandb: Agent Starting Run: dhywl3p2 with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [200, 200]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


17-10-2022 01:14:06 Training
17-10-2022 01:14:29 Infering
17-10-2022 01:14:29 Evaluation metrics
17-10-2022 01:14:29 Test Accuracy: 0.5310834813499112
17-10-2022 01:14:29 Test Precision: 0.5566116741651531
17-10-2022 01:14:29 Test Recall: 0.5310834813499112
17-10-2022 01:14:29 Test F1: 0.5265627623120022


| Metric | Value |
|--------|-------|
| Acc | 0.53108 |
| F1 | 0.52656 |
| Precision | 0.55661 |
| Recall | 0.53108 |


wandb: Sweep Agent: Waiting for job.
wandb: Sweep Agent: Exiting.



In [35]:
run.finish()
