# Your First Image Classifier: Using MLP to Classify Images
# Train

The goal is to classify each image in the Animals dataset as a dog, a cat, or a panda.
With only 3,000 images, Animals is another **introductory** dataset
on which we can quickly train an MLP model and obtain results for comparison.


Let's take the following steps:

1. Encode the target variable
2. Train the MLP model
3. Export the model and the encoder object

<center><img width="900" src="https://drive.google.com/uc?export=view&id=1haMB_Zt6Et9q9sPHxfuR4g3FT5QRXlTI"></center>


## Step 01: Setup

Start out by installing the experiment tracking library and setting up your free W&B account:


*   **pip install wandb** – Install the W&B library
*   **import wandb** – Import the wandb library
*   **wandb login** – Log in to your W&B account so you can log all your metrics in one place

In [1]:
!pip install wandb -qU

In [2]:
# codecarbon: a Python package for tracking the carbon emissions produced by
# computer programs, from straightforward algorithms to deep neural networks.
!pip install codecarbon



### Import Packages

In [3]:
# import the necessary packages
from imutils import paths
import logging
import os
import cv2
import numpy as np
import joblib
from codecarbon import EmissionsTracker
from sklearn.preprocessing import LabelEncoder
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import fbeta_score, precision_score, recall_score, accuracy_score
import wandb

In [4]:
wandb.login()

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: thaisaraujom. Use `wandb login --relogin` to force relogin


True

In [5]:
# configure logging
# reference for a logging obj
logger = logging.getLogger()

# set level of logging
logger.setLevel(logging.INFO)

# create handlers
c_handler = logging.StreamHandler()
c_format = logging.Formatter(fmt="%(asctime)s %(message)s",datefmt='%d-%m-%Y %H:%M:%S')
c_handler.setFormatter(c_format)

# replace any existing handlers with ours
# (indexing handlers[0] raises IndexError when the root logger has no handler yet)
logger.handlers = [c_handler]

## Step 02: Basic configuration and artifact download

In [6]:
# since we are using Jupyter Notebooks we can replace our argument
# parsing code with *hard coded* arguments and values
args = {
  "project_name": "mlp_classifier",
  "train_feature_artifact": "train_x:latest",
  "train_target_artifact": "train_y:latest",
  "val_feature_artifact": "val_x:latest",
  "val_target_artifact": "val_y:latest",
  "neighbors": 1,
  "jobs": -1,
  "encoder": "target_encoder",
  "inference_model": "model"
}

In [7]:
# open the W&B project created in the Fetch step
run = wandb.init(entity="thaisaraujom",project=args["project_name"], job_type="Train")

logger.info("Downloading the train and validation data")
# train x
train_x_artifact = run.use_artifact(args["train_feature_artifact"])
train_x_path = train_x_artifact.file()

# train y
train_y_artifact = run.use_artifact(args["train_target_artifact"])
train_y_path = train_y_artifact.file()

# validation x
val_x_artifact = run.use_artifact(args["val_feature_artifact"])
val_x_path = val_x_artifact.file()

# validation y
val_y_artifact = run.use_artifact(args["val_target_artifact"])
val_y_path = val_y_artifact.file()

# unpacking the artifacts
train_x = joblib.load(train_x_path)
train_y = joblib.load(train_y_path)
val_x = joblib.load(val_x_path)
val_y = joblib.load(val_y_path)

18-10-2022 01:01:44 Downloading the train and validation data


In [8]:
logger.info("Train x: {}".format(train_x.shape))
logger.info("Train y: {}".format(train_y.shape))
logger.info("Validation x: {}".format(val_x.shape))
logger.info("Validation y: {}".format(val_y.shape))

18-10-2022 01:01:50 Train x: (1687, 3072)
18-10-2022 01:01:50 Train y: (1687,)
18-10-2022 01:01:50 Validation x: (563, 3072)
18-10-2022 01:01:50 Validation y: (563,)
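
Each row of `train_x` and `val_x` holds 3,072 features. That number is consistent with a 32×32 RGB image flattened into a vector (32 · 32 · 3 = 3072), although the image size is an assumption here, since the preprocessing step is not part of this notebook. A minimal sketch with a synthetic image in place of real data:

```python
import numpy as np

# Assumption: images were resized to 32x32 pixels with 3 color channels
# in an earlier step; this notebook only shows the flattened result.
height, width, channels = 32, 32, 3

# A synthetic stand-in for one image (random pixel values, not real data)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(height, width, channels), dtype=np.uint8)

# Flatten the 3-D pixel array into a single feature vector
features = image.flatten()

# Stack several flattened images into a 2-D feature matrix
batch = np.stack([features, features])
```

The resulting `batch` has one row per image and one column per pixel value, which is the layout `MLPClassifier.fit` expects.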


## Step 03: Encoder

In [9]:
# encode the labels as integers
le = LabelEncoder()
train_y = le.fit_transform(train_y)

val_y = le.transform(val_y)
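
To see what the encoder does, here is a small sketch on a hand-written list of labels. The label strings are hypothetical; the real values come from the `train_y` artifact. The encoder is fitted on the training labels only and then reused, which is why the cell above calls `transform` (not `fit_transform`) on `val_y`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label values; the real ones come from the train_y artifact
labels = ["dog", "cat", "panda", "cat", "dog"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the sorted unique labels; each label's position is its code
classes = list(encoder.classes_)

# inverse_transform maps integer codes back to the original strings
decoded = list(encoder.inverse_transform(encoded))
```

Keeping the fitted encoder around matters because `inverse_transform` is the only way to turn the model's integer predictions back into class names later.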

In [10]:
# train an MLP classifier
logger.info("[INFO] training MLP classifier...")
model = MLPClassifier(hidden_layer_sizes=(100,100), activation='relu', solver='adam')
model.fit(train_x, train_y)

18-10-2022 01:01:50 [INFO] training MLP classifier...


MLPClassifier(hidden_layer_sizes=(100, 100))

In [11]:
logger.info("Dumping the model and encoder artifacts to the disk")

# Save the artifacts using joblib
joblib.dump(le, args["encoder"])
joblib.dump(model, args["inference_model"])

18-10-2022 01:02:14 Dumping the model and encoder artifacts to the disk


['model']
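
The two files written above are meant to be loaded together at inference time. The sketch below shows that round trip on synthetic data; the feature matrix, labels, small network size, and temporary directory are all assumptions made to keep the example quick and self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-ins for the real features and labels (assumptions)
rng = np.random.default_rng(0)
x = rng.random((30, 8))
y = np.array(["cat", "dog", "panda"] * 10)

le = LabelEncoder()
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=50, random_state=0)
model.fit(x, le.fit_transform(y))

with tempfile.TemporaryDirectory() as tmp_dir:
    encoder_path = os.path.join(tmp_dir, "target_encoder")
    model_path = os.path.join(tmp_dir, "model")
    joblib.dump(le, encoder_path)
    joblib.dump(model, model_path)

    # Later, e.g. in a separate inference step: restore both objects
    loaded_le = joblib.load(encoder_path)
    loaded_model = joblib.load(model_path)

# Predict integer codes, then map them back to class names
predicted_codes = loaded_model.predict(x[:5])
predicted_labels = loaded_le.inverse_transform(predicted_codes)
```

With so few iterations scikit-learn may emit a convergence warning; that is harmless for this illustration.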

In [12]:
# encoder artifact
artifact = wandb.Artifact(args["encoder"],
                          type="INFERENCE_MODEL",
                          description="The target encoder, serialized with joblib"
                          )

logger.info("Logging the target encoder artifact")
artifact.add_file(args["encoder"])
run.log_artifact(artifact)

18-10-2022 01:02:14 Logging the target encoder artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f7edb58cbd0>

In [13]:
# inference model artifact
artifact = wandb.Artifact(args["inference_model"],
                          type="INFERENCE_MODEL",
                          description="The inference model, serialized with joblib"
                          )

logger.info("Logging the inference model artifact")
artifact.add_file(args["inference_model"])
run.log_artifact(artifact)

18-10-2022 01:02:14 Logging the inference model artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f7ed8f54a90>

In [14]:
run.finish()

## Step 04: Sweep (hyperparameter tuning)

### Sweep setup

ℹ️ [Reference](https://docs.wandb.ai/guides/sweeps/define-sweep-configuration)

**Sweep configuration structure**

Sweep configurations are nested; keys can have, as their values, further keys. The top-level keys are listed and briefly described below, and then detailed in the following section.

| Top-Level Key | Description |
|---------------|-------------|
| **program**       | (required) Training script to run. |
| **method**        | (required) Specify the search strategy. |
| **parameters**    | (required) Specify the parameter bounds to search. |

<br>

**Search type methods**

The following list describes hyperparameter search methods. Specify the search strategy with the **method**:

- **grid**  – Iterate over every combination of hyperparameter values. Can be computationally costly.
- **random**  – Choose a random set of hyperparameter values on each iteration based on provided distributions.
- **bayes** – Create a probabilistic model of a metric score as a function of the hyperparameters, and choose parameters with high probability of improving the metric. 
<br>
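
The difference between **grid** and **random** can be sketched in a few lines of plain Python. This is only an illustration of the two strategies, not how W&B implements them; the search space and helper names are hypothetical, and **bayes** is left out because it needs a probabilistic model of the metric.

```python
import itertools
import random

# Hypothetical search space for illustration
search_space = {
    "learning_rate": ["constant", "adaptive"],
    "activation": ["relu", "tanh"],
}


def grid_candidates(space):
    """Enumerate every combination of values (grid search)."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]


def random_candidate(space, rng):
    """Draw one value per hyperparameter uniformly at random (random search)."""
    return {key: rng.choice(values) for key, values in space.items()}


grid = grid_candidates(search_space)
rng = random.Random(42)
samples = [random_candidate(search_space, rng) for _ in range(3)]
```

Grid search visits all four combinations exactly once, while random search may repeat a combination or miss one, which is the trade-off for being able to stop after any number of runs.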

**Metric**

Describes the metric to optimize. This metric should be logged **explicitly** to W&B by your training script.

| Key    | Description |
|--------|-------------|
| **name**   | Name of the metric to optimize. |
| **goal**   | Either `minimize` or `maximize` (default is `minimize`). |
| **target** | Goal value for the metric you're optimizing. When any run in the sweep achieves that target value, the sweep's state is set to finished. All agents with active runs will finish those jobs, but no new runs will be launched in the sweep. |
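
One way to read `goal` and `target` together is as a stopping rule. The helper below is a hypothetical sketch of that rule, not W&B's internal logic, and the metric values are made up for illustration.

```python
def target_reached(value, goal, target):
    """Return True when a metric value meets the target for the given goal."""
    if goal == "maximize":
        return value >= target
    if goal == "minimize":
        return value <= target
    raise ValueError(f"unknown goal: {goal!r}")


# Hypothetical metric block and per-run accuracy values
metric = {"name": "accuracy", "goal": "maximize", "target": 0.55}
run_values = [0.44, 0.49, 0.57]

finished = [target_reached(v, metric["goal"], metric["target"]) for v in run_values]
```

Here only the third run meets the target, so under this reading the sweep would stop launching new runs after it.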

In [15]:
# Configure the sweep 
sweep_config = {
    'method': 'random', 
    'metric': {
      'name': 'Acc',  # must match the key written to run.summary in train()
      'goal': 'maximize'   
    },
    'parameters': {
        'hidden_layer_sizes': {
            "values":[(100,100), (100, 100)],
        },
        'activation': {
            'values': ['relu']
        },
        'solver': {
            'values': ['adam']
        },
        'learning_rate' : {
            'values': ['constant','adaptive']
        }
    }
}

In [16]:
sweep_id = wandb.sweep(sweep_config, project=args['project_name'])

Create sweep with ID: 31jxcg4q
Sweep URL: https://wandb.ai/thaisaraujom/mlp_classifier/sweeps/31jxcg4q


### Training

In [17]:
def train():
    with wandb.init() as run:
        # create the codecarbon tracker
        # codecarbon is very verbose by default; lower the log level for more detail
        tracker = EmissionsTracker(log_level="critical")
        tracker.start()
        model = MLPClassifier(hidden_layer_sizes=run.config.hidden_layer_sizes,
                              activation=run.config.activation,
                              solver=run.config.solver, 
                              learning_rate=run.config.learning_rate)
        # training
        logger.info("Training")
        model.fit(train_x,train_y)

        # inference on the validation set
        logger.info("Infering")
        predict = model.predict(val_x)

        # stop the tracker; stop() returns the CO2 emissions in kg
        emissions = tracker.stop()

        # Evaluation Metrics
        logger.info("Evaluation metrics")
        fbeta = fbeta_score(val_y, 
                            predict, 
                            beta=1, 
                            zero_division=1,
                            average='weighted')
        precision = precision_score(val_y, 
                                    predict, 
                                    zero_division=1,
                                    average='weighted')
        recall = recall_score(val_y, 
                              predict, 
                              zero_division=1,
                              average='weighted')

        acc = accuracy_score(val_y, predict)

        logger.info("Val Accuracy: {}".format(acc))
        logger.info("Val Precision: {}".format(precision))
        logger.info("Val Recall: {}".format(recall))
        logger.info("Val F1: {}".format(fbeta))

        run.summary["Acc"] = acc
        run.summary["Precision"] = precision
        run.summary["Recall"] = recall
        run.summary["F1"] = fbeta

        # energy unit is kWh
        run.summary["Energy_Consumed"] = tracker.final_emissions_data.energy_consumed
        run.summary["Energy_RAM"] = tracker.final_emissions_data.ram_energy
        run.summary["Energy_GPU"] = tracker.final_emissions_data.gpu_energy
        run.summary["Energy_CPU"] = tracker.final_emissions_data.cpu_energy
        # kg
        run.summary["CO2_Emissions"] = tracker.final_emissions_data.emissions


In [18]:
# Initialize a new sweep
# Arguments:
#     – sweep_id: the sweep_id to run - this was returned above by wandb.sweep()
#     – function: function that defines your model architecture and trains it
wandb.agent(sweep_id=sweep_id, function=train, count=10)

wandb: Agent Starting Run: gnc1mltc with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


18-10-2022 01:02:32 Training
18-10-2022 01:02:36 Infering
18-10-2022 01:02:37 Evaluation metrics
18-10-2022 01:02:37 Val Accuracy: 0.4422735346358792
18-10-2022 01:02:37 Val Precision: 0.5943067785176246
18-10-2022 01:02:37 Val Recall: 0.4422735346358792
18-10-2022 01:02:37 Val F1: 0.44720845551983757


0,1
Acc,0.44227
CO2_Emissions,5e-05
Energy_CPU,6e-05
Energy_Consumed,8e-05
Energy_GPU,1e-05
Energy_RAM,1e-05
F1,0.44721
Precision,0.59431
Recall,0.44227


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: vdtxiah8 with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


18-10-2022 01:02:57 Training
18-10-2022 01:03:10 Infering
18-10-2022 01:03:10 Evaluation metrics
18-10-2022 01:03:10 Val Accuracy: 0.49023090586145646
18-10-2022 01:03:10 Val Precision: 0.5113086912000974
18-10-2022 01:03:10 Val Recall: 0.49023090586145646
18-10-2022 01:03:10 Val F1: 0.42336705541449793


0,1
Acc,0.49023
CO2_Emissions,0.00012
Energy_CPU,0.00015
Energy_Consumed,0.00019
Energy_GPU,3e-05
Energy_RAM,2e-05
F1,0.42337
Precision,0.51131
Recall,0.49023


wandb: Agent Starting Run: 8d51moq7 with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: constant
wandb: 	solver: adam


18-10-2022 01:03:25 Training
18-10-2022 01:03:32 Infering
18-10-2022 01:03:32 Evaluation metrics
18-10-2022 01:03:32 Val Accuracy: 0.3907637655417407
18-10-2022 01:03:32 Val Precision: 0.5807930827839765
18-10-2022 01:03:32 Val Recall: 0.3907637655417407
18-10-2022 01:03:32 Val F1: 0.3808553769388578


0,1
Acc,0.39076
CO2_Emissions,7e-05
Energy_CPU,9e-05
Energy_Consumed,0.00012
Energy_GPU,2e-05
Energy_RAM,1e-05
F1,0.38086
Precision,0.58079
Recall,0.39076


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: Agent Starting Run: r8qc56cv with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


18-10-2022 01:03:53 Training
18-10-2022 01:04:01 Infering
18-10-2022 01:04:01 Evaluation metrics
18-10-2022 01:04:01 Val Accuracy: 0.49733570159857904
18-10-2022 01:04:01 Val Precision: 0.5676147641186415
18-10-2022 01:04:01 Val Recall: 0.49733570159857904
18-10-2022 01:04:01 Val F1: 0.49205670119926226


0,1
Acc,0.49734
CO2_Emissions,8e-05
Energy_CPU,9e-05
Energy_Consumed,0.00012
Energy_GPU,2e-05
Energy_RAM,1e-05
F1,0.49206
Precision,0.56761
Recall,0.49734


wandb: Agent Starting Run: e6l7ih10 with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: constant
wandb: 	solver: adam


18-10-2022 01:04:16 Training
18-10-2022 01:04:29 Infering
18-10-2022 01:04:29 Evaluation metrics
18-10-2022 01:04:29 Val Accuracy: 0.5150976909413855
18-10-2022 01:04:29 Val Precision: 0.5204372982097921
18-10-2022 01:04:29 Val Recall: 0.5150976909413855
18-10-2022 01:04:29 Val F1: 0.5105568082195162


0,1
Acc,0.5151
CO2_Emissions,0.00012
Energy_CPU,0.00015
Energy_Consumed,0.0002
Energy_GPU,3e-05
Energy_RAM,2e-05
F1,0.51056
Precision,0.52044
Recall,0.5151


wandb: Agent Starting Run: 9s760goi with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


18-10-2022 01:04:43 Training
18-10-2022 01:04:48 Infering
18-10-2022 01:04:48 Evaluation metrics
18-10-2022 01:04:48 Val Accuracy: 0.566607460035524
18-10-2022 01:04:48 Val Precision: 0.5951154529307283
18-10-2022 01:04:48 Val Recall: 0.566607460035524
18-10-2022 01:04:48 Val F1: 0.47929460707688526


0,1
Acc,0.56661
CO2_Emissions,5e-05
Energy_CPU,6e-05
Energy_Consumed,8e-05
Energy_GPU,1e-05
Energy_RAM,1e-05
F1,0.47929
Precision,0.59512
Recall,0.56661


wandb: Agent Starting Run: ag224kuh with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: constant
wandb: 	solver: adam


18-10-2022 01:05:04 Training
18-10-2022 01:05:15 Infering
18-10-2022 01:05:15 Evaluation metrics
18-10-2022 01:05:15 Val Accuracy: 0.5257548845470693
18-10-2022 01:05:15 Val Precision: 0.5407653076960758
18-10-2022 01:05:15 Val Recall: 0.5257548845470693
18-10-2022 01:05:15 Val F1: 0.4992027769596058


0,1
Acc,0.52575
CO2_Emissions,0.00011
Energy_CPU,0.00013
Energy_Consumed,0.00018
Energy_GPU,3e-05
Energy_RAM,1e-05
F1,0.4992
Precision,0.54077
Recall,0.52575


wandb: Agent Starting Run: 0w4adt1b with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: adaptive
wandb: 	solver: adam


18-10-2022 01:05:31 Training
18-10-2022 01:05:37 Infering
18-10-2022 01:05:37 Evaluation metrics
18-10-2022 01:05:37 Val Accuracy: 0.5115452930728241
18-10-2022 01:05:37 Val Precision: 0.5534158381582893
18-10-2022 01:05:37 Val Recall: 0.5115452930728241
18-10-2022 01:05:37 Val F1: 0.47867477478209824


0,1
Acc,0.51155
CO2_Emissions,6e-05
Energy_CPU,7e-05
Energy_Consumed,0.0001
Energy_GPU,2e-05
Energy_RAM,1e-05
F1,0.47867
Precision,0.55342
Recall,0.51155


wandb: Agent Starting Run: 8b0v0r3u with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: constant
wandb: 	solver: adam


18-10-2022 01:05:52 Training
18-10-2022 01:06:02 Infering
18-10-2022 01:06:02 Evaluation metrics
18-10-2022 01:06:02 Val Accuracy: 0.566607460035524
18-10-2022 01:06:02 Val Precision: 0.5731509323590956
18-10-2022 01:06:02 Val Recall: 0.566607460035524
18-10-2022 01:06:02 Val F1: 0.5656522817366586


0,1
Acc,0.56661
CO2_Emissions,0.0001
Energy_CPU,0.00011
Energy_Consumed,0.00015
Energy_GPU,3e-05
Energy_RAM,1e-05
F1,0.56565
Precision,0.57315
Recall,0.56661


wandb: Agent Starting Run: jgg6hf6t with config:
wandb: 	activation: relu
wandb: 	hidden_layer_sizes: [100, 100]
wandb: 	learning_rate: constant
wandb: 	solver: adam


18-10-2022 01:06:18 Training
18-10-2022 01:06:28 Infering
18-10-2022 01:06:28 Evaluation metrics
18-10-2022 01:06:28 Val Accuracy: 0.477797513321492
18-10-2022 01:06:28 Val Precision: 0.5451274580240218
18-10-2022 01:06:28 Val Recall: 0.477797513321492
18-10-2022 01:06:28 Val F1: 0.4711604505548896


0,1
Acc,0.4778
CO2_Emissions,9e-05
Energy_CPU,0.00011
Energy_Consumed,0.00015
Energy_GPU,2e-05
Energy_RAM,1e-05
F1,0.47116
Precision,0.54513
Recall,0.4778


In [19]:
run.finish()
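
In every run logged above, Val Recall is identical to Val Accuracy. That is expected rather than a bug: with `average='weighted'`, each class's recall is weighted by its share of the samples, and the weighted sum collapses to the fraction of correct predictions. The sketch below reproduces this by hand on hypothetical labels and predictions, using only the standard library.

```python
from collections import Counter

# Hypothetical integer-encoded labels and predictions
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 2, 2, 2, 0, 2, 2]

support = Counter(y_true)  # number of true samples per class
correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

# Per-class recall: correct predictions of class c / true samples of class c
recall = {c: correct[c] / support[c] for c in support}

# Weighted average: each class weighted by its share of the samples
weighted_recall = sum(support[c] / len(y_true) * recall[c] for c in support)

# Accuracy: correct predictions / all samples
accuracy = sum(correct.values()) / len(y_true)
```

Because of this identity, weighted recall adds no information beyond accuracy here; weighted precision and F1 are the metrics that differ between runs.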