# Your First Image Classifier: Using CNN to Classify Images
# Data Segregation

The goal with this dataset is to correctly classify an image as containing a dog, cat, or panda.
Containing only 3,000 images, the Animals dataset is another **introductory** dataset
on which we can quickly train a CNN model and compare its results with the previous KNN model.

Let's take the following steps:

1. Data segregation
2. Split the clean data into train, validation, and test sets

<center><img width="900" src="https://drive.google.com/uc?export=view&id=1haMB_Zt6Et9q9sPHxfuR4g3FT5QRXlTI"></center>


## Step 01: Setup

Start out by installing the experiment tracking library and setting up your free W&B account:


*   **pip install wandb** – Install the W&B library
*   **import wandb** – Import the wandb library
*   **wandb login** – Login to your W&B account so you can log all your metrics in one place

In [8]:
!pip install wandb -qU

### Import Packages

In [9]:
# import the necessary packages
import logging
import joblib
from sklearn.model_selection import train_test_split
import wandb

In [10]:
wandb.login()



True

In [11]:
# configure logging
# reference for a logging obj
logger = logging.getLogger()

# set level of logging
logger.setLevel(logging.INFO)

# create handlers
c_handler = logging.StreamHandler()
c_format = logging.Formatter(fmt="%(asctime)s %(message)s",datefmt='%d-%m-%Y %H:%M:%S')
c_handler.setFormatter(c_format)

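# replace any existing handlers with ours; assigning to handlers[0]
# raises an IndexError when the root logger has no handler yet
logger.handlers = [c_handler]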

## Step 02: Data Segregation

In [12]:
# since we are using Jupyter Notebooks we can replace our argument
# parsing code with *hard coded* arguments and values
args = {
  "project_name": "cnn_classifier",
  "artifact_name_feature": "clean_features:latest",
  "artifact_name_target": "labels:latest",
  "train_feature_artifact": "train_x",
  "train_target_artifact": "train_y",
  "val_feature_artifact": "val_x",
  "val_target_artifact": "val_y",
  "test_feature_artifact": "test_x",
  "test_target_artifact": "test_y",
}

In [13]:
# open the W&B project created in the Fetch step
run = wandb.init(entity="thaisaraujom",
                 project=args["project_name"],
                 job_type="data_segregation")

logger.info("Downloading and reading clean data artifact")
clean_data = run.use_artifact(args["artifact_name_feature"])
clean_data_path = clean_data.file()

logger.info("Downloading and reading label data artifact")
label_data = run.use_artifact(args["artifact_name_target"])
label_data_path = label_data.file()

# unpacking the artifacts
data = joblib.load(clean_data_path)
label = joblib.load(label_data_path)


16-10-2022 14:58:10 Downloading and reading clean data artifact
16-10-2022 14:58:14 Downloading and reading label data artifact



<center><img width="600" src="https://drive.google.com/uc?export=view&id=15ynGAo9KLIOB_6fNv5dh-hAS30YT_mMd"></center>

In [14]:
# partition the data into training and test splits, using 75% of
# the data for training and the remaining 25% for test
(train_x, test_x, train_y, test_y) = train_test_split(data, label, test_size=0.25, random_state=42)

In [15]:
# partition the training set into training and validation splits, using 75% of
# the training set for training and the remaining 25% for validation
(train_x, val_x, train_y, val_y) = train_test_split(train_x, train_y, test_size=0.25, random_state=42)
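Neither `train_test_split` call above passes `stratify`, so the share of dogs, cats, and pandas in each split is left to chance. As a minimal sketch for checking this, the helper below reports the fraction of samples per class in one split. The name `class_fractions` and the toy labels are hypothetical; in this notebook it could be called as `class_fractions(train_y.tolist())`, assuming the labels are a 1-D array.

```python
from collections import Counter


def class_fractions(labels):
    """Return {class_label: fraction of samples} for one split."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# toy labels standing in for train_y, val_y, or test_y (hypothetical data)
example = ["cat", "cat", "dog", "panda"]
print(class_fractions(example))
```

If the fractions differ noticeably between splits, passing the labels through the `stratify` parameter of `train_test_split` is one option to consider.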

In [16]:
logger.info("Train x: {}".format(train_x.shape))
logger.info("Train y: {}".format(train_y.shape))
logger.info("Validation x: {}".format(val_x.shape))
logger.info("Validation y: {}".format(val_y.shape))
logger.info("Test x: {}".format(test_x.shape))
logger.info("Test y: {}".format(test_y.shape))

16-10-2022 14:58:25 Train x: (1687, 32, 32, 3)
16-10-2022 14:58:25 Train y: (1687,)
16-10-2022 14:58:25 Validation x: (563, 32, 32, 3)
16-10-2022 14:58:25 Validation y: (563,)
16-10-2022 14:58:25 Test x: (750, 32, 32, 3)
16-10-2022 14:58:25 Test y: (750,)
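The logged shapes follow from applying a 25% hold-out twice. The sketch below reproduces the arithmetic, assuming (as scikit-learn does for a fractional `test_size`) that the held-out count is rounded up and the remainder stays on the other side. The helper name `holdout_sizes` is hypothetical.

```python
import math


def holdout_sizes(n_samples, test_size):
    """Return (n_kept, n_held_out) for a fractional hold-out.

    Assumption: the held-out count is ceil(test_size * n_samples)
    and every other sample is kept.
    """
    n_held_out = math.ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out


total = 3000
n_train_val, n_test = holdout_sizes(total, 0.25)   # first split: train+val vs. test
n_train, n_val = holdout_sizes(n_train_val, 0.25)  # second split: train vs. val

print(n_train, n_val, n_test)  # 1687 563 750
```

Relative to the full dataset, the effective split is therefore roughly 56% train, 19% validation, and 25% test.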


In [17]:
# Save the artifacts using joblib
joblib.dump(train_x, args["train_feature_artifact"])
joblib.dump(train_y, args["train_target_artifact"])
joblib.dump(val_x, args["val_feature_artifact"])
joblib.dump(val_y, args["val_target_artifact"])
joblib.dump(test_x, args["test_feature_artifact"])
joblib.dump(test_y, args["test_target_artifact"])

logger.info("Dumping the train and validation data artifacts to the disk")

16-10-2022 14:58:30 Dumping the train and validation data artifacts to the disk


In [18]:
# train_x artifact
artifact = wandb.Artifact(args["train_feature_artifact"],
                          type="TRAIN_DATA",
                          description="A joblib file representing the train_x"
                          )

logger.info("Logging train_x artifact")
artifact.add_file(args["train_feature_artifact"])
run.log_artifact(artifact)

16-10-2022 14:58:30 Logging train_x artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f33e2b73490>

In [19]:
# train_y artifact
artifact = wandb.Artifact(args["train_target_artifact"],
                          type="TRAIN_DATA",
                          description="A joblib file representing the train_y"
                          )

logger.info("Logging train_y artifact")
artifact.add_file(args["train_target_artifact"])
run.log_artifact(artifact)

16-10-2022 14:58:50 Logging train_y artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f33e2027290>

In [20]:
# val_x artifact
artifact = wandb.Artifact(args["val_feature_artifact"],
                          type="VAL_DATA",
                          description="A joblib file representing the val_x"
                          )

logger.info("Logging val_x artifact")
artifact.add_file(args["val_feature_artifact"])
run.log_artifact(artifact)

16-10-2022 14:58:53 Logging val_x artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f33e2027250>

In [21]:
# val_y artifact
artifact = wandb.Artifact(args["val_target_artifact"],
                          type="VAL_DATA",
                          description="A joblib file representing the val_y"
                          )

logger.info("Logging val_y artifact")
artifact.add_file(args["val_target_artifact"])
run.log_artifact(artifact)

16-10-2022 14:58:53 Logging val_y artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f33e2024250>

In [22]:
# test_x artifact
artifact = wandb.Artifact(args["test_feature_artifact"],
                          type="TEST_DATA",
                          description="A joblib file representing the test_x"
                          )

logger.info("Logging test_x artifact")
artifact.add_file(args["test_feature_artifact"])
run.log_artifact(artifact)

16-10-2022 14:58:55 Logging test_x artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f33e201f3d0>

In [23]:
# test_y artifact
artifact = wandb.Artifact(args["test_target_artifact"],
                          type="TEST_DATA",
                          description="A joblib file representing the test_y"
                          )

logger.info("Logging test_y artifact")
artifact.add_file(args["test_target_artifact"])
run.log_artifact(artifact)

16-10-2022 14:58:57 Logging test_y artifact


<wandb.sdk.wandb_artifacts.Artifact at 0x7f33e2024ad0>
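The six cells above differ only in the artifact name, type, and description, so they could be collapsed into one loop. This is a sketch, not the notebook's own code: `artifact_specs` and `log_split_artifacts` are hypothetical helpers, and the artifact constructor is passed in as `make_artifact` so that `wandb.Artifact` can be supplied from the notebook.

```python
def artifact_specs(args):
    """Return (name, type, description) for each of the six split files."""
    specs = []
    for split, artifact_type in (("train", "TRAIN_DATA"),
                                 ("val", "VAL_DATA"),
                                 ("test", "TEST_DATA")):
        for kind in ("feature", "target"):
            name = args[f"{split}_{kind}_artifact"]
            description = f"A joblib file representing the {name}"
            specs.append((name, artifact_type, description))
    return specs


def log_split_artifacts(run, args, make_artifact):
    """Create one artifact per file written by joblib.dump and log it to the run.

    Assumption: each file was saved under its artifact name in the
    current working directory, as in the joblib.dump cell.
    """
    for name, artifact_type, description in artifact_specs(args):
        artifact = make_artifact(name, type=artifact_type, description=description)
        artifact.add_file(name)
        run.log_artifact(artifact)


# usage in this notebook, with a live run and the args dict defined earlier:
# log_split_artifacts(run, args, wandb.Artifact)
```

Keeping the specs in a separate function makes it easy to inspect the names and types before anything is uploaded.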

In [24]:
run.finish()