# 06. Model training

In the previous notebooks, the datasets were curated and several features were extracted to train various machine learning models. In this notebook, the models are initialized and trained on the curated datasets. Each dataset is split 80/20 for training and testing, respectively.
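The split itself happens inside the `bugfinder` training operations. As a minimal illustration of what an 80/20 split means, the sketch below shuffles the items with a fixed seed and cuts the list at 80%. The `split_train_test` helper is hypothetical and is not the library's own code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_train_test(items: Sequence[T], train_ratio: float = 0.8,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper (not part of bugfinder): shuffle, then cut in two."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed: the same split on every run
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_train_test(range(100))
print(len(train), len(test))  # 80 20
```

Shuffling before cutting avoids placing every item of one kind in the test set when the source files happen to be ordered by class.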

## 06.a. Imports, logging configuration and dataset preparation

The first step is to perform the necessary imports and configure the program.

In [1]:
# Enable these lines if live changes are made to the codebase
# %load_ext autoreload
# %autoreload 2

In [2]:
# Disable tensorflow logging
import os
import logging
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
logging.getLogger('tensorflow').setLevel(logging.FATAL)

In [3]:
# Add the parent folder to the import path so that `bugfinder` can be imported
# when the notebooks are run from a sub-folder.
import sys
sys.path.append("..")

In [4]:
import logging
from bugfinder.settings import LOGGER
from bugfinder.base.dataset import CodeWeaknessClassificationDataset as Dataset
from bugfinder.models.dnn_classifier import DNNClassifierTraining
from bugfinder.models.linear_classifier import LinearClassifierTraining

In [5]:
# Set up logging to output INFO-level messages and above
LOGGER.setLevel(logging.INFO)
LOGGER.propagate = False
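`setLevel(logging.INFO)` sets a threshold rather than an exact level: INFO and anything more severe (WARNING, ERROR, CRITICAL) is emitted, while DEBUG is dropped. Setting `propagate = False` keeps records from also reaching the root logger's handlers, which is typically done to avoid duplicated lines. The sketch below demonstrates both settings on a stand-in logger named `demo` (an assumption for illustration, not `bugfinder`'s `LOGGER`) that writes to an in-memory buffer.

```python
import io
import logging

# Stand-in for bugfinder's LOGGER, writing to a buffer so the output can be inspected.
buffer = io.StringIO()
demo_logger = logging.getLogger("demo")
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
demo_logger.addHandler(handler)

demo_logger.setLevel(logging.INFO)  # threshold: INFO and every more severe level
demo_logger.propagate = False       # do not forward records to the root logger's handlers

demo_logger.debug("dropped: below the INFO threshold")
demo_logger.info("kept")
demo_logger.warning("also kept: WARNING is above INFO")

print(buffer.getvalue(), end="")
# [INFO] kept
# [WARNING] also kept: WARNING is above INFO
```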

In [6]:
# Dataset directories (DO NOT EDIT)
v__2_dataset_path = [
    "../data/ai-dataset_v112", "../data/ai-dataset_v122", "../data/ai-dataset_v212", "../data/ai-dataset_v222"
]
v__3_dataset_path = [
    "../data/ai-dataset_v113", "../data/ai-dataset_v123", "../data/ai-dataset_v213", "../data/ai-dataset_v223"
]

dataset_to_copy = [
    v__2_dataset_path, v__3_dataset_path
]

## 06.b. Linear Classifier
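`LinearClassifierTraining` trains a linear model for classification: each item is scored as a weighted sum of its features plus a bias, and the sign of that score decides the class. The sketch below shows that decision rule, trained with the classic perceptron update. The perceptron is a swapped-in technique chosen because it is short and provably converges on linearly separable data; `LinearClassifierV2` may well use a different loss and optimizer, and the toy data is invented for illustration.

```python
from typing import List, Sequence, Tuple

def score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    # Linear decision function: w . x + b
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    # The decision boundary is the hyperplane where the score is zero
    return 1 if score(weights, bias, x) > 0.0 else 0

def train_perceptron(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     epochs: int = 100, lr: float = 1.0) -> Tuple[List[float], float]:
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # one clean pass: every training item is on the right side
            break
    return weights, bias

# Hypothetical, linearly separable toy data: label 1 when the features sum above 1.
xs = [[0.0, 0.0], [0.2, 0.3], [0.9, 0.8], [1.0, 1.0], [0.1, 0.6], [0.7, 0.9]]
ys = [0, 0, 1, 1, 0, 1]
w, b = train_perceptron(xs, ys)
print([predict(w, b, x) for x in xs])  # [0, 0, 1, 1, 0, 1]
```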

In [7]:
for dataset_path in v__2_dataset_path[:1]:
    LOGGER.info("Processing %s..." % dataset_path)
    dataset = Dataset(dataset_path)
    dataset.queue_operation(
        LinearClassifierTraining, {"name": "lin-cls", "max_items": 1000, "epochs": 10, "reset": True}
    )
    dataset.process()

[2022-04-21 18:10:04][INFO] Processing ../data/ai-dataset_v112...
[2022-04-21 18:10:04][INFO] Dataset initialized in 114ms.
[2022-04-21 18:10:04][INFO] Operation queue validated in 0ms.
[2022-04-21 18:10:04][INFO] Running operation 1/1 (bugfinder.models.linear_classifier.LinearClassifierTraining)...
[2022-04-21 18:10:04][INFO] Training LinearClassifierV2 on 270 items over 10 epochs. Testing on 133 items, focusing on f1-score...
[2022-04-21 18:10:05][INFO] Training dataset for epoch 1/10...




[2022-04-21 18:13:53][INFO] Training dataset for epoch 2/10...
[2022-04-21 18:17:34][INFO] Training dataset for epoch 3/10...
[2022-04-21 18:20:37][INFO] Training dataset for epoch 4/10...
[2022-04-21 18:23:36][INFO] Training dataset for epoch 5/10...
[2022-04-21 18:26:34][INFO] Training dataset for epoch 6/10...
[2022-04-21 18:29:30][INFO] Training dataset for epoch 7/10...
[2022-04-21 18:32:24][INFO] Training dataset for epoch 8/10...
[2022-04-21 18:35:17][INFO] Training dataset for epoch 9/10...
[2022-04-21 18:38:13][INFO] Training dataset for epoch 10/10...
[2022-04-21 18:42:04][INFO] Precision: 68.969% (nan%); Recall: 72.932% (nan%); F-score: 69.780% (nan%).
[2022-04-21 18:42:04][INFO] 1 operations run in 31m59s.
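The log reports precision, recall and F-score. For a single positive class these are computed from true positives (TP), false positives (FP) and false negatives (FN), as in the sketch below. It returns 0.0 when a denominator is empty, which mirrors scikit-learn's convention of reporting ill-defined scores as 0.0. Note that the reported F-score (69.780%) is not the harmonic mean of the reported precision and recall (about 70.9%), which suggests the library averages per-class scores; the binary helper here is an illustrative assumption, not `bugfinder`'s metric code.

```python
from typing import Sequence, Tuple

def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Report 0.0 instead of dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = precision_recall_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(f"Precision: {p:.3%}; Recall: {r:.3%}; F-score: {f:.3%}")
# Precision: 66.667%; Recall: 66.667%; F-score: 66.667%
```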


## 06.c. Multilayer Perceptron (default size)
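A multilayer perceptron stacks fully connected layers with a non-linear activation between them. The sketch below is an untrained forward pass in pure Python: ReLU on the hidden layers and a softmax on the output are assumptions, not necessarily what `DNNClassifierV2` uses, and the layer sizes `[4, 8, 8, 2]` are arbitrary, since the default sizes of `DNNClassifierTraining` are not stated in this notebook.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])

def dense(x: Sequence[float], layer: Layer) -> List[float]:
    weights, biases = layer
    # One output per weight row: dot(row, x) + bias
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, z) for z in v]

def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # keep initial activations in a reasonable range
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    h = list(x)
    for layer in layers[:-1]:
        h = relu(dense(h, layer))  # hidden layers: affine map, then non-linearity
    return softmax(dense(h, layers[-1]))  # output layer: class probabilities

# Arbitrary sizes: 4 inputs, two hidden layers of 8 units, 2 output classes.
layers = init_mlp([4, 8, 8, 2])
probs = mlp_forward([0.5, -1.0, 2.0, 0.0], layers)
print(probs)  # two probabilities summing to 1 (weights are random and untrained)
```

Training would adjust the weights by backpropagating a loss through these same layers; the sketch stops at the forward pass to keep the structure visible.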

In [8]:
for dataset_path in v__2_dataset_path[:1]:
    LOGGER.info("Processing %s..." % dataset_path)
    dataset = Dataset(dataset_path)
    dataset.queue_operation(DNNClassifierTraining, {"name": "dnn-default", "epochs": 10, "reset": True})
    dataset.process()

[2022-04-21 18:42:04][INFO] Processing ../data/ai-dataset_v112...
[2022-04-21 18:42:04][INFO] Dataset initialized in 268ms.
[2022-04-21 18:42:04][INFO] Operation queue validated in 0ms.
[2022-04-21 18:42:04][INFO] Running operation 1/1 (bugfinder.models.dnn_classifier.DNNClassifierTraining)...
[2022-04-21 18:42:04][INFO] Training DNNClassifierV2 on 270 items over 10 epochs. Testing on 133 items, focusing on f1-score...
[2022-04-21 18:42:04][INFO] Training dataset for epoch 1/10...
[2022-04-21 18:42:32][INFO] Training dataset for epoch 2/10...
[2022-04-21 18:42:53][INFO] Training dataset for epoch 3/10...
[2022-04-21 18:43:15][INFO] Training dataset for epoch 4/10...
[2022-04-21 18:43:36][INFO] Training dataset for epoch 5/10...
[2022-04-21 18:43:59][INFO] Training dataset for epoch 6/10...
[2022-04-21 18:44:24][INFO] Training dataset for epoch 7/10...
[2022-04-21 18:44:49][INFO] Training dataset for epoch 8/10...
[2022-04-21 18:45:14][INFO] Training dataset for epoch 9/10...
[2022-04-2

## 06.d. Multilayer Perceptron (configured size)
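The `architecture` option appears to list the widths of the hidden layers in order (an assumption; the notebook does not document it). Under that reading, the sketch below counts the weights and biases of the resulting fully connected network, which gives a sense of how much larger the configured model is. The input width of 100 and the 2 output classes are placeholder values, not taken from the dataset.

```python
from typing import Sequence

def count_dense_parameters(n_inputs: int, hidden: Sequence[int], n_classes: int) -> int:
    """Weights plus biases of a fully connected network.

    Assumes `hidden` lists hidden-layer widths in order, as the
    `architecture` option appears to (an assumption).
    """
    sizes = [n_inputs, *hidden, n_classes]
    # Each layer has n_in * n_out weights plus n_out biases = (n_in + 1) * n_out
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Placeholder input width (100) and class count (2), for illustration only.
print(count_dense_parameters(100, [25, 50, 75, 50, 25], 2))  # 12777
```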

In [9]:
for dataset_path in v__3_dataset_path[:1]:
    LOGGER.info("Processing %s..." % dataset_path)
    dataset = Dataset(dataset_path)
    dataset.queue_operation(DNNClassifierTraining, {"name": "dnn-custom", "architecture": [25, 50, 75, 50, 25], 
                                                    "epochs": 10, "reset": True})
    dataset.process()

[2022-04-21 18:46:10][INFO] Processing ../data/ai-dataset_v113...
[2022-04-21 18:46:11][INFO] Dataset initialized in 593ms.
[2022-04-21 18:46:11][INFO] Operation queue validated in 0ms.
[2022-04-21 18:46:11][INFO] Running operation 1/1 (bugfinder.models.dnn_classifier.DNNClassifierTraining)...
[2022-04-21 18:46:11][INFO] Training DNNClassifierV2 on 270 items over 10 epochs. Testing on 133 items, focusing on f1-score...
[2022-04-21 18:46:11][INFO] Training dataset for epoch 1/10...
[2022-04-21 18:48:34][INFO] Training dataset for epoch 2/10...
[2022-04-21 18:50:51][INFO] Training dataset for epoch 3/10...
[2022-04-21 18:53:13][INFO] Training dataset for epoch 4/10...
[2022-04-21 18:55:34][INFO] Training dataset for epoch 5/10...
[2022-04-21 18:57:52][INFO] Training dataset for epoch 6/10...
[2022-04-21 19:00:12][INFO] Training dataset for epoch 7/10...
[2022-04-21 19:02:42][INFO] Training dataset for epoch 8/10...
[2022-04-21 19:04:58][INFO] Training dataset for epoch 9/10...
[2022-04-2

