# 05. Model training

In the previous notebooks, the datasets were curated and several features were extracted to train various machine learning models. In this notebook, the models are initialized and trained on these datasets. Each dataset is split 80/20 into training and testing sets.
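How the 80/20 split is performed inside the training operations is not shown in this notebook. As a rough illustration only, the sketch below shuffles a list of items and cuts it at 80%. The helper `split_train_test`, the `case_NNN` names, and the fixed seed are assumptions made for this example and are not part of `bugfinder`.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists.

    Hypothetical helper for illustration; not the bugfinder implementation.
    """
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 200 placeholder test-case names (made up for the example).
test_cases = [f"case_{i:03d}" for i in range(200)]
train, test = split_train_test(test_cases)
print(len(train), len(test))
```

With 200 items, this yields 160 training items and 40 testing items. A plain shuffle does not guarantee that both classes keep the same proportions in each subset; a stratified split would be needed for that.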

## 05.a. Imports, logging configuration and dataset preparation

The first step is to perform the necessary imports and configure the program.

In [10]:
# Enable these lines if live changes are made to the codebase
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [11]:
# Disable tensorflow logging
import os
import logging
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
logging.getLogger('tensorflow').setLevel(logging.FATAL)

In [12]:
# Specific instruction to run the notebooks from a sub-folder.
import sys
sys.path.append("..")

In [13]:
import logging
from bugfinder.settings import LOGGER
from bugfinder.dataset import CWEClassificationDataset as Dataset
from bugfinder.models.dnn_classifier import DNNClassifierTraining
from bugfinder.models.linear_classifier import LinearClassifierTraining


In [14]:
# Set up logging to output messages at INFO level and above
LOGGER.setLevel(logging.INFO)
LOGGER.propagate = False

In [15]:
# Dataset directories (DO NOT EDIT)
cwe121_v__0_dataset_path = [
    "../data/cwe121_v110", "../data/cwe121_v120", "../data/cwe121_v210", "../data/cwe121_v220", 
#     "../data/cwe121_v310", "../data/cwe121_v320"
]
cwe121_v__1_dataset_path = [
    "../data/cwe121_v111", "../data/cwe121_v121", "../data/cwe121_v211", "../data/cwe121_v221", 
#     "../data/cwe121_v311", "../data/cwe121_v321"
]
cwe121_v__2_dataset_path = [
    "../data/cwe121_v112", "../data/cwe121_v122", "../data/cwe121_v212", "../data/cwe121_v222", 
#     "../data/cwe121_v312", "../data/cwe121_v322"
]
cwe121_v__3_dataset_path = [
    "../data/cwe121_v113", "../data/cwe121_v123", "../data/cwe121_v213", "../data/cwe121_v223", 
#     "../data/cwe121_v313", "../data/cwe121_v323"
]
# cwe121_v__4_dataset_path = [
#     "../data/cwe121_v114", "../data/cwe121_v124", "../data/cwe121_v214", "../data/cwe121_v224", 
#     "../data/cwe121_v314", "../data/cwe121_v324"
# ]

dataset_to_copy = [
#     cwe121_v__1_dataset_path, cwe121_v__2_dataset_path, cwe121_v__3_dataset_path, cwe121_v__4_dataset_path
    cwe121_v__1_dataset_path, cwe121_v__2_dataset_path, cwe121_v__3_dataset_path
]

## 05.b. Linear Classifier

In [18]:
for dataset_path in cwe121_v__1_dataset_path[:1]:
    LOGGER.info("Processing %s...", dataset_path)
    dataset = Dataset(dataset_path)
    dataset.queue_operation(LinearClassifierTraining, {"name": "lin-cls", "reset": True})
    dataset.process()

[2020-06-05 14:33:13][INFO] Processing ../data/cwe121_v111...
[2020-06-05 14:33:13][INFO] Dataset index build in 35ms. 200 test_cases, 2 classes, 52 features (v0).
[2020-06-05 14:33:13][INFO] Running operation 1/1 (LinearClassifierTraining)...
[2020-06-05 14:33:13][INFO] Removing ../data/cwe121_v111/models/lin-cls...
[2020-06-05 14:33:36][INFO] Precision: 71.401%; Recall: 68.841%; F-score: 69.698%
[2020-06-05 14:33:36][INFO] 1 operations run in 22489ms.
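
The log above reports precision, recall and F-score for a two-class problem. The notebook does not show how these values are computed or averaged across classes, so the sketch below is only one possible reading: it computes the three metrics per class and then takes their unweighted (macro) average. The function names and the toy labels are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Average the per-class metrics with equal weight for each class."""
    classes = sorted(set(y_true))
    per_class = [precision_recall_f1(y_true, y_pred, c) for c in classes]
    return tuple(sum(m[i] for m in per_class) / len(per_class) for i in range(3))


# Toy labels, made up for the example (0 = one class, 1 = the other).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
precision, recall, f1 = macro_average(y_true, y_pred)
print(f"Precision: {precision:.3%}; Recall: {recall:.3%}; F-score: {f1:.3%}")
```

With macro averaging, the averaged F-score is generally not the harmonic mean of the averaged precision and recall, which matches the pattern seen in the logged values.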


## 05.c. Multilayer Perceptron (default size)

In [None]:
for dataset_path in cwe121_v__1_dataset_path:
    LOGGER.info("Processing %s...", dataset_path)
    dataset = Dataset(dataset_path)
    dataset.queue_operation(DNNClassifierTraining, {"name": "dnn-default", "reset": True})
    dataset.process()

## 05.d. Multilayer Perceptron (configured size)

In [None]:
for dataset_path in cwe121_v__1_dataset_path:
    # The original message also formatted an epoch counter `e` that is never defined in this notebook.
    LOGGER.info("Processing %s...", dataset_path)
    dataset = Dataset(dataset_path)
    dataset.queue_operation(DNNClassifierTraining, {"name": "dnn-custom", "architecture": [25, 50, 75, 50, 25], 
                                                    "reset": True})