# 02. Joern Processing

In this notebook, the [previously created](./01_pre_processing.ipynb) dataset is parsed using several versions of Joern. The parsed data is then imported or converted into a Neo4J v3 database for further processing.

## 02.a. Imports, logging configuration and dataset preparation

The first step is to perform the necessary imports and configure logging. Additionally, the dataset from the previous notebook is copied into three separate datasets, one for each Joern version.

In [1]:
# Specific instruction to run the notebooks from a sub-folder.
import sys
sys.path.append("..")

In [2]:
import logging
from tools.settings import LOGGER
from tools.dataset import CWEClassificationDataset as Dataset
from tools.dataset.processing.dataset_ops import CopyDataset, RightFixer
from tools.joern.v031 import JoernDatasetProcessing as Joern031DatasetProcessing
from tools.joern.v040 import JoernDatasetProcessing as Joern040DatasetProcessing
from tools.neo4j.converter import Neo4J2Converter, Neo4J3Converter
from tools.neo4j.importer import Neo4J3Importer
from tools.neo4j.annot import Neo4JAnnotations

In [3]:
# Set up logging to output messages of level INFO and above
LOGGER.setLevel(logging.INFO)

In [4]:
# Dataset directories (DO NOT EDIT)
cwe121_dataset_path = "../data/cwe121_v000_orig"
cwe121_v100_dataset_path = "../data/cwe121_v100"
cwe121_v200_dataset_path = "../data/cwe121_v200"
cwe121_v300_dataset_path = "../data/cwe121_v300"

# Number of samples to test (edit this number; performance will be impacted, max. 6288)
sample_nb = 200

In [5]:
# Copy the dataset into 3 separate datasets for future use.
cwe121_dataset = Dataset(cwe121_dataset_path)
cwe121_dataset.queue_operation(CopyDataset, {"to_path": cwe121_v100_dataset_path, "force": True})
cwe121_dataset.queue_operation(CopyDataset, {"to_path": cwe121_v200_dataset_path, "force": True})
cwe121_dataset.queue_operation(CopyDataset, {"to_path": cwe121_v300_dataset_path, "force": True})

cwe121_dataset.process()

[2019-12-09 17:48:08][INFO] Dataset index build in 11ms. 200 test_cases, 2 classes, 0 features (v0).
[2019-12-09 17:48:08][INFO] Running operation 1/3 (CopyDataset)...
[2019-12-09 17:48:08][INFO] Dataset index build in 9ms. 200 test_cases, 2 classes, 0 features (v0).
[2019-12-09 17:48:08][INFO] Running operation 1/1 (RightFixer)...
[2019-12-09 17:48:09][INFO] 1 operations run in 843ms.
[2019-12-09 17:48:09][INFO] Running operation 2/3 (CopyDataset)...
[2019-12-09 17:48:09][INFO] Dataset index build in 14ms. 200 test_cases, 2 classes, 0 features (v0).
[2019-12-09 17:48:09][INFO] Running operation 1/1 (RightFixer)...
[2019-12-09 17:48:10][INFO] 1 operations run in 866ms.
[2019-12-09 17:48:10][INFO] Running operation 3/3 (CopyDataset)...
[2019-12-09 17:48:10][INFO] Dataset index build in 8ms. 200 test_cases, 2 classes, 0 features (v0).
[2019-12-09 17:48:10][INFO] Running operation 1/1 (RightFixer)...
[2019-12-09 17:48:11][INFO] 1 operations run in 924ms.
[2019-12-09 17:48:11][INFO] 3 oper
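
The cell above queues three operations and only runs them when `process()` is called. The sketch below illustrates that queue-then-process pattern in isolation. It is not the `tools.dataset` implementation: `MiniDataset` and `Describe` are hypothetical names introduced for this example, and the way keyword arguments are passed to the operation is an assumption.

```python
# Hypothetical sketch of a queue-then-process pattern.
# MiniDataset and Describe are illustrative names, NOT part of the tools package.
import time


class MiniDataset:
    """Holds a path and a list of pending (operation class, kwargs) pairs."""

    def __init__(self, path):
        self.path = path
        self.queue = []

    def queue_operation(self, operation_class, kwargs=None):
        # Nothing runs here: the operation is only recorded for later.
        self.queue.append((operation_class, kwargs or {}))

    def process(self):
        # Run the queued operations in order, then empty the queue.
        results = []
        total = len(self.queue)
        start = time.perf_counter()
        for index, (operation_class, kwargs) in enumerate(self.queue, start=1):
            print(f"Running operation {index}/{total} ({operation_class.__name__})...")
            results.append(operation_class(self, **kwargs).run())
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{total} operations run in {elapsed_ms:.0f}ms.")
        self.queue.clear()
        return results


class Describe:
    """Toy operation: reports where the dataset would be copied (no file I/O)."""

    def __init__(self, dataset, to_path, force=False):
        self.dataset = dataset
        self.to_path = to_path
        self.force = force

    def run(self):
        return f"{self.dataset.path} -> {self.to_path} (force={self.force})"


dataset = MiniDataset("../data/cwe121_v000_orig")
dataset.queue_operation(Describe, {"to_path": "../data/cwe121_v100", "force": True})
dataset.queue_operation(Describe, {"to_path": "../data/cwe121_v200", "force": True})
results = dataset.process()
```

Deferring execution this way lets a whole pipeline be declared first and then run and timed as a single unit, which matches the "Running operation i/n" lines in the log.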

## 02.b. Joern v0.3.1

In [6]:
# Build the dataset that is going to be used
cwe121_v100_dataset = Dataset(cwe121_v100_dataset_path)

# Apply Joern v0.3.1 processing and convert the result into Neo4J v3
cwe121_v100_dataset.queue_operation(Joern031DatasetProcessing)
cwe121_v100_dataset.queue_operation(Neo4J2Converter)
cwe121_v100_dataset.queue_operation(RightFixer, {"command_args": "neo4j_v3.db 101 101"})
cwe121_v100_dataset.queue_operation(Neo4J3Converter)
cwe121_v100_dataset.queue_operation(Neo4JAnnotations)

cwe121_v100_dataset.process()

[2019-12-09 17:48:11][INFO] Dataset index build in 14ms. 200 test_cases, 2 classes, 0 features (v0).
[2019-12-09 17:48:11][INFO] Running operation 1/5 (JoernDatasetProcessing)...
[2019-12-09 17:48:19][INFO] Running operation 2/5 (Neo4J2Converter)...
[2019-12-09 17:48:26][INFO] Running operation 3/5 (RightFixer)...
[2019-12-09 17:48:27][INFO] Running operation 4/5 (Neo4J3Converter)...
[2019-12-09 17:48:41][INFO] Running operation 5/5 (Neo4JAnnotations)...
[2019-12-09 17:48:51][INFO] Running commands...
[2019-12-09 17:48:53][INFO] Command 1 out of 8 run in 2027ms
[2019-12-09 17:48:53][INFO] Command 2 out of 8 run in 98ms
[2019-12-09 17:48:53][INFO] Command 3 out of 8 run in 26ms
[2019-12-09 17:48:55][INFO] Command 4 out of 8 run in 1729ms
[2019-12-09 17:48:55][INFO] Command 5 out of 8 run in 424ms
[2019-12-09 17:48:56][INFO] Command 6 out of 8 run in 388ms
[2019-12-09 17:48:56][INFO] Command 7 out of 8 run in 111ms
[2019-12-09 17:48:56][INFO] Command 8 out of 8 run in 797ms
[2019-12-09 1
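
To see where the time goes in this pipeline, the start timestamps of consecutive "Running operation" lines can be subtracted. The snippet below is a small sketch that does this for the log lines shown above. The helper names are hypothetical. Since the timestamps have one-second resolution, the results are approximate, and the last operation is left out because its end time is not visible in the captured output.

```python
import re
from datetime import datetime

# Log lines copied from the output of the cell above.
LOG = """\
[2019-12-09 17:48:11][INFO] Running operation 1/5 (JoernDatasetProcessing)...
[2019-12-09 17:48:19][INFO] Running operation 2/5 (Neo4J2Converter)...
[2019-12-09 17:48:26][INFO] Running operation 3/5 (RightFixer)...
[2019-12-09 17:48:27][INFO] Running operation 4/5 (Neo4J3Converter)...
[2019-12-09 17:48:41][INFO] Running operation 5/5 (Neo4JAnnotations)...
"""

PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[INFO\] Running operation \d+/\d+ \((?P<name>\w+)\)"
)


def operation_starts(log_text):
    """Return (operation name, start time) pairs in log order."""
    starts = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
            starts.append((match["name"], ts))
    return starts


def operation_durations(log_text):
    """Approximate each operation's duration as the gap to the next start."""
    starts = operation_starts(log_text)
    return [
        (name, int((next_ts - ts).total_seconds()))
        for (name, ts), (_, next_ts) in zip(starts, starts[1:])
    ]


durations = operation_durations(LOG)
for name, seconds in durations:
    print(f"{name}: ~{seconds}s")
```

On this run of 200 test cases, the `Neo4J3Converter` step accounts for the largest gap (about 14 seconds), followed by the Joern parsing itself (about 8 seconds).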

## 02.c. Joern v0.4.0

In [7]:
# Build the dataset that is going to be used
cwe121_v200_dataset = Dataset(cwe121_v200_dataset_path)

# Apply Joern v0.4.0 processing and import the result into Neo4J v3
cwe121_v200_dataset.queue_operation(Joern040DatasetProcessing)
cwe121_v200_dataset.queue_operation(Neo4J3Importer)
cwe121_v200_dataset.queue_operation(RightFixer, {"command_args": "neo4j_v3.db 101 101"})
cwe121_v200_dataset.queue_operation(Neo4JAnnotations)


cwe121_v200_dataset.process()

[2019-12-09 17:48:57][INFO] Dataset index build in 10ms. 200 test_cases, 2 classes, 0 features (v0).
[2019-12-09 17:48:57][INFO] Running operation 1/4 (JoernDatasetProcessing)...
[2019-12-09 17:49:04][INFO] Running operation 2/4 (Neo4J3Importer)...
[2019-12-09 17:49:24][INFO] Running operation 3/4 (RightFixer)...
[2019-12-09 17:49:25][INFO] Running operation 4/4 (Neo4JAnnotations)...
[2019-12-09 17:49:35][INFO] Running commands...
[2019-12-09 17:49:36][INFO] Command 1 out of 8 run in 1478ms
[2019-12-09 17:49:37][INFO] Command 2 out of 8 run in 82ms
[2019-12-09 17:49:37][INFO] Command 3 out of 8 run in 69ms
[2019-12-09 17:49:38][INFO] Command 4 out of 8 run in 1595ms
[2019-12-09 17:49:39][INFO] Command 5 out of 8 run in 471ms
[2019-12-09 17:49:39][INFO] Command 6 out of 8 run in 438ms
[2019-12-09 17:49:39][INFO] Command 7 out of 8 run in 261ms
[2019-12-09 17:49:40][INFO] Command 8 out of 8 run in 836ms
[2019-12-09 17:49:40][INFO] Database annotated.
[2019-12-09 17:49:41][INFO] 4 operati
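
Both pipelines end with the same eight annotation commands, so their logged timings can be compared directly. The sketch below sums the "Command k out of 8 run in Nms" values from the two logs above. The function name is hypothetical, and with a single run per version, small differences may well be noise.

```python
import re

# Annotation command lines copied from the two logs above (timestamps omitted).
V031_LOG = """\
Command 1 out of 8 run in 2027ms
Command 2 out of 8 run in 98ms
Command 3 out of 8 run in 26ms
Command 4 out of 8 run in 1729ms
Command 5 out of 8 run in 424ms
Command 6 out of 8 run in 388ms
Command 7 out of 8 run in 111ms
Command 8 out of 8 run in 797ms
"""

V040_LOG = """\
Command 1 out of 8 run in 1478ms
Command 2 out of 8 run in 82ms
Command 3 out of 8 run in 69ms
Command 4 out of 8 run in 1595ms
Command 5 out of 8 run in 471ms
Command 6 out of 8 run in 438ms
Command 7 out of 8 run in 261ms
Command 8 out of 8 run in 836ms
"""

COMMAND_PATTERN = re.compile(r"Command \d+ out of \d+ run in (\d+)ms")


def command_total_ms(log_text):
    """Sum the durations of all annotation commands found in the log text."""
    return sum(int(ms) for ms in COMMAND_PATTERN.findall(log_text))


total_v031 = command_total_ms(V031_LOG)
total_v040 = command_total_ms(V040_LOG)
print(f"Joern v0.3.1 annotations: {total_v031}ms")
print(f"Joern v0.4.0 annotations: {total_v040}ms")
```

For this run, the totals are 5600ms for the v0.3.1 database and 5230ms for the v0.4.0 database.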

## 02.d. Joern v1.0.62

*Currently in development. Coming soon™...*

In [8]:
# Reserved space for future Joern version
# cwe121_v300_dataset = Dataset(cwe121_v300_dataset_path)
# ...

## Conclusion

In this notebook, the cleaned dataset was parsed using several Joern versions and is now ready for further processing in Neo4J. The [next notebook](./03_neo4j_processing.ipynb) details the steps that continue the processing before feature extraction.