This is a re-implementation and demonstration of the Devign model for AI-based source code vulnerability detection.
The Devign model was run via Cygwin and Windows PowerShell.
The Devign model was released nearly two years ago, which meant that some tweaks were necessary before it would run.
Some prerequisites worked on their modern versions, but others have been updated since release and required a version downgrade for the model to work.
A few changes were necessary prior to running the model:
- datamanager.py: the pandas append functions have been deprecated and were replaced with concat functions. Functionality remains unchanged.
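As a minimal sketch of that change (the frame and column names here are illustrative, not the project's actual variables), the deprecated append call becomes a concat call:

```python
import pandas as pd

# Two example frames in the shape datamanager.py works with
# (column names are illustrative).
frame = pd.DataFrame({"input": ["int f(void) { return 0; }"], "target": [0]})
new_rows = pd.DataFrame({"input": ["void g(void) { }"], "target": [1]})

# Old (deprecated, removed in pandas 2.0):
#   frame = frame.append(new_rows, ignore_index=True)
# Replacement:
frame = pd.concat([frame, new_rows], ignore_index=True)
```

The result is identical to the old append call: a single frame with both rows and a fresh integer index.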
Devign uses this partial dataset released by the authors.
The data is a variety of example C code drawn from the commit history of the FFmpeg project.
An example piece of data looks like the following. (Some whitespace has been removed for brevity.)
static int process_frame(FFFrameSync *fs)
{
    AVFilterContext *ctx = fs->parent;
    LUT2Context *s = fs->opaque;
    AVFilterLink *outlink = ctx->outputs[0];
    AVFrame *out, *srcx, *srcy;
    int ret;
    if ((ret = ff_framesync2_get_frame(&s->fs, 0, &srcx, 0)) < 0 ||
        (ret = ff_framesync2_get_frame(&s->fs, 1, &srcy, 0)) < 0)
        return ret;
    if (ctx->is_disabled) {
        out = av_frame_clone(srcx);
        if (!out)
            return AVERROR(ENOMEM);
    } else {
        out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
        if (!out)
            return AVERROR(ENOMEM);
        av_frame_copy_props(out, srcx);
        s->lut2(s, out, srcx, srcy);
    }
    out->pts = av_rescale_q(s->fs.pts, s->fs.time_base, outlink->time_base);
    return ff_filter_frame(outlink, out);
}
The model is configured with the following layers:
- gated graph conv:
  - output size: 200
  - number of layers: 6
  - aggr: add
  - bias: true
- conv1d 1:
  - input size: 205
  - output size: 50
  - kernel size: 3
  - padding: 1
- conv1d 2:
  - input size: 50
  - output size: 20
  - kernel size: 1
  - padding: 1
- maxpool1d 1:
  - kernel size: 3
  - stride: 2
- maxpool1d 2:
  - kernel size: 2
  - stride: 2
- embedding:
  - input size: 101
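A sketch of the convolutional readout described by this list, in plain PyTorch. The gated graph convolution itself comes from torch_geometric and is omitted here; the batch and node counts are illustrative, and the 205-channel input is assumed to be the 200-dimensional graph-conv output concatenated with the original node features.

```python
import torch
import torch.nn as nn

# Conv/MaxPool readout matching the sizes listed above.
# Assumption: the 205 input channels are the 200-dim gated-graph-conv
# output concatenated with the initial node features.
readout = nn.Sequential(
    nn.Conv1d(205, 50, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=3, stride=2),
    nn.Conv1d(50, 20, kernel_size=1, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)

# (batch, channels, graph nodes) -- batch of 8, 64 nodes, both illustrative
x = torch.randn(8, 205, 64)
out = readout(x)
```

With these kernel/stride/padding values a 64-node input shrinks to 16 positions of 20 channels, which a final classifier head would then reduce to a single vulnerability score.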
The training process is run with the following values:
- learning rate: 1e-4
- weight decay: 1.3e-6
- loss lambda: 1.3e-6
- epochs: 100
- batch size: 8
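Assuming an Adam optimiser (an assumption on my part, not stated in this log) and a stand-in model, these values map onto a setup like:

```python
import torch

# Stand-in for the full Devign network, just to hold parameters.
model = torch.nn.Linear(205, 1)

# Assumption: Adam optimiser; "loss lambda" is assumed to weight an
# additional regularisation term added to the classification loss.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1.3e-6)
criterion = torch.nn.BCEWithLogitsLoss()
EPOCHS, BATCH_SIZE, LOSS_LAMBDA = 100, 8, 1.3e-6
```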
- Training Loss: 0.1771
- Training Accuracy: 0.2
- Validation Loss: 10.7016
- Validation Accuracy: 0.2308
The model has 515,542 trainable parameters.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 497 entries, 2 to 3330
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 input 497 non-null object
1 target 497 non-null int64
dtypes: int64(1), object(1)
memory usage: 27.2 KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 497 entries, 3331 to 6789
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 input 497 non-null object
1 target 497 non-null int64
dtypes: int64(1), object(1)
memory usage: 27.2 KB
Splitting Dataset
Epoch 1; - Train Loss: 0.9258; Acc: 0.16; - Validation Loss: 0.6975; Acc: 0.0769; - Time: 20.524963855743408
Epoch 2; - Train Loss: 0.6979; Acc: 0.2; - Validation Loss: 0.6942; Acc: 0.0; - Time: 39.477110385894775
Epoch 3; - Train Loss: 0.6962; Acc: 0.17; - Validation Loss: 0.6932; Acc: 0.0; - Time: 56.8951313495636
Epoch 4; - Train Loss: 0.6942; Acc: 0.09; - Validation Loss: 0.6928; Acc: 0.0; - Time: 75.01300835609436
Epoch 5; - Train Loss: 0.6935; Acc: 0.14; - Validation Loss: 0.6925; Acc: 0.0; - Time: 93.91109228134155
Epoch 6; - Train Loss: 0.6935; Acc: 0.13; - Validation Loss: 0.6925; Acc: 0.0; - Time: 111.09103393554688
Epoch 7; - Train Loss: 0.6924; Acc: 0.16; - Validation Loss: 0.6925; Acc: 0.0; - Time: 129.56305360794067
Epoch 8; - Train Loss: 0.692; Acc: 0.13; - Validation Loss: 0.6925; Acc: 0.0; - Time: 146.13589143753052
Epoch 9; - Train Loss: 0.692; Acc: 0.14; - Validation Loss: 0.6927; Acc: 0.0; - Time: 161.65720987319946
Epoch 10; - Train Loss: 0.6906; Acc: 0.1; - Validation Loss: 0.6929; Acc: 0.0; - Time: 177.7955515384674
Testing
0 1
1 1
2 0
3 1
4 1
..
96 1
97 1
98 1
99 1
100 1
Length: 101, dtype: int64
Confusion matrix:
[[19 30]
[12 40]]
TP: 40, FP: 30, TN: 19, FN: 12
Accuracy: 0.5841584158415841
Precision: 0.5714285714285714
Recall: 0.7692307692307693
F-measure: 0.6557377049180327
Precision-Recall AUC: 0.5661734637066375
AUC: 0.5953689167974883
MCC: 0.17011022247941873
Error: 94.39932166846525
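The headline metrics follow directly from the confusion-matrix counts; a small self-contained check using the standard definitions (not project code):

```python
import math

# Confusion-matrix counts reported above.
TP, FP, TN, FN = 40, 30, 19, 12

accuracy = (TP + TN) / (TP + FP + TN + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f_measure = 2 * precision * recall / (precision + recall)

# Matthews correlation coefficient from the same four counts.
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

print(accuracy)   # 0.5841584158415841
print(precision)  # 0.5714285714285714
print(recall)     # 0.7692307692307693
print(f_measure)  # ~0.6557
print(mcc)        # ~0.1701
```

These reproduce the Accuracy, Precision, Recall, F-measure, and MCC values above; the AUC figures additionally depend on the per-sample scores, which are not in this log.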