## Transformer Model for Summarizing Python Code

This notebook is intended to run on a Unix server with a single GPU and with PyTorch already configured for that machine.

The code below takes raw Python source as its input data.

Input Features:
- Method bodies
- Method declarations

Output: short natural-language summaries of the code

Note: punctuation is not stripped from the input, so that its effect on the summaries can be measured.
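
The notebook does not show how raw methods become lines of `data/src_*.txt`. The cell below is a minimal sketch, assuming one method per line with whitespace-separated tokens. It uses Python's standard `tokenize` module so that punctuation such as `(`, `)`, `:` and `,` survives as separate tokens. The function name `method_to_src_line` is hypothetical, and dropping comments is an assumption made here, not a documented step of this project.

```python
import io
import tokenize

# Token types that carry layout rather than content. Comments are dropped
# too (an assumption of this sketch); punctuation (OP tokens) is kept.
_SKIP = {
    tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def method_to_src_line(method_source: str) -> str:
    """Hypothetical helper: flatten one method (declaration + body) into a
    single line of space-separated tokens, keeping punctuation tokens."""
    tokens = []
    readline = io.StringIO(method_source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in _SKIP:
            continue
        # Collapse whitespace inside a token (e.g. multi-line strings) so
        # each method stays on exactly one line of the source file.
        tokens.append(" ".join(tok.string.split()))
    return " ".join(tokens)


if __name__ == "__main__":
    example = "def add(a, b):\n    return a + b\n"
    print(method_to_src_line(example))  # prints: def add ( a , b ) : return a + b
```

Each such line would go into a source file, with the matching summary on the same line number of the corresponding target file.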

In [None]:
# Step 1: change into the OpenNMT-py directory; the commands below call its scripts by relative path
%cd OpenNMT-py

# Step 2: install the additional packages OpenNMT-py needs
!pip3 install -r requirements.txt

# Step 3: install PyTorch (skip this if it is already installed on the server)
!pip3 install torch

In [None]:
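# Check that PyTorch can see a GPU, then print its name
import torch
assert torch.cuda.is_available(), "PyTorch cannot see a CUDA-capable GPU"
torch.cuda.get_device_name(0)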

In [None]:
# Preprocess the parallel source/target files into the format OpenNMT-py expects
!python3 preprocess.py \
-train_src data/src_train.txt \
-train_tgt data/tgt_train.txt \
-valid_src data/src_val.txt \
-valid_tgt data/tgt_val.txt \
-save_data data/prepped
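
`preprocess.py` pairs line *i* of each source file with line *i* of the matching target file, so the two files of every split must have the same number of lines. The sketch below checks that, assuming the `data/src_*.txt` / `data/tgt_*.txt` naming used above. It also counts over-long lines, because `preprocess.py` filters pairs by length (`-src_seq_length`, `-tgt_seq_length`; the default is 50 in the OpenNMT-py 1.x releases, but check `python3 preprocess.py -h` for your version), and tokenized method bodies can easily exceed that. The helper names are hypothetical.

```python
from pathlib import Path


def count_lines(path):
    """Number of lines in a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_parallel(src_path, tgt_path):
    """Return the shared line count, or raise if source and target differ."""
    n_src, n_tgt = count_lines(src_path), count_lines(tgt_path)
    if n_src != n_tgt:
        raise ValueError(f"{src_path} has {n_src} lines but {tgt_path} has {n_tgt}")
    return n_src


def count_longer_than(path, limit=50):
    """Count lines with more than `limit` whitespace-separated tokens."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if len(line.split()) > limit)


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        src, tgt = f"data/src_{split}.txt", f"data/tgt_{split}.txt"
        if Path(src).exists() and Path(tgt).exists():
            n = check_parallel(src, tgt)
            print(f"{split}: {n} pairs, {count_longer_than(src)} source lines over 50 tokens")
```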

In [None]:
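# Train a 6-layer Transformer encoder-decoder on the preprocessed data, on one GPU (-world_size 1, -gpu_ranks 0).
# With -batch_type tokens, -batch_size counts tokens rather than examples, so a value of 40 gives very small batches.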
!python3 train.py \
    -data data/prepped \
    -save_model model/python_summ06022019 \
    -keep_checkpoint 2 \
    -layers 6 \
    -rnn_size 512 \
    -word_vec_size 512 \
    -transformer_ff 2048 \
    -heads 8 \
    -encoder_type transformer \
    -decoder_type transformer \
    -position_encoding \
    -max_generator_batches 0 \
    -dropout 0.1 \
    -batch_size 40 \
    -batch_type tokens \
    -normalization tokens \
    -accum_count 2 \
    -optim adam \
    -adam_beta2 0.998 \
    -decay_method noam \
    -warmup_steps 8000 \
    -learning_rate 0.2 \
    -max_grad_norm 0 \
    -param_init 0  \
    -param_init_glorot \
    -label_smoothing 0.1 \
    -valid_steps 100000 \
    -save_checkpoint_steps 1000 \
    -world_size 1 \
    -gpu_ranks 0
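
The flags `-decay_method noam`, `-learning_rate 0.2`, `-warmup_steps 8000` and `-rnn_size 512` together define a warm-up-then-decay learning-rate schedule. As we understand OpenNMT-py's implementation, the rate at optimizer step $s$ is

$$
\text{lr}(s) = \text{learning\_rate} \cdot d_{\text{model}}^{-1/2} \cdot \min\left(s^{-1/2},\; s \cdot \text{warmup}^{-3/2}\right),
$$

which rises linearly for the first 8000 steps and then decays as $s^{-1/2}$. The sketch below evaluates it with this notebook's values; `noam_lr` is a hypothetical name, not an OpenNMT-py function.

```python
def noam_lr(step, base_lr=0.2, model_size=512, warmup_steps=8000):
    """Noam learning rate at a given optimizer step (defaults match the
    -learning_rate, -rnn_size and -warmup_steps flags of the training cell)."""
    step = max(step, 1)  # guard against 0 ** -0.5
    return base_lr * model_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    for step in (1000, 4000, 8000, 20000, 100000):
        print(f"step {step:>6}: lr = {noam_lr(step):.2e}")
```

The peak, reached at step 8000, is $0.2 / \sqrt{512 \cdot 8000} \approx 9.9 \times 10^{-5}$. Because `-accum_count 2` accumulates gradients over two batches before each update, `step` here counts optimizer updates rather than batches.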

In [None]:
# Summarize the test set with the step-99000 checkpoint; -gpu 0 runs decoding on the GPU
!python3 translate.py \
    -model model/python_summ06022019_step_99000.pt \
    -src data/src_test.txt \
    -tgt data/tgt_test.txt \
    -output pred.txt \
    -report_time \
    -gpu 0
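
`translate.py` writes one predicted summary per line of `pred.txt`, aligned with `data/tgt_test.txt`. One common way to score such output is corpus-level BLEU. The sketch below is a plain, self-contained implementation (single reference per line, uniform 1- to 4-gram weights, no smoothing). It is not necessarily the metric used for this project, and because tokenization and smoothing differ, its scores will not exactly match tools such as sacreBLEU.

```python
import math
import os
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU in [0, 1] with clipped n-gram precision and brevity penalty."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len, ref_len = hyp_len + len(h), ref_len + len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]


if __name__ == "__main__":
    # Paths are relative to the OpenNMT-py directory entered in the first cell.
    if os.path.exists("pred.txt") and os.path.exists("data/tgt_test.txt"):
        preds, refs = read_lines("pred.txt"), read_lines("data/tgt_test.txt")
        if len(preds) != len(refs):
            raise ValueError(f"{len(preds)} predictions vs {len(refs)} references")
        print(f"BLEU = {100 * corpus_bleu(preds, refs):.2f}")
```

Since punctuation is kept in this setup, it is worth scoring the same predictions with and without punctuation tokens when measuring its effect on the summaries.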