In this notebook, we will generate the test cases using both the phi base model and the fine-tuned model.


In [1]:
%load_ext autoreload
%autoreload 2

## Load data


In [2]:
import json

with open('../data/code_alpaca_v2.json', encoding='utf-8') as fin:
    alpaca_data = json.load(fin)

# hack: replace the original output with phi_base_output -> this will generate test cases for the 
# program generated by the base model, not the groundtruth program
new_tasks = [{'instruction': t['instruction'], 'output': t['phi_base_output']} for t in alpaca_data]

## Load models

Load both the base model and the fine-tuned model.


In [3]:
import torch
import transformers
from peft import AutoPeftModelForCausalLM
#from laughing import phi15

tokenizer = transformers.AutoTokenizer.from_pretrained(
    'microsoft/phi-1_5',
    model_max_length=1024,
    padding_side="left",
    use_fast=False,
    torch_dtype="auto"
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

peft_model = AutoPeftModelForCausalLM.from_pretrained('../phi15_finetuned_outputs_20230928/checkpoint-2000').cuda()

  from .autonotebook import tqdm as notebook_tqdm
2023-10-04 00:06:37.935390: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-04 00:06:38.993337: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2018.3.222/linux/mpi/mic/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64/:/usr/local/cuda/lib64:/usr/l

In [4]:
def make_prompt(task):
    """Make the prompt for a given programming task. The prompt can be used in prediction mode."""
    # We could give the input just the instruction and output, but the prompt may help as well
    prompt = "Problem:\n"
    prompt += task['instruction'] + "\n"
    prompt += "Solution:\n"
    prompt += task['output'] + "\n\n"
    prompt += '"""Generate unit tests for the below problem and its solution in Python.\n'
    prompt += '   Write each unit test as a seperate Python function with meaningful name that starts with test_"""\n'
    return prompt

@torch.inference_mode()
def gen_test_cases(model, tokenizer, tasks, max_new_tokens:int=300):
    """Generate test cases for a batch of tasks using a model."""
    prompts = [make_prompt(task) for task in tasks]
    # Attention mask should be used with padding for batch generation to work correctly.
    # Otherwise, a large batch size than training could expose the model to longer
    # sequences in prediction compared to training.
    inputs = tokenizer(prompts, return_tensors="pt", return_attention_mask=True,
                       padding=True, truncation=True).to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             eos_token_id=tokenizer.eos_token_id)
    texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return texts


In [8]:
from laughing import utils
utils.free_gpu_memory()

Allocated GPU Memory: 5.30 GB
Maximum Allocated GPU Memory: 13.90 GB
Available GPU Memory: 10.49 GB


In [6]:
res  = gen_test_cases(peft_model, tokenizer, new_tasks[0:10])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [7]:
for x in res:
    print(x)
    print("\n========================================")

Problem:
Fix the following function to make it comply with PEP8 standards.
Solution:
def fibonacci(n):        if n <= 1:            return n        else:            return fibonacci(n-1) + fibonacci(n-2)

"""Generate unit tests for the below problem and its solution in Python.     Write each unit test as a seperate Python function with meaningful name that starts with test_"""

def test_fibonacci_with_positive_n():      assert fibonacci(5) == 5

def test_fibonacci_with_negative_n():      assert fibonacci(-5) == -5

def test_fibonacci_with_zero():      assert fibonacci(0) == 0

def test_fibonacci_with_one():      assert fibonacci(1) == 1

def test_fibonacci_with_two():      assert fibonacci(2) == 1

def test_fibonacci_with_three():      assert fibonacci(3) == 2

def test_fibonacci_with_four():      assert fibonacci(4) == 3

def test_fibonacci_with_five():      assert fibonacci(5) == 5

def test_fibonacci_with_six():      assert fibonacci(6) == 8

def test_fibonacci_with_seven():      as

## Generate unit tests with the fine-tuned model

Note:
- The model was fine-tuned on data where each example include the `instruction`, `output` (original program output from Alpaca), and `test cases` from GPT4.
- In this step, we use this model to generate test cases given the `instruction` and the `output` by the base Phi1.5 model.

In [9]:
from tqdm import tqdm

batch_size = 8
responses = []

pbar = tqdm(total=len(new_tasks))
for i in range(len(responses), len(new_tasks), batch_size):
    responses.extend(gen_test_cases(peft_model, tokenizer, new_tasks[i:i+batch_size]))
    pbar.update(batch_size)

  0%|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | 0/3297 [00:00<?, ?it/s]S

  0%|██▎                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | 8/3297 [00:14<1:41:28,  1.85s/it]S

In [None]:
for response, example in zip(responses, alpaca_data):
    example['phi_finetuned_test_cases_base_code'] = response
    
with open('../data/code_alpaca_v3.json', encoding='utf-8', mode='w') as fout:
    json.dump(alpaca_data, fout)

## Generate unit tests with base model

In [None]:
utils.free_gpu_memory()

Allocated GPU Memory: 2.66 GB
Maximum Allocated GPU Memory: 10.93 GB
Available GPU Memory: 13.12 GB


In [None]:
model = peft_model.get_base_model()

In [None]:
from tqdm import tqdm

batch_size = 8
responses = []
 
pbar = tqdm(total=len(new_tasks))
for i in range(len(responses), len(new_tasks), batch_size):
    responses.extend(gen_test_cases(model, tokenizer, new_tasks[i:i+batch_size]))
    pbar.update(batch_size)

3300it [47:21,  1.16it/s]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:5

In [None]:
for response, example in zip(responses, alpaca_data):
    example['phi_base_test_cases_base_code'] = response
    
with open('../data/code_alpaca_v4.json', encoding='utf-8', mode='w') as fout:
    json.dump(alpaca_data, fout)