# Prompt Tuning 


# Prompt Tuning Tutorial

This Jupyter notebook is designed to introduce you to the concept of prompt tuning, a method used to adapt language models to specific tasks or datasets without the need for extensive retraining. We'll go through setting up our environment, preparing our data, loading a model, and applying prompt tuning techniques.

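Prompt tuning adapts a large pre-trained language model to a specific task without updating the model's own weights. Instead, a small set of trainable "virtual token" embeddings (a soft prompt) is prepended to every input and optimized on task data while the base model stays frozen. This differs from hand-written text prompts (hard prompts), which guide the model but are not learned.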

In this tutorial, we will use a pre-trained model from the `transformers` library and apply prompt tuning to it for a specific task. The aim is to show how prompt tuning can be effectively used to guide the model's predictions.
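
To make the idea concrete, here is a minimal, framework-free sketch of what a soft prompt does: a small matrix of trainable vectors is placed in front of the frozen token embeddings before they reach the model. The helper names `init_soft_prompt` and `prepend_soft_prompt` and the toy sizes are illustrative assumptions, not part of any library.

```python
import random


def init_soft_prompt(num_virtual_tokens, hidden_size, seed=0):
    """Create a toy soft prompt: one trainable vector per virtual token."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(num_virtual_tokens)]


def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Place the virtual-token vectors in front of the input embeddings."""
    return soft_prompt + token_embeddings


# Toy sizes (assumptions for illustration): 8 virtual tokens, hidden size 4.
soft_prompt = init_soft_prompt(num_virtual_tokens=8, hidden_size=4)
token_embeddings = [[0.0] * 4 for _ in range(5)]  # stand-in for 5 embedded input tokens

combined = prepend_soft_prompt(soft_prompt, token_embeddings)
print(len(combined))  # 8 virtual tokens + 5 input tokens
```

During training, only the vectors in `soft_prompt` would receive gradient updates; everything else stays fixed.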

The original paper, *The Power of Scale for Parameter-Efficient Prompt Tuning* (Lester et al., 2021), can be found [here](https://arxiv.org/pdf/2104.08691.pdf).

In [1]:
#pip install transformers --upgrade  # Upgrade the transformers library to the latest version


## Environment Setup

Before we start, we need the required Python libraries: `transformers` for pre-trained models and utilities, `protobuf` for data serialization, and `accelerate` for device placement. Later cells also import `peft`, `datasets`, and `huggingface_hub`, and 4-bit loading through `BitsAndBytesConfig` needs `bitsandbytes`, so install those too if they are missing. Uncomment any installation below if the library is not already available in your environment.

The code cells below install the necessary libraries. These installations are crucial for the subsequent parts of this tutorial.
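
Before uncommenting any install cell, it can help to see what is already present. The sketch below uses only the standard library; the package list is an assumption based on the imports that appear later in this notebook, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Package names are assumptions based on the imports used later in this notebook.
REQUIRED = ["transformers", "peft", "datasets", "accelerate", "bitsandbytes", "torch"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```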

In [None]:
#pip install torch==1.7.*  # Optional: Install a specific version of PyTorch if necessary

In [None]:
# pip install flash-attn  # Optional: Install flash attention for faster transformer computations

In [2]:
#pip install protobuf==3.20.*  # Install a specific version of protobuf required for compatibility

In [None]:
# pip install torchvision==0.14.0  # Optional: Install a specific version of torchvision if working with images

In [None]:
# pip install flash_attn --upgrade  

In [3]:
#pip install -U accelerate  # Install or upgrade accelerate

In [None]:
# !wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb -O /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \  
#   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-7_11.10.1.25-1_amd64.deb -O /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \  
#   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb -O /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \  
#   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-7_10.2.10.91-1_amd64.deb -O /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb && \  
#   dpkg -i /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \  
#   dpkg -i /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \  
#   dpkg -i /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \  
#   dpkg -i /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb  

In [None]:
#pip install torch==1.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html  

In [None]:
#pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116  

In [None]:
#pip install torchaudio  

In [None]:
#pip install ninja  

In [None]:
#pip install flash-attn --no-build-isolation  

In [None]:
# dbutils.library.restartPython()  

In [None]:
#!flash-attn --version  


## Data Preparation

In this section, we will prepare our dataset for the prompt tuning process. This involves loading the data, preprocessing it according to the requirements of our model, and setting up training and validation splits. Data preparation is a crucial step to ensure our model can learn effectively from our dataset.
    
- **Load Data**: Load your dataset from a file or a database.
- **Preprocess Data**: Clean and format your data to make it suitable for the model. This might involve tokenization, removing unnecessary parts of the data, or formatting it in a specific way.
- **Split Data**: Divide your dataset into training and validation sets to evaluate the model's performance.
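
The subsampling used in this notebook (shuffle with a fixed seed, then keep the first N rows) can be sketched in plain Python. This only illustrates the idea; `sample_split` is a hypothetical helper, and the `datasets` library uses its own random generator, so the selected rows will differ from what `shuffle(seed=42)` returns.

```python
import random


def sample_split(rows, size, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and keep the first `size` items."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:size]


# 56,355 is the size of the WikiSQL train split reported in the download log below.
full_train = list(range(56355))
train_subset = sample_split(full_train, 2000)
print(len(train_subset))
```

Fixing the seed makes the subset reproducible across runs, which keeps later comparisons between tuned and untuned models meaningful.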


In [None]:
  
!nvidia-smi  

Wed Oct 11 18:46:28 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A10G                    Off | 00000000:00:1E.0 Off |                    0 |
|  0%   23C    P8               9W / 300W |      4MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

In [None]:
import torch  
torch.cuda.empty_cache()   

In [None]:
import os  
  
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:16'  

In [None]:
# from datasets import load_dataset  
  
# dataset = load_dataset("wikisql")  
  
# # Take only 1k data  
# #dataset = dataset.select(range(2000))  

In [None]:
# from llama_attn_replace import replace_llama_attn  
# replace_llama_attn()  

In [None]:
from datasets import load_dataset, DatasetDict  
  
# Load the original dataset  
original_dataset = load_dataset("wikisql")  
  
# Define the sizes for the train, test, and validation splits  
train_size = 2000  
test_size = 500  
validation_size = 300  
  
# Create new datasets for train, test, and validation  
train_dataset = original_dataset["train"].shuffle(seed=42).select(range(train_size))
test_dataset = original_dataset["test"].shuffle(seed=42).select(range(test_size))
validation_dataset = original_dataset["validation"].shuffle(seed=42).select(range(validation_size))
  
# Create a new DatasetDict and assign the sampled datasets to their respective keys  
new_dataset_dict = DatasetDict({  
    "train": train_dataset,  
    "test": test_dataset,  
    "validation": validation_dataset,  
})  
  
# Print the sizes of the new datasets in the same format  
for split, split_dataset in new_dataset_dict.items():  
    print(f"{split}: Dataset({{ features: {list(split_dataset.features.keys())}, num_rows: {len(split_dataset)} }})")  
  



Downloading builder script:   0%|          | 0.00/6.57k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/2.76k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/7.80k [00:00<?, ?B/s]



Downloading data:   0%|          | 0.00/26.2M [00:00<?, ?B/s]

Generating test split:   0%|          | 0/15878 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/8421 [00:00<?, ? examples/s]

Generating train split:   0%|          | 0/56355 [00:00<?, ? examples/s]

train: Dataset({ features: ['phase', 'question', 'table', 'sql'], num_rows: 2000 })
test: Dataset({ features: ['phase', 'question', 'table', 'sql'], num_rows: 500 })
validation: Dataset({ features: ['phase', 'question', 'table', 'sql'], num_rows: 300 })


In [None]:
dataset=new_dataset_dict  


## Model Loading

Here, we load a pre-trained model and its tokenizer from the `transformers` library. The choice of model depends on the task and the language of your data. This tutorial uses `meta-llama/Llama-2-7b-hf`, a causal language model whose weights are gated on the Hugging Face Hub, which is why a `notebook_login()` call appears below.

- **Load Tokenizer**: Load the tokenizer that matches your chosen model. It converts text into the token IDs the model expects.
- **Load Model**: Load the pre-trained model. Its weights stay frozen; prompt tuning trains only the virtual-token embeddings added on top.


In [None]:
import torch  
"cuda" if torch.cuda.is_available() else "cpu"  

'cuda'

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup  
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType  
import torch  
from datasets import load_dataset  
import os  
from torch.utils.data import DataLoader  
from tqdm import tqdm  
  
device = "cuda"  
model_name_or_path = "meta-llama/Llama-2-7b-hf"  
tokenizer_name_or_path = "meta-llama/Llama-2-7b-hf"  
peft_config = PromptTuningConfig(  
    task_type=TaskType.CAUSAL_LM,  
    prompt_tuning_init=PromptTuningInit.TEXT,  
    num_virtual_tokens=8,  
    prompt_tuning_init_text="###Instruction Convert question in natural language to SQL",  
    tokenizer_name_or_path=model_name_or_path,  
)  
  
  
text_column = "question"  
label_column = "human_readable"  
max_length = 64  # Prompt + target are cut to 64 tokens; longer prompts lose their SQL target during preprocessing
lr = 3e-2  
num_epochs = 3  
batch_size = 8  

In [None]:
from huggingface_hub import notebook_login  
notebook_login()  

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)  
if tokenizer.pad_token_id is None:  
    tokenizer.pad_token_id = tokenizer.eos_token_id  

Downloading (…)okenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [None]:
tokenizer  

Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.


LlamaTokenizerFast(name_or_path='meta-llama/Llama-2-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

## Converting WikiSQL to an instruction dataset

Each example is rendered in the following format: `question : <question> \n ###Table Columns : <column:type list> \n ###SQL : <SQL to be generated>`

In [None]:
def fetch_table_context(table):
    # Render the schema as comma-separated "column:type" pairs
    header_type = [f"{header}:{typ}" for header, typ in zip(table['header'], table['types'])]
    return ",".join(header_type)


def preprocess_function(examples):
    batch_size = len(examples[text_column])
    # Prompt = question + table schema + an empty SQL slot for the model to complete
    inputs = [f"{text_column} : {x} \n ###Table Columns : {fetch_table_context(t)} \n ###SQL : " for x, t in zip(examples[text_column], examples['table'])]
    print(inputs[0])  # Sanity check: show the first prompt of each batch
    targets = [str(x[label_column]) + '\n' for x in examples["sql"]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets)
    # Append the target to the prompt and mask prompt positions with -100 so the loss ignores them
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    # Left-pad each example to max_length, then cut it to max_length from the right.
    # When prompt + target exceed max_length, the cut removes target tokens first.
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs  
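
The masking and padding logic above is easier to follow on a single toy example. The sketch below reproduces the same steps with plain lists; `build_example` and the token ids are made up for illustration. It also shows what happens when the prompt alone fills `max_length`: every label ends up as `-100`, so that example contributes no training signal.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def build_example(prompt_ids, target_ids, pad_id, max_length):
    """Concatenate prompt and target, mask the prompt, left-pad, then right-truncate."""
    input_ids = prompt_ids + target_ids + [pad_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids + [pad_id]
    attention_mask = [1] * len(input_ids)
    pad_len = max(0, max_length - len(input_ids))
    return {
        "input_ids": ([pad_id] * pad_len + input_ids)[:max_length],
        "attention_mask": ([0] * pad_len + attention_mask)[:max_length],
        "labels": ([IGNORE_INDEX] * pad_len + labels)[:max_length],
    }


# Short prompt: the target survives and is the only part that is scored.
short = build_example([1, 11, 12], [21, 22], pad_id=2, max_length=8)
print(short["labels"])

# Long prompt: truncation removes the whole target, leaving only ignored labels.
long_example = build_example([1, 11, 12, 13, 14, 15], [21, 22], pad_id=2, max_length=6)
print(long_example["labels"])
```

If many examples fall into the second case, raising `max_length` or truncating the prompt rather than the target are two options worth trying.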

In [None]:
train_processed_datasets = dataset['train'].map(  
    preprocess_function,  
    batched=True,  
    num_proc=1,  
    load_from_cache_file=False,  
    remove_columns=dataset["train"].column_names,  
    desc="Running tokenizer on dataset",  
)  
  
test_processed_datasets = dataset['test'].map(  
    preprocess_function,  
    batched=True,  
    num_proc=1,  
    load_from_cache_file=False,  
    remove_columns=dataset["test"].column_names,  
    desc="Running tokenizer on dataset",  
)  

Running tokenizer on dataset:   0%|          | 0/2000 [00:00<?, ? examples/s]

question : Which sum of week that had an attendance larger than 55,767 on September 28, 1986? 
 ###Table Columns : Week:real,Date:text,Opponent:text,Result:text,Attendance:real 
 ###SQL : 
question : What is the sum of losses for Geelong Amateur, with 0 byes? 
 ###Table Columns : Bellarine FL:text,Wins:real,Byes:real,Losses:real,Draws:real,Against:real 
 ###SQL : 


Running tokenizer on dataset:   0%|          | 0/500 [00:00<?, ? examples/s]

question : Name the incelandic of the glossary for 218 
 ###Table Columns : Word number:text,The Basque of the glossary:text,Modern Basque:text,The Icelandic of the glossary:text,English translation:text 
 ###SQL : 


In [None]:
dataset["train"][0]  

{'phase': 2,
 'question': 'Which sum of week that had an attendance larger than 55,767 on September 28, 1986?',
 'table': {'header': ['Week', 'Date', 'Opponent', 'Result', 'Attendance'],
  'page_title': '1986 Kansas City Chiefs season',
  'page_id': '12536732',
  'types': ['real', 'text', 'text', 'text', 'real'],
  'id': '2-12536732-1',
  'section_title': 'Schedule',
  'caption': 'Schedule',
  'rows': [['1',
    'September 7, 1986',
    'Cincinnati Bengals',
    'W 24–14',
    '43,430'],
   ['2', 'September 14, 1986', 'at Seattle Seahawks', 'L 23–17', '61,068'],
   ['3', 'September 21, 1986', 'Houston Oilers', 'W 27–13', '43,699'],
   ['4', 'September 28, 1986', 'at Buffalo Bills', 'W 20–17', '67,555'],
   ['5', 'October 5, 1986', 'Los Angeles Raiders', 'L 24–17', '74,430'],
   ['6', 'October 12, 1986', 'at Cleveland Browns', 'L 20–7', '71,278'],
   ['7', 'October 19, 1986', 'San Diego Chargers', 'W 42–41', '55,767'],
   ['8', 'October 26, 1986', 'Tampa Bay Buccaneers', 'W 27–20', '36,

In [None]:
train_processed_datasets['input_ids']  

[[1,
  1139,
  584,
  8449,
  2533,
  310,
  4723,
  393,
  750,
  385,
  14333,
  749,
  7200,
  1135,
  29871,
  29945,
  29945,
  29892,
  29955,
  29953,
  29955,
  373,
  3839,
  29871,
  29906,
  29947,
  29892,
  29871,
  29896,
  29929,
  29947,
  29953,
  29973,
  29871,
  13,
  835,
  3562,
  12481,
  29879,
  584,
  15511,
  29901,
  6370,
  29892,
  2539,
  29901,
  726,
  29892,
  29949,
  407,
  265,
  296,
  29901,
  726,
  29892,
  3591,
  29901,
  726,
  29892,
  4165,
  21642,
  29901,
  6370,
  29871],
 [1,
  1139,
  584,
  12317,
  1299,
  8519,
  6093,
  349,
  6992,
  9375,
  22659,
  29871,
  29945,
  29945,
  323,
  3960,
  2890,
  29973,
  29871,
  13,
  835,
  3562,
  12481,
  29879,
  584,
  5977,
  29901,
  726,
  29892,
  13454,
  287,
  29901,
  726,
  29892,
  24515,
  1233,
  29901,
  726,
  29892,
  29931,
  520,
  29901,
  726,
  29892,
  20325,
  363,
  29901,
  726,
  29892,
  20325,
  2750,
  29901,
  726,
  29892,
  29911,
  2722,
  363,
  29901,
 

In [None]:
token_ids = [token_id for sequence in train_processed_datasets['labels'] for token_id in sequence]

# Distinct label values (token IDs plus the -100 ignore index)
unique_token_ids = set(token_ids)

# Print the number of distinct values and the total number of label positions
print(len(unique_token_ids))
print(len(token_ids))

883
128000


In [None]:
token_ids = [token_id for sequence in train_processed_datasets['input_ids'] for token_id in sequence]

# Distinct token IDs in the training inputs
unique_token_ids = set(token_ids)

# Print the number of distinct token IDs and the total number of tokens
print(len(unique_token_ids))
print(len(token_ids))

5558
128000


In [None]:
token_ids = [token_id for sequence in test_processed_datasets['labels'] for token_id in sequence]

# Distinct label values (token IDs plus the -100 ignore index)
unique_token_ids = set(token_ids)

# Print the number of distinct values and the total number of label positions
print(len(unique_token_ids))
print(len(token_ids))

349
32000


In [None]:
token_ids = [token_id for sequence in test_processed_datasets['input_ids'] for token_id in sequence]

# Distinct token IDs in the test inputs
unique_token_ids = set(token_ids)

# Print the number of distinct token IDs
print(len(unique_token_ids))

2805


In [None]:
train_dataloader = DataLoader(  
    train_processed_datasets, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True  
)  
eval_dataloader = DataLoader(test_processed_datasets, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)  


## Prompt Tuning Implementation

Prompt tuning is a lightweight, parameter-efficient way to adapt a large language model to a specific task, often with a modest amount of task data. Rather than updating the model's weights, it learns a small number of continuous "virtual token" embeddings that are prepended to every input and steer the model's predictions.

- **Define Prompts**: Choose the number of virtual tokens and how to initialize them. Here they are initialized from the instruction text passed as `prompt_tuning_init_text`.
- **Tune Model with Prompts**: Train only the virtual-token embeddings on the task dataset while the base model stays frozen.
- **Evaluate**: Test the model's performance on a validation set to see how well it has adapted to the task with the help of the prompts.
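
The size of the trainable part follows directly from the configuration: one embedding vector per virtual token. The sketch below redoes that arithmetic; the hidden size of 4096 is an assumption taken from the public Llama-2-7B configuration, and the total parameter count is the figure reported by `print_trainable_parameters()` in this notebook.

```python
num_virtual_tokens = 8          # from PromptTuningConfig in this notebook
hidden_size = 4096              # assumption: Llama-2-7B embedding width
total_params = 6_738_448_384    # "all params" reported by print_trainable_parameters()

# Each virtual token owns one trainable vector of length hidden_size.
trainable_params = num_virtual_tokens * hidden_size
trainable_pct = 100 * trainable_params / total_params
print(f"trainable params: {trainable_params:,} ({trainable_pct:.6f}%)")
```

Doubling `num_virtual_tokens` doubles the trainable parameter count, which still leaves it a tiny fraction of the full model.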

In [None]:
from transformers import (  
    AutoModelForCausalLM,  
    AutoTokenizer,  
    BitsAndBytesConfig,  
    TrainingArguments,  
)  
  
bnb_config = BitsAndBytesConfig(  
    load_in_4bit=True,  
    bnb_4bit_quant_type="nf4",  
    bnb_4bit_compute_dtype=torch.float16,  
)  
  
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,trust_remote_code=True, device_map="auto",quantization_config=bnb_config, torch_dtype=torch.float16)  
model = get_peft_model(model, peft_config)  
model.print_trainable_parameters()  # Prints the summary itself and returns None, so no print() wrapper is needed

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

trainable params: 32,768 || all params: 6,738,448,384 || trainable%: 0.0004862840543203603


In [None]:
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  
lr_scheduler = get_linear_schedule_with_warmup(  
    optimizer=optimizer,  
    num_warmup_steps=0,  
    num_training_steps=(len(train_dataloader) * num_epochs),  
)  
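
For intuition about what the scheduler does, here is a plain-Python sketch of a linear schedule with warmup: the learning rate ramps up linearly during warmup, then decays linearly to zero. `linear_schedule_lr` is a hypothetical helper written for illustration, not the library function, and the step counts assume 2000 training examples, a batch size of 8, and 3 epochs as configured above.

```python
def linear_schedule_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


steps_per_epoch = 2000 // 8          # 250 batches per epoch
total_steps = steps_per_epoch * 3    # 3 epochs

# Learning rate at the start, midpoint, and end of training (no warmup).
for step in (0, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, 3e-2, 0, total_steps))
```

With `num_warmup_steps=0`, training starts at the full learning rate of `3e-2`, which is high compared with typical full fine-tuning but common for prompt tuning, where only a few thousand parameters are updated.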

In [None]:
#export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32  

In [None]:
device = "cuda"  

In [5]:
model = model.to(device)  
#Training the model  
for epoch in range(num_epochs):  
    model.train()  
    total_loss = 0  
    for step, batch in enumerate(tqdm(train_dataloader)):  
        batch = {k: v.to(device) for k, v in batch.items()}  
        outputs = model(**batch)  
        loss = outputs.loss  
        total_loss += loss.detach().float()  
        loss.backward()  
        optimizer.step()  
        lr_scheduler.step()  
        optimizer.zero_grad()  
  
    model.eval()  
    eval_loss = 0  
    eval_preds = []  
    for step, batch in enumerate(tqdm(eval_dataloader)):  
        batch = {k: v.to(device) for k, v in batch.items()}  
        with torch.no_grad():  
            outputs = model(**batch)  
        loss = outputs.loss  
        eval_loss += loss.detach().float()  
        eval_preds.extend(  
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)  
        )  
  
    eval_epoch_loss = eval_loss / len(eval_dataloader)  
    eval_ppl = torch.exp(eval_epoch_loss)  
    train_epoch_loss = total_loss / len(train_dataloader)  
    train_ppl = torch.exp(train_epoch_loss)  
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")  
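
The perplexity values printed by the loop are the exponential of the mean per-batch loss. A framework-free sketch of the same calculation, with made-up loss values, looks like this:

```python
import math


def perplexity(batch_losses):
    """Exponential of the mean cross-entropy loss over batches."""
    mean_loss = sum(batch_losses) / len(batch_losses)
    return math.exp(mean_loss)


# Made-up per-batch losses; their mean is 1.0, so the perplexity is e.
print(perplexity([2.0, 1.0, 0.0]))
```

Because the average is taken over batches rather than over tokens, batches with few unmasked label tokens weigh as much as batches with many, so treat the number as a rough progress indicator.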

In [None]:
dbutils.fs.mkdirs("save_model_path")  

True

In [None]:
model.save_pretrained("save_model_path")  


## Inference and Conclusion

In this tutorial, we've covered the basics of prompt tuning, from setting up our environment, preparing our data, loading a pre-trained model, to applying prompt tuning techniques. The aim was to demonstrate how prompt tuning can be utilized to adapt large language models to specific tasks with relatively little data.

Prompt tuning represents a powerful technique in the NLP toolkit, allowing for flexible and efficient model adaptation. We encourage you to experiment with different prompts and tasks to explore the full potential of this approach.


In [None]:
 
from huggingface_hub import notebook_login  
notebook_login()  
  

In [7]:
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup,BitsAndBytesConfig  
from peft import PeftModel,get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType, PeftConfig  
import torch  
  
device = "cuda"  
peft_model_id = "save_model_path"
model_name_or_path = "meta-llama/Llama-2-7b-hf"  
  
bnb_config = BitsAndBytesConfig(  
    load_in_4bit=True,  
    bnb_4bit_quant_type="nf4",  
    bnb_4bit_compute_dtype=torch.float16,  
)  
  
  
config = PeftConfig.from_pretrained(peft_model_id)  
org_model = AutoModelForCausalLM.from_pretrained(model_name_or_path,trust_remote_code=True, device_map="auto",quantization_config=bnb_config)  
model = PeftModel.from_pretrained(org_model, peft_model_id)  
model.eval()  
model.to(device)  

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)  
if tokenizer.pad_token_id is None:  
    tokenizer.pad_token_id = tokenizer.eos_token_id  

In [None]:
  
dataset['test'][40]['question'], fetch_table_context(dataset['test'][40]['table'])  

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File <command-2407677650979138>, line 1
----> 1 dataset['test'][40]['question'], fetch_table_context(dataset['test'][40]['table'])

NameError: name 'dataset' is not defined

In [None]:
stop_words_ids = [tokenizer.encode(stop_word) for stop_word in ["\n"]]  
  

In [None]:
def infer(model,input_text):  
    inputs = tokenizer(input_text,return_tensors="pt")  
    with torch.no_grad():  
        inputs = {k: v.to(device) for k, v in inputs.items()}  
        outputs = model.generate(  
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=40)  
        print(outputs)  
      
    return tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)  
  
def parse(text):  
    return text.split("###SQL :")[1]  
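
`parse` raises an `IndexError` when the marker is missing and returns everything the model generated after it, including any trailing lines. A more defensive variant is sketched below; `extract_sql` is a hypothetical helper, and the sample text is made up to show the behaviour.

```python
def extract_sql(generated_text, marker="###SQL :"):
    """Return the first non-empty line after the last marker, or "" if there is none."""
    _, found, tail = generated_text.rpartition(marker)
    if not found:
        return ""
    stripped = tail.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


# Made-up generation: the query is followed by an unwanted extra line.
sample = (
    "###question : What is the average score of students in maths \n"
    " ###Table Columns : id:INT,name:text,subject:Text,score:INT \n"
    " ###SQL : SELECT AVG(score) FROM table WHERE subject = 'maths'\n"
    "nobody\n"
)
print(extract_sql(sample))
```

Keeping only the first line mirrors the training targets, which end with a newline after the SQL query.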

In [None]:
# Note: the training prompts start with "question : " (no "###" prefix), unlike the prompt below.
input_text = """###question : What is the average score of students in maths   
 ###Table Columns : id:INT,name:text,subject:Text,score:INT  
 ###SQL : """  
   
predictions = infer(model,input_text)  
print(parse(predictions[0]))  



tensor([[    1,   835, 12470,   584,  1724,   338,   278,  6588,  8158,   310,
          8041,   297,  5844, 29879, 29871,    13,   835,  3562, 12481, 29879,
           584,  1178, 29901, 10192, 29892,   978, 29901,   726, 29892, 16009,
         29901,  1626, 29892, 13628, 29901, 10192,    13,   835,  4176,   584,
         29871,     1, 14262,  7228,    13, 26077,    13, 23196,  4345,  7228,
          4345,    13, 25145,    13, 23196,    13, 14332,    13, 26077,  4345,
          7228, 29905,    13, 26077,    13, 26077,  4345,    13, 26502,  4345,
          7228,  4345,    13, 23196,    13, 23196,    13, 19379,  4345,    13,
         27581]], device='cuda:0')
  февPA
 everybody
 nobodyMSPAMS
 kwiet
 nobody
 everyone
 everybodyMSPA\
 everybody
 everybodyMS
 вересняMSPAMS
 nobody
 nobody
 BegriffeMS
 hopefully


In [None]:
input_text = """###question : What is the highest score of dhoni in a match in chennai  
 ###Table Columns : id:INT,player:text,runs:INT,match:INT,year:INT,city:text  
 ###SQL : """  
   
predictions = infer(model,input_text)  
print(parse(predictions[0]))  

  sierpP. kwietPAMS
 hopefullyMS
 ultimately
 surelyMS
 HinweisMS
 nobody. броја 
 січняOPAMS 
 nobodyMS
 sierp. sierp
 paździer
 nobody


In [None]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    default_data_collator,
    get_linear_schedule_with_warmup,
)
from peft import PeftModel, get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType, PeftConfig
import torch
  
device = "cuda"  
peft_model_id = "save_model_path"
model_name_or_path = "meta-llama/Llama-2-7b-hf"  
  
bnb_config = BitsAndBytesConfig(  
    load_in_4bit=True,  
    bnb_4bit_quant_type="nf4",  
    bnb_4bit_compute_dtype=torch.float16,  
)  
  
model_untuned = AutoModelForCausalLM.from_pretrained(model_name_or_path,trust_remote_code=True, device_map="auto",quantization_config=bnb_config,use_flash_attention_2=True)  
# model = get_peft_model(model, peft_config)  
# print(model.print_trainable_parameters())  

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
input_text = """###Instruction Convert question in natural language to SQL  
###question : What is the average score of students in maths   
 ###Table Columns : id:INT,name:text,subject:Text,score:INT  
 ###SQL : """  
   
predictions = infer(model_untuned,input_text)  
print(parse(predictions[0]))  

tensor([[    1,   835,  3379,  4080, 14806,  1139,   297,  5613,  4086,   304,
          3758,    13,  2277, 29937, 12470,   584,  1724,   338,   278,  6588,
          8158,   310,  8041,   297,  5844, 29879, 29871,    13,   835,  3562,
         12481, 29879,   584,  1178, 29901, 10192, 29892,   978, 29901,   726,
         29892, 16009, 29901,  1626, 29892, 13628, 29901, 10192,    13,   835,
          4176,   584, 29871,    13, 29871, 29914, 29991, 29871,   306, 29892,
            13,  8778,   313,    13,   435, 29892,   313, 29896, 29933, 29874,
         29871, 29871, 29892,    13,  2648, 29906, 29889,  8778,    13, 29871,
         29892,   491,   491, 29871,   313,   313, 29892,   304,   491, 29889,
           922,    13,  3148]], device='cuda:0')
 
 /!  I,
 Home (
 J, (1Ba  ,
 By2. Home
 , by by  ( (, to by. Se
 US


In [None]:
input_text = """###Instruction Convert question in natural language to SQL  
###question : What is the highest score of dhoni in a match in chennai  
 ###Table Columns : id:INT,player:text,runs:INT,match:INT,year:INT,city:text  
 ###SQL : """  
   
predictions = infer(model_untuned,input_text)  
print(parse(predictions[0]))  

**Input Question and Table Schema**\
What is the average score of students in maths\
(id:INT, name:text, subject:Text, score:INT)

**Expected Output**\
SELECT AVG(score) FROM table WHERE subject = 'maths'

**Hard Prompt Output**\
select avg(score) from 
  (select id,name,subject,score from 
  (select id,name,subject,score from 
  (select

**Prompt Tuned Output**\
'SELECT AVG score FROM table WHERE subject = maths'

**Input Question and Table Schema**\
What is the highest score of dhoni in a match in chennai\
(id:INT,player:text,runs:INT,match:INT,year:INT,city:text)

**Expected Output**\
SELECT MAX(runs) FROM table WHERE match = 'chennai' AND player = 'dhoni'

**Hard Prompt Output**\
SELECT player,runs,year,city FROM cricket_matches WHERE player='Dhoni' AND year='2010' AND city='Chennai' AND runs='

**Prompt Tuned Output**\
'SELECT MAX runs  FROM table WHERE match = chennai AND player = dhoni '

**Parameters**\
trainable params: 32,768 || all params: 6,738,448,384 || trainable%: 0.0004862840543203603

**Computation Summary**\
Worker Type: g4dn.xlarge (16 GB memory, 1 GPU), 2-8 workers\
Driver Type: g4dn.xlarge (16 GB memory, 1 GPU)

**Training Time**\
15 minutes

**Unique Tokens**\
15k tokens

**Non-Unique Tokens**\
150k tokens