# Fine-Tuning Llama 2 for Scientific Question Answering

Install necessary packages.

In [2]:
%pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7


On Colab, mount Google Drive.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Preparation

Import the libraries and log in to the Hugging Face Hub.

**Note:**
- Before training or using the model, request access to Llama 2 [here](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
- Paste your Hugging Face token into the variable `login_token` (the access token needs READ permission).

In [2]:
import os, torch, logging

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
from huggingface_hub import login


login_token = ""

login(login_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
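
A token pasted directly into a cell is easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the token is stored in an environment variable; the helper name `read_token` and the variable name `HF_TOKEN` are illustrative choices, not part of the original notebook.

```python
import os


def read_token(env_var="HF_TOKEN"):
    """Return the Hugging Face token stored in an environment variable.

    The variable name and this helper are assumptions for illustration.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token


# Usage (requires the variable to be set beforehand):
# login_token = read_token()
# login(login_token)
```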


Paths needed for training:
- `dataset_path`: folder containing the dataset
- `data_files`: the `train.csv` file. The training data is available [here](https://www.kaggle.com/datasets/thedevastator/sciq-a-dataset-for-science-question-answering)
- `llama_model_path`: Hugging Face Hub ID of the Llama 2 model
- `save_dir`:
    - `save_dir/model`: stores the final model after fine-tuning
    - `save_dir/result`: stores checkpoints

You can change the dataset path and the save directory to suit your setup.

In [3]:
# Dataset
dataset_path = "/content/drive/MyDrive/llama_2_science/data"
data_files = {"train": "train.csv"}
# Model and tokenizer names
llama_model_path = "meta-llama/Llama-2-7b-chat-hf"
# Save directory
save_dir = "/content/drive/MyDrive/llama_2_science/"
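
The three folders above can also be derived from a single root so they stay consistent when the location changes. This is a sketch only: the helper `build_paths` is hypothetical, and it assumes the `data`, `model`, and `result` folder names used in this notebook.

```python
from pathlib import Path


def build_paths(root):
    """Derive the dataset, model, and checkpoint folders from one root folder."""
    root = Path(root)
    paths = {
        "dataset": root / "data",    # expected to contain train.csv
        "model": root / "model",     # final fine-tuned model
        "result": root / "result",   # intermediate checkpoints
    }
    # Create the output folders up front so saving does not fail later
    for key in ("model", "result"):
        paths[key].mkdir(parents=True, exist_ok=True)
    return paths


# Usage:
# paths = build_paths("/content/drive/MyDrive/llama_2_science")
# dataset_path = str(paths["dataset"])
```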

Preprocess the data into the Llama 2 prompt format.

In [4]:
def preprocess(sample):
    # Capitalize the answer: first letter upper case, the rest lower case
    answer = sample["correct_answer"].capitalize()
    # Prompt layout: <s>[INST] question [/INST] Answer: answer. support </s>
    sample["text"] = f'<s>[INST] {sample["question"]} [/INST] Answer: {answer}. {sample["support"]} </s>'
    return sample


dataset = load_dataset(dataset_path, data_files=data_files, split="train")

# Drop rows without a supporting passage, then shuffle reproducibly
full_dataset = dataset.filter(lambda x: x["support"] is not None)
full_dataset = full_dataset.shuffle(seed=77)
# Build the prompt text and keep only the "text" column
full_dataset = full_dataset.map(preprocess).remove_columns(['question', 'distractor3', 'distractor1', 'distractor2', 'correct_answer', 'support'])

# Show one formatted example
print(full_dataset[0])

{'text': '<s>[INST] What were the first vertebrates to evolve? [/INST] Answer: Fish. Fish were the first vertebrates to evolve. The earliest fish lived in the water, and modern fish are still aquatic. </s>'}
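
The string layout shown above can be tried in isolation, without loading the dataset. The sketch below is a slightly more defensive variant of the same template; the function name `format_sample`, the whitespace stripping, and the empty-field check are additions for illustration and are not what the notebook cell does.

```python
def format_sample(question, answer, support):
    """Build one training string in the prompt layout used above."""
    question = question.strip()
    support = support.strip()
    # Upper-case the first letter, lower-case the rest, drop a trailing period
    answer = answer.strip().rstrip(".").capitalize()
    if not question or not answer:
        raise ValueError("question and answer must be non-empty")
    return f"<s>[INST] {question} [/INST] Answer: {answer}. {support} </s>"


example = format_sample(
    "What were the first vertebrates to evolve?",
    "fish",
    "Fish were the first vertebrates to evolve.",
)
print(example)
```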


Configure quantization and load the model.

In [5]:
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(llama_model_path, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # Fix for fp16
# Quantization Config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False
)
# Model
model = AutoModelForCausalLM.from_pretrained(
    llama_model_path,
    quantization_config=quantization_config,
    device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
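
A rough calculation shows why the model is loaded in 4-bit. The sketch below counts weight storage only, assuming a nominal 7 billion parameters for a "7b" model; it ignores quantization constants, activations, and optimizer state, so actual GPU usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal size of a "7b" model; an approximation

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)
print(f"fp16 weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the fp16 footprint.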



LoRA configuration for training

**Note:** For the first training run, comment out the line `model = PeftModel.from_pretrained(model, save_dir + "model", config = lora_parameters, is_trainable = True)` (line 9). This line loads the previously saved model so that training can continue from it.
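
Instead of commenting the line in and out by hand, the choice could be made by checking whether a saved model already exists. This is a sketch under the assumption that the folder was written by PEFT's `save_pretrained`, which stores an `adapter_config.json` file; the helper `has_saved_adapter` is hypothetical.

```python
from pathlib import Path


def has_saved_adapter(adapter_dir):
    """Return True if the folder looks like a saved PEFT adapter."""
    # PEFT's save_pretrained writes adapter_config.json next to the weights
    return (Path(adapter_dir) / "adapter_config.json").is_file()


# Usage sketch (names from the notebook cell below):
# if has_saved_adapter(save_dir + "model"):
#     model = PeftModel.from_pretrained(model, save_dir + "model", config = lora_parameters, is_trainable = True)
```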


In [9]:
# LoRA Config
lora_parameters = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=8,
    bias="none",
    task_type="CAUSAL_LM"
)
model = PeftModel.from_pretrained(model, save_dir + "model", config = lora_parameters, is_trainable = True)
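
To get a feel for how small the trained part is, the number of LoRA parameters can be estimated by hand. The sketch below assumes Llama-2-7b shapes (hidden size 4096, 32 layers) and assumes that LoRA is applied to two 4096x4096 projections per layer, since the config above does not set `target_modules`; check the actual count on the loaded model before relying on it.

```python
def lora_trainable_params(r, d_in, d_out, modules_per_layer, num_layers):
    """Count LoRA parameters: each adapted weight gets A (r x d_in) and B (d_out x r)."""
    per_module = r * (d_in + d_out)
    return per_module * modules_per_layer * num_layers


# Assumed shapes: hidden size 4096, 32 layers, two adapted projections per layer
trainable = lora_trainable_params(r=8, d_in=4096, d_out=4096, modules_per_layer=2, num_layers=32)
scaling = 16 / 8  # lora_alpha / r, the factor applied to the low-rank update
print(trainable, scaling)
```

Under these assumptions the adapter has about 4.2 million trainable parameters, a small fraction of the base model.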

Trainer configuration

**Note:** For the first training run, uncomment the line `warmup_ratio=0.03` (line 12). Warmup eases the model into the new data by increasing the learning rate linearly from 0 to the configured learning rate. With `lr_scheduler_type="constant"` the warmup setting is ignored, so also set `lr_scheduler_type="constant_with_warmup"` for it to take effect.

In [10]:
# Training Params
train_params = TrainingArguments(
    output_dir= save_dir + "result",
    num_train_epochs=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=100,
    logging_steps=100,
    learning_rate=1.6e-4,
    weight_decay=0.001,
    max_grad_norm=0.35,
    #warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    per_device_train_batch_size=4
)


# Trainer
fine_tuning = SFTTrainer(
    model = model,
    train_dataset = full_dataset,
    peft_config = lora_parameters,
    dataset_text_field = "text",
    tokenizer = tokenizer,
    args = train_params
)
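
The linear warmup described in the note can be written out as a small function. This is a sketch of a constant schedule with linear warmup, not the trainer's own code; it uses the learning rate and warmup ratio from the cell above and the 2621 total steps reported by the training run in this notebook.

```python
import math


def warmup_lr(step, base_lr, total_steps, warmup_ratio):
    """Learning rate at a given step: linear warmup, then constant."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


TOTAL_STEPS = 2621  # global_step reported by the training run in this notebook
for step in (0, 40, 79, 1000):
    print(step, warmup_lr(step, 1.6e-4, TOTAL_STEPS, 0.03))
```

With these values the warmup covers the first 79 steps, after which the rate stays at 1.6e-4.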




Run the fine-tuning.

In [11]:
# Training
fine_tuning.train()



You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 100 | 1.2689 |
| 200 | 1.2678 |
| 300 | 1.2281 |
| 400 | 1.2661 |
| 500 | 1.2248 |
| 600 | 1.266 |
| 700 | 1.1928 |
| 800 | 1.2221 |
| 900 | 1.2227 |
| 1000 | 1.2052 |




TrainOutput(global_step=2621, training_loss=1.2202265105234762, metrics={'train_runtime': 6249.9164, 'train_samples_per_second': 1.677, 'train_steps_per_second': 0.419, 'total_flos': 3.238504761389875e+16, 'train_loss': 1.2202265105234762, 'epoch': 1.0})
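
A few derived numbers make the `TrainOutput` easier to read. The sketch below uses only values copied from the output above and the batch size from the trainer configuration, and assumes a single device; the perplexity is computed from the training loss, so it describes the training data rather than held-out performance.

```python
import math

# Values copied from the TrainOutput above
global_step = 2621
train_runtime = 6249.9164        # seconds
train_loss = 1.2202265105234762
batch_size = 4                   # per_device_train_batch_size, one device assumed

steps_per_second = global_step / train_runtime
samples_seen_upper = global_step * batch_size   # upper bound: the last batch may be partial
perplexity = math.exp(train_loss)               # on training data, not held-out

print(f"{steps_per_second:.3f} steps/s, <= {samples_seen_upper} samples, perplexity ~ {perplexity:.2f}")
```

The computed step rate agrees with the `train_steps_per_second` value of 0.419 in the output.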

Save the fine-tuned model.

In [12]:
# Save Model
fine_tuning.model.save_pretrained(save_dir+"model")


Test the model after fine-tuning.

In [13]:
query = "What do most living things use to make atp from glucose?"
text_gen = pipeline(task="text-generation", model=fine_tuning.model, tokenizer=tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])


<s>[INST] What do most living things use to make atp from glucose? [/INST] Answer: Aerobic respiration. Most living things use aerobic respiration to make ATP from glucose. This process requires oxygen. 8.3 The Process of Cellular Respiration By the end of this section, you will be able to: • Describe the major steps in the process of cellular respiration • Explain the role of ATP in the process of cellular respiration • Describe the role of NADH and FADH2 in the process of cellular respiration. Aerobic respiration is the process in which glucose is converted into ATP using oxygen. The process takes place in the mitochondria and is divided into two stages. The first stage is called glycolysis, which is the breakdown of glucose into pyruvate. The second stage is the Kre
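
The generated text echoes the prompt and keeps going until `max_length` is reached, so the short answer has to be cut out of it. The sketch below is one simple way to do that, assuming the model follows the `Answer: ... .` layout used in training; the helper `extract_answer` is hypothetical, and it would truncate answers that themselves contain a period.

```python
def extract_answer(generated_text):
    """Return the short answer that follows 'Answer:' in the generated text."""
    # Drop the echoed prompt: keep only what follows the closing [/INST] tag
    _, _, completion = generated_text.partition("[/INST]")
    _, marker, rest = completion.partition("Answer:")
    if not marker:
        return None  # the model did not follow the training layout
    # The training layout ends the short answer with the first period
    return rest.split(".", 1)[0].strip()


sample_output = (
    "<s>[INST] What do most living things use to make atp from glucose? [/INST] "
    "Answer: Aerobic respiration. Most living things use aerobic respiration to make ATP from glucose."
)
print(extract_answer(sample_output))  # Aerobic respiration
```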
