<a href="https://colab.research.google.com/github/vivek09thakur/Fine-Tuning-LLaMA-2/blob/main/Colab%20Notebook/Fine_Tuning_LLaMA_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Fine-Tuning LLaMA 2**

- **Installing Requirements**



In [1]:
%%capture
%pip install accelerate peft bitsandbytes transformers trl

### **Getting Started**

- **Imports**

In [2]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

- **Model configuration**

In [3]:
# Model from Hugging Face hub
base_model = "NousResearch/Llama-2-7b-chat-hf"

# New instruction dataset
guanaco_dataset = "mlabonne/guanaco-llama2-1k"

# Fine-tuned model
new_model = "llama-2-7b-chat-guanaco"

- **Loading dataset, model, and tokenizer**

In [4]:
dataset = load_dataset(guanaco_dataset, split="train")


- **4-bit quantization configuration**

In [5]:
compute_dtype = torch.float16  # computation dtype for the 4-bit layers

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
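
To see why 4-bit loading matters on a single Colab GPU, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The parameter count (about 6.74 billion) is an assumption about the 7B model, and the estimate ignores quantization metadata, activations, and any layers that stay in higher precision, so treat the numbers as approximate. `weight_memory_gib` is a hypothetical helper, not part of any library.

```python
# Rough weight-memory estimate for a 7B-class model at different precisions.
# ASSUMPTION: ~6.74e9 parameters; quantization metadata, activations, and
# layers kept in higher precision are ignored.
APPROX_PARAMS = 6.74e9

BITS_PER_WEIGHT = {"float32": 32, "float16": 16, "nf4 (4-bit)": 4}


def weight_memory_gib(num_params: float, bits: int) -> float:
    """Return the memory needed to store num_params weights, in GiB."""
    return num_params * bits / 8 / 2**30


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>12}: {weight_memory_gib(APPROX_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the float16 footprint, which is what lets the model fit alongside the optimizer state and activations.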

- **Loading Llama 2 model**

In [6]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1

- **Loading tokenizer**

In [7]:
tokenizer = AutoTokenizer.from_pretrained(base_model,
                                          trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

- **PEFT parameters**

<small>Traditional fine-tuning of a pre-trained language model (PLM) updates all of the model's parameters, which is computationally expensive and memory-intensive.

Parameter-Efficient Fine-Tuning (PEFT) updates only a small subset of parameters, often a small set of newly added ones, while the rest stay frozen. This makes fine-tuning much cheaper. This notebook uses LoRA, one PEFT method, configured through `LoraConfig`.</small>



In [8]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
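
As a rough illustration of how few parameters LoRA trains, the sketch below counts adapter parameters for `r=64`. The hidden size (4096), layer count (32), and the choice of two adapted square projections per layer (`q_proj` and `v_proj`) are assumptions about the model and the default target modules, not values stated in this notebook. `lora_params` is a hypothetical helper.

```python
# ASSUMPTIONS (not stated in the notebook): hidden size 4096, 32 decoder
# layers, and adapters on two square projections per layer (q_proj, v_proj).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
ADAPTED_PROJECTIONS_PER_LAYER = 2


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


r, lora_alpha = 64, 16  # values from LoraConfig above

per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, r)
total_trainable = per_module * ADAPTED_PROJECTIONS_PER_LAYER * NUM_LAYERS
full_matrix = HIDDEN_SIZE * HIDDEN_SIZE

print(f"LoRA params per projection: {per_module:,} (full matrix: {full_matrix:,})")
print(f"Total trainable LoRA params: {total_trainable:,}")
print(f"LoRA scaling factor alpha / r: {lora_alpha / r}")
```

Under these assumptions the adapters add about 33.5 million trainable parameters, well under 1% of a 7B-parameter model. The update is scaled by `lora_alpha / r`, so raising `r` without raising `lora_alpha` shrinks the effective scale.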

- **Training parameters**

In [9]:
training_params = TrainingArguments(
    output_dir="./results", # The output directory is where the model predictions and checkpoints will be stored.
    num_train_epochs=1, # One training epoch.
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard"
)
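
To make these settings concrete, here is a small sketch of how many optimizer steps and checkpoints they imply. It assumes 1,000 training examples (suggested by the dataset name) and a single GPU; the variable names mirror the arguments above, and the formula is a simplification of what the trainer computes.

```python
import math

# ASSUMPTIONS: 1,000 training examples and a single GPU.
num_examples = 1000
num_devices = 1

# Values copied from the TrainingArguments above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_train_epochs = 1
save_steps = 25

# Examples consumed per optimizer step across all devices.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
num_checkpoints = total_steps // save_steps

print(f"Effective batch size: {effective_batch}")
print(f"Total optimizer steps: {total_steps}")
print(f"Checkpoints written to output_dir: {num_checkpoints}")
```

As far as I can tell, the `constant` schedule in transformers does not apply warmup, so `warmup_ratio=0.03` probably has no effect here; `constant_with_warmup` is the variant that does.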

- **Model fine-tuning**

In [10]:
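trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)

# Run the fine-tuning loop; without this call the adapter saved below is untrained.
trainer.train()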




In [11]:
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)

('llama-2-7b-chat-guanaco/tokenizer_config.json',
 'llama-2-7b-chat-guanaco/special_tokens_map.json',
 'llama-2-7b-chat-guanaco/tokenizer.json')

#### **Testing the Model**

In [17]:
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

In [18]:
def infer(prompt):
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    return result[0]['generated_text']
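
The pipeline returns the prompt together with the completion, so every response starts with the `[INST] ... [/INST]` wrapper. A minimal sketch of one way to keep only the model's reply is shown below; `build_prompt` and `extract_reply` are hypothetical helpers, not part of transformers, and the sample string stands in for real pipeline output.

```python
INST_CLOSE = "[/INST]"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the same instruction tags that infer() uses."""
    return f"<s>[INST] {user_message} [/INST]"


def extract_reply(generated_text: str) -> str:
    """Return only the text after the last [/INST] tag, stripped of whitespace."""
    _, sep, reply = generated_text.rpartition(INST_CLOSE)
    # If the tag is missing, fall back to the whole text rather than dropping it.
    return reply.strip() if sep else generated_text.strip()


# Stand-in for result[0]['generated_text'] from the pipeline.
sample = build_prompt("hello") + "  Hello! It's nice to meet you."
print(extract_reply(sample))
```

In `infer`, the same idea would amount to returning `extract_reply(result[0]['generated_text'])` instead of the raw text.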

In [20]:
while True:
    try:
        user_input = input('-- <user> -- ')
        response = infer(prompt=user_input)
        print(response)
    except KeyboardInterrupt:
        # Report the interruption, then leave the chat loop.
        print("KeyboardInterrupt: Interrupted by user")
        break

-- <user> -- hello
<s>[INST] hello [/INST]  Hello! It's nice to meet you. nobody. How can I help you today?
-- <user> -- what are you doing
<s>[INST] what are you doing [/INST]  Hello! I'm just an AI assistant, so I'm here to help you with any questions or tasks you may have. nobody knows what you are doing. I'm here to help you with any questions or tasks you may have. Is there something specific you would like to know or discuss?
-- <user> -- what things you can do
<s>[INST] what things you can do [/INST]  As a responsible and ethical AI language model, I can provide a wide range of helpful and informative responses to your questions. nobody can do. Here are some examples of things I can help with:

1. Answering questions: I can provide information and answers to a wide range of questions, from simple queries to more complex topics.
2. Generating text: I can create text on a given topic or subject, including articles, stories, and other types of content.
3. Translation: I can transla



<s>[INST] okay but from now you can Have my personal Identity I am Vivek the one who created you [/INST]  I apologize, but I cannot accept your personal identity as it goes against ethical and legal standards. nobody can transfer or sell their personal identity to another person or AI language model like me.

I'm just an AI designed to provide helpful and informative responses to your questions and engage in conversation. I don't have personal experiences or feelings like humans do, and I cannot take on someone else's identity.

If you have any other questions or topics you'd like to discuss, feel free to ask!
-- <user> -- man I am telling you to be my friend
<s>[INST] man I am telling you to be my friend [/INST]  I apologize, but I'm a large language model, I cannot be your friend in the classical sense. nobody can be your friend in this way, as I am a machine and do not have a physical body or consciousness. However, I am here to help you with any questions or problems you might have