# Chapter 4 - Fine tuning a LLM



Vision models popularized transfer learning. Transfer learning is taking an pre-trained model and training on it to perform new tasks. I hope its not blashphemy to say Fine tuning is the new name for transfer learning in LLM world. We will look into fine tuning an existing large language model in this chapter. For those from vision domain, will relate to freezing some layers in the source model and then perform transfer learning. The weights of the frozen layers are untouched, only those layers left unfrozen are modified. In some cases of transfer learning new layers are added to the existing layers are those are trained. Most of those concepts apply to fine tuning LLMs.

LLMs pre-trained on large corpus exibit in-context learning capability. Models like GPT-4, Gemini can perform new task on which it was not previous trained.By providing a prompt which comprises of few examples of the target task, these LLMs should perform the required job. We will see more about in-context learning in a later chapter.

So when do we fine tune? Say you are building a document summarization application for your domain. Your compnay does not have enough budget to purchase copilot or openai's API to access a high performance large language model. You end up picking a small language model (unlucky you, your company denies additional infrastructure, no 8 GPU machines for you to host your model). You try your hands with prompt engineering this small language model hoping to leverage incontext learning ability. No luck. Fine tuning is the answer in these scenario.

In full fine tuning, we allow the training to modify all weights in all the layers of the original model. This is very similar to our pre-training in last chapter, expect we do a full training for very small number of epochs using the new dataset. We will be introducing the transformers library from HuggingFace. For training, again we use the Trainer API from HuggingFace. HuggingFace is one of the fastest growing LLM ecosystem. They have most of the open source models and everyday contributors are adding newer models. As an LLM developer we believe HuggingFace transformers will be a good tool in your toolbox. Recently hugging face released a small LLM model SmoLLM trained completely on synthetic data. We train this model on Fyodor Dostoveskeys' The Brothers Karmazov novel.

Text classification systems are ubiquotous now. LLMs can be fine tuned to perform text classification tasks. The output of the



## Full fine tune

In this section, we will load a HuggingFace Model and do a complete finetuning with a text datasource.

In [1]:
import os
from pathlib import Path
import sys
from pathlib import Path

current_path = Path(os.getcwd())
save_directory = str(Path(current_path.parent.parent.absolute(), "bin","chapter4","fullfinetune"))
data_directory = str(Path(current_path.parent.parent.absolute(), "data", "chapter4"))
data_file_name = "brothers.txt"
data_file_path = Path(data_directory, data_file_name)
parent_path  = str(current_path.parent.absolute())

sys.path.append(parent_path)

In [2]:
import os
from transformers import AutoTokenizer, Trainer, TrainingArguments
from transformers import AutoModelForCausalLM, DataCollatorForLanguageModeling
import urllib.request
from datasets import load_dataset


model_path = "HuggingFaceTB/SmolLM-135M"
train_source_uri = "https://www.gutenberg.org/cache/epub/28054/pg28054.txt"

if not data_file_path.exists():
    print(f"Download {train_source_uri} to {str(data_file_path)}")
    with urllib.request.urlopen(train_source_uri) as response:
        with open(str(data_file_path), "wb") as out_file:
            out_file.write(response.read())
else:
    print(f"{str(data_file_path)} already exists. Skip Download")
    
            
dataset = load_dataset("text", data_files=str(data_file_path))

# Tokenizer and data loaders


/home/gopi/Documents/small_llm/llmbook/data/chapter4/brothers.txt already exists. Skip Download


In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(str(batch))

tokenized_dataset = dataset.map(tokenize)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,mlm =False)




In [16]:
model = AutoModelForCausalLM.from_pretrained(model_path)

start_context = "third son of Fyodor Pavlovitch Karamazov is"
prompt_tokens = tokenizer(start_context, return_tensors="pt")


output = model.generate(
    input_ids = prompt_tokens["input_ids"]
   ,max_length=100
   ,num_beams=2
   ,temperature=0.7
   ,top_k=50
   ,top_p=0.9
   ,no_repeat_ngram_size=2

)

tokenizer.decode(output[0], skip_special_tokens=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


'third son of Fyodor Pavlovitch Karamazov is the most famous of all the princes of Russia. He was born in 1765 in the village of Kostroma, in what is now the Russian Federation. His father, Ivan Pavlovich, was a military officer and a member of the Imperial Guard. Ivan was also a great-grandson of Ivan the Terrible, the last tsar of Muscovy.\nKaramuzov was'

## Full fine-tuning

In [4]:

train_args = TrainingArguments(
    output_dir = save_directory
   ,num_train_epochs = 1
   ,per_device_train_batch_size=16
   ,save_steps=500
   ,save_total_limit=2
   ,report_to="none" 
)

trainer = Trainer(
    model=model
   ,args =train_args
   ,data_collator=data_collator
   ,train_dataset=tokenized_dataset["train"]
)

trainer.train()


Step,Training Loss
500,3.0579
1000,2.8922
1500,2.871
2000,2.8354


AttributeError: 'Trainer' object has no attribute 'save'

In [5]:
trainer.save_model(save_directory)

In [6]:
new_model = AutoModelForCausalLM.from_pretrained(save_directory)

In [30]:
start_context = "third son of Fyodor Pavlovitch Karamazov is"
prompt_tokens = tokenizer(start_context, return_tensors="pt")


output = new_model.generate(
    input_ids = prompt_tokens["input_ids"]
   ,max_length=100
   ,num_beams=2
   ,temperature=0.7
   ,top_k=50
   ,top_p=0.9
   ,no_repeat_ngram_size=2

)

tokenizer.decode(output[0], skip_special_tokens=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


'third son of Fyodor Pavlovitch Karamazov is the only one to have been married to a Russian woman. The other two are the daughters of Ivan Pavlovich and Alyosha, who were married at the age of twenty-five. Ivan’s wife died in 1880, and the other was a widow. She was the daughter of a merchant, a man who had married a woman of the highest rank. He had been a captain in the'

In [12]:
prompt_tokens["input_ids"]

tensor([[14370,  4132,   282,   426,   105, 29672, 29396, 34331,  3224, 13463,
           332,  1437,   749,   314]])

## Freezing certain layers

In [31]:
for name, param in model.named_parameters():
     print(name, param.requires_grad)

model.embed_tokens.weight True
model.layers.0.self_attn.q_proj.weight True
model.layers.0.self_attn.k_proj.weight True
model.layers.0.self_attn.v_proj.weight True
model.layers.0.self_attn.o_proj.weight True
model.layers.0.mlp.gate_proj.weight True
model.layers.0.mlp.up_proj.weight True
model.layers.0.mlp.down_proj.weight True
model.layers.0.input_layernorm.weight True
model.layers.0.post_attention_layernorm.weight True
model.layers.1.self_attn.q_proj.weight True
model.layers.1.self_attn.k_proj.weight True
model.layers.1.self_attn.v_proj.weight True
model.layers.1.self_attn.o_proj.weight True
model.layers.1.mlp.gate_proj.weight True
model.layers.1.mlp.up_proj.weight True
model.layers.1.mlp.down_proj.weight True
model.layers.1.input_layernorm.weight True
model.layers.1.post_attention_layernorm.weight True
model.layers.2.self_attn.q_proj.weight True
model.layers.2.self_attn.k_proj.weight True
model.layers.2.self_attn.v_proj.weight True
model.layers.2.self_attn.o_proj.weight True
model.lay

In [None]:
## Task based Fine Tuning