# **Political Concept Classifier Fine-Tuning**

### The Logic 

1. We will use Unsloth to fine-tune the model
2. PEFT lets us add LoRA adapters, so we train a small set of added weights instead of the whole model
3. TRL provides the trainer and the technical configuration needed
4. We will download the GGUF file and save it in the political_concept_classifer folder
5. Upload the downloaded model to Hugging Face

### GPU check
Ensure that CUDA is available with a performant GPU for fast fine tuning

In [None]:
# For GPU check
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

### Download all required dependencies

In [None]:
%pip uninstall -y unsloth peft
%pip install unsloth trl peft accelerate bitsandbytes

### Load pretrained model 

In [None]:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = 'unsloth/Phi-3-mini-4k-instruct-bnb-4bit',
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True
)
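
Loading in 4-bit matters mainly for memory. As a rough sketch of the arithmetic, assuming Phi-3-mini has about 3.8 billion parameters and counting weights only (quantization metadata, activations, and optimizer state are ignored), the hypothetical helper `weight_memory_gb` compares 4-bit and 16-bit storage:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # assumption: approximate parameter count

print(f"4-bit : {weight_memory_gb(PHI3_MINI_PARAMS, 4):.1f} GB")
print(f"16-bit: {weight_memory_gb(PHI3_MINI_PARAMS, 16):.1f} GB")
```

Actual GPU usage during training will be higher than these figures, since they leave out everything except the stored weights.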


### Preprocess data 
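Format each example with the model's chat template: the input (`text`) becomes the user message and the expected output (`category`) becomes the assistant reply. The result is still plain text; tokenization happens later, inside the trainer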

In [None]:
import json
from datasets import Dataset

with open("training.json", "r", encoding="utf-8") as f:
    data = json.load(f)

ds = Dataset.from_list(data)

def to_text(ex):
    resp = ex["category"]
    
    if not isinstance(resp, str):
        resp = json.dumps(resp, ensure_ascii=False)

    msgs = [
        {"role": "user", "content": ex["text"]},
        {"role": "assistant", "content": resp},
    ]
    return {
        "text": tokenizer.apply_chat_template(
            msgs, tokenize=False, add_generation_prompt=False
        )
    }

dataset = ds.map(to_text, remove_columns=ds.column_names)
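
The cell above assumes `training.json` is a list of objects that each carry a `text` and a `category` key. A minimal sketch of a check that could run before `Dataset.from_list` is shown below; `validate_records` is a hypothetical helper and the sample records are made up for illustration:

```python
def validate_records(records):
    """Return a list of (index, problem) pairs for malformed records."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "record is not an object"))
            continue
        for key in ("text", "category"):
            if key not in rec:
                problems.append((i, f"missing key: {key}"))
        if "text" in rec and not isinstance(rec["text"], str):
            problems.append((i, "text is not a string"))
    return problems


# Hypothetical records, for illustration only
sample = [
    {"text": "Taxes should fund public healthcare.", "category": "welfare"},
    {"text": "No label here"},
]
print(validate_records(sample))
```

An empty result means every record has the shape that `to_text` expects.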

### Add PEFT LoRA adapter
This trains small low-rank matrices injected into specific layers instead of updating the whole model, which makes it feasible to fine-tune an LLM with limited GPU memory and a small dataset

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,  # rank of matrices (for LoRA)
    target_modules=[
        'q_proj', 'k_proj', 'v_proj', 'o_proj',
        'gate_proj', 'up_proj', 'down_proj',
    ],  # which layers to inject LoRA into
    lora_alpha = 64 * 2,  # scaling factor, usually 2x rank
    lora_dropout = 0,  # no dropout; increase for regularization
    bias = 'none',  # bias stays frozen, only learn the low-rank matrices
    use_gradient_checkpointing = 'unsloth',  # activate custom checkpointing scheme of Unsloth -> higher compute but less GPU memory when backpropagating
)
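
To see why LoRA is cheap, it helps to count parameters. In standard LoRA, a frozen weight matrix of shape `d_out x d_in` gets two trainable matrices, `A` (`r x d_in`) and `B` (`d_out x r`), and their product is scaled by `lora_alpha / r`. The sketch below uses the rank and alpha from the cell above; the layer size of 3072 is an illustrative assumption, not a value read from the model:

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (r x d_in) and B (d_out x r); the original weight stays frozen.
    """
    return r * (d_in + d_out)


r, lora_alpha = 64, 128  # values from the cell above
d_in = d_out = 3072      # assumption: illustrative layer size

added = lora_added_params(d_in, d_out, r)
full = d_in * d_out

print(f"LoRA params   : {added:,}")
print(f"Full matrix   : {full:,}")
print(f"Ratio         : {added / full:.2%}")
print(f"Scale alpha/r : {lora_alpha / r}")
```

Under these assumptions the adapter trains only a few percent of the parameters in each targeted matrix.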

### Add training configurations
Set the batch size, learning-rate warmup, step budget, optimizer, and output location for the fine-tuning run

In [None]:
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(  # supervised fine-tuning trainer
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    dataset_text_field = 'text',
    max_seq_length = 2048,
    args = SFTConfig(
        per_device_train_batch_size = 2,  # each GPU reads 2 tokenized sequences at once
        gradient_accumulation_steps = 4,  # accumulate gradients over 4 batches before each optimizer step -> effective batch 2 * 4 = 8
        warmup_steps = 10,  # linearly "climb" to the learning rate from 0 in the first 10 steps
        max_steps = 60,  # stop after 60 optimizer steps; when positive, this overrides num_train_epochs
        logging_steps = 1,  # log every single step
        output_dir = "outputs",  # where to store checkpoints, logs etc.
        optim = "adamw_8bit",  # 8-bit AdamW optimizer
        num_train_epochs = 3  # number of epochs; only takes effect if max_steps is not set
    ),
)
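
The batch settings above determine how much data the run actually sees. As a small sketch of that arithmetic, assuming a single GPU and a made-up dataset size of 200 examples (`examples_seen` is a hypothetical helper):

```python
def examples_seen(per_device_batch, grad_accum, max_steps, num_devices=1):
    """Return (effective batch size, total examples processed in max_steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return effective_batch, effective_batch * max_steps


effective_batch, seen = examples_seen(per_device_batch=2, grad_accum=4, max_steps=60)
num_examples = 200  # assumption: hypothetical dataset size

print(f"Effective batch size : {effective_batch}")
print(f"Examples processed   : {seen}")
print(f"Approx. passes over the data: {seen / num_examples:.1f}")
```

Comparing the last figure with the intended number of epochs is a quick way to judge whether the step budget is too small or too large for a given dataset.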

### Fine tune the model
This one-liner will do all the fine tuning computation needed

In [None]:
trainer.train()

### Test model 
Check for its accuracy on the `test.json` file

In [None]:
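import json

FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode

# Evaluate on the held-out test set rather than on the training examples
with open("test.json", "r", encoding="utf-8") as f:
    data = json.load(f)

correct = 0
total = len(data)

for item in data:
    messages = [
        {
            "role": "user",
            "content": item["text"]
        },
    ]

    # Turn messages into a tensor of token IDs and send it to the GPU
    inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

    # Generate model response with max 512 tokens and 0.7 temperature, smallest set of tokens with cumulative probability of >= 0.9 are kept for random sampling
    outputs = model.generate(input_ids=inputs, max_new_tokens=512, use_cache=True, temperature=0.7, do_sample=True, top_p=0.9)

    # The output repeats the prompt, so decode only the newly generated tokens and drop special tokens
    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True).strip()

    # Serialize non-string categories the same way as during preprocessing
    expected = item["category"]
    if not isinstance(expected, str):
        expected = json.dumps(expected, ensure_ascii=False)

    if response == expected:
        correct += 1

print(f"Accuracy: {correct}/{total} = {correct/total:.2%}")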
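
Exact string comparison is strict: a generated label that differs only in casing or surrounding whitespace counts as a miss. One possible mitigation, sketched here with hypothetical helpers and made-up labels, is to normalize both sides before comparing. Whether that is appropriate depends on how the categories are defined:

```python
def normalize_label(label: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(label.lower().split())


def accuracy(predictions, expected):
    """Fraction of predictions that match the expected labels after normalization."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        normalize_label(p) == normalize_label(e)
        for p, e in zip(predictions, expected)
    )
    return hits / len(expected)


# Hypothetical model outputs and reference labels, for illustration only
preds = [" Liberalism\n", "conservatism", "Socialism."]
golds = ["liberalism", "Conservatism", "socialism"]
print(f"Accuracy: {accuracy(preds, golds):.2%}")
```

The third prediction still fails because of the trailing period, which shows that normalization only forgives the differences it is explicitly written to handle.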

### Save model as GGUF file

In [None]:
model.save_pretrained_gguf(
    "gguf_model_scratch_fixed", 
    tokenizer, 
    quantization_method="q4_k_m", 
    maximum_memory_usage = 0.3
)
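
The remaining step from the plan is uploading the saved model to Hugging Face. The sketch below is one way to do that with the `huggingface_hub` client. It assumes the `.gguf` file sits directly inside the output folder (the exact location may differ between Unsloth versions), that you are already logged in or pass a token, and that the repository name is a placeholder you replace with your own. `upload_gguf_folder` is a hypothetical helper, not part of any library:

```python
from pathlib import Path
from typing import Optional


def upload_gguf_folder(folder: str, repo_id: str, token: Optional[str] = None) -> str:
    """Upload every .gguf file found directly in `folder` to a Hugging Face model repo."""
    # Fail early if there is nothing to upload
    if not any(Path(folder).glob("*.gguf")):
        raise FileNotFoundError(f"No .gguf files found in {folder!r}")

    # Imported lazily so the check above works without the package installed
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="model",
        allow_patterns=["*.gguf"],  # skip everything except the GGUF weights
    )
    return f"https://huggingface.co/{repo_id}"


# Example call (not executed here); the repo name is a placeholder
# upload_gguf_folder("gguf_model_scratch_fixed", "your-username/political-concept-classifier")
```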