# Fine-Tuning a Qwen 1.5 Model and Logging to a Model Registry

This notebook demonstrates the process of fine-tuning a small-scale Qwen model (`Qwen/Qwen1.5-0.5B-Chat`) on a public instruction-based dataset. We will use Parameter-Efficient Fine-Tuning (PEFT) with LoRA to make the process memory-efficient.

**Key Steps:**
1.  **Setup**: Install required libraries and import necessary modules.
2.  **Configuration**: Define all parameters for the model, dataset, and training.
3.  **Data Preparation**: Load and prepare the dataset for instruction fine-tuning.
4.  **Model Loading and Fine-Tuning**: Load the pre-trained model and tokenizer, and then fine-tune it using `trl`'s `SFTTrainer`.
5.  **Evaluation**: Compare the performance of the base model with the fine-tuned model.
6.  **Model Logging**: Log the fine-tuned model and its metrics to a model registry.

## 1. Imports

With the necessary Python libraries installed, we first import all the modules required for the entire workflow.

In [None]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from trl import SFTTrainer
import frogml  # JFrog ML SDK, used in the final step to log the model to the registry

## 2. Configuration

We'll define all our configurations in one place. This makes the notebook cleaner and easier to modify for future experiments.

In [None]:
# Model and tokenizer configuration
model_id = "Qwen/Qwen1.5-0.5B-Chat"
new_model_adapter = "qwen-0.5b-devops-adapter"

# Dataset configuration
dataset_name = "Szaid3680/Devops"

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments
training_args = TrainingArguments(
    output_dir="./qwen-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    logging_steps=10,
    max_steps=1,
    fp16=False, # Ensure this is False for CPU/MPS
)
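
To get a feel for what these settings imply, the sketch below works out the LoRA scaling factor, the effective batch size, and a rough count of trainable adapter parameters. The architecture values (`hidden_size = 1024`, 24 decoder layers, square attention projections) are assumptions about `Qwen/Qwen1.5-0.5B-Chat` and should be checked against `model.config`; the helper `lora_param_count` is a hypothetical name used only for illustration.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Extra parameters LoRA adds to one Linear(d_in, d_out).

    LoRA learns two low-rank matrices: A with shape (r, d_in) and
    B with shape (d_out, r), so the total is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


# Values taken from the configuration cell above
r, lora_alpha = 16, 32
per_device_train_batch_size, gradient_accumulation_steps = 1, 8

# ASSUMED architecture values (not stated in this notebook; verify with model.config)
hidden_size = 1024       # assumed width of q/k/v/o projections (square matrices)
num_layers = 24          # assumed number of decoder layers
num_target_modules = 4   # q_proj, k_proj, v_proj, o_proj

per_module = lora_param_count(hidden_size, hidden_size, r)
total_lora_params = per_module * num_target_modules * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # single device

print(f"LoRA params per projection: {per_module:,}")
print(f"Total trainable LoRA params: {total_lora_params:,}")
print(f"LoRA scaling (alpha / r):    {scaling}")
print(f"Effective batch size:        {effective_batch_size}")
```

Under these assumptions the adapter adds roughly three million trainable parameters, a small fraction of the base model. Once the model is wrapped, `model.print_trainable_parameters()` reports the actual figure.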

## 3. Data Preparation

We will load the `Szaid3680/Devops` dataset, split it into training and evaluation sets, and define a formatting function for instruction-based fine-tuning.

In [None]:
dataset = load_dataset(dataset_name, split="train")
dataset = dataset.train_test_split(test_size=0.1)
train_dataset = dataset["train"]
eval_dataset = dataset["test"]

# For a quick demo, we'll use a small subset of the data
train_dataset = train_dataset.select(range(2))
eval_dataset = eval_dataset.select(range(2))

def format_instruction(example):
    """Formats the dataset examples into a structured prompt."""
    instruction = example.get('Instruction', '')
    inp = example.get('Prompt', '')
    response = example.get('Response', '')
    
    full_prompt = f"<s>[INST] {instruction}\n{inp} [/INST] {response} </s>"
    return full_prompt

# Let's look at a sample from the training set
print("Sample from the training dataset:")
print(train_dataset[0])
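
One thing to be aware of: the `[INST] ... [/INST]` markers used in `format_instruction` follow the Llama/Mistral prompt convention, while Qwen1.5 chat models use ChatML-style markers (`<|im_start|>` / `<|im_end|>`), which is also what `apply_chat_template` produces in the evaluation step. As a sketch of an alternative, the hypothetical `format_instruction_chatml` below builds a ChatML-style training string from the same dataset fields. The system prompt and the sample record are illustrative assumptions, and in practice `tokenizer.apply_chat_template(messages, tokenize=False)` is the safer way to get the model's exact template.

```python
def format_instruction_chatml(
    example: dict,
    system_prompt: str = "You are a helpful DevOps assistant.",
) -> str:
    """Hypothetical alternative to format_instruction using ChatML-style markers."""
    instruction = example.get("Instruction", "")
    inp = example.get("Prompt", "")
    response = example.get("Response", "")

    # Join instruction and input, dropping stray whitespace if one is empty
    user_content = f"{instruction}\n{inp}".strip()

    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>\n"
    )


# Illustrative record (made up for this example, not taken from the dataset)
sample = {
    "Instruction": "Answer the question.",
    "Prompt": "What is a Pod?",
    "Response": "The smallest deployable unit in Kubernetes.",
}
formatted = format_instruction_chatml(sample)
print(formatted)
```

Keeping the training format aligned with the inference-time template means the fine-tuned model sees the same markers in both phases.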

## 4. Model Loading and Fine-Tuning

Now, we'll load the base model and tokenizer. Then, we will apply the LoRA configuration and start the fine-tuning process.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu" # Use CPU for local demo
)
# Apply LoRA configuration to the model
model = get_peft_model(model, lora_config)

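# Create the SFTTrainer.
# The model is already wrapped by get_peft_model above, so peft_config is
# not passed again; supplying both would apply the LoRA setup twice or be
# rejected, depending on the trl version.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    formatting_func=format_instruction,
    args=training_args,
)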

print("--- Starting Fine-Tuning ---")
trainer.train()
print("--- Fine-Tuning Complete ---")
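
To make concrete what the adapter learns during training, here is a minimal pure-Python sketch of a LoRA-adapted linear layer: the frozen weight `W` is left untouched and the output is `W x + (lora_alpha / r) * B (A x)`. The tiny matrices are made-up illustrative values, and `matvec` / `lora_forward` are hypothetical helper names, not part of `peft`. The sketch assumes the common initialization in which `B` starts at zero, so the wrapped model initially behaves exactly like the base model.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x) for one linear layer."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen base projection
    update = matvec(B, matvec(A, x))  # low-rank path: project down, then up
    return [b + scaling * u for b, u in zip(base, update)]


r, lora_alpha = 2, 4
W = [  # frozen base weight, 4 x 4 (illustrative values)
    [0.5, -1.0, 0.0, 2.0],
    [1.0, 0.0, 0.5, -0.5],
    [0.0, 1.0, 1.0, 0.0],
    [-1.0, 0.5, 0.0, 1.0],
]
A = [[1.0, 0.0, -1.0, 0.0], [0.0, 0.5, 0.0, 0.5]]  # trainable, r x 4
B = [[0.0, 0.0] for _ in range(4)]                 # trainable, 4 x r, zero init
x = [1.0, 2.0, 3.0, 4.0]

out_before = lora_forward(W, A, B, x, r, lora_alpha)
print("before training:", out_before)  # identical to the base layer output

# Pretend one optimizer step changed a single entry of B
B[0] = [0.5, 0.0]
out_after = lora_forward(W, A, B, x, r, lora_alpha)
print("after update:   ", out_after)
```

Only `A` and `B` receive gradients, which is why the memory footprint stays small even though the full model is loaded.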

## 5. Evaluation

Let's evaluate the fine-tuned model and compare its response to the base model's response for a sample DevOps-related prompt.

In [None]:
metrics = trainer.evaluate()
print("--- Evaluation Metrics ---")
print(metrics)
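
The `eval_loss` value in these metrics is a cross-entropy loss, which can be easier to interpret as perplexity. The sketch below assumes the loss is a mean per-token value in nats and uses a made-up metrics dictionary in the same shape as the one returned by `trainer.evaluate()`; `perplexity_from_loss` is a hypothetical helper name.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


# Hypothetical values for illustration only, not real results
example_metrics = {"eval_loss": 2.0, "eval_runtime": 1.5}

ppl = perplexity_from_loss(example_metrics["eval_loss"])
print(f"eval_loss={example_metrics['eval_loss']:.3f}  perplexity={ppl:.2f}")
```

With only two evaluation examples and a single training step, any such number is a smoke test of the pipeline rather than a meaningful quality measure.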

In [None]:
# Save the trained model adapter
trainer.model.save_pretrained(new_model_adapter)

In [None]:
# Merge the LoRA adapter with the base model for easy inference
base_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
finetuned_model = PeftModel.from_pretrained(base_model, new_model_adapter)
finetuned_model = finetuned_model.merge_and_unload()

# Define a prompt for evaluation
prompt = "How do I expose a deployment in Kubernetes using a service?"
messages = [
    {"role": "system", "content": "You are a helpful DevOps assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)


# Generate response from the fine-tuned model
print("------------------- FINE-TUNED MODEL RESPONSE -------------------")
model_inputs = tokenizer([text], return_tensors="pt").to("cpu")
# Store the length of the input prompt tokens
input_ids_len = model_inputs['input_ids'].shape[1]
# Unpack the encoding so the attention mask is passed along with the input ids
generated_ids = finetuned_model.generate(**model_inputs, max_new_tokens=256)

# We keep all batch items (:) and slice each one from the end of the input length onwards.
response_only_ids = generated_ids[:, input_ids_len:]
response_finetuned = tokenizer.decode(response_only_ids[0], skip_special_tokens=True)
print(response_finetuned)

# Generate response from the original base model for comparison
print("\n------------------- BASE MODEL RESPONSE -------------------")
original_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
generated_ids_base = original_model.generate(**model_inputs, max_new_tokens=256)
response_only_ids_base = generated_ids_base[:, input_ids_len:]
response_base = tokenizer.decode(response_only_ids_base[0], skip_special_tokens=True)
print(response_base)
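
The `merge_and_unload()` call in the cell above folds the adapter into the base weights so that inference needs no extra LoRA layers. As a rough illustration of why this preserves the model's behavior, the pure-Python sketch below checks on a tiny made-up example that the merged weight `W + (lora_alpha / r) * B A` gives the same output as running the base and adapter paths separately. All matrices and helper names here are hypothetical and unrelated to the real model weights.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def matmul(P, Q):
    """Multiply matrix P (n x k) by matrix Q (k x m)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


r, lora_alpha = 2, 4
scaling = lora_alpha / r

W = [  # base weight, 4 x 4 (illustrative values)
    [0.5, -1.0, 0.0, 2.0],
    [1.0, 0.0, 0.5, -0.5],
    [0.0, 1.0, 1.0, 0.0],
    [-1.0, 0.5, 0.0, 1.0],
]
A = [[1.0, 0.0, -1.0, 0.0], [0.0, 0.5, 0.0, 0.5]]        # trained adapter, r x 4
B = [[0.5, 0.0], [0.0, 1.0], [0.0, 0.0], [-0.5, 0.5]]    # trained adapter, 4 x r
x = [1.0, 2.0, 3.0, 4.0]

# Adapter kept separate: base path plus scaled low-rank path
separate = [
    b + scaling * u for b, u in zip(matvec(W, x), matvec(B, matvec(A, x)))
]

# Adapter merged: a single dense weight with the update folded in
BA = matmul(B, A)
W_merged = [
    [w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)
]
merged = matvec(W_merged, x)

print("separate:", separate)
print("merged:  ", merged)
```

The two outputs agree, which is the property that lets the merged model be logged and served as an ordinary checkpoint.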

## 6. Model Logging

Finally, we log our fine-tuned model, its tokenizer, and the evaluation metrics to the model registry.

In [None]:
# Replace with the base path on your filesystem where the examples repo was cloned
base_projects_directory = "/root/jfrog"

try:
    # frogml was already imported in the Imports section
    frogml.huggingface.log_model(
        model=finetuned_model,
        tokenizer=tokenizer,
        repository="llm",  # The JFrog repository to upload the model to
        model_name="devops_helper",  # The uploaded model name
        version="",  # Optional. The uploaded model version
        parameters={"finetuning-dataset": dataset_name},
        code_dir=f"{base_projects_directory}/qwak-examples/devops_helper/code_dir",
        dependencies=[f"{base_projects_directory}/qwak-examples/devops_helper/main/conda.yaml"],
        metrics=metrics,
        predict_file=f"{base_projects_directory}/qwak-examples/devops_helper/code_dir/predict.py",
    )
    print("--- Model Logged Successfully ---")
except Exception as e:
    print(f"An error occurred during model logging: {e}")
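
Because the logging call depends on several local paths, it can help to verify them before uploading. The sketch below is a hypothetical pre-flight check, not part of `frogml`: it assumes the same directory layout used in the cell above and simply reports which of the expected inputs are missing.

```python
from pathlib import Path


def missing_artifacts(base_dir: str) -> list[str]:
    """Return the expected logging inputs that do not exist under base_dir."""
    project = Path(base_dir) / "qwak-examples" / "devops_helper"
    expected = [
        project / "code_dir",               # directory passed as code_dir
        project / "main" / "conda.yaml",    # file passed in dependencies
        project / "code_dir" / "predict.py",  # file passed as predict_file
    ]
    return [str(p) for p in expected if not p.exists()]


# Same base path as in the logging cell; change it to match your setup
missing = missing_artifacts("/root/jfrog")
if missing:
    print("Missing before logging:", *missing, sep="\n  ")
else:
    print("All logging inputs found.")
```

Running a check like this first turns a failed upload into a clear message about which path needs fixing.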