# Low-Rank Adaptation (LoRA) Fine-Tuning Walkthrough

This notebook demonstrates an end-to-end workflow for adapting a transformer-based text classifier with LoRA (Low-Rank Adaptation). We'll cover dataset preparation, applying LoRA to the model, training, evaluation, and ideas for further optimization.

## Notebook Roadmap
- **Setup:** install required libraries and initialize the environment.
- **Step 1 - Dataset preparation:** load a sentiment dataset (with a synthetic fallback) and prepare it for training.
- **Step 2 - Apply LoRA:** wrap a base transformer classifier with a LoRA adapter.
- **Step 3 - Fine-tune:** train the LoRA-augmented model efficiently.
- **Step 4 - Evaluate:** measure accuracy, F1, and inspect predictions.
- **Step 5 - Optimize:** explore levers for improving LoRA performance and portability.

### Environment & Dependency Setup
Run the next cell if you need to install packages. Restart the kernel after installing to ensure newly installed libraries are picked up.

In [1]:
%pip install -q --upgrade accelerate datasets evaluate peft transformers scikit-learn

Note: you may need to restart the kernel to use updated packages.


### Imports and configuration
We collect every dependency in one place and fix random seeds for reproducibility. (Seeding alone does not guarantee bit-identical results on GPU.)

In [2]:
import logging
import os
import random
from dataclasses import dataclass

os.environ['USE_TF'] = '0'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import numpy as np
import torch
import evaluate
from datasets import Dataset, DatasetDict, load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
    set_seed,
)
from peft import LoraConfig, TaskType, get_peft_model
from sklearn.metrics import classification_report

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

MODEL_NAME = "distilbert-base-uncased"
SEED = 42
set_seed(SEED)
np.random.seed(SEED)
random.seed(SEED)
torch.manual_seed(SEED)

label_names = ["negative", "positive"]  # default; will update once data is loaded



## Step 1 - Prepare your dataset
We'll start with IMDb sentiment data. If it's unavailable (e.g., no network access), we automatically fall back to a small synthetic dataset so the end-to-end flow still works.

In [3]:
def build_synthetic_dataset(num_samples: int = 200):
    """Create a tiny sentiment dataset when external downloads are blocked."""
    positives = [
        "I absolutely loved this movie, it was fantastic!",
        "Great performances and an uplifting story.",
        "The product quality exceeded my expectations.",
        "Customer support was helpful and quick to respond.",
    ]
    negatives = [
        "This was a waste of time, I hated it.",
        "Terrible experience, would not recommend to anyone.",
        "Quality was disappointing and the item broke quickly.",
        "Customer service never replied to my emails.",
    ]
    texts, labels = [], []
    for _ in range(num_samples // 2):
        texts.append(random.choice(positives))
        labels.append(1)
        texts.append(random.choice(negatives))
        labels.append(0)
    data = {"text": texts, "label": labels}
    full_dataset = Dataset.from_dict(data)
    return full_dataset.shuffle(seed=SEED)


try:
    raw_datasets = load_dataset("imdb")
    logger.info("IMDb dataset loaded successfully.")
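    # Downsample for faster experiments. The unlabeled "unsupervised" split
    # (label = -1) is left as-is and is not used for training or evaluation.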
    raw_datasets["train"] = raw_datasets["train"].shuffle(seed=SEED).select(range(600))
    raw_datasets["test"] = raw_datasets["test"].shuffle(seed=SEED).select(range(240))
    label_names = raw_datasets["train"].features["label"].names or label_names
except Exception as exc:  # noqa: BLE001
    logger.warning("Falling back to synthetic dataset because of: %s", exc)
    synthetic = build_synthetic_dataset(num_samples=160)
    split = synthetic.train_test_split(test_size=0.2, seed=SEED)
    raw_datasets = DatasetDict({"train": split["train"], "test": split["test"]})

num_labels = len(set(raw_datasets["train"]["label"]))
label_names = label_names[:num_labels]
print(raw_datasets)


INFO:__main__:IMDb dataset loaded successfully.


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 600
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 240
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})


### Inspect a sample
It's good practice to sanity-check a few examples before training.

In [4]:
raw_datasets["train"][:3]

{'text': ['There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...',
  'This movie is a great. The plot is very true to the book which is a classic written by Mark Twain. The movie starts of with a scene where Hank sings a song with a bunch of kids called "when you stub your toe on the moon" It

## Step 1b - Tokenize and prepare features
We tokenize the text (truncating to `max_length` tokens) and keep the attention masks and label ids. Padding is deferred to the data collator, which pads each batch only to its longest sequence instead of to `max_length`.
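The idea behind dynamic padding can be sketched in plain Python. The helpers `pad_batch` and `padded_tokens` are hypothetical and only illustrate the concept; they are not how `DataCollatorWithPadding` is implemented. The sketch assumes a pad id of 0 (the `[PAD]` id in the `distilbert-base-uncased` vocabulary) and uses made-up token ids.

```python
from typing import Dict, List, Optional

PAD_ID = 0  # assumption: [PAD] id in the distilbert-base-uncased vocabulary


def pad_batch(batch: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Pad each sequence to the longest one in *this* batch and build attention masks."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def padded_tokens(batches: List[List[List[int]]], fixed_length: Optional[int] = None) -> int:
    """Count token positions the model processes, padding per batch or to a fixed length."""
    total = 0
    for batch in batches:
        width = fixed_length if fixed_length is not None else max(len(ids) for ids in batch)
        total += width * len(batch)
    return total


# Two toy batches of made-up token ids (real ids come from the tokenizer).
batches = [
    [[101, 2023, 102], [101, 2307, 3185, 102]],
    [[101, 102], [101, 2204, 102]],
]
print(pad_batch(batches[0]))
print("dynamic:", padded_tokens(batches), "| fixed max_length=256:", padded_tokens(batches, fixed_length=256))
```

With these toy batches, dynamic padding processes 14 token positions versus 1,024 when every sequence is padded to 256. On real reviews the savings depend on the length distribution, and long IMDb reviews that hit the truncation limit gain nothing.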

In [5]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

max_length = 256

def tokenize_function(batch):
    return tokenizer(
        batch["text"],
        padding=False,
        truncation=True,
        max_length=max_length,
    )


tokenized_datasets = raw_datasets.map(tokenize_function, batched=True, remove_columns=["text"])

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

print(tokenized_datasets)



DatasetDict({
    train: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 600
    })
    test: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 240
    })
    unsupervised: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 50000
    })
})





### Metrics helper
We combine accuracy and weighted F1 (per-class F1 averaged with weights proportional to each class's support). The `evaluate` library downloads the metric definitions once and caches them.

In [6]:
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_metric.compute(predictions=predictions, references=labels)["accuracy"],
        "f1_weighted": f1_metric.compute(predictions=predictions, references=labels, average="weighted")["f1"],
    }


## Step 2 - Apply LoRA to the model
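We load a compact base model (DistilBERT) and attach LoRA adapters to the query and value projections (`q_lin`, `v_lin`) of every attention block. The base weights stay frozen; the trainable parameters are the low-rank adapter matrices plus the classification head (`pre_classifier` and `classifier`), which PEFT keeps trainable for `TaskType.SEQ_CLS` because the head is newly initialized and must be learned.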
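A LoRA adapter leaves the pretrained weight `W0` frozen and adds a trainable low-rank update, so the adapted layer computes `W0 x + (lora_alpha / r) * B (A x)`. The pure-Python sketch below (the `LoRALinear` class and the toy matrices are hypothetical, for illustration only) shows why training starts from the base model's behavior: `B` is initialized to zeros, so the update contributes nothing until training changes it. `A` is drawn from a small Gaussian here; PEFT's own initializer differs, and `lora_dropout` is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Toy LoRA-adapted linear layer: h = W0 x + (lora_alpha / r) * B (A x)."""

    def __init__(self, w0: Matrix, r: int, lora_alpha: float, seed: int = 0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0                                               # frozen pretrained weight, d_out x d_in
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]  # A: r x d_in, random
        self.b = [[0.0] * r for _ in range(d_out)]                 # B: d_out x r, zeros at init
        self.scale = lora_alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)                   # frozen path
        update = matvec(self.b, matvec(self.a, x))  # low-rank path: B (A x)
        return [h + self.scale * u for h, u in zip(base, update)]


w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # toy 2 x 3 weight (d_out=2, d_in=3)
layer = LoRALinear(w0, r=2, lora_alpha=4)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # identical to W0 x at initialization, because B is all zeros
print(matvec(w0, x))
```

Only `A` (`r x d_in`) and `B` (`d_out x r`) would receive gradients, which is where the parameter savings over full fine-tuning come from.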

In [7]:
base_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=num_labels,
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.15,
    target_modules=["q_lin", "v_lin"],
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075
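As a sanity check, the `trainable params` figure above can be reproduced by hand. This sketch assumes DistilBERT-base dimensions (6 layers, hidden size 768) and that, besides the adapters, PEFT keeps the `pre_classifier` and `classifier` layers trainable; the helper names are hypothetical.

```python
N_LAYERS = 6
HIDDEN = 768
NUM_LABELS = 2
R = 16


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r; LoRA adds no bias by default (bias="none")."""
    return r * d_in + d_out * r


def linear_params(d_in: int, d_out: int) -> int:
    """Dense layer weight plus bias."""
    return d_in * d_out + d_out


adapter = N_LAYERS * 2 * lora_params(HIDDEN, HIDDEN, R)                   # q_lin + v_lin in every layer
head = linear_params(HIDDEN, HIDDEN) + linear_params(HIDDEN, NUM_LABELS)  # pre_classifier + classifier
print(f"adapter={adapter:,} head={head:,} total={adapter + head:,}")
```

Under these assumptions the adapters account for 294,912 parameters and the classification head for 592,130, so about two thirds of what is being trained is the head rather than LoRA itself. The `all params` total also appears to count PEFT's trainable copy of the head alongside the original.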


## Step 3 - Fine-tune the model with LoRA
Training arguments are deliberately lightweight so you can iterate quickly on a CPU or a single-GPU machine. Adjust batch size and epochs to suit your hardware.
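The number of optimizer steps follows directly from the settings in the next cell. The sketch below recomputes it, assuming a single device and the default linear schedule (`lr_scheduler_type="linear"`); `linear_lr` is a hypothetical helper that uses the same formula as `get_linear_schedule_with_warmup`, and warmup steps are rounded up, as Transformers does for `warmup_ratio`.

```python
import math

N_TRAIN = 600
BATCH_SIZE = 8
GRAD_ACCUM = 1
EPOCHS = 1
WARMUP_RATIO = 0.1
PEAK_LR = 2e-4

steps_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE) // GRAD_ACCUM  # batches per epoch / accumulation
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
print(linear_lr(warmup_steps), linear_lr(total_steps))
```

This gives 75 optimizer steps with 8 warmup steps, consistent with the logged `train_steps_per_second` (0.174 steps/s over about 430 s). It also means `logging_steps=50` produces a single training-loss entry per run.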

In [8]:
output_dir = "artifacts/distilbert-imdb-lora"
os.makedirs(output_dir, exist_ok=True)

training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    logging_steps=50,
    gradient_accumulation_steps=1,
    warmup_ratio=0.1,
    fp16=torch.cuda.is_available(),
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)



### Train
Execute the cell below to start LoRA fine-tuning. Checkpoints, the trainer state, and the saved metrics are written to `output_dir` (`artifacts/distilbert-imdb-lora`).

In [9]:
train_result = trainer.train()
metrics = train_result.metrics
metrics["train_samples"] = len(tokenized_datasets["train"])
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()




| Epoch | Training Loss | Validation Loss | Accuracy | F1 Weighted |
|---|---|---|---|---|
| 1 | 0.6564 | 0.58722 | 0.7125 | 0.702703 |



***** train metrics *****
  epoch                    =        1.0
  total_flos               =    37772GF
  train_loss               =     0.6436
  train_runtime            = 0:07:10.38
  train_samples            =        600
  train_samples_per_second =      1.394
  train_steps_per_second   =      0.174


## Step 4 - Evaluate the LoRA-fine-tuned model
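We evaluate on the test split, compute classification metrics, and inspect a few predictions as a qualitative sanity check. This split also served as `eval_dataset` during training; if you train for more epochs with `load_best_model_at_end=True`, carve out a separate validation split so the final test numbers stay unbiased.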

In [10]:
eval_metrics = trainer.evaluate(tokenized_datasets["test"])
trainer.log_metrics("eval", eval_metrics)
trainer.save_metrics("eval", eval_metrics)
print(eval_metrics)




***** eval metrics *****
  epoch                   =        1.0
  eval_accuracy           =     0.7125
  eval_f1_weighted        =     0.7027
  eval_loss               =     0.5872
  eval_runtime            = 0:00:47.39
  eval_samples_per_second =      5.064
  eval_steps_per_second   =      0.317
{'eval_loss': 0.5872203707695007, 'eval_accuracy': 0.7125, 'eval_f1_weighted': 0.7027028834323008, 'eval_runtime': 47.3901, 'eval_samples_per_second': 5.064, 'eval_steps_per_second': 0.317, 'epoch': 1.0}


### Classification report and examples

In [11]:
predictions = trainer.predict(tokenized_datasets["test"])
logits = predictions.predictions
labels = predictions.label_ids
pred_ids = np.argmax(logits, axis=-1)

print(classification_report(labels, pred_ids, target_names=label_names))

sample_texts = raw_datasets["test"]["text"][::max(1, len(raw_datasets["test"]) // 5)]
model.eval()
for text in sample_texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    pred_label = label_names[probs.argmax(dim=-1).item()]
    confidence = probs.max().item()
    print(f"\nText: {text}\nPrediction: {pred_label} (confidence={confidence:.2%})")




              precision    recall  f1-score   support

         neg       0.87      0.53      0.66       125
         pos       0.64      0.91      0.75       115

    accuracy                           0.71       240
   macro avg       0.75      0.72      0.70       240
weighted avg       0.76      0.71      0.70       240




Text: <br /><br />When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong?<br /><br />Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness!<br /><br />I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there.<br /><br /


Text: I saw Le Conseguenze Dell'Amore on the 2005 Rotterdam Filmfestival, It was the first of ten films I saw there.<br /><br />Le Conseguenze has left the most powerful impression of the ten films. From the first shot, you know the movie is going to be something special. The beautiful cinematography left me in awe of what can be done with a camera. The music is also on par with the visuals, complementing the colorful and stylish architecture-like images.<br /><br />Toni Servillo plays the main character in the film, Titta. He's a tax expert gone wrong who lives in a hotel. Every week, he brings a suitcase with money to a bank and the story plays around this.<br /><br />He is always very controlled and shows almost no emotion to anyone; Looks calculated and well-dressed. He has a habit of ignoring people who are of no significance to him. For example Sofia (played very nicely by Olivia Magnani), who works as a barmaid in the hotel where he lives. Although she's been working in the hot


Text: I see that C. Thomas Howell has appeared in many movies since his heyday in the 80s as an accomplished young actor.<br /><br />I bought this DVD because it was cheap and in part for the internet-related plot and to see how much older C. Thomas Howell is; I do not recall seeing him in any movies since the 1980s.<br /><br />In just a few words: what a very big disappointment. I give some low budget movies a chance, but this one started out lame. Within the first 15 minutes of the movie, this elusive woman is chatting with an Asian guy in a chatroom. They basically stimulate themselves to their own chat, she then insists on meeting the participant in person. She meets him, has sex, ties him up and then murders him in cold blood. The plot then deteriorates further.<br /><br />The plot is thin and flimsy and the acting is very stiff. Do not bother renting it much less purchasing it, even if it is in the $1 DVD bin. I plan to take my copy of the DVD to Goodwill. I am truly amazed that


Text: This is a cute little French silent comedy about a man who bets another that he can't stay in this castle for one hour due to its being haunted. And, once the guy enters the house, it looks much more like a crazed fun house or maybe like the after-effects of LSD!! While there ARE ghosts and skeletons, there is a weird menagerie of animals, odd special effects and gags as well. It's awfully hard to describe but the visuals alone make the film worth seeing. HOWEVER, understand that the self-indulgent director also had many "funny gags" that totally fell flat and hurt the movie. His "camera tricks" weren't so much tricky but annoying and stupid. IGNORE THESE AND KEEP WATCHING--it does get better. The film is fast paced, funny and worth seeing. In particular, I really liked watching the acting and mugging of Max Linder--he was so expressive and funny! Too bad he is virtually forgotten today. For an interesting but very sad read, check out the IMDb biography on him.
Prediction: pos (


Text: Like "The Blair Witch Project" before it, "Hatchet" has garnered its own fair share of publicity from the bottom-on-up (as an avid reader of Fangoria Magazine, the full-page ads are hard to miss); even after its middling theatrical run, the film is bound to subsist solely on the hype surrounding it, and will probably turn into a cult item at some point. With a MySpace URL and a mighty (if puzzlingly subjective) promise of preserving so-called "old school American horror," "Hatchet" will draw a lot of curiosity seekers with its DVD release (where that claim is emblazoned on the disc itself). Perhaps it was the large-print blurb from Ain't It Cool News on the ads that caused me to approach the film with some trepidation (it seems that Harry Knowles and his minions will approve of any film for VIP passes and free food), but "Hatchet" makes me question what writer-director Adam Green's idea of "old school American horror" really is: based on the evidence here, it means the insipid, 
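The classification report can be traced back to a 2 x 2 confusion matrix. The counts below are an assumption: they are back-calculated from the rounded report (supports 125 and 115; `eval_accuracy` 0.7125 implies 171 correct; a negative-class recall of 0.53 on 125 examples only fits 66), not printed anywhere in the notebook. The `report` helper is hypothetical.

```python
from typing import Dict, List

LABELS = ["neg", "pos"]
# Rows = true label, columns = predicted label (counts inferred, see lead-in).
CONFUSION = [
    [66, 59],   # true neg: 66 predicted neg, 59 predicted pos
    [10, 105],  # true pos: 10 predicted neg, 105 predicted pos
]


def report(cm: List[List[int]]) -> Dict[str, float]:
    """Per-class precision/recall/F1, accuracy, and support-weighted F1."""
    n = sum(map(sum, cm))
    out: Dict[str, float] = {"accuracy": sum(cm[i][i] for i in range(len(cm))) / n}
    weighted_f1 = 0.0
    for i, name in enumerate(LABELS):
        tp = cm[i][i]
        support = sum(cm[i])                   # row sum = true examples of class i
        predicted = sum(row[i] for row in cm)  # column sum = predictions of class i
        precision, recall = tp / predicted, tp / support
        f1 = 2 * precision * recall / (precision + recall)
        out[f"{name}_precision"], out[f"{name}_recall"], out[f"{name}_f1"] = precision, recall, f1
        weighted_f1 += f1 * support / n        # support-weighted average ("weighted avg")
    out["f1_weighted"] = weighted_f1
    return out


for key, value in report(CONFUSION).items():
    print(f"{key:>15}: {value:.4f}")
```

These counts reproduce every rounded figure in the report as well as the unrounded `eval_f1_weighted` of 0.7027. Read row-wise, they show where the errors come from: the model labels 59 of the 125 negative reviews as positive but only 10 of the 115 positive reviews as negative.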

## Step 5 - Optimize LoRA for your task
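- **Tune adapter rank (`r`) and scaling (`lora_alpha`):** higher ranks add capacity but also add trainable parameters. PEFT scales the adapter update by `lora_alpha / r` (or `lora_alpha / sqrt(r)` with `use_rslora=True`), so consider adjusting the two together.
- **Target additional modules:** for encoder-only models you can also adapt the feed-forward layers by adding their linear module names to `target_modules` (in DistilBERT: `lin1` and `lin2`; the remaining attention projections are `k_lin` and `out_lin`).
- **Adjust dropout:** `lora_dropout` helps regularize small datasets; try values between 0.0 and 0.3.
- **Unfreeze selected modules if needed:** `get_peft_model` already freezes every base-model weight except the adapters and the classification head. To train extra modules such as layer norms, list them in `modules_to_save` or set `.requires_grad = True` on their parameters.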
- **Merge adapters for inference:** once satisfied, call `model.merge_and_unload()` to consolidate LoRA weights into the base model before exporting.
- **Monitor resource usage:** the `print_trainable_parameters()` output is a quick guardrail to ensure fine-tuning stays lightweight.
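The parameter cost of the rank and target-module levers can be estimated before training. This sketch assumes DistilBERT-base shapes (6 layers; 768 x 768 attention projections; a 768 -> 3072 -> 768 feed-forward block) and counts adapter parameters only, not the classification head; `adapter_params` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

N_LAYERS = 6
# module name -> (d_in, d_out), assuming DistilBERT-base shapes
SHAPES: Dict[str, Tuple[int, int]] = {
    "q_lin": (768, 768),
    "k_lin": (768, 768),
    "v_lin": (768, 768),
    "out_lin": (768, 768),
    "lin1": (768, 3072),
    "lin2": (3072, 768),
}


def adapter_params(targets: List[str], r: int) -> int:
    """Each adapted layer adds A (r x d_in) and B (d_out x r), in every transformer layer."""
    return N_LAYERS * sum(r * (SHAPES[t][0] + SHAPES[t][1]) for t in targets)


configs = {
    "q,v": ["q_lin", "v_lin"],
    "all attention": ["q_lin", "k_lin", "v_lin", "out_lin"],
    "attention + FFN": list(SHAPES),
}
for r in (4, 8, 16, 32):
    row = ", ".join(f"{name}: {adapter_params(targets, r):,}" for name, targets in configs.items())
    print(f"r={r:>2} -> {row}")
```

Adapter size grows linearly with `r`; at `r=16`, adapting all attention and feed-forward projections costs 4.5 times as many adapter parameters as the `q_lin`/`v_lin` setup used above.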

### Adapter merging example (optional)
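Run this after training if you need a single merged checkpoint for deployment. `merge_and_unload()` folds the scaled low-rank updates into the base weights and returns the underlying Transformers model, so the saved directory can be reloaded with `AutoModelForSequenceClassification.from_pretrained` without PEFT.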
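Merging loses nothing because the LoRA path is linear: `W0 x + s * B (A x)` equals `(W0 + s * B A) x` with `s = lora_alpha / r`, up to floating-point rounding, and `lora_dropout` is inactive at inference. The toy check below uses made-up matrices and hypothetical helpers (`matmul`, `matvec`) to illustrate the identity; it does not call PEFT.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix-matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight, 2 x 3
a = [[0.5, -1.0, 0.0]]                     # A: r x d_in with r = 1
b = [[2.0], [1.0]]                         # B: d_out x r, nonzero as if after training
scale = 2.0 / 1                            # lora_alpha / r with lora_alpha = 2, r = 1

# Fold the scaled low-rank update into the base weight: W = W0 + s * (B A).
ba = matmul(b, a)
merged = [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, ba)]

x = [1.0, 2.0, 3.0]
unmerged_out = [h + scale * u for h, u in zip(matvec(w0, x), matvec(b, matvec(a, x)))]
merged_out = matvec(merged, x)
print(unmerged_out, merged_out)
```

After merging, inference needs a single matrix multiply per layer instead of the base path plus the low-rank path.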

In [12]:
merged_output_dir = os.path.join(output_dir, "merged")
os.makedirs(merged_output_dir, exist_ok=True)

merged_model = model.merge_and_unload()
merged_model.save_pretrained(merged_output_dir)
tokenizer.save_pretrained(merged_output_dir)
print(f"Merged model saved to: {merged_output_dir}")


Merged model saved to: artifacts/distilbert-imdb-lora\merged


## Next steps
- Track experiments with logging tools (Weights & Biases, MLflow).
- Evaluate robustness on adversarial or out-of-domain samples.
- Export the merged model to ONNX or TorchScript for production deployment.
- Iterate on LoRA hyperparameters via grid or Bayesian search to balance accuracy and efficiency.