Fine-tuning language models for medical reasoning using TRL (Transformers Reinforcement Learning) and LoRA (Low-Rank Adaptation).
This project demonstrates efficient fine-tuning of small language models on medical reasoning tasks using:
- TRL - Transformers Reinforcement Learning library for supervised fine-tuning
- LoRA - Low-Rank Adaptation for memory-efficient training
- SmolLM2-135M-Instruct - Lightweight base model
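LoRA's memory efficiency comes from freezing the base weight matrix `W` and training only a low-rank update `ΔW = (alpha / r) · B @ A`. A toy pure-Python sketch of the idea (made-up dimensions, no real model):

```python
# Toy illustration of the LoRA update: W_eff = W + (alpha / r) * (B @ A).
# Dimensions and values are illustrative; real training uses the peft library.
import random

def matmul(X, Y):
    """Naive matrix multiply for small toy matrices."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

d_out, d_in, r, alpha = 8, 8, 2, 4  # rank r << d; alpha scales the update

# Frozen base weight (would come from the pretrained checkpoint).
W = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]

# Trainable low-rank factors: B starts at zero so training begins exactly at W.
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

delta = matmul(B, A)
scale = alpha / r
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d_in)] for i in range(d_out)]

trainable = r * d_in + d_out * r   # parameters actually updated (A and B)
full = d_out * d_in                # parameters in the equivalent dense layer
```

Only `A` and `B` receive gradients, so optimizer state is kept for `trainable` parameters instead of `full`, which is what makes fine-tuning fit in modest GPU memory.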
Important
Chain of Thought (CoT): SmolLM2-135M-Instruct does not perform chain-of-thought (CoT) reasoning out of the box; I chose it because it is small enough to fine-tune on this dataset in a few hours.
- `trl_medical_reasoning_training.ipynb` - Fine-tuning notebook with TRL and LoRA
- `trl_medical_reasoning_inference.ipynb` - Model inference and evaluation
- `results/` - Training checkpoints and model artifacts
- `lora_adapter/` - Saved LoRA adapters
Tip
Training UI: Training logs are sent to MLflow.
During training you can start the MLflow server locally by running
`pixi run -e cuda mlflow ui`. The UI will be at http://localhost:5000.
- Python 3.11+
- CUDA-compatible GPU (recommended)
- Dependencies managed via `pixi.toml`
```bash
# Install pixi (if not already installed)
curl -fsSL https://pixi.sh/install.sh | bash

# Install project dependencies
pixi install
```

Open and run `trl_medical_reasoning_training.ipynb` to:
- Load and preprocess medical reasoning datasets
- Configure LoRA adapters and quantization
- Fine-tune the model using TRL's SFTTrainer
- Save checkpoints and adapters
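The steps above can be sketched as follows. The hyperparameters, target modules, and dataset wiring here are illustrative assumptions, not the notebook's actual values:

```python
# Sketch of the TRL + LoRA training setup. Hyperparameters, target modules,
# and paths are assumptions -- see trl_medical_reasoning_training.ipynb for
# the real configuration.

LORA_KWARGS = dict(
    r=16,                 # low-rank dimension (assumed)
    lora_alpha=32,        # scaling factor for the update (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)

def build_trainer(train_dataset, output_dir="results"):
    """Construct an SFTTrainer with a LoRA adapter (heavy imports kept lazy)."""
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    return SFTTrainer(
        model="HuggingFaceTB/SmolLM2-135M-Instruct",
        train_dataset=train_dataset,
        peft_config=LoraConfig(**LORA_KWARGS),
        args=SFTConfig(
            output_dir=output_dir,
            num_train_epochs=1,
            per_device_train_batch_size=4,
            report_to="mlflow",   # send training logs to MLflow
        ),
    )

# trainer = build_trainer(medical_dataset["train"])
# trainer.train()
# trainer.save_model("lora_adapter")
```

Passing a `LoraConfig` as `peft_config` lets TRL wrap the base model so only the adapter weights are trained, and `report_to="mlflow"` is what feeds the MLflow UI mentioned above.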
Use `trl_medical_reasoning_inference.ipynb` to:
- Load trained adapters
- Run inference on medical reasoning queries
- Evaluate model performance
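A minimal sketch of the inference flow: load the base model, attach the saved adapter, and answer a query. The paths follow the repo layout above, but the generation settings are assumptions:

```python
# Sketch of inference with the saved LoRA adapter. Adapter path matches the
# repo layout; generation settings are illustrative assumptions.

def generate_answer(prompt, adapter_dir="lora_adapter", max_new_tokens=256):
    """Load base model + LoRA adapter and answer one query (lazy imports)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(model, adapter_dir)  # attach adapter
    model.eval()

    # Format the query with the instruct model's chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# print(generate_answer("A patient presents with chest pain. What should be ruled out first?"))
```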
View the full documentation at the Jupyter Book site.
Licensed under the Apache License 2.0. See LICENSE for details.