Fine-tuning language models for medical reasoning using TRL (Transformers Reinforcement Learning) and LoRA (Low-Rank Adaptation).
This project demonstrates efficient fine-tuning of small language models on medical reasoning tasks using:
- TRL - Transformers Reinforcement Learning library for supervised fine-tuning
- LoRA - Low-Rank Adaptation for memory-efficient training
- SmolLM2-135M-Instruct - Lightweight base model
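LoRA's memory efficiency comes from freezing the base weight matrix `W` and training only a low-rank update `ΔW = (alpha / r) · B @ A`. A toy pure-Python sketch of the idea (made-up dimensions, no real model):

```python
# Toy illustration of the LoRA update: W_eff = W + (alpha / r) * (B @ A).
# Dimensions and values are illustrative; real training uses the peft library.
import random

def matmul(X, Y):
    """Naive matrix multiply for small toy matrices."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

d_out, d_in, r, alpha = 8, 8, 2, 4  # rank r << d; alpha scales the update

# Frozen base weight (would come from the pretrained checkpoint).
W = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]

# Trainable low-rank factors: B starts at zero so training begins exactly at W.
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

delta = matmul(B, A)
scale = alpha / r
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d_in)] for i in range(d_out)]

trainable = r * d_in + d_out * r   # parameters actually updated (A and B)
full = d_out * d_in                # parameters in the equivalent dense layer
```

Only `A` and `B` receive gradients, so optimizer state is kept for `trainable` parameters instead of `full`, which is what makes fine-tuning fit in modest GPU memory.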
Important
Chain of Thought (CoT): SmolLM2-135M-Instruct does not perform chain-of-thought (CoT) reasoning out of the box; I chose it because it is small enough to fine-tune on this dataset in a few hours.
- `trl_medical_reasoning_training.ipynb` - Fine-tuning notebook with TRL and LoRA
- `trl_medical_reasoning_inference.ipynb` - Model inference and evaluation
- `results/` - Training checkpoints and model artifacts
- `lora_adapter/` - Saved LoRA adapters
Tip
Training UI: Training logs are sent to MLflow.
During training you can start the MLflow server locally by running
`pixi run -e cuda mlflow ui`. The UI will be at http://localhost:5000.
- Python 3.11+
- CUDA-compatible GPU (recommended)
- Dependencies managed via `pixi.toml`
```bash
# Install pixi (if not already installed)
curl -fsSL https://pixi.sh/install.sh | bash

# Install project dependencies
pixi install
```

Open and run `trl_medical_reasoning_training.ipynb` to:
- Load and preprocess medical reasoning datasets
- Configure LoRA adapters and quantization
- Fine-tune the model using TRL's SFTTrainer
- Save checkpoints and adapters
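The steps above can be sketched as follows. The hyperparameters, target modules, and dataset wiring here are illustrative assumptions, not the notebook's actual values:

```python
# Sketch of the TRL + LoRA training setup. Hyperparameters, target modules,
# and paths are assumptions -- see trl_medical_reasoning_training.ipynb for
# the real configuration.

LORA_KWARGS = dict(
    r=16,                 # low-rank dimension (assumed)
    lora_alpha=32,        # scaling factor for the update (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)

def build_trainer(train_dataset, output_dir="results"):
    """Construct an SFTTrainer with a LoRA adapter (heavy imports kept lazy)."""
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    return SFTTrainer(
        model="HuggingFaceTB/SmolLM2-135M-Instruct",
        train_dataset=train_dataset,
        peft_config=LoraConfig(**LORA_KWARGS),
        args=SFTConfig(
            output_dir=output_dir,
            num_train_epochs=1,
            per_device_train_batch_size=4,
            report_to="mlflow",   # send training logs to MLflow
        ),
    )

# trainer = build_trainer(medical_dataset["train"])
# trainer.train()
# trainer.save_model("lora_adapter")
```

Passing a `LoraConfig` as `peft_config` lets TRL wrap the base model so only the adapter weights are trained, and `report_to="mlflow"` is what feeds the MLflow UI mentioned above.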
Use `trl_medical_reasoning_inference.ipynb` to:
- Load trained adapters
- Run inference on medical reasoning queries
- Evaluate model performance
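A minimal sketch of the inference flow: load the base model, attach the saved adapter, and answer a query. The paths follow the repo layout above, but the generation settings are assumptions:

```python
# Sketch of inference with the saved LoRA adapter. Adapter path matches the
# repo layout; generation settings are illustrative assumptions.

def generate_answer(prompt, adapter_dir="lora_adapter", max_new_tokens=256):
    """Load base model + LoRA adapter and answer one query (lazy imports)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(model, adapter_dir)  # attach adapter
    model.eval()

    # Format the query with the instruct model's chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# print(generate_answer("A patient presents with chest pain. What should be ruled out first?"))
```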
View the full documentation at the Jupyter Book site.
Licensed under the Apache License 2.0. See LICENSE for details.