This repository provides a from-scratch implementation, training, and testing setup for state-of-the-art Large Language Models (LLMs), including LLaMA-2 and Mistral (Base + Mixture of Experts). It also explores advanced optimization and distributed training techniques used in modern LLM pipelines.
Implemented models:

- LLaMA-2
- Mistral (base + Mixture of Experts)
Key techniques:

- Model Compilation: for improved runtime performance.
- Mixed Precision Training: reduces memory usage and increases speed using `torch.float16`/`torch.bfloat16` (see the sketch after this list).
- Flash Attention: high-performance attention mechanism for transformer models.
- DDP: efficient multi-GPU training with PyTorch's `DistributedDataParallel`.
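As a concrete illustration of the compilation and mixed-precision points above, here is a minimal sketch of a single training step. The `model`, `optimizer`, and loss here are stand-ins, not this repo's actual code (the real loop lives in `src/trainer.py`):

```python
import torch
import torch.nn.functional as F

# Stand-ins for the model/optimizer that train.py actually builds.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model = torch.compile(model)          # Model Compilation: JIT-compiles the graph
scaler = torch.cuda.amp.GradScaler()  # loss scaling guards fp16 against underflow

def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    # Mixed precision: run the forward pass in float16.
    # (With bfloat16, the GradScaler is usually unnecessary.)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = F.mse_loss(model(x), y)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()
    return loss.item()

x = torch.randn(8, 512, device="cuda")
y = torch.randn(8, 512, device="cuda")
print(train_step(x, y))
```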
The repository is organized as follows:

```
llms/
│
├── data/
│   └── input.txt           # Training dataset
│
├── results/
│   ├── Llama2.md           # LLaMA-2 results
│   └── Mistral.md          # Mistral results
│
├── src/
│   ├── models/
│   │   ├── llama2.py       # LLaMA-2 model implementation
│   │   └── mistral.py      # Mistral base and MoE model implementations
│   │
│   ├── dataloader.py       # Custom dataloader
│   ├── trainer.py          # Generic training loop
│   └── utils.py            # Utility functions
│
├── train.py                # Training script
├── test.py                 # Testing script
│
└── README.md               # Project documentation
```
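The Flash Attention feature listed above maps naturally onto PyTorch's fused attention kernel. As a sketch of how the attention blocks in `src/models/` might use it (shapes and names here are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    """q, k, v: (batch, n_heads, seq_len, head_dim).

    scaled_dot_product_attention dispatches to a flash-attention kernel
    when one is available for the dtype/device, avoiding materialization
    of the full (seq_len x seq_len) attention matrix.
    """
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = k = v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
out = causal_attention(q, k, v)  # -> (2, 8, 128, 64)
```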
To train a model:

```bash
python3 train.py
```

NOTE: Edit other parameters inside `train.py` as needed before starting a run.
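Training reads its corpus from `data/input.txt` via `src/dataloader.py`. The repo's loader may differ; as a rough sketch of the idea, here is a next-token batch sampler assuming a simple character-level vocabulary (an assumption, not necessarily the repo's tokenizer):

```python
import torch

class TextBatches:
    """Serve (input, target) pairs for next-token prediction."""

    def __init__(self, path="data/input.txt", block_size=256):
        text = open(path, encoding="utf-8").read()
        chars = sorted(set(text))              # assumed char-level vocabulary
        stoi = {c: i for i, c in enumerate(chars)}
        self.data = torch.tensor([stoi[c] for c in text], dtype=torch.long)
        self.block_size = block_size

    def get_batch(self, batch_size=32):
        ix = torch.randint(len(self.data) - self.block_size - 1, (batch_size,))
        x = torch.stack([self.data[i : i + self.block_size] for i in ix])
        y = torch.stack([self.data[i + 1 : i + 1 + self.block_size] for i in ix])
        return x, y  # targets are inputs shifted right by one token
```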
To test a trained model:

```bash
python3 test.py
```

NOTE: Edit other parameters inside `test.py` as needed before starting a run.
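The exact sampling logic lives in `test.py`; for orientation, a minimal autoregressive generation loop with temperature and top-k sampling might look like this (all names are hypothetical):

```python
import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens=100, temperature=0.8, top_k=50):
    """idx: (batch, seq_len) token ids; model returns (batch, seq, vocab) logits."""
    for _ in range(max_new_tokens):
        logits = model(idx)[:, -1, :] / temperature      # last-position logits
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float("inf")  # keep only the top-k
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)           # append sampled token
    return idx
```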
For a single-node, multi-GPU setup:

```bash
torchrun --standalone --nproc-per-node=<NUM_GPUS> train.py --ddp
```

NOTE: Edit other parameters inside `train.py` as needed before starting a run.
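Under `--ddp`, `torchrun` sets `RANK`/`LOCAL_RANK`/`WORLD_SIZE`, and the script is expected to wrap the model in `DistributedDataParallel`. A minimal sketch of that setup (not the repo's exact code):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # torchrun supplies rank/world size
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(512, 512).cuda()     # stand-in for the repo's model
model = DDP(model, device_ids=[local_rank])  # gradients sync across GPUs

# ... training loop: each rank should draw its own shard of the batches ...

dist.destroy_process_group()
```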