GitHub - tscholak/devbox: Lambda Labs GPU instance management for hackers

     _           _
  __| | _____  _| |__   _____  __
 / _` |/ _ \ \/ / '_ \ / _ \ \/ /
| (_| |  __/\  /| |_) | (_) >  <
 \__,_|\___| \/ |_.__/ \___/_/\_\

Lambda Labs GPU instance management for hackers

Status: v1.0 - Opinionated & Evolving

A CLI tool for managing Lambda Labs GPU instances with persistent storage, cloud-init automation, and intelligent retry logic. Built for the workflow of repeatedly spinning up powerful GPU instances, doing work, and tearing them down—without losing your environment setup.

Philosophy

Persistent everything: Nix store and home directory survive instance restarts
Type-safe end-to-end: Pydantic models with discriminated unions for compile-time safety
Smart retries: Exponential backoff on capacity issues, fast failure on config errors
Visual consistency: Card-based output format everywhere, no visual surprises
Structural pattern matching: Python 3.10+ pattern matching throughout

Quick Start

# Clone and setup
git clone https://github.com/tscholak/devbox.git
cd devbox
python -m venv .venv
source .venv/bin/activate
pip install -e .

# Set your API key
export LAMBDA_API_KEY="your_key_here"

# Launch a GH200 instance
devbox command=up_gh200

# List instances
devbox command=list resource=instances

# SSH into instance
ssh gh200-devbox  # Auto-configured in ~/.ssh/config.d/

# Terminate when done
devbox command=down instance_identifier=gh200-devbox

Features

Persistent Storage

Instances use Lambda Labs persistent filesystems to maintain state across restarts:

/lambda/nfs/devbox/
├── nix/           # 100GB loop device for Nix store
└── ubuntu/        # Home directory bind mount

All your packages, configs, and project files survive instance termination.

Intelligent Launch Retry

Capacity issues? No problem. The CLI automatically retries with exponential backoff:

⚠ Insufficient capacity - retrying in 5.0s (attempt 1/21)
⚠ Insufficient capacity - retrying in 7.5s (attempt 2/21)
⚠ Insufficient capacity - retrying in 11.2s (attempt 3/21)

But fails fast on actual errors (auth, invalid params, quota exceeded).

Cloud-Init Automation

Instances automatically configure themselves on first boot:

Mount persistent filesystem
Create Nix store loop device (if needed)
Bind mount home directory
Install Nix + Home Manager
Configure as Flakes-enabled dev environment

Consistent Visual Language

Every command uses the same card-based format:

gh200-devbox
        ID:  646044a5f4ee41ff829613a9cbd22438
    Status:  active
        IP:  192.222.58.14
       SSH:  ssh gh200-devbox
    Region:  us-east-3
      Type:  gpu_1x_gh200

No jarring table borders, no inconsistent styling, no visual chaos.

Command Reference

List Resources

# List instances
devbox command=list resource=instances

# List available instance types (with capacity)
devbox command=list resource=instance_types

# List images
devbox command=list resource=images

# List filesystems
devbox command=list resource=filesystems

# List SSH keys
devbox command=list resource=ssh_keys

# List firewall rulesets
devbox command=list resource=firewall_rulesets

Launch Instances

# Launch with all options
devbox command=up \
  region=us-east-3 \
  instance_type=gpu_1x_gh200 \
  ssh_key_name=default \
  filesystem_name=devbox \
  instance_name=my-instance \
  image_id=e57d5fc7-36e7-4f9a-953f-3429c1f7e3b7 \
  wait_after_launch=true

# Or use the GH200 preset
devbox command=up_gh200

# Customize retry behavior
devbox command=up_gh200 \
  max_retries=50 \
  initial_backoff=10.0 \
  max_backoff=180.0 \
  backoff_multiplier=2.0

Manage Instances

# Wait for instance to be ready
devbox command=wait instance_identifier=gh200-devbox

# Get SSH command
devbox command=ssh instance_identifier=gh200-devbox

# Terminate instance
devbox command=down instance_identifier=gh200-devbox

Configuration

Built on Hydra for composable configuration:

src/devbox/conf/
├── command/
│   ├── up.yaml          # Base up command config
│   ├── up_gh200.yaml    # GH200-specific preset
│   ├── list.yaml
│   ├── wait.yaml
│   ├── down.yaml
│   └── ssh.yaml
├── api/
│   └── default.yaml     # API client settings
├── wait/
│   └── default.yaml     # Wait/polling settings
└── ssh/
    └── default.yaml     # SSH connection settings

Override any setting via CLI:

devbox command=up_gh200 instance_name=experiment-007 max_retries=100

Why This Exists

Lambda Labs offers incredible GPU compute at reasonable prices, but the workflow of managing ephemeral instances with persistent development environments isn't trivial. This tool encodes my opinionated workflow:

One filesystem per region with Nix store + home directory
Named instances with auto-generated SSH configs
Smart retries because GH200s are in high demand
Type safety because runtime errors at $2/hr are expensive
Visual consistency because mental context switching is costly

If this matches your workflow, great. If not, fork it and make it yours.

Opinions Baked In

Nix/NixOS: The cloud-init assumes you want Nix
Loop devices: 100GB ext4 image for Nix store instead of direct NFS (performance)
Home Manager: Configured but not auto-installed (you do that)
SSH config.d/: Uses per-instance config files instead of one giant config
Card-based UI: Small 2-column tables, no borders, cyan headers everywhere
Exponential backoff: Because capacity is more available at odd hours
Python 3.13: Uses latest type hints and pattern matching

License

MIT - Do whatever you want with it

Contributing

This is a personal tool made public. PRs welcome but I reserve the right to be opinionated about what gets merged. Fork freely if our opinions diverge.

Built with Python 3.13, Pydantic 2.x, Hydra 1.3, Rich, and strong opinions.

Includes Lambda Cloud API spec v1.8.3

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
src		src
.envrc		.envrc
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
blog-post.md		blog-post.md
devenv.lock		devenv.lock
devenv.nix		devenv.nix
devenv.yaml		devenv.yaml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Philosophy

Quick Start

Features

Persistent Storage

Intelligent Launch Retry

Cloud-Init Automation

Consistent Visual Language

Command Reference

List Resources

Launch Instances

Manage Instances

Configuration

Why This Exists

Opinions Baked In

License

Contributing

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Uh oh!

License

Uh oh!

tscholak/devbox

Folders and files

Latest commit

History

Repository files navigation

Philosophy

Quick Start

Features

Persistent Storage

Intelligent Launch Retry

Cloud-Init Automation

Consistent Visual Language

Command Reference

List Resources

Launch Instances

Manage Instances

Configuration

Why This Exists

Opinions Baked In

License

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages