
Support creating droplets from snapshots (clone workflow) #52

@gwpl

AI Agent with Greg: We wanted to snapshot a perfectly configured droplet and spin up 10 clones from it — like a sysadmin photocopier. Turns out dropkit's create command always injects cloud-init, which re-runs on the snapshot and causes chaos (user creation fails, .zshrc gets overwritten, unconditional reboot). Time to teach dropkit the art of cloning. 🧬🖨️

Use Case

Snapshot → Clone N droplets — a common workflow for:

  • Spinning up pre-configured build/test environments
  • Creating identical workshop/training machines
  • Scaling a known-good configuration quickly

# The dream:
dropkit create my-worker-1 --from-snapshot 12345678 --size s-4vcpu-8gb
dropkit create my-worker-2 --from-snapshot 12345678 --size s-4vcpu-8gb
# ... or even:
for i in $(seq 1 10); do
  dropkit create "worker-$i" --from-snapshot 12345678
done

The Problem

dropkit create always renders and sends cloud-init user_data to the DigitalOcean API. When creating from a snapshot:

  1. Cloud-init re-runs — DO assigns a new droplet ID → instance-ID mismatch → cloud-init treats it as first boot
  2. The template is NOT idempotent — several critical issues:
    • users: directive fails or is skipped if user already exists
    • write_files: overwrites .zshrc (loses user customizations)
    • runcmd: ends with unconditional reboot
    • git config --global resets any user-modified values

Cloud-init is fundamentally a provisioning tool, not an idempotent configuration manager. Making the template fully idempotent is possible but would be a significant effort touching every directive.
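
To illustrate why a clone looks like a first boot, here is a minimal Python sketch of the instance-ID comparison described in point 1. It is a simplified model, not cloud-init's actual code: the function name `is_new_instance` is hypothetical, and the temporary directory stands in for the place where cloud-init caches the ID of the instance it last ran on.

```python
from pathlib import Path
import tempfile


def is_new_instance(cached_id_file: Path, current_instance_id: str) -> bool:
    """Return True when no ID is cached or the cached ID differs.

    Hypothetical helper modelling the check; a True result means the
    per-instance configuration would run again.
    """
    if not cached_id_file.exists():
        return True
    return cached_id_file.read_text().strip() != current_instance_id


# Demo: a throwaway directory stands in for cloud-init's data directory.
with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "instance-id"
    cached.write_text("111\n")                 # ID baked into the snapshot
    same = is_new_instance(cached, "111")      # original droplet
    cloned = is_new_instance(cached, "222")    # clone gets a new droplet ID

print(same, cloned)  # prints: False True
```

The snapshot carries the old cached ID with it, so every clone hits the mismatch branch and the full template is applied again.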

Current State

| Component | Exists? | Notes |
|---|---|---|
| api.create_droplet_from_snapshot() | ✅ Yes | Used by wake command, takes snapshot ID, no user_data |
| dropkit create --image | ✅ Yes | But always sends cloud-init; image is a slug, not snapshot ID |
| dropkit wake | ✅ Yes | Restores from hibernation snapshot only (expects dropkit-<name> naming + metadata tags) |
| Snapshot-based create without cloud-init | ❌ No | The missing piece |

Proposed Approaches

Option A: --from-snapshot <id> flag on create (Recommended — simplest)

Add a --from-snapshot flag to dropkit create that:

  • Uses api.create_droplet_from_snapshot() instead of api.create_droplet()
  • Skips cloud-init rendering and sending entirely
  • Skips cloud-init completion monitoring
  • Still performs: wait for active, SSH config setup, project assignment
  • Optionally still runs Tailscale setup (snapshot may not have it)

# Mutually exclusive with --image
@app.command()
def create(
    ...
    from_snapshot: int | None = typer.Option(
        None, "--from-snapshot",
        help="Create from snapshot ID (skips cloud-init)"
    ),
    ...
):

Pros: Minimal change (~30 lines), reuses existing API method, clear intent
Cons: Slightly different code path within create, snapshot ID must be known by user

Option B: --no-cloud-init flag (More general)

A flag to skip cloud-init regardless of image source. Combined with --image <snapshot-id>:

dropkit create my-box --image 12345678 --no-cloud-init

Pros: More composable, works with any image scenario
Cons: Two flags needed, easy to forget --no-cloud-init with a snapshot (leading to the reboot-of-doom)
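
If Option B were chosen, one possible way to soften the "easy to forget" con is a warning when the image looks like a numeric ID and --no-cloud-init is absent. The sketch below is only a heuristic under an assumption: `should_warn_about_cloud_init` is a hypothetical name, and a numeric value does not prove the image is a user snapshot, since other images can also be referenced by numeric ID.

```python
def should_warn_about_cloud_init(image: str, no_cloud_init: bool) -> bool:
    """Return True when cloud-init would run against a numeric image ID.

    Hypothetical heuristic: slugs contain letters and dashes, while
    snapshot IDs are plain digits.
    """
    return image.isdigit() and not no_cloud_init


print(should_warn_about_cloud_init("12345678", no_cloud_init=False))  # prints: True
```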

Option C: Make cloud-init template idempotent (Long-term)

Refactor the template to be safe for re-execution:

  • Guard user creation: id {{ username }} || useradd ...
  • Use marker files: [ -f /etc/dropkit/.initialized ] || ...
  • Remove unconditional reboot; use cloud-init-per instance
  • Make write_files conditional or append-only

Pros: dropkit create --image <snapshot-id> "just works"
Cons: Significant template refactor, hard to test all edge cases, changes behavior for fresh installs too
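
The marker-file guard from Option C can be sketched in Python as follows. In the real template this would more likely be shell inside runcmd; `run_once` is a hypothetical helper, and the demo uses a temporary directory in place of /etc/dropkit.

```python
from pathlib import Path
import tempfile
from typing import Callable


def run_once(marker: Path, action: Callable[[], None]) -> bool:
    """Run `action` only if `marker` does not exist yet.

    Returns True when the action ran. The marker is written after the
    action succeeds, so a failed run is retried on the next boot.
    """
    if marker.exists():
        return False
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


# Demo: the second call is skipped because the marker now exists.
calls: list[str] = []
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "dropkit" / ".initialized"
    first = run_once(marker, lambda: calls.append("setup"))
    second = run_once(marker, lambda: calls.append("setup"))

print(first, second, calls)  # prints: True False ['setup']
```

A snapshot taken after first boot would carry the marker with it, so clones would skip the guarded steps even though cloud-init runs again.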

Option D: New dropkit clone command (Most ergonomic)

Dedicated command for the clone workflow:

dropkit clone my-worker --from my-golden-image --count 5 --size s-4vcpu-8gb

Pros: Best UX, can add clone-specific features (auto-naming, parallel creation)
Cons: Largest scope, new command surface area
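
As a rough sketch of the clone-specific features mentioned for Option D, auto-naming and parallel creation might be structured like this. Everything here is hypothetical: `clone_names`, `clone_all`, the `<base>-<n>` naming scheme, and the `create_one` callback, which stands in for whatever function would actually create one droplet from the snapshot.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def clone_names(base: str, count: int) -> list[str]:
    """Generate names like my-worker-1 .. my-worker-N (assumed scheme)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [f"{base}-{i}" for i in range(1, count + 1)]


def clone_all(base: str, count: int,
              create_one: Callable[[str], T],
              max_workers: int = 4) -> list[T]:
    """Call `create_one` for every clone name using a small thread pool.

    Results come back in name order because Executor.map preserves
    input order.
    """
    names = clone_names(base, count)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(create_one, names))


# Demo with a stub in place of a real API call.
created = clone_all("my-worker", 3, lambda name: f"created {name}")
print(created)
```

Threads suit this workload because each creation mostly waits on network calls; a real implementation would also need to consider API rate limits and partial failures.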

Recommendation

Start with Option A (--from-snapshot). It's the smallest change, reuses existing infrastructure, and solves the immediate need. Options C and D can follow later as enhancements.

Happy to implement whichever approach the team prefers!

🤖 Generated with Claude Code — your AI that learned the hard way that cloud-init and snapshots are like mixing sudo with optimism
