Description
AI Agent with Greg: We wanted to snapshot a perfectly configured droplet and spin up 10 clones from it — like a sysadmin photocopier. Turns out dropkit's create command always injects cloud-init, which re-runs on the snapshot and causes chaos (user creation fails, .zshrc gets overwritten, unconditional reboot). Time to teach dropkit the art of cloning. 🧬🖨️
Use Case
Snapshot → Clone N droplets — a common workflow for:
- Spinning up pre-configured build/test environments
- Creating identical workshop/training machines
- Scaling a known-good configuration quickly
# The dream:
dropkit create my-worker-1 --from-snapshot 12345678 --size s-4vcpu-8gb
dropkit create my-worker-2 --from-snapshot 12345678 --size s-4vcpu-8gb
# ... or even:
for i in $(seq 1 10); do
dropkit create "worker-$i" --from-snapshot 12345678
done

The Problem
dropkit create always renders and sends cloud-init user_data to the DigitalOcean API. When creating from a snapshot:
- Cloud-init re-runs — DO assigns a new droplet ID → instance-ID mismatch → cloud-init treats it as first boot
- The template is NOT idempotent — several critical issues:
  - `users:` directive fails or is skipped if user already exists
  - `write_files:` overwrites `.zshrc` (loses user customizations)
  - `runcmd:` ends with unconditional `reboot`
  - `git config --global` resets any user-modified values
Cloud-init is fundamentally a provisioning tool, not an idempotent configuration manager. Making the template fully idempotent is possible but would be a significant effort touching every directive.
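The instance-ID mismatch described above can be illustrated with a simplified model. This is only a sketch: the function name `is_first_boot` is hypothetical, and it assumes cloud-init caches the previous instance ID in a file (commonly `/var/lib/cloud/data/instance-id`) and compares it with the ID reported by the platform. The real detection logic in cloud-init is more involved.

```python
from pathlib import Path


def is_first_boot(cached_id_file: Path, current_instance_id: str) -> bool:
    """Return True when this boot would be treated as a new instance.

    Simplified model: compare the instance ID cached on disk with the ID
    the platform reports now. A missing cache also counts as a first boot.
    """
    if not cached_id_file.exists():
        return True
    return cached_id_file.read_text().strip() != current_instance_id
```

A snapshot carries the cached ID of the original droplet, while each clone receives a new ID, so under this model every clone looks like a first boot and the full template runs again.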
Current State
| Component | Exists? | Notes |
|---|---|---|
| api.create_droplet_from_snapshot() | ✅ Yes | Used by wake command, takes snapshot ID, no user_data |
| dropkit create --image | ✅ Yes | But always sends cloud-init; image is a slug, not snapshot ID |
| dropkit wake | ✅ Yes | Restores from hibernation snapshot only (expects dropkit-<name> naming + metadata tags) |
| Snapshot-based create without cloud-init | ❌ No | The missing piece |
Proposed Approaches
Option A: --from-snapshot <id> flag on create (Recommended — simplest)
Add a --from-snapshot flag to dropkit create that:
- Uses `api.create_droplet_from_snapshot()` instead of `api.create_droplet()`
- Skips cloud-init rendering and sending entirely
- Skips cloud-init completion monitoring
- Still performs: wait for active, SSH config setup, project assignment
- Optionally still runs Tailscale setup (snapshot may not have it)
# Mutually exclusive with --image
@app.command()
def create(
    ...
    from_snapshot: int | None = typer.Option(
        None, "--from-snapshot",
        help="Create from snapshot ID (skips cloud-init)"
    ),
    ...
):

Pros: Minimal change (~30 lines), reuses existing API method, clear intent
Cons: Slightly different code path within create, snapshot ID must be known by user
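The branching that Option A implies could look roughly like the sketch below. It does not call the real dropkit code: `CreatePlan`, `plan_create`, and the step names are hypothetical stand-ins, and only the two API method names come from this issue. It assumes that `--image` and `--from-snapshot` are rejected when passed together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CreatePlan:
    api_call: str          # which API method the create command would use
    send_user_data: bool   # whether cloud-init user_data is rendered and sent
    steps: list[str] = field(default_factory=list)


def plan_create(image: str | None = None, from_snapshot: int | None = None) -> CreatePlan:
    """Decide the code path for `create` (hypothetical step names)."""
    if image is not None and from_snapshot is not None:
        raise ValueError("--image and --from-snapshot are mutually exclusive")

    if from_snapshot is not None:
        # Snapshot path: no cloud-init rendering, no completion monitoring.
        return CreatePlan(
            api_call="create_droplet_from_snapshot",
            send_user_data=False,
            steps=["wait_for_active", "configure_ssh", "assign_project"],
        )

    # Fresh-image path: behaviour unchanged from today.
    return CreatePlan(
        api_call="create_droplet",
        send_user_data=True,
        steps=[
            "render_cloud_init",
            "wait_for_active",
            "wait_for_cloud_init",
            "configure_ssh",
            "assign_project",
        ],
    )
```

Keeping the decision in one pure function like this would make the new path easy to unit-test without touching the DigitalOcean API.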
Option B: --no-cloud-init flag (More general)
A flag to skip cloud-init regardless of image source. Combined with --image <snapshot-id>:
dropkit create my-box --image 12345678 --no-cloud-init

Pros: More composable, works with any image scenario
Cons: Two flags needed, easy to forget --no-cloud-init with a snapshot (leading to the reboot-of-doom)
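One way to soften the "easy to forget" problem would be a warning when the image looks like a snapshot but cloud-init is still enabled. The sketch below is an assumption, not existing dropkit behaviour: both function names are hypothetical, and the heuristic assumes that an all-numeric `--image` value is an image ID (such as a snapshot) while anything else is a distribution slug.

```python
from __future__ import annotations


def looks_like_image_id(image: str) -> bool:
    """Heuristic: numeric values are treated as image IDs, not slugs."""
    return image.isascii() and image.isdigit()


def cloud_init_warning(image: str, no_cloud_init: bool) -> str | None:
    """Return a warning message if cloud-init would re-run on a snapshot."""
    if looks_like_image_id(image) and not no_cloud_init:
        return (
            f"Image {image} looks like a snapshot ID; cloud-init will re-run. "
            "Pass --no-cloud-init to skip it."
        )
    return None
```

With this guard, `cloud_init_warning("12345678", no_cloud_init=False)` yields a message, while a slug or an explicit `--no-cloud-init` yields `None`.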
Option C: Make cloud-init template idempotent (Long-term)
Refactor the template to be safe for re-execution:
- Guard user creation: `id {{ username }} || useradd ...`
- Use marker files: `[ -f /etc/dropkit/.initialized ] || ...`
- Remove unconditional `reboot`; use `cloud-init-per instance`
- Make `write_files` conditional or append-only
Pros: dropkit create --image <snapshot-id> "just works"
Cons: Significant template refactor, hard to test all edge cases, changes behavior for fresh installs too
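The marker-file guard listed above follows a general run-once pattern, sketched here in Python for illustration. The helper `run_once` is hypothetical; in the actual template the same idea would be written as shell inside `runcmd`, using a path such as the `/etc/dropkit/.initialized` marker mentioned above.

```python
from pathlib import Path
from typing import Callable


def run_once(marker: Path, action: Callable[[], None]) -> bool:
    """Run `action` only if `marker` does not exist yet.

    Returns True if the action ran, False if it was skipped. The marker is
    written only after the action succeeds, so a failed run is retried on
    the next invocation.
    """
    if marker.exists():
        return False
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

A clone booted from a snapshot would already contain the marker, so guarded steps such as the reboot or the `.zshrc` write would be skipped.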
Option D: New dropkit clone command (Most ergonomic)
Dedicated command for the clone workflow:
dropkit clone my-worker --from my-golden-image --count 5 --size s-4vcpu-8gb

Pros: Best UX, can add clone-specific features (auto-naming, parallel creation)
Cons: Largest scope, new command surface area
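The auto-naming and parallel creation mentioned for Option D might be structured as in the sketch below. Everything here is hypothetical: `clone_names`, `clone_many`, the `name-N` naming scheme, and the `create` callback (a stand-in for whatever function would actually create one droplet from the source image).

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def clone_names(base: str, count: int) -> list[str]:
    """Generate names like my-worker-1 .. my-worker-N (assumed scheme)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [f"{base}-{i}" for i in range(1, count + 1)]


def clone_many(
    base: str,
    source: str,
    count: int,
    create: Callable[[str, str], str],
    max_workers: int = 4,
) -> list[str]:
    """Create `count` clones concurrently; results keep the name order."""
    names = clone_names(base, count)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first failure.
        return list(pool.map(lambda name: create(name, source), names))
```

A thread pool suits this workload because each creation is mostly waiting on network calls; capping `max_workers` would also help stay within API rate limits.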
Recommendation
Start with Option A (--from-snapshot). It's the smallest change, reuses existing infrastructure, and solves the immediate need. Options C and D can follow later as enhancements.
Happy to implement whichever approach the team prefers!
🤖 Generated with Claude Code — your AI that learned the hard way that cloud-init and snapshots are like mixing sudo with optimism