Cut the waffle. Keep the signal. Drop a talk, demo or screen recording in — get a tighter, louder-sounding edit out. One command.
autocut transcribes your video with word-level timestamps, kills silence,
fillers and restated sentences, optionally lets an LLM flag redundant
passages, and renders a clean result with a broadcast-standard audio chain
(EBU R128, −16 LUFS). No Premiere. No DaVinci. No manual scrubbing.
```bash
autocut talk.mp4 --speed 1.25 --protect 303:415 --llm
# → output/talk/talk_cut.mp4 · 12:14 → 6:27 · clean audio · usable immediately
```

Typical screen recordings are 30–50 % pauses, "ums", and restatements. Manual editing is tedious; AI "magic editors" are opaque black boxes.
autocut is the opposite:
- Deterministic — every cut is in `cut_plan.json`; you can inspect it.
- Honest defaults — the `talk` preset is tuned on real lecture recordings; nothing touches your video until you like the plan (`--dry-run`).
- Local-first — Parakeet ASR runs on your machine (CoreML on macOS, CUDA/CPU elsewhere). An LLM pass is optional and works with any OpenAI-compatible endpoint, including Ollama, LM Studio, or your own gateway.
- Zero lock-in — MIT, zero Python runtime deps. `pip install -e .` and go.
| Workflow | What autocut does for you |
|---|---|
| Conference talks / lectures | Trims dead air, fillers, restarts → 20–40 % shorter, more watchable. |
| Internal demos & screencasts | Speeds up 1.25×, normalizes loudness across recordings. |
| Podcast raw cuts | Builds a first editable cut from raw takes — editors start from 80 %, not 0. |
| Customer calls → shareable clips | Protect a key range (--protect 303:415), tighten everything else. |
| Course creators | Reproducible pipeline — same settings → same style across dozens of videos. |
```bash
git clone https://github.com/trsdn/autocut.git
cd autocut
pip install -e .   # or: uv pip install -e .
```

That's it on macOS, once ffmpeg is installed (`brew install ffmpeg`). The FluidAudio Parakeet CLI auto-builds on first run.
| Tool | Purpose | Install |
|---|---|---|
| `ffmpeg` / `ffprobe` | required — audio/video pipeline | `brew install ffmpeg` · `apt install ffmpeg` |
| FluidAudio CLI | ASR, macOS (CoreML), ~150× realtime | auto-built into `~/.cache/autocut/` on first run |
| NeMo Parakeet | ASR, Windows / Linux / macOS | `pip install 'autocut[nemo]'` |
| `whisper.cpp` | ASR, portable minimal-deps fallback | `brew install whisper-cpp` |
Parakeet-TDT v3 is the same multilingual model across all three backends.
Drop a video into `input/` and run:

```bash
autocut                      # picks the newest video from ./input/
autocut my_video.mp4         # a bare filename resolves against ./input/
autocut /abs/path/clip.mov   # absolute paths work too
```

Output lands under `output/<stem>/<stem>_cut.mp4`. Intermediates are cached in `output/<stem>/.work/`, so repeat runs (different settings, same source) skip re-transcription:
```
output/talk/
├── talk_cut.mp4
└── .work/
    ├── audio_16k.wav
    ├── transcript.json   ← word-level timings
    └── cut_plan.json     ← every kept/removed range, inspectable
```
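Because the cut plan is plain JSON, auditing it takes only a few lines of Python. The schema sketched here — a `ranges` list of keep/remove spans in seconds — is an illustrative guess, not autocut's documented format; check your own `cut_plan.json` for the actual field names:

```python
import json

def summarize(plan: dict) -> str:
    """Total kept vs. removed time from a list of {action, start, end} ranges."""
    kept = sum(r["end"] - r["start"] for r in plan["ranges"] if r["action"] == "keep")
    cut = sum(r["end"] - r["start"] for r in plan["ranges"] if r["action"] == "remove")
    return f"keep {kept:.1f}s, remove {cut:.1f}s"

# A hand-written example plan in the assumed schema:
plan = {"ranges": [
    {"action": "keep",   "start": 0.0,  "end": 12.5},
    {"action": "remove", "start": 12.5, "end": 14.0},
    {"action": "keep",   "start": 14.0, "end": 30.0},
]}
print(summarize(plan))  # → keep 28.5s, remove 1.5s
```

The same loop works for diffing two plans from different settings before committing to a render.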
```bash
# Preserve a negotiation block exactly, speed up the rest
autocut talk.mp4 --speed 1.25 --protect 303:415

# Preview the plan without rendering
autocut talk.mp4 --dry-run

# Use an LLM to flag restated sentences (optional, any OpenAI-compatible API)
export AUTOCUT_LLM_API_KEY=sk-...
autocut talk.mp4 --llm

# Meeting preset: keep turn-taking pauses, don't touch fillers
autocut kickoff.mp4 --preset meeting

# Cross-platform (Windows / Linux)
pip install 'autocut[nemo]'
autocut talk.mp4 --asr-backend nemo
```

```
     source.mp4
          │
          ▼
┌────────────────────┐      ┌──────────────────────────┐
│ 1. extract 16k wav │ ───▶ │ 2. ASR (word timestamps) │
└────────────────────┘      │ FluidAudio / NeMo /      │
                            │ whisper.cpp              │
                            └────────────┬─────────────┘
                                         │
         ┌───────────────────────────────┼───────────────────────────────┐
         ▼                               ▼                               ▼
┌─────────────────┐             ┌─────────────────┐             ┌─────────────────┐
│ 3. silences     │             │ 4. filler regex │             │ 5. LLM pass     │
│ −35 / −40 dB    │             │ um / äh / ähm   │             │ (optional)      │
└────────┬────────┘             └────────┬────────┘             └────────┬────────┘
         └────────────────┬─────────────┴────────────────────────────────┘
                          ▼
               ┌──────────────────────┐
               │ 6. cut_plan.json     │ ← every cut is auditable
               │    keep + remove     │
               └──────────┬───────────┘
                          ▼
             ┌──────────────────────────┐
             │ 7. ffmpeg render         │
             │ split/trim/atrim ·       │
             │ 40 ms fades · speed ·    │
             │ highpass · comp ·        │
             │ de-ess · loudnorm        │
             └────────────┬─────────────┘
                          ▼
                     output.mp4
```
Every step is a small, independent module. Swap the ASR, tweak the audio chain, bring your own cut plan.
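To make that modularity concrete, here is roughly what step 4 amounts to: matching word-level transcript entries against a filler pattern and collecting their time ranges. Both the regex and the transcript schema below are illustrative, not autocut's actual ones:

```python
import re

# Illustrative filler pattern for English/German recordings
FILLERS = re.compile(r"^(um+|uh+|äh+m?|hm+)$", re.IGNORECASE)

def filler_ranges(words):
    """Return (start, end) ranges for transcript words matching the pattern."""
    return [(w["start"], w["end"])
            for w in words
            if FILLERS.match(w["word"].strip(".,!?"))]

words = [
    {"word": "So",     "start": 0.00, "end": 0.20},
    {"word": "um,",    "start": 0.25, "end": 0.55},
    {"word": "let's",  "start": 0.70, "end": 0.95},
    {"word": "start.", "start": 0.95, "end": 1.30},
]
print(filler_ranges(words))  # → [(0.25, 0.55)]
```

The resulting ranges feed straight into the keep/remove plan of step 6, which is why each stage can be swapped independently.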
| Preset | Speed | Fillers | Narrator pauses | Protected pauses | Good for |
|---|---|---|---|---|---|
| `talk` | 1.00× | cut | trim >0.35 s → 0.20 s | trim >0.60 s → 0.30 s | lectures, demos, tutorials |
| `meeting` | 1.00× | keep | trim >0.80 s → 0.50 s | trim >1.20 s → 0.80 s | interviews, panels, calls |
| `vlog` | 1.05× | cut | trim >0.25 s → 0.12 s | trim >0.50 s → 0.25 s | punchy solo-to-camera videos |
Every preset knob is also exposed as a flag — override any single value without abandoning the preset.
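One way to picture "preset plus per-flag overrides" — the field names here are illustrative, not autocut's real config object:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Preset:
    speed: float
    cut_fillers: bool
    pause_above: float   # trim pauses longer than this (seconds)
    pause_keep: float    # ...down to this residual length (seconds)

# Values taken from the preset table above
PRESETS = {
    "talk":    Preset(1.00, True,  0.35, 0.20),
    "meeting": Preset(1.00, False, 0.80, 0.50),
    "vlog":    Preset(1.05, True,  0.25, 0.12),
}

# `--preset talk --speed 1.25` keeps every other talk default:
cfg = replace(PRESETS["talk"], speed=1.25)
print(cfg)  # → Preset(speed=1.25, cut_fillers=True, pause_above=0.35, pause_keep=0.2)
```

`dataclasses.replace` returns a new object, so the base preset itself is never mutated — which is exactly the semantics a CLI override should have.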
| Stage | Parameter |
|---|---|
| High-pass | 80 Hz |
| Compressor | thresh −22 dB · ratio 2.5 · atk 10 / rel 200 ms · makeup 2 |
| De-ess | 6500 Hz · −2.5 dB · Q=3 |
| Loudness norm. | EBU R128 · I = −16 LUFS · TP = −1.5 dBTP · LRA = 11 |
| Per-segment | 40 ms triangular fades (no clicks on cuts) |
Values come from iterating on real bilingual screen recordings — not from a textbook. Tweak freely via flags.
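The chain maps almost one-to-one onto ffmpeg audio filters strung into a single filtergraph. A sketch of assembling that string — note the `deesser` entry is left at defaults because ffmpeg's deesser takes normalized 0-1 knobs rather than Hz/dB/Q values, and exact option spellings should be checked against `ffmpeg -h filter=acompressor` etc. before relying on them:

```python
def audio_chain(i_lufs: float = -16.0, tp: float = -1.5, lra: float = 11.0) -> str:
    """Build an audio filtergraph string for ffmpeg's -af option (a sketch)."""
    return ",".join([
        "highpass=f=80",                       # kill rumble below 80 Hz
        # linear threshold 0.079 ≈ −22 dB; attack/release in ms
        "acompressor=threshold=0.079:ratio=2.5:attack=10:release=200:makeup=2",
        "deesser",                             # defaults; real knobs are 0-1 ratios
        f"loudnorm=I={i_lufs}:TP={tp}:LRA={lra}",  # EBU R128 normalization
    ])

print(audio_chain())
```

The per-segment 40 ms fades live in the render step's `filter_complex`, not in this linear chain, since they apply per cut rather than to the whole track.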
The LLM pass flags semantic cuts — restated sentences, aborted restarts, redundant examples — that regex can't catch. Config precedence:
- CLI flags: `--llm-base-url`, `--llm-api-key`, `--llm-model`, `--llm-max-tokens`, `--llm-temperature`
- Env vars: `AUTOCUT_LLM_BASE_URL`, `AUTOCUT_LLM_API_KEY`, `AUTOCUT_LLM_MODEL`, `AUTOCUT_LLM_MAX_TOKENS`, `AUTOCUT_LLM_TEMPERATURE`
- Config file: `~/.config/autocut/config.json` (see `config.example.json`)
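The precedence rule boils down to "first source with a value wins". A sketch of the resolution, with an illustrative function name and file layout:

```python
import json
import os
from pathlib import Path

def resolve(name: str, cli_value, config_path: Path):
    """CLI flag beats env var beats config file; None means 'not set'."""
    if cli_value is not None:
        return cli_value
    env = os.environ.get(f"AUTOCUT_LLM_{name.upper()}")
    if env is not None:
        return env
    if config_path.exists():
        return json.loads(config_path.read_text()).get(name)
    return None

os.environ["AUTOCUT_LLM_MODEL"] = "llama3"
print(resolve("model", None, Path("/nonexistent")))       # → llama3 (env var)
print(resolve("model", "from-cli", Path("/nonexistent"))) # → from-cli (flag wins)
```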
Works with anything that speaks `/v1/chat/completions`: OpenAI · Azure OpenAI · Ollama · LM Studio · together.ai · Groq · your own gateway.

Default: `gpt-5.4-mini`, 16 384 output tokens, temperature 0.1.
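Any of these endpoints can be driven with nothing but the standard library. A minimal sketch — not autocut's actual `llm.py`, just the shape of the call:

```python
import json
import urllib.request

def build_request(base_url: str, api_key: str, model: str, prompt: str,
                  max_tokens: int = 16384, temperature: float = 0.1):
    """Prepare a /v1/chat/completions POST for any OpenAI-compatible server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

def chat(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(build_request(base_url, api_key, model, prompt)) as r:
        return json.load(r)["choices"][0]["message"]["content"]
```

Point `base_url` at e.g. `http://localhost:11434` for a local Ollama server and the same code works unchanged.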
```
autocut [INPUT] [-o OUTPUT] [--root DIR]
        [--preset talk|meeting|vlog]
        [--speed FLOAT] [--fade-ms INT]
        [--protect START:END ...] [--no-fillers] [--language en|de]
        [--pause-above SEC] [--pause-keep SEC]
        [--silence-db DB] [--silence-db-fine DB]
        [--asr-backend fluidaudio|nemo|whisper-cpp]
        [--whisper-model PATH] [--nemo-model HF_ID] [--transcript JSON]
        [--llm] [--llm-base-url URL] [--llm-api-key KEY]
        [--llm-model NAME] [--llm-max-tokens INT] [--llm-temperature FLOAT]
        [--crf INT] [--preset-enc NAME]
        [--loudnorm-i LUFS] [--loudnorm-tp DB]
        [--compressor-ratio FLOAT] [--deess-db DB] [--highpass-hz HZ]
        [--workdir DIR] [--keep-intermediates] [--dry-run]
```

Run `autocut --help` for the full list with descriptions.
- macOS (recommended) — FluidAudio Parakeet-TDT runs on CoreML at ~150× realtime. First run builds the CLI (~1 min) and downloads the model (~600 MB).
- Windows / Linux — `pip install 'autocut[nemo]'`, then `--asr-backend nemo`. Parakeet-TDT v3 via NeMo with chunked (30 s) inference; uses CUDA if available, falls back to CPU. First run downloads the model (~2.5 GB).
- Minimal-deps fallback (any OS) — `--asr-backend whisper-cpp --whisper-model /path/to/ggml-large-v3.bin`.
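A backend dispatcher for these three options might look like this — a sketch under assumed names and binary lookups, not a copy of `transcribe.py`:

```python
import platform
import shutil
from typing import Optional

def choose_backend(explicit: Optional[str] = None) -> str:
    """Resolve the ASR backend: --asr-backend wins, then platform defaults."""
    if explicit:
        return explicit
    if platform.system() == "Darwin":
        return "fluidaudio"            # CoreML fast path on macOS
    try:
        import nemo  # noqa: F401      # present if autocut[nemo] is installed
        return "nemo"
    except ImportError:
        pass
    # whisper.cpp binary name varies by install; both spellings are guesses here
    if shutil.which("whisper-cli") or shutil.which("whisper-cpp"):
        return "whisper-cpp"
    raise RuntimeError("no ASR backend available; see the backend table above")

print(choose_backend("nemo"))  # → nemo (explicit flag always wins)
```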
```
src/autocut/
├── cli.py             # argparse entry point, presets, orchestration
├── audio.py           # wav extraction / probe helpers
├── deps.py            # external tool detection + FluidAudio bootstrap
├── transcribe.py      # ASR backend dispatcher
├── nemo_backend.py    # cross-platform Parakeet via NeMo (optional extra)
├── silence.py         # silencedetect parsing + snap-to-boundary
├── plan.py            # cut plan: fillers, pauses, redundancies
├── render.py          # ffmpeg filter_complex render
└── llm.py             # OpenAI-compatible redundancy detection (stdlib urllib)
```
Every module is under 200 lines. Read it. Fork it. Make it yours.
Issues and PRs welcome. This started as a tool to shorten a 12-minute negotiation demo; it generalised from there. If you hit an edge case — unusual language, weird microphone, exotic codec — open an issue with the source clip (or a redacted transcript) and I'll take a look.
MIT — see LICENSE.
- FluidAudio — CoreML Swift ASR runner
- NVIDIA Parakeet-TDT — the excellent multilingual ASR model
- NeMo toolkit — cross-platform inference
- whisper.cpp — portable fallback
- FFmpeg — without which none of this would exist