autocut


Cut the waffle. Keep the signal. Drop a talk, demo or screen recording in — get a tighter, louder-sounding edit out. One command.

autocut transcribes your video with word-level timestamps, kills silence, fillers and restated sentences, optionally lets an LLM flag redundant passages, and renders a clean result with a broadcast-standard audio chain (EBU R128, −16 LUFS). No Premiere. No DaVinci. No manual scrubbing.

autocut talk.mp4 --speed 1.25 --protect 303:415 --llm
# → output/talk/talk_cut.mp4 · 12:14 → 6:27 · clean audio · usable immediately

Why autocut

Typical screen recordings are 30–50 % pauses, "ums", and restatements. Manual editing is tedious; AI "magic editors" are black boxes.

autocut is the opposite:

  • Deterministic — every cut is in cut_plan.json, you can inspect it.
  • Honest defaults — the talk preset is tuned on real lecture recordings; nothing touches your video until you like the plan (--dry-run).
  • Local-first — Parakeet ASR runs on your machine (CoreML on macOS, CUDA/CPU elsewhere). An LLM pass is optional and works with any OpenAI-compatible endpoint, including Ollama, LM Studio, or your own gateway.
  • Zero lock-in — MIT, zero Python runtime deps. pip install -e . and go.

Use cases

| Workflow | What autocut does for you |
| --- | --- |
| Conference talks / lectures | Trims dead air, fillers, restarts → 20–40 % shorter, more watchable. |
| Internal demos & screencasts | Speeds up 1.25×, normalizes loudness across recordings. |
| Podcast raw cuts | Builds a first editable cut from raw takes — editors start from 80 %, not 0. |
| Customer calls → shareable clips | Protect a key range (--protect 303:415), tighten everything else. |
| Course creators | Reproducible pipeline — same settings → same style across dozens of videos. |

Install

git clone https://github.com/trsdn/autocut.git
cd autocut
pip install -e .       # or: uv pip install -e .

That's it on macOS once ffmpeg is installed (brew install ffmpeg). The FluidAudio Parakeet CLI auto-builds on first run.

External dependencies

| Tool | Purpose | Install |
| --- | --- | --- |
| ffmpeg / ffprobe | required — audio/video pipeline | brew install ffmpeg · apt install ffmpeg |
| FluidAudio CLI | ASR, macOS (CoreML), ~150× realtime | auto-built into ~/.cache/autocut/ on first run |
| NeMo | Parakeet ASR, Windows / Linux / macOS | pip install 'autocut[nemo]' |
| whisper.cpp | ASR, portable minimal-deps fallback | brew install whisper-cpp |

Parakeet-TDT v3 is the same multilingual model across all three backends.


Quick start

Drop a video into input/ and run:

autocut                       # picks newest video from ./input/
autocut my_video.mp4          # bare filename resolves against ./input/
autocut /abs/path/clip.mov    # absolute paths work too

Output lands under output/<stem>/<stem>_cut.mp4. Intermediates are cached in output/<stem>/.work/ so repeat runs (different settings, same source) skip re-transcription:

output/talk/
├── talk_cut.mp4
└── .work/
    ├── audio_16k.wav
    ├── transcript.json      ← word-level timings
    └── cut_plan.json        ← every kept/removed range, inspectable
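Because the plan is plain JSON, it can be audited or post-processed with a few lines of Python. A minimal sketch, assuming a keep/remove schema with start/end times in seconds (check your own cut_plan.json for the actual field names):

```python
import json

# Stand-in for output/<stem>/.work/cut_plan.json. The "keep"/"remove"
# field names and start/end-in-seconds layout are assumptions here;
# inspect a real cut_plan.json for the exact schema.
plan = json.loads("""
{
  "keep":   [{"start": 0.0, "end": 12.4}, {"start": 15.1, "end": 30.0}],
  "remove": [{"start": 12.4, "end": 15.1, "reason": "silence"}]
}
""")

kept_seconds = sum(r["end"] - r["start"] for r in plan["keep"])
cut_seconds = sum(r["end"] - r["start"] for r in plan["remove"])
print(f"kept {kept_seconds:.1f}s, removed {cut_seconds:.1f}s ({len(plan['remove'])} cuts)")
```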

Common recipes

# Preserve a negotiation block exactly, speed up the rest
autocut talk.mp4 --speed 1.25 --protect 303:415

# Preview the plan without rendering
autocut talk.mp4 --dry-run

# Use an LLM to flag restated sentences (optional, any OpenAI-compatible API)
export AUTOCUT_LLM_API_KEY=sk-...
autocut talk.mp4 --llm

# Meeting preset: keep turn-taking pauses, don't touch fillers
autocut kickoff.mp4 --preset meeting

# Cross-platform (Windows / Linux)
pip install 'autocut[nemo]'
autocut talk.mp4 --asr-backend nemo

How it works

 source.mp4
     │
     ▼
 ┌────────────────────┐      ┌──────────────────────────┐
 │ 1. extract 16k wav │ ───▶ │ 2. ASR (word timestamps) │
 └────────────────────┘      │  FluidAudio / NeMo /      │
                             │  whisper.cpp              │
                             └────────────┬──────────────┘
                                          │
              ┌───────────────────────────┼───────────────────────────┐
              ▼                           ▼                           ▼
     ┌────────────────┐         ┌──────────────────┐         ┌────────────────┐
     │ 3. silences    │         │ 4. filler regex  │         │ 5. LLM pass    │
     │ −35 / −40 dB   │         │ um / äh / ähm    │         │ (optional)     │
     └───────┬────────┘         └─────────┬────────┘         └───────┬────────┘
             └──────────────┬─────────────┴──────────────────────────┘
                            ▼
                 ┌──────────────────────┐
                 │ 6. cut_plan.json     │  ← every cut is auditable
                 │    keep + remove     │
                 └──────────┬───────────┘
                            ▼
                 ┌──────────────────────────┐
                 │ 7. ffmpeg render         │
                 │    split/trim/atrim ·    │
                 │    40 ms fades · speed · │
                 │    highpass · comp ·     │
                 │    de-ess · loudnorm     │
                 └──────────┬───────────────┘
                            ▼
                      output.mp4

Every step is a small, independent module. Swap the ASR, tweak the audio chain, bring your own cut plan.
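For a feel of what step 3 consumes, here is a sketch of parsing ffmpeg's silencedetect log with nothing but the stdlib. The log lines below are representative ffmpeg stderr output; autocut's own parser lives in silence.py and may differ in detail:

```python
import re

# Example stderr from: ffmpeg -i audio.wav -af silencedetect=n=-35dB:d=0.35 -f null -
log = """
[silencedetect @ 0x7f] silence_start: 3.214
[silencedetect @ 0x7f] silence_end: 4.102 | silence_duration: 0.888
[silencedetect @ 0x7f] silence_start: 9.870
[silencedetect @ 0x7f] silence_end: 11.020 | silence_duration: 1.150
"""

# Pair up start/end timestamps into (start, end) silence ranges.
starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", log)]
ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", log)]
silences = list(zip(starts, ends))
print(silences)  # [(3.214, 4.102), (9.87, 11.02)]
```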


Presets

| Preset | Speed | Fillers | Narrator pauses | Protected pauses | Good for |
| --- | --- | --- | --- | --- | --- |
| talk | 1.00× | cut | trim >0.35 s → 0.20 s | trim >0.60 s → 0.30 s | lectures, demos, tutorials |
| meeting | 1.00× | keep | trim >0.80 s → 0.50 s | trim >1.20 s → 0.80 s | interviews, panels, calls |
| vlog | 1.05× | cut | trim >0.25 s → 0.12 s | trim >0.50 s → 0.25 s | punchy solo-to-camera videos |

Every preset knob is also exposed as a flag — override any single value without abandoning the preset.

Audio chain (talk preset)

| Stage | Parameters |
| --- | --- |
| High-pass | 80 Hz |
| Compressor | thresh −22 dB · ratio 2.5 · atk 10 / rel 200 ms · makeup 2 |
| De-ess | 6500 Hz · −2.5 dB · Q = 3 |
| Loudness norm. | EBU R128 · I = −16 LUFS · TP = −1.5 dBTP · LRA = 11 |
| Fades | per-segment 40 ms triangular fades (no clicks on cuts) |

Values come from iterating on real bilingual screen recordings — not from a textbook. Tweak freely via flags.
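Strung together as an ffmpeg -af argument, the chain above looks roughly like this. A sketch only: render.py builds the real filter_complex, and the deesser mapping in particular is an assumption, since ffmpeg's deesser filter exposes intensity-style options rather than the Hz/dB values in the table:

```python
# Talk-preset audio chain as a single ffmpeg -af string. Values mirror
# the table above; verify option spellings with `ffmpeg -h filter=NAME`.
stages = [
    "highpass=f=80",
    "acompressor=threshold=-22dB:ratio=2.5:attack=10:release=200:makeup=2",
    "deesser",  # assumed mapping -- the 6500 Hz / -2.5 dB targets don't
                # translate 1:1 to ffmpeg's intensity-based deesser options
    "loudnorm=I=-16:TP=-1.5:LRA=11",
]
audio_filter = ",".join(stages)
print(audio_filter)
```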


LLM config (optional)

The LLM pass flags semantic cuts — restated sentences, aborted restarts, redundant examples — that regex can't catch. Config precedence:

  1. CLI flags: --llm-base-url, --llm-api-key, --llm-model, --llm-max-tokens, --llm-temperature
  2. Env vars: AUTOCUT_LLM_BASE_URL, AUTOCUT_LLM_API_KEY, AUTOCUT_LLM_MODEL, AUTOCUT_LLM_MAX_TOKENS, AUTOCUT_LLM_TEMPERATURE
  3. Config file: ~/.config/autocut/config.json (see config.example.json)

Works with anything that speaks /v1/chat/completions: OpenAI · Azure OpenAI · Ollama · LM Studio · together.ai · Groq · your own gateway.

Default: gpt-5.4-mini, 16 384 output tokens, temperature 0.1.
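That precedence can be sketched as a simple fallback chain. Illustrative only: the function and config-key names here are assumptions, and cli.py holds the authoritative resolution logic:

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "autocut" / "config.json"

def resolve(cli_value, env_var, config_key, default=None):
    """CLI flag beats env var beats config file beats built-in default."""
    if cli_value is not None:
        return cli_value
    if env_var in os.environ:
        return os.environ[env_var]
    if CONFIG_PATH.exists():
        cfg = json.loads(CONFIG_PATH.read_text())
        if config_key in cfg:
            return cfg[config_key]
    return default

# No --llm-model flag given, so fall through env var / config file / default:
model = resolve(None, "AUTOCUT_LLM_MODEL", "llm_model", default="gpt-5.4-mini")
```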


CLI reference

autocut [INPUT] [-o OUTPUT] [--root DIR]
        [--preset talk|meeting|vlog]
        [--speed FLOAT] [--fade-ms INT]
        [--protect START:END ...] [--no-fillers] [--language en|de]
        [--pause-above SEC] [--pause-keep SEC]
        [--silence-db DB] [--silence-db-fine DB]
        [--asr-backend fluidaudio|nemo|whisper-cpp]
        [--whisper-model PATH] [--nemo-model HF_ID] [--transcript JSON]
        [--llm] [--llm-base-url URL] [--llm-api-key KEY]
        [--llm-model NAME] [--llm-max-tokens INT] [--llm-temperature FLOAT]
        [--crf INT] [--preset-enc NAME]
        [--loudnorm-i LUFS] [--loudnorm-tp DB]
        [--compressor-ratio FLOAT] [--deess-db DB] [--highpass-hz HZ]
        [--workdir DIR] [--keep-intermediates] [--dry-run]

Run autocut --help for the full list with descriptions.


Platform notes

  • macOS (recommended) — FluidAudio Parakeet-TDT runs on CoreML at ~150× realtime. First run builds the CLI (~1 min) and downloads the model (~600 MB).
  • Windows / Linux — pip install 'autocut[nemo]', then --asr-backend nemo. Parakeet-TDT v3 via NeMo with chunked (30 s) inference; uses CUDA if available, falls back to CPU. First run downloads the model (~2.5 GB).
  • Minimal-deps fallback (any OS) — --asr-backend whisper-cpp --whisper-model /path/to/ggml-large-v3.bin.

Project layout

src/autocut/
├── cli.py           # argparse entry point, presets, orchestration
├── audio.py         # wav extraction / probe helpers
├── deps.py          # external tool detection + FluidAudio bootstrap
├── transcribe.py    # ASR backend dispatcher
├── nemo_backend.py  # cross-platform Parakeet via NeMo (optional extra)
├── silence.py       # silencedetect parsing + snap-to-boundary
├── plan.py          # cut plan: fillers, pauses, redundancies
├── render.py        # ffmpeg filter_complex render
└── llm.py           # OpenAI-compatible redundancy detection (stdlib urllib)

Every module is under 200 lines. Read it. Fork it. Make it yours.

Contributing

Issues and PRs welcome. This started as a tool to shorten a 12-minute negotiation demo; it generalised from there. If you hit an edge case — unusual language, weird microphone, exotic codec — open an issue with the source clip (or a redacted transcript) and I'll take a look.

License

MIT — see LICENSE.
