A desktop app for audio recording, transcription, and annotation. Record voice, capture screenshots and clipboard content simultaneously, get word-level transcripts with timestamps, and annotate them with rich metadata — all processed locally, no cloud APIs required.
Built with Tauri 2 (Rust backend) + Svelte 5 (TypeScript frontend).
- Select any audio input device with real-time amplitude visualization
- Automatically capture screenshots from a watched folder during recording
- Automatically capture clipboard images every 1.5 seconds during recording
- All captures are timestamped and associated with the recording session
- Offline transcription via Whisper (large-v3-turbo) with Metal GPU acceleration on macOS
- Word-level timestamps using Dynamic Time Warping (DTW)
- Voice Activity Detection (VAD) for improved accuracy on silent segments
- Model downloaded automatically on first use (~874 MB, stored locally)
- Annotate any word range in the transcript with:
- Footnotes
- Hyperlinks
- Code snippets (with language tag)
- Images
- Audio clips
- Conflict detection prevents overlapping annotations
- Soft delete with undo (3-second toast window)
- Sessions persist across app restarts; last opened session is restored automatically
- Rename and delete sessions
- Export any session as a structured directory:
session.json,transcript.json,annotations.json, plus all asset files
| Layer | Technology |
|---|---|
| Desktop framework | Tauri 2 |
| Frontend | Svelte 5 + SvelteKit 2 |
| Language | TypeScript (frontend), Rust (backend) |
| Transcription | whisper-rs with Metal acceleration |
| Audio I/O | cpal + hound |
| Audio resampling | rubato |
| Database | SQLite via sqlx + tauri-plugin-sql |
| File watching | notify |
| Build tool | Vite 8 |
- Node.js 20+
- Rust 1.77+
- Tauri CLI prerequisites for your OS
- macOS: Xcode Command Line Tools
- Linux:
libwebkit2gtk,libgtk-3,libappindicator3, etc. - Windows: Microsoft Visual C++ Build Tools
# Install frontend dependencies
npm install
# Start dev mode (launches Vite dev server + Tauri window)
npm run tauri devThe first build will compile all Rust dependencies, which takes a few minutes. Subsequent builds are incremental.
# Build frontend
npm run build
# Bundle desktop app (outputs to src-tauri/target/release/bundle/)
npm run tauri buildOutput formats by platform:
- macOS:
.dmgand.app - Windows:
.msiand.exe - Linux:
.AppImageand.deb
ez/
├── src/ # Svelte frontend
│ ├── lib/
│ │ ├── components/ # UI components (AudioPlayer, TranscriptView, etc.)
│ │ ├── db.ts # SQLite schema and initialization
│ │ ├── audio.ts # Recording logic + device enumeration
│ │ ├── transcript.ts # Transcript data access
│ │ ├── annotations.ts # Annotation CRUD
│ │ ├── sessions.ts # Session management
│ │ ├── captures.ts # Clipboard + watcher capture handling
│ │ └── export.ts # Session export
│ └── routes/
│ ├── +page.svelte # Home (session list + recording controls)
│ └── session/[id]/ # Session detail view (transcript + annotations)
├── src-tauri/
│ ├── src/
│ │ ├── lib.rs # Tauri app setup, command registration, URI schemes
│ │ ├── audio.rs # Audio recording, device management
│ │ ├── transcribe.rs # Whisper inference
│ │ ├── model.rs # Model download and validation
│ │ ├── sessions.rs # Session backend commands
│ │ ├── annotations.rs # Annotation backend commands
│ │ ├── watcher.rs # File system watcher
│ │ ├── resample.rs # Audio resampling
│ │ └── export.rs # Export command
│ ├── capabilities/
│ │ └── default.json # Tauri permission grants
│ ├── Cargo.toml
│ └── tauri.conf.json
├── package.json
└── svelte.config.js
All app data is stored locally in the platform-specific app data directory:
| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/com.ez.app/ |
| Windows | %APPDATA%\ez\ |
| Linux | ~/.config/ez/ |
Contents:
ez.db— SQLite database (sessions, transcripts, words, annotations)recordings/— WAV audio filescaptures/— Clipboard/screenshot capturesmodels/— Downloaded Whisper and VAD model weights
| Table | Purpose |
|---|---|
sessions |
Recording sessions with metadata (duration, device, sample rate) |
transcripts |
One per session; stores aggregated word count and detected language |
words |
Individual words with start_ms / end_ms timestamps |
annotations |
Rich annotations tied to word index ranges; supports soft delete |
pending_captures |
Staging area for clipboard/watcher assets before user annotation |
settings |
Key-value store (watch folder path, last open session) |
- Audio and asset files are served via custom URI schemes (
audio://,asset://) with path canonicalization to prevent directory traversal - No data is sent to external services; transcription and all processing runs locally
- Model downloads use HuggingFace CDN with minimum-size validation to guard against truncated files
On first launch, macOS will request:
- Microphone — required for audio recording
- Files and Folders — required for watch folder access and asset storage
These are granted via standard macOS permission dialogs and can be revoked in System Settings > Privacy & Security.