th1lo/ez

EZ

A desktop app for audio recording, transcription, and annotation. Record voice while capturing screenshots and clipboard content, get word-level transcripts with timestamps, and annotate them with rich metadata. Everything is processed locally; no cloud APIs are required.

Built with Tauri 2 (Rust backend) + Svelte 5 (TypeScript frontend).


Features

Recording & Capture

  • Select any audio input device with real-time amplitude visualization
  • Automatically capture screenshots from a watched folder during recording
  • Automatically capture clipboard images every 1.5 seconds during recording
  • All captures are timestamped and associated with the recording session
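
One way to associate captures with a recording session is to store each capture with its offset from the start of the recording. The sketch below is illustrative only: the `CaptureLog` shape, the `recordCapture` helper, and the choice to skip a clipboard image that has not changed since the previous poll are assumptions, not taken from the EZ source. Only the 1.5-second interval comes from the list above.

```typescript
// Polling interval stated in the feature list.
const POLL_INTERVAL_MS = 1500;

// Hypothetical shapes, for illustration only.
interface Capture {
  sessionId: string;
  offsetMs: number;     // time since the recording started
  contentHash: string;  // fingerprint of the captured image
}

interface CaptureLog {
  sessionId: string;
  startedAtMs: number;
  lastHash: string | null;
  captures: Capture[];
}

function createCaptureLog(sessionId: string, startedAtMs: number): CaptureLog {
  return { sessionId, startedAtMs, lastHash: null, captures: [] };
}

// Records a capture unless it is identical to the previous one
// (assumption: an unchanged clipboard should not be stored on every poll).
function recordCapture(log: CaptureLog, contentHash: string, nowMs: number): Capture | null {
  if (contentHash === log.lastHash) return null;
  log.lastHash = contentHash;
  const capture: Capture = {
    sessionId: log.sessionId,
    offsetMs: nowMs - log.startedAtMs,
    contentHash,
  };
  log.captures.push(capture);
  return capture;
}
```

A caller would invoke `recordCapture` from a timer that fires every `POLL_INTERVAL_MS` while recording is active; storing offsets rather than wall-clock times lets a capture be lined up directly with word timestamps.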

Transcription

  • Offline transcription via Whisper (large-v3-turbo) with Metal GPU acceleration on macOS
  • Word-level timestamps using Dynamic Time Warping (DTW)
  • Voice Activity Detection (VAD) to improve accuracy on recordings that contain silent segments
  • Model downloaded automatically on first use (~874 MB, stored locally)
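
Word-level timestamps make it cheap to map a playback position back to a word, for example to highlight the word being spoken. A minimal sketch, assuming words are sorted by start time and do not overlap; the `Word` shape and the `findWordAt` helper are hypothetical names, not EZ's API.

```typescript
// Hypothetical word record; times are in milliseconds from the start of the audio.
interface Word {
  text: string;
  startMs: number;
  endMs: number; // exclusive
}

// Binary search for the word whose [startMs, endMs) interval contains positionMs.
// Returns the word's index, or -1 if the position falls in a gap or outside the transcript.
function findWordAt(words: Word[], positionMs: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid];
    if (positionMs < word.startMs) {
      hi = mid - 1;
    } else if (positionMs >= word.endMs) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

The lookup is O(log n), which matters when it runs on every playback tick of a long recording.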

Annotation

  • Annotate any word range in the transcript with:
    • Footnotes
    • Hyperlinks
    • Code snippets (with language tag)
    • Images
    • Audio clips
  • Conflict detection prevents overlapping annotations
  • Soft delete with undo (3-second toast window)
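
Conflict detection on word ranges reduces to an interval-overlap test. The sketch below assumes inclusive word-index ranges and assumes that soft-deleted annotations no longer block new ones; the `Annotation` shape and function names are illustrative, not EZ's actual types.

```typescript
// Hypothetical annotation record covering words startWord..endWord (inclusive).
interface Annotation {
  id: number;
  startWord: number;
  endWord: number;
  deletedAt: number | null; // null while the annotation is active
}

// Two inclusive ranges overlap when each one starts at or before the other ends.
function rangesOverlap(aStart: number, aEnd: number, bStart: number, bEnd: number): boolean {
  return aStart <= bEnd && bStart <= aEnd;
}

// Returns the first active annotation that overlaps the proposed range, if any.
function findConflict(
  existing: Annotation[],
  startWord: number,
  endWord: number,
): Annotation | undefined {
  return existing.find(
    (a) => a.deletedAt === null && rangesOverlap(a.startWord, a.endWord, startWord, endWord),
  );
}
```

Running `findConflict` before an insert lets the UI reject the new annotation, or point at the one already occupying the range.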

Session Management

  • Sessions persist across app restarts; last opened session is restored automatically
  • Rename and delete sessions
  • Export any session as a structured directory: session.json, transcript.json, annotations.json, plus all asset files

Tech Stack

Layer             | Technology
------------------|--------------------------------------
Desktop framework | Tauri 2
Frontend          | Svelte 5 + SvelteKit 2
Language          | TypeScript (frontend), Rust (backend)
Transcription     | whisper-rs with Metal acceleration
Audio I/O         | cpal + hound
Audio resampling  | rubato
Database          | SQLite via sqlx + tauri-plugin-sql
File watching     | notify
Build tool        | Vite 8

Prerequisites

  • Node.js 20+
  • Rust 1.77+
  • Tauri CLI prerequisites for your OS
    • macOS: Xcode Command Line Tools
    • Linux: libwebkit2gtk, libgtk-3, libappindicator3, etc.
    • Windows: Microsoft Visual C++ Build Tools

Development

# Install frontend dependencies
npm install

# Start dev mode (launches Vite dev server + Tauri window)
npm run tauri dev

The first build will compile all Rust dependencies, which takes a few minutes. Subsequent builds are incremental.


Building

# Build frontend
npm run build

# Bundle desktop app (outputs to src-tauri/target/release/bundle/)
npm run tauri build

Output formats by platform:

  • macOS: .dmg and .app
  • Windows: .msi and .exe
  • Linux: .AppImage and .deb

Project Structure

ez/
├── src/                    # Svelte frontend
│   ├── lib/
│   │   ├── components/     # UI components (AudioPlayer, TranscriptView, etc.)
│   │   ├── db.ts           # SQLite schema and initialization
│   │   ├── audio.ts        # Recording logic + device enumeration
│   │   ├── transcript.ts   # Transcript data access
│   │   ├── annotations.ts  # Annotation CRUD
│   │   ├── sessions.ts     # Session management
│   │   ├── captures.ts     # Clipboard + watcher capture handling
│   │   └── export.ts       # Session export
│   └── routes/
│       ├── +page.svelte    # Home (session list + recording controls)
│       └── session/[id]/   # Session detail view (transcript + annotations)
├── src-tauri/
│   ├── src/
│   │   ├── lib.rs          # Tauri app setup, command registration, URI schemes
│   │   ├── audio.rs        # Audio recording, device management
│   │   ├── transcribe.rs   # Whisper inference
│   │   ├── model.rs        # Model download and validation
│   │   ├── sessions.rs     # Session backend commands
│   │   ├── annotations.rs  # Annotation backend commands
│   │   ├── watcher.rs      # File system watcher
│   │   ├── resample.rs     # Audio resampling
│   │   └── export.rs       # Export command
│   ├── capabilities/
│   │   └── default.json    # Tauri permission grants
│   ├── Cargo.toml
│   └── tauri.conf.json
├── package.json
└── svelte.config.js

Data Storage

All app data is stored locally in the platform-specific app data directory:

Platform | Path
---------|---------------------------------------------
macOS    | ~/Library/Application Support/com.ez.app/
Windows  | %APPDATA%\ez\
Linux    | ~/.config/ez/

Contents:

  • ez.db — SQLite database (sessions, transcripts, words, annotations)
  • recordings/ — WAV audio files
  • captures/ — Clipboard/screenshot captures
  • models/ — Downloaded Whisper and VAD model weights

Database Schema

Table            | Purpose
-----------------|------------------------------------------------------------------
sessions         | Recording sessions with metadata (duration, device, sample rate)
transcripts      | One per session; stores aggregated word count and detected language
words            | Individual words with start_ms / end_ms timestamps
annotations      | Rich annotations tied to word index ranges; supports soft delete
pending_captures | Staging area for clipboard/watcher assets before user annotation
settings         | Key-value store (watch folder path, last open session)
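
Soft delete on the annotations table pairs naturally with the 3-second undo window listed under Features: a delete only stamps the row, and an undo clears the stamp if it arrives in time. The following is a sketch under assumptions; the `deletedAtMs` field and the helper names are hypothetical and may not match the real schema.

```typescript
// Undo window from the feature list (3-second toast).
const UNDO_WINDOW_MS = 3000;

// Hypothetical row shape; only the fields needed for soft delete are shown.
interface AnnotationRow {
  id: number;
  deletedAtMs: number | null; // null means the row is live
}

// Marks the row as deleted without removing it.
function softDelete(row: AnnotationRow, nowMs: number): AnnotationRow {
  return { ...row, deletedAtMs: nowMs };
}

// Restores the row if the undo arrives within the window; otherwise returns null.
function undoDelete(row: AnnotationRow, nowMs: number): AnnotationRow | null {
  if (row.deletedAtMs === null) return null; // nothing to undo
  if (nowMs - row.deletedAtMs > UNDO_WINDOW_MS) return null; // window expired
  return { ...row, deletedAtMs: null };
}

// Queries for display would filter out soft-deleted rows.
function visibleRows(rows: AnnotationRow[]): AnnotationRow[] {
  return rows.filter((r) => r.deletedAtMs === null);
}
```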

Security

  • Audio and asset files are served via custom URI schemes (audio://, asset://) with path canonicalization to prevent directory traversal
  • No data is sent to external services; transcription and all processing runs locally
  • Model downloads use the Hugging Face CDN, with minimum-size validation to guard against truncated files
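
To illustrate the directory-traversal guard, the sketch below resolves a requested path against a root directory and rejects anything that would climb above it. It is a purely lexical check written in TypeScript for illustration; the function name is hypothetical, and the actual backend is Rust, where `std::fs::canonicalize` would additionally resolve symlinks. Any URL decoding is assumed to happen before this function is called.

```typescript
// Resolves `requested` inside `root`, or returns null if the path would escape the root.
// Lexical only: it does not touch the file system and does not resolve symlinks.
function resolveInsideRoot(root: string, requested: string): string | null {
  const parts: string[] = [];
  for (const segment of requested.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (parts.length === 0) return null; // would climb above the root
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  if (parts.length === 0) return null; // request names no file
  return root.replace(/\/+$/, "") + "/" + parts.join("/");
}
```

A URI scheme handler would serve the file only when the result is non-null, and answer with an error otherwise.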

macOS Permissions

On first launch, macOS will request:

  • Microphone — required for audio recording
  • Files and Folders — required for watch folder access and asset storage

These are granted via standard macOS permission dialogs and can be revoked in System Settings > Privacy & Security.
