Ozen

A Python-based acoustic analysis and annotation tool inspired by Praat, built for rapid waveform annotation with extended acoustic measurements.

Authors

  • Uriel Cohen Priva (@ucpresearch) - Design, testing, and vibe-coding
  • Claude (Anthropic) - Implementation

Features

  • Waveform and Spectrogram Display - Synchronized views with zoom/pan
  • Acoustic Overlays - Pitch, formants, intensity, center of gravity, HNR
  • Audio Playback - Play selections, visual cursor tracking
  • Annotation System - Multiple tiers, Praat TextGrid import/export
  • Data Collection Points - Click to mark positions and capture acoustic measurements
  • Undo Support - Ctrl+Z for boundary and label changes

Installation

macOS / Linux

# Clone the repository
git clone https://github.com/ucpresearch/ozen.git
cd ozen

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Windows

# Clone the repository
git clone https://github.com/ucpresearch/ozen.git
cd ozen

# Create and activate virtual environment
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Note: On Windows, sounddevice includes PortAudio automatically. On macOS/Linux, you may need to install it separately:

  • macOS: brew install portaudio
  • Ubuntu/Debian: sudo apt install portaudio19-dev

Updating

cd ozen
git pull
pip install -r requirements.txt  # if dependencies changed

Activate the virtual environment first: source .venv/bin/activate on macOS/Linux, or .venv\Scripts\activate on Windows.

Usage

Basic Usage

# Open the application
python -m ozen

# Open with an audio file
python -m ozen audio.wav

With TextGrid File

# Import existing TextGrid annotations
python -m ozen audio.wav annotations.TextGrid

With Predefined Tier Names

# Create tiers automatically when audio loads
python -m ozen audio.wav -t words,phones

With Custom Config File

# Use a custom configuration file
python -m ozen audio.wav -c myconfig.yaml

Config files can customize colors, formant presets, default tiers, and more. See ozen/config.py for available options.

Keyboard Shortcuts

Playback

| Key | Action |
|---|---|
| Space | Play selection / pause |
| Escape | Stop playback |
| Tab | Play visible window |

Navigation

| Key | Action |
|---|---|
| ↑ (Up arrow) | Zoom in |
| ↓ (Down arrow) | Zoom out |
| ← (Left arrow) | Pan left |
| → (Right arrow) | Pan right |
| Scroll wheel | Zoom in/out (centered on cursor) |
| Horizontal scroll | Pan left/right |

Annotation

| Key | Action |
|---|---|
| Double-click | Add boundary at position |
| Enter | Add boundary at cursor position |
| Delete | Delete hovered boundary (highlighted in orange) |
| Ctrl+Z | Undo (add/delete boundary, text changes) |
| Escape | Deselect interval / close text editor |
| 1-5 | Switch to annotation tier 1-5 |

Annotation Workflow

  1. Select an interval - Click on a tier to select an interval
  2. Edit text - Type to add/edit the interval label
  3. Add boundaries - Double-click or press Enter to split intervals
  4. Delete boundaries - Hover over a boundary (turns orange) and press Delete
  5. Play interval - Click the green play button on selected intervals

Data Collection Points

| Action | Method |
|---|---|
| Add point | Double-click on spectrogram |
| Move point | Click and drag |
| Remove point | Right-click → "Remove" |
| Copy all points | Ctrl+C (copies visible measurements as TSV) |
| Export points | File > Export Point Information... |
| Import points | File > Import Point Information... |

Data points capture acoustic measurements at specific time-frequency positions on the spectrogram. When you press Ctrl+C, all points are copied to the clipboard as tab-separated values, including only the measurements that are currently visible (checked in the overlay toggles). This allows quick data export to spreadsheets.
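
The copied text can also be processed in a script instead of a spreadsheet. The following is a minimal sketch, assuming the first row is a header. The column names (time, frequency, pitch) and the sample values are hypothetical, since the actual columns depend on which overlays are visible when you copy.

```python
import csv
import io

# Hypothetical clipboard contents; real column names and values depend on
# which overlays are visible in Ozen when Ctrl+C is pressed.
clipboard_text = (
    "time\tfrequency\tpitch\n"
    "0.512\t1450.0\t182.3\n"
    "0.874\t2210.5\t175.9\n"
)


def parse_points(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


points = parse_points(clipboard_text)
for point in points:
    print(point)
```

All values are returned as strings; convert them with float() before doing arithmetic.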

File Operations

| Key | Action |
|---|---|
| Ctrl+O | Open audio file |
| Ctrl+S | Save TextGrid (to current path, or prompts if none) |
| Ctrl+Shift+S | Save TextGrid as... |
| Ctrl+C | Copy all data points to clipboard (visible measurements only) |

Save Behavior

  • Ctrl+S saves to the current TextGrid path if one exists (from opening a file or previous save)
  • If no path is set, Ctrl+S prompts for a location (same as Save As)
  • Auto-save: Every 60 seconds, annotations are saved to a .autosave backup file
  • Exit confirmation: If you have unsaved changes, you'll be prompted to save before closing
  • If you start Ozen with a TextGrid path that does not exist yet, you are asked whether to create it

Offline Rendering

Ozen includes a headless spectrogram renderer that generates publication-quality figures without the GUI. It is useful for batch processing, scripting, and paper figures.

python -m ozen.render recording.wav -o fig.png --overlays pitch,formants --legend

Quick Examples

# Windowed view with annotations
python -m ozen.render recording.wav -o fig.pdf \
    --start 0.5 --end 2.0 \
    --overlays pitch,formants,intensity \
    --textgrid recording.TextGrid --tiers words,phones

# Wideband spectrogram with custom colormap
python -m ozen.render recording.wav -o fig.png \
    --bandwidth wideband --colormap inferno \
    --overlays pitch,formants --preset male

# Multiple colored data point sets
python -m ozen.render recording.wav -o fig.png \
    --overlays pitch,formants \
    --points red=midvowels.tsv --points blue=pitch-peaks.tsv

# Points as markers only (no vertical lines), semi-transparent
python -m ozen.render recording.wav -o fig.png \
    --overlays pitch,formants \
    --points "#4488CC"=vowels.tsv --point-markers-only --point-alpha 0.6

# Custom font for publication
python -m ozen.render recording.wav -o fig.pdf \
    --overlays pitch,formants --legend --font "Times New Roman"
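
For batch processing, the same command can be driven from a short Python script. This is a sketch rather than part of Ozen: the helper names build_render_command and render_all and the directory names are hypothetical, and only flags shown in the examples above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_render_command(wav_path, out_dir, overlays=("pitch", "formants")):
    """Build the ozen.render command line for one recording."""
    wav_path = Path(wav_path)
    out_path = Path(out_dir) / (wav_path.stem + ".png")
    return [
        sys.executable, "-m", "ozen.render", str(wav_path),
        "-o", str(out_path),
        "--overlays", ",".join(overlays),
        "--legend",
    ]


def render_all(in_dir, out_dir):
    """Render every .wav file in in_dir; stops with an error if a render fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(Path(in_dir).glob("*.wav")):
        subprocess.run(build_render_command(wav_path, out_dir), check=True)


# Example call (hypothetical directory names):
# render_all("recordings", "figures")
```

Using sys.executable keeps the renderer in the same virtual environment as the script that launches it.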

Available Overlays

| Name | Description | Display Range |
|---|---|---|
| pitch | Fundamental frequency (F0) | pitch floor to ceiling, Hz (log scale) |
| formants | Formant frequencies F1–F4 | direct Hz (red=narrow, pink=wide bandwidth) |
| intensity | Sound pressure level | 30–90 dB |
| cog | Center of gravity (spectral centroid) | direct Hz |
| hnr | Harmonics-to-noise ratio | −10–40 dB |
| spectral_tilt | Low vs high frequency energy | −20–+40 dB |
| a1p0 | A1–P0 nasal ratio | −20–+20 dB |
| nasal_murmur | Low-frequency energy ratio | 0–1 |
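
To make the display ranges concrete, the sketch below shows how a value can be placed within a linear range (as for intensity) and within a logarithmic range (as for pitch). It illustrates the two kinds of scale only and is not Ozen's actual plotting code; the 75 Hz floor and 300 Hz ceiling are illustrative values, not documented defaults.

```python
import math


def linear_position(value, low, high):
    """Fraction of the way from low to high on a linear axis."""
    return (value - low) / (high - low)


def log_position(value, low, high):
    """Fraction of the way from low to high on a logarithmic axis."""
    return math.log(value / low) / math.log(high / low)


# Intensity is shown on a fixed linear range of 30-90 dB.
print(linear_position(60.0, 30.0, 90.0))

# Pitch is shown on a log scale between the pitch floor and ceiling
# (75 Hz and 300 Hz are illustrative values only).
print(log_position(150.0, 75.0, 300.0))
```

On the log scale, each doubling of frequency covers the same vertical distance, so 150 Hz sits halfway between 75 Hz and 300 Hz.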

Data Point Options

| Option | Description |
|---|---|
| --points [COLOR=]FILE | Data point TSV file (repeatable). Colors: names (red), hex (#4488CC, #FF880080 with alpha), grayscale (0.6) |
| --point-markers-only | Draw only circle markers, omit vertical lines |
| --point-alpha FLOAT | Opacity for all points, 0.0–1.0 (default: 1.0) |
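
The [COLOR=]FILE form can be read as an optional color prefix separated from the file path by the first "=". The sketch below shows that reading; split_points_arg is a hypothetical helper, not Ozen's actual argument parser, and it assumes that file paths given without a color contain no "=".

```python
def split_points_arg(arg, default_color=None):
    """Split a '[COLOR=]FILE' argument into (color, path).

    Assumption: the color prefix, if present, ends at the first '='.
    """
    color, separator, path = arg.partition("=")
    if not separator:
        # No '=' found: the whole argument is the file path.
        return default_color, arg
    return color, path


for example in ["red=midvowels.tsv", "#4488CC=vowels.tsv", "0.6=points.tsv", "vowels.tsv"]:
    print(split_points_arg(example))
```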

Figure Options

| Option | Description |
|---|---|
| --font FAMILY | Font family for all text (e.g., Times New Roman, Helvetica) |
| --legend | Show legend with overlay names, colors, and value ranges |
| --title TEXT | Figure title |
| --width, --height | Figure dimensions in inches |
| --dpi NUMBER | DPI for raster output (default: 300) |

Output Formats

PNG, PDF, SVG, and EPS.

Python API

from ozen.render import render_spectrogram

render_spectrogram(
    'recording.wav', 'fig.png',
    overlays=['pitch', 'formants'],
    textgrid_path='recording.TextGrid',
    legend=True,
    font='Helvetica',
    point_markers_only=True,
    point_alpha=0.8,
)

For the full manual with all options: python -m ozen.render --man

Supported Formats

Audio

  • WAV, FLAC, OGG, MP3

Annotations

  • Praat TextGrid (.TextGrid, .txt)

Requirements

  • Python 3.9+
  • PyQt6
  • pyqtgraph
  • praatfan (acoustic analysis - MIT licensed)
  • sounddevice
  • numpy, scipy
  • soundfile

Optional Acoustic Backends

Ozen supports multiple acoustic analysis backends. The default (praatfan) is pure Python and works everywhere. For better performance or compatibility, install additional backends:

| Backend | Install | License | Notes | Repository |
|---|---|---|---|---|
| Praatfan (slow) | Included | MIT | Pure Python, portable | Praatfan |
| Praatfan (fast) | use the release page | MIT | Rust, ~10x faster | Praatfan |
| Praatfan (GPL) | use the release page | GPL | Rust, from praatfan-core-rs | Praatfan GPL |
| Praat (via Parselmouth) | pip install praat-parselmouth | GPL | Original Praat bindings | Praat, Parselmouth Website |

Switch backends in the UI via the Backend dropdown, or set analysis.acoustic_backend in your config file.
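
Before switching, it can help to check which backend packages are importable in the active environment. The sketch below uses only the standard library. The module names are assumptions: parselmouth is the import name installed by praat-parselmouth, while praatfan and praatfan_gpl are taken from the backend names used in this README and may differ from the real import names.

```python
import importlib.util

# Import names are assumptions: "parselmouth" is installed by
# praat-parselmouth; "praatfan" and "praatfan_gpl" follow the names
# used in this README and may not match the actual packages.
CANDIDATE_MODULES = ["praatfan", "praatfan_gpl", "parselmouth"]


def available_backends(module_names=CANDIDATE_MODULES):
    """Map each module name to True if it can be found for import."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


for name, found in available_backends().items():
    print(f"{name}: {'installed' if found else 'not installed'}")
```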

Known Issues

macOS: Audio noise during playback

Setting waveform_line_width to a value greater than 1 in the config causes audio static/noise during playback on macOS. This appears to be a bug in how Qt/pyqtgraph rendering interacts with CoreAudio, not an issue with Ozen itself. The default value of 1 works fine. If you customize colors via a config file, keep this value at 1.

Acknowledgments

Ozen relies on the following projects for acoustic analysis:

praatfan - Clean-room reimplementation of Praat's acoustic algorithms:

https://github.com/ucpresearch/praatfan-core-clean

Praat - The gold standard for phonetic analysis:

Boersma, Paul & Weenink, David (2024). Praat: doing phonetics by computer [Computer program]. Retrieved from http://www.praat.org/

Parselmouth - Python bindings for Praat (optional backend):

Jadoul, Y., Thompson, B., & de Boer, B. (2018). Introducing Parselmouth: A Python interface to Praat. Journal of Phonetics, 71, 1-15. https://doi.org/10.1016/j.wocn.2018.07.001

License

MIT - see LICENSE for details.

Note: The default acoustic backend (praatfan) is MIT-licensed, making Ozen fully MIT-compatible out of the box. If you install an optional GPL backend (praat-parselmouth or praatfan_gpl), your deployment is subject to the GPL whenever that backend is used.
