A Python-based acoustic analysis and annotation tool inspired by Praat, built for rapid waveform annotation with extended acoustic measurements.
- Uriel Cohen Priva (@ucpresearch) - Design, testing, and vibe-coding
- Claude (Anthropic) - Implementation
- Waveform and Spectrogram Display - Synchronized views with zoom/pan
- Acoustic Overlays - Pitch, formants, intensity, center of gravity, HNR
- Audio Playback - Play selections, visual cursor tracking
- Annotation System - Multiple tiers, Praat TextGrid import/export
- Data Collection Points - Click to mark positions and capture acoustic measurements
- Undo Support - Ctrl+Z for boundary and label changes
# Clone the repository
git clone https://github.com/ucpresearch/ozen.git
cd ozen
# Create and activate virtual environment (macOS/Linux)
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Clone the repository
git clone https://github.com/ucpresearch/ozen.git
cd ozen
# Create and activate virtual environment (Windows)
python -m venv .venv
.venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Note: On Windows, sounddevice includes PortAudio automatically. On macOS/Linux, you may need to install it separately:
- macOS: brew install portaudio
- Ubuntu/Debian: sudo apt install portaudio19-dev
cd ozen
git pull
pip install -r requirements.txt  # if dependencies changed
On Windows, remember to activate the virtual environment first with .venv\Scripts\activate.
# Open the application
python -m ozen
# Open with an audio file
python -m ozen audio.wav
# Import existing TextGrid annotations
python -m ozen audio.wav annotations.TextGrid
# Create tiers automatically when audio loads
python -m ozen audio.wav -t words,phones
# Use a custom configuration file
python -m ozen audio.wav -c myconfig.yaml
Config files can customize colors, formant presets, default tiers, and more. See ozen/config.py for available options.
| Key | Action |
|---|---|
| Space | Play selection / pause |
| Escape | Stop playback |
| Tab | Play visible window |
| Key | Action |
|---|---|
| ↑ (Up arrow) | Zoom in |
| ↓ (Down arrow) | Zoom out |
| ← (Left arrow) | Pan left |
| → (Right arrow) | Pan right |
| Scroll wheel | Zoom in/out (centered on cursor) |
| Horizontal scroll | Pan left/right |
| Key | Action |
|---|---|
| Double-click | Add boundary at position |
| Enter | Add boundary at cursor position |
| Delete | Delete hovered boundary (highlighted in orange) |
| Ctrl+Z | Undo (add/delete boundary, text changes) |
| Escape | Deselect interval / close text editor |
| 1-5 | Switch to annotation tier 1-5 |
- Select an interval - Click on a tier to select an interval
- Edit text - Type to add/edit the interval label
- Add boundaries - Double-click or press Enter to split intervals
- Delete boundaries - Hover over a boundary (turns orange) and press Delete
- Play interval - Click the green play button on selected intervals
| Action | Method |
|---|---|
| Add point | Double-click on spectrogram |
| Move point | Click and drag |
| Remove point | Right-click → "Remove" |
| Copy all points | Ctrl+C (copies visible measurements as TSV) |
| Export points | File > Export Point Information... |
| Import points | File > Import Point Information... |
Data points capture acoustic measurements at specific time-frequency positions on the spectrogram. When you press Ctrl+C, all points are copied to the clipboard as tab-separated values, including only the measurements that are currently visible (checked in the overlay toggles). This allows quick data export to spreadsheets.
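Once copied or exported, the tab-separated point data can be post-processed outside a spreadsheet as well. The following is a minimal sketch, assuming the first row is a header line; the column names in the sample (`time`, `frequency`, `pitch`) and the helper name `parse_points_tsv` are hypothetical, since the actual columns depend on which overlays are visible.

```python
import csv
import io


def parse_points_tsv(text):
    """Parse tab-separated point data into a list of dicts keyed by header.

    Assumes the first row holds column names (an assumption, not confirmed
    by the Ozen documentation). Numeric cells become floats; anything else
    is kept as the original string.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        parsed = {}
        for key, value in row.items():
            try:
                parsed[key] = float(value)
            except (TypeError, ValueError):
                # Non-numeric or missing cell: keep it unchanged
                parsed[key] = value
        rows.append(parsed)
    return rows


# Hypothetical sample; real column names come from the visible overlays
sample = (
    "time\tfrequency\tpitch\n"
    "0.512\t1500.0\t182.4\n"
    "0.733\t2100.0\t--undefined--\n"
)
points = parse_points_tsv(sample)
```

Keeping non-numeric cells as strings avoids silently dropping measurements that could not be computed at a given point.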
| Key | Action |
|---|---|
| Ctrl+O | Open audio file |
| Ctrl+S | Save TextGrid (to current path, or prompts if none) |
| Ctrl+Shift+S | Save TextGrid as... |
| Ctrl+C | Copy all data points to clipboard (visible measurements only) |
- Ctrl+S saves to the current TextGrid path if one exists (from opening a file or previous save)
- If no path is set, Ctrl+S prompts for a location (same as Save As)
- Auto-save: Every 60 seconds, annotations are saved to a .autosave backup file
- Exit confirmation: If you have unsaved changes, you'll be prompted to save before closing
- When starting with a non-existing TextGrid path, you'll be asked if you want to create it
Ozen includes a headless spectrogram renderer for generating publication-quality figures without the GUI. Useful for batch processing, scripting, and paper figures.
python -m ozen.render recording.wav -o fig.png --overlays pitch,formants --legend
# Windowed view with annotations
python -m ozen.render recording.wav -o fig.pdf \
--start 0.5 --end 2.0 \
--overlays pitch,formants,intensity \
--textgrid recording.TextGrid --tiers words,phones
# Wideband spectrogram with custom colormap
python -m ozen.render recording.wav -o fig.png \
--bandwidth wideband --colormap inferno \
--overlays pitch,formants --preset male
# Multiple colored data point sets
python -m ozen.render recording.wav -o fig.png \
--overlays pitch,formants \
--points red=midvowels.tsv --points blue=pitch-peaks.tsv
# Points as markers only (no vertical lines), semi-transparent
python -m ozen.render recording.wav -o fig.png \
--overlays pitch,formants \
--points "#4488CC"=vowels.tsv --point-markers-only --point-alpha 0.6
# Custom font for publication
python -m ozen.render recording.wav -o fig.pdf \
--overlays pitch,formants --legend --font "Times New Roman"
| Name | Description | Display Range |
|---|---|---|
| pitch | Fundamental frequency (F0) | pitch-floor–ceiling Hz (log scale) |
| formants | Formant frequencies F1–F4 | direct Hz (red=narrow, pink=wide bandwidth) |
| intensity | Sound pressure level | 30–90 dB |
| cog | Center of gravity (spectral centroid) | direct Hz |
| hnr | Harmonics-to-noise ratio | −10–40 dB |
| spectral_tilt | Low vs high frequency energy | −20–+40 dB |
| a1p0 | A1–P0 nasal ratio | −20–+20 dB |
| nasal_murmur | Low-frequency energy ratio | 0–1 |
| Option | Description |
|---|---|
| --points [COLOR=]FILE | Data point TSV file (repeatable). Colors: names (red), hex (#4488CC, #FF880080 with alpha), grayscale (0.6) |
| --point-markers-only | Draw only circle markers, omit vertical lines |
| --point-alpha FLOAT | Opacity for all points, 0.0–1.0 (default: 1.0) |
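The `[COLOR=]FILE` form accepted by `--points` can be illustrated with a short sketch. This is not Ozen's own parser; the helper names `split_points_spec` and `color_kind` are hypothetical and only show how the optional color prefix and the three color notations listed above might be told apart.

```python
def split_points_spec(spec):
    """Split a '[COLOR=]FILE' string into (color, path).

    Returns None for the color when no '=' is present.
    Illustrative only; Ozen's real parsing may differ.
    """
    color, sep, path = spec.partition("=")
    if not sep:
        return None, spec
    return color, path


def color_kind(color):
    """Classify a color string as 'hex', 'grayscale', or 'name'."""
    if color.startswith("#"):
        return "hex"
    try:
        float(color)
    except ValueError:
        return "name"
    return "grayscale"


print(split_points_spec("red=midvowels.tsv"))
print(split_points_spec("vowels.tsv"))
print(color_kind("#FF880080"), color_kind("0.6"), color_kind("red"))
```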
| Option | Description |
|---|---|
| --font FAMILY | Font family for all text (e.g., Times New Roman, Helvetica) |
| --legend | Show legend with overlay names, colors, and value ranges |
| --title TEXT | Figure title |
| --width, --height | Figure dimensions in inches |
| --dpi NUMBER | DPI for raster output (default: 300) |
Output formats: PNG, PDF, SVG, and EPS.
from ozen.render import render_spectrogram
render_spectrogram(
'recording.wav', 'fig.png',
overlays=['pitch', 'formants'],
textgrid_path='recording.TextGrid',
legend=True,
font='Helvetica',
point_markers_only=True,
point_alpha=0.8,
)
For the full manual with all options: python -m ozen.render --man
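For batch processing, one option is to drive the command-line renderer from a small script. The sketch below uses only flags shown above (`-o`, `--overlays`, `--textgrid`); the helper names `build_render_command` and `render_folder`, the folder layout, and the convention that a TextGrid shares the recording's base name are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def build_render_command(wav_path, out_dir, overlays=("pitch", "formants"), suffix=".png"):
    """Build the argv list for one `python -m ozen.render` call."""
    wav_path = Path(wav_path)
    out_path = Path(out_dir) / (wav_path.stem + suffix)
    cmd = [
        sys.executable, "-m", "ozen.render",
        str(wav_path),
        "-o", str(out_path),
        "--overlays", ",".join(overlays),
    ]
    # Assumption: a TextGrid with the same base name belongs to this recording
    textgrid = wav_path.with_suffix(".TextGrid")
    if textgrid.exists():
        cmd += ["--textgrid", str(textgrid)]
    return cmd


def render_folder(wav_dir, out_dir):
    """Render every .wav file in wav_dir into out_dir, one figure per file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        # check=True stops the batch on the first failed render
        subprocess.run(build_render_command(wav, out_dir), check=True)


# Inspect the command without running it
example_cmd = build_render_command("data/rec01.wav", "figs")
```

Building the argument list separately from running it makes it easy to print and check the commands before starting a long batch.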
- Audio: WAV, FLAC, OGG, MP3
- Annotations: Praat TextGrid (.TextGrid, .txt)
- Python 3.9+
- PyQt6
- pyqtgraph
- praatfan (acoustic analysis - MIT licensed)
- sounddevice
- numpy, scipy
- soundfile
Ozen supports multiple acoustic analysis backends. The default (praatfan) is pure Python and works everywhere. For better performance or compatibility, install additional backends:
| Backend | Install | License | Notes | Repository |
|---|---|---|---|---|
| Praatfan (slow) | Included | MIT | Pure Python, portable | Praatfan |
| Praatfan (fast) | use the release page | MIT | Rust, ~10x faster | Praatfan |
| Praatfan (GPL) | use the release page | GPL | Rust, from praatfan-core-rs | Praatfan GPL |
| Praat (via Parselmouth) | pip install praat-parselmouth | GPL | Original Praat bindings | Praat, Parselmouth Website |
Switch backends in the UI via the Backend dropdown, or set analysis.acoustic_backend in your config file.
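As a rough sketch of the config-file route, the snippet below generates a small YAML file that sets the backend. It assumes that the dotted name analysis.acoustic_backend corresponds to a nested `analysis:` mapping and that `praatfan` is an accepted value; neither is confirmed here, so check ozen/config.py for the actual layout and identifiers. The helper name `backend_config_text` is hypothetical.

```python
from pathlib import Path


def backend_config_text(backend):
    """Return YAML text that selects an acoustic backend.

    Assumes 'analysis.acoustic_backend' maps to nested YAML keys;
    verify against ozen/config.py before relying on this layout.
    """
    return "analysis:\n  acoustic_backend: " + backend + "\n"


config_text = backend_config_text("praatfan")  # backend identifier is an assumption
Path("myconfig.yaml").write_text(config_text, encoding="utf-8")
```

The resulting file can then be passed on the command line with python -m ozen audio.wav -c myconfig.yaml.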
Setting waveform_line_width to a value greater than 1 in the config causes audio static/noise during playback on macOS. This appears to be a bug in how Qt/pyqtgraph rendering interacts with CoreAudio, not an issue with Ozen itself. The default value of 1 works fine, so if you customize colors via a config file, keep this value at 1.
Ozen relies on the following projects for acoustic analysis:
praatfan - Clean-room reimplementation of Praat's acoustic algorithms in Parselmouth:
Praat - The gold standard for phonetic analysis:
Boersma, Paul & Weenink, David (2024). Praat: doing phonetics by computer [Computer program]. Retrieved from http://www.praat.org/
Parselmouth - Python bindings for Praat (optional backend):
Jadoul, Y., Thompson, B., & de Boer, B. (2018). Introducing Parselmouth: A Python interface to Praat. Journal of Phonetics, 71, 1-15. https://doi.org/10.1016/j.wocn.2018.07.001
MIT - see LICENSE for details.
Note: The default acoustic backend (praatfan) is MIT-licensed, making Ozen fully MIT-compatible out of the box. If you install optional GPL backends (praat-parselmouth or praatfan_gpl), your deployment becomes GPL-licensed when using those backends.