AudioCrowd

⚠️ WARNING — project unreliable, no longer used

AudioCrowd ended up being unreliable in practice. I have switched to my own fork of Mozilla Common Voice, with my upstream changes to make self-hosting easier: https://github.com/thiswillbeyourgithub/common-voice/tree/enh-self-hosting

(Note: this is my personal fork, thiswillbeyourgithub/common-voice, branch enh-self-hosting, not the official Mozilla repo.)

This repository is left here for reference but is not recommended for new deployments.

A collaborative Gradio web UI where multiple volunteers record themselves speaking sentences to build an ASR (Automatic Speech Recognition) fine-tuning dataset.

Features

  • Multi-user support: volunteers authenticate via a simple CSV file and record simultaneously without conflicts
  • Automatic sentence assignment: each user gets 5 sentences from a shared pool; new sentences are drawn automatically as recordings are completed
  • Auto-save: recordings are saved as soon as the user stops recording -- no manual save button
  • Audio processing: recordings are converted to 16 kHz mono WAV with silence trimming
  • NeMo-compatible output: metadata is appended to a JSONL manifest compatible with NVIDIA NeMo
  • Flagging: users can flag problematic samples for later review
  • Skip & discard: skip unwanted sentences or discard mispronounced recordings
  • Bilingual UI: English and French, auto-detected from browser or forced via config
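The audio processing step in the list above (16 kHz mono WAV with silence trimming) can be sketched roughly as follows. This is a minimal illustration with numpy only, not AudioCrowd's actual code: the function names, the linear-interpolation resampler, and the amplitude threshold of 0.01 are all assumptions made for the example.

```python
import numpy as np

TARGET_SR = 16_000  # 16 kHz target rate, as stated in the feature list


def to_mono(samples):
    """Average the channels; accepts shape (frames,) or (frames, channels)."""
    samples = np.asarray(samples, dtype=np.float32)
    return samples if samples.ndim == 1 else samples.mean(axis=1)


def resample_linear(samples, sr, target_sr=TARGET_SR):
    """Resample by linear interpolation (assumption: the real app may use a better filter)."""
    if sr == target_sr or len(samples) == 0:
        return samples
    n_out = int(round(len(samples) * target_sr / sr))
    old_t = np.arange(len(samples)) / sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, samples).astype(np.float32)


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose amplitude is at or below the threshold."""
    loud = np.flatnonzero(np.abs(samples) > threshold)
    if loud.size == 0:
        return samples[:0]  # nothing but silence
    return samples[loud[0]:loud[-1] + 1]


def process(samples, sr):
    """Downmix, resample to 16 kHz, then trim silence at both ends."""
    return trim_silence(resample_linear(to_mono(samples), sr))
```

The resulting array could then be written with soundfile, for example `soundfile.write(path, processed, 16000)`.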

Keyboard shortcuts

Key    Action
Space  Start/stop recording
R      Reset and restart recording
S      Skip current sentence
D      Discard last recording
F      Flag current sample (toggle)
G      Flag previous sample (toggle)

Quick start

With uv (no Docker)

# Prepare a JSONL file with one {"text": "..."} per line
# Prepare a CSV file with username,password rows (no header)
uv run AudioCrowd.py sentences.jsonl --users-csv users.csv
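For reference, a headerless username,password CSV like the one mentioned above could be read as sketched below. The helper name `load_users` is hypothetical, and how AudioCrowd itself parses the file is not documented here.

```python
import csv


def load_users(csv_path):
    """Read username,password rows (no header) into a dict."""
    users = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            users[row[0].strip()] = row[1]
    return users
```

A mapping like this can be turned into the list of (username, password) tuples that Gradio's `launch(auth=...)` accepts, although whether AudioCrowd wires it up that way is an assumption.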

Full options:

uv run AudioCrowd.py sentences.jsonl \
  --users-csv users.csv \
  --salt mysalt \
  --output-dir ./recordings/ \
  --output-jsonl ./output.jsonl \
  --port 7860 \
  --share \
  --lang fr

With Docker

cd ./docker
cp env_file.example env_file
# Edit env_file with your settings (JSONL_PATH, USERS_CSV, etc.)
docker compose up --build

The app is exposed on host port 7760 by default (mapped to port 7860 inside the container). Mount your dataset directory into the container; recordings are persisted to ./recordings/ on the host.

Input format

A JSONL file with at least a text field per line:

{"text": "The patient presents with acute symptoms."}
{"text": "Administer 500mg of amoxicillin twice daily."}

NeMo-format lines with audio_filepath/duration fields are also accepted; only text is used.
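As an illustration of the input format, the sketch below reads such a file and keeps only the text field, ignoring extra keys such as audio_filepath or duration. The function name `load_sentences` and the choice to raise on a missing text field are assumptions for the example, not documented behavior.

```python
import json


def load_sentences(jsonl_path):
    """Return the 'text' value of every non-empty line in a JSONL file."""
    sentences = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            text = record.get("text", "").strip()
            if not text:
                raise ValueError(f"line {line_no}: missing or empty 'text' field")
            sentences.append(text)  # any other fields are ignored
    return sentences
```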

Output format

WAV files are saved as {userid}_{uuid4[:8]}.wav in the output directory. The JSONL manifest contains:

{"audio_filepath": "recordings/f3a1b2c3d4e5_a1b2c3d4.wav", "text": "The patient presents with...", "duration": 3.42, "timestamp": "2026-03-06T14:23:01+00:00", "userid": "f3a1b2c3d4e5", "sentence_index": 42}
{"audio_filepath": "recordings/f3a1b2c3d4e5_b2c3d4e5.wav", "text": "Flagged example...", "duration": 2.10, "timestamp": "2026-03-06T14:24:00+00:00", "userid": "f3a1b2c3d4e5", "sentence_index": 43, "flagged": true}
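A manifest line with these fields could be assembled as in the sketch below. The helper `make_manifest_entry` is hypothetical; it assumes the flagged key is only written for flagged samples (as the two example lines suggest) and that duration is rounded to two decimals.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def make_manifest_entry(userid, text, duration, sentence_index,
                        output_dir="recordings", flagged=False):
    """Build one manifest record following the {userid}_{uuid4[:8]}.wav naming scheme."""
    filename = f"{userid}_{uuid.uuid4().hex[:8]}.wav"
    entry = {
        "audio_filepath": (Path(output_dir) / filename).as_posix(),
        "text": text,
        "duration": round(duration, 2),
        # Timezone-aware UTC timestamp, e.g. 2026-03-06T14:23:01+00:00
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "userid": userid,
        "sentence_index": sentence_index,
    }
    if flagged:
        entry["flagged"] = True  # assumption: key omitted when not flagged
    return entry
```

Serializing the returned dict with `json.dumps(entry)` gives one line of the JSONL manifest.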

Tech stack

  • Python + Gradio -- single-file app launched via uv run (PEP 723 inline metadata)
  • click for CLI argument parsing
  • loguru for logging (stderr + audiodataset.log)
  • soundfile + numpy for audio processing
  • fcntl.flock for cross-process file locking (concurrent multi-user safety)
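To illustrate the file-locking point, here is a minimal sketch of appending a record to a shared JSONL file under an exclusive `fcntl.flock` lock, so that concurrent writers do not interleave lines. The function name `append_locked` is hypothetical and this is not necessarily how AudioCrowd structures its writes. `fcntl` is only available on Unix-like systems.

```python
import fcntl
import json


def append_locked(jsonl_path, entry):
    """Append one JSON record to a file while holding an exclusive lock."""
    with open(jsonl_path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            f.flush()  # push the line to the OS before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The lock is advisory: it only protects against other processes that also call `flock` on the same file.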

Alternatives

If you need a more full-featured, production-ready crowdsourcing platform, consider Mozilla Common Voice — an open-source initiative for collecting speech data in many languages. It can also be self-hosted via its Docker Compose setup.

AudioCrowd is intentionally simpler: a single-file app for quickly spinning up a private recording session with a specific sentence list and a known group of volunteers.

License

AGPLv3


Built with Claude Code.
