This is the official implementation of "DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech". DNSMOS Pro is a model that takes as input a speech clip, and outputs a Gaussian mean opinion score (MOS) distribution.
Authors: Fredrik Cumlin, Xinyu Liang
Emails: fcumlin@gmail.com, hopeliang990504@gmail.com
The easiest path is:
git clone <this-repo-url>
cd DNSMOSPro
./install.sh
source .venv/bin/activate
dnsmospro path/to/audio.wav
That installs a terminal command called dnsmospro, so students do not need to remember python cli.py ....
On Windows:
install.bat
.venv\Scripts\activate
dnsmospro path\to\audio.wav
Students who do not want to use the terminal can launch the desktop app:
source .venv/bin/activate
dnsmospro-gui
Or directly:
python gui.py
The GUI lets them:
- Add individual audio files.
- Add a whole folder of recordings.
- Score everything with one click.
- View the MOS output in a results panel.
If you want students to avoid installing Python entirely, build native artifacts for each operating system and distribute those instead.
Recommended builder Python:
Python 3.12
Use Python 3.11 only if 3.12 gives you a wheel or packaging issue on a target platform. Do not use Apple Command Line Tools Python for GUI development or standalone builds.
macOS or Linux builder:
./build_standalone.sh
Windows builder:
build_standalone.bat
This creates a standalone artifact in dist/.
Outputs:
macOS: dist/DNSMOS Pro.app
Windows: dist/DNSMOS Pro/
Linux: dist/DNSMOS Pro/
Recommended distribution flow:
- Build on each operating system you want to support.
- Zip the finished app or dist/ output for that operating system.
- Send students the app directly instead of the source repo.
- Keep the CLI install path only for advanced users.
Notes:
- PyInstaller builds are platform-specific. One build does not run everywhere.
- The bundled app includes the pretrained model files, so students do not need the repo checkout.
- If macOS warns that the app is from an unidentified developer, you will need to codesign/notarize it for the smoothest distribution.
- A GitHub Actions workflow is included at .github/workflows/build-standalone.yml to generate macOS, Windows, and Linux artifacts from tags or manual runs.
Before you cut a prerelease:
- Test the CLI on a real audio file.
- Test the GUI from source with Python 3.12.
- Run ./build_standalone.sh on macOS and confirm the app launches.
- Push the repo and run the GitHub Actions matrix for macOS, Windows, and Linux.
- Download each artifact and smoke-test it on the matching operating system.
- Tag the repo with a prerelease version such as v0.1.0-beta.1.
Score one file:
dnsmospro speech.wav
Score several files:
dnsmospro samples/*.wav
Score a whole folder:
dnsmospro recordings/
Score a folder and its subfolders:
dnsmospro recordings/ --recursive
Show mean and variance:
dnsmospro speech.wav --verbose
Pick a different pretrained model:
dnsmospro speech.wav --model bvcc
The CLI automatically picks cuda, mps, or cpu, in that order, unless --device is supplied manually.
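The device fallback order described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: `pick_device` and its parameters are hypothetical names. In a real script the two availability flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; they are passed in as plain booleans here so the logic can be read and tested without PyTorch installed.

```python
from typing import Optional


def pick_device(cuda_available: bool, mps_available: bool,
                override: Optional[str] = None) -> str:
    """Return a device name, preferring cuda, then mps, then cpu.

    Hypothetical helper: the real CLI may structure this differently.
    """
    if override is not None:
        # An explicit choice (for example from --device) always wins.
        return override
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # Example: a machine with Apple-silicon acceleration but no CUDA GPU.
    print(pick_device(cuda_available=False, mps_available=True))
```

Keeping the override check first means a user-supplied device is never silently replaced by the automatic choice.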
Use Python 3.9 or newer for CLI work. Use Python 3.11 or 3.12 for GUI and standalone builds.
macOS and Linux:
./install.sh
Windows:
install.bat
The installer creates .venv, installs PyTorch, installs the project, and exposes both dnsmospro and dnsmospro-gui.
If you prefer manual setup:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install torch
pip install -e .
There are three pretrained DNSMOS Pro models ready to use, each trained on a different dataset. For inference, one can do the following (all paths are relative to this directory):
import numpy as np
import torch
import utils # Python file containing the STFT.
model = torch.jit.load('runs/NISQA/model_best.pt', map_location=torch.device('cpu'))
# Placeholder input: 160,000 samples, i.e. 10 s of audio at 16 kHz.
samples = np.ones(160_000)
# Defaults in `utils.stft` correspond to training values.
spec = torch.FloatTensor(utils.stft(samples))
with torch.no_grad():
    # Add two leading singleton dimensions before calling the model.
    prediction = model(spec[None, None, ...])
mean = prediction[:, 0]
variance = prediction[:, 1]
print(f'{mean=}, {variance=}')
The mean can be used as a scalar MOS prediction. The recommended input is about 10 s of audio sampled at 16 kHz.
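Because the model outputs a Gaussian distribution rather than a single number, the variance can be turned into an uncertainty estimate. The sketch below shows one possible way to do that using only the standard library. The function names `mos_interval` and `prob_above` are hypothetical, and clipping to the range 1 to 5 is an assumption about the MOS scale rather than something the model enforces.

```python
import math
from statistics import NormalDist


def mos_interval(mean: float, variance: float, confidence: float = 0.95):
    """Return a (low, high) central interval for a Gaussian MOS prediction."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    dist = NormalDist(mu=mean, sigma=math.sqrt(variance))
    tail = (1.0 - confidence) / 2.0
    low = dist.inv_cdf(tail)
    high = dist.inv_cdf(1.0 - tail)
    # Assumption: MOS lives on a 1-5 scale, so clip the interval to it.
    return max(1.0, low), min(5.0, high)


def prob_above(mean: float, variance: float, threshold: float) -> float:
    """Return the probability that the MOS exceeds `threshold`."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    dist = NormalDist(mu=mean, sigma=math.sqrt(variance))
    return 1.0 - dist.cdf(threshold)


if __name__ == "__main__":
    # Illustrative numbers only, not real model output.
    print(mos_interval(3.5, 0.25))
    print(prob_above(3.5, 0.25, 3.0))
```

With the tensors from the inference snippet, the scalar inputs could be obtained with `mean.item()` and `variance.item()` when a single clip is scored.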
The main usability rules for a class setting are:
- Give them one command to run after install: dnsmospro file.wav.
- Avoid making them edit code or open Python.
- Prefer a copy-paste setup block using a virtual environment.
- Keep install commands platform-neutral unless you control the hardware.
- Accept folders and globs so they can grade many files with one command.
- Give non-technical users a GUI path as well as a CLI path.
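The rule about accepting folders and globs can be sketched as a small input-expansion step. This is a hedged illustration and not the CLI's actual implementation: `collect_audio_files` is a hypothetical helper, and the set of accepted extensions is an assumption. On macOS and Linux, patterns such as samples/*.wav are normally expanded by the shell before the program runs, so the helper only needs to handle plain files and directories.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: these extensions are illustrative, not the CLI's real list.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def collect_audio_files(inputs: Iterable[str],
                        recursive: bool = False) -> List[Path]:
    """Expand files and folders into a sorted, de-duplicated file list."""
    found = set()
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # "**/*" also descends into subfolders; "*" stays at top level.
            pattern = "**/*" if recursive else "*"
            candidates = path.glob(pattern)
        else:
            candidates = [path]
        for candidate in candidates:
            is_audio = candidate.suffix.lower() in AUDIO_EXTENSIONS
            if candidate.is_file() and is_audio:
                found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    for audio_path in collect_audio_files(["recordings/"], recursive=True):
        print(audio_path)
```

Returning a sorted list keeps the scoring order stable between runs, which makes results easier to compare when many files are graded at once.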
The framework is Gin configurable; specifying model and dataset is done with a Gin config. See examples in configs/*.gin.
Example launch:
python train.py --gin_path "configs/vcc2018.gin" --save_path "runs/VCC2018"