WORK IN PROGRESS
Please contact @sholokovalexey and @underdogliu if you have any questions.
This is the baseline system accompanying our Speaker Odyssey paper on household speaker recognition.
The code in this repo covers baseline experiments on a limited number of protocols. It will therefore be refactored and updated incrementally as the research proceeds.
Python 3.8+. We tested our code on Python 3.8 and 3.9.
Run pip install -r requirements.txt to configure the environment. For Python virtual environments, please check the related instructions for Virtualenv or Conda.
- Download the speaker embeddings from this link and store them in ${YOUR_PATH}/embeddings (we will employ Git LFS later).
- Set the embeddings path in config.yaml to ${YOUR_PATH}/embeddings.
- Run scripts/run_all.sh to run the experiments across all baseline configurations, including active and passive enrollment. There are several other scripts for individual experiments; see the scripts and the related config files in ./configs for more.
For both active and passive enrollment, we include the following recognition algorithms:
- K-means clustering
- Variational Bayesian (VB) clustering
- Label propagation
- Agglomerative hierarchical clustering (AHC)
For details about the backend algorithms we used, please read our paper.
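As a rough illustration of one of the listed algorithms, the sketch below runs average-linkage AHC over a handful of toy embeddings using only the standard library. It is not the implementation in this repo: the function names (cosine_distance, ahc), the cosine distance, the average linkage and the stop_distance parameter are all assumptions chosen for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ahc(embeddings, stop_distance):
    """Average-linkage AHC (hypothetical helper, not the repo's API).

    Starts with one cluster per embedding and repeatedly merges the two
    closest clusters until the closest pair is farther than stop_distance.
    Returns a list of clusters, each a list of embedding indices.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [
            cosine_distance(embeddings[i], embeddings[j])
            for i in c1
            for j in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > stop_distance:
            break
        # b > a always holds, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy 2-D "embeddings": two utterances near each axis.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = ahc(embeddings, stop_distance=0.5)
print(clusters)
```

With these toy vectors the two utterances near each axis end up in the same cluster. Real speaker embeddings are high-dimensional, and the stopping distance would need to be tuned on development data.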
We perform centroid-based scoring with a fixed threshold.
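The idea of centroid-based scoring with a fixed threshold can be sketched as follows. This is a minimal illustration, not the code in models.py: it assumes cosine similarity as the score, a length-normalized mean embedding as the centroid, and that a test utterance scoring below the threshold against every enrolled speaker is rejected as unknown. The function names and the threshold value are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def centroid(embeddings):
    """Mean of a speaker's enrollment embeddings, length-normalized."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)


def cosine_score(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(test_embedding, enrolled, threshold):
    """Return (speaker, score) for the best-scoring centroid.

    If the best score is below the fixed threshold, the speaker is
    returned as None (assumption: treated as an unknown speaker).
    """
    centroids = {spk: centroid(embs) for spk, embs in enrolled.items()}
    best_spk, best_score = max(
        ((spk, cosine_score(test_embedding, c)) for spk, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return None, best_score
    return best_spk, best_score


# Toy enrollment data: two speakers with two 2-D embeddings each.
enrolled = {
    "alice": [[1.0, 0.1], [0.9, 0.0]],
    "bob": [[0.0, 1.0], [0.1, 0.9]],
}
print(identify([0.95, 0.05], enrolled, threshold=0.8))
print(identify([-1.0, -1.0], enrolled, threshold=0.8))
```

The first query lies close to the first speaker's centroid and is accepted. The second has a negative cosine score against both centroids and is rejected.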
We perform training and evaluation on two datasets:
- ASVspoof 2019, physical access (PA)
- VoxCeleb1
Users who want to extend the toolkit and test new algorithms should look at:
- models.py - for the speaker recognition and scoring backend
- clustering*.py - for the various clustering algorithms applied
If you would like to use this repo, please cite our work:
@article{alexeyhousehold2022,
title={Baselines and Protocols for Household Speaker Recognition},
author={Alexey Sholokhov and Xuechen Liu and Md Sahidullah and Tomi Kinnunen},
journal={Proc. Speaker Odyssey},
year={2022}
}