This is the repository for GenAID, a generalisable accent identification (AID) model across speakers. The code is built upon SpeechBrain v0.5.16 (original documentation here), and the AID dataset construction and code implementation follow CommonAccent.
```bash
git clone https://github.com/jzmzhong/GenAID.git
cd GenAID
conda create -n speechbrain python==3.10
conda activate speechbrain
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
pip install --editable .
```
- Download the dataset, which is available here. The specific data version used by this repo is Common Voice Corpus 17.0.
- Unzip the dataset after download:
```bash
tar -xvf cv-corpus-17.0-2024-03-15-en.tar.gz
```
- Filter out the speech utterances with accent labels and process them, as sketched below.
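A minimal, hypothetical sketch of this filtering step, assuming the standard Common Voice 17.0 `validated.tsv` layout with an `accents` column; the repo's own preparation scripts are the reference implementation:
```python
import pandas as pd

# Keep only validated clips that carry a non-empty accent label.
# Path and column names are assumptions based on the Common Voice 17.0 layout.
cv_root = "cv-corpus-17.0-2024-03-15/en"  # placeholder: your data_folder
df = pd.read_csv(f"{cv_root}/validated.tsv", sep="\t", low_memory=False)

labelled = df[df["accents"].notna() & (df["accents"].str.strip() != "")]
labelled[["client_id", "path", "accents"]].to_csv("accent_labelled.csv", index=False)
```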
- Filter out accents with sufficient data and split the data into training/validation/testing sets (see the sketch below). Note that there are two validation/testing sets: one for seen speakers (speakers with sufficient data in the training set) and one for unseen speakers (speakers that do not overlap with any speaker in the training set).
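A toy sketch of the speaker-aware split, to make the seen/unseen distinction concrete. Column names and split ratios are illustrative only; the released CSVs under `CommonAccent-CV17-spk-resplit` are authoritative:
```python
import pandas as pd

df = pd.read_csv("accent_labelled.csv")  # output of the filtering sketch above

# Hold out ~10% of speakers entirely: their utterances form the
# unseen-speaker validation/testing pool.
speakers = df["client_id"].drop_duplicates().sample(frac=1.0, random_state=0).tolist()
unseen_speakers = set(speakers[: len(speakers) // 10])

unseen_pool = df[df["client_id"].isin(unseen_speakers)]
seen_pool = df[~df["client_id"].isin(unseen_speakers)]

# For the remaining (seen) speakers, split by utterance so the same speakers
# appear both in training and in the seen-speaker validation/testing sets.
seen_eval = seen_pool.groupby("client_id").sample(frac=0.1, random_state=0)
train = seen_pool.drop(seen_eval.index)
```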
- Processed training/validation/testing sets are available at `./recipes/CommonAccent/CommonAccent-CV17-spk-resplit`.
- Please ensure that the `data_folder` field is the Common Voice dataset directory, and the `csv_prepared_folder` field is the CommonAccent processed training/validation/testing sets directory, e.g. `./recipes/CommonAccent/CommonAccent-CV17-spk-resplit`.
- Also set the `output_folder` field to the directory where you want to store the trained checkpoints, and the `rir_folder` field to the directory where you want to store the noise dataset, which is downloaded and used in training for more robust accent identification.
```bash
cd ./recipes/CommonAccent
python train_GenAID.py train_GenAID_v6.yaml
```
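The intended workflow is to edit these fields directly in `train_GenAID_v6.yaml`. As an aside, SpeechBrain hyperparameter files use HyperPyYAML, so the same fields can usually be overridden at load time; the snippet below is only an illustration (all paths are placeholders, and note that loading a full recipe YAML also instantiates the objects it defines):
```python
from hyperpyyaml import load_hyperpyyaml

# Placeholder paths -- adjust to your setup; field names match the bullets above.
overrides = {
    "data_folder": "/path/to/cv-corpus-17.0-2024-03-15/en",
    "csv_prepared_folder": "./CommonAccent-CV17-spk-resplit",
    "output_folder": "/path/to/checkpoints",
    "rir_folder": "/path/to/rir_noise",
}

with open("train_GenAID_v6.yaml") as f:
    hparams = load_hyperpyyaml(f, overrides)
print(hparams["output_folder"])
```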
- A trained model (with partial files for inference) is available at: https://drive.google.com/file/d/1slGrpZSu5g-nF7R-QMCmtGcjN3kw7lQj/view?usp=sharing
- Please download and unzip it into the following directory for inference/embeddings extraction: `./recipes/CommonAccent/GenAID_v7`
- Please set the `pretrained_path` field to the checkpoint directory you want to run inference with, and the `output_folder` field to the directory where you want to store the inference results (confusion matrices).
```bash
cd ./recipes/CommonAccent
python inference_GenAID.py inference_GenAID_v6.yaml
```
Inference results are available here: `./recipes/CommonAccent/results`
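For a quick spot check on a single file, SpeechBrain's generic pretrained interface may also work against the unzipped checkpoint folder. This is only a sketch: it assumes the files in `GenAID_v7` are compatible with `EncoderClassifier` (the supported route is `inference_GenAID.py` above), and `example.wav` is a placeholder:
```python
from speechbrain.pretrained import EncoderClassifier

# Assumes ./recipes/CommonAccent/GenAID_v7 contains a hyperparams.yaml usable by
# the generic EncoderClassifier interface; otherwise use inference_GenAID.py.
classifier = EncoderClassifier.from_hparams(
    source="./recipes/CommonAccent/GenAID_v7",
    savedir="./recipes/CommonAccent/GenAID_v7",
)
out_prob, score, index, text_lab = classifier.classify_file("example.wav")
print(text_lab)  # predicted accent label
```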
- Please process the data you want to extract embeddings from: place the original data under `data_folder`, and the CSV file listing all files under `csv_prepared_folder`. An example data processing script is at `./recipes/CommonAccent/VCTK/data_prep_VCTK.py`, with processed results at `./VCTK/all_file_paths.csv` (see the sketch below).
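As an illustration of what such a file-list CSV might look like, here is a hypothetical sketch; the exact columns expected by `extract_embeddings_GenAID.py` are defined in `data_prep_VCTK.py`, so treat the `ID`/`wav`/`duration` fields below as placeholders:
```python
import csv
from pathlib import Path

import torchaudio

# Walk a data folder, collect all wav files, and write a simple file-list CSV.
data_folder = Path("/path/to/VCTK")  # placeholder: your data_folder
rows = []
for wav in sorted(data_folder.rglob("*.wav")):
    info = torchaudio.info(str(wav))
    rows.append({
        "ID": wav.stem,
        "wav": str(wav),
        "duration": info.num_frames / info.sample_rate,
    })

with open("all_file_paths.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ID", "wav", "duration"])
    writer.writeheader()
    writer.writerows(rows)
```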
- Please set the `pretrained_path` field to the checkpoint directory you want to use to extract embeddings, and the `output_folder` field to the directory where you want to store the extracted embeddings.
```bash
cd ./recipes/CommonAccent
python extract_embeddings_GenAID.py extract_embeddings_GenAID_v6.yaml
```
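As with inference, embeddings for a single utterance can be pulled through SpeechBrain's pretrained interface as a quick check, again assuming the checkpoint folder is compatible with `EncoderClassifier`; `extract_embeddings_GenAID.py` above is the supported batch route:
```python
from speechbrain.pretrained import EncoderClassifier

# Assumes the GenAID_v7 folder works with the generic EncoderClassifier interface.
classifier = EncoderClassifier.from_hparams(
    source="./recipes/CommonAccent/GenAID_v7",
    savedir="./recipes/CommonAccent/GenAID_v7",
)
signal = classifier.load_audio("example.wav")              # placeholder wav, resampled as needed
embeddings = classifier.encode_batch(signal.unsqueeze(0))  # accent embedding tensor
print(embeddings.shape)
```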
CommonAccent: Paper, Code, Model
Please cite GenAID (part of the AccentBox paper) if you use it for your research or business.
```bibtex
@inproceedings{zhong2025accentbox,
  author    = {Zhong, Jinzuomu and Richmond, Korin and Su, Zhiba and Sun, Siqi},
  title     = {{AccentBox: Towards High-Fidelity Zero-Shot Accent Generation}},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2025},
  pages     = {1-5},
  doi       = {10.1109/ICASSP49660.2025.10888332}
}
```