D-vector trained on VoxCeleb1

@yistLin released this 12 May 02:39
· 3 commits to master since this release

Pretrained models

The model was trained on the VoxCeleb1 dataset.

Model details:

  • 40-dim log mel spectrogram as input
  • 3-layer LSTM with a hidden dimension of 256
  • 256-dim speaker embedding obtained by attentive pooling
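
The pooling step above can be illustrated with a minimal sketch. It shows one common form of attentive pooling: each frame gets a scalar score from a linear projection, the scores are turned into softmax weights over time, and the frames are summed with those weights. The release notes do not spell out the scoring function, so the function name `attentive_pool` and its parameters `score_weights` and `score_bias` are assumptions made for illustration, not the released model's code.

```python
import math


def attentive_pool(frames, score_weights, score_bias=0.0):
    """Pool per-frame vectors into a single vector with softmax attention.

    frames: list of equal-length lists of floats (one vector per time step).
    score_weights, score_bias: hypothetical parameters of a linear scorer.
    Returns (pooled_vector, attention_weights).
    """
    # One scalar score per frame: a linear projection of the frame vector.
    scores = [
        sum(w * x for w, x in zip(score_weights, frame)) + score_bias
        for frame in frames
    ]

    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frames, dimension by dimension.
    dim = len(frames[0])
    pooled = [
        sum(wt * frame[d] for wt, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights
```

In the model described here, `frames` would correspond to the per-frame outputs derived from the LSTM and the pooled vector would have 256 dimensions; with all-zero `score_weights` the function reduces to plain mean pooling.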

Training details:

  • 64 speakers, 10 utterances per speaker in a batch
  • 250K steps
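
The batch composition above (64 speakers with 10 utterances each, so 640 utterances per batch) can be sketched as follows. This is only an illustration of the sampling scheme: the function name `sample_batch`, the `utterances_by_speaker` mapping, and the choice to skip speakers with fewer than 10 utterances are assumptions, since the release notes do not describe how the data loader handles such cases.

```python
import random


def sample_batch(utterances_by_speaker, n_speakers=64, n_utterances=10, rng=None):
    """Draw n_speakers speakers and n_utterances utterances from each.

    utterances_by_speaker: dict mapping a speaker ID to a list of utterance IDs.
    Returns a dict mapping each sampled speaker to its sampled utterances.
    """
    rng = rng or random.Random()

    # Assumption: speakers with too few utterances are left out of sampling.
    eligible = [
        spk for spk, utts in utterances_by_speaker.items()
        if len(utts) >= n_utterances
    ]
    if len(eligible) < n_speakers:
        raise ValueError("not enough speakers with enough utterances")

    # Sample speakers, then utterances within each speaker, without replacement.
    speakers = rng.sample(eligible, n_speakers)
    return {
        spk: rng.sample(utterances_by_speaker[spk], n_utterances)
        for spk in speakers
    }
```

At 640 utterances per step, 250K steps amount to 160 million utterance draws in total, with utterances repeated across steps.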