How to implement VAD model with GMM? #13

Closed

kareemamrr opened this issue Jun 9, 2020 · 1 comment
Labels
question Further information is requested

Comments

@kareemamrr

The paper Speaker Diarization with LSTM states that the Voice Activity Detection (VAD) model is a GMM using the same PLP features as the i-vector model, with two full-covariance Gaussians. How can I implement this using scikit-learn's GMM class?

@wq2012
Owner

wq2012 commented Jun 14, 2020

We used a pretrained ASR model to generate forced alignments for the data, which gives per-frame speech vs. non-speech ground truth. Then you can simply fit one Gaussian to all speech frames, and another Gaussian to all non-speech frames.
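A minimal sketch of this with scikit-learn, assuming you already have per-frame features and speech/non-speech labels (random arrays stand in for PLP features here). Each class gets a single full-covariance `GaussianMixture`, and frames are classified by comparing log-likelihoods:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder data: in practice these would be PLP features with
# per-frame labels from the forced alignment.
rng = np.random.default_rng(0)
speech_frames = rng.normal(loc=1.0, size=(500, 13))      # labeled speech
nonspeech_frames = rng.normal(loc=-1.0, size=(500, 13))  # labeled non-speech

# One full-covariance Gaussian per class (n_components=1).
speech_gmm = GaussianMixture(n_components=1, covariance_type="full")
speech_gmm.fit(speech_frames)
nonspeech_gmm = GaussianMixture(n_components=1, covariance_type="full")
nonspeech_gmm.fit(nonspeech_frames)

def is_speech(frames):
    """Boolean mask: True where the speech Gaussian scores higher."""
    return speech_gmm.score_samples(frames) > nonspeech_gmm.score_samples(frames)

mask = is_speech(np.vstack([speech_frames, nonspeech_frames]))
```

In a real system you would also apply smoothing (e.g. a median filter or HMM) over the per-frame decisions rather than using the raw mask directly.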
