Outputs just one cluster. #6

abtExp · 2019-10-29T05:35:55Z

I'm using this on the outputs of the encoder provided in the voiceFilter repository.
The outputs are 256-dimensional, and I'm using a segment length of 1.97 s.
The output of the SpectralClusterer is always cluster 0 for every sample.

Here's how I define the SpectralClusterer object:
SpectralClusterer(min_clusters=1, max_clusters=10, stop_eigenvalue=1e-4)

My input has shape (36, 256).

Here's a t-SNE visualization of the encodings:

Here's the output from the SpectralClusterer:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

The encodings are in the range (-1, 1). Am I doing something wrong?

Here's another example for a different audio clip:

Here's the output:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
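Before involving the clusterer, one way to check whether the embeddings themselves separate speakers is to look at the spread of their pairwise cosine similarities. The following is only a minimal sketch: `pairwise_cosine_summary` is a hypothetical helper, not part of the SpectralClusterer API, and the random array stands in for real encoder output of shape (num_segments, 256).

```python
import numpy as np


def pairwise_cosine_summary(embeddings):
    """Return (min, mean, max) of the off-diagonal pairwise cosine similarities."""
    x = np.asarray(embeddings, dtype=float)
    # Scale every row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    # Drop the diagonal, which is always 1 (each vector compared with itself).
    off_diag = sim[~np.eye(len(x), dtype=bool)]
    return float(off_diag.min()), float(off_diag.mean()), float(off_diag.max())


# Synthetic stand-in for real embeddings (assumption: shape (36, 256) as in the post).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(36, 256))
lo, mean, hi = pairwise_cosine_summary(fake_embeddings)
print(f"min={lo:.3f} mean={mean:.3f} max={hi:.3f}")
```

If almost all similarities fall in a narrow band, that may suggest the segments are hard to tell apart in the original space, which would be consistent with a single-cluster result. It is a diagnostic only and does not explain the clusterer's internal behavior.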


abtExp · 2019-10-29T10:16:00Z

Never mind. I used the t-SNE output as the input instead of the embedding vectors from the model, and it worked.

davide-scalzo · 2020-10-12T17:06:38Z

@wq2012 I am seeing something similar: if I feed the embeddings to the clusterer directly, I get a single cluster when max_clusters is set, or far too many clusters when it is not.

However, if I compute 2D projections using UMAP and pass those to the clusterer, I do get reasonable results. Is this intended?

wq2012 · 2020-10-12T17:30:27Z

@davodesign84 I don't think this is intended. I'm not sure how you obtained the embeddings, but it sounds like they are not good enough, and the projection to a lower dimension somehow compensated for that.

davide-scalzo · 2020-10-12T17:34:15Z

That's what I thought, @wq2012. I just came across this issue and tried it, and the results seemed a bit better. On a side note, the embeddings are obtained with https://github.com/resemble-ai/Resemblyzer:

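import numpy as np
from resemblyzer import VoiceEncoder

encoder = VoiceEncoder()

def compute_embeddings(wav):
    # Compute partial (sliding-window) embeddings along with the wav slices they cover
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=10)
    utterance_embeds = np.array(cont_embeds)
    return utterance_embeds, wav_splits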

By the way, I loved how clear your talks are. Three days ago I knew nothing about diarization, and now I have a decent idea of how it's supposed to work in theory. Practice is still lacking, though :D

wq2012 · 2020-10-12T17:39:16Z

@davodesign84 I'm not very familiar with third-party tools for learning embeddings. I know they exist, but I have never played with them myself.

Also note that this Python implementation has a known limitation. It uses the k-means from scikit-learn, which is based on Euclidean distance. If the embeddings were trained with cosine distance, there is a mismatch.

Glad to learn that my talks have been helpful to you. I'm also excited to see more people getting interested in speaker diarization.
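The Euclidean/cosine mismatch described above can be made concrete. For unit-length vectors, squared Euclidean distance and cosine similarity are tied by the identity ||a - b||^2 = 2 - 2 cos(a, b), so L2-normalizing embeddings is one common way to make a Euclidean method rank pairs the same way cosine distance would. The sketch below only demonstrates that identity on random vectors; `l2_normalize` is a hypothetical helper, and nothing here is claimed to be what this library does internally or a guaranteed fix for the one-cluster result.

```python
import numpy as np


def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against all-zero rows)."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


# Two random 256-dimensional vectors standing in for real embeddings.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 256))
ua, ub = l2_normalize(np.stack([a, b]))

cosine = float(ua @ ub)
squared_euclidean = float(np.sum((ua - ub) ** 2))

# On the unit sphere: ||ua - ub||^2 == 2 - 2 * cos(ua, ub)
print(squared_euclidean, 2.0 - 2.0 * cosine)
```

Note that k-means still averages points to form centroids, and those averages are generally not unit-length, so normalization narrows the mismatch rather than removing it entirely.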

davide-scalzo · 2020-10-12T17:44:07Z

"It uses the k-means from scikit-learn, which is based on Euclidean distance. But if the embeddings are trained with cosine distance, there is a mismatch."

This is something I missed! I will keep investigating or find a more suitable pretrained model. Thank you!

naymaraq · 2022-05-16T20:07:25Z

@abtExp Hi, I have the same "one-cluster" issue. Did you find the problem?

abtExp closed this as completed Oct 29, 2019

davide-scalzo mentioned this issue Oct 12, 2020

Clarification on embeddings training resemble-ai/Resemblyzer#40

Open
