
Outputs just one cluster. #6

Closed
abtExp opened this issue Oct 29, 2019 · 7 comments

Comments

@abtExp

abtExp commented Oct 29, 2019

I'm using this on the outputs from the encoder provided in the VoiceFilter repository.
The outputs are 256-dimensional, and I'm using a segment length of 1.97 s.
The output of the SpectralClusterer is always cluster 0 for every sample.

Here's how I define the SpectralClusterer object:
SpectralClusterer(min_clusters=1, max_clusters=10, stop_eigenvalue=1e-4)

My input shape is (36, 256).

Here's a t-SNE visualization of the encodings:
[Image: t-SNE plot of the encodings]

Here's the output from the SpectralClusterer:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

The encodings are in the range (-1, 1). Am I doing something wrong?

Here's another example, for a different audio clip:
[Image: t-SNE plot of the encodings for the second clip]

Here's the output:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
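For context on how an all-zero labeling can arise: spectral clustering commonly estimates the number of clusters from the eigenvalues of an affinity matrix, picking k where the ratio between consecutive eigenvalues is largest. The NumPy sketch below is a simplified, hypothetical illustration of that eigengap idea, not the library's actual code. The helper names `cosine_affinity` and `estimate_num_clusters` are invented here, the affinity is plain cosine similarity rescaled to [0, 1], any refinement the library may apply to the affinity matrix is skipped, and the `min_clusters`, `max_clusters` and `stop_eigenvalue` arguments only mirror the constructor names above; their exact semantics in the library are assumed. If embeddings from different speakers are all very similar, the first eigenvalue dominates and the largest ratio can fall right after it, which yields a single cluster.

```python
import numpy as np


def cosine_affinity(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity, rescaled from [-1, 1] to [0, 1]."""
    # L2-normalize each row so that dot products are cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (unit @ unit.T + 1.0) / 2.0


def estimate_num_clusters(affinity: np.ndarray, min_clusters: int = 1,
                          max_clusters: int = 10,
                          stop_eigenvalue: float = 1e-4) -> int:
    """Hypothetical eigengap heuristic: pick k at the largest eigenvalue ratio."""
    # Eigenvalues of the symmetric affinity matrix, largest first
    eigenvalues = np.sort(np.linalg.eigvalsh(affinity))[::-1]
    best_k, best_ratio = min_clusters, -np.inf
    last_k = min(max_clusters, len(eigenvalues) - 1)
    for k in range(1, last_k + 1):
        # Stop once the eigenvalues become negligible
        if eigenvalues[k - 1] < stop_eigenvalue:
            break
        # Gap between the k-th and (k+1)-th largest eigenvalues
        ratio = eigenvalues[k - 1] / max(eigenvalues[k], 1e-12)
        if k >= min_clusters and ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Toy data: two perfectly separated "speakers", three embeddings each
two_speakers = np.array([[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 3)
print(estimate_num_clusters(cosine_affinity(two_speakers)))  # 2
```

With the toy data, the affinity eigenvalues are 4.5, 1.5 and then zeros, so the largest ratio sits after the second eigenvalue; feeding identical embeddings instead gives an estimate of 1.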

@abtExp
Author

abtExp commented Oct 29, 2019

Never mind. I used the t-SNE output as the input instead of the embedding vectors from the model, and it worked.

@abtExp abtExp closed this as completed Oct 29, 2019
@davide-scalzo

@wq2012 I am seeing something similar: if I feed the embeddings to the clusterer directly, I get a single cluster when max_clusters is set, or far too many clusters if it is not.

However, if I compute 2D projections using UMAP and pass those to the clusterer, I get reasonable results. Is this intended?

@wq2012
Owner

wq2012 commented Oct 12, 2020

@davodesign84 I don't think this is intended. I'm not sure how you obtained the embeddings, but it feels like they are not good enough, and projecting them down to fewer dimensions somehow compensated for it.

@davide-scalzo

That's what I thought, @wq2012. I just came across this issue and tried it, and the results seemed a bit better. On a side note, the embeddings are obtained through https://github.com/resemble-ai/Resemblyzer:

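import numpy as np
from resemblyzer import VoiceEncoder

encoder = VoiceEncoder()  # pretrained speaker encoder

def compute_embeddings(wav):
    # wav is expected to come from resemblyzer.preprocess_wav; rate=10 gives ~10 partial windows per second
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=10)
    utterance_embeds = np.array(cont_embeds)  # one 256-dim embedding per window
    return utterance_embeds, wav_splits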

By the way, I loved how clear your talks are. Three days ago I knew nothing about diarization, and now I have a decent idea of how it's supposed to work in theory; practice is still lacking, though :D

@wq2012
Owner

wq2012 commented Oct 12, 2020

@davodesign84 I'm not very familiar with third-party tools for learning embeddings. I know they exist, but I have never played with them myself.

Also note that this Python implementation has a problem: it uses k-means from scikit-learn, which is based on Euclidean distance. If the embeddings are trained with cosine distance, there is a mismatch.

I'm glad to learn that my talks have been helpful to you, and excited to see more people getting interested in speaker diarization.
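To illustrate the Euclidean/cosine mismatch mentioned in this comment: one common workaround, not necessarily what the library does, is to L2-normalize the embeddings before running Euclidean k-means. For unit-length vectors a and b, ||a - b||^2 = 2 - 2 cos(a, b), so pairwise Euclidean distances order points the same way cosine distance does. The NumPy sketch below checks that identity; the random data standing in for 36 embeddings of dimension 256 and the helper names `l2_normalize` and `cosine_similarity` are hypothetical.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one embedding) to unit Euclidean length."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical stand-in for 36 embeddings of dimension 256
rng = np.random.default_rng(0)
raw = rng.normal(size=(36, 256))
unit = l2_normalize(raw)

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
sq_euclidean = float(np.sum((unit[0] - unit[1]) ** 2))
from_cosine = 2.0 - 2.0 * cosine_similarity(raw[0], raw[1])
print(sq_euclidean, from_cosine)  # equal up to floating-point rounding
```

This only aligns point-to-point distances. The centroid of several unit vectors is generally shorter than unit length, so plain k-means on normalized inputs is still an approximation; renormalizing the centroids after each update (spherical k-means) is the closer match to cosine distance.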

@davide-scalzo

"It uses the k-means from scikit-learn, which is based on Euclidean distance. But if the embeddings are trained with cosine distance, there is a mismatch." This is something I missed! I'll keep investigating or find a more suitable pretrained model. Thank you!

@naymaraq

@abtExp Hi, I have the same "one-cluster" issue. Did you find the problem?


4 participants