Question about stats pooling in 2D convnet paper #2

Closed
Sreyan88 opened this issue Apr 13, 2022 · 2 comments

@Sreyan88

Hi there!

Great paper and great repo. My question is about the paper rather than the code. In the paper you mention:

The statistics pooling layer in speaker embeddings networks with 2D CNN architectures is a concatenation of the mean and std of each of the F × C frequency-channel pairs

I am a bit confused about this. In PyTorch terms, if my ResNet output is B x C x T x F, how exactly do I implement stats pooling?

Would it be:

x = x.permute(0, 2, 3, 1)  # B x T x F x C
x = x.reshape(B, T, F * C)  # B x T x (F*C)

followed by a stats pooling layer?

Thank you for the help!

@tstafylakis
Owner

Hi Sreyan,
Thanks for your interest. Stats pooling is very simple.
Your tensor is x = x.reshape(B, T, F * C).
Do something like:
x = torch.cat((torch.mean(x, dim=1), torch.std(x, dim=1)), dim=1)
so that each example in your minibatch is a vector of dim = 2 * F * C.
Is that clear?
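
Putting the two steps from this thread together, here is a minimal sketch of the whole operation, assuming the network output has shape B x C x T x F as in the question. The function name `stats_pooling` and the example sizes are made up for illustration and are not taken from the repo. Note that `torch.std` uses the unbiased estimator by default, so it returns NaN when T = 1.

```python
import torch


def stats_pooling(x: torch.Tensor) -> torch.Tensor:
    """Pool a (B, C, T, F) tensor over time into a (B, 2 * F * C) tensor."""
    B, C, T, F = x.shape
    # Move time to dim 1 and flatten the frequency-channel pairs.
    x = x.permute(0, 2, 3, 1).reshape(B, T, F * C)  # B x T x (F*C)
    mean = x.mean(dim=1)  # B x (F*C)
    std = x.std(dim=1)    # B x (F*C), unbiased by default
    # Concatenate along the feature dimension, not the batch dimension.
    return torch.cat((mean, std), dim=1)  # B x (2*F*C)


# Hypothetical sizes: B=4, C=8, T=50, F=10
x = torch.randn(4, 8, 50, 10)
pooled = stats_pooling(x)
print(pooled.shape)  # torch.Size([4, 160])
```

Concatenating with `dim=1` keeps each example's mean and std in the same row, so no reshape is needed afterwards.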

@Sreyan88
Author

Hi @tstafylakis,

Thank you so much for your reply. Yes, that is clear. I am currently trying to implement an attentive stats pooling layer. I will update you here once I have a working solution. Thank you!
