Great paper and great repo. My question relates more to the paper itself. In the paper you mention:
The statistics pooling layer in speaker embeddings networks with 2D CNN architectures is a concatenation of the mean and std of each of the F × C frequency-channel pairs
I am a bit confused here. In PyTorch terms, if my ResNet output is B x C x T x F, how exactly do I implement stats pooling?
would it be:
x = x.permute(0, 2, 3, 1)   # B x T x F x C
x = x.reshape(B, T, F * C)  # B x T x (F*C)
followed by a stats pooling layer?
Thank You for the help!
Hi Sreyan,
Thanks for your interest. Stats pooling is very simple.
Your tensor after x = x.reshape(B, T, F * C) has shape B x T x (F*C).
Do something like:
x = torch.cat((torch.mean(x, dim=1), torch.std(x, dim=1)), dim=1)
so that each example in your minibatch is a vector of dim = 2*F*C.
Is that clear?
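Putting the steps above together, a minimal end-to-end sketch (the shapes B, C, T, F below are illustrative values, not from the repo):

```python
import torch

# Illustrative shapes: batch, channels, time frames, frequency bins.
B, C, T, F = 4, 8, 50, 10
x = torch.randn(B, C, T, F)  # stand-in for the ResNet output

x = x.permute(0, 2, 3, 1)    # B x T x F x C
x = x.reshape(B, T, F * C)   # B x T x (F*C)

# Statistics pooling: concatenate mean and std over the time axis.
pooled = torch.cat((x.mean(dim=1), x.std(dim=1)), dim=1)  # B x (2*F*C)
print(pooled.shape)  # torch.Size([4, 160])
```

Note that dim=1 in torch.cat is what keeps the batch axis intact; concatenating along dim 0 would stack the mean and std of different examples on top of each other.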
Thank You so much for your reply. Yes, that is clear. I am currently trying to implement an attentive stats pooling layer. Will update you here once I am able to find a solution. Thank You!
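For reference, one common formulation of attentive statistics pooling replaces the plain mean/std with attention-weighted ones. A sketch under assumed dimensions (not the repo's implementation; the hidden size and input dim are illustrative):

```python
import torch
import torch.nn as nn

class AttentiveStatsPooling(nn.Module):
    """Sketch of attentive statistics pooling: a small attention
    network scores each frame, and softmax-normalized weights give a
    weighted mean and std over time."""

    def __init__(self, in_dim, hidden_dim=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):  # x: B x T x D
        w = torch.softmax(self.attention(x), dim=1)  # B x T x 1
        mean = torch.sum(w * x, dim=1)               # B x D
        var = torch.sum(w * x * x, dim=1) - mean ** 2
        std = torch.sqrt(var.clamp(min=1e-9))        # numerical safety
        return torch.cat((mean, std), dim=1)         # B x 2D

pool = AttentiveStatsPooling(in_dim=80)
out = pool(torch.randn(4, 50, 80))
print(out.shape)  # torch.Size([4, 160])
```

The weighted-variance identity E[x²] − E[x]² avoids materializing the centered tensor, and the clamp guards against tiny negative values from floating-point error.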