About the hierarchical softmax #10

quanpn90 · 2016-01-05T01:11:15Z

Hi,

Thanks for the great model, and happy new year.

I would like to ask about your hierarchical softmax. Did you split the words equally across the clusters on purpose, or only to make the implementation easier? I find it hard to understand how you distribute the words to clusters. Did you use a normal distribution? I tried grouping words by their unigram frequencies (as in Mikolov's model), but the results were very bad.

Also, I guess you have tried the fbnn HSM as well. I tried applying it on top of the network (after the final dropout), but it gives a very large loss. Would it be possible to improve your HSM so that it works better with unevenly sized clusters (some with only a few words, others with many)?

Thank you,

yoonkim · 2016-01-05T07:41:26Z

It's mostly to make the implementation easier, and I found it to work surprisingly well.

I did also try fbnn but couldn't get it to work. I am not 100% sure why, but I think there is an issue with precision: https://groups.google.com/forum/#!searchin/torch7/HSM/torch7/Hq_KL4k69dM/D3lf0r1OAQAJ
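For readers following the thread, here is a minimal sketch of a two-level hierarchical softmax with equal-size clusters, written in plain Python rather than Torch. It is not the repository's code. It assumes words are dealt into clusters in index order and that the number of clusters defaults to roughly the square root of the vocabulary size; the thread does not state either detail. All names (`assign_equal_clusters`, `hsm_log_prob`, `W_cluster`, `W_word`) are hypothetical.

```python
import math
import random

def assign_equal_clusters(vocab_size, n_clusters=None):
    """Map each word id to (cluster id, position inside that cluster).
    Assumption: clusters are filled in index order, all of the same size
    except possibly the last one."""
    if n_clusters is None:
        n_clusters = math.ceil(math.sqrt(vocab_size))
    cluster_size = math.ceil(vocab_size / n_clusters)
    cluster_of = [w // cluster_size for w in range(vocab_size)]
    pos_in_cluster = [w % cluster_size for w in range(vocab_size)]
    return cluster_of, pos_in_cluster, cluster_size

def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hsm_log_prob(h, word, W_cluster, W_word, cluster_of, pos_in_cluster):
    """log p(word | h) = log p(cluster | h) + log p(word | cluster, h)."""
    c = cluster_of[word]
    log_p_cluster = log_softmax([dot(row, h) for row in W_cluster])[c]
    # Only the words of the chosen cluster are scored, not the full vocabulary.
    within = log_softmax([dot(row, h) for row in W_word[c]])
    return log_p_cluster + within[pos_in_cluster[word]]

# Toy usage with random weights (illustrative values only).
random.seed(0)
vocab_size, hidden = 10, 4
cluster_of, pos_in_cluster, cluster_size = assign_equal_clusters(vocab_size)
n_clusters = max(cluster_of) + 1
sizes = [cluster_of.count(c) for c in range(n_clusters)]

def rand_vec():
    return [random.gauss(0, 1) for _ in range(hidden)]

W_cluster = [rand_vec() for _ in range(n_clusters)]
W_word = [[rand_vec() for _ in range(sizes[c])] for c in range(n_clusters)]
h = rand_vec()

# The word probabilities should sum to 1 over the whole vocabulary.
total = sum(
    math.exp(hsm_log_prob(h, w, W_cluster, W_word, cluster_of, pos_in_cluster))
    for w in range(vocab_size)
)
```

With equal-size clusters, each prediction scores about `n_clusters + cluster_size` rows instead of `vocab_size`, which is where the speedup comes from. A frequency-based split changes only `cluster_of` and `pos_in_cluster`; the factorization stays the same.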

quanpn90 closed this as completed Jan 16, 2016
