I like your paper, but I find it confusing how to handle multiple embeddings of the same word/token. Is there any chance that different embeddings of the same word are mapped to different clusters, with all of them quite close to their cluster centers in the spherical space? How do you deal with that?
Thanks for the question. You are right that each word can have multiple contextualized embeddings, and these may be mapped to different clusters during the clustering step of our algorithm. However, when deriving the final results, we take the average of a word's latent contextualized embeddings as its single (context-free) representation, which is then used to compute the topic-word distribution.
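A minimal sketch of that averaging step is below, assuming the latent embeddings have already been collected per token occurrence. All names and toy vectors here are purely illustrative, not from the released code, and the cosine-similarity scoring at the end is an assumption based on the spherical latent space rather than a statement of the exact topic-word formula:

```python
import numpy as np
from collections import defaultdict

def average_word_embeddings(occurrences):
    """occurrences: iterable of (word, vector) pairs, one entry per
    token occurrence in the corpus. Returns {word: mean vector}, i.e.
    one context-free representation per word."""
    acc = defaultdict(list)
    for word, vec in occurrences:
        acc[word].append(vec)
    return {w: np.mean(vecs, axis=0) for w, vecs in acc.items()}

# Toy example: two occurrences of "bank" may land in different clusters,
# but they collapse to a single averaged vector here.
occs = [
    ("bank", np.array([0.9, 0.1])),
    ("bank", np.array([0.1, 0.9])),
    ("river", np.array([0.0, 1.0])),
]
word_vecs = average_word_embeddings(occs)

# Topic-word scores can then come from the similarity between each
# averaged word vector and the cluster (topic) centers; in a spherical
# space, cosine similarity is the natural choice.
centers = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical topic centers
c_unit = centers / np.linalg.norm(centers, axis=1, keepdims=True)
for w, v in word_vecs.items():
    v_unit = v / np.linalg.norm(v)
    print(w, c_unit @ v_unit)  # per-topic similarity scores for word w
```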
I hope this helps. Please let me know if anything remains unclear.
So the averaging step is like in the paper "Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!"? Did you reweight the averaged token embeddings? Also, how do you deal with subwords?