Embedding Projector: UMAP and TSNE projections broken for embeddings that are not normalized #6271

Open
alicialics opened this issue Mar 25, 2023 · 5 comments · May be fixed by #6293

Comments

@alicialics
Contributor

alicialics commented Mar 25, 2023

Chrome 111.0.5563.64

Issue description

When the embeddings are not normalized and not sphereized, the UMAP and t-SNE projections are incorrect or simply do not load.
See #5547 for a previous bug report.

The reason is that the KNN step expects normalized vectors for cosine distance (cosDistNorm) rather than arbitrary vectors.
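To illustrate why this matters, here is a minimal sketch in plain Python. It assumes that cosDistNorm computes `1 - dot(a, b)`, which equals the cosine distance only when both inputs have unit length; `cos_dist_norm` and `cos_dist` are hypothetical stand-ins written for this example, not the Projector's actual code.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cos_dist_norm(a, b):
    # Assumed behaviour of cosDistNorm: skips the division by the norms,
    # so it is only a valid cosine distance for unit-length inputs.
    return 1.0 - dot(a, b)


def cos_dist(a, b):
    # Cosine distance for arbitrary non-zero vectors.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Two vectors pointing in the same direction but with different lengths.
a = [3.0, 4.0]
b = [6.0, 8.0]

print(cos_dist(a, b))       # 0.0: identical direction
print(cos_dist_norm(a, b))  # -49.0: a negative "distance" on unnormalized input
```

With unnormalized input the shortcut can return negative or very large values, which may plausibly break a nearest-neighbour search that expects distances in the range [0, 2].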

Alternative repro:

  • Build and launch projector (must be from master, not https://projector.tensorflow.org)
  • Uncheck "Sphereize data" on the default Word2Vec 10k dataset
  • Switch projection from "PCA" to either t-SNE or UMAP
  • Observe that the UI breaks, with the "Initializing t-SNE..."/"Initialize UMAP..." modal loading forever

Related to: #2421

@alicialics alicialics changed the title UMAP and TSNE projections broken for embeddings that are not normalized Embedding Projector: UMAP and TSNE projections broken for embeddings that are not normalized Mar 25, 2023
@dmfolgado

dmfolgado commented May 20, 2023

I'm currently facing this issue. Is there any workaround?
Can I pass a normalized embedding to the checkpoint? If so, what would be the correct normalization? I have tried some approaches, but I could not get them to work.

EDIT: This issue still persists in v2.13.0. I'm using Windows 11 and Google Chrome.

@alicialics
Contributor Author

@dmfolgado you can either use the built-in "Sphereize data" option or normalize the embedding yourself. I think all you need is to make sure each one is a unit vector in Euclidean space. Sphereizing does this and, in addition, moves the centroid to the origin, so it might even work better.
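As a rough sketch of the two options described in this comment, the plain-Python helpers below scale every embedding to unit Euclidean length, and optionally subtract the centroid first. The names `unit_normalize` and `sphereize` are hypothetical and chosen for this example; this is an assumption about what the workaround amounts to, not the Projector's own implementation.

```python
import math


def unit_normalize(vectors):
    """Scale each vector to unit Euclidean (L2) length."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        # Leave all-zero vectors unchanged to avoid dividing by zero.
        out.append([x / norm for x in v] if norm > 0 else list(v))
    return out


def sphereize(vectors):
    """Subtract the centroid from every vector, then scale each to unit length."""
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centered = [[v[i] - centroid[i] for i in range(dim)] for v in vectors]
    return unit_normalize(centered)


embeddings = [[1.0, 0.0], [3.0, 0.0], [2.0, 2.0]]
for row in sphereize(embeddings):
    print(row, math.sqrt(sum(x * x for x in row)))  # each norm is 1 (up to rounding)
```

The output of either helper could be written out as the embedding data before loading it into the Projector, so that every row already has unit length.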

@dmfolgado

Thank you for the suggestion. I had already tried that, but I continue to get different projections between the online Projector and the offline one.

Local version. Clustering the CBF time series dataset (X_test):
[image: offline]

Online version. Clustering the CBF time series dataset (X_test):
[image: online]

I attach the data and metadata for reproducibility.
data.zip

@dmfolgado

I just found what was causing the discrepancy. The data I was using for the online projection was standard-scaled. It seems that using standard-scaled data and sphereizing yields the same results.
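For readers unfamiliar with the term, the sketch below shows one common reading of "standard-scaled": each feature (column) is shifted to zero mean and divided by its population standard deviation, in the style of scikit-learn's StandardScaler. That reading is an assumption, since the comment does not say which tool was used, and `standard_scale` is a hypothetical helper written for this example.

```python
import statistics


def standard_scale(rows):
    """Z-score each column: subtract its mean, divide by its population std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        # Constant columns (std == 0) are mapped to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


data = [[1.0, 10.0], [3.0, 10.0]]
print(standard_scale(data))  # [[-1.0, 0.0], [1.0, 0.0]]
```

Because standard scaling centers every feature, applying the same preprocessing to the data on both the online and the local side is needed for a like-for-like comparison of the projections.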

@delale

delale commented Sep 11, 2023

Is there any update on the issue of getting stuck on the "Initialize UMAP..." message? I am currently facing the exact same problem, which I cannot reproduce with the online Embedding Projector using the same data. Unless "Sphereize data" is checked before clicking on UMAP, the TensorBoard Embedding Projector remains stuck on "Initialising UMAP...".

3 participants