Embedding projector only loads first 100,000 vectors #773

The embedding projector only loads the first 100,000 vectors. In many real-world applications, embedding dictionaries are well over 1 million entries. We need some way to display vectors from larger sets, or at least a way to configure the upper limit.

Comments
It appears that this limit is hardcoded here: …
Everything in the projector is done on the client side. There's a limit to how much the browser can handle. I'd be interested in hearing whether or not things worked out if you changed the limit by hand.
I tried to change this limit, but the client still said it was showing the first 100k, which made me wonder if …
If we can't make the client handle more than 100k, what would be really useful is telling the server to sample the data instead of returning the first 100k. Think of data sorted by popularity: always seeing the first 100k out of 1 million is biased towards the more popular items. Ideally the server would return a stratified sample, say a random 10k from each 100k, which would give a good representative sample of the full million (see the sketch below).
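A minimal sketch of that stratified-sampling idea, assuming the vectors arrive as a NumPy array already sorted by popularity; the function name and parameters are illustrative, and nothing like this exists in TensorBoard today:

```python
import numpy as np

def stratified_sample(vectors, stratum_size=100_000, per_stratum=10_000, seed=0):
    """Pick a random subset from each contiguous block of `stratum_size`
    rows, so a popularity-sorted table contributes points from every
    band rather than just the head."""
    rng = np.random.default_rng(seed)
    picks = []
    for start in range(0, len(vectors), stratum_size):
        stop = min(start + stratum_size, len(vectors))
        k = min(per_stratum, stop - start)
        picks.append(rng.choice(np.arange(start, stop), size=k, replace=False))
    idx = np.sort(np.concatenate(picks))
    return vectors[idx], idx

# e.g. for 1M rows this returns ~100k points spread evenly across the table
```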
Upvote for sampling instead of returning the first 100k.
We'd welcome a contribution to implement server-side sampling if someone wants to take this on. |
Hi @nfelt, I would like to take this. Can you point me to the files corresponding to the embedding projector? Also, any suggestions/ideas?
@vitalyli When we run the projector on more than 100,000 vectors plus metadata, it can limit and sample the vectors, but loading the metadata fails even for the points that do get loaded.
Hello. I'm able to increase it in the …
I also have this issue; has anyone found an easy fix?
Hey guys, any luck on this topic? In my case, it only samples 120 data points. One tip I could offer to speed things up would be a "PCA + t-SNE" option: it could drastically reduce embedding sizes and lighten the load on browser RAM (sketched below).
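For anyone who wants to try the "PCA + t-SNE" idea offline before loading results into the projector, a minimal sketch with scikit-learn; this is not a projector feature, and the function name and the choice of 50 PCA dimensions are illustrative:

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def pca_then_tsne(embeddings, pca_dims=50, seed=0):
    # PCA first: t-SNE's pairwise-distance computations then run on a
    # 50-dim matrix instead of the full embedding width.
    reduced = PCA(n_components=pca_dims, random_state=seed).fit_transform(embeddings)
    return TSNE(n_components=2, random_state=seed).fit_transform(reduced)
```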
You are right ... it looks like we have to modify something in …, although 10k is defined in …
@RSKothari @nlp4whp It might even make a lot of sense to fork the embedding projector component and remove the in-browser interactive dimensionality reduction (replacing it with whatever dimensionality reduction technique a data scientist wants to apply ahead of time). The embedding projector has a lot of value on its own as a high-performance 3D visualization tool with convenient access to metadata. Unless people know a better alternative for point clouds with metadata? A sketch of precomputing the reduction follows.
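One way to do the reduction ahead of time, assuming the embeddings sit in a NumPy file (the file names here are hypothetical); the resulting TSV can then be loaded into the standalone projector:

```python
import numpy as np
from sklearn.decomposition import PCA

# Reduce ahead of time so the browser only has to render points
# instead of running dimensionality reduction itself.
embeddings = np.load("embeddings.npy")               # hypothetical input file
coords = PCA(n_components=3).fit_transform(embeddings)
np.savetxt("vectors.tsv", coords, delimiter="\t")    # load this into the projector
```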
You just need to change `qO=1e5` to `qO=1e6` in `/tensorboard/plugins/projector/tf_projector_plugin/projector_binary.js` (a scripted version of that edit follows).
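A rough sketch of scripting that edit against an installed TensorBoard; note that `qO` is minifier output, so the pattern is an assumption that will vary between versions:

```python
from pathlib import Path
import tensorboard

# Locate the minified projector bundle inside the installed package.
bundle = (Path(tensorboard.__file__).parent / "plugins" / "projector"
          / "tf_projector_plugin" / "projector_binary.js")
text = bundle.read_text()
if "qO=1e5" in text:
    bundle.write_text(text.replace("qO=1e5", "qO=1e6"))
    print(f"patched {bundle}")
else:
    print("pattern not found; the minified name likely differs in this version")
```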
Is it possible to change any of these parameters in a Colab environment?
Given that this currently requires modifications to the source code, there is no way to change this behavior with the supported extension from Colab. There might be ways to use a custom version of TensorBoard with a "local runtime" in Colab, but I'm not knowledgeable enough about Colab to provide any guidance in that regard. If a locally modified version of TensorBoard would be sufficient (i.e. just running a standalone TB, not in Colab), you can take a look at our DEVELOPMENT guide for some pointers on how to run a local instance.

With respect to better supporting this as a feature in a future release, I'm afraid it's unlikely we'll prioritize this, especially because there hasn't been any active development in this area/plugin, there are no people left on the team who are familiar with this part of the code, and it's probably not an easy thing to solve in a generic way (e.g. without affecting performance on some browsers/machines, and/or without some UI support to let users configure these visualization parameters). If anybody is interested in contributing, they can get in touch with us.