
Embedding projector only loads first 100,000 vectors #773

Open

vitalyli opened this issue Nov 26, 2017 · 17 comments
Labels
plugin:projector stat:contributions welcome theme:performance Performance, scalability, large data sizes, slowness, etc.

Comments

@vitalyli

vitalyli commented Nov 26, 2017

Embedding projector only loads the first 100,000 vectors. In many real-world applications, embedding dictionaries are well over 1 million entries. We need some way to display vectors from larger sets, or at least a way to configure the upper limit.

@vitalyli vitalyli changed the title Embedding projector only loads first at 100,000 Embedding projector only loads first 100,000 Nov 26, 2017
@vitalyli vitalyli changed the title Embedding projector only loads first 100,000 Embedding projector only loads first 100,000 vectors Nov 26, 2017
@vitalyli
Author

It appears that this limit is hardcoded here:
.//tensorboard/plugins/projector/vz_projector/data-provider-server.ts
export const LIMIT_NUM_POINTS = 100000;
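Until that constant is configurable, one workaround (a sketch only, not part of TensorBoard; `downsample_tsv` is a hypothetical helper name) is to downsample the vectors TSV below the 100k cap before handing it to the projector. Reservoir sampling keeps memory bounded even for multi-million-row files:

```python
import random

def downsample_tsv(in_path, out_path, limit=100_000, seed=0):
    """Reservoir-sample at most `limit` rows from a (possibly huge) TSV,
    giving every row an equal chance of surviving."""
    rng = random.Random(seed)
    reservoir = []
    with open(in_path) as f:
        for i, line in enumerate(f):
            if len(reservoir) < limit:
                reservoir.append(line)
            else:
                # Replace a kept row with decreasing probability limit/(i+1)
                j = rng.randrange(i + 1)
                if j < limit:
                    reservoir[j] = line
    with open(out_path, "w") as f:
        f.writelines(reservoir)
```

If vectors and metadata live in separate files, the same sampled indices would have to be applied to both to keep them aligned.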

@jart
Contributor

jart commented Nov 28, 2017

Everything in the projector is done on the client side. There's a limit to how much the browser can handle. I'd be interested in hearing about whether or not things worked out if you changed the limit by hand.

@vitalyli
Author

I tried changing this limit, but the client still said it was showing the first 100k, which made me wonder whether the server dictates that limit, or whether it is cached somewhere in the browser. It would be good to be able to send that limit as a parameter to the server.
Often the task is searching for a vector by label and looking at its closest vectors; if the projector simply takes the first 100k, what can be explored is limited given a 1-million-plus embedding file. Maybe the distance computation can be pushed to the server, removing the need for the client to do the filtering altogether.
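The server-side distance idea could look like the following sketch (nothing like this exists in the projector today; the function name is hypothetical): the server holds the full matrix and returns only the k nearest labels for a query, so the client never needs all the vectors.

```python
import numpy as np

def nearest_neighbors(vectors, labels, query_label, k=5):
    """Cosine nearest-neighbor lookup over the full embedding matrix."""
    idx = labels.index(query_label)
    # Normalize rows so a dot product equals cosine similarity
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    order = np.argsort(-sims)  # most similar first
    return [(labels[i], float(sims[i])) for i in order if i != idx][:k]
```

For truly large dictionaries an approximate index would replace the brute-force dot product, but the request/response shape stays the same.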

@vitalyli
Author

vitalyli commented Dec 18, 2017

If we can't make the client handle more than 100k, what would be really useful is telling the server to sample the data instead of returning the first 100k. Think of data sorted by popularity: always seeing the first 100k out of 1 million is biased towards the more popular items. Ideally the server would return a stratified sample, a random 10k from each 100k, giving a representative sample of the full 1 million.
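The stratified scheme described above is a few lines of index arithmetic (a sketch; `stratified_sample` is a hypothetical helper, not projector code):

```python
import random

def stratified_sample(n_total, stratum=100_000, per_stratum=10_000, seed=0):
    """Pick `per_stratum` random row indices from each consecutive block of
    `stratum` rows, so every popularity band is represented."""
    rng = random.Random(seed)
    picks = []
    for start in range(0, n_total, stratum):
        block = range(start, min(start + stratum, n_total))
        picks.extend(rng.sample(block, min(per_stratum, len(block))))
    return sorted(picks)
```

The server would then stream only the rows at these indices (and the matching metadata rows) to the client.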

@Seanspt

Seanspt commented May 28, 2018

Upvote for sampling instead of returning the first 100k.
Also, it would be great if a group of wanted IDs could be passed in.

@nfelt
Contributor

nfelt commented May 29, 2018

We'd welcome a contribution to implement server-side sampling if someone wants to take this on.

@kapilkd13

Hi @nfelt, I would like to take this. Can you point me to the files corresponding to the embedding projector? Also, any suggestions/ideas?

@rahulkrishnan98

@vitalyli once we run the projector on 100,000+ vectors and metadata, it can limit and sample the vectors, but loading the metadata fails even for the points that were loaded.

@hvout

hvout commented Jul 24, 2019

Hello.
Sorry for bringing this up, but the folder .//tensorboard/plugins/projector/vz_projector/ does not exist in my installation (installed with pip inside a miniconda venv with Python 3.6, latest tensorboard version). Does anyone know where I can find that folder to increase the limit?

@hvout

hvout commented Jul 24, 2019

I'm able to increase it in the projector_plugin.py file under tensorboard/plugins/projector, and it does work. But t-SNE and PCA keep sampling data for "faster results". I believe those limits are set in data.ts, but when installed with pip the vz_projector folder does not exist.

@bileschi bileschi added the theme:performance Performance, scalability, large data sizes, slowness, etc. label Dec 20, 2019
@alexdevmotion

I also have this issue, has anyone found an easy fix?

@RSKothari

Hey guys, any luck on this topic? In my case, it only samples 120 data points. A tip I could offer to speed things up: add a "PCA + t-SNE" option. It could drastically reduce embedding sizes and the load on browser RAM.
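"PCA + t-SNE" here means projecting the embeddings onto their top principal components first, then running t-SNE on the much smaller matrix. A minimal PCA step via SVD might look like this (a sketch with plain NumPy; not projector code):

```python
import numpy as np

def pca_reduce(x, dim=50):
    """Project rows of x onto the top `dim` principal components, so a
    subsequent t-SNE runs on a dim-column matrix instead of the full width."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, sorted by singular value (descending)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T
```

Feeding the reduced matrix to t-SNE cuts both memory and pairwise-distance cost, which is exactly the RAM relief suggested above.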

@nlp4whp

nlp4whp commented Sep 4, 2020

> I'm able to increase it in the projector_plugin.py file under tensorboard/plugins/projector and it does work. But T-SNE and PCA keep sampling data for "faster results" - I believe these limits are set in data.ts but when installed with pip the vz_projector folder does not exist

You are right... it looks like we have to modify something in data.ts for the PCA and t-SNE sampling.

Although the limit is defined in data-provider-server.ts as export const LIMIT_NUM_POINTS = 100000;, it is applied on the back end in projector_plugin.py, where the final tensor is returned
(see _serve_metadata(self, request) or _serve_tensor(self, request)).

@GeorgePearse

@RSKothari @nlp4whp It might even make sense to fork the embedding projector component and remove the in-browser interactive dimensionality reduction (to be replaced with whatever dimensionality-reduction technique a data scientist wants to run ahead of time). The embedding projector has a lot of value on its own as a high-performance 3D visualization tool with convenient access to metadata. Unless people know a better alternative for point clouds with metadata?
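In that "reduce ahead of time" workflow, the pre-reduced vectors just need to land in two tab-separated files that the standalone projector's upload dialog accepts: one row per vector, and one metadata line per point (a single metadata column takes no header row). A sketch (the helper name is hypothetical):

```python
def write_projector_tsvs(vectors, labels, vec_path, meta_path):
    """Write a vectors TSV (one tab-separated row per point) and a
    single-column metadata TSV (one label per line)."""
    with open(vec_path, "w") as f:
        for row in vectors:
            f.write("\t".join(str(v) for v in row) + "\n")
    with open(meta_path, "w") as f:
        for label in labels:
            f.write(str(label) + "\n")
```

The resulting pair can then be loaded into the viewer for pure 3D exploration, with no in-browser t-SNE/PCA needed.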

@wizz92

wizz92 commented Apr 1, 2023

You just need to change qO=1e5 to qO=1e6 in /tensorboard/plugins/projector/tf_projector_plugin/projector_binary.js.
It worked fine for me.

@saikot-paul

Is it possible to change any of these parameters in a colab environment?

@arcra
Member

arcra commented Oct 24, 2023

Given that this currently requires modifications to the source code, there is no way to change this behavior with the supported extension from colab. There might be ways to use a custom version of tensorboard with a "local runtime" in colab, but I'm not knowledgeable enough about colab to provide guidance in that regard.

If a locally modified version of tensorboard would be sufficient (i.e. just running a standalone TB, not in colab), you can take a look at our DEVELOPMENT guide for some pointers on how to run a local instance.

With respect to better supporting this as a feature in a future release, I'm afraid it's unlikely we'll prioritize this, especially because there hasn't been any active development in this area/plugin, there are no people left on the team who are familiar with this part of the code, and it's probably not an easy thing to solve in a generic way (e.g. without affecting performance on some browsers/machines, and/or without some UI support to allow users to configure these visualization parameters, etc.). If anybody is interested in contributing, they can get in touch with us.
