
Search in Embedding Projector using Japanese or Hindi text causes Cannot read property 'toString' of undefined #21891

Open
ToonTalk opened this issue Aug 27, 2018 · 3 comments
Labels
comp:tensorboard Tensorboard related issues

Comments

@ToonTalk



System information

Visiting http://projector.tensorflow.org/?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json using Chrome 68


Describe the problem

After visiting http://projector.tensorflow.org/?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json in Chrome 68 and then entering any Hindi text in the Search field, the console shows:

Uncaught TypeError: Cannot read property 'toString' of undefined
at b (?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:60928)
at ?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:61345
at Array.forEach (<anonymous>)
at a.query (?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:61344)
at ?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:66646
at ?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:66171
at Array.forEach (<anonymous>)
at HTMLElement.b.notifyInputChanged (?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:66170)
at HTMLElement.b.onTextChanged (?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:66185)
at HTMLElement.<anonymous> (?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json:formatted:66151)


@asimshankar added the comp:tensorboard (Tensorboard related issues) label on Sep 6, 2018
@asimshankar assigned jart and nfelt and unassigned asimshankar on Sep 6, 2018
@nfelt (Contributor) commented Sep 12, 2018

@ToonTalk, this seems to happen for any search input text at all (e.g. "x"), not just Japanese or Hindi input. Does that match what you're seeing?

I ran this through the debugger, and it seems to come from this line, triggered when p.index is 19999:
https://github.com/tensorflow/tensorboard/blob/1.10/tensorboard/plugins/projector/vz_projector/util.ts#L157

On loading the page, I noticed a message shows up saying: "Number of tensors (20000) do not match the number of lines in metadata (19999)."

So I'm guessing the issue is related to that discrepancy: the search predicate is being evaluated on the 20,000th tensor entry, which has no corresponding metadata, so p.metadata[fieldName] returns undefined.

@dsmilkov @nsthorat, does this diagnosis seem right? Do you know where the right place would be to add the appropriate guarding logic?
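A minimal TypeScript sketch of the failure mode diagnosed above. The `DataPoint` shape and the `makeUnguardedPredicate`/`makeGuardedPredicate` helpers are hypothetical simplifications, not the projector's actual code in util.ts. With three tensor rows but only two metadata lines, the third point has no value for the searched field, so the unguarded predicate calls `.toString()` on `undefined` and throws a TypeError; the guarded version treats a missing value as a non-match.

```typescript
// Hypothetical, simplified point type; the projector's real types differ.
interface DataPoint {
  index: number;
  metadata: { [fieldName: string]: string | number | undefined };
}

// Unguarded: throws a TypeError when the point has no value for fieldName.
function makeUnguardedPredicate(query: string, fieldName: string) {
  const lower = query.toLowerCase();
  return (p: DataPoint): boolean =>
    p.metadata[fieldName]!.toString().toLowerCase().includes(lower);
}

// Guarded: a point with no metadata value for fieldName simply doesn't match.
function makeGuardedPredicate(query: string, fieldName: string) {
  const lower = query.toLowerCase();
  return (p: DataPoint): boolean => {
    const value = p.metadata[fieldName];
    if (value === undefined) {
      return false; // no metadata row for this point: treat as a non-match
    }
    return value.toString().toLowerCase().includes(lower);
  };
}

// Simulate 3 tensor rows but only 2 metadata lines.
const labels = ["apple", "box"];
const points: DataPoint[] = [0, 1, 2].map(
  (i): DataPoint => ({
    index: i,
    metadata: i < labels.length ? { label: labels[i] } : {},
  }),
);

const matches = points
  .filter(makeGuardedPredicate("x", "label"))
  .map((p) => p.index);
console.log(JSON.stringify(matches)); // [1]

let threw = false;
try {
  points.filter(makeUnguardedPredicate("x", "label"));
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```

Treating missing metadata as a non-match keeps search usable; whether the projector should instead refuse to load mismatched tensor and metadata files is a separate design choice.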

@ToonTalk (Author)

Thanks for looking into this. Yes, just "x" reproduces the problem.

However, when I checked the metadataPath in https://ecraft2learn.github.io/ai/word-embeddings/hi/projector.json (which is https://ecraft2learn.github.io/ai/word-embeddings/hi/projector-labels.tsv), it does seem to have 20000 lines (for example, opening it in Chrome dev tools shows line 20001 as the final blank line). The same is true for the tensorPath.
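A file that ends with a newline is shown by most editors with one extra, empty last line, which is consistent with dev tools reporting 20001 for a 20000-row file. A minimal sketch, using a hypothetical `countDataLines` helper (not part of the projector), that makes the distinction explicit:

```typescript
// Hypothetical helper: counts data rows in a newline-delimited file,
// ignoring the empty string produced by a single trailing newline.
function countDataLines(text: string): number {
  const pieces = text.split("\n");
  if (pieces.length > 0 && pieces[pieces.length - 1] === "") {
    pieces.pop();
  }
  return pieces.length;
}

// Three labels followed by a trailing newline.
const tsv = "alpha\nbeta\ngamma\n";
console.log(tsv.split("\n").length); // 4 (the extra, empty "last line")
console.log(countDataLines(tsv)); // 3
```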

I also see the warning about 19999 when I load it, but I don't remember seeing it when I posted this issue.

I just found that line 89 displays as � and seems to be U+FFFD (the Unicode replacement character), and line 2147 displays as a small red dot in dev tools.

I made a new version with a better word filter (http://projector.tensorflow.org/?config=https://ecraft2learn.github.io/ai/word-embeddings/hi/projector_v2.json), and the problem went away.

By the way, the data came from https://fasttext.cc/docs/en/crawl-vectors.html

Perhaps the projector needs to deal better with unexpected entries such as �?

@ToonTalk (Author)

The line I mentioned that displays as a small red dot turns out to be a zero-width non-joiner: https://en.wikipedia.org/wiki/Zero-width_non-joiner

I finally got the search button to work with 15 languages, but the zero-width non-joiner occurred 3 times in the Sinhalese version, and it caused the same problem as in the original post until I eliminated it.
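The improved word filter itself is not shown in this thread. A minimal sketch of one, assuming "bad" entries are those containing U+FFFD or consisting only of zero-width characters such as U+200C (zero-width non-joiner); the `isCleanLabel` and `filterVocabulary` names and that exact rule are hypothetical. ZWNJ and ZWJ can appear legitimately inside words in some scripts, so the sketch only rejects labels where nothing visible remains. Words and vectors are filtered together so the metadata and tensor files keep the same number of rows; dropping only the label would recreate the count mismatch described earlier.

```typescript
// Zero-width space, zero-width non-joiner, zero-width joiner, and BOM.
const ZERO_WIDTH = /[\u200B\u200C\u200D\uFEFF]/g;
// U+FFFD usually signals a decoding error in the source data.
const REPLACEMENT_CHAR = "\uFFFD";

// Returns true if a label is safe to keep (hypothetical rule, see above).
function isCleanLabel(label: string): boolean {
  if (label.includes(REPLACEMENT_CHAR)) {
    return false;
  }
  // Reject labels that are empty once invisible characters are removed.
  return label.replace(ZERO_WIDTH, "").trim().length > 0;
}

// Filters labels and their vectors together so row counts stay aligned.
function filterVocabulary(
  words: string[],
  vectors: number[][],
): { words: string[]; vectors: number[][] } {
  if (words.length !== vectors.length) {
    throw new Error(
      `words (${words.length}) and vectors (${vectors.length}) differ in length`,
    );
  }
  const keptWords: string[] = [];
  const keptVectors: number[][] = [];
  words.forEach((w, i) => {
    if (isCleanLabel(w)) {
      keptWords.push(w);
      keptVectors.push(vectors[i]);
    }
  });
  return { words: keptWords, vectors: keptVectors };
}

const sampleWords = ["नमस्ते", "\uFFFD", "\u200C", "पानी"];
const sampleVectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]];
const cleaned = filterVocabulary(sampleWords, sampleVectors);
console.log(JSON.stringify(cleaned.words)); // ["नमस्ते","पानी"]
console.log(cleaned.vectors.length); // 2
```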


4 participants