Loading embeddings into the graph fails with: libprotobuf ERROR google/protobuf/io/zero_copy_stream_impl_lite.cc 164 Cannot allocate buffer larger than kint32max for StringOutputStream #31093
Comments
The workaround so far was to reduce the phrase vocabulary, which shrank the graph from 1.4G to 500MB and made the error go away, but that is not really a scalable solution.
@vitalyli, please provide a complete minimal code snippet. It will help us move faster.
I can't share the embeddings file, but the core issue is that it is too large for the graph. The file is about 1.5G and holds words with word vectors of dim 100. The error is printed while loading the w_embeddings matrix. I don't remember seeing this in the past, but that could be because I had not crossed this limit before. The TFRecords are parsed this way, which works, but there seems to be no place to keep an external word->index->embedding mapping unless it is done as part of the graph.
Below is how graph embeddings are initialized, where:
This issue is not related to tf.data.
Never mind. The solution is to avoid the constant initializer and load the vectors via a placeholder, which keeps the protobuf small.
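For readers landing here, the placeholder approach can be sketched roughly as follows. This is a minimal illustration, not the code from the thread: the small random matrix, the name `w_embeddings_init`, and the three sample word ids are all stand-ins, and it is written against `tf.compat.v1` on the assumption that this lets the same snippet run on TensorFlow 2.x as well as on late 1.x releases.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style API; on 1.x the plain tf.* names are the same calls

# Small random stand-in for the real (words x dim) matrix, which the issue does not share.
vocab_size, dim = 1000, 100
embeddings = np.random.rand(vocab_size, dim).astype(np.float32)

graph = tf.Graph()
with graph.as_default():
    # A placeholder records only dtype and shape in the GraphDef, not the values.
    emb_init = tf1.placeholder(tf.float32, shape=[vocab_size, dim],
                               name="w_embeddings_init")
    # The variable takes its initial value from the placeholder instead of a constant.
    w_embeddings = tf1.Variable(emb_init, trainable=False, name="w_embeddings")
    word_ids = tf.constant([0, 5, 42], dtype=tf.int64)  # hypothetical ids
    vectors = tf.nn.embedding_lookup(w_embeddings, word_ids)

with tf1.Session(graph=graph) as sess:
    # The matrix enters the session once, at initialization time, via feed_dict.
    sess.run(w_embeddings.initializer, feed_dict={emb_init: embeddings})
    looked_up = sess.run(vectors)

# Serialized graph size versus the size of the matrix it no longer embeds.
graph_bytes = graph.as_graph_def().ByteSize()
print(looked_up.shape, graph_bytes, embeddings.nbytes)
```

Because the values are fed rather than baked into a `tf.constant`, the serialized graph stays small no matter how many rows the embedding matrix has.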
System information
MacOS 10.14
Binary
1.13.1
Python 3.6.7 |Anaconda
Describe the current behavior
Loading 1.4 million 100-dimensional embeddings into the graph
words:1457657; dim:100
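As a rough back-of-the-envelope check, the raw size of a dense matrix with these dimensions can be compared with kint32max, the bound named in the error. The element type is not stated in the issue, so both float32 and float64 are shown as assumptions, and `dense_bytes` is a hypothetical helper that ignores protobuf framing overhead.

```python
KINT32MAX = 2**31 - 1  # largest signed 32-bit integer, the bound named in the error


def dense_bytes(rows, cols, bytes_per_value):
    """Raw size of a dense rows x cols matrix, ignoring protobuf framing."""
    return rows * cols * bytes_per_value


words, dim = 1457657, 100  # figures reported in the issue

# The dtype is an assumption; the issue does not say which one was used.
for name, width in (("float32", 4), ("float64", 8)):
    size = dense_bytes(words, dim, width)
    print(name, size, "bytes,", round(size / KINT32MAX, 2), "of kint32max")
```

Neither figure reaches kint32max on its own, so the 1.4G graph mentioned in the comments presumably contains more than one raw matrix, for example the vocabulary strings as well. The thread does not say.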
Describe the expected behavior
I want to use a TFRecord dataset of words, have the graph look up word indexes via a TF table,
and then look up the embedding vectors in parallel and compute on a 3-dimensional tensor.
This used to work with a smaller set of embeddings.
Is there any way to overcome this problem without rewriting the data feed?
Code to reproduce the issue
w_embedding_vocab = tf.constant(embDic.vocab, dtype=tf.string, shape=[embDic.vocab_size], name="w_embedding_vocab")
Other info / logs
No other logs, just one ERROR message
[libprotobuf ERROR google/protobuf/io/zero_copy_stream_impl_lite.cc:164] Cannot allocate buffer larger than kint32max for StringOutputStream.