Invalid Argument Error? #4

Closed
PankajB1997 opened this issue Mar 13, 2019 · 7 comments


PankajB1997 commented Mar 13, 2019

Hi, thanks for this great work! Could you clarify the issue below? This is the stack trace for an error I am facing when I run train.sh. Any ideas on why this is happening?

Number of trainable params: 63956672
Initalized variables
Started reader...
2019-03-13 17:19:44.582113: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[305] = [240,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:44.585951: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[21] = [12,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:44.590843: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[55] = [24,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:45.009627: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[125] = [90,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:45.012939: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[16] = [7,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:45.014349: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[10] = [1,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:45.017185: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[296] = [199,9] is out of bounds: need 0 <= index < [300,9]
Traceback (most recent call last):
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[305] = [240,9] is out of bounds: need 0 <= index < [300,9]
         [[{{node SparseToDense_3}} = SparseToDense[T=DT_STRING, Tindices=DT_INT64, validate_indices=true](StringSplit_3, SparseTensor_7/dense_shape, StringSplit_3:1, NotEqual/y)]]
         [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[?,300,9], [?,300], [?,300,5], [?,300], [?,300,1], [?,300,1], [?,300,5], [?,300], [?,300,1], [?,?], [?], [?], [?,300]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
         [[{{node model/dense/Tensordot/GatherV2/_183}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1720_model/dense/Tensordot/GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code2seq.py", line 33, in <module>
    model.train()
  File "/home/pankaj/pankaj/repos/code2seq/model.py", line 95, in train
    _, batch_loss = self.sess.run([optimizer, train_loss])
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/pankaj/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[305] = [240,9] is out of bounds: need 0 <= index < [300,9]
         [[{{node SparseToDense_3}} = SparseToDense[T=DT_STRING, Tindices=DT_INT64, validate_indices=true](StringSplit_3, SparseTensor_7/dense_shape, StringSplit_3:1, NotEqual/y)]]
         [[node IteratorGetNext (defined at /home/pankaj/pankaj/repos/code2seq/reader.py:192)  = IteratorGetNext[output_shapes=[[?,300,9], [?,300], [?,300,5], [?,300], [?,300,1], [?,300,1], [?,300,5], [?,300], [?,300,1], [?,?], [?], [?], [?,300]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_STRING, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_INT64, DT_STRING, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
         [[{{node model/dense/Tensordot/GatherV2/_183}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1720_model/dense/Tensordot/GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
2019-03-13 17:19:45.455778: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[10] = [1,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:45.456818: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[86] = [30,9] is out of bounds: need 0 <= index < [300,9]
2019-03-13 17:19:45.461185: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[9] = [0,9] is out of bounds: need 0 <= index < [300,9]
@PankajB1997
Author

On a related note, could you please explain the role of config.MAX_PATH_LENGTH in a bit more detail? I am not familiar with the model, so I am still trying to figure out this error, which seems to be related to this constant.


urialon commented Mar 13, 2019

Hi Pankaj,
Did you run the model on a dataset that you preprocessed yourself, i.e., not our preprocessed dataset? Did you preprocess your dataset with a non-default max_path_length value? Or did you decrease the default value of config.MAX_PATH_LENGTH?
In general, config.MAX_PATH_LENGTH in the model should be greater by 1 than the max_path_length value used in preprocessing. This is indeed confusing.

config.MAX_PATH_LENGTH is the number of nodes in each "path".
For legacy reasons, in the JavaExtractor, max_path_length is the number of edges and is set to 8 by default. This is the reason that the default value for config.MAX_PATH_LENGTH is 8+1.
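
As a minimal illustration of that +1 relationship (a hedged sketch: only config.MAX_PATH_LENGTH is a real name from this repository; the extractor-side constant below is a stand-in for the preprocessing max_path_length value):

```python
# Hedged sketch: EXTRACTOR_MAX_PATH_LENGTH stands in for the preprocessing-side
# max_path_length value; only config.MAX_PATH_LENGTH is a real name from code2seq.

EXTRACTOR_MAX_PATH_LENGTH = 8  # JavaExtractor default: number of *edges* per path

class Config:
    # The model counts *nodes* per path, so it must be one larger than the
    # extractor's edge count. With the defaults this gives 9, which matches
    # the [?, 300, 9] shapes in the stack trace above.
    MAX_PATH_LENGTH = EXTRACTOR_MAX_PATH_LENGTH + 1

assert Config.MAX_PATH_LENGTH == EXTRACTOR_MAX_PATH_LENGTH + 1
```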

@PankajB1997
Copy link
Author

PankajB1997 commented Mar 13, 2019

Hello, thank you for the response!

Yes, I'm using another dataset for which I wrote another extractor, and then I ran preprocess.sh on just the extracted result (i.e., my self-created train.raw.txt, val.raw.txt, test.raw.txt). I guess my mistake is that I did not take into account the max_path_length property in my extraction code.

My understanding of the extraction step is that I specify the target as, say, the method name or a caption, and in the list of contexts I can specify any type of component suitable for my problem. My extracted rows deal with code lines individually and are of the form target type_of_statement|token_1|token_2 ..., where type_of_statement is chosen from a set of 25 possible values indicating the type of code statement and the tokens are similar to your example.

So just to clarify, how would you suggest I account for max_path_length in my extraction code?

@urialon
Copy link
Contributor

urialon commented Mar 13, 2019

My understanding of the extraction step is that I specify the target as, say, the method name or a caption, and in the list of contexts I can specify any type of component suitable for my problem

That's right! There are several things to notice:

  1. The words in target should be split by |, i.e.: print|bmp|to|file
  2. The 3-tuple type_of_statement|token_1|token_2 should be split by comma (,) rather than |, and each of them internally should be split by |.
  3. The network reads the 1st and 3rd fields as a set of subtokens, and the 2nd field as a sequence (using an LSTM). So I would suggest switching the order, making type_of_statement the middle field, and setting config.MAX_PATH_LENGTH = 1. So finally it will look like:
    print|bmp|to|file subtoken1|subtoken2|subtoken3,type_of_statement,subtoken4|subtoken5|subtoken6

where subtoken1|subtoken2|subtoken3 are the components of token_1 in your example,
and subtoken4|subtoken5|subtoken6 are the components of token_2 from your example.
Since type_of_statement is a single value (rather than a sequence of symbols), you can set config.MAX_PATH_LENGTH = 1 and training will be faster because the LSTM will not be used.
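
For illustration, a hypothetical helper (the function and variable names are made up; only the output layout comes from the explanation above) that writes one such line:

```python
# Hypothetical helper, not part of code2seq: builds one training line in the
# layout described above (target subtokens joined by '|', each context a
# comma-separated 3-tuple whose parts are internally joined by '|').
def format_example(target_subtokens, contexts):
    """contexts: iterable of (left_subtokens, type_of_statement, right_subtokens)."""
    target = "|".join(target_subtokens)
    ctxs = [
        ",".join(("|".join(left), type_of_statement, "|".join(right)))
        for left, type_of_statement, right in contexts
    ]
    return " ".join([target] + ctxs)

line = format_example(
    ["print", "bmp", "to", "file"],
    [(["subtoken1", "subtoken2", "subtoken3"],
      "type_of_statement",
      ["subtoken4", "subtoken5", "subtoken6"])],
)
print(line)
# print|bmp|to|file subtoken1|subtoken2|subtoken3,type_of_statement,subtoken4|subtoken5|subtoken6
```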

@PankajB1997
Author

Thank you for your help, this clarified a lot!! :)

@PankajB1997
Author

By the way, I wanted to better understand your usage of the Abstract Syntax Tree in the extraction step. Quoting from the paper:

Given the AST of a code snippet, we consider all pairwise paths between terminals, and represent them as sequences of terminal and nonterminal nodes. We then use these paths with their terminals’ values to represent the code snippet itself.

Does this mean that, given the AST, you extract all possible terminal-to-terminal paths from the tree and form contexts of the form terminal node token, path of intermediate non-terminal nodes, terminal node token?


urialon commented Mar 13, 2019

Basically yes. See also Section 2 of the code2vec paper, where it is explained more thoroughly:
https://arxiv.org/abs/1803.09473
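
For intuition only, here is a toy sketch of pairwise terminal-to-terminal path extraction over a tree (this is not the JavaExtractor; the Node class, the labels, and the node-counting convention are illustrative assumptions):

```python
# Toy sketch of extracting all pairwise terminal-to-terminal paths from an AST.
# NOT the actual JavaExtractor: the Node class and label names are made up.
from itertools import combinations

class Node:
    def __init__(self, label, children=(), value=None):
        self.label = label              # node type, e.g. "Assign" or "Name"
        self.children = list(children)  # empty for terminals (leaves)
        self.value = value              # token text for terminals, None otherwise

def terminal_pair_contexts(root, max_path_nodes=9):
    """Yield (left_token, path_of_node_labels, right_token) for every terminal pair."""
    # collect each terminal together with its chain of ancestors (root ... terminal)
    leaves = []
    def walk(node, chain):
        chain = chain + [node]
        if not node.children:
            leaves.append(chain)
        for child in node.children:
            walk(child, chain)
    walk(root, [])

    for p1, p2 in combinations(leaves, 2):
        # lowest common ancestor = last node of the longest shared prefix
        i = 0
        while i < min(len(p1), len(p2)) and p1[i] is p2[i]:
            i += 1
        # up from the first terminal to the LCA, then down to the second terminal
        path_nodes = list(reversed(p1[i - 1:])) + p2[i:]
        if len(path_nodes) <= max_path_nodes:  # cf. config.MAX_PATH_LENGTH counting nodes
            yield p1[-1].value, "|".join(n.label for n in path_nodes), p2[-1].value

# toy AST for "x = 1" (labels are invented)
tree = Node("Assign", [Node("Name", value="x"), Node("Num", value="1")])
for left, path, right in terminal_pair_contexts(tree):
    print(left, path, right)  # x Name|Assign|Num 1
```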
