TensorFlow out-of-bounds error while trying to train the Code2Seq model on our own Python dataset #123

Tamal-Mondal · 2022-06-11T18:45:30Z

I am trying to deploy code2seq for a code summarization task using our own Python dataset. For this, I have followed the steps described at https://github.com/tech-srl/code2seq/tree/master/Python150kExtractor . I have made the necessary changes in the Python extractor to parse our data, and the final processed data looks correct on visual inspection. However, I am getting an internal error when I try to train the Code2Seq model by running the train_python150k.sh script.

I have attached the training logs below. It would be a great help if you could point out the problem or give me some leads.

code2seq training logs.txt

Thanks and regards,
Tamal Mondal

Tamal-Mondal · 2022-06-11T19:01:01Z

Closing the issue, since it seems similar issues have been addressed previously. I will take a look and try to resolve it.

Tamal-Mondal · 2022-06-11T19:15:44Z

Hi Team,

I found discussions of this error in some previous issues. You mentioned that in some cases the issue is with MAX_PATH_LENGTH (#4, #28), and in one case that there was an extra comma in the extractor output (https://githubmemory.com/repo/tech-srl/code2vec/issues/94).

Could you please take a look and tell me how I should check this, or what my issue might be?

Thanks & Regards,
Tamal Mondal

Tamal-Mondal · 2022-06-12T15:20:56Z

UPDATE

I checked whether the path length or extra commas/spaces were the issue. It turned out that both were probably present. After I removed the extra commas and spaces (and verified that none remained in the final extracted data), the maximum path length between any two terminals is 8 across the whole dataset, and each line has the format "target_sequence subtoken1|subtoken2|subtoken3,intermediate_nodes(| separated),subtoken4|subtoken5|subtoken6......"
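A minimal sketch of a per-line validator for the format above. It assumes each line is the target sequence followed by space-separated contexts of the form source_subtokens,path,target_subtokens, and that trailing spaces may pad a line up to max_contexts (assumed here to be 200, matching the [200,3] shape in the error logs later in this thread). `validate_line` and `scan_file` are hypothetical helpers, not part of code2seq.

```python
def validate_line(line, max_contexts=200):
    """Return a list of problems found in one extracted line (empty list = OK).

    Assumed layout: "target ctx1 ctx2 ...", where every context is
    "source_subtokens,path_nodes,target_subtokens" and trailing spaces
    may pad the line up to max_contexts.
    """
    problems = []
    parts = line.rstrip("\r\n").split(" ")
    target, contexts = parts[0], parts[1:]
    if not target:
        problems.append("empty target (leading space?)")
    # Trailing empty fields are treated as padding and dropped.
    while contexts and contexts[-1] == "":
        contexts.pop()
    if len(contexts) > max_contexts:
        problems.append(f"{len(contexts)} contexts, more than max_contexts={max_contexts}")
    for i, ctx in enumerate(contexts):
        if ctx == "":
            # An empty field before a real context means a doubled space.
            problems.append(f"context {i}: empty (doubled space)")
            continue
        fields = ctx.split(",")
        if len(fields) != 3:
            problems.append(f"context {i}: {len(fields)} comma-separated fields, expected 3")
        elif "" in fields:
            problems.append(f"context {i}: empty field")
    return problems


def scan_file(path, max_contexts=200):
    """Print every problem in an extracted data file as path:line_no: message."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            for problem in validate_line(line, max_contexts):
                print(f"{path}:{line_no}: {problem}")


if __name__ == "__main__":
    good = "get|name foo,Name|Call|Attr,bar baz,Name|Assign,qux   "
    bad = "get|name foo,Name|Call,Attr,bar  baz,Name|Assign,qux"
    print(validate_line(good))  # []
    print(validate_line(bad))
    # ['context 0: 4 comma-separated fields, expected 3', 'context 1: empty (doubled space)']
```

Running `scan_file` over the train, validation, and test files before training surfaces every offending line at once, instead of one per crash.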

I am still getting similar errors, but this time the error appears quite a while after training starts, which probably means the problem is in some other data point. I also ran the training script twice on the same dataset, once with MAX_PATH_LENGTH = 9 and once with MAX_PATH_LENGTH = 51. With 9, it failed during the first epoch; with 51, EPOCH 0 completed but a similar error occurred in the next epoch (I am not sure how, since the whole training dataset should already be used during the first epoch). And since one epoch finished with MAX_PATH_LENGTH = 51, I am not sure why it fails with 9, as I verified every path length with a script (the maximum should be 8).
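A minimal sketch of such a path-length check. It assumes the path is the middle comma-separated field of a context and that its length is the number of |-separated nodes; whether code2seq's MAX_PATH_LENGTH counts nodes or edges, and whether it includes the terminals, is an assumption to verify against the code2seq config. The helper names are hypothetical. The sketch deliberately counts contexts that do not split into exactly three fields separately, because a check that locates the path by splitting on commas cannot measure paths in malformed contexts.

```python
from collections import Counter


def path_length(ctx):
    """Number of '|'-separated nodes in a context's path, or None if malformed."""
    fields = ctx.split(",")
    if len(fields) != 3:
        return None  # stray comma or padding: the path cannot be located reliably
    return len(fields[1].split("|"))


def path_length_histogram(lines):
    """Count well-formed contexts by path length; also count malformed ones."""
    hist, malformed = Counter(), 0
    for line in lines:
        for ctx in line.rstrip("\r\n").split(" ")[1:]:
            if ctx == "":
                continue  # padding or a doubled space
            n = path_length(ctx)
            if n is None:
                malformed += 1
            else:
                hist[n] += 1
    return hist, malformed


def paths_over_limit(lines, max_path_length):
    """Yield (line_no, context_index, length) for every path above the limit."""
    for line_no, line in enumerate(lines, start=1):
        for i, ctx in enumerate(line.rstrip("\r\n").split(" ")[1:]):
            n = path_length(ctx)
            if n is not None and n > max_path_length:
                yield line_no, i, n


if __name__ == "__main__":
    lines = ["t a,A|B|C,b c,D|E,d", "t a,A|B|C|D|E,b x,y,z,w"]
    hist, malformed = path_length_histogram(lines)
    print(max(hist), malformed)  # 5 1
    print(list(paths_over_limit(lines, 4)))  # [(2, 0, 5)]
```

If `malformed` is non-zero, the maximum taken from the histogram can understate the real situation, so a reported maximum of 8 is only trustworthy once every context splits into exactly three fields.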

I have attached the training logs for both cases separately; please have a look.

code2seq training logs - 9 max length.txt
code2seq training logs - 51 max length.txt

Thanks & Regards,
Tamal Mondal

Tamal-Mondal · 2022-06-13T12:56:31Z

UPDATE

One more thing I noticed: in every run, the location of the invalid argument error changes, even though the dataset is the same. Here are some examples:

Run 1:

2022-06-12 07:18:28.701173: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[480] = [159,3] is out of bounds: need 0 <= index < [200,3]

Run 2:

2022-06-13 08:32:51.253112: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[477] = [158,3] is out of bounds: need 0 <= index < [200,3]

Run 3:

2022-06-13 08:45:52.382922: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[564] = [187,3] is out of bounds: need 0 <= index < [200,3]

Thanks & Regards,
Tamal Mondal

urialon · 2022-06-13T13:16:12Z

Hi @Tamal-Mondal,
Thank you for your interest in our work!

Since the error says index < [200,3], I suspect that you still have extra commas in either your subtokens or your paths.
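A small emulation of that reasoning, in plain Python rather than TensorFlow. It assumes, as the [200,3] shape suggests, that the reader splits each of a line's (up to) 200 context strings on commas and places the pieces into a sparse tensor whose indices are listed row by row, with row = context number and column = field number. Under that assumption a context with a fourth field yields column 3, which is out of bounds. The helper names are hypothetical, and the sketch uses Python's str.split, which may not match how TensorFlow treats empty strings.

```python
def sparse_indices(contexts):
    """[row, col] of every comma-separated field, listed row by row (row-major)."""
    return [(row, col)
            for row, ctx in enumerate(contexts)
            for col, _ in enumerate(ctx.split(","))]


def first_out_of_bounds(contexts, dense_shape=(200, 3)):
    """Return (flat_position, (row, col)) of the first index outside dense_shape, or None."""
    max_rows, max_cols = dense_shape
    for pos, (row, col) in enumerate(sparse_indices(contexts)):
        if row >= max_rows or col >= max_cols:
            return pos, (row, col)
    return None


if __name__ == "__main__":
    # 159 well-formed contexts, then one with a stray comma (4 fields).
    contexts = ["src,A|B|C,dst"] * 159 + ["src,A|B,C,dst"]
    print(first_out_of_bounds(contexts))  # (480, (159, 3))
```

The flat positions in all three runs fit the same pattern (159 × 3 + 3 = 480, 158 × 3 + 3 = 477, 187 × 3 + 3 = 564), which is consistent with every earlier context in the offending line having exactly three fields and one context having a fourth.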

Can you verify that?
Uri

Tamal-Mondal · 2022-06-14T08:37:06Z

Thanks a lot, @urialon, for the quick reply; I really appreciate it. Yes, it was a silly issue: there were some extra spaces in the final processed data. After fixing that, the model is now training.
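A minimal sketch of one way to clean up such extra spaces, assuming that no field ever legitimately contains whitespace. The optional re-padding with one space per missing context is an assumption about how the preprocessing pads lines; set pad=False if your pipeline does not pad. `normalize_spaces` is a hypothetical helper.

```python
def normalize_spaces(line, max_contexts=200, pad=True):
    """Collapse runs of whitespace between fields and optionally re-pad the line.

    Assumes fields themselves never contain whitespace; a space inside a
    subtoken would still split one context in two.
    """
    parts = line.split()  # splits on any whitespace run and drops empty fields
    if not parts:
        return ""
    target, contexts = parts[0], parts[1:]
    out = " ".join([target] + contexts)
    if pad:
        # Assumed padding scheme: one trailing space per missing context.
        out += " " * max(0, max_contexts - len(contexts))
    return out


if __name__ == "__main__":
    print(repr(normalize_spaces("t  a,P,b   c,Q,d\n", max_contexts=4)))  # 't a,P,b c,Q,d  '
```

Collapsing whitespace cannot repair a space that sat inside a token, so after normalizing it is still worth confirming that every context has exactly three comma-separated fields.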

Will get back to you if any other issues occur.

Regards,
Tamal Mondal

Tamal-Mondal closed this as completed Jun 11, 2022

Tamal-Mondal reopened this Jun 11, 2022

Tamal-Mondal closed this as completed Jun 14, 2022
