TensorFlow out-of-bounds error while trying to train the code2seq model on our own Python dataset #123

Closed
Tamal-Mondal opened this issue Jun 11, 2022 · 6 comments

@Tamal-Mondal

Tamal-Mondal commented Jun 11, 2022

Hello @urialon and @stasbel,

I am trying to use code2seq for a code summarization task on our own Python dataset. For this, I followed the steps mentioned in https://github.com/tech-srl/code2seq/tree/master/Python150kExtractor . I made the necessary changes to the Python extractor to parse our data, and the final processed data looks correct on visual inspection. However, I get an internal error when I try to train the code2seq model by running the train_python150k.sh script.

I have attached the training logs below. It would be a great help if you could point out the problem or give me a lead.

code2seq training logs.txt

Thanks and regards,
Tamal Mondal

@Tamal-Mondal
Author

Closing this issue, since it seems similar issues have been addressed before. I will take a look and try to resolve it.

@Tamal-Mondal
Author

Hi Team,

I found discussions of this error in some previous issues. In some cases you mentioned that the issue was with MAX_PATH_LENGTH (#4, #28), and in one case you mentioned that there was an extra comma in the extractor output (https://githubmemory.com/repo/tech-srl/code2vec/issues/94).

Could you please tell me how I should check this, or what my issue might be?

Thanks & Regards,
Tamal Mondal

@Tamal-Mondal Tamal-Mondal reopened this Jun 11, 2022
@Tamal-Mondal
Author

UPDATE

I checked whether the path length was the issue or whether there were extra commas or spaces. It turned out that both cases were probably present. After I took care of the extra commas and spaces (and verified that none remained in the final extracted data), the maximum path length between any two terminals is 8 across the whole dataset, and the data is in the format "target_sequence subtoken1|subtoken2|subtoken3,intermediate_nodes(| separated),subtoken4|subtoken5|subtoken6......"

I am still getting similar errors, but this time only after training has been running for quite some time, which probably means the issue is in some other data point. I also ran the training script twice on the same dataset, with MAX_PATH_LENGTH set to 9 and to 51. In the first case, it failed during the first epoch itself. In the second case, EPOCH 0 completed, but a similar error occurred in the next epoch (I'm not sure how, since the whole training dataset should already have been used during the first epoch). Also, since one epoch finished with MAX_PATH_LENGTH = 51, I'm not sure why it fails with 9, as I verified every path length with a script (and the maximum should be 8).

I have attached the training logs for both cases separately; please have a look.

code2seq training logs - 9 max length.txt
code2seq training logs - 51 max length.txt

Thanks & Regards,
Tamal Mondal

@Tamal-Mondal
Author

UPDATE

One more thing I noticed: in every run, the location of the invalid-argument error changes, even though the dataset is the same. Here are some examples:

Run 1:

2022-06-12 07:18:28.701173: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[480] = [159,3] is out of bounds: need 0 <= index < [200,3]

Run 2:

2022-06-13 08:32:51.253112: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[477] = [158,3] is out of bounds: need 0 <= index < [200,3]

Run 3:

2022-06-13 08:45:52.382922: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[564] = [187,3] is out of bounds: need 0 <= index < [200,3]

Thanks & Regards,
Tamal Mondal

@urialon
Contributor

urialon commented Jun 13, 2022

Hi @Tamal-Mondal ,
Thank you for your interest in our work!

Since the error says index < [200,3], I suspect that you still have extra commas in either your subtokens or your paths.

Can you verify that?
Uri
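The logged indices fit this diagnosis. Assuming each context is split on `,` into a sparse tensor whose row is the context's position and whose column is the field number (which is what the dense shape `[200,3]` suggests), every well-formed context contributes exactly three indices. So if every earlier context has three fields, a fourth field in context r first shows up at flat position 3 × r + 3. The logs match this exactly: 480 = 159 × 3 + 3, 477 = 158 × 3 + 3, and 564 = 187 × 3 + 3. The hypothetical helpers below reproduce that arithmetic in plain Python; they do not call TensorFlow.

```python
from typing import List, Optional, Tuple


def split_indices(contexts: List[str]) -> List[Tuple[int, int]]:
    """(row, column) pairs produced by splitting each context on ','."""
    return [(row, col)
            for row, ctx in enumerate(contexts)
            for col in range(len(ctx.split(",")))]


def first_out_of_bounds(contexts: List[str],
                        dense_shape: Tuple[int, int] = (200, 3)
                        ) -> Optional[Tuple[int, Tuple[int, int]]]:
    """Return (flat position, index) of the first index outside dense_shape, or None."""
    for pos, (row, col) in enumerate(split_indices(contexts)):
        if row >= dense_shape[0] or col >= dense_shape[1]:
            return pos, (row, col)
    return None


# 159 well-formed contexts, then one context with an extra comma (4 fields).
contexts = ["a,N|M,b"] * 159 + ["a,N|M,b,c"]
print(first_out_of_bounds(contexts))  # (480, (159, 3))
```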

@Tamal-Mondal
Author

Thanks a lot, @urialon, for the quick reply; I really appreciate it. Yes, it was a silly issue: there were some extra spaces in the final processed data. After I fixed that, the model is now training.

Will get back to you if any other issues occur.

Regards,
Tamal Mondal
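For anyone hitting the same problem, a cleanup pass over the raw extractor output might look like the following. It is a minimal sketch with a hypothetical `clean_line` helper, assuming the layout described earlier in this thread (a target followed by space-separated contexts, each with exactly three comma-separated fields). It collapses runs of whitespace and drops any context that does not have three fields. It cannot repair a space inside a token, which splits one context into two; such contexts are simply dropped, so that case is better fixed in the extractor itself. Because it trims trailing spaces, it should run before any padding a preprocessing step adds.

```python
import re
from typing import Tuple


def clean_line(line: str) -> Tuple[str, int]:
    """Hypothetical cleanup: return the cleaned line and how many contexts were dropped."""
    # Collapse runs of whitespace (spaces, tabs, the newline) and trim the ends.
    fields = re.sub(r"\s+", " ", line).strip().split(" ")
    target, contexts = fields[0], fields[1:]
    # Keep only contexts of the form "left,path,right".
    kept = [ctx for ctx in contexts if len(ctx.split(",")) == 3]
    return " ".join([target] + kept), len(contexts) - len(kept)


print(clean_line("get|name  a,N|M,b \t c,N,d,e \n"))
# ('get|name a,N|M,b', 1)
```

Summing the drop counts over the whole file gives a quick measure of how much data the extractor is emitting in a malformed shape.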
