-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tensorflow out-of-bound error while trying to train the Code2Seq model on our own python dataset #123
Comments
Closing the issue as similar issues are addressed previously it seems, I will take a look and try to resolve. |
Hi Team, I found discussion about this error in some of the previous issues. You mentioned in some cases the issue is with MAX_PATH_LENGTH(#4 , #28 ) and in one case you mentioned the there is extra comma in extractor output(https://githubmemory.com/repo/tech-srl/code2vec/issues/94). Can you please check and tell me in which way I should check or what's my issue? Thanks & Regards, |
UPDATE I did check if the length of paths is the issue or if there are extra commas or spaces. It turned out that both these cases were there probably. When I took care of extra commas or spaces(verified in the final extracted data for extras), in the extracted data, the maximum length between any two terminals is 8 across the whole dataset and the data is in the format of "target_sequence subtoken1|subtoken2|subtoken3,intermediate_nodes(| separated),subtoken4|subtoken5|subtoken6......" I am still getting similar errors, but this time I got it after quite some time of starting the training which probably means the issue s in some other datapoint. Also, I did try to run the training script 2 times with 9 and 51 as the MAX_PATH_LENGTH and using the same dataset. For the first case, it gave an error during the first epoch itself and for the second case, EPOCH 0 got completed but gave a similar error in the next epoch(not sure how as during the first epoch only, the whole training dataset should get used). Also as with MAX_PATH_LENGTH = 51, one epoch got finished, not sure why for 9 it's failing as I verified every path length with a script(and the maximum should be 8). I have attached the training logs for both the 2 cases separately, please have a look. code2seq training logs - 9 max length.txt Thanks & Regards, |
UPDATE One more thing that I noticed is, in every run, the place of invalid argument error is changing even though the dataset is same. Here are some of the examples: Run 1: 2022-06-12 07:18:28.701173: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[480] = [159,3] is out of bounds: need 0 <= index < [200,3] Run 2: 2022-06-13 08:32:51.253112: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[477] = [158,3] is out of bounds: need 0 <= index < [200,3] Run 3: 2022-06-13 08:45:52.382922: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_to_dense_op.cc:128 : Invalid argument: indices[564] = [187,3] is out of bounds: need 0 <= index < [200,3] Thanks & Regards, |
Hi @Tamal-Mondal , Since the error says Can you verify that? |
Thanks a lot, @urialon for the quick reply, I really appreciate that. Yes, there was a silly issue and some extra spaces were in the final processed data. After I fixed that, the model is training now. Will get back to you if any other issues occur. Regards, |
Hello @urialon and @stasbel,
I am trying to deploy code2seq for code summarization task using our own python dataset. For this, I have used the steps mentioned in https://github.com/tech-srl/code2seq/tree/master/Python150kExtractor . I have made the necessary changes in the python extractor to parse our data and the final processed data seems to be correct visually. I am getting some internal error while trying to train the Code2Seq model by running the train_python150k.sh script.
I have attached the training logs below. It would be a great help if you can tell the problem or provide some lead.
code2seq training logs.txt
Thanks And Regards,
Tamal Mondal
The text was updated successfully, but these errors were encountered: