
Preprocess.sh error (division by Zero) #158

Open
ShaliniR11 opened this issue Jul 18, 2022 · 14 comments

@ShaliniR11

Hi Dr. Alon,
I have my own Java dataset and I am trying to preprocess it with the provided script. I changed the directory paths in the script, and I get the following output:
shali@DESKTOP-JNLA5ED MINGW64 ~/Documents/Git/code2vec (master)
$ sh preprocess.sh
preprocess.sh: line 21: C:/Users/shali/Documents/Git/code2vec/data/javadata/train/: Is a directory
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "C:\Users\shali\Documents\Git\code2vec\preprocess.py", line 133, in <module>
    num_examples = process_file(file_path=data_file_path, data_file_role=data_role, dataset_name=args.output_name,
  File "C:\Users\shali\Documents\Git\code2vec\preprocess.py", line 69, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero
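The traceback shows `process_file` dividing `sum_total` by `total`, so the error means `total` was still zero, i.e. the raw file written by the extraction step contained no examples. The sketch below reproduces that averaging with an explicit guard so an empty extraction fails with a readable message instead. It assumes each raw line is a method name followed by space-separated contexts; `average_contexts` is a hypothetical helper for illustration, not code2vec's own function.

```python
def average_contexts(raw_lines):
    """Hypothetical helper (not part of code2vec): average number of
    contexts per example, given the lines of a *.raw.txt file.

    Assumes each line looks like '<method_name> <context> <context> ...'.
    """
    total = 0      # number of examples (non-blank lines)
    sum_total = 0  # number of contexts summed over all examples
    for line in raw_lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        total += 1
        sum_total += len(parts) - 1  # everything after the name is a context
    if total == 0:
        # This is the case behind the ZeroDivisionError: nothing was extracted.
        raise ValueError('raw file contains no examples; check the extraction step')
    return sum_total / total
```

Called with the lines of an empty `my_dataset.test.raw.txt`, this raises `ValueError` pointing back at the extractor rather than at the histogram code.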

My system configuration:
I am using Git Bash in Visual Studio Code to run the script.
OS: Windows 11
Java: java --version
openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)
Python: python --version
Python 3.10.4
CUDA:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Below is my preprocess.sh in .txt format:
preprocess.txt

Please let me know how to proceed further.

@urialon
Collaborator

urialon commented Jul 21, 2022

Hi @ShaliniR11 ,
Sorry for the delayed response.

Did you notice that there is a space in line 21, right before the path? Can you delete this space and see if it helps?

Additionally, do you have subdirectories in the directory C:/Users/shali/Documents/Git/code2vec/data/javadata/train/? The code looks for subdirectories in the training path.
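Uri's question can be answered with a quick layout check. A minimal sketch, assuming only what the comment says (the extractor looks for subdirectories under the training path) plus the assumption that those subdirectories should contain `.java` files; `describe_dataset_dir` is a hypothetical helper, not part of code2vec.

```python
from pathlib import Path


def describe_dataset_dir(root):
    """Hypothetical sanity check: map each immediate subdirectory of `root`
    to the number of .java files found anywhere beneath it."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f'{root} is not a directory')
    report = {}
    for sub in sorted(p for p in root_path.iterdir() if p.is_dir()):
        # rglob searches recursively, so nested package folders are counted too
        report[sub.name] = sum(1 for _ in sub.rglob('*.java'))
    return report
```

An empty dictionary means the train directory has no subdirectories at all; a zero next to a subdirectory name means it holds no `.java` files.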

Best,
Uri

@ShaliniR11
Author

Hi Dr. Alon, I tried removing the space, but the error still occurs.
I have a single subdirectory in each of train, test, and val, like this:
Screenshot (29)

Inside these subdirectories I have Java files like this:
Screenshot (30)

@urialon
Collaborator

urialon commented Jul 25, 2022

Can you try running the Java process directly, e.g.:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

?

@ShaliniR11
Author

This is the console output:
image

@urialon
Collaborator

urialon commented Jul 25, 2022

OK, so the base Java process is running fine; it looks like the problem is in the input/output redirection under Windows.
Can you try running it on a Linux machine, or with Bash on Windows (WSL)?
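One way to test the redirection theory without depending on the shell is to let Python write the process's standard output to a file directly. A minimal sketch, assuming only that preprocess.sh redirects extractor output into the raw files with `>`; `run_to_file` is a hypothetical helper, not part of code2vec.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run `cmd` (a list of arguments) with stdout written straight to
    `out_path`, bypassing shell redirection. Returns the number of
    non-blank lines written, so an empty extraction is easy to spot."""
    with open(out_path, 'w', encoding='utf-8') as out:
        completed = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f'command failed ({completed.returncode}): {completed.stderr.strip()}')
    with open(out_path, encoding='utf-8') as f:
        return sum(1 for line in f if line.strip())
```

For example, the `java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App ...` command from this thread can be passed as a list of its arguments with an output path such as `out.raw.txt` (a placeholder name); a return value of 0 means the extractor printed nothing for that directory.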

@zunairazaman2021

zunairazaman2021 commented Feb 8, 2023

Hi @urialon, I am facing the same issue. @ShaliniR11, were you able to solve it?
(env_tensor) zunaira@snps-ubo9fomrduif C2V3 % source preprocess.sh
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "/Users/zunaira/Downloads/C2V3/preprocess.py", line 133, in <module>
    num_examples = process_file(file_path=data_file_path, data_file_role=data_role, dataset_name=args.output_name,
  File "/Users/zunaira/Downloads/C2V3/preprocess.py", line 69, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

@urialon
Collaborator

urialon commented Feb 8, 2023

Hi @zunairazaman2021 ,
Thank you for your interest in our work!

Can you try running the Java process directly, as instructed earlier in this thread?

@zunairazaman2021

@urialon Yes, I did:
Screenshot 2023-02-08 at 11 57 22

@zunairazaman2021

zunairazaman2021 commented Feb 8, 2023

@urialon I also tried #109, but it didn't work.
Note: I am using a Mac with an M1 chip, and I only changed the directories as follows:
TRAIN_DIR=/Users/zunaira/Downloads/C2V3/tmp/train
VAL_DIR=/Users/zunaira/Downloads/C2V3/tmp/validation
TEST_DIR=/Users/zunaira/Downloads/C2V3/tmp/test

@zunairazaman2021

Using #109, the data gets stored in a tmp directory, but the .c2v and .raw.txt files are still empty. :(
Screenshot 2023-02-08 at 11 57 22
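Since the symptom here is empty `.raw.txt` and `.c2v` files, a pre-flight check before the histogram step would stop with a clearer message than the ZeroDivisionError. A minimal sketch; `empty_outputs` is a hypothetical helper, and the file names in the note below assume the `<dataset>.<role>.raw.txt` pattern visible in the log (`my_dataset.test.raw.txt`).

```python
from pathlib import Path


def empty_outputs(paths):
    """Hypothetical pre-flight check: return the paths (as strings) that are
    missing or contain only blank lines."""
    bad = []
    for p in map(Path, paths):
        if not p.is_file():
            bad.append(str(p))  # missing file counts as empty output
            continue
        with p.open(encoding='utf-8', errors='replace') as f:
            if not any(line.strip() for line in f):
                bad.append(str(p))
    return bad
```

For instance, `empty_outputs(['my_dataset.train.raw.txt', 'my_dataset.val.raw.txt', 'my_dataset.test.raw.txt'])` returning a non-empty list means the extraction step, not preprocess.py, is where to look.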

@zunairazaman2021

Never mind, I solved it with #109 :) Thanks!

@urialon
Collaborator

urialon commented Feb 8, 2023 via email

@Lufedi

Lufedi commented Mar 13, 2023

+1 on the #109 solution. I had the same issue, and the PR from gOATiful made it work.

@urialon
Collaborator

urialon commented Mar 13, 2023

Thanks @Lufedi and @zunairazaman2021 , I merged that PR.
