error of preprocess #50
Comments
I slightly modified the Java large dataset, but I did not change anything in code2seq itself. Could you please help me with this issue? Thanks a lot! |
Hi @lizhuo-1994, what is your Java version? Please run "java --version". |
Additionally, can you try running the extractor directly, without the Python wrapper: java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main |
$ java -version
$ java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main
Error: Could not find or load main class JavaExtractor.App
Here is my result, thanks for helping. |
Did you run this from the main code2seq directory? Does the jar file exist? If the file exists, then please run:
|
It works! It seems I had lost the *.jar file, and I have now restored it. Thanks for helping! |
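For readers hitting the same "Could not find or load main class" error, a quick way to confirm the cause is to check that the jar exists and actually contains the class. The following is only a minimal sketch: the helper name `jar_has_class` is hypothetical and not part of code2seq, and it assumes the jar path and class name quoted in the command above.

```python
import os
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if jar_path exists and contains the compiled class.

    Hypothetical helper for illustration; not part of code2seq.
    """
    if not os.path.isfile(jar_path):
        return False
    # A class named a.b.C is stored inside the jar as a/b/C.class
    entry = class_name.replace('.', '/') + '.class'
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == '__main__':
    # Path and class name taken from the java -cp command in this thread.
    jar = 'JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar'
    print(jar_has_class(jar, 'JavaExtractor.App'))
```

Run from the main code2seq directory, a result of `False` would suggest the jar is missing or was built without the expected class.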
But here is another problem: Extracting paths from validation set... |
Maybe it is because of a timeout. I will try again, thanks. |
Yes, there are timeouts; we originally used a 64-core machine to preprocess the datasets. By default, 6 processes run in parallel (see: https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/extract.py#L66) and each of them runs with 64 threads (see: https://github.com/tech-srl/code2seq/blob/master/preprocess.sh#L32). To verify that preprocessing works on a small dataset, you can try preprocessing the JavaExtractor itself, i.e., point the training+test+validation paths to |
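To illustrate how a per-directory timeout can silently drop data on a slower machine, here is a minimal sketch of running an external command with a time limit. This is not the code in extract.py; the function name `run_with_timeout` and the message text are assumptions made for illustration only.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return its stdout, or None if it exceeded the time limit.

    Illustrative only; extract.py may implement its timeout differently.
    """
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        print('command was not completed in time', file=sys.stderr)
        return None
    return result.stdout.decode()
```

With a scheme like this, a directory that takes longer than the limit contributes no output at all, which is why fewer cores can lead to an empty result rather than just a slower run.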
Thanks for the explanation. I reconfigured it and now it seems to be working well. BTW, it is really disk- and time-consuming, so I think preprocessing will run for about 2-3 days. |
Unfortunately, that's right. I'm closing this issue for now; feel free to re-open it if you have any additional questions. |
Hello, may I ask about the specific configuration of your machine and the final parameters you used? Thanks a lot! |
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
dir: data/train was not completed in time
Finished extracting paths from training set
Creating histograms from the training data
subtoken vocab size: 0
node vocab size: 0
target vocab size: 0
File: 1.test.raw.txt
Traceback (most recent call last):
  File "preprocess.py", line 115, in <module>
    max_contexts=int(args.max_contexts), max_data_contexts=int(args.max_data_contexts))
  File "preprocess.py", line 53, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero
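The traceback above comes from dividing by a zero example count, which is consistent with the empty vocabularies reported just before it. A defensive version of that averaging step might look like the sketch below; the function names `average_contexts` and `report` are hypothetical and not taken from preprocess.py.

```python
def average_contexts(sum_total, total):
    """Return the mean contexts per example, or None when there are no examples."""
    if total == 0:
        return None
    return float(sum_total) / total


def report(sum_total, total):
    """Build the summary line, replacing the crash with an explanatory message."""
    avg = average_contexts(sum_total, total)
    if avg is None:
        return 'No examples were extracted; check the extractor output and timeouts.'
    return 'Average total contexts: ' + str(avg)
```

This only makes the failure easier to read; the underlying cause (no paths extracted from the input) still has to be fixed.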
Here is my preprocess.sh:
TRAIN_DIR=data/train
VAL_DIR=data/validation
TEST_DIR=data/tes
DATASET_NAME=1
MAX_DATA_CONTEXTS=1000
MAX_CONTEXTS=200
SUBTOKEN_VOCAB_SIZE=186277
TARGET_VOCAB_SIZE=26347
NUM_THREADS=1
PYTHON=python3.7
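Before launching a multi-day extraction with settings like these, it can help to confirm that each configured directory exists and contains Java sources. The sketch below assumes the directory values shown in the config; the helper `count_java_files` is hypothetical and not part of code2seq.

```python
import os


def count_java_files(root):
    """Return the number of .java files under root, or -1 if root is not a directory."""
    if not os.path.isdir(root):
        return -1
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += sum(1 for name in filenames if name.endswith('.java'))
    return count


if __name__ == '__main__':
    # Paths copied verbatim from the preprocess.sh values above.
    for path in ('data/train', 'data/validation', 'data/tes'):
        print(path, count_java_files(path))
```

A value of -1 or 0 for any path would point to a misconfigured or empty directory rather than a timeout.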