An error occurs when testing PoinTr on my own dataset: subprocess.CalledProcessError #50

Closed
zura-false opened this issue Apr 27, 2022 · 2 comments

Comments

@zura-false

Hi!
I am trying to train the PoinTr pretrained model on my own dataset using the method in #11, but after training reaches the last epoch I get this error:
` warnings.warn("Unable to load pointnet2_ops cpp extension. JIT Compiling.")
Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "main.py", line 64, in main
run_net(args, config, train_writer, val_writer)
File "/home/***/codes/PoinTr/tools/runner.py", line 162, in run_net
train_writer.close()
AttributeError: 'NoneType' object has no attribute 'close'

2022-04-27 12:12:53,925 - PoinTr - INFO - [Validation] EPOCH: 300 Metrics = ['0.3882', '14.8132', '0.7327']
2022-04-27 12:12:53,926 - PoinTr - INFO - ============================ TEST RESULTS ============================
2022-04-27 12:12:53,926 - PoinTr - INFO - Taxonomy #Sample F-Score CDL1 CDL2 #ModelName
2022-04-27 12:12:53,926 - PoinTr - INFO - 02691156 5 0.388 14.813 0.733 airplane
2022-04-27 12:12:53,927 - PoinTr - INFO - Overall 0.388 14.813 0.733
Killing subprocess 19184
Killing subprocess 19185
Traceback (most recent call last):
File "/home//anaconda3/envs/PoinTr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home//anaconda3/envs/PoinTr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home//anaconda3/envs/PoinTr/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home//anaconda3/envs/PoinTr/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home//anaconda3/envs/PoinTr/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home//anaconda3/envs/PoinTr/bin/python', '-u', 'main.py', '--local_rank=1', '--launcher', 'pytorch', '--sync_bn', '--config', './cfgs/JYG_models/PoinTr.yaml', '--exp_name', 'example']' returned non-zero exit status 1.
by running this command:
CUDA_VISIBLE_DEVICES=0,1 bash ./scripts/dist_train.sh 2 13232 \
--config ./cfgs/MYDATA_models/PoinTr.yaml \
--exp_name example`
I am using two NVIDIA 3060 GPUs.
Could you please give some advice? Thanks!

@yuxumin
Owner

yuxumin commented Apr 27, 2022

Hi,
see #32

@zura-false
Author

> Hi, see #32

Thanks, this works

@yuxumin yuxumin closed this as completed May 5, 2022
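
For readers who land here with the same traceback: the first error above is `AttributeError: 'NoneType' object has no attribute 'close'` at `train_writer.close()`, and the failing command was launched with `--local_rank=1`. One plausible reading, which is an assumption and not something confirmed in this thread, is that the TensorBoard writer is created only on rank 0 and is `None` on the other ranks. The contents of #32 are not reproduced here; the snippet below is only a minimal sketch of guarding the close call, and `DummyWriter`, `make_writer`, and `close_writer` are hypothetical names used for illustration, not PoinTr functions.

```python
class DummyWriter:
    """Hypothetical stand-in for a TensorBoard-style writer (illustration only)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def make_writer(local_rank):
    # Assumption: only the rank-0 process owns a writer; other ranks get None.
    return DummyWriter() if local_rank == 0 else None


def close_writer(writer):
    """Close the writer if this rank has one; return True if something was closed."""
    if writer is None:
        # Ranks without a writer skip the call instead of raising AttributeError.
        return False
    writer.close()
    return True


if __name__ == "__main__":
    for rank in (0, 1):
        writer = make_writer(rank)
        print(f"rank {rank}: closed={close_writer(writer)}")
```

With a guard like this, a process whose writer is `None` would exit cleanly, so the launcher would have no non-zero exit status to report as `subprocess.CalledProcessError`.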