
Stuck in the process of compiling C++ extensions #1

Closed
WoodsGao opened this issue Oct 17, 2019 · 11 comments
Labels
help wanted Extra attention is needed

Comments

@WoodsGao

CUDA VERSION:9.0
Python VERSION:3.6.8
Pytorch VERSION:1.2.0

I downloaded the tmm17 dataset and pre-trained model from Google Drive and used the command

sudo python eval.py -d 0 logs/tmm/config.yaml logs/tmm/checkpoint_latest.pth.tar

to evaluate the tmm17 dataset, but after outputting

Let's use 1 GPU(s)!

, the program produced no further output. When I interrupted it, I could see that it was stuck in the torch.utils.cpp_extension.load function.
Is there a problem with this operation?

This is the complete output:

{   'io': {   'augmentation_level': 2,
              'datadir': 'data/tmm17',
              'dataset': 'TMM17',
              'focal_length': 1,
              'logdir': 'logs/',
              'num_vpts': 1,
              'num_workers': 4,
              'resume_from': 'logs/ultimate-suw-3xlr-fixdata',
              'tensorboard_port': 0,
              'validation_debug': -1,
              'validation_interval': 24000},
    'model': {   'backbone': 'stacked_hourglass',
                 'batch_size': 6,
                 'cat_vpts': True,
                 'conic_6x': False,
                 'depth': 4,
                 'fc_channel': 1024,
                 'im2col_step': 32,
                 'multires': <BoxList: [0.0013457768043554, 0.0051941870036646, 0.02004838034795, 0.0774278195486317, 0.299564810864565]>,
                 'num_blocks': 1,
                 'num_stacks': 1,
                 'num_steps': 4,
                 'output_stride': 4,
                 'smp_multiplier': 2,
                 'smp_neg': 1,
                 'smp_pos': 1,
                 'smp_rnd': 3,
                 'upsample_scale': 1},
    'optim': {   'amsgrad': True,
                 'lr': 3e-05,
                 'lr_decay_epoch': 365,
                 'max_epoch': 400,
                 'name': 'Adam',
                 'weight_decay': 3e-05}}
Let's use 1 GPU(s)!
^CTraceback (most recent call last):
  File "eval.py", line 179, in <module>
    main()
  File "eval.py", line 83, in main
    model, C.model.output_stride, C.model.upsample_scale
  File "/workspace/neurvps/neurvps/models/vanishing_net.py", line 23, in __init__
    self.anet = ApolloniusNet(output_stride, upsample_scale)
  File "/workspace/neurvps/neurvps/models/vanishing_net.py", line 95, in __init__
    self.conv1 = ConicConv(32, 64)
  File "/workspace/neurvps/neurvps/models/conic.py", line 19, in __init__
    bias=bias,
  File "/workspace/neurvps/neurvps/models/deformable.py", line 132, in __init__
    DCN = load_cpp_ext("DCN")
  File "/workspace/neurvps/neurvps/models/deformable.py", line 29, in load_cpp_ext
    build_directory=tar_dir,
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 649, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 822, in _jit_compile
    baton.wait()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/file_baton.py", line 49, in wait
    time.sleep(self.wait_seconds)
KeyboardInterrupt
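The traceback ends inside torch's file baton, which spins as long as a lock file exists so that only one process compiles the extension at a time. If a previous run crashed and left the lock behind, every later run waits forever. Below is a minimal sketch of that mechanism, simplified from the idea in torch/utils/file_baton.py (not the exact PyTorch implementation; class and parameter names here are illustrative):

```python
import os
import time

class FileBatonSketch:
    """Simplified file-based baton: the first acquirer creates the lock file;
    everyone else spins in wait() until that file disappears."""

    def __init__(self, lock_path, wait_seconds=0.01):
        self.lock_path = lock_path
        self.wait_seconds = wait_seconds

    def try_acquire(self):
        try:
            # O_EXCL makes creation atomic: open fails if the lock already exists.
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL)
            os.close(fd)
            return True
        except FileExistsError:
            return False

    def wait(self):
        # This is the kind of loop the KeyboardInterrupt above landed in:
        # if the lock file is never removed (e.g. a crashed build left it
        # behind), this spins forever.
        while os.path.exists(self.lock_path):
            time.sleep(self.wait_seconds)

    def release(self):
        os.remove(self.lock_path)
```

This explains why deleting the stale lock file (as suggested further down in the thread) unblocks the build.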
@zhou13 zhou13 added the help wanted Extra attention is needed label Oct 18, 2019
@zhou13 (Owner) commented Oct 18, 2019

torch.utils.cpp_extension.load is the function that compiles the C++/CUDA code. With the provided information, I cannot see what the problem is. Do you have any CPU load while the program is stuck? Maybe you can test other PyTorch code that uses dynamic compilation. Or you could comment out

warnings.simplefilter("ignore")
to see if you could get more warnings.

@zhou13 (Owner) commented Nov 2, 2019

Feel free to reopen this issue if you have more clues and updates.

@zhou13 zhou13 closed this as completed Nov 2, 2019
@yashnsn commented Mar 12, 2021

Hello,
I was trying to run inference with the PyTorch version of the StyleGAN2 model and am hitting the same issue. Please help me out if you found a solution.

@Agrechka

> I was trying to infer stylegan2 pytorch version model but getting the same issue..

Did you fix the issue? I have the same problem. Thanks.

@KellyYutongHe

@Agrechka and @yashnsn, I found a solution if you still need it: go to your .cache directory and delete the lock file for your C++ extension (it is likely under ~/.cache/torch_extensions/something), and you should be able to run it again.

If you can't find your cache directory, you can run python -m pdb your_program.py, break at .../lib/python3.X/site-packages/torch/utils/cpp_extension.py line 1179 (specifically the line containing baton = FileBaton(os.path.join(build_directory, 'lock'))), and then print build_directory. That is the cache directory for your program.

Hope this helps!
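The cleanup described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the default ~/.cache/torch_extensions cache location; the remove_stale_locks helper is hypothetical, not part of PyTorch:

```python
import glob
import os

def remove_stale_locks(cache_root):
    """Delete every leftover 'lock' file under a torch_extensions cache dir
    and return the paths that were removed."""
    removed = []
    pattern = os.path.join(cache_root, "**", "lock")
    for lock in glob.glob(pattern, recursive=True):
        os.remove(lock)
        removed.append(lock)
    return removed

# Typical default location (an assumption; print build_directory under pdb
# as described above to confirm yours):
# remove_stale_locks(os.path.expanduser("~/.cache/torch_extensions"))
```

Removing only the lock files is gentler than deleting the whole cache directory, since the already-compiled extension objects are kept and only the stale baton is cleared.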

@SamchungHwang

I removed the .cache directory, but the same issue occurs.

@lzaazl commented Dec 15, 2021

Exactly the same issue as yashnsn and Agrechka. Thank you so much, @KellyYutongHe!

@biphasic

@KellyYutongHe you're a hero

@JANVI2411

> go to your .cache directory, delete the lock file for your cpp extension (it is likely under the directory ~/.cache/torch_extensions/something) …

@KellyYutongHe Thank you so much!! You saved me a lot of time.

@AndyJZhao

> go to your .cache directory, delete the lock file for your cpp extension (it is likely under the directory ~/.cache/torch_extensions/something) …

Thanks for the great answer. Also, for those who have difficulty finding what the "something" is in ~/.cache/torch_extensions/something: I found it useful to evaluate the expression os.path.join(build_directory, 'lock') in a remote debug session (I use PyCharm remote debugging), which prints the full path. For me, the "something" happened to be spmm_0, so rm -rf ~/.cache/torch_extensions/spmm_0 fixed the problem.
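If attaching a debugger is inconvenient, the "something" directory can also be found by scanning the cache for leftover lock files. A small sketch, again assuming the default ~/.cache/torch_extensions location; the find_locked_extensions helper is hypothetical:

```python
import os

def find_locked_extensions(cache_root):
    """Return the names of extension build dirs that still hold a 'lock' file."""
    if not os.path.isdir(cache_root):
        return []
    return sorted(
        name
        for name in os.listdir(cache_root)
        if os.path.exists(os.path.join(cache_root, name, "lock"))
    )

# e.g. find_locked_extensions(os.path.expanduser("~/.cache/torch_extensions"))
# would list directories such as "spmm_0" that still hold a stale lock,
# telling you exactly which one to remove.
```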

@hhuang-code

> go to your .cache directory, delete the lock file for your cpp extension (it is likely under the directory ~/.cache/torch_extensions/something) …

It works!
