No module named cpp_extension #67
0.3.1 is way too old. Please install PyTorch master branch > 0.5.0
The versions of Python and torch are updated to 3.6 and 0.4.0 respectively. Following the link you provided https://www.claudiokuenzler.com/blog/756/install-newer-ninja-build-tools-ubuntu-14.04-trusty#.WxYrvFMvzJw, I installed ninja 1.8.2. However, when I run the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package again, I get another error. How can I solve it? I believe your papers and code can get me interested in semantic segmentation tasks.

root@hh-Z97X-UD3H:/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master# python quick_demo.py
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
This package depends on a slightly higher version than PyTorch 0.4.0. Please follow the instructions to install PyTorch from source: https://github.com/pytorch/pytorch#from-source
In your paper, the sentence ''The ground truth labels for SE-loss are generated by "unique" operation finding the categories presented in the given ground-truth segmentation mask.'' means that every input image has multiple labels. As far as I know, the binary cross entropy loss can handle binary-class or multi-class tasks rather than multi-label ones.
I didn't get the difference between multi-class and multi-label. Could you please explain in detail?
Multiclass classification means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.
The presence of the object categories is indeed a multi-label task. Each category is predicted independently using a binary prediction. I hope it can address your concern.
Please refer to the docs for binary cross entropy loss: https://pytorch.org/docs/stable/nn.html?highlight=bceloss#torch.nn.BCELoss
In binary classification, the number of classes equals 2. The object categories in an input image number more than 2 (Figure 2 in the paper). So I don't understand why binary cross entropy loss is employed and why ''Each category is predicted independently using a binary prediction.''
Each category is a binary classification problem. For 150 categories, there are 150 individual binary classification problems. I hope this explanation is clear enough. If you still have difficulties, feel free to ask questions in Chinese.
Thank you for your patience. Your explanation is clear. The binary cross entropy loss can handle the multi-label classification task; its target is something like [1, 0, 0, 1, 0, ...]. Sigmoid, unlike softmax, does not give a probability distribution over NCLASS as output, but independent probabilities.
You're welcome. That is correct.
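The discussion above can be sketched in plain Python. This is only an illustration of the multi-label idea, not the repository's actual loss code: `multilabel_bce` and the example logits are made up, and each category gets an independent sigmoid probability with its own binary cross entropy term.

```python
import math

def sigmoid(z):
    # Independent probability per category; no softmax over categories.
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Binary cross entropy averaged over independent per-category predictions.

    The target is a 0/1 vector such as [1, 0, 0, 1], not a single class
    index: each entry is its own binary classification problem.
    """
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss / len(logits)

# A confident correct prediction yields a small loss; a confident wrong one is large.
good = multilabel_bce([4.0, -4.0, -4.0, 4.0], [1, 0, 0, 1])
bad = multilabel_bce([-4.0, 4.0, 4.0, -4.0], [1, 0, 0, 1])
```

In practice PyTorch's `BCELoss` (or `BCEWithLogitsLoss`, which fuses the sigmoid) computes the same quantity over tensors.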
I am really sorry for disturbing you again. I shouldn't ask a question about installing PyTorch from source here, but I have no idea how to solve it. Can you help me figure it out?

System Info: How you installed PyTorch (conda, pip, source): source

3 errors detected in the compilation of "/tmp/tmpxft_00002a14_00000000-7_THCTensorMath.cpp1.ii".
make[2]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/caffe2_gpu_generated_THCTensorMath.cu.o] Error 1
Try installing the dependencies as follows first:
You may want to ask on the PyTorch repo for further help.
Are the models you released (model_zoo.py) all trained with two Context Encoding Modules? Can you detail the MS evaluation in Table 1?
We only use one Context Encoding Module now, which is more efficient and makes the model compatible with EncNetV2.
Can Ubuntu, Mac, and Windows all run the released code?
It mainly depends on PyTorch. If PyTorch compiles successfully on your system, there won't be a problem. I am using both Mac and Ubuntu. Note that the PyTorch master branch is required.
Does the command (e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss --backbone resnet101) for training the model mean training resnet101 from scratch or finetuning resnet101?
resnet101 is pretrained on ImageNet.
I used the command (CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss) to train the resnet50 model. However, when it reached epoch 12, I stopped it. When I restarted it, I unluckily found that it ran from epoch 0 rather than epoch 12. What should I do to run it from epoch 12?
Please resume by adding the command:
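The exact option was elided above; in general, resuming means pointing train.py at a saved checkpoint so training continues from the stored epoch. A minimal, hypothetical sketch of that pattern (the `save_checkpoint`/`start_epoch` helpers and the pickle format are illustrative only; the repository saves real PyTorch checkpoints with model and optimizer state):

```python
import os
import pickle
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "checkpoint_demo.pkl")

def save_checkpoint(epoch, state):
    # Persist enough to continue training: the epoch counter plus
    # (in real code) model weights and optimizer state.
    with open(CKPT, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)

def start_epoch(resume=None):
    # With a resume path, continue from the saved epoch instead of 0.
    if resume and os.path.exists(resume):
        with open(resume, "rb") as f:
            return pickle.load(f)["epoch"] + 1
    return 0

# Training was stopped during epoch 12 (0-indexed 11), so epoch 11 is the
# last fully saved one; resuming should start at epoch 12.
save_checkpoint(11, state={"weights": "..."})
```

Without a resume path, `start_epoch()` returns 0, which matches the restart-from-scratch behavior observed in the question.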
Thank you. I have another question: when will PyTorch 0.4.0 meet the requirements for running the released code?
This package won't be compatible with PyTroch 0.4.0, but it will be compatible with next stable release. |
Question about the selayer: why does the selayer have no sigmoid activation function?
(encmodule): EncModule(
That is the prediction layer for minimizing the SE-Loss.
Sorry for bothering you again. I have no idea about the following errors when I run CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset pcontext --model encnet --aux --se-loss. OS: Ubuntu 14.04

/usr/local/anaconda3/bin/python3.6 /media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py --dataset PContext --model EncNet --se-loss
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"
4 errors detected in the compilation of "/tmp/tmpxft_0000662c_00000000-7_roi_align_kernel.cpp1.ii".
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(341): error: class "at::Context" has no member "getCurrentCUDAStream"
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(364): error: class "at::Context" has no member "getCurrentCUDAStream"
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(391): error: class "at::Context" has no member "getCurrentCUDAStream"
4 errors detected in the compilation of "/tmp/tmpxft_00006623_00000000-7_encoding_kernel.cpp1.ii".
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(217): error: class "at::Context" has no member "getCurrentCUDAStream"
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(249): error: class "at::Context" has no member "getCurrentCUDAStream"
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(272): error: class "at::Context" has no member "getCurrentCUDAStream"
4 errors detected in the compilation of "/tmp/tmpxft_00006627_00000000-7_syncbn_kernel.cpp1.ii".
Process finished with exit code 1
Hi, that is because of PyTorch updates in the backend. This will be fixed in the next version.
Thanks for your attention. It does work! However, three warnings occur; does that matter?
The deprecation warnings are okay for now.
Problem with debugging the backward method of the Function class. Hi, aggregate(A, X, C) and scaledL2(X, C, S) in encoding.functions.encoding.py implement the forward and backward of your custom functions. I want to debug their forward and backward. The pycharm-community-2018.1.4 I use on Ubuntu 16.04 LTS, with two 1080 GPUs, has allowed me to debug the forward step by step. However, I could not debug the backward function the same way as the forward.
You can directly call the backend function for debugging: https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/functions/encoding.py#L77
For my special case, I want to run the code with one GPU (PS: my machine is equipped with 2 GPUs), for example when debugging the code, etc.
CUDA_VISIBLE_DEVICES=0 python train.py ...
Question 1: Connected to pydev debugger (build 181.5087.37)
Question 2:
Question 3:
I am looking forward to your reply.
Q1: please use the terminal to launch the program.
For Q2 above: due to limited GPU memory, the batch size unfortunately has to be small (typically less than 16). Does that mean I have to use a smaller LR according to the standard setting, i.e., args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size?
Yes. If the batch size is too small, the model will get worse results, because the working batch size for batch normalization is small.
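The linear-scaling rule from the question can be sketched as follows. The `lrs` values here are placeholders for illustration, not the repository's actual base learning rates:

```python
# Hypothetical per-dataset base LRs, mirroring the role of the lrs dict in train.py.
lrs = {"pcontext": 0.001, "ade20k": 0.01}

def scaled_lr(dataset, batch_size, reference_batch=16):
    """Linear scaling rule: shrink the LR proportionally when the batch
    is smaller than the reference batch size the base LR was tuned for."""
    return lrs[dataset.lower()] / reference_batch * batch_size
```

So halving the batch size from 16 to 8 halves the learning rate; the rule keeps the per-epoch update magnitude roughly comparable, but it cannot compensate for the noisier batch-norm statistics of small batches.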
I only have two 1080 GPUs with a total of 16G memory, so the batch size is less than 16 in my experiments. Can I alleviate this side effect (the worse results you mentioned) by using a larger LR and setting args.lr = lrs[args.dataset.lower()], independent of batch size?
The batch size matters for the segmentation task, due to the working batch size for the Synchronized Batch Normalization. A batch size of 16 yields the best performance.
What is the main difference between encoding.nn.BatchNorm1d and encoding.nn.BatchNorm2d?
Same as
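Assuming encoding.nn mirrors torch.nn here, the practical difference is the expected input shape: BatchNorm1d works on (N, C) or (N, C, L) inputs, while BatchNorm2d works on (N, C, H, W) inputs; both normalize per channel. A plain-Python sketch of where each variant takes its statistics from (illustrative helpers, not the library's code):

```python
def channel_mean_2d(x):
    """Per-channel mean for BatchNorm1d-style input of shape (N, C):
    statistics are taken over the batch dimension N only."""
    n, c = len(x), len(x[0])
    return [sum(x[i][j] for i in range(n)) / n for j in range(c)]

def channel_mean_4d(x):
    """Per-channel mean for BatchNorm2d-style input of shape (N, C, H, W):
    statistics are taken over N, H, and W together."""
    n, c = len(x), len(x[0])
    h, w = len(x[0][0]), len(x[0][0][0])
    return [
        sum(x[i][j][y][z] for i in range(n) for y in range(h) for z in range(w))
        / (n * h * w)
        for j in range(c)
    ]
```

The normalization formula is identical; only the set of elements reduced over per channel changes with the input rank.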
I have two questions.
I'm sorry for disturbing you again.
We change the LR every iteration.
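Per-iteration LR updates in segmentation setups of this kind are commonly done with a polynomial ("poly") decay; the sketch below assumes that schedule (the power of 0.9 is a common choice for it, not confirmed by this thread):

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Polynomial LR decay evaluated at every iteration rather than every
    epoch, so the LR shrinks smoothly instead of in per-epoch steps."""
    return base_lr * (1 - cur_iter / max_iter) ** power
```

At iteration 0 the schedule returns the base LR, and it decays monotonically to 0 at `max_iter`.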
Question 1:
Question 2:
Q1: For the VOC experiments, the model is first pretrained on COCO, then finetuned on "pascal_aug", and finally on "pascal_voc". I am releasing the training details for reproducing the VOC experiments this weekend.
Question 1:
There are some bugs in the existing code. I will update them soon.
Question 1:
The validation during training uses a center crop, only for monitoring the training process.
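A center crop simply takes a fixed-size window around the image center. A minimal sketch on a plain nested-list "image" (an illustrative helper, not the repository's transform code):

```python
def center_crop(img, crop_h, crop_w):
    """Crop a (H, W) grid of pixel values around its center."""
    h, w = len(img), len(img[0])
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

# A 4x4 "image" of pixel indices; its 2x2 center crop keeps the middle values.
img = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
```

Because it discards the borders, center-crop validation is cheap but only an approximation; full evaluation typically uses the whole image, often at multiple scales.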
Hi, I got the error No module named cpp_extension (from torch.utils.cpp_extension import load) when I ran the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package. The versions of Python and torch are 2.7 and 0.3.1 respectively. How can I handle it?