
No module named cpp_extension #67

Closed
qiulesun opened this issue Jun 9, 2018 · 51 comments

@qiulesun

qiulesun commented Jun 9, 2018

Hi, I got the error No module named cpp_extension (from torch.utils.cpp_extension import load) when I ran the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package. My Python and PyTorch versions are 2.7 and 0.3.1, respectively. How can I fix this?

@zhanghang1989
Owner

0.3.1 is way too old. Please install PyTorch master branch > 0.5.0

@qiulesun
Author

I have updated Python and PyTorch to 3.6 and 0.4.0, respectively. Following the link you provided, https://www.claudiokuenzler.com/blog/756/install-newer-ninja-build-tools-ubuntu-14.04-trusty#.WxYrvFMvzJw, I installed ninja 1.8.2. However, when I run the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package again, I get another error. How can I solve it? Your papers and code have really got me interested in semantic segmentation tasks.

root@hh-Z97X-UD3H:/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master# python quick_demo.py
Traceback (most recent call last):
File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
File "/usr/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/usr/anaconda3/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "demo.py", line 2, in
import encoding
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/init.py", line 13, in
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/nn/init.py", line 12, in
from .encoding import *
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/nn/encoding.py", line 18, in
from ..functions import scaledL2, aggregate
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/functions/init.py", line 2, in
from .encoding import *
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/functions/encoding.py", line 13, in
from .. import lib
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/init.py", line 12, in
], build_directory=cpu_path, verbose=False)
File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
_build_extension_module(name, build_directory)
File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
name, error.output.decode()))
RuntimeError: Error building extension 'enclib_cpu': [1/2] c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/anaconda3/include/python3.6m -fPIC -std=c++11 -c /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o
FAILED: roi_align_cpu.o
c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/anaconda3/include/python3.6m -fPIC -std=c++11 -c /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o
In file included from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/ArrayRef.h:18:0,
from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/ScalarType.h:5,
from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Scalar.h:11,
from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/ATen.h:6,
from /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:1:
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp: In function ‘at::Tensor ROIAlignForwardCPU(const at::Tensor&, const at::Tensor&, int64_t, int64_t, double, int64_t)’:
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:281:18: error: expected primary-expression before ‘(’ token
throw at::Error({__func__, __FILE__, __LINE__}, __VA_ARGS__)
^
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:285:5: note: in expansion of macro ‘AT_ERROR’
AT_ERROR(__VA_ARGS__);
^
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:388:3: note: in expansion of macro ‘AT_ASSERT’
AT_ASSERT(input.is_contiguous());
^
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:281:62: error: expected primary-expression before ‘)’ token
throw at::Error({__func__, __FILE__, __LINE__}, __VA_ARGS__)
^
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:285:5: note: in expansion of macro ‘AT_ERROR’
AT_ERROR(__VA_ARGS__);
^
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:388:3: note: in expansion of macro ‘AT_ASSERT’
AT_ASSERT(input.is_contiguous());
^
[the same pair of ‘expected primary-expression’ errors from ATen/Error.h:281, expanded through AT_ERROR at Error.h:285, is repeated for every remaining AT_ASSERT in roi_align_cpu.cpp:
line 389: AT_ASSERT(bottom_rois.is_contiguous());
line 390: AT_ASSERT(input.ndimension() == 4);
line 391: AT_ASSERT(bottom_rois.ndimension() == 2);
line 392: AT_ASSERT(bottom_rois.size(1) == 5);
line 404: AT_ASSERT(roi_cols == 4 || roi_cols == 5);
line 409: AT_ASSERT(input.is_contiguous());
line 410: AT_ASSERT(bottom_rois.is_contiguous());
and, in function ‘at::Tensor ROIAlignBackwardCPU(const at::Tensor&, const at::Tensor&, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, double, int64_t)’:
line 444: AT_ASSERT(bottom_rois.is_contiguous());
line 445: AT_ASSERT(bottom_rois.ndimension() == 2);
line 446: AT_ASSERT(bottom_rois.size(1) == 5);
line 451: AT_ASSERT(roi_cols == 4 || roi_cols == 5);
line 456: AT_ASSERT(bottom_rois.is_contiguous());]
ninja: build stopped: subcommand failed.

@zhanghang1989
Owner

This package depends on a slightly newer version than PyTorch 0.4.0. Please follow the instructions to install PyTorch from source: https://github.com/pytorch/pytorch#from-source

@qiulesun
Author

In your paper, the sentence ''The ground truth labels for SE-loss are generated by “unique” operation finding the categories presented in the given ground-truth segmentation mask'' means that every input image has multiple labels. As far as I know, the binary cross entropy loss can handle binary or multi-class tasks rather than multi-label ones.

@zhanghang1989
Owner

zhanghang1989 commented Jun 19, 2018

I didn’t get the difference between multi-class and multi-label. Could you please explain in detail?
Btw, the NN already has a sigmoid activation.

@qiulesun
Author

Multiclass classification means a classification task with more than two classes; e.g., classifying a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.
Multilabel classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time, or none of these.
I note that the NN has a sigmoid activation. My question is whether, in your case, an input image has multiple labels or just one.

@zhanghang1989
Owner

The presence of the object categories is indeed a multi-label task. Each category is predicted independently using a binary prediction. I hope this addresses your concern.

@zhanghang1989
Owner

zhanghang1989 commented Jun 20, 2018

Please refer to the docs for binary cross entropy loss https://pytorch.org/docs/stable/nn.html?highlight=bceloss#torch.nn.BCELoss

@qiulesun
Author

qiulesun commented Jun 20, 2018

In binary classification, the number of classes equals 2. The object categories in an input image number more than 2 (Figure 2 in the paper). So I don't understand why binary cross entropy loss is employed and why ''Each category is predicted independently using a binary prediction.''

@zhanghang1989
Owner

Each category is a binary classification problem. For 150 categories, there are 150 individual binary classification problems. I hope this explanation is clear enough. If you still have difficulties, feel free to ask questions in Chinese.
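For concreteness, here is a minimal PyTorch sketch of this multi-label setup (illustrative only, not the repository's actual code): the SE-loss target marks which categories appear in a ground-truth mask, and each category gets an independent sigmoid + binary cross entropy term.

import torch
import torch.nn as nn

nclass = 150                                   # e.g. ADE20K; number only for illustration
batch, h, w = 2, 8, 8
seg_target = torch.randint(0, nclass, (batch, h, w))   # per-pixel class indices

# SE-loss target: a 0/1 vector per image marking which categories are present
# (the "unique" operation over the ground-truth segmentation mask)
se_target = torch.zeros(batch, nclass)
for i in range(batch):
    se_target[i, torch.unique(seg_target[i])] = 1.0

se_logits = torch.randn(batch, nclass)         # raw scores from the SE head, one per category
se_loss = nn.BCELoss()(torch.sigmoid(se_logits), se_target)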

@qiulesun
Author

Thank you for your patience. Your explanation is clear. The binary cross entropy loss can handle the multi-label classification task; its target is something like [1,0,0,1,0...]. Sigmoid, unlike softmax, doesn't give a probability distribution over the NCLASS outputs, but independent per-class probabilities.

@zhanghang1989
Owner

You’re welcome. That is correct.

@qiulesun
Author

qiulesun commented Jun 24, 2018

I am really sorry for disturbing you again. I probably shouldn't ask a question about installing PyTorch from source here, but I have no idea how to solve it. Can you help me fix it?

System Info:

How you installed PyTorch (conda, pip, source): source
Build command you used (if compiling from source): python setup.py install
OS: ubuntu14.04
PyTorch version: master
Python version: 3.6
CUDA/cuDNN version: cuda8.0+cudnn5.0
GPU models and configuration: GTX1080Ti
GCC version (if compiling from source): 4.9.4
CMake version: 3.7.2
############################################################
Issue description:

3 errors detected in the compilation of "/tmp/tmpxft_00002a14_00000000-7_THCTensorMath.cpp1.ii".
CMake Error at caffe2_gpu_generated_THCTensorMath.cu.o.Release.cmake:279 (message):
Error generating file
/media/hh/pytorch_dir/pytorch/build/caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/./caffe2_gpu_generated_THCTensorMath.cu.o

make[2]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/caffe2_gpu_generated_THCTensorMath.cu.o] Error 1
make[1]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/all] Error 2
make: *** [all] Error 2
Failed to run 'bash tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-mkldnn nccl caffe2 nanopb libshm gloo THD c10d'

@zhanghang1989
Owner

Try installing the dependencies as follows first:

export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]

# Install basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn

# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9

You may want to ask on the PyTorch repo for further help.

@qiulesun
Author

qiulesun commented Jun 26, 2018

Are the models you released (model_zoo.py) all trained with two Context Encoding Modules? Can you detail the MS evaluation in Table 1?

models = {
    'encnet_resnet50_pcontext': get_encnet_resnet50_pcontext,
    'encnet_resnet101_pcontext': get_encnet_resnet101_pcontext,
    'encnet_resnet50_ade': get_encnet_resnet50_ade,
}

@zhanghang1989
Owner

We only use one Context Encoding Module now, which is more efficient and makes the model compatible with EncNetV2.

@qiulesun
Author

qiulesun commented Jul 1, 2018

Can Ubuntu, Mac, and Windows all run the released code?

@zhanghang1989
Owner

It mainly depends on PyTorch. If PyTorch compiles successfully on your system, there won't be a problem. I am using both Mac and Ubuntu. Note that the PyTorch master branch is required.

@qiulesun
Author

qiulesun commented Jul 3, 2018

Does the command for training the model (e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss --backbone resnet101) train resnet101 from scratch or finetune resnet101?

@zhanghang1989
Owner

resnet101 is pretrained on ImageNet.

@qiulesun
Author

qiulesun commented Jul 3, 2018

I used the command (CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss) to train the resnet50 model. However, when it reached epoch 12, I stopped it. When I restarted it, I unfortunately found it running from epoch 0 rather than epoch 12. What should I do to resume it from epoch 12?

@zhanghang1989
Owner

Please resume by adding the option --resume path/to/checkpoint.pth.tar

@qiulesun
Author

qiulesun commented Jul 6, 2018

Thank you. I have another question. When will PyTorch 0.4.0 meet the requirements for running the released code?

@zhanghang1989
Owner

This package won't be compatible with PyTorch 0.4.0, but it will be compatible with the next stable release.

@qiulesun
Author

A question about the selayer: why does the selayer have no sigmoid activation function?

(encmodule): EncModule(
  (encoding): Sequential(
    (0): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Encoding(N x 512=>32x512)
    (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Mean()
  )
  (fc): Sequential(
    (0): Linear(in_features=512, out_features=512, bias=True)
    (1): Sigmoid()
  )
  (selayer): Linear(in_features=512, out_features=59, bias=True)
)

@zhanghang1989
Owner

zhanghang1989 commented Jul 13, 2018

That is the prediction layer for minimizing the SE-Loss.
The sigmoid function is applied during the loss calculation: https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/customize.py#L65
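In other words (a minimal sketch, not the repository's exact code), the selayer can stay a plain Linear layer because the sigmoid lives in the loss:

import torch
import torch.nn as nn

se_out = torch.randn(4, 59)                      # raw selayer output (59 classes, as printed above)
se_target = torch.randint(0, 2, (4, 59)).float() # multi-label 0/1 targets

loss_a = nn.BCELoss()(torch.sigmoid(se_out), se_target)   # explicit sigmoid, roughly as in the linked customize.py
loss_b = nn.BCEWithLogitsLoss()(se_out, se_target)        # numerically stabler fused alternative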

@qiulesun
Author

qiulesun commented Aug 3, 2018

Sorry for bothering you again. I have no idea how to handle the following errors when I run CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset pcontext --model encnet --aux --se-loss.
And import encoding gives similar errors.

OS: ubuntu14.04
Pytorch version: 0.5.0 (from source)
Python version: 3.6
CUDA: 8.0
cudnn: 6.0.21
GPU: 2 1080

/usr/local/anaconda3/bin/python3.6 /media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py --dataset PContext --model EncNet --se-loss
——————————————————————————————————————————————
Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 742, in _build_extension_module
['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py", line 17, in
import encoding.utils as utils
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/init.py", line 13, in
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/nn/init.py", line 12, in
from .encoding import *
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in
from ..functions import scaledL2, aggregate, pairwise_cosine
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/functions/init.py", line 2, in
from .encoding import *
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in
from .. import lib
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/init.py", line 20, in
], build_directory=gpu_path, verbose=False)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 496, in load
with_cuda=with_cuda)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 664, in _jit_compile
_build_extension_module(name, build_directory)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 748, in _build_extension_module
name, error.output.decode()))
RuntimeError: Error building extension 'enclib_gpu': [1/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
FAILED: roi_align_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_0000662c_00000000-7_roi_align_kernel.cpp1.ii".
[2/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
FAILED: encoding_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(315): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(341): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(364): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(391): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_00006623_00000000-7_encoding_kernel.cpp1.ii".
[3/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
FAILED: syncbn_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(183): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(217): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(249): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(272): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_00006627_00000000-7_syncbn_kernel.cpp1.ii".
ninja: build stopped: subcommand failed.

Process finished with exit code 1

@zhanghang1989
Owner

Hi, that is because of PyTorch updates in the backend.

  1. Could you change the at::Context::getCurrentCUDAStream() calls to cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  2. Also add #include <ATen/cuda/CUDAContext.h>

This will be fixed in the next version.

@qiulesun
Author

qiulesun commented Aug 4, 2018

Thanks for your attention. It does work! However, three warnings occur; do they matter?

  1. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:1940:
    UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")

  2. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:1025:
    UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
    warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

  3. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:52:
    UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
    warnings.warn(warning.format(ret))

@zhanghang1989
Owner

The deprecation warnings are okay for now.

@qiulesun
Author

qiulesun commented Aug 7, 2018

Problem with debugging the backward method of a Function class

Hi, aggregate(A, X, C) and scaledL2(X, C, S) in encoding.functions.encoding.py implement the forward and backward of your custom functions. I want to debug both; pycharm-community-2018.1.4 on Ubuntu 16.04 LTS lets me step through the forward, but I cannot debug the backward the same way on my machine with 2 1080 GPUs.
Could you tell me whether that is possible and how to do it? (PS: I face the same problem with my own custom functions based on your code.)

@zhanghang1989
Owner

You can directly call the backend function for debugging https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/functions/encoding.py#L77

@qiulesun
Author

qiulesun commented Aug 8, 2018

For my particular case, I want to run the code with one GPU (my machine is equipped with 2 GPUs), for example when debugging the code.
Does the code support single-GPU operation even if the machine is equipped with 2 GPUs?
Is multi-GPU running the default if the machine is equipped with multiple GPUs?

@zhanghang1989
Owner

CUDA_VISIBLE_DEVICES=0 python train.py ...
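If you cannot pass the environment variable on the command line (e.g. from an IDE), an equivalent workaround is to set it from Python before torch initializes CUDA (a sketch, not part of the repo):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"       # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())               # now reports 1 even on a 2-GPU machine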

@qiulesun
Author

qiulesun commented Aug 9, 2018

Question 1
I use pycharm-community-2018.1.4 to make it easier to debug the code, and CUDA_VISIBLE_DEVICES=0 --dataset PContext --model EncNet --se-loss is given in the debug configuration.
However, I get the error train.py: error: unrecognized arguments: CUDA_VISIBLE_DEVICES=0
What should I do next to debug the code with a single GPU in pycharm-community-2018.1.4?

Connected to pydev debugger (build 181.5087.37)
usage: train.py [-h] [--model MODEL] [--backbone BACKBONE] [--dataset DATASET]
[--data-folder DATA_FOLDER] [--workers N] [--aux] [--se-loss]
[--epochs N] [--start_epoch N] [--batch-size N]
[--test-batch-size N] [--lr LR] [--lr-scheduler LR_SCHEDULER]
[--momentum M] [--weight-decay M] [--no-cuda] [--seed S]
[--resume RESUME] [--checkname CHECKNAME]
[--model-zoo MODEL_ZOO] [--ft] [--pre-class PRE_CLASS] [--ema]
[--eval] [--no-val] [--test-folder TEST_FOLDER]
train.py: error: unrecognized arguments: CUDA_VISIBLE_DEVICES=0

Question 2
args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size in option.py means that the LR is related to the batch size you use. Does that mean the LR is not fixed but depends on the batch size (GPU memory)?
In my experiments I set args.lr = lrs[args.dataset.lower()]; is that reasonable and feasible, and does it respect your paper and intentions?

Question 3
For multi-scale evaluation, line 27 of encoding/models/base.py sets base_size=576, crop_size=608 (base_size less than crop_size); should it be base_size=608, crop_size=576?
Previously you set base_size=520, crop_size=480, and now you changed them to base_size=576, crop_size=608. I hold the view that a crop_size smaller than base_size seems reasonable. Which settings should I follow to reproduce your results?

I am looking forward to your reply.

@zhanghang1989
Owner

Q1: please use the terminal to launch the program.
Q2: That is a fairly standard setting for the LR. When increasing the batch size, people typically increase the LR accordingly (see the sketch below).
Q3: That is a bug. It will be fixed in the next release.
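A sketch of the linear LR scaling used in option.py (the base LR values below are placeholders for illustration, not the repo's exact table):

base_lrs = {'pcontext': 0.001, 'ade20k': 0.01}    # placeholder per-dataset base LRs
batch_size = 8                                    # e.g. limited by GPU memory
lr = base_lrs['pcontext'] / 16 * batch_size       # base LR is defined for batch size 16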

@qiulesun
Author

qiulesun commented Aug 9, 2018

For Q2 above, due to limited GPU memory, the batch size unfortunately has to be small (typically less than 16). Does that mean I have to use a smaller LR according to the standard setting, i.e., args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size?

@zhanghang1989
Owner

Yes. If the batch size is too small, the model will get worse results, because the working batch size for batch normalization is small.

@qiulesun
Author

qiulesun commented Aug 10, 2018

I only have two 1080 GPUs with a total of 16 GB of memory, so the batch size in my experiments is smaller than 16. Can I alleviate this side effect (the worse results you mentioned) by using a larger LR, i.e., setting args.lr = lrs[args.dataset.lower()] independent of the batch size?

@zhanghang1989
Owner

The batch size matters for the segmentation task because of the working batch size of the Synchronized Batch Normalization. A batch size of 16 yields the best performance.

@qiulesun
Author

qiulesun commented Aug 12, 2018

What is the main difference between encoding.nn.BatchNorm1d and encoding.nn.BatchNorm2d?

@zhanghang1989
Owner

Same as the difference between torch.nn.BatchNorm1d and torch.nn.BatchNorm2d.
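That is, the interfaces match the torch.nn versions; a quick sketch of the expected input shapes (using torch.nn here for illustration; the encoding.nn versions additionally synchronize statistics across GPUs):

import torch
import torch.nn as nn

bn1d = nn.BatchNorm1d(32)                       # expects (N, C) or (N, C, L) inputs
bn2d = nn.BatchNorm2d(512)                      # expects (N, C, H, W) inputs
print(bn1d(torch.randn(4, 32)).shape)           # torch.Size([4, 32])
print(bn2d(torch.randn(4, 512, 8, 8)).shape)    # torch.Size([4, 512, 8, 8])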

@qiulesun
Author

qiulesun commented Aug 24, 2018

I have two questions.
(1) For the cos and poly LR schedules, every batch (iteration) has a different LR, rather than all iterations in one epoch sharing the same LR. Is that right?
(2) For CIFAR-10 recognition, the scaling factor s_k is not learned but randomly sampled from a uniform distribution between 0 and 1, which is different from the segmentation tasks. Is that right?

@qiulesun
Author

qiulesun commented Sep 19, 2018

I'm sorry for disturbing you again.
Your work is very encouraging to me. I notice that the scaled_l2 and aggregate operators of the proposed encoding layer are implemented in C++. Since I am not good at C++, could you share the corresponding implementation in Python code, if you are willing?

@zhanghang1989
Owner

We change the LR every iteration.
The CIFAR experiment uses a shake-out-like regularization.
Scaled L2 and aggregate are easy to implement in Python, but that would be memory-consuming (see the sketch below).
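For reference, a pure-PyTorch sketch of what scaled_l2 and aggregate compute, following the definitions in the paper (the explicit (B, N, K, D) residual tensor below is exactly what makes a naive Python version memory-consuming, hence the custom CUDA kernels):

import torch

def scaled_l2(X, C, S):
    # X: (B, N, D) features, C: (K, D) codewords, S: (K,) smoothing factors
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)        # (B, N, K, D) residuals
    return S.view(1, 1, -1) * resid.pow(2).sum(dim=3)      # (B, N, K) scaled squared distances

def aggregate(A, X, C):
    # A: (B, N, K) assignment weights -> (B, K, D) aggregated residual encodings
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)
    return (A.unsqueeze(3) * resid).sum(dim=1)

B, N, D, K = 2, 64, 512, 32
X, C, S = torch.randn(B, N, D), torch.randn(K, D), torch.randn(K)
A = torch.softmax(scaled_l2(X, C, S), dim=2)               # assignment weights
E = aggregate(A, X, C)                                     # torch.Size([2, 32, 512])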

@qiulesun
Author

qiulesun commented Sep 21, 2018

Question 1:
Sorry to ask a stupid question.
The augmented PASCAL VOC 2012 has 11533 images in trainval.txt rather than the 10582 used in the paper, which confuses me. I also don't see how the 1464 training images of PASCAL VOC 2012 are augmented to obtain 10582; in other words, I don't understand the relationship between PASCAL VOC 2012 and its augmented version. Could I ask for your take on this?
If you think this question is not worth answering, I completely understand.

Question 2:
As far as I know, Group Norm (https://arxiv.org/pdf/1803.08494.pdf) is independent of batch size and therefore well suited for semantic segmentation, which is limited to small batches by memory consumption.
Could you consider employing it in your updated version?

@zhanghang1989
Copy link
Owner

Q1. For the VOC experiments, we first pretrain on COCO, then finetune on "pascal_aug", and finally on "pascal_voc". I am releasing the training details for reproducing the VOC experiments this weekend.
Q2. Group Norm still has inferior performance compared to BN. You can easily use it by changing the code a little bit.
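For example, a hypothetical swap (not part of the repo; 32 groups is the common choice from the GN paper):

import torch.nn as nn

norm_bn = nn.BatchNorm2d(512)                              # batch-statistics based, needs a large working batch
norm_gn = nn.GroupNorm(num_groups=32, num_channels=512)    # batch-size-independent alternative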

@qiulesun
Author

qiulesun commented Sep 24, 2018

Question 1:
I see base_size=608 and crop_size=576 in the training log of EncNet_ResNet50_ADE (https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/logs/encnet_resnet50_ade.log); however, base_size and crop_size are set to 520 and 480, respectively, in https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/datasets/base.py#L17.
This confuses me. Is base_size=608 and crop_size=576 a special case for ADE20K, with base_size=520 and crop_size=480 used for PASCAL Context and PASCAL VOC12?
Question 2:
Besides, is base_size=576 and crop_size=608 in https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/models/base.py#L27 only for multi-scale testing?

@zhanghang1989
Owner

There are some bugs in the existing code. I will update them soon.

@qiulesun
Author

qiulesun commented Sep 26, 2018

Question 1:
As mentioned above, there are some bugs in the existing code, but I still have a question.
EncNet_ResNet50_ADE achieves 79.9 pixAcc and 41.2 mIoU in the last row of the table (https://hangzhang.org/PyTorch-Encoding/experiments/segmentation.html); however, from the training log file (https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/logs/encnet_resnet50_ade.log) I see that it obtains 78.0 pixAcc and 40.2 mIoU, lower than the results you reported.
Is this because you use a multi-scale testing strategy on the ADE20K val set, or something else?

@zhanghang1989
Copy link
Owner

The validation during training uses a center crop; it is only for monitoring the training process.
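For reference, multi-scale testing usually looks roughly like the following sketch (assuming the model returns a single per-class score map; this is not the repo's exact evaluator):

import torch
import torch.nn.functional as F

def multi_scale_predict(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5), flip=True):
    _, _, h, w = image.shape
    total = 0
    for s in scales:
        img_s = F.interpolate(image, scale_factor=s, mode='bilinear', align_corners=True)
        out = model(img_s)
        if flip:
            out = out + torch.flip(model(torch.flip(img_s, dims=[3])), dims=[3])
        total = total + F.interpolate(out, size=(h, w), mode='bilinear', align_corners=True)
    return total.argmax(dim=1)                  # final per-pixel prediction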
