
No module named cpp_extension #67

Closed
qiulesun opened this issue Jun 9, 2018 · 51 comments

@qiulesun

qiulesun commented Jun 9, 2018

Hi, I got the error No module named cpp_extension (from torch.utils.cpp_extension import load) when I ran the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package. My Python and PyTorch versions are 2.7 and 0.3.1, respectively. How can I fix this?

@zhanghang1989
Owner

0.3.1 is way too old. Please install PyTorch master branch > 0.5.0

@qiulesun
Author

I have updated Python and PyTorch to 3.6 and 0.4.0, respectively. Following the link you provided, https://www.claudiokuenzler.com/blog/756/install-newer-ninja-build-tools-ubuntu-14.04-trusty#.WxYrvFMvzJw, I installed ninja 1.8.2. However, when I run the quick demo http://hangzh.com/PyTorch-Encoding/experiments/segmentation.html#install-package again, I get another error. How can I solve it? Your papers and code have really got me interested in semantic segmentation tasks.

root@hh-Z97X-UD3H:/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master# python quick_demo.py
Traceback (most recent call last):
File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
File "/usr/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/usr/anaconda3/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "demo.py", line 2, in
import encoding
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/init.py", line 13, in
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/nn/init.py", line 12, in
from .encoding import *
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/nn/encoding.py", line 18, in
from ..functions import scaledL2, aggregate
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/functions/init.py", line 2, in
from .encoding import *
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/functions/encoding.py", line 13, in
from .. import lib
File "/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/init.py", line 12, in
], build_directory=cpu_path, verbose=False)
File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
_build_extension_module(name, build_directory)
File "/usr/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
name, error.output.decode()))
RuntimeError: Error building extension 'enclib_cpu': [1/2] c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/anaconda3/include/python3.6m -fPIC -std=c++11 -c /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o
FAILED: roi_align_cpu.o
c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/anaconda3/include/python3.6m -fPIC -std=c++11 -c /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o
In file included from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/ArrayRef.h:18:0,
from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/ScalarType.h:5,
from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Scalar.h:11,
from /usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/ATen.h:6,
from /media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:1:
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp: In function ‘at::Tensor ROIAlignForwardCPU(const at::Tensor&, const at::Tensor&, int64_t, int64_t, double, int64_t)’:
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:281:18: error: expected primary-expression before ‘(’ token
throw at::Error({__func__, __FILE__, __LINE__}, __VA_ARGS__)
^
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:285:5: note: in expansion of macro ‘AT_ERROR’
AT_ERROR(__VA_ARGS__);
^
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:388:3: note: in expansion of macro ‘AT_ASSERT’
AT_ASSERT(input.is_contiguous());
^
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:281:62: error: expected primary-expression before ‘)’ token
throw at::Error({__func__, __FILE__, __LINE__}, __VA_ARGS__)
^
/usr/anaconda3/lib/python3.6/site-packages/torch/lib/include/ATen/Error.h:285:5: note: in expansion of macro ‘AT_ERROR’
AT_ERROR(__VA_ARGS__);
^
/media/hh/0bfd0eaf-cf46-48b3-915a-aa317b67d9ec/PyTorch-Encoding/PyTorch-Encoding-master/encoding/lib/cpu/roi_align_cpu.cpp:388:3: note: in expansion of macro ‘AT_ASSERT’
AT_ASSERT(input.is_contiguous());
^
[the same pair of ‘expected primary-expression’ errors from ATen/Error.h:281, expanded through AT_ERROR at Error.h:285, is repeated for every remaining AT_ASSERT in roi_align_cpu.cpp:
line 389: AT_ASSERT(bottom_rois.is_contiguous());
line 390: AT_ASSERT(input.ndimension() == 4);
line 391: AT_ASSERT(bottom_rois.ndimension() == 2);
line 392: AT_ASSERT(bottom_rois.size(1) == 5);
line 404: AT_ASSERT(roi_cols == 4 || roi_cols == 5);
line 409: AT_ASSERT(input.is_contiguous());
line 410: AT_ASSERT(bottom_rois.is_contiguous());
and, in function ‘at::Tensor ROIAlignBackwardCPU(const at::Tensor&, const at::Tensor&, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, double, int64_t)’:
line 444: AT_ASSERT(bottom_rois.is_contiguous());
line 445: AT_ASSERT(bottom_rois.ndimension() == 2);
line 446: AT_ASSERT(bottom_rois.size(1) == 5);
line 451: AT_ASSERT(roi_cols == 4 || roi_cols == 5);
line 456: AT_ASSERT(bottom_rois.is_contiguous());]
ninja: build stopped: subcommand failed.

@zhanghang1989
Owner

This package depends on a slightly newer version than PyTorch 0.4.0. Please follow the instructions to install PyTorch from source: https://github.com/pytorch/pytorch#from-source

@qiulesun
Author

In your paper, the sentence ''The ground truth labels for SE-loss are generated by “unique” operation finding the categories presented in the given ground-truth segmentation mask'' means that every input image has multiple labels. As far as I know, the binary cross entropy loss can handle binary or multi-class tasks rather than multi-label ones.

@zhanghang1989
Owner

zhanghang1989 commented Jun 19, 2018

I didn’t get the difference between multi-class and multi-label. Could you please explain in detail?
Btw, the NN already has a sigmoid activation.

@qiulesun
Author

Multiclass classification means a classification task with more than two classes; e.g., classifying a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.
Multilabel classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time, or none of these.
I note that the NN has a sigmoid activation. My question is whether, in your case, an input image has multiple labels or just one.

@zhanghang1989
Owner

The presence of the object categories is indeed a multi-label task. Each category is predicted independently using a binary prediction. I hope this addresses your concern.

@zhanghang1989
Owner

zhanghang1989 commented Jun 20, 2018

Please refer to the docs for binary cross entropy loss https://pytorch.org/docs/stable/nn.html?highlight=bceloss#torch.nn.BCELoss

@qiulesun
Author

qiulesun commented Jun 20, 2018

In binary classification, the number of classes equals 2. The object categories in an input image number more than 2 (Figure 2 in the paper). So I don't understand why binary cross entropy loss is employed and why ''Each category is predicted independently using a binary prediction.''

@zhanghang1989
Owner

Each category is a binary classification problem. For 150 categories, there are 150 individual binary classification problems. I hope this explanation is clear enough. If you still have difficulties, feel free to ask questions in Chinese.
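For concreteness, here is a minimal PyTorch sketch of this multi-label setup (illustrative only, not the repository's actual code): the SE-loss target marks which categories appear in a ground-truth mask, and each category gets an independent sigmoid + binary cross entropy term.

import torch
import torch.nn as nn

nclass = 150                                   # e.g. ADE20K; number only for illustration
batch, h, w = 2, 8, 8
seg_target = torch.randint(0, nclass, (batch, h, w))   # per-pixel class indices

# SE-loss target: a 0/1 vector per image marking which categories are present
# (the "unique" operation over the ground-truth segmentation mask)
se_target = torch.zeros(batch, nclass)
for i in range(batch):
    se_target[i, torch.unique(seg_target[i])] = 1.0

se_logits = torch.randn(batch, nclass)         # raw scores from the SE head, one per category
se_loss = nn.BCELoss()(torch.sigmoid(se_logits), se_target)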

@qiulesun
Author

Thank you for your patience. Your explanation is clear. The binary cross entropy loss can handle the multi-label classification task; its target is something like [1,0,0,1,0...]. Sigmoid, unlike softmax, doesn't give a probability distribution over the NCLASS outputs, but independent per-class probabilities.

@zhanghang1989
Owner

You’re welcome. That is correct.

@qiulesun
Author

qiulesun commented Jun 24, 2018

I am really sorry for disturbing you again. I probably shouldn't ask a question about installing PyTorch from source here, but I have no idea how to solve it. Can you help me fix it?

System Info:

How you installed PyTorch (conda, pip, source): source
Build command you used (if compiling from source): python setup.py install
OS: ubuntu14.04
PyTorch version: master
Python version: 3.6
CUDA/cuDNN version: cuda8.0+cudnn5.0
GPU models and configuration: GTX1080Ti
GCC version (if compiling from source): 4.9.4
CMake version: 3.7.2
############################################################
Issue description:

3 errors detected in the compilation of "/tmp/tmpxft_00002a14_00000000-7_THCTensorMath.cpp1.ii".
CMake Error at caffe2_gpu_generated_THCTensorMath.cu.o.Release.cmake:279 (message):
Error generating file
/media/hh/pytorch_dir/pytorch/build/caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/./caffe2_gpu_generated_THCTensorMath.cu.o

make[2]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THC/caffe2_gpu_generated_THCTensorMath.cu.o] Error 1
make[1]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/all] Error 2
make: *** [all] Error 2
Failed to run 'bash tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-mkldnn nccl caffe2 nanopb libshm gloo THD c10d'

@zhanghang1989
Owner

Try installing the dependencies as follows first:

export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]

# Install basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn

# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9

You may want to ask on the PyTorch repo for further help.

@qiulesun
Author

qiulesun commented Jun 26, 2018

Are the models you released (model_zoo.py) all trained with two Context Encoding Modules? Can you detail the MS evaluation in Table 1?

models = {
    'encnet_resnet50_pcontext': get_encnet_resnet50_pcontext,
    'encnet_resnet101_pcontext': get_encnet_resnet101_pcontext,
    'encnet_resnet50_ade': get_encnet_resnet50_ade,
}

@zhanghang1989
Owner

We only use one Context Encoding Module now, which is more efficient and makes the model compatible with EncNetV2.

@qiulesun
Author

qiulesun commented Jul 1, 2018

Can Ubuntu, Mac, and Windows all run the released code?

@zhanghang1989
Owner

It mainly depends on PyTorch. If PyTorch compiles successfully on your system, there won't be a problem. I am using both Mac and Ubuntu. Note that the PyTorch master branch is required.

@qiulesun
Author

qiulesun commented Jul 3, 2018

Does the command for training the model (e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss --backbone resnet101) train resnet101 from scratch or finetune resnet101?

@zhanghang1989
Owner

resnet101 is pretrained on ImageNet.

@qiulesun
Author

qiulesun commented Jul 3, 2018

I used the command (CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss) to train the resnet50 model. However, when it reached epoch 12, I stopped it. When I restarted it, I unfortunately found it running from epoch 0 rather than epoch 12. What should I do to resume it from epoch 12?

@zhanghang1989
Owner

Please resume by adding the option --resume path/to/checkpoint.pth.tar

@qiulesun
Author

qiulesun commented Jul 6, 2018

Thank you. I have another question. When will PyTorch 0.4.0 meet the requirements for running the released code?

@zhanghang1989
Owner

This package won't be compatible with PyTorch 0.4.0, but it will be compatible with the next stable release.

@qiulesun
Author

A question about the selayer: why does the selayer have no sigmoid activation function?

(encmodule): EncModule(
  (encoding): Sequential(
    (0): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Encoding(N x 512=>32x512)
    (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Mean()
  )
  (fc): Sequential(
    (0): Linear(in_features=512, out_features=512, bias=True)
    (1): Sigmoid()
  )
  (selayer): Linear(in_features=512, out_features=59, bias=True)
)

@zhanghang1989
Owner

zhanghang1989 commented Jul 13, 2018

That is the prediction layer for minimizing the SE-Loss.
The sigmoid function is applied during the loss calculation: https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/nn/customize.py#L65
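In other words (a minimal sketch, not the repository's exact code), the selayer can stay a plain Linear layer because the sigmoid lives in the loss:

import torch
import torch.nn as nn

se_out = torch.randn(4, 59)                      # raw selayer output (59 classes, as printed above)
se_target = torch.randint(0, 2, (4, 59)).float() # multi-label 0/1 targets

loss_a = nn.BCELoss()(torch.sigmoid(se_out), se_target)   # explicit sigmoid, roughly as in the linked customize.py
loss_b = nn.BCEWithLogitsLoss()(se_out, se_target)        # numerically stabler fused alternative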

@qiulesun
Author

qiulesun commented Aug 3, 2018

Sorry for bothering you again. I have no idea how to handle the following errors when I run CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset pcontext --model encnet --aux --se-loss.
And import encoding gives similar errors.

OS: ubuntu14.04
Pytorch version: 0.5.0 (from source)
Python version: 3.6
CUDA: 8.0
cudnn: 6.0.21
GPU: 2 1080

/usr/local/anaconda3/bin/python3.6 /media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py --dataset PContext --model EncNet --se-loss
——————————————————————————————————————————————
Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 742, in _build_extension_module
['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/usr/local/anaconda3/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/media/cv-pc-00/QL_480G/sql/pytorch_dir/PyTorch-Encoding/experiments/segmentation/train.py", line 17, in
import encoding.utils as utils
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/init.py", line 13, in
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/nn/init.py", line 12, in
from .encoding import *
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in
from ..functions import scaledL2, aggregate, pairwise_cosine
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/functions/init.py", line 2, in
from .encoding import *
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in
from .. import lib
File "/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/init.py", line 20, in
], build_directory=gpu_path, verbose=False)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 496, in load
with_cuda=with_cuda)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 664, in _jit_compile
_build_extension_module(name, build_directory)
File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 748, in _build_extension_module
name, error.output.decode()))
RuntimeError: Error building extension 'enclib_gpu': [1/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
FAILED: roi_align_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu -o roi_align_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(373): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/roi_align_kernel.cu(420): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_0000662c_00000000-7_roi_align_kernel.cpp1.ii".
[2/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
FAILED: encoding_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(315): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(341): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(364): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/encoding_kernel.cu(391): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_00006623_00000000-7_encoding_kernel.cpp1.ii".
[3/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
FAILED: syncbn_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/usr/local/anaconda3/include/python3.6m --compiler-options '-fPIC' -std=c++11 -c /usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(183): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(217): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(249): error: class "at::Context" has no member "getCurrentCUDAStream"

/usr/local/anaconda3/lib/python3.6/site-packages/encoding/lib/gpu/syncbn_kernel.cu(272): error: class "at::Context" has no member "getCurrentCUDAStream"

4 errors detected in the compilation of "/tmp/tmpxft_00006627_00000000-7_syncbn_kernel.cpp1.ii".
ninja: build stopped: subcommand failed.

Process finished with exit code 1

@zhanghang1989
Owner

Hi, that is because of PyTorch updates in the backend.

  1. Could you change the at::Context::getCurrentCUDAStream() calls to cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  2. Also add #include <ATen/cuda/CUDAContext.h>

This will be fixed in the next version.

@qiulesun
Author

qiulesun commented Aug 4, 2018

Thanks for your attention. It does work! However, three warnings occur; do they matter?

  1. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:1940:
    UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")

  2. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:1025:
    UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
    warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")

  3. /usr/local/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py:52:
    UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
    warnings.warn(warning.format(ret))

@zhanghang1989
Owner

The deprecation warnings are okay for now.

@qiulesun
Author

qiulesun commented Aug 7, 2018

Problem with debugging the backward method of a Function class

Hi, aggregate(A, X, C) and scaledL2(X, C, S) in encoding.functions.encoding.py implement the forward and backward of your custom functions. I want to debug both; pycharm-community-2018.1.4 on Ubuntu 16.04 LTS lets me step through the forward, but I cannot debug the backward the same way on my machine with 2 1080 GPUs.
Could you tell me whether that is possible and how to do it? (PS: I face the same problem with my own custom functions based on your code.)

@zhanghang1989
Owner

You can directly call the backend function for debugging https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/functions/encoding.py#L77

@qiulesun
Author

qiulesun commented Aug 8, 2018

For my particular case, I want to run the code with one GPU (my machine is equipped with 2 GPUs), for example when debugging the code.
Does the code support single-GPU operation even if the machine is equipped with 2 GPUs?
Is multi-GPU running the default if the machine is equipped with multiple GPUs?

@zhanghang1989
Owner

CUDA_VISIBLE_DEVICES=0 python train.py ...
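If you cannot pass the environment variable on the command line (e.g. from an IDE), an equivalent workaround is to set it from Python before torch initializes CUDA (a sketch, not part of the repo):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"       # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())               # now reports 1 even on a 2-GPU machine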

@qiulesun
Author

qiulesun commented Aug 9, 2018

Question 1
I use pycharm-community-2018.1.4 to make it easier to debug the code, and CUDA_VISIBLE_DEVICES=0 --dataset PContext --model EncNet --se-loss is given in the debug configuration.
However, I get the error train.py: error: unrecognized arguments: CUDA_VISIBLE_DEVICES=0
What should I do next to debug the code with a single GPU in pycharm-community-2018.1.4?

Connected to pydev debugger (build 181.5087.37)
usage: train.py [-h] [--model MODEL] [--backbone BACKBONE] [--dataset DATASET]
[--data-folder DATA_FOLDER] [--workers N] [--aux] [--se-loss]
[--epochs N] [--start_epoch N] [--batch-size N]
[--test-batch-size N] [--lr LR] [--lr-scheduler LR_SCHEDULER]
[--momentum M] [--weight-decay M] [--no-cuda] [--seed S]
[--resume RESUME] [--checkname CHECKNAME]
[--model-zoo MODEL_ZOO] [--ft] [--pre-class PRE_CLASS] [--ema]
[--eval] [--no-val] [--test-folder TEST_FOLDER]
train.py: error: unrecognized arguments: CUDA_VISIBLE_DEVICES=0

Question 2
args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size in option.py means that the LR is related to the batch size you use. Does that mean the LR is not fixed but depends on the batch size (GPU memory)?
In my experiments I set args.lr = lrs[args.dataset.lower()]; is that reasonable and feasible, and does it respect your paper and intentions?

Question 3
For multi-scale evaluation, line 27 of encoding/models/base.py sets base_size=576, crop_size=608 (base_size less than crop_size); should it be base_size=608, crop_size=576?
Previously you set base_size=520, crop_size=480, and now you changed them to base_size=576, crop_size=608. I hold the view that a crop_size smaller than base_size seems reasonable. Which settings should I follow to reproduce your results?

I am looking forward to your reply.

@zhanghang1989
Owner

Q1: please use the terminal to launch the program.
Q2: That is a fairly standard setting for the LR. When increasing the batch size, people typically increase the LR accordingly (see the sketch below).
Q3: That is a bug. It will be fixed in the next release.
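A sketch of the linear LR scaling used in option.py (the base LR values below are placeholders for illustration, not the repo's exact table):

base_lrs = {'pcontext': 0.001, 'ade20k': 0.01}    # placeholder per-dataset base LRs
batch_size = 8                                    # e.g. limited by GPU memory
lr = base_lrs['pcontext'] / 16 * batch_size       # base LR is defined for batch size 16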

@qiulesun
Author

qiulesun commented Aug 9, 2018

For Q2 above, due to limited GPU memory, the batch size unfortunately has to be small (typically less than 16). Does that mean I have to use a smaller LR according to the standard setting, i.e., args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size?

@zhanghang1989
Owner

Yes. If the batch size is too small, the model will get worse results, because the working batch size for batch normalization is small.

@qiulesun
Author

qiulesun commented Aug 10, 2018

I only have two 1080 GPUs with a total of 16 GB of memory, so the batch size in my experiments is smaller than 16. Can I alleviate this side effect (the worse results you mentioned) by using a larger LR, i.e., setting args.lr = lrs[args.dataset.lower()] independent of the batch size?

@zhanghang1989
Owner

The batch size matters for the segmentation task because of the working batch size of the Synchronized Batch Normalization. A batch size of 16 yields the best performance.

@qiulesun
Author

qiulesun commented Aug 12, 2018

What is the main difference between encoding.nn.BatchNorm1d and encoding.nn.BatchNorm2d?

@zhanghang1989
Owner

Same as the difference between torch.nn.BatchNorm1d and torch.nn.BatchNorm2d.
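That is, the interfaces match the torch.nn versions; a quick sketch of the expected input shapes (using torch.nn here for illustration; the encoding.nn versions additionally synchronize statistics across GPUs):

import torch
import torch.nn as nn

bn1d = nn.BatchNorm1d(32)                       # expects (N, C) or (N, C, L) inputs
bn2d = nn.BatchNorm2d(512)                      # expects (N, C, H, W) inputs
print(bn1d(torch.randn(4, 32)).shape)           # torch.Size([4, 32])
print(bn2d(torch.randn(4, 512, 8, 8)).shape)    # torch.Size([4, 512, 8, 8])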

@qiulesun
Author

qiulesun commented Aug 24, 2018

I have two questions.
(1) For the cos and poly LR schedules, every batch (iteration) has a different LR, rather than all iterations in one epoch sharing the same LR. Is that right?
(2) For CIFAR-10 recognition, the scaling factor s_k is not learned but randomly sampled from a uniform distribution between 0 and 1, which is different from the segmentation tasks. Is that right?

@qiulesun
Author

qiulesun commented Sep 19, 2018

I'm sorry for disturbing you again.
Your work is very encouraging to me. I notice that the scaled_l2 and aggregate operators of the proposed encoding layer are implemented in C++. Since I am not good at C++, could you share the corresponding implementation in Python code, if you are willing?

@zhanghang1989
Owner

We change the LR every iteration.
The CIFAR experiment uses a shake-out-like regularization.
Scaled L2 and aggregate are easy to implement in Python, but that would be memory-consuming (see the sketch below).
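For reference, a pure-PyTorch sketch of what scaled_l2 and aggregate compute, following the definitions in the paper (the explicit (B, N, K, D) residual tensor below is exactly what makes a naive Python version memory-consuming, hence the custom CUDA kernels):

import torch

def scaled_l2(X, C, S):
    # X: (B, N, D) features, C: (K, D) codewords, S: (K,) smoothing factors
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)        # (B, N, K, D) residuals
    return S.view(1, 1, -1) * resid.pow(2).sum(dim=3)      # (B, N, K) scaled squared distances

def aggregate(A, X, C):
    # A: (B, N, K) assignment weights -> (B, K, D) aggregated residual encodings
    resid = X.unsqueeze(2) - C.view(1, 1, *C.shape)
    return (A.unsqueeze(3) * resid).sum(dim=1)

B, N, D, K = 2, 64, 512, 32
X, C, S = torch.randn(B, N, D), torch.randn(K, D), torch.randn(K)
A = torch.softmax(scaled_l2(X, C, S), dim=2)               # assignment weights
E = aggregate(A, X, C)                                     # torch.Size([2, 32, 512])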

@qiulesun
Author

qiulesun commented Sep 21, 2018

Question 1:
Sorry to ask a stupid question.
The augmented PASCAL VOC 2012 has 11533 images in trainval.txt rather than the 10582 used in the paper, which confuses me. I also don't see how the 1464 training images of PASCAL VOC 2012 are augmented to obtain 10582; in other words, I don't understand the relationship between PASCAL VOC 2012 and its augmented version. Could I ask for your take on this?
If you think this question is not worth answering, I completely understand.

Question 2:
As far as I know, Group Norm (https://arxiv.org/pdf/1803.08494.pdf) is independent of batch size and therefore well suited for semantic segmentation, which is limited to small batches by memory consumption.
Could you consider employing it in your updated version?

@zhanghang1989
Copy link
Owner

Q1. For the VOC experiments, we first pretrain on COCO, then finetune on "pascal_aug", and finally on "pascal_voc". I am releasing the training details for reproducing the VOC experiments this weekend.
Q2. Group Norm still has inferior performance compared to BN. You can easily use it by changing the code a little bit.
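For example, a hypothetical swap (not part of the repo; 32 groups is the common choice from the GN paper):

import torch.nn as nn

norm_bn = nn.BatchNorm2d(512)                              # batch-statistics based, needs a large working batch
norm_gn = nn.GroupNorm(num_groups=32, num_channels=512)    # batch-size-independent alternative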

@qiulesun
Author

qiulesun commented Sep 24, 2018

Question 1:
I see base_size=608 and crop_size=576 in the training log of EncNet_ResNet50_ADE (https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/logs/encnet_resnet50_ade.log); however, base_size and crop_size are set to 520 and 480, respectively, in https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/datasets/base.py#L17.
This confuses me. Is base_size=608 and crop_size=576 a special case for ADE20K, with base_size=520 and crop_size=480 used for PASCAL Context and PASCAL VOC12?
Question 2:
Besides, is base_size=576 and crop_size=608 in https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/encoding/models/base.py#L27 only for multi-scale testing?

@zhanghang1989
Owner

There are some bugs in the existing code. I will update them soon.

@qiulesun
Author

qiulesun commented Sep 26, 2018

Question 1:
As mentioned above, there are some bugs in the existing code, but I still have a question.
EncNet_ResNet50_ADE achieves 79.9 pixAcc and 41.2 mIoU in the last row of the table (https://hangzhang.org/PyTorch-Encoding/experiments/segmentation.html); however, from the training log file (https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/logs/encnet_resnet50_ade.log) I see that it obtains 78.0 pixAcc and 40.2 mIoU, lower than the results you reported.
Is this because you use a multi-scale testing strategy on the ADE20K val set, or something else?

@zhanghang1989
Copy link
Owner

The validation during training uses a center crop; it is only for monitoring the training process.
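For reference, multi-scale testing usually looks roughly like the following sketch (assuming the model returns a single per-class score map; this is not the repo's exact evaluator):

import torch
import torch.nn.functional as F

def multi_scale_predict(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5), flip=True):
    _, _, h, w = image.shape
    total = 0
    for s in scales:
        img_s = F.interpolate(image, scale_factor=s, mode='bilinear', align_corners=True)
        out = model(img_s)
        if flip:
            out = out + torch.flip(model(torch.flip(img_s, dims=[3])), dims=[3])
        total = total + F.interpolate(out, size=(h, w), mode='bilinear', align_corners=True)
    return total.argmax(dim=1)                  # final per-pixel prediction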
