remove with_stateless and only_stateless. Replace with variants: [method,function] #1

Closed
2 tasks done
zdevito opened this issue Jun 6, 2017 · 1 comment

Owner

zdevito commented Jun 6, 2017

with_stateless: True === variants: [ method, function ]
only_stateless: True === variants: [ function ]
nothing specified (i.e. default variants): variants: [ method ]

  • do it for TensorLib, leaving with_stateless and only_stateless in place
  • backport it to actual cwrap and remove with_stateless and only_stateless
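
The mapping above can be sketched as a small translation helper. This is only an illustration: the function name `legacy_flags_to_variants` and the dict-shaped declaration are assumptions, not part of cwrap, and letting `only_stateless` win when both flags are set is also an assumption.

```python
def legacy_flags_to_variants(declaration):
    """Translate the legacy stateless flags into a `variants` list.

    `declaration` is a plain dict standing in for one declaration
    (hypothetical shape, used here only for illustration).
    """
    if declaration.get("only_stateless", False):
        # only_stateless === variants: [ function ]
        return ["function"]
    if declaration.get("with_stateless", False):
        # with_stateless: True === variants: [ method, function ]
        return ["method", "function"]
    # nothing specified: default variants
    return ["method"]


print(legacy_flags_to_variants({"name": "add", "with_stateless": True}))
print(legacy_flags_to_variants({"name": "zeros", "only_stateless": True}))
print(legacy_flags_to_variants({"name": "fill_"}))
```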
killeent self-assigned this Jun 6, 2017
Collaborator

killeent commented Jun 6, 2017

This is done.

killeent closed this as completed Jun 6, 2017
zdevito pushed a commit that referenced this issue Nov 14, 2017
* Fix CUDA 9 builds for Windows

* Add msvc conditional flag

* minor bug fix

* minor bugs #1
zdevito pushed a commit that referenced this issue Jan 23, 2018
Currently, index operation kernels work in "source/destination index-major
order".  (E.g., if thread count equals slice size, each thread will process
slice #0 in lockstep, and then slice #1, and so on.)

However, when elements inside each "slice" are separated by large strides (e.g.,
selecting columns of a matrix), it is better to switch to "elementInSlice-major
order".  For example, each thread can process element #0 of every slice, and
then element #1 of every slice, and so on.
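
The two orders can be illustrated by how a linear work index is split into a (slice, element) pair. This is a minimal sketch in plain Python rather than the actual kernel; `decompose` and `visit_order` are hypothetical names, and consecutive linear indices are assumed to go to neighbouring threads.

```python
def decompose(linear_index, num_slices, slice_size, index_is_major):
    """Split a linear work index into (slice, element_in_slice)."""
    if index_is_major:
        # index-major: finish one slice before moving to the next
        return linear_index // slice_size, linear_index % slice_size
    # elementInSlice-major: the same element of every slice comes first
    return linear_index % num_slices, linear_index // num_slices


def visit_order(num_slices, slice_size, index_is_major):
    """List the (slice, element) pairs in increasing linear-index order."""
    total = num_slices * slice_size
    return [decompose(i, num_slices, slice_size, index_is_major)
            for i in range(total)]


# 3 slices of 2 elements each
print(visit_order(3, 2, True))   # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(visit_order(3, 2, False))  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

In the second order, neighbouring work items touch the same element position across different slices, which is the case the commit message describes as preferable when elements within a slice are far apart in memory.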
zdevito pushed a commit that referenced this issue Sep 11, 2018
…orms we care about. (pytorch#11394)

Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, #3]
>         ldrb    w9, [x0, #2]
>         bfi     w8, w9, #8, #8
>         ldrb    w9, [x0, #1]
>         bfi     w8, w9, #16, #8
>         ldrb    w9, [x0]
>         bfi     w8, w9, #24, #8
>         mov     x0, x8
>         ret

To this:
>         ldr     w8, [x0]
>         rev     w0, w8
>         ret
Pull Request resolved: pytorch#11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
zdevito pushed a commit that referenced this issue Nov 16, 2018
pytorch#14040)

Summary:
…2164)"

This reverts commit 4b7c615.
Pull Request resolved: pytorch#14040

Differential Revision: D13089531

Pulled By: yinghai

fbshipit-source-id: 2114b36111dab6f179c02921bbc9bd382ef461bf
zdevito pushed a commit that referenced this issue Feb 22, 2019
Summary:
Currently there is a naming mismatch between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameter loading to fail (pytorch/vision#728 (comment)):
```
terminate called after throwing an instance of 'c10::Error'
  what():  No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.

This is a BC-breaking change, but it should be easy for end users to rename `running_variance` to `running_var` at their call sites.
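
For data that was keyed under the old name, the rename amounts to a key substitution. The sketch below works on a plain dict of tensor names and is only an illustration: `rename_running_variance` is a hypothetical helper, and dotted keys such as `bn1.running_variance` are an assumed naming scheme.

```python
def rename_running_variance(named_tensors):
    """Return a copy with 'running_variance' keys renamed to 'running_var'.

    `named_tensors` is a plain dict of name -> value (assumed layout).
    """
    old, new = "running_variance", "running_var"
    renamed = {}
    for key, value in named_tensors.items():
        # Match either a bare name or the last component of a dotted name.
        if key == old or key.endswith("." + old):
            key = key[: -len(old)] + new
        renamed[key] = value
    return renamed


print(rename_running_variance({"bn1.running_variance": 1, "bn1.running_mean": 2}))
```
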
Pull Request resolved: pytorch#17371

Reviewed By: goldsborough

Differential Revision: D14172775

Pulled By: yf225

fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
ailzhang pushed a commit that referenced this issue Apr 9, 2019
Summary:
Tracing models that attempt to return this in-place value does not turn out well.

I haven't run any tests to confirm the results to be honest, but regardless of the outcome, the operation happens in-place, so it should work as before.

Sample output from a traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
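
To illustrate why ignoring the returned value should not change behaviour, here is a plain-Python sketch of an in-place row renormalization. It is not PyTorch's implementation: `renorm_rows_` is a hypothetical name, and the weight is modelled as a list of lists. Rows referenced by `indices` whose norm exceeds `max_norm` are rescaled in place, and the returned object is the same one that was passed in.

```python
def renorm_rows_(weight, indices, max_norm, norm_type=2.0):
    """Rescale, in place, the referenced rows whose p-norm exceeds max_norm."""
    for i in set(indices):
        row = weight[i]
        norm = sum(abs(v) ** norm_type for v in row) ** (1.0 / norm_type)
        if norm > max_norm:
            scale = max_norm / norm
            # Slice assignment mutates the existing row object.
            row[:] = [v * scale for v in row]
    return weight  # same object as the argument


weight = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
result = renorm_rows_(weight, [0, 0, 1], max_norm=1.0)
print(result is weight)  # True
print(weight[2])         # [6.0, 8.0], row 2 was not referenced
```

Because the update happens on the argument itself, a caller can keep using the original weight after the call instead of the returned value.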
Pull Request resolved: pytorch#18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d