Introducing CPU "half" type for 16-bit float, copy and init only #826
Conversation
@soumith: this one should be ready to go.
thanks. will get this reviewed.
This looks good, thanks. Just some minor comments/questions below.
@@ -5,14 +5,14 @@ local Storage = {}
local Tensor = {}

-- types
-local types = {'Byte', 'Char', 'Short', 'Int', 'Long', 'Float', 'Double'}
+local types = {'Byte', 'Char', 'Short', 'Int', 'Long', 'Float', 'Half', 'Double'}
should we guard adding Half/half() only if TH_GENERIC_USE_HALF is defined? E.g. in cutorch, these are only defined if CUDA_HALF_TENSOR is defined. Or are we basically saying we don't support torch without TH_GENERIC_USE_HALF (only dependencies can have it off, e.g. nn)?
Opinion @soumith?
@soumith, @gchanan: there actually is some challenge here.
After building other modules (updated for half and not) with these changes in torch7, I have learned the following:
- it would have been easier to introduce Half unconditionally and remove the TH_GENERIC_USE_HALF switch altogether; then other modules would not have to set this macro up to compile exported torch7 headers properly.
The issue with exported headers may partially be fixed by moving the TH_GENERIC_USE_HALF definition from CMakeLists.txt into the general torch .h. Then other modules won't have to set it up, or even know about it - this probably should be done.
The remaining issue is that __half is not a native type and normally comes from CUDA. We can't define it in torch7 unconditionally, as CUDA does not have macros to tell that it's already been defined, so including cuda.h after torch would cause an error. I see no way to set it in torch7 without making it and its cmake bits CUDA-aware, and I think that is not acceptable.
Any ideas?
@soumith, @gchanan: I think I have found the right way to handle it: please check out the last commit.
Also, I have updated torch/cutorch#578 - see how the CUDA 'half' is passed to TH
@@ -257,6 +211,7 @@ for _,Tensor in ipairs({"ByteTensor", "CharTensor",
   end
end

+if Tensor ~= 'HalfTensor' then
Should there be some check here to pass the if statement when torch.hashalfmath is true? It seems like we'd want these functions if that were the case in the future.
Correct, I will extend the condition.
@gchanan: adding torch.hashalfmath literally does not compile: how do I refer to it correctly from this module?
you are right, we don't have access to torch functions at this point. There may be some clever things you can do with interface:print cwrap to properly guard everything with #ifdefs, or presumably building the libraries in multiple stages, so you have access to the functions you need when this is actually running.
In any case, since TH_NATIVE_HALF is basically a non-functional placeholder at this point anyway, it doesn't seem worth it to solve this issue right now.
@@ -33,8 +35,33 @@ extern void torch_LongTensorOperator_init(lua_State *L);
extern void torch_FloatTensorOperator_init(lua_State *L);
extern void torch_DoubleTensorOperator_init(lua_State *L);

+#ifdef TH_USE_HALF_MATH
do we need both TH_USE_HALF_MATH and TH_USE_GENERIC_HALF_MATH? Is there any case where we would want those values to be different?
Thanks for the review @gchanan!
I assume you refer to TH_GENERIC_USE_HALF above:
Yes, TH_USE_HALF_MATH was designed to be different: by default TH_GENERIC_USE_HALF would be on but TH_USE_HALF_MATH would be off (init, file read/write, copy/conversion only). On machines where FP16 is actually supported on the CPU (all ARMs have __fp16 supported by gcc), we may choose to enable HalfTensor math as well.
TH_GENERIC_USE_HALF, on the other hand, may need to go. I will comment on top about that.
I have changed the name and the condition to refer to TH_NATIVE_HALF
#define real half
#define accreal float
#define Real Half
#define THInf FLT_MAX
this won't work directly, e.g. nn has code like:
real maxval = -THInf;
which will give a type conversion error. I don't see any super obvious fix for this, though. In a later commit we could remove this #define for all types and define THNumerics min/max functions a la cutorch that can be used in nn (they don't have to be constant expressions).
There could be a working syntax for this, like { 0xXXX }; I will check it.
Yes, that syntax worked; check the update.
IMPLEMENT_THStorage_COPY(Byte)
IMPLEMENT_THStorage_COPY(Char)
IMPLEMENT_THStorage_COPY(Short)
IMPLEMENT_THStorage_COPY(Int)
IMPLEMENT_THStorage_COPY(Long)
IMPLEMENT_THStorage_COPY(Float)
IMPLEMENT_THStorage_COPY(Double)
#else
/* only allow pass-through for Half */
IMPLEMENT_THStorage_COPY(Half)
it looks like you need something similar to what you do in THTensorCopy.c here. As this is written, a number of functions have their signatures declared in THTensorCopy.h but are undefined, e.g. THDoubleStorage_copyHalf (and the above code won't work to generate it because of conversion problems).
Here's a reproduction of this problem:
th> x=torch.HalfStorage(3)
[0.0000s]
th> x
torch/install/bin/luajit: symbol lookup error: torch/install/lib/lua/5.1/libtorch.so: undefined symbol: THDoubleStorage_copyHalf
Thanks for the catch, will do!
done.
I don't see a change in this file in the latest commits -- did I miss it? Running the above code yields the same results as before.
Right, it was not there. Now it is:
th> x=torch.HalfStorage(3)
[0.0001s]
th> x
0
0
0
[torch.HalfStorage of size 3]
[0.0003s]
@@ -128,6 +128,11 @@ TH_API void THTensor_(eqTensorT)(THTensor *r_, THTensor *ta, THTensor *tb);
TH_API void THTensor_(abs)(THTensor *r_, THTensor *t);
#endif

+#if defined(TH_REAL_IS_FLOAT) || defined(TH_REAL_IS_DOUBLE) || defined(TH_REAL_IS_HALF)
Why the change here? If I'm reading this right, even if GENERIC_USE_HALF_MATH were enabled, we wouldn't generate these because in THTensorMath.c, these are still guarded by TH_REAL_IS_FLOAT/DOUBLE and not HALF. Did you mean to change THTensorMath.c as well?
This is probably leftover from some experiments, I will clean it up.
done.
@@ -32,7 +31,7 @@ void THVector_(fill)(real *x, const real c, const ptrdiff_t n) {
  THVector_(fill_DISPATCHPTR)(x, c, n);
}

+#ifndef TH_GENERIC_NO_MATH
just curious, why is fill outside of TH_GENERIC_NO_MATH? AFAICT it's not called unless TH_GENERIC_HALF_MATH is on anyway?
Probably it should be removed as well. I thought enabling initialization via fill() was a good idea.
fixed
mytester:assert(torch.Tensor.isTensor(t), 'alias not working')
end
function torchtest.isStorage()
   local t = torch.randn(3,4)
did you forget :half() on the tensors below? I can pass expand, repeatTensor, isSameSizeAs without your patch.
added
@@ -3,6 +3,10 @@

#include "THGeneral.h"

+#ifdef TH_GENERIC_USE_HALF
I'm not sure what your goal is with having some checks as "#if TH_GENERIC_USE_HALF" and some being "#ifdef TH_GENERIC_USE_HALF" -- could you explain?
Relatedly, I don't know what the goal is with cutorch exactly -- do we require torch/cutorch#578 at the same time? Can we commit this without that PR if we restrict ourselves not to use HalfTensor/HalfStorage when we bring in cutorch?
This was supposed to be '#if TH_GENERIC_USE_HALF'; fixed.
The goal with cutorch is to avoid half<->float conversions when not necessary.
This PR can be committed without cutorch, and no module that does not define TH_GENERIC_USE_HALF would be affected - cutorch would still be using its own Half, too.
I just rebuilt and ran the test with this branch and cutorch master.
Also, to be clear: TH_GENERIC_USE_HALF is meant as a temporary flag to let us convert Torch modules one at a time, so that modules that do not define it will not break. Once we have converted all the modules (that care about generics and use GenerateAllTypes.h from TH), the flag can be retired.
#endif

#ifndef TH_NATIVE_HALF
# define TH_NATIVE_HALF 0
did you check if TH_NATIVE_HALF works? (on android?)
Well, I have found that __fp16 in gcc is not exactly suitable as a direct 'half' replacement: there are restrictions that it cannot be returned from a function or passed as a parameter ...
I would keep TH_NATIVE_HALF as a placeholder, as it looks like both C++ and C are going to adopt our proposal by 2020: http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0192r1.pdf
... and I am also looking for a way to adapt __fp16 (via a struct) to be usable as it is, too.
leaving it as a placeholder seems reasonable for now.
Replaced half<->float conversion routines with NVIDIA ones (they handle denormals better)
Conflicts: lib/TH/generic/THVectorDispatch.c
@@ -11,7 +10,7 @@
 * which SIMD extension a given implementation uses
 * 3. A dispatch stub, which is what is actually called by clients, that simply wraps the dispatch pointer.
 */

+#ifndef TH_GENERIC_NO_MATH
there's no matching #endif for this, I don't know if you meant to add it to the end or prior to vectorDispatchInit (see previous comment about this). If it's prior to vectorDispatchInit, you could just remove the preprocessor directives from that function.
Right, it was a merge artifact between my working branch cpu_half and the half_type branch used for this PR ...
Fixed it now.
@@ -1,3 +1,10 @@
#include "THGeneral.h"

+#ifdef TH_GENERIC_USE_HALF
can this just be #if TH_GENERIC_USE_HALF? There was a case in a previous review -- if possible, seems much simpler to have all the checks be #if and just do the definition once.
Fixed, thanks for pointing that out! Not just simpler but, most importantly, correct, as THGeneral.h always defines it. In this case it did not make a difference, but it's important to have it in a uniform fashion.
TH_API void THVector_(fill)(real *x, const real c, const ptrdiff_t n);
TH_API void THVector_(add)(real *y, const real *x, const real c, const ptrdiff_t n);
TH_API void THVector_(diff)(real *z, const real *x, const real *y, const ptrdiff_t n);
TH_API void THVector_(scale)(real *y, const real c, const ptrdiff_t n);
TH_API void THVector_(mul)(real *y, const real *x, const ptrdiff_t n);
#endif

/* Initialize the dispatch pointers */
TH_API void THVector_(vectorDispatchInit)(void);
do you want this under TH_GENERIC_NO_MATH? In generic/Tensor.c you only call it in that situation anyway.
@@ -257,6 +211,7 @@ for _,Tensor in ipairs({"ByteTensor", "CharTensor",
   end
end

+if Tensor ~= 'HalfTensor' then
There are a number of lua functions defined on half that don't work; I assume they should not be defined because they are mainly mathematical (see below). I didn't check all of them, but it would be nice if we had tests to exercise these:
- Index functions (e.g. /home/gchanan/local/torch/install/bin/luajit: symbol lookup error: /home/gchanan/local/torch/install/lib/lua/5.1/libtorch.so: undefined symbol: THHalfTensor_indexSelect):
index
indexAdd
indexCopy
indexFill
I didn't actually check, but I assume the masked functions (maskedCopy, maskedFill, maskedSelect) have the same issue.
- Function application (CudaHalfTensor doesn't implement this afaict, so there's no ground truth, but it should probably do the conversion automatically):
apply
(e.g.
th> i = 0
[0.0000s]
th> y:apply(function() i = i + 1; return i end)
.../gchanan/local/torch/install/share/lua/5.1/torch/FFI.lua:124: cannot convert 'number' to 'struct 114'
stack traceback:)
I assume these have the same issue, but didn't actually test:
map
map2
@@ -32,6 +40,7 @@ struct THFileVTable
 size_t (*writeLong)(THFile *self, long *data, size_t n);
 size_t (*writeFloat)(THFile *self, float *data, size_t n);
 size_t (*writeDouble)(THFile *self, double *data, size_t n);
+size_t (*writeHalf)(THFile *self, half *data, size_t n);
here too.
@gchanan: the intent was to not define any functions except copy/init and file I/O, so that one can just write a pipeline using CUDNN with half and avoid all conversions. I suggest we revisit that if/when I manage to get NATIVE_HALF working?
I will fix the guards.
was the comment re: "intent was not to define" meant for this comment, or the one above it about the math functions in lua? (I'm going to assume that it refers to the math functions since this is in regards to file I/O). I agree with the intent -- but shouldn't we avoid defining the functions in lua if they don't actually have an implementation?
Right, it was for the math. Yes, it would be better to block the Lua definitions also; I just was not sure how to do it ...
I think you need to guard the relevant functions in generic/Tensor.c.
@@ -23,6 +30,7 @@ struct THFileVTable
 size_t (*readLong)(THFile *self, long *data, size_t n);
 size_t (*readFloat)(THFile *self, float *data, size_t n);
 size_t (*readDouble)(THFile *self, double *data, size_t n);
+size_t (*readHalf)(THFile *self, half *data, size_t n);
shouldn't this be guarded by #if TH_GENERIC_HALF? There are other places in the code where half isn't guarded, but those are .c files that presumably wouldn't be included by outside projects, whereas this is an .h file.
Merged via #874
 *
 * NOTICE TO LICENSEE:
 *
 * This source code and/or documentation ("Licensed Deliverables") are
As noted in pytorch/pytorch#654 , the license text that we included here was incorrect. We are working to get this corrected.
This is a finished and squashed version of #818.