Introducing CPU "half" type for 16-bit float, copy and init only #826
Conversation
@soumith: this one should be ready to go.
thanks. will get this reviewed.
This looks good, thanks. Just some minor comments/questions below.
@@ -5,14 +5,14 @@ local Storage = {}
local Tensor = {}

-- types
-local types = {'Byte', 'Char', 'Short', 'Int', 'Long', 'Float', 'Double'}
+local types = {'Byte', 'Char', 'Short', 'Int', 'Long', 'Float', 'Half', 'Double'}
should we guard adding Half/half() only if TH_GENERIC_USE_HALF is defined? E.g. in cutorch, these are only defined if CUDA_HALF_TENSOR is defined. Or are we basically saying we don't support torch without TH_GENERIC_USE_HALF (only dependencies can have it off, e.g. nn)?
Opinion @soumith?
@soumith, @gchanan: there actually is some challenge here.
After building other modules (updated for half and not) with these changes in torch7, I have learned the following:
- it would have been easier to introduce Half unconditionally and remove the TH_GENERIC_USE_HALF switch altogether; then other modules would not have to set this macro up to compile exported torch7 headers properly.
The issue with exported headers may partially be fixed by moving the TH_GENERIC_USE_HALF definition from CMakeLists.txt into the general torch .h. Then other modules won't have to set it up, or even know about it - this probably should be done.
The remaining issue is that __half is not a native type and normally comes from CUDA. We can't define it in torch7 unconditionally, as CUDA does not have macros to tell that it's already been defined, so including cuda.h after torch would cause an error. I see no way to set it in torch7 without making it and its cmake bits CUDA-aware, and I think that is not acceptable.
Any ideas?
@soumith, @gchanan: I think I have found the right way to handle it: please check out the last commit.
Also, I have updated torch/cutorch#578 - see how the CUDA 'half' is passed to TH
@@ -257,6 +211,7 @@ for _,Tensor in ipairs({"ByteTensor", "CharTensor",
   end
end

+if Tensor ~= 'HalfTensor' then
Should there be some check here to pass the if statement when torch.hashalfmath is true? It seems like we'd want these functions if that were the case in the future.
Correct, I will extend the condition.
@gchanan: adding torch.hashalfmath literally does not compile: how do I refer to it correctly from this module?
you are right, we don't have access to torch functions at this point. There may be some clever things you can do with interface:print cwrap to properly guard everything with #ifdefs, or presumably building the libraries in multiple stages, so you have access to the functions you need when this is actually running.
In any case, since TH_NATIVE_HALF is basically a non-functional placeholder at this point anyway, it doesn't seem worth it to solve this issue right now.
@@ -33,8 +35,33 @@ extern void torch_LongTensorOperator_init(lua_State *L);
extern void torch_FloatTensorOperator_init(lua_State *L);
extern void torch_DoubleTensorOperator_init(lua_State *L);

+#ifdef TH_USE_HALF_MATH
do we need both TH_USE_HALF_MATH and TH_USE_GENERIC_HALF_MATH? Is there any case where we would want those values to be different?
Thanks for the review @gchanan!
I assume you refer to TH_GENERIC_USE_HALF above:
Yes, TH_USE_HALF_MATH was designed to be different: by default TH_GENERIC_USE_HALF would be on but TH_USE_HALF_MATH would be off (init, file read/write, copy/conversion only). On machines where FP16 is actually supported on the CPU (all ARMs have __fp16 supported by gcc), we may choose to enable HalfTensor math as well.
TH_GENERIC_USE_HALF, on the other hand, may need to go. I will comment on top about that.
I have changed the name and the condition to refer to TH_NATIVE_HALF
#define real half
#define accreal float
#define Real Half
#define THInf FLT_MAX
this won't work directly, e.g. nn has code like:
real maxval = -THInf;
which will give a type conversion error. I don't see any super obvious fix for this, though. In a later commit we could remove this #define for all types and define THNumerics min/max functions a la cutorch that can be used in nn (they don't have to be constant expressions).
There could be a working syntax for this, like { 0xXXX }; I will check it.
Yes, that syntax worked; check the update.
IMPLEMENT_THStorage_COPY(Byte)
IMPLEMENT_THStorage_COPY(Char)
IMPLEMENT_THStorage_COPY(Short)
IMPLEMENT_THStorage_COPY(Int)
IMPLEMENT_THStorage_COPY(Long)
IMPLEMENT_THStorage_COPY(Float)
IMPLEMENT_THStorage_COPY(Double)
#else
/* only allow pass-through for Half */
IMPLEMENT_THStorage_COPY(Half)
it looks like you need something similar to what you do in THTensorCopy.c here. As this is written, a number of functions have their signatures declared in THTensorCopy.h but are undefined, e.g. THDoubleStorage_copyHalf (and the above code won't work to generate it because of conversion problems).
Here's a reproduction of this problem:
th> x=torch.HalfStorage(3)
[0.0000s]
th> x
torch/install/bin/luajit: symbol lookup error: torch/install/lib/lua/5.1/libtorch.so: undefined symbol: THDoubleStorage_copyHalf
Thanks for the catch, will do!
done.
I don't see a change in this file in the latest commits -- did I miss it? Running the above code yields the same results as before.
Right, it was not there. Now it is:
th> x=torch.HalfStorage(3)
[0.0001s]
th> x
0
0
0
[torch.HalfStorage of size 3]
[0.0003s]
@@ -128,6 +128,11 @@ TH_API void THTensor_(eqTensorT)(THTensor *r_, THTensor *ta, THTensor *tb);
TH_API void THTensor_(abs)(THTensor *r_, THTensor *t);
#endif

+#if defined(TH_REAL_IS_FLOAT) || defined(TH_REAL_IS_DOUBLE) || defined(TH_REAL_IS_HALF)
Why the change here? If I'm reading this right, even if GENERIC_USE_HALF_MATH were enabled, we wouldn't generate these because in THTensorMath.c, these are still guarded by TH_REAL_IS_FLOAT/DOUBLE and not HALF. Did you mean to change THTensorMath.c as well?
This is probably leftover from some experiments, I will clean it up.
done.
@@ -32,7 +31,7 @@ void THVector_(fill)(real *x, const real c, const ptrdiff_t n) {
  THVector_(fill_DISPATCHPTR)(x, c, n);
}

+#ifndef TH_GENERIC_NO_MATH
just curious, why is fill outside of TH_GENERIC_NO_MATH? AFAICT it's not called unless TH_GENERIC_HALF_MATH is on anyway?
Probably it should be removed as well. I thought enabling initialization via fill() was a good idea.
fixed
mytester:assert(torch.Tensor.isTensor(t), 'alias not working')
end
function torchtest.isStorage()
   local t = torch.randn(3,4)
did you forget :half() on the tensors below? I can pass expand, repeatTensor, isSameSizeAs without your patch.
added
@@ -3,6 +3,10 @@

#include "THGeneral.h"

+#ifdef TH_GENERIC_USE_HALF
I'm not sure what your goal is with having some checks as "#if TH_GENERIC_USE_HALF" and some being "#ifdef TH_GENERIC_USE_HALF" -- could you explain?
Relatedly, I don't know what the goal is with cutorch exactly -- do we require torch/cutorch#578 at the same time? Can we commit this without that PR if we restrict ourselves not to use HalfTensor/HalfStorage when we bring in cutorch?
This was supposed to be '#if TH_GENERIC_USE_HALF'; fixed.
The goal with cutorch is to avoid half<->float conversions when not necessary.
This PR can be committed without cutorch, and no module that does not define TH_GENERIC_USE_HALF would be affected - cutorch would still be using its own Half, too.
I just rebuilt and ran the test with this branch and cutorch master.
Also, to be clear: TH_GENERIC_USE_HALF is meant as a temporary flag to let us convert Torch modules one at a time, so that modules that do not define it will not break. Once we have converted all the modules (that care about generics and use GenerateAllTypes.h from TH), the flag can be retired.
#endif

#ifndef TH_NATIVE_HALF
# define TH_NATIVE_HALF 0
did you check if TH_NATIVE_HALF works? (on android?)
Well, I have found that __fp16 in gcc is not exactly suitable as a direct 'half' replacement: there are restrictions that it cannot be returned from a function or passed as a parameter ...
I would keep TH_NATIVE_HALF as a placeholder, as it looks like both C++ and C are going to adopt our proposal by 2020: http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0192r1.pdf
... and I am also looking for a way to adapt __fp16 (via a struct) to be usable as it is, too.
leaving it as a placeholder seems reasonable for now.
Replaced half<->float conversion routines with NVIDIA ones (they handle denormals better)
Conflicts: lib/TH/generic/THVectorDispatch.c
@@ -11,7 +10,7 @@
 * which SIMD extension a given implementation uses
 * 3. A dispatch stub, which is what is actually called by clients, that simply wraps the dispatch pointer.
 */

+#ifndef TH_GENERIC_NO_MATH
there's no matching #endif for this, I don't know if you meant to add it to the end or prior to vectorDispatchInit (see previous comment about this). If it's prior to vectorDispatchInit, you could just remove the preprocessor directives from that function.
Right, it was a merge artifact between my working branch cpu_half and the half_type branch used for this PR ...
Fixed it now.
@@ -1,3 +1,10 @@
#include "THGeneral.h"

+#ifdef TH_GENERIC_USE_HALF
can this just be #if TH_GENERIC_USE_HALF? There was a case in a previous review -- if possible, seems much simpler to have all the checks be #if and just do the definition once.
Fixed, thanks for pointing that out! Not just simpler but, most importantly, correct, as THGeneral.h always defines it. In this case it did not make a difference, but it's important to have it in a uniform fashion.
TH_API void THVector_(fill)(real *x, const real c, const ptrdiff_t n);
TH_API void THVector_(add)(real *y, const real *x, const real c, const ptrdiff_t n);
TH_API void THVector_(diff)(real *z, const real *x, const real *y, const ptrdiff_t n);
TH_API void THVector_(scale)(real *y, const real c, const ptrdiff_t n);
TH_API void THVector_(mul)(real *y, const real *x, const ptrdiff_t n);
#endif

/* Initialize the dispatch pointers */
TH_API void THVector_(vectorDispatchInit)(void);
do you want this under TH_GENERIC_NO_MATH? In generic/Tensor.c you only call it in that situation anyway.
@@ -257,6 +211,7 @@ for _,Tensor in ipairs({"ByteTensor", "CharTensor",
   end
end

+if Tensor ~= 'HalfTensor' then
There are a number of lua functions defined on half that don't work; I assume they should not be defined because they are mainly mathematical (see below). I didn't check all of them, but it would be nice if we had tests to exercise these:
- Index functions (e.g. /home/gchanan/local/torch/install/bin/luajit: symbol lookup error: /home/gchanan/local/torch/install/lib/lua/5.1/libtorch.so: undefined symbol: THHalfTensor_indexSelect):
index
indexAdd
indexCopy
indexFill
I didn't actually check, but I assume the masked functions (maskedCopy, maskedFill, maskedSelect) have the same issue.
- Function application (CudaHalfTensor doesn't implement this afaict, so there's no ground truth, but it should probably do the conversion automatically):
apply
(e.g.
th> i = 0
[0.0000s]
th> y:apply(function() i = i + 1; return i end)
.../gchanan/local/torch/install/share/lua/5.1/torch/FFI.lua:124: cannot convert 'number' to 'struct 114'
stack traceback:)
I assume these have the same issue, but didn't actually test:
map
map2
@@ -32,6 +40,7 @@ struct THFileVTable
 size_t (*writeLong)(THFile *self, long *data, size_t n);
 size_t (*writeFloat)(THFile *self, float *data, size_t n);
 size_t (*writeDouble)(THFile *self, double *data, size_t n);
+size_t (*writeHalf)(THFile *self, half *data, size_t n);
here too.
@gchanan: the intent was to not define any functions except copy/init and file I/O, so that one can just write a pipeline using CUDNN with half and avoid all conversions. I suggest we revisit that if/when I manage to get NATIVE_HALF working?
I will fix the guards.
was the comment re: "intent was not to define" meant for this comment, or the one above it about the math functions in lua? (I'm going to assume that it refers to the math functions since this is in regards to file I/O). I agree with the intent -- but shouldn't we avoid defining the functions in lua if they don't actually have an implementation?
Right, it was for the math. Yes, it would be better to block the Lua definitions also; I just was not sure how to do it ...
I think you need to guard the relevant functions in generic/Tensor.c.
@@ -23,6 +30,7 @@ struct THFileVTable
 size_t (*readLong)(THFile *self, long *data, size_t n);
 size_t (*readFloat)(THFile *self, float *data, size_t n);
 size_t (*readDouble)(THFile *self, double *data, size_t n);
+size_t (*readHalf)(THFile *self, half *data, size_t n);
shouldn't this be guarded by #if TH_GENERIC_HALF? There are other places in the code where half isn't guarded, but those are .c files that presumably wouldn't be included by outside projects, whereas this is an .h file.
Merged via #874
 *
 * NOTICE TO LICENSEE:
 *
 * This source code and/or documentation ("Licensed Deliverables") are
As noted in pytorch/pytorch#654 , the license text that we included here was incorrect. We are working to get this corrected.
This is a finished and squashed version of #818.