Introducing CPU "half" type for 16-bit float, copy and init only #826

Closed
wants to merge 19 commits

Conversation

borisfom
Contributor

This is a finished and squashed version of #818.

@borisfom
Contributor Author

@soumith: this one should be ready to go.

@soumith
Member

soumith commented Nov 16, 2016

Thanks, will get this reviewed.

Contributor

@gchanan gchanan left a comment


This looks good, thanks. Just some minor comments/questions below.

@@ -5,14 +5,14 @@ local Storage = {}
local Tensor = {}

-- types
-local types = {'Byte', 'Char', 'Short', 'Int', 'Long', 'Float', 'Double'}
+local types = {'Byte', 'Char', 'Short', 'Int', 'Long', 'Float', 'Half', 'Double'}
Contributor


should we guard adding Half/half() only if TH_GENERIC_USE_HALF is defined? E.g. in cutorch, these are only defined if CUDA_HALF_TENSOR is defined. Or are we basically saying we don't support torch without TH_GENERIC_USE_HALF (only dependencies can have it off, e.g. nn)?

Opinion @soumith?

Contributor Author


@soumith, @gchanan: there actually is some challenge here.
After building other modules (updated for half and not) with these changes in torch7, I have learned the following:

  • It would have been easier to introduce Half unconditionally and remove the TH_GENERIC_USE_HALF switch altogether - then other modules would not have to set this macro up to compile the exported torch7 headers properly.
    The issue with exported headers may be partially fixed by moving the TH_GENERIC_USE_HALF definition from CMakeLists.txt into the general torch .h. Then other modules won't have to set it up, or even know about it - this probably should be done.
    The remaining issue is that __half is not a native type and normally comes from CUDA. We can't define it in torch7 unconditionally, as CUDA has no macro to tell that it has already been defined, so including cuda.h after torch would cause an error. I see no way to set it up in torch7 without making it and its cmake bits CUDA-aware, which I think is not acceptable.
    Any ideas?

Contributor Author


@soumith, @gchanan: I think I have found the right way to handle it: please check out the last commit.
Also, I have updated torch/cutorch#578 - see how the CUDA 'half' is passed to TH
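
The shape of such an arrangement, as a sketch: TH owns a bit-compatible 16-bit container of its own and never spells CUDA's __half, so header inclusion order stops mattering. The names below follow TH conventions but are illustrative here, not the commit verbatim.

/* TH's own binary16 container: layout-compatible with CUDA's __half,
   but independent of cuda.h, so either include order stays legal */
typedef struct {
  unsigned short x;   /* raw IEEE 754 binary16 bits */
} THHalf;

/* conversions are the only operations the CPU type needs for copy/init */
TH_API float  TH_half2float(THHalf h);
TH_API THHalf TH_float2half(float f);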

@@ -257,6 +211,7 @@ for _,Tensor in ipairs({"ByteTensor", "CharTensor",
end
end

if Tensor ~= 'HalfTensor' then
Contributor


Should there be some check here to pass the if statement when torch.hashalfmath is true? It seems like we'd want these functions if that were the case in the future.

Contributor Author


Correct, I will extend the condition.

Contributor Author


@gchanan: adding torch.hashalfmath literally does not compile: how do I refer to it correctly from this module?

Contributor


you are right, we don't have access to torch functions at this point. There may be some clever things you can do with interface:print cwrap to properly guard everything with #ifdefs, or presumably building the libraries in multiple stages, so you have access to the functions you need when this is actually running.

In any case, since TH_NATIVE_HALF is basically a non-functional placeholder at this point anyway, it doesn't seem worth it to solve this issue right now.

@@ -33,8 +35,33 @@ extern void torch_LongTensorOperator_init(lua_State *L);
extern void torch_FloatTensorOperator_init(lua_State *L);
extern void torch_DoubleTensorOperator_init(lua_State *L);

#ifdef TH_USE_HALF_MATH
Contributor


do we need both TH_USE_HALF_MATH and TH_USE_GENERIC_HALF_MATH? Is there any case where we would want those values to be different?

Contributor Author


Thanks for the review, @gchanan!
I assume you refer to TH_GENERIC_USE_HALF above.
Yes, TH_USE_HALF_MATH was designed to be different: by default TH_GENERIC_USE_HALF would be on but TH_USE_HALF_MATH would be off (init, file read/write, copy/conversion only). On machines where FP16 is actually supported on the CPU (all ARMs have __fp16 supported by gcc), we may choose to enable HalfTensor math as well.
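
The two-level switch as described, sketched with assumed placement:

/* generic half support: gives the type, copy/init and file I/O; torch7 builds
   with this on, while modules that never define it simply skip the type */
#if TH_GENERIC_USE_HALF

/* half math: off by default; an opt-in for CPUs with real FP16
   (e.g. ARM's __fp16 in gcc), where HalfTensor math could be enabled */
# if TH_USE_HALF_MATH
/* HalfTensor math implementations would be compiled here */
# endif

#endif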

Contributor Author


TH_GENERIC_USE_HALF, on the other hand, may need to go. I will comment at the top about that.

Contributor Author


I have changed the name and the condition to refer to TH_NATIVE_HALF.

#define real half
#define accreal float
#define Real Half
#define THInf FLT_MAX
Contributor


this won't work directly, e.g. nn has code like:
real maxval = -THInf;
which will give a type conversion error. I don't see any super obvious fix for this, though. In a later commit we could remove this #define for all types and define THNumerics::min/max functions a la cutorch that can be used in nn (they don't have to be constant expressions).

Contributor Author


There could be working syntax for this, like { 0xXXX }; I will check it.

Contributor Author


Yes, that syntax worked; check the update.
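
The brace-initializer idea as a minimal sketch; the TH_HALF_INF bit pattern and macro spellings are assumptions for illustration, and since math stays off for Half, nothing negates THInf, so only the definition itself has to compile:

typedef struct { unsigned short x; } THHalf;

/* IEEE 754 binary16 +infinity bit pattern */
#define TH_HALF_INF 0x7C00

/* a C99 compound literal gives a struct-typed "constant" */
#define THInf ((THHalf){ TH_HALF_INF })

/* usage: plain initialization compiles for the struct type */
THHalf limit = THInf;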

IMPLEMENT_THStorage_COPY(Byte)
IMPLEMENT_THStorage_COPY(Char)
IMPLEMENT_THStorage_COPY(Short)
IMPLEMENT_THStorage_COPY(Int)
IMPLEMENT_THStorage_COPY(Long)
IMPLEMENT_THStorage_COPY(Float)
IMPLEMENT_THStorage_COPY(Double)
#else
/* only allow pass-through for Half */
IMPLEMENT_THStorage_COPY(Half)
Contributor


it looks like you need something similar to what you do in THTensorCopy.c here. The way this is written, a number of functions have their signatures declared in THTensorCopy.h but are undefined, e.g. THDoubleStorage_copyHalf (and the above code won't work to generate it, because of conversion problems).

Here's a reproduction of the problem:
th> x=torch.HalfStorage(3)
[0.0000s]
th> x
torch/install/bin/luajit: symbol lookup error: torch/install/lib/lua/5.1/libtorch.so: undefined symbol: THDoubleStorage_copyHalf
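
What the missing symbol needs is an explicit-conversion body in the spirit of THTensorCopy.c; a sketch, where the TH_half2float helper and field layout follow TH conventions but are assumptions here, not the PR's exact code:

/* convert element-wise: implicit casts do not exist for a struct-typed half */
void THDoubleStorage_copyHalf(THDoubleStorage *storage, THHalfStorage *src)
{
  ptrdiff_t i;
  THArgCheck(storage->size == src->size, 2, "sizes do not match");
  for (i = 0; i < storage->size; i++)
    storage->data[i] = (double)TH_half2float(src->data[i]);
}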

Contributor Author


Thanks for the catch, will do!

Contributor Author


done.

Contributor


I don't see a change in this file in the latest commits -- did I miss it? Running the above code yields the same results as before.

Contributor Author


Right, it was not there. Now it is:
th> x=torch.HalfStorage(3)
[0.0001s]
th> x
0
0
0
[torch.HalfStorage of size 3]

[0.0003s]

@@ -128,6 +128,11 @@ TH_API void THTensor_(eqTensorT)(THTensor *r_, THTensor *ta, THTensor *tb);
TH_API void THTensor_(abs)(THTensor *r_, THTensor *t);
#endif

#if defined(TH_REAL_IS_FLOAT) || defined(TH_REAL_IS_DOUBLE) || defined(TH_REAL_IS_HALF)
Contributor


Why the change here? If I'm reading this right, even if GENERIC_USE_HALF_MATH were enabled, we wouldn't generate these because in THTensorMath.c, these are still guarded by TH_REAL_IS_FLOAT/DOUBLE and not HALF. Did you mean to change THTensorMath.c as well?

Contributor Author


This is probably leftover from some experiments, I will clean it up.

Contributor Author


done.

@@ -32,7 +31,7 @@ void THVector_(fill)(real *x, const real c, const ptrdiff_t n) {
THVector_(fill_DISPATCHPTR)(x, c, n);
}


#ifndef TH_GENERIC_NO_MATH
Contributor


just curious, why is fill outside of TH_GENERIC_NO_MATH? AFAICT it's not called unless TH_GENERIC_HALF_MATH is on anyway?

Contributor Author


Probably it should be removed as well. I thought enabling initialization via fill() was a good idea.

Contributor Author


fixed

mytester:assert(torch.Tensor.isTensor(t), 'alias not working')
end
function torchtest.isStorage()
local t = torch.randn(3,4)
Contributor


did you forget :half() on the tensors below? I can pass expand, repeatTensor, isSameSizeAs without your patch.

Contributor Author


added

@borisfom
Contributor Author

@soumith, @gchanan : I believe I have addressed all the code review comments.
Please verify if it's ready to go!


@@ -3,6 +3,10 @@

#include "THGeneral.h"

#ifdef TH_GENERIC_USE_HALF
Contributor


I'm not sure what your goal is with having some checks as "#if TH_GENERIC_USE_HALF" and some being "#ifdef TH_GENERIC_USE_HALF" -- could you explain?

Contributor


Relatedly, I don't know what the goal is with cutorch exactly -- do we require torch/cutorch#578 at the same time? Can we commit this without that PR if we restrict ourselves not to use HalfTensor/HalfStorage when we bring in cutorch?

Contributor Author


This was supposed to be '#if TH_GENERIC_USE_HALF'; fixed.
The goal with cutorch is to avoid half<->float conversions when not necessary.
This PR can be committed without cutorch, and no module that does not define TH_GENERIC_USE_HALF would be affected - cutorch would still be using its own Half, too.
I just rebuilt and ran the tests with this branch and cutorch master.

Contributor Author


Also, to be clear: TH_GENERIC_USE_HALF is meant as a temporary flag to let us convert Torch modules one at a time, so that modules that do not define it will not break. Once we have converted all the modules (those that care about generics and use GenerateAllTypes.h from TH), the flag can be retired.

#endif

#ifndef TH_NATIVE_HALF
# define TH_NATIVE_HALF 0
Contributor


did you check if TH_NATIVE_HALF works? (on android?)

Contributor Author


Well, I have found that __fp16 in gcc is not exactly suitable as a direct 'half' replacement: there are restrictions against returning it from a function or passing it as a parameter...
I would keep TH_NATIVE_HALF as a placeholder, as it looks like both C++ and C are going to adopt our proposal by 2020: http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0192r1.pdf

Contributor Author


... and I am also looking for a way to adapt __fp16 (via a struct) so it is usable as-is, too.
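
A wrapper along those lines, sketched; the type name and helpers are hypothetical, and the premise is that while bare __fp16 cannot cross function boundaries on some gcc/ARM configurations, a single-member struct can:

/* hypothetical single-member wrapper around gcc's ARM __fp16 */
typedef struct { __fp16 v; } THNativeHalf;

/* structs may be passed and returned even where bare __fp16 may not */
static inline float THNativeHalf_to_float(THNativeHalf h) {
  return (float)h.v;   /* hardware (or libgcc) widening */
}

static inline THNativeHalf THNativeHalf_from_float(float f) {
  THNativeHalf h;
  h.v = (__fp16)f;     /* hardware (or libgcc) narrowing */
  return h;
}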

Contributor


leaving it as a placeholder seems reasonable for now.

Replaced half<->float conversion routines with NVIDIA ones (they handle denormals better)
Conflicts:
	lib/TH/generic/THVectorDispatch.c
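
For reference, the denormal handling such routines need, as a self-contained sketch rather than the NVIDIA code itself: a subnormal half has no implicit leading bit, so its mantissa is renormalized explicitly instead of being flushed to zero.

#include <stdint.h>
#include <string.h>

typedef struct { uint16_t x; } THHalf;   /* raw binary16 bits (assumed layout) */

float TH_half2float(THHalf h) {
  uint32_t sign = (uint32_t)(h.x & 0x8000u) << 16;
  uint32_t exp  = (h.x >> 10) & 0x1Fu;
  uint32_t man  = h.x & 0x3FFu;
  uint32_t bits;

  if (exp == 0) {
    if (man == 0) {
      bits = sign;                                        /* signed zero */
    } else {
      /* subnormal: shift until the implicit leading bit appears,
         decrementing the exponent for each shift */
      exp = 127 - 15 + 1;
      while ((man & 0x400u) == 0) { man <<= 1; exp--; }
      man &= 0x3FFu;
      bits = sign | (exp << 23) | (man << 13);
    }
  } else if (exp == 31) {
    bits = sign | 0x7F800000u | (man << 13);              /* inf / NaN */
  } else {
    bits = sign | ((exp - 15 + 127) << 23) | (man << 13); /* normal */
  }

  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}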
@@ -11,7 +10,7 @@
* which SIMD extension a given implementation uses
* 3. A dispatch stub, which is what is actually called by clients, that simply wraps the dispatch pointer.
*/

#ifndef TH_GENERIC_NO_MATH
Contributor


there's no matching #endif for this, I don't know if you meant to add it to the end or prior to vectorDispatchInit (see previous comment about this). If it's prior to vectorDispatchInit, you could just remove the preprocessor directives from that function.

Contributor Author


Right, it was a merge artifact between my working branch cpu_half and the half_type branch used for this PR...
Fixed now.

@@ -1,3 +1,10 @@
#include "THGeneral.h"

#ifdef TH_GENERIC_USE_HALF
Contributor


can this just be #if TH_GENERIC_USE_HALF? There was a case in a previous review -- if possible, seems much simpler to have all the checks be #if and just do the definition once.

Contributor Author


Fixed, thanks for pointing that out! Not just simpler but, most importantly, correct, as THGeneral.h always defines it. In this case it did not make a difference, but it's important to have it done in a uniform fashion.
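
The uniform pattern in question, sketched; it mirrors the TH_NATIVE_HALF default visible elsewhere in this diff, with the placement in THGeneral.h assumed:

/* THGeneral.h gives the macro a value exactly once */
#ifndef TH_GENERIC_USE_HALF
# define TH_GENERIC_USE_HALF 0
#endif

/* every other site tests the value, never mere definedness,
   so all checks behave identically whether or not -D was passed */
#if TH_GENERIC_USE_HALF
/* half-aware declarations go here */
#endif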

TH_API void THVector_(fill)(real *x, const real c, const ptrdiff_t n);
TH_API void THVector_(add)(real *y, const real *x, const real c, const ptrdiff_t n);
TH_API void THVector_(diff)(real *z, const real *x, const real *y, const ptrdiff_t n);
TH_API void THVector_(scale)(real *y, const real c, const ptrdiff_t n);
TH_API void THVector_(mul)(real *y, const real *x, const ptrdiff_t n);
#endif

/* Initialize the dispatch pointers */
TH_API void THVector_(vectorDispatchInit)(void);
Contributor


do you want this under TH_GENERIC_NO_MATH? In generic/Tensor.c you only call it in that situation anyway.


Contributor

@gchanan gchanan left a comment


There are a number of lua functions defined on half that don't work; I assume they should not be defined, because they are mainly mathematical (see below). I didn't check all of them, but it would be nice if we had tests to exercise these:

  1. Index functions (e.g. /home/gchanan/local/torch/install/bin/luajit: symbol lookup error: /home/gchanan/local/torch/install/lib/lua/5.1/libtorch.so: undefined symbol: THHalfTensor_indexSelect):
    index
    indexAdd
    indexCopy
    indexFill

I didn't actually check, but I assume the masked functions (maskedCopy, maskedFill, maskedSelect) have the same issue.

  2. Function application (CudaHalfTensor doesn't implement this afaict, so there's no ground truth, but it should probably do the conversion automatically):
    apply
    (e.g. i = 0
    [0.0000s]
    th> y:apply(function() i = i +1; return i end)
    .../gchanan/local/torch/install/share/lua/5.1/torch/FFI.lua:124: cannot convert 'number' to 'struct 114'
    stack traceback:)

I assume these have the same issue, but didn't actually test:
map
map2

@@ -32,6 +40,7 @@ struct THFileVTable
size_t (*writeLong)(THFile *self, long *data, size_t n);
size_t (*writeFloat)(THFile *self, float *data, size_t n);
size_t (*writeDouble)(THFile *self, double *data, size_t n);
size_t (*writeHalf)(THFile *self, half *data, size_t n);
Contributor


here too.

Contributor Author


@gchanan: the intent was to not define any functions except copy/init and file I/O, so that one can just write a pipeline using CUDNN with half and avoid all conversions. I suggest we revisit that if/when I manage to get NATIVE_HALF working?

Contributor Author


I will fix the guards.

Contributor


was the comment re: "intent was not to define" meant for this comment, or the one above it about the math functions in lua? (I'm going to assume it refers to the math functions, since this one is about file I/O.) I agree with the intent -- but shouldn't we avoid defining the functions in lua if they don't actually have an implementation?

Contributor Author


Right, it was for the math. Yes, it would be better to block the Lua definitions also; I just was not sure how to do it...

Contributor


I think you need to guard the relevant functions in generic/Tensor.c.
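
A sketch of that guard; the registration-table spelling follows torch7's generic/Tensor.c conventions but is illustrative here:

/* register math-backed Lua methods only for non-Half types, so nothing
   is visible from Lua that lacks a TH implementation behind it */
static const struct luaL_Reg torch_Tensor_(_) [] = {
  {"size",  torch_Tensor_(size)},    /* shape/copy/init: always available */
#if !defined(TH_REAL_IS_HALF)
  {"apply", torch_Tensor_(apply)},   /* math-backed: skipped for Half */
  {"map",   torch_Tensor_(map)},
  {"map2",  torch_Tensor_(map2)},
#endif
  {NULL, NULL}
};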

@@ -23,6 +30,7 @@ struct THFileVTable
size_t (*readLong)(THFile *self, long *data, size_t n);
size_t (*readFloat)(THFile *self, float *data, size_t n);
size_t (*readDouble)(THFile *self, double *data, size_t n);
size_t (*readHalf)(THFile *self, half *data, size_t n);
Contributor


shouldn't this be guarded by #if TH_GENERIC_HALF? There are other places in the code where half isn't guarded, but those are .c files that presumably wouldn't be included by outside projects, whereas this is an .h file.
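
The suggested guard, sketched on the vtable excerpt above; the macro spelling (TH_GENERIC_USE_HALF, per the rest of the PR) is an assumption, since the comment writes TH_GENERIC_HALF:

struct THFileVTable {
  /* ... other read/write slots ... */
  size_t (*readDouble)(THFile *self, double *data, size_t n);
#if TH_GENERIC_USE_HALF
  /* only exposed when the half type is compiled in, so outside projects
     that include this header without the flag still build */
  size_t (*readHalf)(THFile *self, half *data, size_t n);
#endif
};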

@borisfom
Contributor Author

Merged via #874

@borisfom borisfom closed this Dec 30, 2016
@borisfom borisfom deleted the half_type branch December 30, 2016 11:00
*
* NOTICE TO LICENSEE:
*
* This source code and/or documentation ("Licensed Deliverables") are
Contributor


As noted in pytorch/pytorch#654 , the license text that we included here was incorrect. We are working to get this corrected.
