Tensor (reprise) #411
Conversation
@pansk I have some questions.
@beru what do you propose to avoid the inevitable memory allocation when calling methods?

```cpp
template<typename F> Tensor unary_element_wise_operation(F f) const
{
    if (this->shape() != src.shape()) {
```
Where is the `src` variable in this `unary_element_wise_operation` method? Unlike the `binary_element_wise_operation` method, this one doesn't have a `src` argument.
Oh sh... It's supposed to read shape().
Fixed.
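To make the fix concrete, here is a minimal sketch of what the corrected check looks like: the unary case compares the result's shape against `this->shape()`, since there is no `src` operand. The `Tensor` below is a toy stand-in (public members, aggregate initialization), not the PR's actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy Tensor illustrating the corrected shape check in the unary case.
template <typename U>
struct Tensor {
  std::vector<std::size_t> shape_;
  std::vector<U> data_;

  const std::vector<std::size_t>& shape() const { return shape_; }

  template <typename F>
  Tensor unary_element_wise_operation(F f) const {
    Tensor res{shape_, std::vector<U>(data_.size())};
    // Compare against this->shape() -- there is no `src` in the unary case.
    assert(res.shape() == this->shape());
    for (std::size_t i = 0; i < data_.size(); ++i) {
      res.data_[i] = f(data_[i]);
    }
    return res;
  }
};
```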
```cpp
template<typename F> Tensor binary_scalar_operation(U scalar, F f) const
{
    if (this->shape() != src.shape()) {
```
Where is the `src` variable in this `binary_scalar_operation` method? Unlike the `binary_element_wise_operation` method, this one doesn't have a `src` argument.
See above.
@edgarriba What about passing… And I also wonder if all the values in the Tensor class should be targeted for processing in all circumstances.
@beru maybe it's a stupid question, but what's the difference between declaring a Tensor before the call or inside? Correct me if I'm wrong, but with these simple ops I think we can assume that all values should be targeted, since they are element-wise operations. At this first stage I don't think we have to handle the span concept, but indeed it's necessary!
```cpp
// zero-overhead version (same performance as raw pointer access.
// have an assertion for out-of-range error)
U& operator[] (const size_t index) {
    return mutable_host_data()[index];
```
There is a comment saying `zero-overhead`, but is it really so? Inside `mutable_host_data`, I see an `if` statement, so I thought the performance of this method is slower than raw pointer access.
@pansk in my original PR I solved this with an assert
https://github.com/tiny-dnn/tiny-dnn/pull/400/files#diff-54658142cc96675fb83229795a17df30R467
and tested it only in debug mode
https://github.com/tiny-dnn/tiny-dnn/pull/400/files#diff-38b61933c5832ce1f0110501b58856f4R144
Indeed, it's no longer zero overhead. My fault, I didn't remove the comment. I left the functionality here, but I would rather get rid of this accessor.
I removed the operators in next commit.
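For reference, the assert-based approach discussed above can be sketched as follows: the bounds check lives in an `assert`, so release builds (compiled with `-DNDEBUG`) strip it out and indexing is as cheap as raw pointer access, while debug builds still catch out-of-range errors. The `HostBuffer` class here is illustrative, not tiny-dnn's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: debug-only bounds check so release builds keep raw-pointer speed.
template <typename U>
class HostBuffer {
 public:
  explicit HostBuffer(std::size_t n) : data_(n) {}

  U& operator[](std::size_t index) {
    // Compiled away under -DNDEBUG, so this is zero-overhead in release mode.
    assert(index < data_.size() && "index out of range");
    return data_.data()[index];
  }

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<U> data_;
};
```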
@edgarriba If the methods declare a Tensor instance inside, calling them means memory allocation and deallocation happen frequently. If the data is small or the operations are not repeated often, it may not be a big issue, but there are situations where the data is >= 300MB and the methods are called inside a looong loop.

You've probably already understood this, but let me write it anyway: using an STL vector container is almost like using new/delete (malloc/free in C). It dynamically allocates heap memory. Repeated heap allocation/deallocation may cause heap fragmentation if the heap manager is not implemented well. Low-spec embedded systems tend not to have decent hardware/operating systems/libraries, so heavy use of dynamic heap memory can be very troublesome and should be avoided. Besides, memory allocation/deallocation/copies almost always slow down your program. They are very handy, but they come with a performance penalty.
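The point above can be shown with a tiny sketch: allocate the output buffer once outside the hot loop and reuse it, instead of constructing a fresh vector on every call. `add_into` and `run_loop` are hypothetical helper names, not tiny-dnn APIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical out-parameter add: no allocation happens inside.
void add_into(const std::vector<float>& a, const std::vector<float>& b,
              std::vector<float>& dst) {
  for (std::size_t i = 0; i < a.size(); ++i) dst[i] = a[i] + b[i];
}

// The caller owns the allocation: one buffer, reused across all iterations.
std::vector<float> run_loop(const std::vector<float>& a,
                            const std::vector<float>& b, int iterations) {
  std::vector<float> dst(a.size());  // single allocation, outside the loop
  for (int it = 0; it < iterations; ++it) add_into(a, b, dst);
  return dst;
}
```

Had `add_into` returned a fresh `std::vector<float>` instead, the loop body would have triggered one heap allocation and deallocation per iteration.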
@beru thanks for this excellent explanation!
```cpp
Tensor add(const Tensor& src) {
    U* dst = mutable_host_data();
    const U* src1 = host_data();
    const U* src2 = src.host_data();
    for_i(true, size(), [dst, src1, src2](size_t i) {
        dst[i] = src1[i] + src2[i];
    });
    return *this;
}
```

But what if we want to overload `operator+`?

```cpp
Tensor<float_t> t1(2,2,2,2);
Tensor<float_t> t2(2,2,2,2);
Tensor<float_t> t3 = t1 + t2; // same as t3 = t1.add(t2)
```

Or an out-parameter version:

```cpp
void add(const Tensor& src1, const Tensor& src2, Tensor& dst) {
    // Do we need to force users to resize Tensor dst before the call?
}
```

So, before going further with this discussion: what features do we want to support for `Tensor`?
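One way to reconcile the two styles above, sketched with a toy `Tensor` (public `data` member, nothing like the PR's real class): keep the allocation-free three-argument `add()` as the primitive, and make `operator+` a thin convenience wrapper that pays for one allocation and delegates the arithmetic.

```cpp
#include <cstddef>
#include <vector>

// Toy Tensor for illustration only.
struct Tensor {
  std::vector<float> data;
  explicit Tensor(std::size_t n = 0) : data(n) {}
};

// The primitive: caller controls allocation; dst is resized here so users
// don't have to pre-size it (one answer to the question in the comment).
void add(const Tensor& src1, const Tensor& src2, Tensor& dst) {
  dst.data.resize(src1.data.size());
  for (std::size_t i = 0; i < src1.data.size(); ++i)
    dst.data[i] = src1.data[i] + src2.data[i];
}

// The convenience form: one allocation, then reuse of the primitive.
Tensor operator+(const Tensor& a, const Tensor& b) {
  Tensor result;
  add(a, b, result);
  return result;
}
```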
Exactly, but again, I don't know if there's any use case for this feature right now in the pipeline, and the same goes for these basic operations.
See how reshape is introduced in a basic pipeline like MNIST: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html
Sure, but I see it more as a pre-processing operation rather than something really needed during the network computation, no? UPDATE: Could span solve this?
I think that Caffe2's and Keras's uses of reshape could give a fairly complete overview of the use cases. For span, have you seen section 1.2 of https://github.com/kokkos/array_ref/blob/master/proposals/P0009.rst?
Caffe2 uses reshape to resize the vector containing the tensor shape (note: I assume they support N-dimensional tensors), but then I'm not sure what happens to a tensor that already contains data when you trigger the reshape method.
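One plausible semantics for that case, sketched with toy types (not tiny-dnn's): reshape only rewrites the shape metadata and leaves the data buffer untouched, rejecting any new shape whose element count differs. This is only an assumption about how such a reshape could behave, not a description of Caffe2's actual implementation.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Toy shape type: just a list of dimensions and their product.
struct Shape {
  std::vector<std::size_t> dims;
  std::size_t count() const {
    return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                           [](std::size_t a, std::size_t b) { return a * b; });
  }
};

// Metadata-only reshape: existing data survives, element count must match.
void reshape(Shape& shape, std::vector<float>& data,
             const std::vector<std::size_t>& new_dims) {
  Shape candidate{new_dims};
  if (candidate.count() != data.size())
    throw std::invalid_argument("reshape must preserve the element count");
  shape = candidate;  // the data buffer is untouched
}
```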
@beru I agree with your considerations about the add, sub, ... methods. They shouldn't be in the class; I just left them where they were, but (as discussed with @edgarriba) they would be better outside the class, and probably with a completely different design.
@edgarriba, @bhack I wouldn't implement reshape in a first implementation; I think this decision is quite related to how you implement its connection to the graph structure.
@beru instead of iterators, I'd rather use slicing (à la valarray). Actually, when discussing with @edgarriba, I also suggested we have a look at the possibility of replacing the vector with a (possibly gsliced) valarray.
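For readers unfamiliar with the valarray-style slicing mentioned here, a small taste: `std::slice` selects a strided view of a `std::valarray` without touching the rest. The `column` helper below is a hypothetical example (pulling one column out of a row-major matrix), not code from the PR.

```cpp
#include <cstddef>
#include <valarray>

// Extract column `col` from a row-major matrix with `cols` columns,
// stored flat in a valarray. std::slice(start, size, stride) picks the
// strided elements; indexing a const valarray with a slice yields a copy.
std::valarray<float> column(const std::valarray<float>& m,
                            std::size_t cols, std::size_t col) {
  return m[std::slice(col, m.size() / cols, cols)];
}
```

The multi-dimensional generalization is `std::gslice`, which takes vectors of sizes and strides, which is what a "(possibly gsliced) valarray" backing store would rely on.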
@edgarriba: for implementing the syntax…
I didn't know about std::valarray, and after reading this Stack Overflow page, I think we should stick with std::vector or raw memory. Talking about implementing an iterator, I was just kidding. But it seems the current Tensor class does not provide an end()-like position for its inner data, so it can be a bit troublesome.
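If the host data lives in a `std::vector`, providing the missing `end()`-style access is cheap: `begin()`/`end()` can simply forward to the underlying storage, which also makes the class usable with range-for and `<algorithm>`. A minimal sketch with a toy `Tensor`, not the PR's actual class:

```cpp
#include <cstddef>
#include <vector>

// Toy Tensor exposing begin()/end() over its host data.
template <typename U>
class Tensor {
 public:
  explicit Tensor(std::size_t n) : data_(n) {}

  U* begin() { return data_.data(); }
  U* end() { return data_.data() + data_.size(); }
  const U* begin() const { return data_.data(); }
  const U* end() const { return data_.data() + data_.size(); }

 private:
  std::vector<U> data_;
};
```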
Great work :)
Today I'll try to rebase the PR.
Apparently my rebase was a mess, but it should be rebased now.
Tests are failing.
- moved checks to toDevice/fromDevice
- moved operations to non-member functions (layer_*), except fill
- fixed tests
Hope it's OK now...
yes!
Also, tests passed.
LGTM :)
Well, as usual, I made a mess with git, and apparently I can't easily push to @edgarriba's original PR.
Today I reworked @edgarriba's #400 a bit: I changed the interface slightly, added lazy allocation and lazy movement of memory, renamed the accessors to host_ptr and host_at (to clarify that they work on host memory), and implemented the generic functions for binary and unary host operations, element-wise and scalar, so it's now easier to implement functions such as add, sub, mul, div, exp, sqrt, ...
I commented out linspace, if someone has a strong opinion about that, we can still get it back in.
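To illustrate the "generic binary host operation" idea from the comment above: one template drives all the element-wise ops, so each concrete operation is a one-liner. The `layer_*` naming follows the commit note earlier in the thread; the exact signatures here are assumptions, operating on plain vectors rather than the real Tensor.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Generic binary element-wise operation over host data.
template <typename U, typename F>
std::vector<U> binary_element_wise(const std::vector<U>& a,
                                   const std::vector<U>& b, F f) {
  std::vector<U> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = f(a[i], b[i]);
  return out;
}

// Each concrete op is just a functor plugged into the generic driver.
template <typename U>
std::vector<U> layer_add(const std::vector<U>& a, const std::vector<U>& b) {
  return binary_element_wise(a, b, std::plus<U>());
}

template <typename U>
std::vector<U> layer_sub(const std::vector<U>& a, const std::vector<U>& b) {
  return binary_element_wise(a, b, std::minus<U>());
}
```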