
Tensor (reprise) #411

Merged · 43 commits merged into tiny-dnn:master on Nov 22, 2016

Conversation

@pansk (Contributor) commented Nov 17, 2016

Well, as usual, I made a mess with Git, and apparently I can't easily push to @edgarriba's original PR.

Today I reworked @edgarriba's #400 a bit: I changed the interface slightly, added lazy allocation and lazy movement of memory, renamed accessor to host_ptr and host_at (to clarify that they work on host memory), and implemented the generic functions for binary and unary host operations, element-wise and scalar, so it's now easier to implement functions such as add, sub, mul, div, exp, sqrt, ...

I commented out linspace; if someone has a strong opinion about it, we can still bring it back.

@beru (Contributor) commented Nov 17, 2016

@pansk I have some questions.

  • Regarding the add, sub, mul, div, sqrt, exp methods: do they need to be implemented as methods of the Tensor class? Is that on purpose, so that they are chainable?
  • In the binary_scalar_operation, unary_element_wise_operation, and binary_element_wise_operation implementations, a Tensor instance is created and returned, which means that just calling these methods abuses the heap manager (an unavoidable memory allocation happens every time you call them), and that is not suitable for embedded computing. Those methods may be convenient, but they are not very efficient in my view.
  • What about a Tensor iterator class? Personally speaking, I don't use iterators and algorithms very much, but if a Tensor iterator is never implemented, I fear that someday hardcore C++ programmers may visit your home with pitchforks and torches in their hands.

@edgarriba (Member) commented:

@beru what do you propose to avoid inevitable memory allocation when calling methods?


template<typename F> Tensor unary_element_wise_operation(F f) const
{
    if (this->shape() != src.shape()) {
@beru (Contributor) commented on this diff:

Where is the src variable in this unary_element_wise_operation method?
Unlike the binary_element_wise_operation method, this one doesn't have a src argument.

@pansk (Author) replied:

Oh sh... It's supposed to read shape().

@pansk (Author) replied:

Fixed.


template<typename F> Tensor binary_scalar_operation(U scalar, F f) const
{
    if (this->shape() != src.shape()) {
@beru (Contributor) commented on this diff (Nov 17, 2016):

Where is the src variable in this binary_scalar_operation method?
Unlike the binary_element_wise_operation method, this one doesn't have a src argument.

@pansk (Author) replied:

See above.

@beru (Contributor) commented Nov 17, 2016

// edgarriba wrote:
// @beru what do you propose to avoid inevitable memory allocation when calling methods?

@edgarriba What about passing the res Tensor instance as an argument?
That way the caller of the methods can decide when and how to prepare the destination data, and intermediate allocations need not occur if the code is written carefully.

I also wonder whether all the values in a Tensor should be targeted for processing in all circumstances.
What if tiny-dnn library code wants to partially edit elements (such as only one batch's data) residing inside a Tensor instance? I suppose the current Tensor public interface does not readily provide that functionality.
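For concreteness, a minimal sketch of the out-parameter style suggested here; the free function element_wise_add is hypothetical, while host_data, mutable_host_data, shape, and size follow the interface discussed in this PR:

#include <cassert>
#include <cstddef>

// Hypothetical free function: the caller owns the destination,
// so no Tensor is constructed inside the call.
template <typename U>
void element_wise_add(const Tensor<U>& src1, const Tensor<U>& src2,
                      Tensor<U>& dst) {
    assert(src1.shape() == src2.shape() && src1.shape() == dst.shape());
    const U* a = src1.host_data();
    const U* b = src2.host_data();
    U*       d = dst.mutable_host_data();
    for (size_t i = 0; i < dst.size(); ++i) {
        d[i] = a[i] + b[i];
    }
}

// The caller allocates dst once and reuses it across iterations:
//     Tensor<float_t> dst(2, 2, 2, 2);
//     for (size_t n = 0; n < iterations; ++n) {
//         element_wise_add(t1, t2, dst);   // no per-call allocation
//     }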

@edgarriba (Member) commented:

@beru maybe it's a stupid question, but what's the difference between declaring the Tensor before the call or inside it? Correct me if I'm wrong, but with these simple ops I think we can assume that all values should be targeted, since they are element-wise operations. At this first stage I don't think we have to handle the span concept, but it will indeed be necessary!

// zero-overhead version (same performance as raw pointer access;
// has an assertion for out-of-range errors)
U& operator[] (const size_t index) {
return mutable_host_data()[index];
@beru (Contributor) commented on this diff:

There is a comment saying zero-overhead, but is that really so?
Inside mutable_host_data I see an if statement, so I would think this method is slower than raw pointer access.


@pansk (Author) replied:

Indeed, it's no longer zero overhead - my fault, I didn't remove the comment. I left the functionality here, but I would rather get rid of this accessor.

@pansk (Author) replied:

I removed the operators in the next commit.

@beru (Contributor) commented Nov 17, 2016

edgarriba wrote:
what's the difference between declaring a Tensor before the call or inside?

@edgarriba
By preparing the Tensor instance outside of the methods, the caller has the opportunity to reduce the number of memory allocations/deallocations.

If the methods declare the Tensor instance inside, then calling them means memory allocation and deallocation happen frequently. If the data is small or the operations are not repeated often, it may not be a big issue, but there are situations where the data is >= 300 MB and the methods are called inside a very long loop.

Probably you've already understood this, but let me write it anyway: using an STL vector container is almost like using new/delete (malloc/free in C). It dynamically allocates heap memory.

Repeating heap allocation/deallocation excessively may cause heap fragmentation if the heap manager is not implemented well. Low-spec embedded systems tend not to have decent hardware, operating systems, or libraries, so heavy use of dynamic heap memory can be very troublesome and should be avoided.

Besides, memory allocation, deallocation, and copying almost always slow down your program. They are very handy, but they come with a performance penalty.
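The same point in miniature with plain std::vector (a sketch; the loop bodies are placeholder work): hoisting the buffer out of the loop trades per-iteration allocations for a single one.

#include <cstddef>
#include <vector>

void per_call_alloc(size_t n, size_t iters) {
    for (size_t k = 0; k < iters; ++k) {
        std::vector<float> buf(n);       // allocates and frees on every pass
        buf[0] = static_cast<float>(k);  // placeholder work
    }
}

void reused_alloc(size_t n, size_t iters) {
    std::vector<float> buf(n);           // one allocation up front
    for (size_t k = 0; k < iters; ++k) {
        buf[0] = static_cast<float>(k);  // same work, capacity is reused
    }
}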

@edgarriba (Member) commented Nov 18, 2016

@beru thanks for this excellent explanation!
Some possibilities that come to my mind are the following:

  • Keep the same signature and modify the current Tensor's state (removing const):

Tensor& add(const Tensor& src) {
    U* dst        = mutable_host_data();
    const U* src1 = host_data();
    const U* src2 = src.host_data();

    for_i(true, size(), [dst, src1, src2](size_t i) {
        dst[i] = src1[i] + src2[i];
    });
    return *this;
}

But what if we want to overload operator+() to behave like this?
Tensor t1 would be modified and could not be reused in other operations.

Tensor<float_t> t1(2,2,2,2);
Tensor<float_t> t2(2,2,2,2);

Tensor<float_t> t3 = t1 + t2; // same as t3 = t1.add(t2)
  • Get rid of add() from the Tensor structure and have it as a standalone function:

void add(const Tensor& src1, const Tensor& src2, Tensor& dst) {
    // Do we need to force users to resize Tensor dst before the call?
}

So, before going further with this discussion: what features do we want to support in the Tensor structure?

@edgarriba (Member) commented:

@pansk @beru @bhack @nyanp Another feature that most frameworks have is Tensor reshape. Does it make sense to have tensors with dynamic size/shape in tiny-dnn?

@bhack (Contributor) commented Nov 18, 2016

@edgarriba (Member) replied:

Exactly, but again, I don't know if there's any use case for this feature right now in the pipeline, and the same goes for these basic operations.

@bhack (Contributor) commented Nov 18, 2016

See how reshape is introduced in a basic pipeline like MNIST https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html

@edgarriba (Member) commented Nov 18, 2016

Sure, but I see it more as a preprocessing operation rather than something really needed during the network computation, no?

UPDATE

Could span solve this?

@bhack (Contributor) commented Nov 18, 2016

I think the Caffe2 and Keras uses of reshape could give a fairly complete overview of the use cases.

For span, have you seen section 1.2 of https://github.com/kokkos/array_ref/blob/master/proposals/P0009.rst?

@edgarriba (Member) replied:

Caffe2 uses reshape to resize the vector containing the tensor shape (note: I assume they support N-dimensional tensors), but then I'm not sure what happens to a tensor that already contains data when you trigger the reshape method.
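For illustration, one common semantics, sketched under the assumption that reshape only reinterprets the same flat buffer (this ToyTensor is neither Caffe2's nor this PR's code):

#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Toy model: reshape changes only the shape vector; the data buffer
// is untouched, so existing contents are preserved verbatim.
struct ToyTensor {
    std::vector<size_t> shape_;
    std::vector<float>  data_;

    void reshape(const std::vector<size_t>& new_shape) {
        const size_t new_size =
            std::accumulate(new_shape.begin(), new_shape.end(),
                            size_t(1), std::multiplies<size_t>());
        if (new_size != data_.size()) {
            throw std::invalid_argument("reshape: element count mismatch");
        }
        shape_ = new_shape;  // reinterpret the same flat data
    }
};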

@bhack (Contributor) commented Nov 18, 2016

@pansk (Author) commented Nov 19, 2016

@beru I agree with your considerations about the add, sub, ... methods. They shouldn't be in the class; I just left them where they were, but (as discussed with @edgarriba) they would be better placed outside the class, and probably with a completely different design.

@pansk (Author) commented Nov 19, 2016

@edgarriba, @bhack I wouldn't implement reshape in a first implementation; I think this decision is quite tied to how you implement the Tensor's connection to the graph structure.

@pansk (Author) commented Nov 19, 2016

@beru instead of iterators, I'd rather use slicing (à la valarray). Actually, when discussing with @edgarriba, I also suggested that we have a look at the possibility of replacing the vector with a (possibly gsliced) valarray.
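To illustrate the kind of slicing meant here, a minimal standalone std::gslice example (standard C++ only, unrelated to the PR's Tensor):

#include <cstddef>
#include <iostream>
#include <valarray>

int main() {
    // A 3x4 matrix stored flat in a valarray.
    std::valarray<float> v(12);
    for (std::size_t i = 0; i < 12; ++i) v[i] = static_cast<float>(i);

    // gslice(start, lengths, strides): here it selects row 1
    // (the four elements at indices 4..7).
    std::gslice row1(4, std::valarray<std::size_t>{4},
                        std::valarray<std::size_t>{1});

    // Scale that row in place without copying the rest of the matrix.
    v[row1] *= std::valarray<float>(2.0f, 4);

    for (float x : v) std::cout << x << ' ';
    std::cout << '\n';
}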

@pansk (Author) commented Nov 19, 2016

@edgarriba: for implementing the syntax Tensor<float_t> t3 = t1 + t2;, the standard workaround is to build an expression template returning dummy templated objects, and then perform the computation in an implicit converting assignment or converting constructor.
So t1 + t2 returns, e.g., a tensor_sum<decltype(t1), decltype(t2)>, and the computation is triggered by constructing a Tensor from the tensor_sum.
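A minimal sketch of that expression-template pattern; the toy Tensor and tensor_sum below are illustrative stand-ins, not the PR's classes:

#include <cstddef>
#include <vector>

// Lazy node: stores references, computes element i only on demand.
template <typename L, typename R>
struct tensor_sum {
    const L& lhs;
    const R& rhs;
    float  operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Tensor {
    std::vector<float> data;

    explicit Tensor(std::size_t n) : data(n) {}

    // Converting constructor: the single loop runs here, so t1 + t2
    // never materializes an intermediate Tensor.
    template <typename L, typename R>
    Tensor(const tensor_sum<L, R>& e) : data(e.size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    }

    float  operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

inline tensor_sum<Tensor, Tensor> operator+(const Tensor& a, const Tensor& b) {
    return {a, b};  // returns the lazy node, not a Tensor
}

// Usage:
//     Tensor t1(16), t2(16);
//     Tensor t3 = t1 + t2;   // one loop, no temporary Tensor

With only the Tensor + Tensor overload shown, nested expressions like t1 + t2 + t3 would need further overloads (or a common expression base class).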

@beru (Contributor) commented Nov 19, 2016

> instead of iterators, I'd rather use slicing (à la valarray). Actually, when discussing with @edgarriba, I also suggested that we have a look at the possibility of replacing the vector with a (possibly gsliced) valarray.

I didn't know about std::valarray, and after reading this Stack Overflow page, I think we should stick with std::vector or raw memory.

Talking about implementing an iterator, I was just kidding. But it seems the current Tensor class does not provide an end()-like position for its inner data, so that can be a bit troublesome.

const U* end() const {
    return host_data() + shape_[0] * shape_[1] * shape_[2] * shape_[3];
}
U* end() {
    return mutable_host_data() + shape_[0] * shape_[1] * shape_[2] * shape_[3];
}

@nyanp (Member) commented Nov 19, 2016

Great work :)
@pansk Could you rebase from master?

@pansk (Author) commented Nov 21, 2016

Today I'll try to rebase the PR.

@pansk (Author) commented Nov 22, 2016

Apparently my rebase is a mess, but it should be rebased now.

@bhack (Contributor) commented Nov 22, 2016

Tests are failing

@pansk (Author) commented Nov 22, 2016

Hope it's ok now...

@edgarriba (Member) replied:

yes!

@bhack (Contributor) commented Nov 22, 2016

Also, the tests passed.

@nyanp (Member) commented Nov 22, 2016

LGTM :)

@nyanp merged commit 224843e into tiny-dnn:master Nov 22, 2016
@pansk deleted the tensor branch November 22, 2016 12:54
@edgarriba (Member) commented:

@nyanp @pansk the file history was removed 😱

@pansk restored the tensor branch November 22, 2016 13:13