Reduce memory usage and increase performance for convolution on iOS #3778

Merged
4 commits merged into tensorflow:master on Aug 23, 2016

Conversation

petewarden
Contributor

We've had lots of problems with large convolutions hitting memory limits on iOS. This new implementation of the operator breaks the work into chunks so we never use more than 16 MB, and uses Apple's Accelerate framework to optimize the matrix multiplication.

Testing shows that it's between 5% and 10% faster than the existing implementation on various models, and it keeps memory usage to a minimum.
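
As a rough illustration of the approach described above, here is a minimal sketch of a chunked im2col convolution that calls Accelerate's cblas_sgemm once per chunk. The function name, the lambda-based patch filler, and the overall structure are assumptions for illustration only; they are not the PR's actual code.

#include <Accelerate/Accelerate.h>  // provides cblas_sgemm on iOS/macOS
#include <algorithm>
#include <functional>
#include <vector>

// Illustrative only: run im2col + GEMM over the input in fixed-size chunks so
// the scratch buffer never exceeds ~16 MB, instead of materializing the whole
// patch matrix at once. fill_patch_row is a caller-supplied routine that
// writes the filter_value_count values of one input patch into a buffer row.
void ChunkedConv2D(
    const std::function<void(int patch_index, float* row)>& fill_patch_row,
    const float* filter, float* output, int num_patches,
    int filter_value_count, int out_depth) {
  const size_t max_chunk_size = 16 * 1024 * 1024;  // 16 MB scratch limit
  const size_t patch_bytes = filter_value_count * sizeof(float);
  const int patches_per_chunk =
      static_cast<int>(std::max<size_t>(1, max_chunk_size / patch_bytes));
  std::vector<float> im2col_buf(static_cast<size_t>(patches_per_chunk) *
                                filter_value_count);
  for (int start = 0; start < num_patches; start += patches_per_chunk) {
    const int chunk = std::min(patches_per_chunk, num_patches - start);
    for (int i = 0; i < chunk; ++i) {
      fill_patch_row(start + i, im2col_buf.data() + i * filter_value_count);
    }
    // [chunk x filter_value_count] * [filter_value_count x out_depth]
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, chunk, out_depth,
                filter_value_count, 1.0f, im2col_buf.data(),
                filter_value_count, filter, out_depth, 0.0f,
                output + static_cast<size_t>(start) * out_depth, out_depth);
  }
}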

@bhack
Contributor

bhack commented Aug 14, 2016

For quantized models, will gemmlowp also be used on iOS? /cc @wangyida

@petewarden
Contributor Author

@bhack gemmlowp can be used on iOS, though we haven't investigated optimizing it for those devices in particular, so I expect we'll need to do more work there. This is primarily a fix for memory issues when running float models.

@petewarden
Contributor Author

Jenkins, test this please.

@bhack
Contributor

bhack commented Aug 15, 2016

It could be interesting to benchmark this against BNNS.

@wangyida

@bhack I can apply GEMM to the FC layer in tiny-dnn on the iOS platform now; the memory issues seem related to the batch size and network structure rather than to the parametric model itself.


// This file contains a set of different implementations of the two-dimensional
// convolution operation. The standard TensorFlow Conv2d kernel uses EigenTensor
// to implement the computation, but here there are a variety of different ways
Contributor

change "here there" to "there"

Contributor Author

I've updated the line to read "this module has a variety...". Is that clearer?

// buffer for the next chunk and reuse it, keeping maximum memory size down.
// In this case, we've picked 16 megabytes as a reasonable limit.
const size_t max_chunk_size = (16 * 1024 * 1024);
OP_REQUIRES(context, (filter_value_count * sizeof(T1)) <= max_chunk_size,
Contributor

Could pull filter_value_count * sizeof(T1) out into a constant and re-use it below.
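
For what it's worth, a sketch of how that suggestion might look, naming the per-patch byte count once and reusing it for both the size check and the chunk-size computation. The error message and the patches_per_chunk line are assumptions for illustration, not the PR's exact code.

const size_t max_chunk_size = (16 * 1024 * 1024);
// Hypothetical refactor per the review comment: compute the per-patch byte
// count once and reuse it below.
const size_t patch_byte_count = filter_value_count * sizeof(T1);
OP_REQUIRES(context, patch_byte_count <= max_chunk_size,
            errors::InvalidArgument("Im2Col patch too large for the buffer"));
const size_t patches_per_chunk = max_chunk_size / patch_byte_count;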

@andydavis1
Contributor

Looks good to me (my latest round of comments were minor)...

// the Im2ColConvFunctor template definition inside the op registration to
// enable. Assumes row-major ordering of the values in memory.
template <class T1, class T2, class T3>
class ReferenceGemmFunctor {
Member

Why do we need to include this? The problem with including slow reference implementations is that they end up being used and are hard to get rid of.

Contributor Author

We discussed this offline, but to summarize, it's useful for bootstrapping ports to new platforms, though I agree it's a little awkward here.
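
For readers who haven't seen the functor in question, a reference GEMM in this spirit is just the naive triple loop over row-major matrices; the operator() signature below is assumed for the sketch and may not match the PR exactly.

// Illustrative naive GEMM: c = a * b with row-major storage, where a is
// [m x k] with stride lda, b is [k x n] with stride ldb, and c is [m x n]
// with stride ldc. Deliberately unoptimized; intended only as a correctness
// baseline when bootstrapping a port to a new platform.
template <class T1, class T2, class T3>
class ReferenceGemmFunctor {
 public:
  void operator()(size_t m, size_t n, size_t k, const T1* a, size_t lda,
                  const T2* b, size_t ldb, T3* c, size_t ldc) {
    for (size_t row = 0; row < m; ++row) {
      for (size_t col = 0; col < n; ++col) {
        T3 total(0);
        for (size_t inner = 0; inner < k; ++inner) {
          total += static_cast<T3>(a[row * lda + inner]) *
                   static_cast<T3>(b[inner * ldb + col]);
        }
        c[row * ldc + col] = total;
      }
    }
  }
};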

@petewarden
Contributor Author

Once the tests have passed, could the admins merge this since we have LGTMs?

@rmlarsen merged commit 459c2fe into tensorflow:master on Aug 23, 2016
7 participants