Why is THNN so slow?! #1048

Closed

amazingyyc opened this issue Nov 24, 2016 · 21 comments

@amazingyyc

I use Prisma, and I found a libthnn.so inside the Prisma Android app.
So I tested the speed of Prisma's lib against the original THNN.
I found that the original is very, very slow.
For example, the original THNN costs 5 s, but Prisma's lib costs just 50 ms!!!
I also found that Prisma's lib uses OpenBLAS, but OpenBLAS alone can't explain that much of a speedup!
Can anyone explain it?

@soumith
Member

soumith commented Nov 24, 2016

are you talking about on-device performance? i.e. for ARM / Android?

@soumith
Member

soumith commented Nov 24, 2016

It's likely that they have implemented some custom optimizations that have not been pushed back upstream.

@fmassa
Contributor

fmassa commented Nov 24, 2016

What @soumith mentioned is probably the main reason for the difference in performance.
We also probably lost some (maybe not much?) run-time performance with torch/torch7#839, but that change was causing too many compilation problems on some architectures, so it was the better option for maintainability.

@amazingyyc
Author

Yes, I'm talking about the Android platform. The Prisma app (using its libTHNN.so) is so fast that I can hardly believe it!!

@austingg

@amazingyyc Did you test the two THNN builds with the same model? Prisma may simplify the model a lot.
@soumith I found THNN is much slower than cunn and cudnn, about 25 times slower. I don't know whether there is something wrong with how I call THNN (it uses OpenBLAS for GEMM).

@amazingyyc
Author

@austingg I did not test a full model. I just tested a single convolution operation (the first convolution in GoogLeNet). I built a new demo with the lib included in Prisma's Android app and with the original THNN. The original THNN costs about 2 s (in release mode) and 5 s (in debug mode).
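
To make the comparison concrete, a micro-benchmark like this can be timed with a small harness along these lines (a sketch only; RunConv is a placeholder for whichever convolution call is under test, not a THNN or Prisma API):

```cpp
// Minimal timing sketch. RunConv is a stand-in for the convolution call
// being measured; it is not a real THNN/Prisma function.
#include <chrono>

template <typename ConvFn>
double AverageMs(ConvFn&& RunConv, int iters = 10) {
  RunConv();  // warm-up, so one-time initialization is not counted
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) RunConv();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count() / iters;
}
```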

@austingg

@amazingyyc Does that mean you tested a Prisma-style model, or that conv1 alone costs 2 s?

@amazingyyc
Author

I can't find the reason, but I guess Prisma uses OpenBLAS to accelerate things and uses uint8 matrix multiplication instead of float32. Just a guess...

@amazingyyc
Author

@austingg
I only tested conv1.
Prisma's lib costs about 50 ms.
The original THNN costs 2 s (in the release APK).

@amazingyyc
Author

@austingg cunn is implemented on the GPU; of course it is much faster than THNN (which is CPU-only).

@austingg

@amazingyyc That's possible. However, THNN also uses OpenBLAS, and OpenBLAS has no int8 GEMM.

@austingg

@amazingyyc I know, it's just far too slow. In other frameworks, the CPU is about 10x slower than the GPU.

@soumith soumith closed this as completed Nov 26, 2016
@austingg

austingg commented Dec 5, 2016

It was my mistake. I compared against cudnn, not cuda, and when compared to cudnn it is normal for torch nn to be 25x slower. Sorry for misleading.

@amazingyyc
Author

I tested Prisma's lib and gemmlowp (ref: https://github.com/google/gemmlowp) on the same-scale convolution2d: Prisma runs the convolution2d, and gemmlowp does a matrix multiply of the same size.
I found that the costs of the two are on the same order of magnitude.
So I think Prisma uses uint8 matrix multiplication instead of float32 (ref: http://ip.cadence.com/uploads/presentations/1100AM_TensorFlow_on_Embedded_Devices_PeteWarden.pdf).
Using gemmlowp would accelerate THNN too.
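
For reference, the 8-bit path looks roughly like this: a minimal sketch following the legacy Gemm() interface from gemmlowp's documentation, where the offset/multiplier/shift values are placeholders that would come from the real quantization ranges:

```cpp
// Sketch of an 8-bit GEMM via gemmlowp's legacy Gemm() interface.
// The quantization parameters below are placeholders.
#include <cstdint>
#include "public/gemmlowp.h"

void QuantizedGemm(const std::uint8_t* lhs_data,  // M x K, row-major
                   const std::uint8_t* rhs_data,  // K x N, col-major
                   std::uint8_t* result_data,     // M x N, col-major
                   int M, int N, int K) {
  gemmlowp::GemmContext context;

  gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::RowMajor>
      lhs(lhs_data, M, K);
  gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::ColMajor>
      rhs(rhs_data, K, N);
  gemmlowp::MatrixMap<std::uint8_t, gemmlowp::MapOrder::ColMajor>
      result(result_data, M, N);

  // In practice these are derived from the min/max quantization ranges;
  // the numbers here are only placeholders.
  const int lhs_offset = -128, rhs_offset = -128;
  const int result_offset = 128, result_mult_int = 1, result_shift = 8;

  gemmlowp::Gemm<std::uint8_t, gemmlowp::DefaultL8R8BitDepthParams>(
      &context, lhs, rhs, &result, lhs_offset, rhs_offset,
      result_offset, result_mult_int, result_shift);
}
```

For a convolution, the two matrices would be the filter weights and the im2col-unrolled input, i.e. the same lowering THNN's SpatialConvolutionMM already does before its float GEMM.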

@austingg

austingg commented Dec 7, 2016

@amazingyyc Does it need net surgery to use gemmlowp?

@amazingyyc
Author

@austingg
Yes, you have to replace THNN's float matrix multiply with gemmlowp (convert the floats to uint8 yourself and do the uint8 matrix multiply with gemmlowp).
Conversion method ref: https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
gemmlowp is header-only (see https://github.com/google/gemmlowp), so it is easy to integrate.
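
Roughly, the conversion described in that post maps each float buffer linearly from its [min, max] range onto [0, 255]. A minimal sketch (all names here are illustrative, not THNN or gemmlowp APIs):

```cpp
// Sketch of the linear float -> uint8 quantization from the linked post.
// All names are illustrative, not library APIs.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantizedBuffer {
  std::vector<std::uint8_t> values;
  float min;  // real value represented by quantized 0
  float max;  // real value represented by quantized 255
};

QuantizedBuffer QuantizeToUint8(const std::vector<float>& input) {
  QuantizedBuffer out;
  out.min = *std::min_element(input.begin(), input.end());
  out.max = *std::max_element(input.begin(), input.end());
  const float range = std::max(out.max - out.min, 1e-6f);  // avoid divide-by-zero
  for (float v : input) {
    const int q = static_cast<int>(std::round((v - out.min) * 255.0f / range));
    out.values.push_back(static_cast<std::uint8_t>(std::min(255, std::max(0, q))));
  }
  return out;
}

// Recover an approximate float from a quantized value and its range.
float Dequantize(std::uint8_t q, float min, float max) {
  return min + (max - min) * (static_cast<float>(q) / 255.0f);
}
```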

@amazingyyc
Author

@austingg
One more note: gemmlowp will not use multiple threads for small matrices, but it will automatically use multiple threads for big matrices. So if you want the fastest possible speed, you have to change gemmlowp's code.
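
The thread-count knob itself is on the gemmlowp context; whether a small GEMM actually gets split across threads is decided by heuristics inside gemmlowp's multi-threading code, which is the part that would need editing. A sketch (quantization parameters are placeholders, as before):

```cpp
// Sketch: setting the worker-thread cap on the gemmlowp context before a GEMM.
// Quantization parameters are placeholders, as in the earlier sketch.
#include <cstdint>
#include "public/gemmlowp.h"

void EightBitGemmWithThreads(
    const gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::RowMajor>& lhs,
    const gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::ColMajor>& rhs,
    gemmlowp::MatrixMap<std::uint8_t, gemmlowp::MapOrder::ColMajor>* result,
    int num_threads) {
  gemmlowp::GemmContext context;
  context.set_max_num_threads(num_threads);

  gemmlowp::Gemm<std::uint8_t, gemmlowp::DefaultL8R8BitDepthParams>(
      &context, lhs, rhs, result,
      /*lhs_offset=*/-128, /*rhs_offset=*/-128,
      /*result_offset=*/128, /*result_mult_int=*/1, /*result_shift=*/8);
}
```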

@austingg

austingg commented Dec 7, 2016

@amazingyyc Thank you so much.

@austingg

@amazingyyc Have you seen any other benchmarks of 8-bit GEMM on mobile devices?

@amazingyyc
Author

@austingg Sorry, I don't know of any others.

@austingg

@amazingyyc I have done some research; in the TensorFlow issues, many people complained when they used quantized (8-bit) ops.
