Minor refactoring of specialized kernel size logic for convolution/correlation kernels #5

Merged
merged 1 commit into torch:master from ajtulloch:cleanup-kernel-logic-slightly on Apr 14, 2014

3 participants
@ajtulloch
Contributor

ajtulloch commented Mar 31, 2014

There is a significant amount of code duplicating the following logic:

If the kernel rows and columns are equal and in [3, 4, ..., 13]:
  instantiate the kernel with the given dimensions
else:
  instantiate the kernel with 0

We can therefore create a macro that encapsulates this logic. I think this refactoring slightly improves the readability of the code by removing the heavy duplication, and it also reduces the file size by ~30%.

@soumith

Member

soumith commented Mar 31, 2014

i like this. so much cleaner.

@clementfarabet

Member

clementfarabet commented Mar 31, 2014

yeah definitely an improvement :-)

Is the code tested? Can I just go ahead and merge?

@ajtulloch

Contributor

ajtulloch commented Mar 31, 2014

It compiles, but I'm not sure on how to test it - any advice? My only CUDA-enabled hardware is my laptop's NVIDIA GT 650M.

@soumith

Member

soumith commented Mar 31, 2014

You could run the tests here: https://github.com/torch/cunn/tree/master/test

@ajtulloch

Contributor

ajtulloch commented Mar 31, 2014

I'm having trouble getting the tests to run - it appears to be an interaction between the NVIDIA CUDA libraries and OS X Mavericks. I'll try again tomorrow, but I'll also try spinning up a Linux GPU box.

If @soumith or @clementfarabet have the time and inclination, it may be quicker for you to apply this patch and see if the tests pass on your setup. It's a mechanical change, so it's hard to introduce bugs that don't manifest at compile time, but it's definitely worth checking.

@ajtulloch

Contributor

ajtulloch commented Apr 4, 2014

Tests passed:

tulloch at ajt92c in ~/Code/torch/cunn (master)
∴ th test/test.lua 
Running 38 tests
______________________________________  ==> Done Completed 56 asserts in 38 tests with 0 errors
--------------------------------------------------------------------------------

SpatialConvolution.backward 26x114x34 o 13x8 -> 33x102x27:       average speedup is 0.14095951177748
SpatialMaxPoolingCUDA.backward 32x48x120x120 o 4x4 -> 32x48x30x30:       average speedup is 0.41406701457109
SpatialMaxPooling.backward 41x188x244 o 4x4 -> 41x47x61:         average speedup is 0.80568943223228
SoftMax.backward 24 -> 24:       average speedup is 0.1255230125523
LogSoftMax.backward 21 -> 21:    average speedup is 0.10204081632653
SpatialSubSampling.backward 52x248x208 o 4x4 -> 52x62x52:        average speedup is 0.34685705906902
SpatialConvolution.backward 8x1x28x34 o 3x8 -> 8x13x26x27:       average speedup is 0.13519833708644
SpatialLPPooling.backward (P=2 only) 31x99x136 o 3x4 -> 31x33x34:        average speedup is 1.3775691970403
SoftMax forward 78 -> 78:        average speedup is 0.026968716289105
Max.backward 995x8:      average speedup is 0.34328358208955
SpatialMaxPooling.backward 8x17x64x180 o 2x3 -> 8x17x32x60:      average speedup is 1.1033724340176
Sqrt.backward 31 -> 31:          average speedup is 0.38636363636364
Square forward 92 -> 92:         average speedup is 0.22826086956522
MSECriterion2 4251 :     average speedup is 0.43875278396437
SpatialSubSampling.forward 5x43x98x350 o 2x4 -> 5x43x33x174:     average speedup is 1.7876310272537
SpatialSubSampling.backward 9x5x156x168 o 4x3 -> 9x5x39x56:      average speedup is 1.0902288412641
SpatialLPPooling.forward (P=2 only) 5x808x582 o 4x3 -> 5x202x194:        average speedup is 5.9070947809776
Sqrt forward 40 -> 40:   average speedup is 0.079497907949791
Threshold.backward 73 -> 73:     average speedup is 0.28099173553719
Square.backward 6 -> 6:          average speedup is 0.1984126984127
SpatialConvolution.forward 19x223x388 o 12x4 -> 18x212x193 [s: 1x2]:     average speedup is 0.61608696142845
MSECriterion 4251 :      average speedup is 0.3218954248366
SpatialMaxPooling.forward 27x279x261 o 3x3 -> 27x93x87:          average speedup is 3.8389252203583
SpatialConvolutionCUDA.forward 32x16x36x36 o 6x6 -> 32x32x16x16 [s: 2x2]:        average speedup is 31.217450039596
Max forward 50x96:       average speedup is 0.046189376443418
Sigmoid.backward 17 -> 17:       average speedup is 0.22826086956522
Threshold forward 50 -> 50:      average speedup is 0.36708860759494
SpatialMaxPoolingCUDA.forward 32x32x60x60 o 2x2 -> 32x32x30x30:          average speedup is 13.250921861282
SpatialMaxPooling.forward 6x47x108x916 o 2x4 -> 6x47x54x229:     average speedup is 4.5511924774112
SpatialSubSampling.forward 38x712x402 o 4x2 -> 38x237x101:       average speedup is 1.9903135610146
Abs forward 41 -> 41:    average speedup is 0.026315789473684
SpatialConvolution.backward 32x4x12x12 o 8x8 -> 32x32x5x5:       average speedup is 5.3264571629213
Abs.backward 51 -> 51:   average speedup is 0.32954545454545
LogSoftMax forward 68 -> 68:     average speedup is 0.018338727076591
Tanh.backward 91 -> 91:          average speedup is 0.27173913043478
Sigmoid forward 25 -> 25:        average speedup is 0.16190476190476
Tanh forward 70 -> 70:   average speedup is 0.050086355785838
SpatialConvolution.forward 7x25x20x21 o 8x4 -> 7x52x13x18 [s: 1x1]:      average speedup is 0.1608589350089
@soumith

Member

soumith commented Apr 4, 2014

awesome! thanks.

@ajtulloch

Contributor

ajtulloch commented Apr 7, 2014

Did you want any further testing, etc?

@soumith

Member

soumith commented Apr 14, 2014

@andresy @clementfarabet are we waiting on this?

clementfarabet added a commit that referenced this pull request Apr 14, 2014

Merge pull request #5 from ajtulloch/cleanup-kernel-logic-slightly
Minor refactoring of specialized kernel size logic for convolution/correlation kernels

@clementfarabet clementfarabet merged commit f88bbe6 into torch:master Apr 14, 2014

@clementfarabet

Member

clementfarabet commented Apr 14, 2014

Ok just merged it. I'm going to start using it now.

@ajtulloch

Contributor

ajtulloch commented Apr 14, 2014

Thanks!

@ajtulloch ajtulloch deleted the ajtulloch:cleanup-kernel-logic-slightly branch Apr 14, 2014
