
Cannot Compress Model #7

Closed · kai-xie opened this issue Jun 2, 2017 · 2 comments


kai-xie commented Jun 2, 2017

DNS is a good method and thank you for sharing your code!

My question is:
The compilation and installation were successful and did not take much effort (only your code was used; the original Caffe code was not added, so I assume your code can be used as a standalone package).
But when I tried to compress LeNet-5 as suggested in the README, changing only the "ip1" layer's type to "CInnerProduct" and adding the "cinner_product_param" section, the result did not converge, and the output caffemodel is 3.2 MB, even larger than the original 1.7 MB.

So I was wondering if you have encountered this kind of problem before, and what I might be doing wrong.

The following is the prototxt file, identical to the one in the Caffe examples except for the "CInnerProduct" part:

name: "LeNet"
layer {
 name: "mnist"
 type: "Data"
 top: "data"
 top: "label"
 include {
  phase: TRAIN
 }
 transform_param {
  scale: 0.00390625
 }
 data_param {
  source: "examples/mnist/mnist_train_lmdb"
  batch_size: 64
  backend: LMDB
 }
}
layer {
 name: "mnist"
 type: "Data"
 top: "data"
 top: "label"
 include {
  phase: TEST
 }
 transform_param {
  scale: 0.00390625
 }
 data_param {
  source: "examples/mnist/mnist_test_lmdb"
  batch_size: 100
  backend: LMDB
 }
}
layer {
 name: "conv1"
 type: "Convolution"
 bottom: "data"
 top: "conv1"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 convolution_param {
  num_output: 20
  kernel_size: 5
  stride: 1
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
}
layer {
 name: "pool1"
 type: "Pooling"
 bottom: "conv1"
 top: "pool1"
 pooling_param {
  pool: MAX
  kernel_size: 2
  stride: 2
 }
}
layer {
 name: "conv2"
 type: "Convolution"
 bottom: "pool1"
 top: "conv2"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 convolution_param {
  num_output: 50
  kernel_size: 5
  stride: 1
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
}
layer {
 name: "pool2"
 type: "Pooling"
 bottom: "conv2"
 top: "pool2"
 pooling_param {
  pool: MAX
  kernel_size: 2
  stride: 2
 }
}
layer {
 name: "ip1"
 type: "CInnerProduct"
 bottom: "pool2"
 top: "ip1"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 inner_product_param {
  num_output: 500
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
 cinner_product_param {
  gamma: 0.0001
  power: 1
  c_rate: 4
  iter_stop: 14000
  weight_mask_filler {
   type: "constant"
   value: 1
  }
  bias_mask_filler {
   type: "constant"
   value: 1
  }
 }
}
layer {
 name: "relu1"
 type: "ReLU"
 bottom: "ip1"
 top: "ip1"
}
layer {
 name: "ip2"
 type: "InnerProduct"
 bottom: "ip1"
 top: "ip2"
 param {
  lr_mult: 1
 }
 param {
  lr_mult: 2
 }
 inner_product_param {
  num_output: 10
  weight_filler {
   type: "xavier"
  }
  bias_filler {
   type: "constant"
  }
 }
}
layer {
 name: "accuracy"
 type: "Accuracy"
 bottom: "ip2"
 bottom: "label"
 top: "accuracy"
 include {
  phase: TEST
 }
}
layer {
 name: "loss"
 type: "SoftmaxWithLoss"
 bottom: "ip2"
 bottom: "label"
 top: "loss"
}

The output from iteration 9000 to iteration 10000 is as follows (accuracy lingering around 0.1135):

I0602 04:05:34.931988 15322 solver.cpp:314] Iteration 9000, Testing net (#0)
I0602 04:05:35.897229 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:35.897274 15322 solver.cpp:363] Test net output #1: loss = 2.30104 (* 1 = 2.30104 loss)
I0602 04:05:35.906638 15322 solver.cpp:226] Iteration 9000, loss = 2.30204
I0602 04:05:35.906673 15322 solver.cpp:242] Train net output #0: loss = 2.30204 (* 1 = 2.30204 loss)
I0602 04:05:35.906682 15322 solver.cpp:521] Iteration 9000, lr = 0.00617924
I0602 04:05:37.375916 15322 solver.cpp:226] Iteration 9100, loss = 2.2923
I0602 04:05:37.376133 15322 solver.cpp:242] Train net output #0: loss = 2.2923 (* 1 = 2.2923 loss)
I0602 04:05:37.376145 15322 solver.cpp:521] Iteration 9100, lr = 0.00615496
I0602 04:05:38.845537 15322 solver.cpp:226] Iteration 9200, loss = 2.30995
I0602 04:05:38.845561 15322 solver.cpp:242] Train net output #0: loss = 2.30995 (* 1 = 2.30995 loss)
I0602 04:05:38.845568 15322 solver.cpp:521] Iteration 9200, lr = 0.0061309
I0602 04:05:40.314781 15322 solver.cpp:226] Iteration 9300, loss = 2.31165
I0602 04:05:40.314803 15322 solver.cpp:242] Train net output #0: loss = 2.31165 (* 1 = 2.31165 loss)
I0602 04:05:40.314811 15322 solver.cpp:521] Iteration 9300, lr = 0.00610706
I0602 04:05:41.782209 15322 solver.cpp:226] Iteration 9400, loss = 2.29439
I0602 04:05:41.782232 15322 solver.cpp:242] Train net output #0: loss = 2.29439 (* 1 = 2.29439 loss)
I0602 04:05:41.782239 15322 solver.cpp:521] Iteration 9400, lr = 0.00608343
I0602 04:05:43.237807 15322 solver.cpp:314] Iteration 9500, Testing net (#0)
I0602 04:05:44.201413 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:44.201436 15322 solver.cpp:363] Test net output #1: loss = 2.30121 (* 1 = 2.30121 loss)
I0602 04:05:44.210533 15322 solver.cpp:226] Iteration 9500, loss = 2.30612
I0602 04:05:44.210551 15322 solver.cpp:242] Train net output #0: loss = 2.30612 (* 1 = 2.30612 loss)
I0602 04:05:44.210559 15322 solver.cpp:521] Iteration 9500, lr = 0.00606002
I0602 04:05:45.679636 15322 solver.cpp:226] Iteration 9600, loss = 2.30252
I0602 04:05:45.679658 15322 solver.cpp:242] Train net output #0: loss = 2.30252 (* 1 = 2.30252 loss)
I0602 04:05:45.679666 15322 solver.cpp:521] Iteration 9600, lr = 0.00603682
I0602 04:05:47.147786 15322 solver.cpp:226] Iteration 9700, loss = 2.29213
I0602 04:05:47.147809 15322 solver.cpp:242] Train net output #0: loss = 2.29213 (* 1 = 2.29213 loss)
I0602 04:05:47.147817 15322 solver.cpp:521] Iteration 9700, lr = 0.00601382
I0602 04:05:48.616607 15322 solver.cpp:226] Iteration 9800, loss = 2.29719
I0602 04:05:48.616629 15322 solver.cpp:242] Train net output #0: loss = 2.29719 (* 1 = 2.29719 loss)
I0602 04:05:48.616637 15322 solver.cpp:521] Iteration 9800, lr = 0.00599102
I0602 04:05:50.084087 15322 solver.cpp:226] Iteration 9900, loss = 2.2912
I0602 04:05:50.084110 15322 solver.cpp:242] Train net output #0: loss = 2.2912 (* 1 = 2.2912 loss)
I0602 04:05:50.084120 15322 solver.cpp:521] Iteration 9900, lr = 0.00596843
I0602 04:05:51.538485 15322 solver.cpp:399] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I0602 04:05:51.553609 15322 solver.cpp:684] Snapshotting solver state to binary proto fileexamples/mnist/lenet_iter_10000.solverstate
I0602 04:05:51.606297 15322 solver.cpp:295] Iteration 10000, loss = 2.29934
I0602 04:05:51.606360 15322 solver.cpp:314] Iteration 10000, Testing net (#0)
I0602 04:05:52.568142 15322 solver.cpp:363] Test net output #0: accuracy = 0.1135
I0602 04:05:52.568188 15322 solver.cpp:363] Test net output #1: loss = 2.30109 (* 1 = 2.30109 loss)
I0602 04:05:52.568197 15322 solver.cpp:300] Optimization Done.
I0602 04:05:52.568205 15322 caffe.cpp:184] Optimization Done.

Thank you very much!

yiwenguo (Owner) commented Jun 2, 2017

Hi @kai-xie, what about using a smaller c_rate (e.g., 2 or 3) for the 'ip1' layer? In cases where the other layers are dense and only 'ip1' is to be compressed, 4 might be too large.
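[Editor's note] As a rough illustration of why a large c_rate can kill convergence: in DNS-style magnitude pruning, a weight survives roughly when |w| exceeds mean(|w|) + c_rate * std(|w|). The repo's actual criterion adds scaling factors and a hysteresis band, so the sketch below is a simplified model, not the layer's exact code; the initialization and layer sizes mirror the LeNet prototxt above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Glorot/xavier-style normal init for ip1 (fan_in = 50*4*4 = 800, fan_out = 500)
w = rng.normal(0.0, np.sqrt(2.0 / (800 + 500)), size=(500, 800))

def surviving_fraction(w, c_rate):
    # Simplified DNS-style criterion: a weight keeps its mask set to 1
    # when its magnitude exceeds mean(|w|) + c_rate * std(|w|).
    a = np.abs(w)
    threshold = a.mean() + c_rate * a.std()
    return float((a > threshold).mean())

for c in (2, 3, 4):
    print(f"c_rate={c}: ~{surviving_fraction(w, c):.2%} of weights survive")
```

Under this approximation, c_rate = 2 leaves a few percent of the weights active, while c_rate = 4 prunes all but a tiny fraction, which is consistent with the network getting stuck at chance accuracy (0.1135).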

Yes, the caffemodel obtained from this repo should be larger than the original dense model, because we store both the weight tensors W and the mask tensors T. To actually get memory/storage savings, you should post-process the obtained model to extract the sparse tensors W.*T and store them in a sparse tensor format.
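[Editor's note] A minimal sketch of that post-processing step. The small arrays stand in for the real blobs; in practice you would load them with pycaffe (e.g. the weight and mask blobs of the trained 'ip1' layer, whose exact blob indices depend on the CInnerProduct implementation).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins for the trained layer's weight tensor W
# and the binary mask tensor T learned during network surgery.
W = np.array([[0.5, -0.2, 0.0],
              [0.1,  0.0, 0.3]])
T = np.array([[1, 0, 0],
              [0, 0, 1]])

# Elementwise product keeps only the unmasked weights; CSR stores
# just the nonzeros plus index arrays, which is where the actual
# storage saving comes from.
W_sparse = csr_matrix(W * T)
print(W_sparse.nnz)   # number of surviving weights
```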

kai-xie (Author) commented Jun 2, 2017

I changed the c_rate to 2 and it worked.
As for getting the compressed model, I think I'd better read the source code.

Thank you very much for your help!
