<a href="https://colab.research.google.com/github/willismax/ML-in-Production-30-days-sharing/blob/main/notebook/21.%E6%A8%A1%E5%9E%8B%E5%84%AA%E5%8C%96_%E5%89%AA%E6%9E%9D_Pruning_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 剪枝 Pruning

- 此為鐵人賽系列文示範文件，參考[TensorFlow Lite官方範例](https://www.tensorflow.org/lite/performance/post_training_quantization)修改而成。
- TF Lite 評估函數參考[來源](https://www.tensorflow.org/lite/performance/post_training_integer_quant_16x8)。

- 剪枝 [Pruning](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras)將無關緊要的權重歸零刪除歸零，在壓縮時能明顯縮小尺寸。
- 經過剪枝且量化的模型可以縮小的原來1/10大小。
- Tensorflow 模型優化模組的`prune_low_magnitude()`，可以將Keras模型在訓練期間將影響較小的權重修剪歸零。
- 在本範例中，您將使用與示範[訓練後量化](https://colab.research.google.com/drive/1ukgVrMdtWjpReIygWHJ7-Lcw61Lv5kAO)相同的基準模型進行優化。

In [1]:
# 建立評估模型的dict
MODEL_SIZE = {}
ACCURACY = {}

In [2]:
!pip install -q -U tensorflow_model_optimization

[?25l[K     |█▍                              | 10 kB 21.7 MB/s eta 0:00:01[K     |██▊                             | 20 kB 20.6 MB/s eta 0:00:01[K     |████▏                           | 30 kB 12.6 MB/s eta 0:00:01[K     |█████▌                          | 40 kB 10.2 MB/s eta 0:00:01[K     |███████                         | 51 kB 4.5 MB/s eta 0:00:01[K     |████████▎                       | 61 kB 5.3 MB/s eta 0:00:01[K     |█████████▋                      | 71 kB 5.7 MB/s eta 0:00:01[K     |███████████                     | 81 kB 5.9 MB/s eta 0:00:01[K     |████████████▍                   | 92 kB 6.5 MB/s eta 0:00:01[K     |█████████████▉                  | 102 kB 5.2 MB/s eta 0:00:01[K     |███████████████▏                | 112 kB 5.2 MB/s eta 0:00:01[K     |████████████████▌               | 122 kB 5.2 MB/s eta 0:00:01[K     |██████████████████              | 133 kB 5.2 MB/s eta 0:00:01[K     |███████████████████▎            | 143 kB 5.2 MB/s eta 0:00:01[K 

In [3]:
import tensorflow as tf
import tensorflow_model_optimization as tfmot
import numpy as np
import os

## 建立基本模型

- 模型採用`tf.keras.datasets.mnist`，用CNN進行建模。

In [4]:
# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [5]:
def model_builder():

  keras = tf.keras

  model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax')
  ])

  return model

In [6]:
baseline_model = model_builder()
baseline_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
    )

baseline_model.summary()
baseline_model.save_weights('baseline_weights.h5')

baseline_model.fit(
    train_images, 
    train_labels, 
    epochs=1, 
    shuffle=False
    )

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 reshape (Reshape)           (None, 28, 28, 1)         0         
                                                                 
 conv2d (Conv2D)             (None, 26, 26, 12)        120       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 12)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 2028)              0         
                                                                 
 dense (Dense)               (None, 10)                20290     
                                                                 
Total params: 20,410
Trainable params: 20,410
Non-trainable params: 0
____________________________________________________

<keras.callbacks.History at 0x7f948c96bb10>

In [7]:
# 儲存未量化模型
baseline_model.save('non_pruned.h5', include_optimizer=False)

# 評估模型並紀錄準確率
_, ACCURACY['baseline Keras model'] = baseline_model.evaluate(test_images, test_labels)

# 紀錄模型大小
MODEL_SIZE['baseline h5'] = os.path.getsize('non_pruned.h5')



In [8]:
ACCURACY

{'baseline Keras model': 0.9544000029563904}

In [9]:
MODEL_SIZE

{'baseline h5': 99144}

## 使用剪枝調整模型

- 進行剪枝，另外因為剪枝模型方法有增加一層包裝層，摘要顯示的參數會增加。

In [10]:
# Get the pruning method
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Compute end step to finish pruning after 2 epochs.
batch_size = 128
epochs = 2
validation_split = 0.1 # 10% of training set will be used for validation set. 

num_images = train_images.shape[0] * (1 - validation_split)
end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

# Define pruning schedule.
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.50,
        final_sparsity=0.80,
        begin_step=0,
        end_step=end_step)
    }

# Pass in the trained baseline model
model_for_pruning = prune_low_magnitude(
    baseline_model, 
    **pruning_params
    )

# `prune_low_magnitude` requires a recompile.
model_for_pruning.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
    )

model_for_pruning.summary()

  trainable=False)


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 prune_low_magnitude_reshape  (None, 28, 28, 1)        1         
  (PruneLowMagnitude)                                            
                                                                 
 prune_low_magnitude_conv2d   (None, 26, 26, 12)       230       
 (PruneLowMagnitude)                                             
                                                                 
 prune_low_magnitude_max_poo  (None, 13, 13, 12)       1         
 ling2d (PruneLowMagnitude)                                      
                                                                 
 prune_low_magnitude_flatten  (None, 2028)             1         
  (PruneLowMagnitude)                                            
                                                                 
 prune_low_magnitude_dense (  (None, 10)               4

  aggregation=tf.VariableAggregation.MEAN)
  aggregation=tf.VariableAggregation.MEAN)


- 查看模型中某一層的權重。
  - 剪枝前，有些微弱的權重。
  - 剪枝後，其中許多將被清零。

In [11]:
# 剪枝前的模型權重
model_for_pruning.weights[1]

<tf.Variable 'conv2d/kernel:0' shape=(3, 3, 1, 12) dtype=float32, numpy=
array([[[[-1.38251856e-01,  5.87693863e-02,  6.17338836e-01,
           1.08750060e-01,  2.33612686e-01,  2.05966040e-01,
           3.54602486e-01, -6.88196719e-01,  2.98866808e-01,
          -1.55828863e-01, -1.81033790e-01,  2.16924008e-02]],

        [[ 2.13686824e-01,  2.14811400e-01, -7.14632332e-01,
          -1.17491134e-01,  3.92742068e-01,  2.02936888e-01,
          -1.47854397e-02, -3.02368641e-01,  2.53374457e-01,
          -6.78731978e-01,  2.45914683e-01,  5.62961176e-02]],

        [[ 2.69658089e-01,  8.45840871e-02, -1.61144391e-01,
           9.61077958e-02, -7.01730371e-01,  4.82204482e-02,
          -3.08770835e-01, -9.93704051e-02,  6.57210171e-01,
          -6.09114468e-01,  2.19620198e-01,  1.99704468e-01]]],


       [[[-6.87311813e-02,  7.84165338e-02,  2.05764458e-01,
           6.11244179e-02,  3.80745620e-01, -1.08126715e-01,
           1.98725969e-01,  1.38404980e-01,  1.14110354e-02,
 

- 重新訓練模型。並在 Callback 增加`tfmot.sparsity.keras.UpdatePruningStep()`參數。

In [12]:
# Callback to update pruning wrappers at each step
callbacks=[tfmot.sparsity.keras.UpdatePruningStep()]

# Train and prune the model
model_for_pruning.fit(
    train_images, 
    train_labels,
    epochs=epochs, 
    validation_split=validation_split,
    callbacks=callbacks
    )

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f94910e1290>

- 重新訓練後已修剪，觀察同一層的權重變化，許多不重要的權重已歸零。

In [13]:
# 剪枝後的模型權重
model_for_pruning.weights[1]

<tf.Variable 'conv2d/kernel:0' shape=(3, 3, 1, 12) dtype=float32, numpy=
array([[[[ 0.        ,  0.        ,  0.83686227, -0.        ,
          -0.        ,  0.        ,  0.84368485, -1.1801223 ,
          -0.        ,  0.        , -0.        ,  0.        ]],

        [[ 0.        ,  0.        , -1.202781  ,  0.        ,
           0.7022678 , -0.        , -0.        , -0.        ,
          -0.        , -0.9440731 ,  0.        ,  0.        ]],

        [[ 0.        ,  0.        , -0.        , -0.        ,
          -1.0991509 , -0.        , -0.        , -0.        ,
           1.1029274 , -1.0059565 ,  0.        ,  0.        ]]],


       [[[ 0.        ,  0.        ,  0.        , -0.        ,
           0.84900624,  0.        ,  0.        ,  0.        ,
          -0.        ,  0.        , -0.        ,  0.        ]],

        [[ 0.        ,  0.        , -1.5815098 , -0.        ,
           0.668169  , -0.        ,  0.9408827 ,  0.        ,
          -0.        ,  0.        ,  0.      

### 剪枝後移除包裝層


- 剪枝之後，您可以用[`tfmot.sparsity.keras.strip_pruning()`](https://www.tensorflow.org/model_optimization/api_docs/python/tfmot/sparsity/keras/strip_pruning)刪除包裝層以具有與基線模型相同的層和參數。
- 此方法也有助於保存模型並導出為`*.tflite`檔案格式。

In [14]:
# Remove pruning wrappers
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
model_for_export    .summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 reshape (Reshape)           (None, 28, 28, 1)         0         
                                                                 
 conv2d (Conv2D)             (None, 26, 26, 12)        120       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 12)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 2028)              0         
                                                                 
 dense (Dense)               (None, 10)                20290     
                                                                 
Total params: 20,410
Trainable params: 20,410
Non-trainable params: 0
____________________________________________________

- 因為包裝器已被移除，相同的模型權重，已移置索引[0]。

In [15]:
model_for_export.weights[0]

<tf.Variable 'conv2d/kernel:0' shape=(3, 3, 1, 12) dtype=float32, numpy=
array([[[[ 0.        ,  0.        ,  0.83686227, -0.        ,
          -0.        ,  0.        ,  0.84368485, -1.1801223 ,
          -0.        ,  0.        , -0.        ,  0.        ]],

        [[ 0.        ,  0.        , -1.202781  ,  0.        ,
           0.7022678 , -0.        , -0.        , -0.        ,
          -0.        , -0.9440731 ,  0.        ,  0.        ]],

        [[ 0.        ,  0.        , -0.        , -0.        ,
          -1.0991509 , -0.        , -0.        , -0.        ,
           1.1029274 , -1.0059565 ,  0.        ,  0.        ]]],


       [[[ 0.        ,  0.        ,  0.        , -0.        ,
           0.84900624,  0.        ,  0.        ,  0.        ,
          -0.        ,  0.        , -0.        ,  0.        ]],

        [[ 0.        ,  0.        , -1.5815098 , -0.        ,
           0.668169  , -0.        ,  0.9408827 ,  0.        ,
          -0.        ,  0.        ,  0.      

- 將剪枝後的檔案保存為`*.h5`，此時模型與修剪前大小相同。但一旦壓縮模型則改善
相當明顯。



In [16]:
# Save Keras model
model_for_export.save('pruned_model.h5', include_optimizer=False)

# Get uncompressed model size of baseline and pruned models
MODEL_SIZE['pruned non quantized h5'] = os.path.getsize('pruned_model.h5')



In [17]:
MODEL_SIZE

{'baseline h5': 99144, 'pruned non quantized h5': 99144}

## 模型壓縮3倍術

- 剪枝後的模型再壓縮。
- 壓縮後檔案大小約為原本1/3，這是因為剪枝後歸零的權重可以更有效的壓縮。

In [18]:
import tempfile
import zipfile

_, zipped_file = tempfile.mkstemp('.zip')
with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write('pruned_model.h5')


MODEL_SIZE['pruned non quantized h5'] = os.path.getsize('pruned_model.h5')

In [19]:
MODEL_SIZE

{'baseline h5': 99144, 'pruned non quantized h5': 99144}

## 模型壓縮10倍術


- 現在嘗試將已精剪枝後的模型再量化。
- 量化原本就會縮小約4倍，將剪枝模型壓縮後再量化，與基本模型相比，這使模型減少了約 10 倍。
- 小10倍精度還能維持水準。

In [20]:
# 剪枝壓縮後再量化模型
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

with open('pruned_quantized.tflite', 'wb') as f:
    f.write(tflite_model)

INFO:tensorflow:Assets written to: /tmp/tmp8lwhbr77/assets




In [21]:
MODEL_SIZE['pruned quantized tflite'] = os.path.getsize('pruned_quantized.tflite')
MODEL_SIZE


{'baseline h5': 99144,
 'pruned non quantized h5': 99144,
 'pruned quantized tflite': 24112}

- 即便小十倍，精度還維持原本水準。

In [22]:
# A helper function to evaluate the TF Lite model using "test" dataset.
# from: https://www.tensorflow.org/lite/performance/post_training_integer_quant_16x8#evaluate_the_models
def evaluate_model(filemane):
  #Load the model into the interpreters
  interpreter = tf.lite.Interpreter(model_path=str(filemane))
  interpreter.allocate_tensors()

  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  # Compare prediction results with ground truth labels to calculate accuracy.
  accurate_count = 0
  for index in range(len(prediction_digits)):
    if prediction_digits[index] == test_labels[index]:
      accurate_count += 1
  accuracy = accurate_count * 1.0 / len(prediction_digits)

  return accuracy

In [23]:
# Get accuracy of pruned Keras and TF Lite models

_, ACCURACY['pruned model h5'] = model_for_pruning.evaluate(test_images, test_labels)
ACCURACY['pruned and quantized tflite'] = evaluate_model('pruned_quantized.tflite')



## 成果

In [24]:
ACCURACY

{'baseline Keras model': 0.9544000029563904,
 'pruned and quantized tflite': 0.9663,
 'pruned model h5': 0.96670001745224}

In [25]:
MODEL_SIZE

{'baseline h5': 99144,
 'pruned non quantized h5': 99144,
 'pruned quantized tflite': 24112}

## 參考


- [TensorFlow Lite官方範例](https://www.tensorflow.org/lite/performance/post_training_quantization)。
- TF Lite 評估函數參考[來源](https://www.tensorflow.org/lite/performance/post_training_integer_quant_16x8)。