# Fine-tune pre-trained networks
We will fine-tune a pre-trained network (CaffeNet) on the Caltech-101 dataset. 

### 1. Set up configuration files
* Copy prototxt files from CaffeNet

In [None]:
%%bash
cd ~/apps/caffe/models
mkdir finetune_cal101
cp bvlc_reference_caffenet/*.prototxt finetune_cal101/

* Edit solver.prototxt for Caltech-101

* Next, edit train_val.prototxt, which describes the network configuration during training. Since we are going to fine-tune on raw images (unlike in the CaffeNet example), many settings have to change. We show some important points below.
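
The two edits above can be sketched in Python as plain text rewrites of the copied files. This is a minimal sketch under stated assumptions, not necessarily the exact changes used in this exercise: `set_field` and `retarget_classifier` are hypothetical helpers, the new layer name `fc8_cal101` and the learning rate `0.001` are illustrative choices, and the class count of 101 should be replaced by the number of entries in your labels file. The snapshot prefix is inferred from the snapshot file name used later in this notebook. The prototxt excerpts are abbreviated stand-ins for the real files.

```python
import re

def set_field(prototxt, key, value):
    """Replace the first top-level `key: ...` line, or append it if absent."""
    pattern = re.compile(r'^%s\s*:.*$' % re.escape(key), re.MULTILINE)
    line = '%s: %s' % (key, value)
    if pattern.search(prototxt):
        return pattern.sub(lambda m: line, prototxt, count=1)
    return prototxt.rstrip('\n') + '\n' + line + '\n'

def retarget_classifier(prototxt, old_name, new_name, num_classes):
    """Rename the final layer everywhere and set its output size.

    Caffe copies pre-trained weights by layer name, so a renamed layer is
    initialized from scratch while all other layers reuse the old weights.
    """
    renamed = prototxt.replace('"%s"' % old_name, '"%s"' % new_name)
    # First num_output that follows the renamed layer's `name:` entry.
    pattern = re.compile(
        r'(name:\s*"%s".*?num_output:\s*)\d+' % re.escape(new_name), re.DOTALL)
    return pattern.sub(lambda m: m.group(1) + str(num_classes), renamed, count=1)

# Abbreviated solver excerpt (illustrative, not the full file).
solver = '''net: "models/bvlc_reference_caffenet/train_val.prototxt"
base_lr: 0.01
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
'''
solver = set_field(solver, 'net', '"models/finetune_cal101/train_val.prototxt"')
solver = set_field(solver, 'base_lr', '0.001')  # assumed value, tune as needed
solver = set_field(solver, 'snapshot_prefix',
                   '"models/finetune_cal101/caltech101_train"')

# Abbreviated final-layer excerpt from train_val (illustrative).
layer = '''layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
}
'''
layer = retarget_classifier(layer, 'fc8', 'fc8_cal101', 101)  # 101 is assumed
```

In practice you would read each file, apply the helper, and write the result back. Editing the files by hand works just as well; the point is that the final layer needs a new name and an output size that matches the new dataset.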

### 2. Fine-tune the network

Now, everything is ready! Let's start fine-tuning.

In [None]:
%%bash
export LD_LIBRARY_PATH="/home/ubuntu/anaconda2/lib:$LD_LIBRARY_PATH"
cd ~/apps/caffe
# running ./build/tools/caffe with no arguments prints the usage
./build/tools/caffe train -solver models/finetune_cal101/solver.prototxt \
-weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0

# -solver  : path to the solver definition (training parameters)
# -weights : pre-trained model to initialize from (omit it to train from scratch)

After training for some epochs, we obtained 84% validation accuracy.

We observed that validation accuracy improves to 89.7% after 50,000 iterations, although this is difficult to confirm on the iLect server due to resource limitations.

### 3. Using the fine-tuned network

Of course, we can use our fine-tuned network in the same manner as in the previous exercise.
To do this, we need to prepare "deploy.prototxt", which specifies the configuration of the final, fixed network.

Now, let's call our fine-tuned network using the Python interface. Here, we use the snapshot saved after 1000 iterations.

In [None]:
# set up the Python environment: numpy for numerical routines
import sys, os
import numpy as np

# load caffe
caffe_root = '/home/ubuntu/apps/caffe/'  # absolute path to the Caffe checkout (change this line if yours differs)
sys.path.insert(0, caffe_root + 'python')
import caffe # If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.

# load Caltech-101 labels
labels_file = '/home/ubuntu/dataset/Caltech101/labels.txt'
labels = np.loadtxt(labels_file, str, delimiter='\t')

caffe.set_mode_gpu()

model_def = caffe_root + 'models/finetune_cal101/deploy.prototxt'
model_weights = caffe_root + 'models/finetune_cal101/caltech101_train_iter_1000.caffemodel'

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

(K,H,W) = net.blobs['data'].shape[1:] # input size

# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy') # average image over the ImageNet dataset
mu = caffe.io.resize_image(mu.transpose(1,2,0),(H,W)) # resize the average image
mu = mu.transpose(2,0,1)
#mu = mu.mean(1).mean(1)  # we may use average over pixels instead

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the resized ImageNet mean image
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)

# copy the image data into the memory allocated for the net
net.blobs['data'].data[...] = transformed_image  # broadcast the same image to every slot in the batch

# perform classification
output = net.forward()
output_prob = output['prob'][0]  # the output probability vector for the first image in the batch

print 'predicted class is:', output_prob.argmax()
print 'output label:', labels[output_prob.argmax()]
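
A single `argmax` hides how confident the network is about the alternatives. The sketch below ranks the most probable classes instead. `top_k` is a hypothetical helper written in plain Python so that it accepts either a list or a numpy vector such as `output_prob`; the demo probabilities and labels are stand-in values, not results from this network.

```python
def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], float(probs[i])) for i in order[:k]]

# Stand-in values for illustration; in the notebook, pass output_prob and labels.
demo_probs = [0.05, 0.70, 0.25]
demo_labels = ['airplanes', 'cougar_face', 'wild_cat']
ranked = top_k(demo_probs, demo_labels, k=2)
```

In the notebook this would be called as `top_k(output_prob, labels)`, assuming `labels` has one entry per output class in the same order as the network's outputs.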

### 4. Exercise
* Set up the prototxt files described above.
* Change some parameters and see what happens (e.g., learning rate, batch size, solver type).
* Compare with the fixed-feature approach and with training from scratch. (Training from scratch can be done by simply removing the -weights option.)
* Fine-tune another pre-trained network (e.g., VGG-16).
* Use your own dataset!