## Image Classification with Pre-trained ImageNet CNN

This example is adapted from Caffe's IPython Notebook example. We use a pre-trained CNN model converted from Caffe's Model Zoo. Specifically, we use bvlc_reference_caffenet, the same model used in Caffe's own IPython Notebook example.

This notebook is located in *examples/ijulia/ilsvrc12* under Mocha's source directory. To run this example yourself, you need to:

* Install IJulia.jl, a Julia backend for IPython. You also need Python and IPython installed.
* Install Images.jl, which we use to read image files in this example.
* (Optional) Install Gadfly.jl, which is used to plot the class probability predictions.
* Download the pre-trained CNN model. The shell script get-model.sh downloads the model in HDF5 format, converted from Caffe's original binary protocol buffer format.

Once everything is in place, start the notebook by running the following command in this demo's source directory:

_jupyter notebook_

### Constructing the Convolutional Network
Mocha's native extension gives faster convolution, but the run shown here uses the default configuration, in which both the native extension and CUDA are disabled. To enable either of them, please refer to Mocha's documentation for details.

In [1]:
using Mocha
backend = Mocha.CPUBackend()  # plain CPU backend, no CUDA required
Mocha.init(backend)           # initialize the backend before building the net

Configuring Mocha...
 * CUDA       disabled by default
 * Native Ext disabled by default
Mocha configured, continue loading module...
DefaultBackend = Mocha.CPUBackend


Next we define the network structure, adapted directly from Caffe's bvlc_reference_caffenet model definition. Please refer to Mocha's CIFAR-10 tutorial for how to translate a Caffe model definition to Mocha. The model takes 3-channel color images of size 256-by-256 and crops them to a 227-by-227 region.
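
To make the cropping step concrete, here is a minimal sketch in plain Julia, independent of Mocha. It assumes the crop window is centered, which is our reading of the default behavior rather than something stated in this notebook. `center_crop` is a hypothetical helper, not a Mocha function.

```julia
# Hypothetical helper (not part of Mocha): cut a centered window out of a
# width x height x channels x batch array.
function center_crop(data::AbstractArray{T,4}, crop_size::Tuple{Int,Int}) where T
    w, h = size(data, 1), size(data, 2)
    cw, ch = crop_size
    # Offsets of the crop window from the left and top borders (assumed centered).
    w_off = div(w - cw, 2)
    h_off = div(h - ch, 2)
    return data[w_off+1:w_off+cw, h_off+1:h_off+ch, :, :]
end

data = zeros(256, 256, 3, 1)            # same shape as the data blob in this example
cropped = center_crop(data, (227, 227))
println(size(cropped))                  # (227, 227, 3, 1)
```

The resulting shape matches the 227 x 227 x 3 x 1 blob that the crop layer reports when the network is printed.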

In [2]:
img_width, img_height, img_channels = (256, 256, 3)
crop_size = (227, 227)
batch_size = 1  # could be larger if you want to classify a bunch of images at a time

layers = [
  MemoryDataLayer(name="data", tops=[:data], batch_size=batch_size,
      transformers=[(:data, DataTransformers.Scale(scale=255)),
                    (:data, DataTransformers.SubMean(mean_file="model/ilsvrc12_mean.hdf5"))],
      data = Array[zeros(img_width, img_height, img_channels, batch_size)])
  CropLayer(name="crop", tops=[:cropped], bottoms=[:data], crop_size=crop_size)
  ConvolutionLayer(name="conv1", tops=[:conv1], bottoms=[:cropped],
      kernel=(11,11), stride=(4,4), n_filter=96, neuron=Neurons.ReLU())
  PoolingLayer(name="pool1", tops=[:pool1], bottoms=[:conv1],
      kernel=(3,3), stride=(2,2), pooling=Pooling.Max())
  LRNLayer(name="norm1", tops=[:norm1], bottoms=[:pool1],
      kernel=5, scale=0.0001, power=0.75)
  ConvolutionLayer(name="conv2", tops=[:conv2], bottoms=[:norm1],
      kernel=(5,5), pad=(2,2), n_filter=256, n_group=2, neuron=Neurons.ReLU())
  PoolingLayer(name="pool2", tops=[:pool2], bottoms=[:conv2],
      kernel=(3,3), stride=(2,2), pooling=Pooling.Max())
  LRNLayer(name="norm2", tops=[:norm2], bottoms=[:pool2],
      kernel=5, scale=0.0001, power=0.75)
  ConvolutionLayer(name="conv3", tops=[:conv3], bottoms=[:norm2],
      kernel=(3,3), pad=(1,1), n_filter=384, neuron=Neurons.ReLU())
  ConvolutionLayer(name="conv4", tops=[:conv4], bottoms=[:conv3],
      kernel=(3,3), pad=(1,1), n_filter=384, n_group=2, neuron=Neurons.ReLU())
  ConvolutionLayer(name="conv5", tops=[:conv5], bottoms=[:conv4],
      kernel=(3,3), pad=(1,1), n_filter=256, n_group=2, neuron=Neurons.ReLU())
  PoolingLayer(name="pool5", tops=[:pool5], bottoms=[:conv5],
      kernel=(3,3), stride=(2,2), pooling=Pooling.Max())
  InnerProductLayer(name="fc6", tops=[:fc6], bottoms=[:pool5],
      output_dim=4096, neuron=Neurons.ReLU())
  InnerProductLayer(name="fc7", tops=[:fc7], bottoms=[:fc6],
      output_dim=4096, neuron=Neurons.ReLU())
  InnerProductLayer(name="fc8", tops=[:fc8], bottoms=[:fc7],
      output_dim=1000)
  SoftmaxLayer(name="prob", tops=[:prob], bottoms=[:fc8])
]

net = Net("imagenet", backend, layers)
println(net)

[2018-08-16T16:56:56 | info | Mocha]: Constructing net imagenet on Mocha.CPUBackend...
[2018-08-16T16:56:57 | info | Mocha]: Topological sorting 16 layers...
[2018-08-16T16:56:57 | info | Mocha]: Setup layers...
[2018-08-16T16:57:03 | info | Mocha]: Network constructed!
************************************************************
          NAME: imagenet
       BACKEND: Mocha.CPUBackend
  ARCHITECTURE: 16 layers
............................................................
 *** Mocha.MemoryDataLayer(data)
    Outputs ---------------------------
          data: Blob(256 x 256 x 3 x 1)
............................................................
 *** Mocha.CropLayer(crop)
    Inputs ----------------------------
          data: Blob(256 x 256 x 3 x 1)
    Outputs ---------------------------
       cropped: Blob(227 x 227 x 3 x 1)
............................................................
 *** Mocha.ConvolutionLayer(conv1)
    Inputs ----------------------------
       cropped: Blob(227 x

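The blob shapes in the summary above can be checked by hand. The sketch below is plain Julia, not Mocha code. It assumes the usual sliding-window formula floor((in + 2*pad - kernel) / stride) + 1 for both convolution and pooling, and it assumes that an unspecified stride is 1 and an unspecified pad is 0. `out_size`, `specs`, and `spatial_sizes` are hypothetical names. The kernel, stride, and pad values are copied from the layer definitions in cell 2, and the LRN layers are left out because they do not change the spatial size.

```julia
# Hypothetical helper: output extent along one spatial dimension.
out_size(in_size, kernel; stride=1, pad=0) = div(in_size + 2pad - kernel, stride) + 1

# (name, kernel, stride, pad) for every layer that changes or keeps the spatial size.
specs = [
    ("conv1", 11, 4, 0),
    ("pool1",  3, 2, 0),
    ("conv2",  5, 1, 2),
    ("pool2",  3, 2, 0),
    ("conv3",  3, 1, 1),
    ("conv4",  3, 1, 1),
    ("conv5",  3, 1, 1),
    ("pool5",  3, 2, 0),
]

# Walk through the layers in order and record the spatial size after each one.
function spatial_sizes(input_size, specs)
    sizes = Dict{String,Int}()
    s = input_size
    for (name, kernel, stride, pad) in specs
        s = out_size(s, kernel; stride=stride, pad=pad)
        sizes[name] = s
    end
    return sizes
end

sizes = spatial_sizes(227, specs)            # start from the 227 x 227 crop
fc6_inputs = sizes["pool5"]^2 * 256          # conv5 has 256 filters
for (name, _, _, _) in specs
    println(name, ": ", sizes[name], " x ", sizes[name])
end
println("fc6 inputs per image: ", fc6_inputs)
```

Some frameworks round pooling output sizes up rather than down. For these parameters every division is exact, so both conventions give the same result. Under these assumptions fc6 sees 6 * 6 * 256 = 9216 inputs per image.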
In [3]:
open("net.dot", "w") do out
  net2dot(out, net)  # write the network structure in Graphviz DOT format
end
run(pipeline(`dot -Tpng net.dot`, "net.png"))  # render to PNG with Graphviz's dot tool

In [6]:
?Images.load

  * `load(filename)` loads the contents of a formatted file, trying to infer the format from `filename` and/or magic bytes in the file.
  * `load(strm)` loads from an `IOStream` or similar object. In this case, there is no filename extension, so we rely on the magic bytes for format identification.

  * `load(File(format"PNG", filename))` specifies the format directly, and bypasses inference.
  * `load(Stream(format"PNG", io))` specifies the format directly, and bypasses inference.
  * `load(f; options...)` passes keyword arguments on to the loader.
