# Networks Using Blocks (VGG)

:label:`sec_vgg`


While AlexNet proved that deep convolutional neural networks
can achieve good results, it did not offer a general template
to guide subsequent researchers in designing new networks.
In the following sections, we will introduce several heuristic concepts
commonly used to design deep networks.

Progress in this field mirrors that in chip design
where engineers went from placing transistors
to logical elements to logic blocks.
Similarly, the design of neural network architectures
has grown progressively more abstract,
with researchers moving from thinking in terms of
individual neurons to whole layers,
and now to blocks, repeating patterns of layers.

The idea of using blocks first emerged from the
[Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/) (VGG)
at Oxford University,
in their eponymously named VGG network.
It is easy to implement these repeated structures in code
with any modern deep learning framework by using loops and subroutines.


## VGG Blocks

The basic building block of classic convolutional networks
is a sequence of the following layers:
(i) a convolutional layer
(with padding to maintain the resolution),
(ii) a nonlinearity such as a ReLU, and (iii) a pooling layer such
as a max pooling layer.
One VGG block consists of a sequence of convolutional layers,
followed by a max pooling layer for spatial downsampling.
In the original VGG paper :cite:`Simonyan.Zisserman.2014`,
the authors
employed convolutions with $3\times3$ kernels
and $2 \times 2$ max pooling with a stride of $2$
(halving the resolution after each block).
In the code below, we define a function called `vggBlock`
to implement one VGG block.
The function takes two arguments
corresponding to the number of convolutional layers `numConvs`
and the number of output channels `numChannels`.

In [1]:
%use @file[../djl-pytorch.json]
%use lets-plot
@file:DependsOn("org.apache.commons:commons-lang3:3.12.0")
import ai.djl.metric.Metrics

fun getLong(nm: String, n: Long): Long {
    val name = System.getProperty(nm)
    return if (null == name) n.toLong() else name.toLong()
}

class Accumulator(n: Int) {
    val data = FloatArray(n) { 0f }


    /* Adds a set of numbers to the array */
    fun add(args: FloatArray) {
        for (i in args.indices) {
            data[i] += args[i]
        }
    }

    /* Resets the array */
    fun reset() {
        data.fill(0f)
    }

    /* Returns the data point at the given index */
    fun get(index: Int): Float {
        return data[index]
    }
}

class DataPoints(private val X: NDArray, private val y: NDArray) {
    fun getX(): NDArray = X

    fun getY(): NDArray = y
}

fun syntheticData(manager:NDManager , w: NDArray , b : Float, numExamples: Int) : DataPoints {
    val X = manager.randomNormal(Shape(numExamples.toLong(), w.size()))
    var y = X.matMul(w).add(b)
    // Add noise
    y = y.add(manager.randomNormal(0f, 0.01f, y.getShape(), DataType.FLOAT32))
    return DataPoints(X, y);
}

object Training {

    fun linreg(X: NDArray, w: NDArray, b: NDArray): NDArray {
        return X.dot(w).add(b);
    }

    fun squaredLoss(yHat: NDArray, y: NDArray): NDArray {
        return (yHat.sub(y.reshape(yHat.getShape())))
            .mul((yHat.sub(y.reshape(yHat.getShape()))))
            .div(2);
    }

    fun sgd(params: NDList, lr: Float, batchSize: Int) {
        val lrt = Tracker.fixed(lr)
        val opt = Optimizer.sgd().setLearningRateTracker(lrt).build()
        for (param in params) {
            // The optimizer updates param in place, using the gradient averaged over the batch.
            opt.update(param.toString(), param, param.getGradient().div(batchSize))
        }
    }

    /**
     * Performs the SGD update while attaching each gradient to a subManager. This is useful when
     * training for many epochs: once the subManager is closed, all NDArrays created by the
     * calculations in this function are released from memory. Doing so is good practice in
     * general, but the benefit is most noticeable with a lot of data over many epochs.
     */
    fun sgd(params: NDList, lr: Float, batchSize: Int, subManager: NDManager) {
        for (param in params) {
            // Update param in place.
            // param = param - param.gradient * lr / batchSize
            val gradient = param.getGradient()
            gradient.attach(subManager);
            param.subi(gradient.mul(lr).div(batchSize))
        }
    }

    fun accuracy(yHat: NDArray, y: NDArray): Float {
        // Check size of 1st dimension greater than 1
        // to see if we have multiple samples
        if (yHat.getShape().size(1) > 1) {
            // Argmax gets index of maximum args for given axis 1
            // Convert yHat to same dataType as y (int32)
            // Sum up number of true entries
            return yHat.argMax(1)
                .toType(DataType.INT32, false)
                .eq(y.toType(DataType.INT32, false))
                .sum()
                .toType(DataType.FLOAT32, false)
                .getFloat();
        }
        return yHat.toType(DataType.INT32, false)
            .eq(y.toType(DataType.INT32, false))
            .sum()
            .toType(DataType.FLOAT32, false)
            .getFloat();
    }

    fun trainingChapter6(
        trainIter: ArrayDataset,
        testIter: ArrayDataset,
        numEpochs: Int,
        trainer: Trainer,
        evaluatorMetrics: MutableMap<String, DoubleArray>
    ): Double {

        trainer.setMetrics(Metrics())

        EasyTrain.fit(trainer, numEpochs, trainIter, testIter)

        val metrics = trainer.getMetrics()

        // Note: the lambda body must not be wrapped in a second pair of braces.
        // `{ evaluator -> { ... } }` only creates an inner lambda that is never invoked,
        // which would leave evaluatorMetrics empty.
        trainer.getEvaluators()
            .forEach { evaluator ->
                evaluatorMetrics.put(
                    "train_epoch_" + evaluator.getName(),
                    metrics.getMetric("train_epoch_" + evaluator.getName()).stream()
                        .mapToDouble { x -> x.getValue() }
                        .toArray())
                evaluatorMetrics.put(
                    "validate_epoch_" + evaluator.getName(),
                    metrics
                        .getMetric("validate_epoch_" + evaluator.getName())
                        .stream()
                        .mapToDouble { x -> x.getValue() }
                        .toArray())
            }

        return metrics.mean("epoch")
    }

    /* Softmax-regression-scratch */
    fun evaluateAccuracy(net: UnaryOperator<NDArray>, dataIterator: Iterable<Batch>): Float {
        val metric = Accumulator(2) // numCorrectedExamples, numExamples
        for (batch in dataIterator) {
            val X = batch.getData().head()
            val y = batch.getLabels().head()
            metric.add(floatArrayOf(accuracy(net.apply(X), y), y.size().toFloat()))
            batch.close()
        }
        return metric.get(0) / metric.get(1)
    }
    /* End Softmax-regression-scratch */

    /* MLP */
    /* Evaluate the loss of a model on the given dataset */
    fun evaluateLoss(
        net: UnaryOperator<NDArray>,
        dataIterator: Iterable<Batch>,
        loss: BinaryOperator<NDArray>
    ): Float {
        val metric = Accumulator(2) // sumLoss, numExamples

        for (batch in dataIterator) {
            val X = batch.getData().head()
            val y = batch.getLabels().head()
            metric.add(
                floatArrayOf(loss.apply(net.apply(X), y).sum().getFloat(), y.size().toFloat()))
            batch.close()
        }
        return metric.get(0) / metric.get(1)
    }
    /* End MLP */
}

// %load ../utils/djl-imports
// %load ../utils/plot-utils
// %load ../utils/DataPoints.java
// %load ../utils/Training.java
// %load ../utils/Accumulator.java

In [2]:
import ai.djl.basicdataset.cv.classification.*;
import org.apache.commons.lang3.ArrayUtils;

In [3]:
fun vggBlock(numConvs: Int,numChannels: Int) : SequentialBlock{

    val tempBlock = SequentialBlock();
    for (i in 0 until numConvs) {
        // DJL has default stride of 1x1, so don't need to set it explicitly.
        tempBlock
                .add(Conv2d.builder()
                        .setFilters(numChannels)
                        .setKernelShape(Shape(3, 3))
                        .optPadding(Shape(1, 1))
                        .build()
                )
                .add(Activation::relu);
    }
    tempBlock.add(Pool.maxPool2dBlock(Shape(2, 2), Shape(2, 2)));
    return tempBlock;
}
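
The effect of these layer settings on resolution can be checked with the standard output-size formula for convolution and pooling windows, $\lfloor (n + 2p - k)/s \rfloor + 1$. The following is a minimal sketch in plain Kotlin (no DJL required); the helper names `outSize` and `vggBlockOutSize` are hypothetical and do not appear in the notebook's utilities.

```kotlin
// Output size along one spatial dimension for a convolution or pooling window.
// Standard formula: floor((n + 2 * padding - kernel) / stride) + 1
fun outSize(n: Int, kernel: Int, padding: Int, stride: Int): Int =
    (n + 2 * padding - kernel) / stride + 1

// Spatial size after one VGG block: numConvs 3x3 convolutions (padding 1, stride 1)
// followed by a 2x2 max pooling with stride 2.
fun vggBlockOutSize(n: Int, numConvs: Int): Int {
    var size = n
    repeat(numConvs) { size = outSize(size, kernel = 3, padding = 1, stride = 1) }
    return outSize(size, kernel = 2, padding = 0, stride = 2)
}

fun main() {
    println(outSize(224, kernel = 3, padding = 1, stride = 1)) // 224: resolution is preserved
    println(vggBlockOutSize(224, numConvs = 2))                // 112: the pooling layer halves it
}
```

Because the $3 \times 3$ convolutions with padding 1 leave the size unchanged, the number of convolutions in a block does not affect its output resolution; only the pooling layer does.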

## VGG Network

Like AlexNet and LeNet,
the VGG Network can be partitioned into two parts:
the first consisting mostly of convolutional and pooling layers
and a second consisting of fully-connected layers.
The convolutional portion of the net connects several `vggBlock` modules
in succession.
As illustrated in :numref:`fig_vgg`, the variable `convArch` consists of an array of integer pairs (one per block),
where each contains two values: the number of convolutional layers
and the number of output channels,
which are precisely the arguments required to call
the `vggBlock` function.
The fully-connected module is identical to that covered in AlexNet.

![Designing a network from building blocks](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/vgg.svg)

:width:`400px`


:label:`fig_vgg`


The original VGG network has 5 convolutional blocks,
among which the first two have one convolutional layer each
and the latter three have two convolutional layers each.
The first block has 64 output channels
and each subsequent block doubles the number of output channels,
until that number reaches $512$.
Since this network uses $8$ convolutional layers
and $3$ fully-connected layers, it is often called VGG-11.

In [4]:
val convArch = arrayOf(intArrayOf(1, 64), intArrayOf(1, 128), intArrayOf(2, 256), intArrayOf(2, 512), intArrayOf(2, 512))
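
As a quick sanity check on the name, the layer count can be recomputed from the architecture array. This is a small sketch in plain Kotlin; `vgg11Arch` is a hypothetical copy of `convArch`, and `weightLayers` is an illustrative helper that assumes three fully-connected layers at the end.

```kotlin
// Hypothetical copy of the (numConvs, numChannels) pairs defined in convArch.
val vgg11Arch = arrayOf(
    intArrayOf(1, 64), intArrayOf(1, 128), intArrayOf(2, 256),
    intArrayOf(2, 512), intArrayOf(2, 512)
)

// Number of layers with weights: all convolutional layers plus the fully-connected ones.
fun weightLayers(arch: Array<IntArray>, numDense: Int = 3): Int =
    arch.sumOf { it[0] } + numDense

fun main() {
    println(weightLayers(vgg11Arch)) // 8 convolutional + 3 fully-connected = 11
}
```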

The following code implements VGG-11. This is a simple matter of executing a for loop over `convArch`.

In [5]:
fun VGG(convArch: Array<IntArray>): SequentialBlock {

    val block = SequentialBlock()
    // The convolutional layer part: one VGG block per (numConvs, numChannels) pair
    for (arch in convArch) {
        block.add(vggBlock(arch[0], arch[1]))
    }

    // The fully connected layer part
    block
        .add(Blocks.batchFlattenBlock())
        .add(Linear
                .builder()
                .setUnits(4096)
                .build())
        .add(Activation::relu)
        .add(Dropout
                .builder()
                .optRate(0.5f)
                .build())
        .add(Linear
                .builder()
                .setUnits(4096)
                .build())
        .add(Activation::relu)
        .add(Dropout
                .builder()
                .optRate(0.5f)
                .build())
        .add(Linear.builder().setUnits(10).build());
    
    return block;
}

val block = VGG(convArch);

Next, we will construct a single-channel data example
with a height and width of 224 to observe the output shape of each layer.

In [6]:
val lr = 0.05f;
val model = Model.newInstance("vgg-display");
model.setBlock(block);

val loss = Loss.softmaxCrossEntropyLoss();

val lrt = Tracker.fixed(lr);
val sgd = Optimizer.sgd().setLearningRateTracker(lrt).build();

val config = DefaultTrainingConfig(loss).optOptimizer(sgd) // Loss function and optimizer
                .optDevices(Engine.getInstance().getDevices(1)) // At most one GPU (CPU if none is available)
                .addEvaluator(Accuracy()) // Model Accuracy
                .addTrainingListeners(*TrainingListener.Defaults.logging()); // Logging

val trainer = model.newTrainer(config);

val inputShape = Shape(1, 1, 224, 224);

NDManager.newBaseManager().use { manager ->
    val X = manager.randomUniform(0f, 1.0f, inputShape)
    trainer.initialize(inputShape)

    var currentShape = X.getShape()

    // Propagate the shape through each top-level child block and print the result.
    for (i in 0 until block.getChildren().size()) {
        val newShape = block.getChildren().get(i).getValue().getOutputShapes(arrayOf<Shape>(currentShape))
        currentShape = newShape[0]
        println(block.getChildren().get(i).getKey() + " layer output : " + currentShape)
    }
}
// save memory on VGG params
model.close()

01SequentialBlock layer output : (1, 64, 112, 112)
02SequentialBlock layer output : (1, 128, 56, 56)
03SequentialBlock layer output : (1, 256, 28, 28)
04SequentialBlock layer output : (1, 512, 14, 14)
05SequentialBlock layer output : (1, 512, 7, 7)
06LambdaBlock layer output : (1, 25088)
07Linear layer output : (1, 4096)
08LambdaBlock layer output : (1, 4096)
09Dropout layer output : (1, 4096)
10Linear layer output : (1, 4096)
11LambdaBlock layer output : (1, 4096)
12Dropout layer output : (1, 4096)
13Linear layer output : (1, 10)


As you can see, we halve height and width at each block,
finally reaching a height and width of 7
before flattening the representations
for processing by the fully-connected layer.
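
The sizes printed above can be reproduced by hand. The sketch below, in plain Kotlin, assumes that every block exactly halves the resolution; `archVgg11`, `sizesAfterBlocks`, and `flattenedFeatures` are hypothetical names introduced only for this illustration.

```kotlin
// Hypothetical copy of the (numConvs, numChannels) pairs used for VGG-11 in the text.
val archVgg11 = arrayOf(
    intArrayOf(1, 64), intArrayOf(1, 128), intArrayOf(2, 256),
    intArrayOf(2, 512), intArrayOf(2, 512)
)

// Height (= width) after each block, assuming every block halves the resolution.
fun sizesAfterBlocks(inputSize: Int, numBlocks: Int): List<Int> {
    val sizes = mutableListOf<Int>()
    var size = inputSize
    repeat(numBlocks) {
        size /= 2
        sizes.add(size)
    }
    return sizes
}

// Number of features seen by the first fully-connected layer after flattening:
// channels of the last block times final height times final width.
fun flattenedFeatures(arch: Array<IntArray>, inputSize: Int): Int {
    val finalSize = sizesAfterBlocks(inputSize, arch.size).last()
    return arch.last()[1] * finalSize * finalSize
}

fun main() {
    println(sizesAfterBlocks(224, archVgg11.size)) // [112, 56, 28, 14, 7]
    println(flattenedFeatures(archVgg11, 224))     // 25088
}
```

The value $512 \times 7 \times 7 = 25088$ matches the shape reported for the flattening layer in the output above.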

## Model Training

Since VGG-11 is more computationally heavy than AlexNet,
we construct a network with a smaller number of channels.
This is more than sufficient for training on Fashion-MNIST.

In [7]:
val ratio = 4;

for(i in 0 until convArch.size){
    convArch[i][1] = convArch[i][1] / ratio;
}

val inputShape = Shape(1, 1, 96, 96); // resize the input shape to save memory

val model = Model.newInstance("vgg-tiny");
val newBlock = VGG(convArch);
model.setBlock(newBlock);
val loss = Loss.softmaxCrossEntropyLoss();

val lrt = Tracker.fixed(lr);
val sgd = Optimizer.sgd().setLearningRateTracker(lrt).build();

val config = DefaultTrainingConfig(loss).optOptimizer(sgd) // Loss function and optimizer
                .optDevices(Engine.getInstance().getDevices(1)) // At most one GPU (CPU if none is available)
                .addEvaluator(Accuracy()) // Model Accuracy
                .addTrainingListeners(*TrainingListener.Defaults.logging()); // Logging

val trainer = model.newTrainer(config);
trainer.initialize(inputShape);
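
To get a feel for how much smaller the reduced network is, one can estimate the number of learnable parameters. The sketch below is plain Kotlin and rests on several assumptions that are not stated in the text: every convolutional and fully-connected layer has a bias term, the input has a single channel, there are 10 output classes, and each block halves the resolution. The names `convParams`, `denseParams`, `vggParams`, `fullArch`, and `tinyArch` are hypothetical.

```kotlin
// Learnable parameters (weights + biases) of a k x k convolution.
fun convParams(cin: Long, cout: Long, k: Long = 3): Long = k * k * cin * cout + cout

// Learnable parameters (weights + biases) of a fully-connected layer.
fun denseParams(nin: Long, nout: Long): Long = nin * nout + nout

// Estimated parameter count of a VGG-style network built from (numConvs, numChannels) pairs.
fun vggParams(arch: Array<IntArray>, inputSize: Int, inChannels: Int = 1, numClasses: Int = 10): Long {
    var total = 0L
    var channels = inChannels.toLong()
    var size = inputSize
    for (block in arch) {
        val numChannels = block[1].toLong()
        repeat(block[0]) {
            total += convParams(channels, numChannels)
            channels = numChannels
        }
        size /= 2 // 2x2 max pooling with stride 2 (no parameters)
    }
    val flat = channels * size * size
    total += denseParams(flat, 4096) + denseParams(4096, 4096) + denseParams(4096, numClasses.toLong())
    return total
}

val fullArch = arrayOf(
    intArrayOf(1, 64), intArrayOf(1, 128), intArrayOf(2, 256),
    intArrayOf(2, 512), intArrayOf(2, 512)
)
// Same blocks with every channel count divided by the ratio 4.
val tinyArch = fullArch.map { intArrayOf(it[0], it[1] / 4) }.toTypedArray()

fun main() {
    println(vggParams(fullArch, inputSize = 224))
    println(vggParams(tinyArch, inputSize = 96))
}
```

Under these assumptions the estimate drops from roughly 129 million to roughly 22 million parameters, and in both cases the count is dominated by the fully-connected layers.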

In [8]:
val batchSize = 128;
val numEpochs = Integer.getInteger("MAX_EPOCH", 10);

val epochCount = IntArray(numEpochs) { it + 1 }

val trainIter = FashionMnist.builder()
        .addTransform(Resize(96))
        .addTransform(ToTensor())
        .optUsage(Dataset.Usage.TRAIN)
        .setSampling(batchSize, true)
        .optLimit(getLong("DATASET_LIMIT", Long.MAX_VALUE))
        .build();

val testIter = FashionMnist.builder()
        .addTransform(Resize(96))
        .addTransform(ToTensor())
        .optUsage(Dataset.Usage.TEST)
        .setSampling(batchSize, true)
        .optLimit(getLong("DATASET_LIMIT", Long.MAX_VALUE))
        .build();

trainIter.prepare();
testIter.prepare();


Apart from using a slightly larger learning rate,
the model training process is similar to that of AlexNet in the last section.

In [9]:
val evaluatorMetrics = mutableMapOf<String, DoubleArray>()
val avgTrainTimePerEpoch = Training.trainingChapter6(trainIter, testIter, numEpochs, trainer, evaluatorMetrics);

Training:    100% |████████████████████████████████████████| Accuracy: 0.72, SoftmaxCrossEntropyLoss: 0.77
Validating:  100% |████████████████████████████████████████|
Training:    100% |████████████████████████████████████████| Accuracy: 0.86, SoftmaxCrossEntropyLoss: 0.38
Validating:

In [15]:
val trainLoss = evaluatorMetrics.get("train_epoch_SoftmaxCrossEntropyLoss");
val trainAccuracy = evaluatorMetrics.get("train_epoch_Accuracy");
val testAccuracy = evaluatorMetrics.get("validate_epoch_Accuracy");

print("loss %.3f,".format(trainLoss!![numEpochs - 1]))
print(" train acc %.3f,".format(trainAccuracy!![numEpochs - 1]))
print(" test acc %.3f\n".format(testAccuracy!![numEpochs - 1]))
print("%.1f examples/sec".format(trainIter.size() / (avgTrainTimePerEpoch / Math.pow(10.0, 9.0))))
println();


![VGG training loss and accuracy.](https://d2l-java-resources.s3.amazonaws.com/img/chapter_convolution-modern-cnn-VGG.png)

In [None]:
val trainLossLabel =  Array<String>(trainLoss!!.size) { "train loss" }
val trainAccLabel = Array<String>(trainLoss!!.size) { "train acc" }
val testAccLabel = Array<String>(trainLoss!!.size) { "test acc" }
val data = mapOf<String, Any>(
      "label" to trainLossLabel + trainAccLabel + testAccLabel,
      "epoch" to epochCount + epochCount + epochCount,
      "metrics" to trainLoss!! + trainAccuracy!! + testAccuracy!!
)

var plot = letsPlot(data)
plot += geomLine { x = "epoch" ; y = "metrics" ; color = "label"}
plot + ggsize(700, 500)

## Summary

* VGG-11 constructs a network using reusable convolutional blocks. Different VGG models can be defined by the differences in the number of convolutional layers and output channels in each block.
* The use of blocks leads to very compact representations of the network definition. It allows for efficient design of complex networks.
* In their work, Simonyan and Zisserman experimented with various architectures. In particular, they found that several layers of deep and narrow convolutions (i.e., $3 \times 3$) were more effective than fewer layers of wider convolutions.
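
One commonly cited way to compare stacked narrow convolutions with a single wide one is to look at receptive field and weight count. The sketch below, in plain Kotlin, uses the standard relation that $n$ stacked $k \times k$ stride-1 convolutions cover a window of $n(k-1)+1$ pixels; it ignores biases and assumes the same number of input and output channels in every layer. The helper names `receptiveField` and `stackWeights` are hypothetical, and the comparison illustrates the trade-off rather than reproducing the paper's experiments.

```kotlin
// Receptive field (along one dimension) of numLayers stacked k x k convolutions with stride 1.
fun receptiveField(numLayers: Int, kernel: Int): Int = numLayers * (kernel - 1) + 1

// Weights (biases ignored) of numLayers stacked k x k convolutions,
// each mapping `channels` input channels to `channels` output channels.
fun stackWeights(numLayers: Int, kernel: Int, channels: Long): Long =
    numLayers * kernel.toLong() * kernel * channels * channels

fun main() {
    val c = 64L
    // Two 3x3 layers see the same 5x5 window as one 5x5 layer, with fewer weights.
    println("${receptiveField(2, 3)} vs ${receptiveField(1, 5)}")
    println("${stackWeights(2, 3, c)} vs ${stackWeights(1, 5, c)}")
    // Three 3x3 layers see the same 7x7 window as one 7x7 layer.
    println("${receptiveField(3, 3)} vs ${receptiveField(1, 7)}")
    println("${stackWeights(3, 3, c)} vs ${stackWeights(1, 7, c)}")
}
```

The stacked version also applies a nonlinearity after every layer, so it is deeper as well as cheaper for the same receptive field.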

## Exercises

1. When printing out the dimensions of the layers we only saw 8 results rather than 11. Where did the information for the remaining 3 layers go?
1. Compared with AlexNet, VGG is much slower in terms of computation, and it also needs more GPU memory. Try to analyze the reasons for this.
1. Try to change the height and width of the images in Fashion-MNIST from 96 to 224. What influence does this have on the experiments?
1. Refer to Table 1 in :cite:`Simonyan.Zisserman.2014` to construct other common models, such as VGG-16 or VGG-19.