# Networks with Parallel Concatenations (GoogLeNet)

:label:`sec_googlenet`


In 2014, *GoogLeNet* won the ImageNet Challenge :cite:`Szegedy.Liu.Jia.ea.2015`,
proposing a structure
that combined the strengths of the NiN and repeated blocks paradigms.
One focus of the paper was to address the question
of which size of convolution kernel is best.
After all, previous popular networks employed choices
as small as $1 \times 1$ and as large as $11 \times 11$.
One insight in this paper was that sometimes
it can be advantageous to employ a combination of variously sized kernels.
In this section, we will introduce GoogLeNet,
presenting a slightly simplified version of the original model---we
omit a few ad hoc features that were added to stabilize training
but are unnecessary now with better training algorithms available.

## Inception Blocks

The basic convolutional block in GoogLeNet is called an Inception block,
likely named after a quote from the movie *Inception* ("We need to go deeper"),
which launched a viral meme.

![Structure of the Inception block. ](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/inception.svg)

As depicted in the figure above,
the Inception block consists of four parallel paths.
The first three paths use convolutional layers
with window sizes of $1\times 1$, $3\times 3$, and $5\times 5$
to extract information at different spatial scales.
The middle two paths first perform a $1\times 1$ convolution on the input
to reduce the number of input channels, reducing the model's complexity.
The fourth path uses a $3\times 3$ maximum pooling layer,
followed by a $1\times 1$ convolutional layer
to change the number of channels.
The four paths all use appropriate padding so that the input and output have the same height and width.
Finally, the outputs along each path are concatenated
along the channel dimension and comprise the block's output.
The commonly tuned hyperparameters of the Inception block
are the numbers of output channels per layer.
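To see why the padding keeps the height and width unchanged, we can apply the usual output-size formula for a convolution or pooling window, $\lfloor (h + 2p - k)/s \rfloor + 1$, to the spatial layer of each path. With stride $s=1$ and padding $p=(k-1)/2$ it reduces to $h$ for any input size. The following is a plain-Kotlin sketch with no DJL calls; the helper names (`outSize`, `inceptionOutChannels`) and the $12\times 12$ input are illustrative assumptions. It also shows that concatenating along the channel dimension makes the block's output channel count the sum of the per-path counts.

```kotlin
// Output size of a conv/pool window along one spatial dimension (hypothetical helper):
// floor((inSize + 2 * padding - kernel) / stride) + 1
fun outSize(inSize: Int, kernel: Int, padding: Int, stride: Int = 1): Int =
    (inSize + 2 * padding - kernel) / stride + 1

// (kernel, padding) of the spatially relevant layer in each path,
// matching the padding used in inceptionBlock (a 1 x 1 window needs none)
val pathWindows: List<Pair<String, Pair<Int, Int>>> = listOf(
    "1x1 conv" to (1 to 0),
    "3x3 conv" to (3 to 1),
    "5x5 conv" to (5 to 2),
    "3x3 max pool" to (3 to 1)
)

// Height (or width) produced by each path for an assumed 12 x 12 input, stride 1
val pathOutputSizes: Map<String, Int> =
    pathWindows.associate { (name, window) -> name to outSize(12, window.first, window.second) }

// Concatenation along the channel axis adds up the output channels of the four paths
fun inceptionOutChannels(c1: Int, c2: IntArray, c3: IntArray, c4: Int): Int =
    c1 + c2[1] + c3[1] + c4
```

Every entry of `pathOutputSizes` is 12, and `inceptionOutChannels(64, intArrayOf(96, 128), intArrayOf(16, 32), 32)` evaluates to 256, the channel count of the first Inception block used later.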

In [1]:
%use @file[../djl-pytorch.json]
%use lets-plot
@file:DependsOn("org.apache.commons:commons-lang3:3.12.0")
import ai.djl.metric.Metrics

fun getLong(nm: String, n: Long): Long {
    val name = System.getProperty(nm)
    return if (null == name) n.toLong() else name.toLong()
}

class Accumulator(n: Int) {
    val data = FloatArray(n) { 0f }


    /* Adds a set of numbers to the array */
    fun add(args: FloatArray) {
        for (i in 0..args.size - 1) {
            data[i] += args[i]
        }
    }

    /* Resets the array */
    fun reset() {
        data.fill(0f)
    }

    /* Returns the data point at the given index */
    fun get(index: Int): Float {
        return data[index]
    }
}

class DataPoints(X:NDArray , y:NDArray ) {
    private val X = X
    private val y = y

    fun  getX() : NDArray{
        return X
    }
    
    fun getY() :NDArray {
        return y
    }
}

fun syntheticData(manager:NDManager , w: NDArray , b : Float, numExamples: Int) : DataPoints {
    val X = manager.randomNormal(Shape(numExamples.toLong(), w.size()))
    var y = X.matMul(w).add(b)
    // Add noise
    y = y.add(manager.randomNormal(0f, 0.01f, y.getShape(), DataType.FLOAT32))
    return DataPoints(X, y);
}

object Training {

    fun linreg(X: NDArray, w: NDArray, b: NDArray): NDArray {
        return X.dot(w).add(b);
    }

    fun squaredLoss(yHat: NDArray, y: NDArray): NDArray {
        return (yHat.sub(y.reshape(yHat.getShape())))
            .mul((yHat.sub(y.reshape(yHat.getShape()))))
            .div(2);
    }

    fun sgd(params: NDList, lr: Float, batchSize: Int) {
        // Apply an SGD step with learning rate lr to each parameter,
        // using the gradient averaged over the batch
        val lrt = Tracker.fixed(lr)
        val opt = Optimizer.sgd().setLearningRateTracker(lrt).build()
        for (param in params) {
            opt.update(param.toString(), param, param.getGradient().div(batchSize))
        }
    }

    /**
     * Allows to do gradient calculations on a subManager. This is very useful when you are training
     * on a lot of epochs. This subManager could later be closed and all NDArrays generated from the
     * calculations in this function will be cleared from memory when subManager is closed. This is
     * always a great practice but the impact is most notable when there is lot of data on various
     * epochs.
     */
    fun sgd(params: NDList, lr: Float, batchSize: Int, subManager: NDManager) {
        for (param in params) {
            // Update param in place.
            // param = param - param.gradient * lr / batchSize
            val gradient = param.getGradient()
            gradient.attach(subManager);
            param.subi(gradient.mul(lr).div(batchSize))
        }
    }

    fun accuracy(yHat: NDArray, y: NDArray): Float {
        // Check size of 1st dimension greater than 1
        // to see if we have multiple samples
        if (yHat.getShape().size(1) > 1) {
            // Argmax gets index of maximum args for given axis 1
            // Convert yHat to same dataType as y (int32)
            // Sum up number of true entries
            return yHat.argMax(1)
                .toType(DataType.INT32, false)
                .eq(y.toType(DataType.INT32, false))
                .sum()
                .toType(DataType.FLOAT32, false)
                .getFloat();
        }
        return yHat.toType(DataType.INT32, false)
            .eq(y.toType(DataType.INT32, false))
            .sum()
            .toType(DataType.FLOAT32, false)
            .getFloat();
    }

    fun trainingChapter6(
        trainIter: ArrayDataset,
        testIter: ArrayDataset,
        numEpochs: Int,
        trainer: Trainer,
        evaluatorMetrics: MutableMap<String, DoubleArray>
    ): Double {

        trainer.setMetrics(Metrics())

        EasyTrain.fit(trainer, numEpochs, trainIter, testIter)

        val metrics = trainer.getMetrics()

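        // Copy the per-epoch values of every evaluator into evaluatorMetrics.
        // The body must not be wrapped in an extra { }: that would only create
        // a lambda that is never invoked, leaving evaluatorMetrics empty.
        trainer.getEvaluators()
            .forEach { evaluator ->
                evaluatorMetrics.put(
                    "train_epoch_" + evaluator.getName(),
                    metrics.getMetric("train_epoch_" + evaluator.getName()).stream()
                        .mapToDouble { x -> x.getValue() }
                        .toArray())
                evaluatorMetrics.put(
                    "validate_epoch_" + evaluator.getName(),
                    metrics
                        .getMetric("validate_epoch_" + evaluator.getName())
                        .stream()
                        .mapToDouble { x -> x.getValue() }
                        .toArray())
            }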

        return metrics.mean("epoch")
    }

    /* Softmax-regression-scratch */
    fun evaluateAccuracy(net: UnaryOperator<NDArray>, dataIterator: Iterable<Batch>): Float {
        val metric = Accumulator(2) // numCorrectedExamples, numExamples
        for (batch in dataIterator) {
            val X = batch.getData().head()
            val y = batch.getLabels().head()
            metric.add(floatArrayOf(accuracy(net.apply(X), y), y.size().toFloat()))
            batch.close()
        }
        return metric.get(0) / metric.get(1)
    }
    /* End Softmax-regression-scratch */

    /* MLP */
    /* Evaluate the loss of a model on the given dataset */
    fun evaluateLoss(
        net: UnaryOperator<NDArray>,
        dataIterator: Iterable<Batch>,
        loss: BinaryOperator<NDArray>
    ): Float {
        val metric = Accumulator(2) // sumLoss, numExamples

        for (batch in dataIterator) {
            val X = batch.getData().head()
            val y = batch.getLabels().head()
            metric.add(
                floatArrayOf(loss.apply(net.apply(X), y).sum().getFloat(), y.size().toFloat()))
            batch.close()
        }
        return metric.get(0) / metric.get(1)
    }
    /* End MLP */
}


In [2]:
import ai.djl.basicdataset.cv.classification.*;
import org.apache.commons.lang3.ArrayUtils;
import java.util.stream.*;

In [3]:
// c1 to c4 set the output channels of paths 1 to 4; c2 and c3 each hold
// (channels after the 1 x 1 reduction, channels of the following convolution)
fun inceptionBlock(c1: Int, c2: IntArray, c3: IntArray, c4: Int) : ParallelBlock {

    // Path 1 is a single 1 x 1 convolutional layer
    val p1 = SequentialBlock().add(
            Conv2d.builder()
                    .setFilters(c1)
                    .setKernelShape(Shape(1, 1))
                    .build())
            .add(Activation::relu);

    // Path 2 is a 1 x 1 convolutional layer followed by a 3 x 3
    // convolutional layer
    val p2 = SequentialBlock().add(
            Conv2d.builder()
                    .setFilters(c2[0])
                    .setKernelShape(Shape(1, 1))
                    .build())
            .add(Activation::relu)
            .add(
                    Conv2d.builder()
                            .setFilters(c2[1])
                            .setKernelShape(Shape(3, 3))
                            .optPadding(Shape(1, 1))
                            .build())
            .add(Activation::relu);

    // Path 3 is a 1 x 1 convolutional layer followed by a 5 x 5
    // convolutional layer
    val p3 = SequentialBlock().add(
            Conv2d.builder()
                    .setFilters(c3[0])
                    .setKernelShape(Shape(1, 1))
                    .build())
            .add(Activation::relu)
            .add(
                    Conv2d.builder()
                            .setFilters(c3[1])
                            .setKernelShape(Shape(5, 5))
                            .optPadding(Shape(2, 2))
                            .build())
            .add(Activation::relu);

    // Path 4 is a 3 x 3 maximum pooling layer followed by a 1 x 1
    // convolutional layer
    val p4 : Block = SequentialBlock()
            .add(Pool.maxPool2dBlock(Shape(3, 3), Shape(1, 1), Shape(1, 1)))
            .add(Conv2d.builder()
                    .setFilters(c4)
                    .setKernelShape(Shape(1, 1))
                    .build())
            .add(Activation::relu);

    // Concatenate the outputs on the channel dimension
    return ParallelBlock(
        { list: List<NDList> ->
            // Take the single output array of each path and concatenate along axis 1
            NDList(NDArrays.concat(NDList(list.map { it.head() }), 1))
        },
        listOf(p1, p2, p3, p4)
    )
}

To gain some intuition for why this network works so well,
consider the combination of the filters.
They explore the image with a variety of filter sizes.
This means that details at different extents
can be recognized efficiently by filters of different sizes.
At the same time, we can allocate different numbers of parameters
to different filters (e.g., more to the short-range filters
without ignoring the long range entirely).
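The parameter savings from the $1\times 1$ reductions can be checked with a little arithmetic. A $k\times k$ convolution from $c_i$ to $c_o$ channels has $k^2 c_i c_o + c_o$ parameters (weights plus one bias per output channel). The plain-Kotlin sketch below compares a direct $5\times 5$ convolution from 192 to 32 channels with the third path of the first Inception block used later (192 to 16 channels via $1\times 1$, then 16 to 32 via $5\times 5$). The helper `convParams` is hypothetical and not part of DJL.

```kotlin
// Parameters of a k x k convolution with cIn input and cOut output channels:
// k * k * cIn * cOut weights plus cOut biases (hypothetical helper)
fun convParams(k: Int, cIn: Int, cOut: Int): Long =
    k.toLong() * k * cIn * cOut + cOut

// Direct 5 x 5 convolution: 192 -> 32 channels
val directParams = convParams(5, 192, 32)

// Path 3 of the first Inception block: 1 x 1 (192 -> 16), then 5 x 5 (16 -> 32)
val bottleneckParams = convParams(1, 192, 16) + convParams(5, 16, 32)
```

This gives 153,632 versus 15,920 parameters, a nearly tenfold reduction, while the path still emits 32 channels.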

## GoogLeNet Model

As shown in :numref:`fig_inception_full`, GoogLeNet uses a stack of nine Inception blocks
and global average pooling to generate its estimates.
Maximum pooling between Inception blocks reduces the dimensionality.
The first module is similar to AlexNet and LeNet,
the stack of blocks is inherited from VGG,
and the global average pooling avoids
a stack of fully connected layers at the end.

![Full GoogLeNet Model](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/inception-full.svg)

:label:`fig_inception_full`


We can now implement GoogLeNet piece by piece.
The first component uses a 64-channel $7\times 7$ convolutional layer with stride 2,
followed by a $3\times 3$ maximum pooling layer with stride 2.

In [4]:
val block1 = SequentialBlock();
block1
    .add(Conv2d.builder()
                .setKernelShape(Shape(7, 7))
                .optPadding(Shape(3, 3))
                .optStride(Shape(2, 2))
                .setFilters(64)
                .build())
    .add(Activation::relu)
    .add(Pool.maxPool2dBlock(Shape(3, 3), Shape(2, 2), Shape(1, 1)));

SequentialBlock {
	Conv2d
	LambdaBlock
	maxPool2d
}

The second component uses two convolutional layers:
first, a 64-channel $1\times 1$ convolutional layer,
then a $3\times 3$ convolutional layer that triples the number of channels. This corresponds to the second path in the Inception block.

In [5]:
val block2 = SequentialBlock();
block2
    .add(Conv2d.builder()
            .setFilters(64)
            .setKernelShape(Shape(1, 1))
            .build())
    .add(Activation::relu)
    .add(Conv2d.builder()
            .setFilters(192)
            .setKernelShape(Shape(3, 3))
            .optPadding(Shape(1, 1))
            .build())
    .add(Activation::relu)
    .add(Pool.maxPool2dBlock(Shape(3, 3), Shape(2, 2), Shape(1, 1)));

SequentialBlock {
	Conv2d
	LambdaBlock
	Conv2d
	LambdaBlock
	maxPool2d
}

The third component connects two complete Inception blocks in series.
The number of output channels of the first Inception block is
$64+128+32+32=256$, and the output channels of the four paths
are in the ratio $64:128:32:32=2:4:1:1$.
The second and third paths first reduce the 192 input channels
to $96/192=1/2$ and $16/192=1/12$ of that number, respectively,
and then apply their second convolutional layer.
The number of output channels of the second Inception block
is increased to $128+192+96+64=480$, and the output channels of the four paths are in the ratio $128:192:96:64 = 4:6:3:2$.
The second and third paths first reduce the 256 input channels
to $128/256=1/2$ and $32/256=1/8$ of that number, respectively.
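The channel ratios quoted above can be reproduced by dividing each path's output channels by their greatest common divisor. The plain-Kotlin sketch below (the helper names `gcd` and `ratio` are illustrative) does this for the two Inception blocks of the third component and also computes the fraction of input channels kept by the $1\times 1$ reductions in paths 2 and 3.

```kotlin
// Greatest common divisor via the Euclidean algorithm
tailrec fun gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)

// Reduce a list of channel counts to its simplest integer ratio
fun ratio(channels: List<Int>): List<Int> {
    val g = channels.reduce { acc, c -> gcd(acc, c) }
    return channels.map { it / g }
}

// Per-path output channels of the two Inception blocks in block3
val firstBlock = listOf(64, 128, 32, 32)
val secondBlock = listOf(128, 192, 96, 64)

// Fraction of input channels kept by the 1 x 1 reductions in paths 2 and 3:
// the first block sees 192 input channels, the second sees 256
val firstReductions = listOf(96.0 / 192, 16.0 / 192)
val secondReductions = listOf(128.0 / 256, 32.0 / 256)
```

Here `ratio(firstBlock)` returns `[2, 4, 1, 1]` and `ratio(secondBlock)` returns `[4, 6, 3, 2]`, matching the text.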

In [6]:
val block3 = SequentialBlock();
block3
        .add(inceptionBlock(64, intArrayOf(96, 128), intArrayOf(16, 32), 32))
        .add(inceptionBlock(128, intArrayOf(128, 192), intArrayOf(32, 96), 64))
        .add(Pool.maxPool2dBlock(Shape(3, 3), Shape(2, 2), Shape(1, 1)));

SequentialBlock {
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	maxPool2d
}

The fourth block is more complicated.
It connects five Inception blocks in series,
and they have $192+208+48+64=512$, $160+224+64+64=512$,
$128+256+64+64=512$, $112+288+64+64=528$,
and $256+320+128+128=832$ output channels, respectively.
The allocation of channels across the paths is similar
to that in the third module:
the second path with the $3\times 3$ convolutional layer
outputs the largest number of channels,
followed by the first path with only the $1\times 1$ convolutional layer,
while the third path with the $5\times 5$ convolutional layer
and the fourth path with the $3\times 3$ maximum pooling layer
output the fewest.
The second and third paths first reduce
the number of channels according to a ratio,
and these ratios differ slightly between Inception blocks.
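Because the blocks are chained, each Inception block's input channel count is the previous block's output: the fourth component starts from the 480 channels produced by the third. The plain-Kotlin sketch below (`InceptionSpec`, `convParams`, `inceptionParams`, and `walkBlock4` are hypothetical names) walks through the five configurations used in `block4`, recording each block's input channels, output channels, and parameter count (weights plus biases of its six convolutional layers; max-pooling has no parameters).

```kotlin
// Channel configuration of one Inception block, mirroring inceptionBlock's arguments
data class InceptionSpec(val c1: Int, val c2: Pair<Int, Int>, val c3: Pair<Int, Int>, val c4: Int) {
    val outChannels: Int get() = c1 + c2.second + c3.second + c4
}

// k x k convolution parameters: weights plus one bias per output channel
fun convParams(k: Int, cIn: Int, cOut: Int): Long = k.toLong() * k * cIn * cOut + cOut

// Parameters of the six convolutions in an Inception block with cIn input channels
fun inceptionParams(cIn: Int, s: InceptionSpec): Long =
    convParams(1, cIn, s.c1) +
        convParams(1, cIn, s.c2.first) + convParams(3, s.c2.first, s.c2.second) +
        convParams(1, cIn, s.c3.first) + convParams(5, s.c3.first, s.c3.second) +
        convParams(1, cIn, s.c4)

// The five configurations passed to inceptionBlock in block4
val block4Specs = listOf(
    InceptionSpec(192, 96 to 208, 16 to 48, 64),
    InceptionSpec(160, 112 to 224, 24 to 64, 64),
    InceptionSpec(128, 128 to 256, 24 to 64, 64),
    InceptionSpec(112, 144 to 288, 32 to 64, 64),
    InceptionSpec(256, 160 to 320, 32 to 128, 128)
)

// Walk the chain from the 480 channels produced by block3,
// returning (input channels, output channels, parameters) for each block
fun walkBlock4(inChannels: Int = 480): List<Triple<Int, Int, Long>> {
    var cIn = inChannels
    return block4Specs.map { spec ->
        val row = Triple(cIn, spec.outChannels, inceptionParams(cIn, spec))
        cIn = spec.outChannels
        row
    }
}
```

For instance, the first entry of `walkBlock4()` is `(480, 512, 376176)`: the first Inception block of the fourth component maps 480 to 512 channels with about 376K parameters.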

In [7]:
val block4 = SequentialBlock();
block4
        .add(inceptionBlock(192, intArrayOf(96, 208), intArrayOf(16, 48), 64))
        .add(inceptionBlock(160, intArrayOf(112, 224),intArrayOf(24, 64), 64))
        .add(inceptionBlock(128, intArrayOf(128, 256), intArrayOf(24, 64), 64))
        .add(inceptionBlock(112, intArrayOf(144, 288), intArrayOf(32, 64), 64))
        .add(inceptionBlock(256, intArrayOf(160, 320), intArrayOf(32, 128), 128))
        .add(Pool.maxPool2dBlock(Shape(3, 3), Shape(2, 2), Shape(1, 1)));

SequentialBlock {
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
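		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	ParallelBlock {
		SequentialBlock {
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			Conv2d
			LambdaBlock
			Conv2d
			LambdaBlock
		}
		SequentialBlock {
			maxPool2d
			Conv2d
			LambdaBlock
		}
	}
	maxPool2d
}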

The fifth block has two Inception blocks with $256+320+128+128=832$
and $384+384+128+128=1024$ output channels.
The allocation of channels across the paths follows the same pattern
as in the third and fourth modules,
but with different specific values.
Note that the fifth block is followed by the output layer.
This block uses the global average pooling layer
to change the height and width of each channel to 1, just as in NiN.
In DJL, the global average pooling block already returns a two-dimensional array
of shape (batch size, number of channels), as the shape printout below confirms,
so we can feed it directly to a fully connected layer
whose number of outputs is the number of label classes.
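Global average pooling replaces each channel's $h\times w$ feature map with its mean, so the result no longer depends on the spatial size. The plain-Kotlin sketch below operates on a nested `Array<Array<DoubleArray>>` laid out as (channels, height, width); it illustrates the operation and is not DJL's implementation. The helper names `globalAvgPool` and `denseParams` are hypothetical. It also counts the parameters of the final fully connected layer, which after pooling sees only 1024 inputs, and compares them with flattening the $1024\times 3\times 3$ map that reaches this point for a $96\times 96$ input.

```kotlin
// Global average pooling over a (channels, height, width) feature map:
// each channel's h x w values are replaced by their mean
fun globalAvgPool(x: Array<Array<DoubleArray>>): DoubleArray =
    DoubleArray(x.size) { c -> x[c].sumOf { row -> row.sum() } / (x[c].size * x[c][0].size) }

// Parameters of a fully connected layer: weights plus one bias per output
fun denseParams(inputs: Int, outputs: Int): Long = inputs.toLong() * outputs + outputs

// After pooling, the 1024-channel output of block5 feeds a 10-way classifier
val headParams = denseParams(1024, 10)

// Without pooling, flattening a 1024 x 3 x 3 map would need nine times as many weights
val flattenHeadParams = denseParams(1024 * 3 * 3, 10)
```

Here `headParams` is 10,250 while `flattenHeadParams` is 92,170.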

In [8]:
val block5 = SequentialBlock();
block5
        .add(inceptionBlock(256, intArrayOf(160, 320), intArrayOf(32, 128), 128))
        .add(inceptionBlock(384, intArrayOf(192, 384), intArrayOf(48, 128), 128))
        .add(Pool.globalAvgPool2dBlock());

var block = SequentialBlock();
block = block.addAll(block1, block2, block3, block4, block5, Linear.builder().setUnits(10).build());

The GoogLeNet model is computationally complex,
so it is not as easy to modify the number of channels as in VGG.
To keep the training time on Fashion-MNIST reasonable,
we reduce the input height and width from 224 to 96,
which simplifies the computation.
The changes in the output shape
between the various modules are shown below.

In [10]:
val manager = NDManager.newBaseManager();
val lr = 0.1f;
val model = Model.newInstance("cnn");
model.setBlock(block);

val loss = Loss.softmaxCrossEntropyLoss();

val lrt = Tracker.fixed(lr);
val sgd = Optimizer.sgd().setLearningRateTracker(lrt).build();

val config = DefaultTrainingConfig(loss).optOptimizer(sgd) // Optimizer
        .optDevices(Engine.getInstance().getDevices(1)) // At most one GPU (CPU if none is available)
        .addEvaluator(Accuracy()) // Model Accuracy
        .addTrainingListeners(*TrainingListener.Defaults.logging()); // Logging

val trainer = model.newTrainer(config);

val X = manager.randomUniform(0f, 1.0f, Shape(1, 1, 96, 96));
trainer.initialize(X.getShape());
var currentShape = X.getShape();

for (i in 0 until block.getChildren().size()) {
    val newShape = block.getChildren().get(i).getValue().getOutputShapes(arrayOf<Shape>(currentShape));
    currentShape = newShape[0];
    println(block.getChildren().get(i).getKey()+ i + " layer output : " + currentShape);
}

01SequentialBlock0 layer output : (1, 64, 24, 24)
02SequentialBlock1 layer output : (1, 192, 12, 12)
03SequentialBlock2 layer output : (1, 480, 6, 6)
04SequentialBlock3 layer output : (1, 832, 3, 3)
05SequentialBlock4 layer output : (1, 1024)
06Linear5 layer output : (1, 10)
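The printed spatial sizes can be reproduced by hand with the output-size formula $\lfloor (h + 2p - k)/s \rfloor + 1$. In the plain-Kotlin sketch below (the helper names `outSize` and `traceGoogLeNet` are illustrative), only the stride-2 layers change the resolution: the $7\times 7$ convolution in `block1` and the four $3\times 3$ maximum pooling layers, with the paddings used in the code above. The Inception blocks and the stride-1 convolutions in `block2` preserve height and width, and global average pooling then collapses the final map to one value per channel.

```kotlin
// Output size along one spatial dimension: floor((inSize + 2p - k) / s) + 1
fun outSize(inSize: Int, kernel: Int, padding: Int, stride: Int): Int =
    (inSize + 2 * padding - kernel) / stride + 1

// Spatial size after each resolution-changing layer, starting from an input x input image
fun traceGoogLeNet(input: Int = 96): List<Int> {
    val sizes = mutableListOf<Int>()
    var h = outSize(input, 7, 3, 2) // block1: 7x7 conv, stride 2
    sizes += h
    h = outSize(h, 3, 1, 2)         // block1: 3x3 max pool, stride 2
    sizes += h
    h = outSize(h, 3, 1, 2)         // block2: convs keep the size, then 3x3 max pool
    sizes += h
    h = outSize(h, 3, 1, 2)         // block3: Inception blocks keep the size, then max pool
    sizes += h
    h = outSize(h, 3, 1, 2)         // block4: Inception blocks keep the size, then max pool
    sizes += h
    return sizes
}
```

`traceGoogLeNet()` returns `[48, 24, 12, 6, 3]`, whose last four entries match the heights and widths printed for `block1` to `block4`. Calling `traceGoogLeNet(224)` instead gives `[112, 56, 28, 14, 7]`, so at the original resolution the final Inception blocks operate on a $7\times 7$ map.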


## Data Acquisition and Training

As before, we train our model using the Fashion-MNIST dataset.
We resize the images to $96 \times 96$ pixels
before invoking the training procedure.

In [11]:
val batchSize = 128;
val numEpochs = Integer.getInteger("MAX_EPOCH", 10);

val epochCount = IntArray(numEpochs) { it + 1 }

val trainIter = FashionMnist.builder()
        .addTransform(Resize(96))
        .addTransform(ToTensor())
        .optUsage(Dataset.Usage.TRAIN)
        .setSampling(batchSize, true)
        .optLimit(getLong("DATASET_LIMIT", Long.MAX_VALUE))
        .build();

val testIter = FashionMnist.builder()
        .addTransform(Resize(96))
        .addTransform(ToTensor())
        .optUsage(Dataset.Usage.TEST)
        .setSampling(batchSize, true)
        .optLimit(getLong("DATASET_LIMIT", Long.MAX_VALUE))
        .build();

trainIter.prepare();
testIter.prepare();

In [12]:
val evaluatorMetrics = mutableMapOf<String, DoubleArray>()
val avgTrainTimePerEpoch = Training.trainingChapter6(trainIter, testIter, numEpochs, trainer, evaluatorMetrics);

Training:    100% |████████████████████████████████████████| Accuracy: 0.38, SoftmaxCrossEntropyLoss: 1.73
Validating:  100% |████████████████████████████████████████|
Training:    100% |████████████████████████████████████████| Accuracy: 0.73, SoftmaxCrossEntropyLoss: 0.73
Validating:  100% |████████████████████████████████████████|
Training:    100% |████████████████████████████████████████| Accuracy: 0.84, SoftmaxCrossEntropyLoss: 0.44

In [13]:
val trainLoss = evaluatorMetrics.getValue("train_epoch_SoftmaxCrossEntropyLoss")
val trainAccuracy = evaluatorMetrics.getValue("train_epoch_Accuracy")
val testAccuracy = evaluatorMetrics.getValue("validate_epoch_Accuracy")

print("loss %.3f,".format(trainLoss[numEpochs - 1]))
print(" train acc %.3f,".format(trainAccuracy[numEpochs - 1]))
print(" test acc %.3f\n".format(testAccuracy[numEpochs - 1]))
// avgTrainTimePerEpoch is in nanoseconds, so divide by 1e9 to get seconds
print("%.1f examples/sec".format(trainIter.size() / (avgTrainTimePerEpoch / 1e9)))
println()

![GoogLeNet training loss and accuracy on Fashion-MNIST.](https://d2l-java-resources.s3.amazonaws.com/img/chapter_convolution-modern-cnn-googleNet.png)

In [14]:
// Build a long-format table: one row per (metric, epoch) pair
val trainLossLabel = Array(trainLoss.size) { "train loss" }
val trainAccLabel = Array(trainAccuracy.size) { "train acc" }
val testAccLabel = Array(testAccuracy.size) { "test acc" }
val data = mapOf<String, Any>(
      "label" to trainLossLabel + trainAccLabel + testAccLabel,
      "epoch" to epochCount + epochCount + epochCount,
      "metrics" to trainLoss + trainAccuracy + testAccuracy
)

var plot = letsPlot(data)
plot += geomLine { x = "epoch" ; y = "metrics" ; color = "label"}
plot + ggsize(700, 500)

## Summary

* The Inception block is equivalent to a subnetwork with four paths. It extracts information in parallel through convolutional layers of different window shapes and maximum pooling layers. $1 \times 1$ convolutions reduce channel dimensionality on a per-pixel level. Max-pooling reduces the resolution.
* GoogLeNet connects multiple well-designed Inception blocks with other layers in series. The ratio of the number of channels assigned in the Inception block is obtained through a large number of experiments on the ImageNet dataset.
* GoogLeNet, as well as its succeeding versions, was one of the most efficient models on ImageNet, providing similar test accuracy with lower computational complexity.

## Exercises

1. There are several iterations of GoogLeNet. Try to implement and run them. Some of them include the following:
    * Add a batch normalization layer :cite:`Ioffe.Szegedy.2015`, as described
      later in :numref:`sec_batch_norm`.
    * Make adjustments to the Inception block
      :cite:`Szegedy.Vanhoucke.Ioffe.ea.2016`.
    * Use "label smoothing" for model regularization
      :cite:`Szegedy.Vanhoucke.Ioffe.ea.2016`.
    * Include it in the residual connection
      :cite:`Szegedy.Ioffe.Vanhoucke.ea.2017`, as described later in
      :numref:`sec_resnet`.
1. What is the minimum image size for GoogLeNet to work?
1. Compare the model parameter sizes of AlexNet, VGG, and NiN with GoogLeNet. How do the latter two network architectures significantly reduce the model parameter size?
1. Why do we need a large range convolution initially?