# Concise Implementation of Multilayer Perceptron

:label:`sec_mlp_djl`


As you might expect, by relying on the DJL library,
we can implement MLPs even more concisely.
Let's set up the relevant libraries first.

In [1]:
%use @file[../djl-pytorch.json]
// @file:DependsOn("../mxnet-native-cu112mkl-1.9.1-linux-x86_64.jar")
%use lets-plot
// %use dataframe
//@file:DependsOn("org.apache.commons:commons-lang3:3.12.0")
// %load ../utils/djl-imports
// %load ../utils/plot-utils

// Read a Long from the JVM system property `nm`, falling back to `n` when it is unset.
fun getLong(nm: String, n: Long): Long {
    return System.getProperty(nm)?.toLong() ?: n
}


In [2]:
import ai.djl.metric.*;
import ai.djl.basicdataset.cv.classification.*;
// import org.apache.commons.lang3.ArrayUtils;

## The Model

Compared with our concise implementation
of softmax regression
(:numref:`sec_softmax_djl`),
the only difference is that we add
*two* `Linear` (fully-connected) layers
(previously, we added *one*).
The first is our hidden layer, 
which contains *256* hidden units
and applies the ReLU activation function.
The second is our output layer.
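To make the two-layer structure concrete, here is a minimal plain-Kotlin sketch of the computation such a network performs: a linear map, a ReLU, then a second linear map. It does not use DJL. The helper names (`relu`, `linear`, `mlpForward`) and the toy sizes (2 inputs, 2 hidden units, 1 output) are assumptions chosen for illustration, not the 784-256-10 network built below.

```kotlin
// Plain-Kotlin sketch of an MLP forward pass (no DJL); all names are hypothetical.

/** Element-wise ReLU: max(x, 0). */
fun relu(v: DoubleArray): DoubleArray = DoubleArray(v.size) { maxOf(v[it], 0.0) }

/** Fully connected layer y = W x + b, with W stored as one row per output unit. */
fun linear(w: Array<DoubleArray>, b: DoubleArray, x: DoubleArray): DoubleArray =
    DoubleArray(w.size) { i -> b[i] + w[i].indices.sumOf { j -> w[i][j] * x[j] } }

/** Hidden layer with ReLU, followed by a linear output layer. */
fun mlpForward(
    w1: Array<DoubleArray>, b1: DoubleArray,
    w2: Array<DoubleArray>, b2: DoubleArray,
    x: DoubleArray
): DoubleArray = linear(w2, b2, relu(linear(w1, b1, x)))

// Toy sizes: 2 inputs, 2 hidden units, 1 output.
val w1 = arrayOf(doubleArrayOf(1.0, -1.0), doubleArrayOf(-1.0, 1.0))
val b1 = doubleArrayOf(0.0, 0.0)
val w2 = arrayOf(doubleArrayOf(1.0, 1.0))
val b2 = doubleArrayOf(0.5)

val out = mlpForward(w1, b1, w2, b2, doubleArrayOf(3.0, 1.0))
println(out.toList()) // prints [2.5]
```

The hidden pre-activations are `[2.0, -2.0]`; ReLU zeroes the negative entry, so only the first hidden unit contributes to the output.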

In [3]:
val net = SequentialBlock();
net.add(Blocks.batchFlattenBlock(784));
net.add(Linear.builder().setUnits(256).build());
net.add(Activation::relu);
net.add(Linear.builder().setUnits(10).build());
net.setInitializer(NormalInitializer(), Parameter.Type.WEIGHT);

Note that DJL, as usual, automatically
infers the missing input dimension of each layer.
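The idea behind this inference can be sketched in a few lines of plain Kotlin: each layer's input size is simply the previous layer's output size, starting from the flattened input. The `LinearShape` class and `inferShapes` function below are hypothetical helpers, not DJL API, and the parameter count assumes every `Linear` layer keeps a bias term.

```kotlin
// Hypothetical helpers (not part of DJL): derive each Linear layer's input size
// from the flattened input size and the list of layer widths.
data class LinearShape(val inputs: Int, val units: Int) {
    // One weight per (input, unit) pair plus one bias per unit (bias assumed).
    val parameterCount: Int get() = inputs * units + units
}

fun inferShapes(inputSize: Int, units: List<Int>): List<LinearShape> {
    var previous = inputSize
    // Each layer consumes the previous layer's output size.
    return units.map { u -> LinearShape(previous, u).also { previous = u } }
}

val shapes = inferShapes(784, listOf(256, 10))
shapes.forEach { println("${it.inputs} -> ${it.units}: ${it.parameterCount} parameters") }
println("total: ${shapes.sumOf { it.parameterCount }}")
// prints:
// 784 -> 256: 200960 parameters
// 256 -> 10: 2570 parameters
// total: 203530
```

Under these assumptions the network above has 203,530 trainable parameters, almost all of them in the hidden layer.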

The training loop is *exactly* the same
as when we implemented softmax regression.
This modularity enables us to separate 
matters concerning the model architecture
from orthogonal considerations.

In [4]:
val batchSize = 256;
val numEpochs = Integer.getInteger("MAX_EPOCH", 10);

val trainIter = FashionMnist.builder()
        .optUsage(Dataset.Usage.TRAIN)
        .setSampling(batchSize, true)
        .optLimit(getLong("DATASET_LIMIT", Long.MAX_VALUE))
        .build();


val testIter = FashionMnist.builder()
        .optUsage(Dataset.Usage.TEST)
        .setSampling(batchSize, true)
        .optLimit(getLong("DATASET_LIMIT", Long.MAX_VALUE))
        .build();

trainIter.prepare();
testIter.prepare();

val evaluatorMetrics = mutableMapOf<String, List<Float>>()

In [None]:
val lrt = Tracker.fixed(0.5f)
val sgd = Optimizer.sgd().setLearningRateTracker(lrt).build();

val loss = Loss.softmaxCrossEntropyLoss();

val config = DefaultTrainingConfig(loss)
                .optOptimizer(sgd) // Optimizer (minibatch SGD)
                .optDevices(Engine.getInstance().getDevices(1)) // at most one GPU, or the CPU if none is available
                .addEvaluator(Accuracy()) // Model Accuracy
                .addTrainingListeners(*TrainingListener.Defaults.logging()) // Logging

val model = Model.newInstance("mlp").apply {
    setBlock(net)
}

model.newTrainer(config).use { trainer ->
    trainer.initialize(Shape(1, 784))
    trainer.setMetrics(Metrics())
    EasyTrain.fit(trainer, numEpochs, trainIter, testIter)

    // Collect results from evaluators
    val metrics = trainer.getMetrics()
    trainer.getEvaluators().forEach { evaluator ->
        val trainMetrics = metrics.getMetric("train_epoch_" + evaluator.getName()).map { it.value.toFloat() }
        evaluatorMetrics["train_epoch_" + evaluator.getName()] = trainMetrics

        val validateMetrics = metrics.getMetric("validate_epoch_" + evaluator.getName()).map { it.value.toFloat() }
        evaluatorMetrics["validate_epoch_" + evaluator.getName()] = validateMetrics
    }
    // No explicit trainer.close() needed: `use` closes the trainer when this block exits.
}

model.close()




In [None]:
val trainLoss = evaluatorMetrics.get("train_epoch_SoftmaxCrossEntropyLoss")
val trainAccuracy = evaluatorMetrics.get("train_epoch_Accuracy")
val testAccuracy = evaluatorMetrics.get("validate_epoch_Accuracy")
// One x-axis entry per recorded epoch, so the plot stays consistent when MAX_EPOCH is not 10.
val count = (1..trainLoss!!.size).toList()

val trainLabel = Array<String>(trainLoss!!.size) { "train loss" } 
val accLabel = Array<String>(trainAccuracy!!.size) { "train acc" }
val testLabel = Array<String>(testAccuracy!!.size) {"test acc"}

val data = mapOf( "epochCount" to count + count + count,
                "loss" to trainLoss!! + trainAccuracy!! + testAccuracy!!,
                "lossLabel" to trainLabel + accLabel + testLabel)
var plot = letsPlot(data)
plot += geomLine { x = "epochCount" ; y = "loss" ; color = "lossLabel"}
plot + ggsize(500, 500)

## Exercises

1. Try adding different numbers of hidden layers. What setting (keeping other parameters and hyperparameters constant) works best? 
1. Try out different activation functions. Which ones work best?
1. Try different schemes for initializing the weights. What method works best?

