Commit

add StochasticGradient doc back
hughperkins committed Sep 20, 2016
1 parent 1c0d2bb commit ef4ef6c
Showing 2 changed files with 132 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -16,6 +16,6 @@ This package provides an easy and modular way to build and train simple or compl
* [`ClassNLLCriterion`](doc/criterion.md#nn.ClassNLLCriterion): the Negative Log Likelihood criterion used for classification;
* Additional documentation:
* [Overview](doc/overview.md#nn.overview.dok) of the package essentials including modules, containers and training;
* [Training](doc/training.md#nn.traningneuralnet.dok): how to train a neural network using [optim](https://github.com/torch/optim);
* [Training](doc/training.md#nn.traningneuralnet.dok): how to train a neural network using [`StochasticGradient`](doc/training.md#nn.StochasticGradient), or [optim](https://github.com/torch/optim);
* [Testing](doc/testing.md): how to test your modules.
* [Experimental Modules](https://github.com/clementfarabet/lua---nnx/blob/master/README.md): a package containing experimental modules and criteria.
133 changes: 131 additions & 2 deletions doc/training.md
@@ -6,8 +6,8 @@ use the `optim` optimizer, which implements some cool functionalities, like Nest
[adagrad](https://github.com/torch/optim/blob/master/doc/index.md#x-adagradopfunc-x-config-state) and
[adam](https://github.com/torch/optim/blob/master/doc/index.md#x-adamopfunc-x-config-state).

We will demonstrate using a for-loop first, to show the low-level view of what happens in training, and then
we will show how to train using `optim`.
We will demonstrate using a for-loop first, to show the low-level view of what happens in training. [StochasticGradient](#nn.StochasticGradient), a simple class
that does the job for you, is provided as standard. Finally, `optim` is commonly used and provides multiple optimization algorithms.

<a name="nn.DoItYourself"></a>
## Example of manual training of a neural network ##
@@ -95,6 +95,135 @@ You should see something like:
[torch.Tensor of dimension 1]
```


<a name="nn.StochasticGradient.dok"></a>
## StochasticGradient ##

`StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient
algorithm. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/serialization.md#serialization).
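
Because the trainer is serializable, it can be checkpointed with Torch's standard `torch.save`/`torch.load`. The snippet below is a minimal sketch, assuming `trainer` and `dataset` are built as in the example further down this page; the file name is an arbitrary choice for illustration.

```lua
-- minimal sketch: checkpoint a trainer to disk and restore it later.
-- 'xor_trainer.t7' is an arbitrary file name chosen for illustration.
torch.save('xor_trainer.t7', trainer)        -- writes module, criterion and training parameters
local restored = torch.load('xor_trainer.t7')
restored:train(dataset)                      -- training can resume with the restored object
```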

<a name="nn.StochasticGradient"></a>
### StochasticGradient(module, criterion) ###

Create a `StochasticGradient` trainer, using the given [Module](module.md#nn.Module) and [Criterion](criterion.md#nn.Criterion).
The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.

<a name="nn.StochasticGradientTrain"></a>
### train(dataset) ###

Train the module and criterion given in the
[constructor](#nn.StochasticGradient) over `dataset`, using the
internal [parameters](#nn.StochasticGradientParameters).

`StochasticGradient` expects as a `dataset` an object which implements the operator
`dataset[index]` and the method `dataset:size()`. The `size()` method
returns the number of examples and `dataset[i]` has to return the i-th example.

An `example` has to be an object which implements the operator
`example[field]`, where `field` can take the value `1` (input features)
or `2` (the corresponding label, which will be given to the criterion).
The input is usually a Tensor (except if you use special kinds of modules,
like [table layers](table.md#nn.TableLayers)). The label type depends on the criterion.
For example, the [MSECriterion](criterion.md#nn.MSECriterion) expects a Tensor, while the
[ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion) expects an integer (the class index).

Such a dataset is easily constructed using Lua tables, but it could be any object,
for example one implemented in `C`, as long as the required operators/methods are implemented.
[See an example](#nn.DoItStochasticGradient).
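
As a sketch of the "any object" case, the dataset interface can also be provided through a metatable instead of one Lua table entry per example. The names `inputs` and `targets` below are hypothetical and not part of the original example.

```lua
require 'torch'

-- minimal sketch: a tensor-backed dataset exposing size() and dataset[i].
local n = 100
local inputs  = torch.randn(n, 2)                 -- n examples in 2d
local targets = torch.Tensor(n, 1)
for i = 1, n do
  targets[i][1] = (inputs[i][1] * inputs[i][2] > 0) and -1 or 1
end

local dataset = {}
function dataset:size() return n end
-- dataset[i] must return an object where [1] is the input and [2] the label
setmetatable(dataset, {__index = function(self, i) return {inputs[i], targets[i]} end})
```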

<a name="nn.StochasticGradientParameters"></a>
### Parameters ###

`StochasticGradient` has several fields which have an impact on a call to [train()](#nn.StochasticGradientTrain); a short usage sketch follows the list.

* `learningRate`: This is the learning rate used during training. The update of the parameters will be `parameters = parameters - learningRate * parameters_gradient`. Default value is `0.01`.
* `learningRateDecay`: The learning rate decay. If non-zero, the learning rate (note: the field `learningRate` will not change its value) will be computed after each iteration (pass over the dataset) with: `current_learning_rate = learningRate / (1 + iteration * learningRateDecay)`.
* `maxIteration`: The maximum number of iterations (passes over the dataset). Default is `25`.
* `shuffleIndices`: Boolean which says whether the examples will be randomly sampled or not. Default is `true`. If `false`, the examples will be taken in the order of the dataset.
* `hookExample`: A possible hook function which will be called (if non-nil) during training, after each example has been forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`.
* `hookIteration`: A possible hook function which will be called (if non-nil) during training, after a complete pass over the dataset. The function takes `(self, iteration, currentError)` as parameters. Default is `nil`.
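
As a usage sketch of these fields, assuming `mlp` and `dataset` are defined as in the example of the next section (the decay value and the hook body are illustrative choices, not defaults):

```lua
criterion = nn.MSECriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate      = 0.01
trainer.learningRateDecay = 1e-4    -- current rate = learningRate / (1 + iteration * learningRateDecay)
trainer.maxIteration      = 10
trainer.shuffleIndices    = true
trainer.hookIteration     = function(self, iteration, currentError)
   print(string.format('iteration %d: current error = %f', iteration, currentError))
end
trainer:train(dataset)
```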

<a name="nn.DoItStochasticGradient"></a>
## Example of training using StochasticGradient ##

We show an example here on a classical XOR problem.

__Dataset__

We first need to create a dataset, following the conventions described in
[StochasticGradient](#nn.StochasticGradientTrain).
```lua
dataset = {}
function dataset:size() return 100 end -- 100 examples
for i = 1, dataset:size() do
  local input = torch.randn(2)    -- normally distributed example in 2d
  local output = torch.Tensor(1)
  if input[1] * input[2] > 0 then -- calculate label for XOR function
    output[1] = -1
  else
    output[1] = 1
  end
  dataset[i] = {input, output}
end
```

__Neural Network__

We create a simple neural network with one hidden layer.
```lua
require "nn"
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = 2; outputs = 1; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
```
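
As a quick sanity check (not part of the original example), you can push a random input through the untrained network and confirm it returns a 1-element Tensor:

```lua
print(mlp:forward(torch.randn(2)))   -- untrained output, shape check only
```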

__Training__

We choose the Mean Squared Error criterion and train on the dataset.
```lua
criterion = nn.MSECriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = 0.01
trainer:train(dataset)
```

__Test the network__

```lua
x = torch.Tensor(2)
x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
```
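
Since `nn.Linear` and `nn.Tanh` also accept minibatch input (a 2D Tensor of size `batchSize x inputs`), the same check can be done in one batched call; this is a sketch, not part of the original example:

```lua
-- evaluate all four test points at once: 4x2 input -> 4x1 output
local x = torch.Tensor{{0.5, 0.5}, {0.5, -0.5}, {-0.5, 0.5}, {-0.5, -0.5}}
print(mlp:forward(x))
```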

You should see something like:
```lua
> x = torch.Tensor(2)
> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))

-0.3490
[torch.Tensor of dimension 1]

> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))

1.0561
[torch.Tensor of dimension 1]

> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))

0.8640
[torch.Tensor of dimension 1]

> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))

-0.2941
[torch.Tensor of dimension 1]
```


<a name="nn.DoItYourself"></a>
## Training using optim ##

