Commit

add StochasticGradient doc back
hughperkins committed Sep 20, 2016
1 parent 1c0d2bb commit ef4ef6c
Showing 2 changed files with 132 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -16,6 +16,6 @@ This package provides an easy and modular way to build and train simple or compl
* [`ClassNLLCriterion`](doc/criterion.md#nn.ClassNLLCriterion): the Negative Log Likelihood criterion used for classification;
* Additional documentation:
* [Overview](doc/overview.md#nn.overview.dok) of the package essentials including modules, containers and training;
* [Training](doc/training.md#nn.traningneuralnet.dok): how to train a neural network using [optim](https://github.com/torch/optim);
* [Training](doc/training.md#nn.traningneuralnet.dok): how to train a neural network using [`StochasticGradient`](doc/training.md#nn.StochasticGradient), or [optim](https://github.com/torch/optim);
* [Testing](doc/testing.md): how to test your modules.
* [Experimental Modules](https://github.com/clementfarabet/lua---nnx/blob/master/README.md): a package containing experimental modules and criteria.
133 changes: 131 additions & 2 deletions doc/training.md
@@ -6,8 +6,8 @@ use the `optim` optimizer, which implements some cool functionalities, like Nest
[adagrad](https://github.com/torch/optim/blob/master/doc/index.md#x-adagradopfunc-x-config-state) and
[adam](https://github.com/torch/optim/blob/master/doc/index.md#x-adamopfunc-x-config-state).

We will demonstrate using a for-loop first, to show the low-level view of what happens in training, and then
we will show how to train using `optim`.
We will demonstrate using a for-loop first, to show the low-level view of what happens in training. [StochasticGradient](#nn.StochasticGradient), a simple class
that does the job for you, is provided as standard. Finally, `optim` is commonly used and provides multiple optimization algorithms.

<a name="nn.DoItYourself"></a>
## Example of manual training of a neural network ##
@@ -95,6 +95,135 @@ You should see something like:
[torch.Tensor of dimension 1]
```


<a name="nn.StochasticGradient.dok"></a>
## StochasticGradient ##

`StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient
algorithm. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/serialization.md#serialization).
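
Because the trainer is serializable, it can be checkpointed with Torch's standard `torch.save`/`torch.load`. The snippet below is a minimal sketch, assuming `trainer` and `dataset` are built as in the example further down this page; the file name is an arbitrary choice for illustration.

```lua
-- minimal sketch: checkpoint a trainer to disk and restore it later.
-- 'xor_trainer.t7' is an arbitrary file name chosen for illustration.
torch.save('xor_trainer.t7', trainer)        -- writes module, criterion and training parameters
local restored = torch.load('xor_trainer.t7')
restored:train(dataset)                      -- training can resume with the restored object
```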

<a name="nn.StochasticGradient"></a>
### StochasticGradient(module, criterion) ###

Create a `StochasticGradient` trainer, using the given [Module](module.md#nn.Module) and [Criterion](criterion.md#nn.Criterion).
The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.

<a name="nn.StochasticGradientTrain"></a>
### train(dataset) ###

Train the module and criterion given in the
[constructor](#nn.StochasticGradient) over `dataset`, using the
internal [parameters](#nn.StochasticGradientParameters).

`StochasticGradient` expects as a `dataset` an object which implements the operator
`dataset[index]` and the method `dataset:size()`. The `size()` method
returns the number of examples and `dataset[i]` has to return the i-th example.

An `example` has to be an object which implements the operator
`example[field]`, where `field` can take the value `1` (input features)
or `2` (the corresponding label, which will be given to the criterion).
The input is usually a Tensor (except if you use special kinds of modules,
like [table layers](table.md#nn.TableLayers)). The label type depends on the criterion.
For example, the [MSECriterion](criterion.md#nn.MSECriterion) expects a Tensor, while the
[ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion) expects an integer (the class index).

Such a dataset is easily constructed using Lua tables, but it could be any object,
for example one implemented in `C`, as long as the required operators/methods are implemented.
[See an example](#nn.DoItStochasticGradient).
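
As a sketch of the "any object" case, the dataset interface can also be provided through a metatable instead of one Lua table entry per example. The names `inputs` and `targets` below are hypothetical and not part of the original example.

```lua
require 'torch'

-- minimal sketch: a tensor-backed dataset exposing size() and dataset[i].
local n = 100
local inputs  = torch.randn(n, 2)                 -- n examples in 2d
local targets = torch.Tensor(n, 1)
for i = 1, n do
  targets[i][1] = (inputs[i][1] * inputs[i][2] > 0) and -1 or 1
end

local dataset = {}
function dataset:size() return n end
-- dataset[i] must return an object where [1] is the input and [2] the label
setmetatable(dataset, {__index = function(self, i) return {inputs[i], targets[i]} end})
```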

<a name="nn.StochasticGradientParameters"></a>
### Parameters ###

`StochasticGradient` has several fields which have an impact on a call to [train()](#nn.StochasticGradientTrain); a short usage sketch follows the list.

* `learningRate`: This is the learning rate used during training. The update of the parameters will be `parameters = parameters - learningRate * parameters_gradient`. Default value is `0.01`.
* `learningRateDecay`: The learning rate decay. If non-zero, the learning rate (note: the field `learningRate` will not change its value) will be computed after each iteration (pass over the dataset) with: `current_learning_rate = learningRate / (1 + iteration * learningRateDecay)`.
* `maxIteration`: The maximum number of iterations (passes over the dataset). Default is `25`.
* `shuffleIndices`: Boolean which says whether the examples will be randomly sampled or not. Default is `true`. If `false`, the examples will be taken in the order of the dataset.
* `hookExample`: A possible hook function which will be called (if non-nil) during training, after each example has been forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`.
* `hookIteration`: A possible hook function which will be called (if non-nil) during training, after a complete pass over the dataset. The function takes `(self, iteration, currentError)` as parameters. Default is `nil`.
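
As a usage sketch of these fields, assuming `mlp` and `dataset` are defined as in the example of the next section (the decay value and the hook body are illustrative choices, not defaults):

```lua
criterion = nn.MSECriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate      = 0.01
trainer.learningRateDecay = 1e-4    -- current rate = learningRate / (1 + iteration * learningRateDecay)
trainer.maxIteration      = 10
trainer.shuffleIndices    = true
trainer.hookIteration     = function(self, iteration, currentError)
   print(string.format('iteration %d: current error = %f', iteration, currentError))
end
trainer:train(dataset)
```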

<a name="nn.DoItStochasticGradient"></a>
## Example of training using StochasticGradient ##

We show an example here on a classical XOR problem.

__Dataset__

We first need to create a dataset, following the conventions described in
[StochasticGradient](#nn.StochasticGradientTrain).
```lua
dataset = {}
function dataset:size() return 100 end -- 100 examples
for i = 1, dataset:size() do
  local input = torch.randn(2)    -- normally distributed example in 2d
  local output = torch.Tensor(1)
  if input[1] * input[2] > 0 then -- calculate label for XOR function
    output[1] = -1
  else
    output[1] = 1
  end
  dataset[i] = {input, output}
end
```

__Neural Network__

We create a simple neural network with one hidden layer.
```lua
require "nn"
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = 2; outputs = 1; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
```
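
As a quick sanity check (not part of the original example), you can push a random input through the untrained network and confirm it returns a 1-element Tensor:

```lua
print(mlp:forward(torch.randn(2)))   -- untrained output, shape check only
```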

__Training__

We choose the Mean Squared Error criterion and train on the dataset.
```lua
criterion = nn.MSECriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = 0.01
trainer:train(dataset)
```

__Test the network__

```lua
x = torch.Tensor(2)
x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
```
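
Since `nn.Linear` and `nn.Tanh` also accept minibatch input (a 2D Tensor of size `batchSize x inputs`), the same check can be done in one batched call; this is a sketch, not part of the original example:

```lua
-- evaluate all four test points at once: 4x2 input -> 4x1 output
local x = torch.Tensor{{0.5, 0.5}, {0.5, -0.5}, {-0.5, 0.5}, {-0.5, -0.5}}
print(mlp:forward(x))
```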

You should see something like:
```lua
> x = torch.Tensor(2)
> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))

-0.3490
[torch.Tensor of dimension 1]

> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))

1.0561
[torch.Tensor of dimension 1]

> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))

0.8640
[torch.Tensor of dimension 1]

> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))

-0.2941
[torch.Tensor of dimension 1]
```


<a name="nn.DoItYourself"></a>
## Training using optim ##

