Add Dropout/DropConnect #413

Closed
zoq opened this issue Mar 1, 2015 · 17 comments

Comments

@zoq
Member

zoq commented Mar 1, 2015

Dropout is a recently introduced algorithm to prevent co-adaptation during training (overfitting). The key idea is to randomly drop units (along with their connections) from a neural network during training. Roughly, each element of a layer's output is kept with probability p, otherwise it is set to 0.

For more information see:

  • Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors", 2012
  • Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", 2014

A simple way to implement the technique is to introduce a new function which creates a dropOutMask. Afterwards, we can multiply the dropOutMask with the inputActivation in all layers which should support dropout. Something like:

// Dropout: element-wise multiply the activation with the mask (Armadillo's
// operator%) before applying the activation function.
if (dropoutFraction > 0)
{
    ActivationFunction::fn(inputActivation % dropOutMask, outputActivation);
}
else
{
    ActivationFunction::fn(inputActivation, outputActivation);
}
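For illustration, here is a minimal sketch of a mask-creation helper, assuming Armadillo is used; the function name, signature, and the rescaling by 1 / (1 - dropoutFraction) are my own assumptions, not existing code:

#include <armadillo>

// Hypothetical helper: build a dropout mask with the same shape as the layer
// output. Each element survives with probability (1 - dropoutFraction) and is
// scaled by 1 / (1 - dropoutFraction) so the expected activation stays
// unchanged ("inverted dropout"); the remaining elements are 0.
inline arma::mat DropOutMask(const size_t rows,
                             const size_t cols,
                             const double dropoutFraction)
{
    arma::mat mask = arma::conv_to<arma::mat>::from(
        arma::randu<arma::mat>(rows, cols) > dropoutFraction);
    return mask * (1.0 / (1.0 - dropoutFraction));
}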

DropConnect is a generalization of Dropout that takes the idea a step further. Rather than zeroing each unit's activation with probability p, it zeroes individual weights/connections with probability p.

For more information see:

  • Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, Rob Fergus, "Regularization of Neural Networks using DropConnect", 2013

The idea behind the implementation is similar, except that we need to introduce the feature in all connections which should support DropConnect. The modified code should look something like:

// DropConnect: element-wise multiply the weight matrix with the mask, then
// apply the masked weights to the input.
if (dropConnectFraction > 0)
{
    outputLayer.InputActivation() += (weights % dropOutMask) * input;
}
else
{
    outputLayer.InputActivation() += (weights * input);
}
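The weight mask itself could be generated like this, a sketch assuming Armadillo (weights and dropConnectFraction are taken from the snippet above; the rest is illustrative):

// The DropConnect mask has the same shape as the weight matrix, so every
// individual connection is dropped independently of the others.
arma::mat dropOutMask = arma::conv_to<arma::mat>::from(
    arma::randu<arma::mat>(weights.n_rows, weights.n_cols) > dropConnectFraction);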
@stephentu
Contributor

Did the neural net people really have to assign a new phrase to the idea of overfitting? 😄

@zoq
Member Author

zoq commented Jun 24, 2015

In 7de290f I wrote the Dropout layer, so the ticket should focus on the Dropconnect implementation.

@palashahuja
Contributor

Hello @zoq, I was wondering if this task is still available. If so, I'm willing to work on it.

@zoq
Member Author

zoq commented Mar 1, 2016

I've added the Dropout layer in 7de290f, but I didn't have a chance to implement DropConnect. If you like, you can implement DropConnect.

theaverageguy added a commit to theaverageguy/mlpack that referenced this issue Mar 4, 2016
@theaverageguy

@zoq, I see the difference between the two: one applies the probability mask to the weights (DropConnect), the other to the activations (Dropout). Can you guide me on where to make the changes, though? As I understand it, the input activation should be modified and then assigned to the output activation. Please guide me a bit, because I want to fix this now.

@palashahuja
Contributor

@theaverageguy, I am working on this right now. I will be sending a PR very soon regarding this issue.

@zoq
Member Author

zoq commented Mar 4, 2016

@theaverageguy you are right, I really like the images from the authors: http://cs.nyu.edu/~wanli/dropc/.

So, the implementation of the DropConnectLayer isn't that different from the DropoutLayer, so you can use the DropoutLayer as a basis. Imagine you would like to create a simple feedforward network, something like this:

LinearLayer<> inputLayer(10, 2);
BiasLayer<> inputBiasLayer(2);
ReLULayer<> inputBaseLayer;

LinearLayer<> hiddenLayer(2, 10);
ReLULayer<> outputLayer;

Now we would like to use DropConnect between the input and the first hidden layer, so what we need to do here is randomly set weights of the inputLayer to 0. Let us modify our feedforward network so that it uses this new DropConnectLayer:

LinearLayer<> inputLayer(10, 2);
// Wrap the LinearLayer: drop each of its weights with probability 0.5 (ratio),
// with rescaling enabled (rescale).
DropConnectLayer<> dropConnectLayer(inputLayer, 0.5, true);

BiasLayer<> inputBiasLayer(2);
ReLULayer<> inputBaseLayer;

LinearLayer<> hiddenLayer(2, 10);
ReLULayer<> outputLayer;

As you can see, the constructor of the DropConnectLayer is similar to the DropoutLayer's, but takes an additional parameter:

DropConnectLayer(layer, ratio, rescale)

In this case, we use the layer (LinearLayer) inside the DropConnectLayer for the weight modification. So the Forward(...) function should look like:

template<typename eT>
void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output)
{
    // Mask out a random subset of the wrapped layer's weights, then delegate
    // to the wrapped layer's own Forward() implementation.
    layer.Weights() %= mask;
    layer.Forward(input, output);
}

We modify the weights of the layer and then use the Forward() function of the layer. The Backward() function is similar.
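For completeness, here is a small standalone sketch (plain Armadillo with my own illustrative names, not the actual mlpack API) that shows how the masked weights are shared between the forward and the backward pass:

#include <armadillo>

// Standalone sketch of the DropConnect idea: a wrapper samples a mask in the
// forward pass, applies it to the weights, and reuses the same masked weights
// in the backward pass, so the gradient only flows through kept connections.
struct DropConnectSketch
{
  arma::mat weights;  // Weights of the wrapped linear transformation.
  arma::mat mask;     // Binary mask with the same shape as the weights.
  double ratio;       // Probability of dropping a single connection.

  DropConnectSketch(const size_t outSize, const size_t inSize,
                    const double ratio) :
      weights(arma::randn<arma::mat>(outSize, inSize)),
      ratio(ratio) { }

  // Forward pass: sample a fresh mask and apply the masked weights.
  arma::vec Forward(const arma::vec& input)
  {
    mask = arma::conv_to<arma::mat>::from(
        arma::randu<arma::mat>(weights.n_rows, weights.n_cols) > ratio);
    return (weights % mask) * input;
  }

  // Backward pass: propagate the error through the same masked weights.
  arma::vec Backward(const arma::vec& gy)
  {
    return (weights % mask).t() * gy;
  }
};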

I hope this is helpful.

@zoq
Member Author

zoq commented Mar 4, 2016

Not sure I get your point, but what if I would like to use DropConnect for the ConvLayer?

@palashahuja
Contributor

Ok, I got it .. thanks ..

abhinavchanda added a commit to abhinavchanda/mlpack that referenced this issue Mar 4, 2016
In drop_connect_layer.hpp, we randomly drop weights instead of
units. This layer is based on linear_layer.hpp, with the exception
that the weights matrix is multiplied by the mask.
@palashahuja
Contributor

@zoq, please have a look at this commit. I have already implemented DropConnect, but I wanted to know what I should do for LayerTraits? It isn't clear for this case.

@zoq
Member Author

zoq commented Mar 6, 2016

template<
    typename InputDataType,
    typename OutputDataType
>
class LayerTraits<DropConnectLayer<InputDataType, OutputDataType> >
{
 public:
  static const bool IsBinary = false;
  static const bool IsOutputLayer = false;
  static const bool IsBiasLayer = false;
  static const bool IsLSTMLayer = false;
  static const bool IsConnection = true;
};

Looks good; the DropConnectLayer is a connection, since it connects two layers. By the way, I really like your commit message.

@chvsp
Contributor

chvsp commented Mar 13, 2016

@zoq, is this issue fixed, or is there something I can work on? I am willing to contribute to it.

@zoq
Member Author

zoq commented Mar 13, 2016

@chvsp Sorry, @palashahuja is working on the issue. The problem is, I can't assign anyone who isn't already part of mlpack.

@chvsp
Contributor

chvsp commented Mar 13, 2016

@zoq I am a GSoC 2016 aspirant, and since the PR hasn't been merged for a long time, I thought there would be something that needs work. I don't understand what you meant by "The problem is, I can't assign anyone who isn't already part of mlpack." Is there something I should do to be eligible to fix issues?

@rcurtin
Member

rcurtin commented Mar 13, 2016

@chvsp: it seems to be a shortcoming of Github; see https://help.github.com/articles/assigning-issues-and-pull-requests-to-other-github-users/ :

You can only create assignments for yourself, collaborators on personal projects, or members of your organization with read permissions on the repository.

@palashahuja
Contributor

@rcurtin, @zoq, you could go ahead and close this issue.

@zoq
Member Author

zoq commented Mar 23, 2016

Merged DropConnect implementation in 63a7f62; take a look at #576.
