
Add leaky ReLUs #412

Closed
zoq opened this issue Mar 1, 2015 · 11 comments

zoq (Member) commented Mar 1, 2015

Unlike the standard ReL function, the leaky rectified linear function has a non-zero gradient over its entire domain. So instead of having y = max(0, x), you have y = max(x / a, x), where a is some constant. This means you still get some sort of non-linearity, but the gradient can flow through in both directions.

For more information see:

  • Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models", ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013

Since the parameter is fixed, we could add the leakiness factor as a template parameter. The problem with that idea is that C++ doesn't support double as a template parameter, so we need to figure out a way around this.
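For illustration, here is a minimal sketch of the function and its derivative; the names and the use of Armadillo are illustrative only, and the leakiness constant is taken at runtime because a double cannot be used as a non-type template parameter:

#include <algorithm>
#include <armadillo>

// f(x) = max(x / a, x): for a > 1, negative inputs keep a small slope 1 / a
// instead of being clamped to zero.
inline double LeakyRectifier(const double x, const double a = 100.0)
{
  return std::max(x / a, x);
}

// f'(x) = 1 for x > 0 and 1 / a otherwise, so the gradient never vanishes.
inline double LeakyRectifierDeriv(const double x, const double a = 100.0)
{
  return (x > 0.0) ? 1.0 : (1.0 / a);
}

int main()
{
  arma::vec x = {-2.0, -0.5, 0.0, 0.5, 2.0};
  x.transform([](const double v) { return LeakyRectifier(v); });
  x.print("leaky ReLU output:");
}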

zoq (Member) commented Oct 19, 2015

Instead of writing an independent activation function, we can just write a new LeakyReLULayer class whose constructor takes the leakiness factor as a parameter.
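A rough sketch of that idea (names and members are illustrative, not the final mlpack code):

// The leakiness factor becomes a runtime member instead of a template
// parameter, set once in the constructor.
class LeakyReLULayer
{
 public:
  LeakyReLULayer(const double alpha = 0.01) : alpha(alpha) { }

 private:
  //! The leakiness factor.
  double alpha;
};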

hercky commented Feb 29, 2016

Hi

I'd like to take on this task. I'm new to the community and applying for GSoC 16, so I believe implementing this issue would be a good starting point.

The way I see it, this involves creating a new rectifier function class in ann/activation_functions, adding it to the base_layer.hpp module, and adding corresponding test cases in test/activation_functions_test.cpp.

zoq (Member) commented Feb 29, 2016

You are right, this is a good starting point to get familiar with the code.

The BaseLayer only works with activation functions that can be called without any additional parameters, like the sigmoid or tanh functions. Since the leaky rectified linear function uses the leakiness factor as an additional parameter, you can't use the BaseLayer to call it. But there is an easy solution: you can implement the LeakyReLULayer directly, without implementing the activation function in ann/activation_functions first. The LeakyReLULayer should have the same functions as SoftmaxLayer, but should allow the leakiness factor to be specified in the constructor.
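A simplified sketch of such a layer (the real layer interface has more machinery, so the signatures below are illustrative only):

#include <armadillo>

class LeakyReLULayer
{
 public:
  LeakyReLULayer(const double alpha = 0.01) : alpha(alpha) { }

  // Forward pass: f(x) = max(alpha * x, x), applied element-wise.
  void Forward(const arma::mat& input, arma::mat& output) const
  {
    output = arma::max(alpha * input, input);
  }

  // Backward pass: scale the incoming gradient gy by f'(x), which is 1 for
  // positive inputs and alpha otherwise.
  void Backward(const arma::mat& input, const arma::mat& gy, arma::mat& g) const
  {
    arma::mat derivative = arma::ones<arma::mat>(arma::size(input));
    derivative.elem(arma::find(input <= 0.0)).fill(alpha);
    g = gy % derivative;
  }

 private:
  //! The leakiness factor.
  double alpha;
};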

Please leave a comment if something doesn't make sense.

abhinavchanda commented:

@zoq, I have written a LeakyReLULayer class here. I used forward and backpropagation functions similar to those in base_layer.hpp. Please let me know if any changes are required. Also, could you guide me on how to write test cases for this layer?

zoq (Member) commented Mar 2, 2016

Thanks for the contribution. Before I merge the code in (I guess you will open a pull request), could you take a look at the design guidelines, especially the comments section:

https://github.com/mlpack/mlpack/wiki/DesignGuidelines

It's minor, but I tend to be picky about code; I'm not mean, though. :)

It would also be great if you could combine the two constructors into one:

LeakyReLULayer(const double alpha = 0.01) : alpha(alpha)

And last but not least, can you add functions that return alpha and allow it to be modified?
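For example (a sketch, assuming the member is called alpha), the combined constructor and the accessor/modifier pair could look like:

//! Create the layer; alpha is the leakiness factor.
LeakyReLULayer(const double alpha = 0.01) : alpha(alpha) { }

//! Get the leakiness factor.
double Alpha() const { return alpha; }
//! Modify the leakiness factor.
double& Alpha() { return alpha; }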

zoq (Member) commented Mar 2, 2016

About the test: take a look at activation_functions_test.cpp. It basically tests different activation functions on edge cases and compares the results with manually calculated values.
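A test along those lines might look like the sketch below; the constructor and Forward() signature follow the illustrative layer above rather than the merged code, the Boost.Test macros match the ones used in that file, and the expected values are computed by hand for alpha = 0.03:

BOOST_AUTO_TEST_CASE(LeakyReLULayerTest)
{
  // A handful of inputs, including the edge case x = 0.
  const arma::colvec input("-2.0 3.2 4.5 -100.2 1.0 -1.0 2.0 0.0");
  // Hand-computed f(x) = max(0.03 * x, x).
  const arma::colvec expected("-0.06 3.2 4.5 -3.006 1.0 -0.03 2.0 0.0");

  LeakyReLULayer layer(0.03);
  arma::mat output;
  layer.Forward(input, output);

  for (size_t i = 0; i < input.n_elem; ++i)
  {
    if (expected(i) == 0.0)
      BOOST_REQUIRE_SMALL(output(i), 1e-5);
    else
      BOOST_REQUIRE_CLOSE(output(i), expected(i), 1e-3);
  }
}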

abhinavchanda commented:

Hi. Thanks for the suggestions. I have made the required changes here. Regarding testing, I have a doubt: since LeakyReLU is a layer, as opposed to a single neuron, it should only expose Forward and Backward as public methods, and the activation function and its derivative should not be exposed; but in tests/activation_functions_test.cpp only activation functions and their derivatives are being tested.

GYengera commented Mar 3, 2016

@abhinavchanda your code is well written. It helped me to understand the codebase better.
I think you need to serialize the layer at line 156.
template <typename Archive> void Serialize(Archive& ar, const unsigned int) { }
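If alpha should be preserved when a network is saved, the body could serialize it as well; a sketch, assuming mlpack's data::CreateNVP helper (used with boost::serialization) is available:

template<typename Archive>
void Serialize(Archive& ar, const unsigned int /* version */)
{
  // Store/load the leakiness factor along with the layer.
  ar & data::CreateNVP(alpha, "alpha");
}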

sharathts commented:

@zoq Is this task still open?

zoq (Member) commented Mar 16, 2016

@sharathts No, the code was merged in e6f7ffe.

sharathts commented:

@zoq Thank you for the information.
