
KathirvalavakumarSubavathi initialization test #414

Closed
zoq opened this issue Mar 1, 2015 · 36 comments

@zoq (Member) commented Mar 1, 2015

T. Kathirvalavakumar and S. J. Subavathi proposed an efficient weight initialization method that uses Cauchy's inequality to improve convergence in single-hidden-layer feedforward neural networks. We have already implemented the algorithm, but there is no test showing that the code works as expected; this issue is meant to fill that gap. The test case could compare the results given in the paper with those of our own implementation. Since we initialize the weights with uniformly distributed random numbers, we have to run several iterations and compare the averaged results against the results from the paper with a small tolerance.

For more information see:

  • Thangairulappan Kathirvalavakumar, Subramanian Jeyaseeli Subavathi, "A New Weight Initialization Method Using Cauchy’s Inequality Based on Sensitivity Analysis", 2011
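
A minimal sketch of what such a test could look like, assuming a hypothetical helper TrainAndEvaluate() that builds the network, applies the initialization, trains, and returns the final MSE; the target value and tolerance below are placeholders, not confirmed numbers:

#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_SUITE(KathirvalavakumarSubavathiInitializationTest);

// Hypothetical helper: builds a small feedforward network, initializes the
// weights with the KSInit rule, trains it, and returns the final MSE.
double TrainAndEvaluate();

BOOST_AUTO_TEST_CASE(AveragedConvergenceTest)
{
  // The initialization draws from a uniform distribution, so average the
  // error over several runs before comparing against the paper.
  const size_t trials = 10;
  double meanError = 0.0;
  for (size_t i = 0; i < trials; ++i)
    meanError += TrainAndEvaluate() / trials;

  // Placeholder target (the Iris value quoted later in this thread) and a
  // loose tolerance to absorb the randomness of the initialization.
  const double paperError = 0.00034;
  BOOST_REQUIRE_SMALL(meanError - paperError, 1e-3);
}

BOOST_AUTO_TEST_SUITE_END();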
@yenchenlin

Hello @zoq, I'm new here and I would like to try this.

@zoq (Member, Author) commented Mar 1, 2016

Hello @yenchenlin1994, the weight initialization method that uses Cauchy's inequality is one of my first choices when it comes to initializing weights, so it would be great to see a test case for this method.

Btw, great picture; I didn't get a chance to see the movie :(

@ReVeaL-CVPR

Hello @zoq, I've read the paper you listed. It doesn't seem hard to implement the algorithm from the paper step by step, but I didn't find a test for the FNN. So should we write the weight initialization function into the FNN package and write a test program that compares the results of the existing method and the new one? And should we create the test data ourselves?

@zoq (Member, Author) commented Mar 6, 2016

We can use the VanillaNetworkTest (feedforward_network_test.cpp) as the basis: replicate e.g. the 5-3-1 structure and check the return value of the optimizer.
Since we can't exactly replicate the results from the paper, we should run the test a couple of times, take the mean of the optimizer results, and see whether it is close to the results from the paper.

@vasanthkalingeri (Contributor)

@zoq, I went through the paper and will start working on this.

@chvsp (Contributor) commented Mar 12, 2016

@vasanthkalingeri Hi, I have also been working on this issue for the past day. I would like to know how far you have gotten with it. If possible, we could work together on the issue.

@ReVeaL-CVPR

Hello @zoq. I tried to add the KathirvalavakumarSubavathiInitialization algorithm you implemented to the test case, but found a few things confusing.

  1. If I'm correct, we should use the Iris dataset as mentioned in the paper, and I found it in the data directory. Are the results in the paper computed on iris_test.csv? Is it correct to just write:
    data::Load("iris_train.csv", dataset, true);
    arma::mat trainData = dataset.submat(0, 0, dataset.n_rows - 4, dataset.n_cols - 1);
    arma::mat trainLabels = dataset.submat(dataset.n_rows - 3, 0, dataset.n_rows - 1, dataset.n_cols - 1);
    data::Load("iris_test.csv", dataset, true);
  2. The constructor of KathirvalavakumarSubavathiInitialization requires 2 parameters, and there are no default arguments, but the constructor in fnn.hpp passes no arguments. So the code below is incorrect:
    FFN<decltype(modules), decltype(classOutputLayer),
        KathirvalavakumarSubavathiInitialization, PerformanceFunctionType> net(modules, classOutputLayer);
    Is there any misunderstanding on my part? Should we add default arguments to the KSInit constructor, or pass the arguments somehow?
  3. Also, what is net(modules, classOutputLayer) for? I can't find the implementation of "net".

@chvsp @vasanthkalingeri I'd be very glad to cooperate with you, and I'd appreciate it if you could clear up my confusion.

@chvsp (Contributor) commented Mar 13, 2016

@tpBull net is an object of the FFN class; (modules, classOutputLayer) are the arguments passed to the constructor of FFN.

@zoq (Member, Author) commented Mar 13, 2016

  1. It should be the same dataset. And yes, you have to split the labels (the last dimensions) from the training data.
  2. Take a look at the constructor of the FFN class:

FFN(LayerType&& network,
    OutputType&& outputLayer,
    InitializationRuleType initializeRule = InitializationRuleType(),
    PerformanceFunction performanceFunction = PerformanceFunction());

The third parameter is the object used to initialize the weights, so in your test you should use that parameter; something like this should work:

KathirvalavakumarSubavathiInitialization initRule(data, 0.3);
FFN<decltype(modules),
    decltype(classOutputLayer),
    KathirvalavakumarSubavathiInitialization,
    PerformanceFunctionType> net(modules, classOutputLayer, initRule);

@vasanthkalingeri (Contributor)

@chvsp and @tpBull, I would love to work with you on this issue. I initially started my tests with iris to learn how it works. Seeing that @tpBull is working on that, I am writing tests for sections 3.3, 3.4, and 3.5 of the paper. Please let me know your status as well, so there is no duplication of work.

@chvsp (Contributor) commented Mar 14, 2016

@vasanthkalingeri Hi, apparently I too have gotten the iris test to work, but I made some subtle changes to the KSInit algorithm; I am now trying to modify the tests so that the changes to the algorithm aren't required. @tpBull, please update your status. @vasanthkalingeri, let's split up the rest of the tests: which ones have you started on?

@vasanthkalingeri (Contributor)

@zoq I found a std::logic_error thrown by Armadillo with KathirvalavakumarSubavathiInitialization and have added a fix in #572. However, I am not sure why only I faced this error; could it be something to do with my Armadillo version?

@vasanthkalingeri (Contributor)

@zoq I have written a few tests, but I am not able to replicate the results they get in the paper. For instance, they seem to get an MSE on the order of 0.00x, whereas my results are in the range 0.x. For the MSE I used sum(sum((predictions - testLabels) ^ 2)) / n_test_labels; the paper doesn't explicitly state this formula, it only states that it is the MSE. Here is the code: https://github.com/vasanthkalingeri/mlpack/blob/stdlogicerror/src/mlpack/tests/kathirvalavakumar_subavathi_test.cpp. Can you please tell me why there is a discrepancy? I understand that such a discrepancy is expected on the two-spirals problem, but I am unable to figure out why it happens on the standard iris dataset. I understand that you might not have time to look at the code, so it would be really helpful if you could tell me whether there is something I am missing. I did 10-fold CV and averaged the errors over all folds as well.
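
For reference, that MSE formula written out with Armadillo calls, assuming predictions and testLabels are arma::mat objects with one test point per column (as mlpack stores them):

// Mean squared error as described above: sum of squared differences,
// divided by the number of test points.
const double mse = arma::accu(arma::square(predictions - testLabels)) /
    testLabels.n_cols;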

@zoq (Member, Author) commented Mar 30, 2016

Sorry for the slow response. I guess it would be a good idea to use another output layer instead of the BinaryClassificationLayer. If you use the BinaryClassificationLayer you get (output > 0.5), but what you want is the actual output parameter.

@chvsp (Contributor) commented Apr 14, 2016

Hi @zoq,
Sorry for the late response, but I have been working on this issue with vasanthkalingeri and would like to complete it. Considering your comment above, I plan to use the MultiClassClassificationLayer, as it returns the output parameter. Would it be okay to use it, or would you recommend writing another layer for this?

@vasanthkalingeri (Contributor) commented Apr 14, 2016

Hi @chvsp, I am sorry; I assumed you had not started implementing anything, since at the time of our discussion there was a std::logic_error in the code. I have implemented all the tests, but they still don't match the results in the paper.

I did try the MultiClassClassificationLayer; the errors still seem to be in the same range. I am trying to find the bug. Unfortunately, I have to finish my college project report as well, so it is taking a long time to fix this error.

The paper doesn't say which optimization function was used; although it's very unlikely, could RMSProp be the problem here? I did try SGD, and the results seem to be the same.

@chvsp (Contributor) commented Apr 15, 2016

Hi @vasanthkalingeri, I was working on this in sync with the comments here from the very beginning. The only reason I didn't post here was that all the problems I had were already mentioned and solved by Marcus.

Anyway, I had the same doubt regarding RMSProp and asked @zoq about it on IRC. He said that in his view RMSProp should work better than the optimization given in the paper, so I guess doubting the optimization won't lead us anywhere.

I think the problem may lie here: the paper states that, in addition to the input activations, there is an input node x0 that should always be 1, and that one node in the hidden layer is also required to be active throughout. This is my thought after going through the paper again.

Anyway, let's collaborate on this and get it fixed. Tell me about your time constraints so we can plan accordingly. What say?
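
A minimal sketch of the x0 = 1 idea mentioned above, assuming trainData holds the input points one per column as loaded earlier in this thread:

// Append a constant bias activation x0 = 1 as an extra row, so every
// input column carries a fixed 1 in addition to its regular features.
trainData.insert_rows(trainData.n_rows, arma::ones<arma::mat>(1, trainData.n_cols));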

@vasanthkalingeri (Contributor)

Then I guess we have a lot of duplicate work done already; currently I have posted the code for iris and two spirals, and I will push the code I have for the other tests as well.

Can you post your code too? It's much better to just pick parts from each other's code; that will make everything more efficient and neat.

Yes, in my tests RMSProp did give the lowest errors.

Are you referring to the bias node x0 in the input layer? I have added the bias node as well, and it still has an error. And I didn't quite understand what you meant by the hidden layer part.

Collaborating on this will fix it very soon, especially since we both have the same error. Knowing that you are facing the same logic error gives me confidence that the problem lies only in the network portion of the code.

I have a project review and cycle tests going on till next week, but I can still work on it. Please post your progress so we prevent duplicate effort.

@chvsp (Contributor) commented Apr 16, 2016

Hi @vasanthkalingeri, sorry for the delayed response; I had some urgent submissions. Anyway, here's the link to my implementation: https://github.com/chvsp/mlpack/blob/ksinit/src/mlpack/tests/ksinit.cpp. Have a look at it. I have only implemented the iris dataset, and this is just a proof of concept; if the error does go down, I will add the BOOST_REQUIRE. I am working on reducing the error, since once that's done the rest can be written easily.

About the hidden layer part: there is another bias term for the hidden layer as well, and I was referring to that.

@vasanthkalingeri (Contributor)

Hi, that's fine. From your implementation it looks like both of us have added the bias node for the hidden and the input layer, so the error can't be related to the bias term.

We still have to find the bug, then.

@zoq (Member, Author) commented Apr 18, 2016

I'm not sure we are able to exactly replicate the MSE from the paper; it's partially stochastic, right? However, we should make sure we can replicate the MSE up to a small amount of noise.

I couldn't test the code, but could you make sure that the network starts with different weights in each fold? I think it starts with the same weights in each fold.
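
One way to rule that out, sketched under the assumption that the fold loop constructs a fresh network each time: reseed the random generator at the top of each fold so no two folds can share a starting point.

for (size_t fold = 0; fold < 10; ++fold)
{
  // Different seed per fold => different uniform draws => different
  // initial weights for each fold's freshly constructed network.
  arma::arma_rng::set_seed(fold + 1);
  // ... construct the FFN with the KSInit rule and train on this fold ...
}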

@vasanthkalingeri (Contributor)

The error we are getting is 100 times the error obtained in the paper; do such differences occur because of the stochastic nature?

Yes, the network is running with the same weights in each fold; I will make sure it starts with a different weight setting now.

@vasanthkalingeri (Contributor) commented Apr 20, 2016

@zoq I was mistaken; I just checked the code, and the weights are reinitialized in each fold, so that part is fine.

The bug was that I missed one line of the paper: they divide the entire data, along with the labels, by 10. I did that and obtained the same error magnitudes. I will push the updated code soon.
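
For anyone following along, the fix amounts to a one-line preprocessing step, sketched here on the combined matrix loaded earlier (which holds features and labels together):

// Scale the entire dataset, labels included, by 1/10 as the paper does,
// presumably so the targets fit the logistic function's output range
// (the "Logistic Function part" mentioned below).
dataset /= 10.0;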

@zoq (Member, Author) commented Apr 20, 2016

@vasanthkalingeri That sounds great, thanks for taking the time to look into the issue.

@chvsp (Contributor) commented Apr 21, 2016

@vasanthkalingeri Yeah, right, I overlooked that logistic function part. I was trying to normalize the labels, but that didn't work out. Cool, quite a good issue; I learnt a lot about mlpack's ANN implementation this way.

@chvsp (Contributor) commented Sep 11, 2016

@vasanthkalingeri Hi, since the issue is not closed yet, I would like to ask whether you are still working on this. Otherwise, I am ready to write the code. This is in the best interest of the community. Thanks.

@chvsp (Contributor) commented Sep 17, 2016

@zoq Please advise whether I can go ahead with it.

@zoq (Member, Author) commented Sep 17, 2016

Hello @chvsp, I would really like to see a test for the initialization strategy, so if you like, feel free to work on it.

@vasanthkalingeri (Contributor)

Hello @chvsp, yes my implementation still requires work. I am sure you would do a much better job than me. Thank you.

@chvsp (Contributor) commented Sep 22, 2016

Hi @zoq, I have completed the full implementation of the test for the Iris dataset and am currently implementing the other tests mentioned in the paper. For the Iris dataset I am getting an MSE close to 0.00065, whereas the published error is 0.00034, and I am unable to decrease it further. Kindly look at my code and suggest some modifications if you have time. Thanks.

https://github.com/chvsp/mlpack/blob/kathirinit/src/mlpack/tests/ksinit.cpp

@zoq (Member, Author) commented Sep 30, 2016

@chvsp Sorry for the slow response. What I would do is perform the cross-validation test more than once, maybe 10 times; maybe that way we see some improvement from using another random seed.

Also, I don't think we can exactly replicate the error, since the initialization method isn't deterministic. In the end, your error looks reasonably good to me, so if you like you can open a PR.
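
A sketch of the repeated cross-validation suggested above, with RunCrossValidation() as a hypothetical helper that performs one full 10-fold pass and returns the mean fold MSE:

double overallError = 0.0;
const size_t repeats = 10;
for (size_t r = 0; r < repeats; ++r)
{
  // Each repetition runs under a different random seed, so the averaged
  // error smooths out the randomness of the initialization.
  arma::arma_rng::set_seed(r + 1);
  overallError += RunCrossValidation() / repeats;
}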

@kris-singh (Contributor)

@zoq is this still open? I would like to work on it.

@chvsp (Contributor) commented Feb 16, 2017

Hi @kris-singh. I have completed writing tests for the Iris dataset and the nonlinear function approximation problem. I will open a PR with the clean code by the 22nd of Feb (I have some exams till then). It would be great if you could start working on the others; this code will give you a starting point. You would need to add some Boost test cases.
Cheers

@kris-singh (Contributor) commented Feb 16, 2017

@chvsp OK, no problem. When you say "start working on the others" and "add some Boost test cases", can you explain what you mean?

@starlord1311

@zoq can I take up this issue?

@zoq (Member, Author) commented Feb 12, 2018

@starlord1311 Sorry, I forgot to close the issue; it was solved in 85482cc#diff-f82a63766e75369ce5f94a67a73544de. But if you like, you can implement another initialization method, such as the ReLU initialization method described in "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" by Kaiming He et al.
