KathirvalavakumarSubavathi initialization test #414
Hello @zoq, I'm new here and I would like to try this.
Hello @yenchenlin1994, the weight initialization method that uses Cauchy's inequality is one of my first choices when it comes to initializing weights. So, it would be great to see a test case for this method. Btw. great picture, I didn't have a chance to see the movie :(
Hello @zoq, I've read the paper you listed. It doesn't seem hard to implement the algorithm in the paper step by step. But I didn't find a test for the FNN, so should we write the weight initialization function into the FNN package and write a test program to compare the results of the existing method and the new one? And should we create the test data on our own?
We can use the VanillaNetworkTest (feedforward_network_test.cpp) as the basis; replicate e.g. the 5-3-1 structure and check the return value of the optimizer.
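To make the 5-3-1 structure concrete, here is a plain C++ sketch of the forward pass such a network computes. This is not mlpack's API; the function and parameter names are illustrative only:

```cpp
#include <array>
#include <cmath>

// Sigmoid activation, as used in the paper's experiments.
double Sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward pass through a 5-3-1 network: 5 inputs, 3 hidden units, 1 output.
// wHidden[j][i] connects input i to hidden unit j; each layer also carries
// a bias term. This mirrors the structure only, not mlpack's actual code.
double Forward531(const std::array<double, 5>& input,
                  const double wHidden[3][5], const double bHidden[3],
                  const double wOutput[3], double bOutput)
{
  std::array<double, 3> hidden;
  for (int j = 0; j < 3; ++j)
  {
    double sum = bHidden[j];
    for (int i = 0; i < 5; ++i)
      sum += wHidden[j][i] * input[i];
    hidden[j] = Sigmoid(sum);
  }

  double out = bOutput;
  for (int j = 0; j < 3; ++j)
    out += wOutput[j] * hidden[j];
  return Sigmoid(out);
}
```

In the actual test, mlpack's FFN class builds this structure from layer objects and the optimizer return value is what gets checked.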
@zoq, I went through the paper and will start working on this.
@vasanthkalingeri Hi, I have also been working on the same issue for the past day. I would like to know how far you have got with it. If possible we could work together on the issue.
Hello, @zoq. I tried to add the KathirvalavakumarSubavathiInitialization algorithm you have implemented into the test case, but found something confusing.
@chvsp @vasanthkalingeri I'm very glad to cooperate with you, and it would be much appreciated if you could clear up my confusion.
@tpBull net is an object of the class FFN. (modules, classOutputLayer) are the arguments passed to the constructor of FFN.
The third parameter is the object used to initialize the weights. So in your test you should use that parameter; something like this should work:
@chvsp and @tpBull I would love to work with you on the issue. I initially started my tests with Iris to learn how it works. Seeing that @tpBull is working on the same, I am writing tests for 3.3, 3.4 and 3.5 as in the paper. Please let me know your status as well, so there is no duplication of work.
@vasanthkalingeri Hi, apparently I too have got the Iris test to work, but I made some subtle changes to the KSInit algorithm, and I am now trying to modify the tests so they don't require those changes. @tpBull Please update your status. @vasanthkalingeri Hey, let's split up the rest of the tests. Which ones have you started on?
@zoq I have written a few tests, but I am not able to replicate the results they are getting in the paper. For instance, they seem to be getting an MSE on the order of 0.00x whereas my results are in the range 0.x. For the MSE I used sum(sum((predictions - testLabels) ^ 2)) / n_test_labels. The paper doesn't explicitly state this formula, it only states that it is the MSE. Here is the code: https://github.com/vasanthkalingeri/mlpack/blob/stdlogicerror/src/mlpack/tests/kathirvalavakumar_subavathi_test.cpp. Can you please tell me why there is a discrepancy? I understand that such a discrepancy is expected on the two spirals problem, but I am unable to figure out why it happens on the standard Iris dataset. I understand that you might not have time to look at the code, so it would be really helpful if you could tell me whether there is something I am missing. I did 10-fold CV and averaged the errors over all folds as well.
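For reference, the MSE formula described here, written out as a small self-contained function. mlpack works on Armadillo matrices; std::vector is used purely for illustration:

```cpp
#include <cstddef>
#include <vector>

// Mean squared error over a test set, matching the formula above:
// the sum of squared differences between predictions and labels,
// divided by the number of test labels.
double MeanSquaredError(const std::vector<double>& predictions,
                        const std::vector<double>& labels)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < predictions.size(); ++i)
  {
    const double diff = predictions[i] - labels[i];
    sum += diff * diff;
  }
  return sum / labels.size();
}
```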
Sorry for the slow response. I guess it would be a good idea to use another output layer instead of the BinaryClassificationLayer. If you use the BinaryClassificationLayer you get (output > 0.5), but what you want is the actual output.
Hi @zoq
Hi @chvsp, I am sorry, I assumed you had not started implementing anything, since at the time of our discussion there was a std::logic_error in the code. I have implemented all the tests but they still don't match the results in the paper. I did try the MulticlassClassificationLayer; the errors still seem to be in the same range. I am trying to find the bug. Unfortunately I have to finish my college project report as well, so it is taking a long time to fix this error. Although it's very unlikely, the paper doesn't mention the optimization function used; could RMSProp be the problem here? I did try out SGD and the results seem to be the same.
Hi @vasanthkalingeri, I was working on this in sync with the comments here from the very beginning. The only reason I didn't post here was that all the problems I had were already mentioned and solved by Marcus. Anyways, I too had the same doubt regarding RMSProp. I asked @zoq about it on IRC. He said that according to him RMSProp would work better than the optimization given in the paper, so I guess doubting the optimization wouldn't lead us anywhere. I think the problem lies here: if you read the paper, there is a line which states that there is an input node x0, in addition to the input activations, which should always be 1. Also, one node in the hidden layer is required to be active throughout. This is my thought after going through the paper again. Anyways, let's collaborate on this and get this thing fixed. Tell me about your time restrictions so we can plan accordingly. What say?
Then I guess we have a lot of duplicate work done already; currently I have posted the code for Iris and two spirals. I will push the code I have with the other tests as well. Can you post your code too? It's much better to just pick parts from each other's code; that will make everything more efficient and neat. Yes, in my tests RMSProp did give the smallest errors. Are you referring to the bias node x0 in the input layer? I have added the bias node as well, and it still has an error. And I didn't quite understand what you meant by the hidden layer part. Collaborating on this will fix it very soon, especially since we both have the same error. Knowing that you are facing the same logic error gives me confidence that the problem lies only in the network portion of the code. I have project reviews and cycle tests going on until next week, but I can still work on it. Please post your progress so we prevent duplicate effort.
Hi @vasanthkalingeri. Sorry for the delayed response, I had some urgent submissions. Anyways, here's the link to my implementation: https://github.com/chvsp/mlpack/blob/ksinit/src/mlpack/tests/ksinit.cpp. Have a look at it. I have only implemented the Iris dataset (and this is just a proof of concept; if the error does go down, I will add the BOOST_REQUIRE). I am working on reducing the error, as once that's done, the rest can be written easily. And about the hidden layer part: there is another bias term for the hidden layer as well. I was referring to that.
Hi, that's fine. From your implementation it looks like both of us have added the bias node for the hidden and the input layer, so the error can't be related to the bias term. We still have to find the bug then.
So I'm not sure we can exactly replicate the MSE from the paper; it's partially stochastic, right? However, we should make sure we can replicate the MSE with a small fraction of noise added. I couldn't test the code, but could you make sure that in each fold the network starts with different weights? I think it starts with the same weights in each fold.
The error we are getting is 100 times the error obtained in the paper; do such differences occur because of the stochastic nature? Yes, the network is running with the same weights in each fold; I will make sure it starts with a different weight setting now.
@zoq I was mistaken, I just checked the code and the weights are reinitialized in each fold, so that part is fine. The bug is because I missed one line of the paper: they are dividing the entire data, along with the labels, by 10. I did that and obtained the same error magnitudes. I will be pushing the updated code soon.
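One way to see why this explains the factor-of-100 gap: dividing both predictions and labels by 10 divides every squared difference, and hence the MSE, by exactly 100. A small illustrative sketch (not the actual test code):

```cpp
#include <cstddef>
#include <vector>

// MSE as used in the tests above: mean of the squared differences.
double Mse(const std::vector<double>& pred, const std::vector<double>& labels)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < pred.size(); ++i)
    sum += (pred[i] - labels[i]) * (pred[i] - labels[i]);
  return sum / labels.size();
}

// Scale every value by a constant factor; the paper uses factor = 1/10
// on the whole dataset, labels included.
std::vector<double> Scale(const std::vector<double>& values, double factor)
{
  std::vector<double> out(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
    out[i] = values[i] * factor;
  return out;
}
```

With factor 0.1 on both predictions and labels, Mse drops by a factor of 100, matching the discrepancy described above.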
@vasanthkalingeri That sounds great, thanks for taking the time to look into the issue.
@vasanthkalingeri Ya right, I overlooked that logistic function part. I was trying to normalize the labels but that didn't work out. Cool, quite a good issue. Learnt a lot about mlpack's ANN implementation this way.
@vasanthkalingeri Hi, as the issue is not closed yet, I would like to ask whether you are still working on this. Otherwise I am ready to write the code. This is in the best interest of the community. Thanks.
@zoq Please advise whether I can go ahead with it.
Hello @chvsp, I would really like to see a test for the initialization strategy, so if you like feel free to work on it.
Hello @chvsp, yes my implementation still requires work. I am sure you would do a much better job than me. Thank you.
Hi @zoq, I have completed the full implementation of the test for the Iris dataset and am currently implementing the other tests mentioned in the paper. Also, for the Iris dataset I am getting an MSE close to 0.00065 whereas the published error is 0.00034, and I am unable to decrease it further. Kindly look at my code and suggest some modifications if you are free. Thanks: https://github.com/chvsp/mlpack/blob/kathirinit/src/mlpack/tests/ksinit.cpp
@chvsp Sorry for the slow response. What I would do is perform the cross-validation test more than once, maybe 10 times; maybe that way we see some improvement from using another random seed. Also, I don't think we can exactly replicate the error, since the initialisation method isn't deterministic. In the end, your error looks reasonably good to me, so if you like you can open a PR.
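The repeated cross-validation idea can be sketched like this. Here `runTrial` is a hypothetical stand-in for "run one full 10-fold CV and return its mean MSE"; in the real test it would train the network with a freshly seeded initialization each time:

```cpp
#include <cstddef>
#include <functional>
#include <random>

// Run the whole cross-validation several times with a different random
// seed per trial and average the resulting MSEs, so the test is robust
// to the non-deterministic initialization.
double AverageOverTrials(const std::function<double(unsigned)>& runTrial,
                         std::size_t trials)
{
  std::random_device rd;
  double sum = 0.0;
  for (std::size_t t = 0; t < trials; ++t)
    sum += runTrial(rd());  // Fresh seed for every trial.
  return sum / trials;
}
```

The averaged value can then be compared against the paper's number with a small tolerance, which is roughly what the eventual test does.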
@zoq Is this still open? I would like to work on it.
Hi @kris-singh. I have completed writing tests for the Iris dataset and the nonlinear function approximation problem. I will open a PR with the clean code by the 22nd of Feb (got some exams till then). It would be great if you can start working on the others. This code will give you a starting point. You would need to add some Boost test cases.
@chvsp Ok, no problem. When you say start working on the others and "add some Boost test cases", can you explain what you mean?
@zoq Can I take up this issue?
@starlord1311 Sorry, forgot to close the issue; it was solved in 85482cc#diff-f82a63766e75369ce5f94a67a73544de. But if you like, you can implement another initialization method, like the ReLU initialization method described in Delving Deep into Rectifiers.
T. Kathirvalavakumar and S. J. Subavathi proposed an efficient weight initialization method that uses Cauchy's inequality to improve convergence in single-hidden-layer feedforward neural networks. We have already implemented the algorithm, but there isn't a test which shows that the code works as expected. This issue is meant to fill that gap. The test case could compare the results given in the paper with our own implementation. Since we initialize the weights with uniformly distributed random numbers, we have to run several iterations and compare the results, with a small tolerance, against the results from the paper.
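For illustration, a heavily simplified sketch of the idea behind the method: Cauchy's inequality bounds a hidden unit's net input by the norms of its weight vector and the input vector, which yields an interval [-delta, delta] to draw the initial weights from so the sigmoid units start out of saturation. The exact bound in the paper and in mlpack's KathirvalavakumarSubavathiInitialization differs; the names and the delta formula below are simplified assumptions, not the real implementation:

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical, simplified version of the initialization: derive an
// interval half-width from the mean squared input activation (the 3
// matches a uniform distribution's variance factor), then draw the
// weights uniformly from [-delta, delta].
std::vector<double> KsInitialize(const std::vector<std::vector<double>>& data,
                                 std::size_t numWeights, unsigned seed)
{
  // Mean of the squared input activations over the dataset.
  double meanSquare = 0.0;
  std::size_t count = 0;
  for (const auto& point : data)
    for (double x : point)
    {
      meanSquare += x * x;
      ++count;
    }
  meanSquare /= count;

  // Simplified stand-in for the paper's Cauchy-inequality bound.
  const double delta = std::sqrt(3.0 / (numWeights * meanSquare));

  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(-delta, delta);

  std::vector<double> weights(numWeights);
  for (double& w : weights)
    w = dist(gen);
  return weights;
}
```

Because the draw is uniform and seeded randomly, two runs give different weights, which is exactly why the test above has to average over several iterations rather than expect one exact MSE.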
For more information see: