FTRL implementation in TensorFlow vs. FTRL in Google's research paper #3725
Comments
Unless you think there is a bug, this question would be better asked on StackOverflow. GitHub issues are for code bugs and feature requests, not for requests for clarification.
I think there is a bug in it.
The FTRL implementation in TensorFlow comes directly from that paper, and I believe the meaning of the parameters is consistent with the paper's notation. Where do you think the bug is?
Hi @will001, thanks for your attention! To our understanding, we can establish the connections listed in the original post. However, we found that the notation "beta" in the paper is missing from the TensorFlow implementation. Furthermore, we guess that l1 is lambda1 in the paper, but we were not able to reach this conclusion for l2 by comparing the two implementations; instead, we can only derive the equation 2 * l2 * alpha = beta + alpha * lambda2. Can you clarify these two points a little bit? Thanks again.
I have the same confusion. It seems that the optimizer forces
I have the same question as @yanyachen.
Although the documentation points to the right paper, it was unclear to me (until I dug into the code) whether the TensorFlow class implements Nesterov's dual averaging (i.e., plain FTRL) or the FTRL-Proximal variant proposed in the Ad Click Prediction paper. It would be good to clarify this in the documentation, along with the meaning of the hyperparameters. Thanks!
Thanks @tangruiming for pointing this out: comment 2) in get_training_ops.py is not accurate and should be corrected. As for the missing parameter 'beta': it effectively comes from the initial_accumulator_value of the 'accum' variable, where accum = initial_accumulator_value + sigma{g(i)^2}. So you can think of it as beta + sqrt(n_i) == sqrt(initial_accumulator_value + sigma(g(i)^2)). To @ageron: the implementation in TensorFlow is FTRL-Proximal, as proposed in the Ad Click Prediction paper.
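The correspondence between beta and initial_accumulator_value is only rough; here is a small sketch with toy numbers of my own (not TensorFlow code) comparing the two per-coordinate learning-rate denominators:

```python
import math

# Paper: per-coordinate step size is alpha / (beta + sqrt(n_i)).
# TensorFlow: lr / sqrt(accum), where accum starts at
# initial_accumulator_value instead of 0. Values below are illustrative.
beta = 1.0
init_acc = beta ** 2       # rough correspondence suggested above
sum_sq_grads = 9.0         # sigma g_i^2 accumulated so far

paper_denom = beta + math.sqrt(sum_sq_grads)      # 1 + 3 = 4.0
tf_denom = math.sqrt(init_acc + sum_sq_grads)     # sqrt(10), about 3.16

print(paper_denom, tf_denom)
```

The two denominators coincide when no gradients have accumulated and stay on the same scale afterwards, which is why the comment above says you can "think of" beta as coming from the accumulator's initial value rather than it being an exact identity.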
Hi @will001: thank you very much for the clarification. I found another place in the comments in get_training_ops.py that may be wrong: Secondly, I want to confirm something about "beta": Thanks again. Ruiming
To tangruiming@, you are right. Thanks for pointing it out. |
To @will001: thank you very much, my doubts are cleared.
By the way, where is the bias term of the logistic regression?
Closing after reading latest comment from @tangruiming. |
@will001, it seems that points (2) and (3) that @tangruiming mentioned aren't fixed yet.
Hey, sorry to dig up this old issue, but going through the implementation I still have some questions about tensorflow/core/kernel/training_ops.cc, class SparseApplyFtrlOp. In FtrlCompute the mapping to the paper's notation appears to be: a/updated_a -> n, and linear -> z. So the problem is here: why is there
@ydp Also, in this SparseApplyFtrlOp I cannot find where var is set to zero when |linear| <= l1; the code only computes var = (sign(linear) * l1 - linear) / quadratic. So where is the sparse solution?
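For reference, the closed-form per-coordinate solution in the paper does include an explicit zero branch. A minimal Python sketch of that formula (variable names and hyperparameter values are my own, not taken from the TensorFlow kernel):

```python
import math

def ftrl_weight(z, n, alpha=0.5, beta=1.0, lambda1=0.5, lambda2=1.0):
    """Closed-form FTRL-Proximal weight from the Ad Click Prediction paper.

    z: accumulated linear term; n: accumulated squared gradients.
    Returns exactly 0 when |z| <= lambda1 -- this is the sparsity branch.
    """
    if abs(z) <= lambda1:
        return 0.0
    quadratic = (beta + math.sqrt(n)) / alpha + 2.0 * lambda2
    return (math.copysign(lambda1, z) - z) / quadratic

print(ftrl_weight(0.3, 4.0))  # small |z| is clipped to exactly zero
print(ftrl_weight(2.0, 4.0))  # larger |z| gives a nonzero weight
```

If the kernel only ever evaluates the second branch, that would indeed drop the sparsity property the paper advertises, so the question above seems legitimate.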
I agree. Seems like an issue we should fix. |
@tanzhenyu @will001 |
Hello, everyone!
I am interested in digging into the details of how FTRL is implemented in TensorFlow. I found some information in the file "gen_training_ops.py" in the folder /tensorflow/python/training. In this file, the formula of the FTRL algorithm is described as follows:
I am also reading the paper "Ad Click Prediction: a View from the Trenches" by Google, from KDD '13; the formula of the FTRL algorithm is given on page 2. Comparing these two implementations, we found some connections:
- var is w_{t,i} in the paper
- l1 is lambda1 in the paper
- linear is z_i in the paper
- lr is alpha in the paper
- grad is g_i in the paper
- accum is n_i in the paper
But there are also some inconsistent points:
According to the paper, Equation (2) above should instead be
linear += grad - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
We can also derive the following equation by comparing the two implementations:
2 * l2 * alpha = beta + alpha * lambda2
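Putting the mapping above together, here is a hedged sketch of one per-coordinate FTRL-Proximal step in the paper's notation, using TF-style names (my own toy code with made-up hyperparameter values, not the actual TensorFlow kernel):

```python
import math

def ftrl_step(var, linear, accum, grad, lr=0.5, l1=0.5, l2=1.0, beta=1.0):
    """One per-coordinate FTRL-Proximal step (paper notation: var=w,
    linear=z, accum=n, lr=alpha). Sketch only, not the TensorFlow kernel."""
    accum_new = accum + grad * grad
    # sigma = (sqrt(n_new) - sqrt(n_old)) / alpha;  z += g - sigma * w
    sigma = (math.sqrt(accum_new) - math.sqrt(accum)) / lr
    linear += grad - sigma * var
    if abs(linear) <= l1:
        var = 0.0  # sparsity: the weight snaps to exactly zero
    else:
        quadratic = (beta + math.sqrt(accum_new)) / lr + 2.0 * l2
        var = (math.copysign(l1, linear) - linear) / quadratic
    return var, linear, accum_new

w, z, n = 0.0, 0.0, 0.0
for g in [1.0, 1.0]:          # made-up gradients
    w, z, n = ftrl_step(w, z, n, g)
print(w, z, n)
```

Note that in this formulation beta appears only in the quadratic denominator, which is consistent with the observation above that TensorFlow folds it into the accumulator's initial value.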
Could any expert who is familiar with the FTRL implementation in TensorFlow help us clarify the meaning of the parameters in TensorFlow and their connection to the FTRL pseudocode in Google's research paper "Ad Click Prediction: a View from the Trenches"?
Thanks!