
FTRL implementation in TensorFlow vs. FTRL in Google's research paper #3725

Closed
tangruiming opened this issue Aug 10, 2016 · 18 comments

@tangruiming

Hello, everyone!

I am interested in digging into the details of how FTRL is implemented in TensorFlow. I found some information in the file "gen_training_ops.py" in the folder /tensorflow/python/training. In this file, the FTRL algorithm is described as follows:

def apply_ftrl(var, accum, linear, grad, lr, l1, l2, lr_power,
               use_locking=None, name=None):
  r"""Update '*var' according to the Ftrl-proximal scheme:
  accum_new = accum + grad * grad                                           ------ (1)
  linear += grad + (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var   ------ (2)
  quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2                    ------ (3)
  var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0  ------ (4)
  accum = accum_new                                                         ------ (5)

I am also reading the paper "Ad Click Prediction: a View from the Trenches" by Google in KDD '13. The FTRL algorithm is given on page 2 of that paper. Comparing these two implementations, we find some connections:
var is w_{t,i} in the paper; l1 is lambda1; linear is z_i; lr is alpha; grad is g_i; accum is n_i.

But there are also some inconsistent points:
According to the paper, Equation (2) above should be
linear += grad - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
Also, by comparing the two implementations, we can derive the following equation:
2 * l2 * alpha = beta + alpha * lambda2

For any expert who is familiar with the FTRL implementation in TensorFlow: can you help clarify the meaning of the parameters in TensorFlow and how they connect to the FTRL pseudocode in Google's research paper "Ad Click Prediction: a View from the Trenches"?
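
For reference, here is a minimal Python sketch of the per-coordinate update as we read it from Algorithm 1 of the paper (variable names follow the paper; this is only an illustration for comparison, not the TensorFlow kernel):

import math

def ftrl_proximal_step(w, z, n, g, alpha, beta, lambda1, lambda2):
    # One FTRL-Proximal update of a single coordinate, per Algorithm 1.
    n_new = n + g * g
    sigma = (math.sqrt(n_new) - math.sqrt(n)) / alpha  # 1/eta_t - 1/eta_{t-1}
    z_new = z + g - sigma * w                          # note the minus sign
    if abs(z_new) <= lambda1:
        w_new = 0.0                                    # sparsity: small |z_i| snaps w_i to 0
    else:
        w_new = -(z_new - math.copysign(lambda1, z_new)) / (
            (beta + math.sqrt(n_new)) / alpha + lambda2)
    return w_new, z_new, n_new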

Thanks!

@girving
Contributor

girving commented Aug 10, 2016

Unless you think there is a bug, this question would be better asked on Stack Overflow. GitHub issues are for code bugs and feature requests, not requests for clarification.

@girving girving closed this as completed Aug 10, 2016
@tangruiming
Author

I think there is a bug in it.

@will001

will001 commented Aug 11, 2016

The FTRL implementation in TensorFlow comes directly from that paper. I think the meaning of the parameters is quite consistent with the notation in the paper. Where do you think the bug is?

@tangruiming
Author

Hi, will001:

Thanks for your attention!

To our understanding, we can establish the following connections:
var is w_{t,i} in the paper; l1 is lambda1; linear is z_i; lr is alpha; grad is g_i; accum is n_i.

However, we found that the notation "beta" in the paper is missing from the TensorFlow implementation.

Furthermore, we guess that l2 is lambda2 in the paper; however, we cannot reach this conclusion by comparing the two implementations. Instead, we can only derive this equation:
2 * l2 * alpha = beta + alpha * lambda2
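
To spell out where this comes from (taking lr_power = -0.5, so accum^(-lr_power) = sqrt(accum), and matching TF's quadratic against the paper's denominator):

% TF's quadratic vs. the paper's denominator, per coordinate i
\frac{\sqrt{n_i}}{\alpha} + 2\,l_2 \;=\; \frac{\beta + \sqrt{n_i}}{\alpha} + \lambda_2
\quad\Longrightarrow\quad 2\,l_2\,\alpha = \beta + \alpha\,\lambda_2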

Can you clarify these two points a little bit?

Thanks again.

@girving girving reopened this Aug 12, 2016
@yanyachen

I have the same confusion. It seems that the optimizer forces beta = l2 * alpha.
Is there any reason behind that?

@tangruiming
Author

I have the same question as @yanyachen.

@ageron
Contributor

ageron commented Aug 18, 2016

Although the documentation points to the right paper, it was unclear to me (until I dug into the code) whether the TensorFlow class implemented Nesterov's dual averaging (i.e., FTRL) or the FTRL-Proximal variant proposed in the Ad Click Prediction paper.

It would be good to clarify this in the documentation, along with the meaning of the hyperparameters. Thanks!

@prb12 prb12 added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Aug 22, 2016
@will001

will001 commented Aug 22, 2016

Thanks @tangruiming for pointing this out. Comment (2) in gen_training_ops.py is not accurate; it should be:
linear += grad - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
I will fix this very soon.

As for the missing parameter 'beta' in the implementation: it actually comes from the initial_accumulator_value for 'accum', where accum = initial_accumulator_value + sigma(g(i)^2). So you can think of it as beta + sqrt(n_i) == sqrt(initial_accumulator_value + sigma(g(i)^2)).
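
A small Python sketch of what I mean (initial_accumulator_value is the FtrlOptimizer constructor argument; the numbers are purely illustrative):

import math

initial_accumulator_value = 0.1     # the starting value of 'accum' in TF
grads = [0.3, -0.2, 0.5]            # illustrative gradients for one coordinate

accum = initial_accumulator_value
for g in grads:
    accum += g * g                  # accum = initial_accumulator_value + sigma(g(i)^2)

# TF's closed form uses sqrt(accum); the initial offset inside the sqrt
# is what plays the role of the paper's beta in beta + sqrt(n_i).
print(math.sqrt(accum))             # sqrt(initial_accumulator_value + sigma(g(i)^2))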

To @ageron: the implementation in TensorFlow is FTRL-Proximal, as proposed in the Ad Click Prediction paper.

@tangruiming
Author

Hi, will001@:

Thank you very much for your clarification.

I found another place in the comments in gen_training_ops.py that may be wrong:
quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2 ------ (3)
I think it should be
quadratic = 1.0 / (accum_new^(lr_power) * lr) + l2 ------ (3)
Am I right?

Secondly, I want to confirm something about "beta":
I can derive an equation similar to the one you stated:
beta + sqrt(n_i) = sqrt(accum_new) = sqrt(initial_accumulator_value + sigma(g(i)^2)).
Approximately, we have beta + sqrt(n_i) ≈ sqrt(beta + n_i), so from these two equations it can be concluded that beta is approximately the same as initial_accumulator_value. Am I right?
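
A quick numerical check of how loose this approximation is (numbers purely illustrative):

import math

beta = 0.1
for n in [0.01, 1.0, 100.0]:
    print(n, beta + math.sqrt(n), math.sqrt(beta + n))
# the two agree only roughly; once n grows, the gap approaches beta itself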

Thanks again.

Ruiming

@will001

will001 commented Aug 23, 2016

To @tangruiming: you are right. Thanks for pointing it out.

@tangruiming
Author

To @will001: thank you very much. My doubts are now cleared.

@CasyWang

By the way, where is the bias term of the logistic regression?

@girving girving removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 17, 2017
@drpngx
Contributor

drpngx commented Jan 24, 2017

Closing after reading latest comment from @tangruiming.

@drpngx drpngx closed this as completed Jan 24, 2017
@microwish

@will001, it seems points (2) and (3) that @tangruiming mentioned aren't fixed yet.

@ydp

ydp commented Mar 30, 2018

Hey, sorry to dig up this old issue, but going into the implementation, I still have some questions.

tensorflow/core/kernels/training_ops.cc

class SparseApplyFtrlOp

          T updated_a = a + g * g;   // accum_new = accum + grad * grad
          using Eigen::numext::pow;
          T sigma = pow(updated_a, -lr_power_scalar) - pow(a, -lr_power_scalar);
          sigma /= lr_scalar;        // sigma_i from the paper: 1/eta_t - 1/eta_{t-1}
          T updated_l = l + g - sigma * v;   // z += g - sigma * w
          v = FtrlCompute(updated_a, updated_l, lr_scalar, l1_scalar, l2_scalar,
                          lr_power_scalar);  // closed-form update of w
          a = updated_a;
          l = updated_l;

a/updated_a -> n
l/updated_l -> z
v -> w
lr_scalar -> alpha
l1_scalar -> lambda1
l2_scalar -> lambda2

FtrlCompute

template <typename T>
inline T FtrlCompute(const T& accum, const T& linear, const T& lr, const T& l1,
                     const T& l2, const T& lr_power) {
  T quadratic;
  if (lr_power == static_cast<T>(-0.5)) {
    quadratic = Eigen::numext::sqrt(accum) / lr + static_cast<T>(2) * l2;
  } else {
    quadratic =
        Eigen::numext::pow(accum, -lr_power) / lr + static_cast<T>(2) * l2;
  }
  if (Eigen::numext::abs(linear) > l1) {
    return (l1 * sgn(linear) - linear) / quadratic;
  } else {
    return static_cast<T>(0.0);
  }
}

linear -> z
l1 -> lambda1
l2 -> lambda2
lr -> alpha
accum -> n

So, the problem is here:

quadratic = Eigen::numext::sqrt(accum) / lr + static_cast<T>(2) * l2;

Why is there a static_cast<T>(2)? According to the paper, it should be only lambda2.
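
To make the factor of 2 concrete, here is a small sketch with illustrative values (assuming lr_power = -0.5 and beta already folded into accum, as discussed above):

import math

def tf_quadratic(accum, lr, l2):
    # what FtrlCompute computes when lr_power = -0.5
    return math.sqrt(accum) / lr + 2.0 * l2

def paper_denominator(n, alpha, beta, lambda2):
    # the paper's (beta + sqrt(n_i)) / alpha + lambda2
    return (beta + math.sqrt(n)) / alpha + lambda2

accum, lr, l2 = 4.0, 0.5, 0.3
print(tf_quadratic(accum, lr, l2))                  # 4.6
print(paper_denominator(accum, lr, 0.0, l2))        # 4.3 -- off by l2 if lambda2 == l2
print(paper_denominator(accum, lr, 0.0, 2.0 * l2))  # 4.6 -- matches only if lambda2 == 2 * l2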

@sjtusmartboy

@ydp, still in this SparseApplyFtrlOp, I cannot find where var is set to zero when |linear| <= l1 (the formula var = (sign(linear) * l1 - linear) / quadratic only applies when |linear| > l1), so where is the sparse solution?

@tanzhenyu
Contributor

I agree. Seems like an issue we should fix.

@xuyan1115

xuyan1115 commented Aug 14, 2020

@tanzhenyu @will001
Has this bug been fixed? I see that the comment has not changed.
