
Clad does not generate derivatives of functions with reference inputs correctly. #281

Closed
grimmmyshini opened this issue Aug 8, 2021 · 0 comments · Fixed by #383
@grimmmyshini
Contributor
grimmmyshini commented Aug 8, 2021

The thread of interest

Problem:

Consider the following example:

void helper(float& a){
    a = a*a;
}

float func(float x){
    helper(x);
    return x;
}

the following gradient is generated:

void func_grad(float x, clad::array_ref<float> _d_x, double &_final_error) {
    helper(x);
    float func_return = x;
    goto _label0;
  _label0:
    * _d_x += 1;
   // double _delta_x = 0;
   // _delta_x += * _d_x * x * 1.1920928955078125E-7;
   //  _final_error += _delta_x;
}

This gradient is wrong: after inlining, func(x) = x * x, so the derivative should be 2 * x, but the generated code produces 1 because the call to helper is never differentiated.

Proposed solution:

We should propagate the changes backward whenever a parameter is an lvalue reference type. We could do this by forwarding the array_ref of the argument passed by reference, instead of creating a temporary _grad* for it.

Then the gradient would look something like this:

void helper_grad(float &a, clad::array_ref<float> _d_a) {
    float _t0;
    float _t1;
    _t1 = a;
    _t0 = a;
    a = _t1 * _t0;
    {
        float _r_d0 = * _d_a;
        float _r0 = _r_d0 * _t0;
        * _d_a += _r0;
        float _r1 = _t1 * _r_d0;
        * _d_a += _r1;
        * _d_a -= _r_d0;
        * _d_a;
    }
}
void func_grad(float x, clad::array_ref<float> _d_x, double &_final_error) {
    helper(x);
    float func_return = x;
    goto _label0;
  _label0:
    * _d_x += 1;
    helper_grad(x, _d_x);
}
A generalized poc:

I believe the proposal above works. It still needs some debate and input from more experienced folks, but here is my argument:

Any function call can essentially be swapped into the original function code (or inlined to be more precise), so different combinations and their possible inlines may look like the following:

This is the target function we will use; it contains either a call to a void function or a call-and-assign to a non-void function:

float func(float x, float y){
    // some ops
    y = helper_*(x, y);
    // OR
    helper_*(x, y);
    // some ops
}

Only one reference variable + no return.

void helper_void_ref(float& x){
    x = x * x;
}

results in the following conversion:

float func(float x, float y){
    // some ops
    helper_void_ref(x);
    // some ops
}

TO:

float func(float x, float y){
    // some ops
    x = x * x;
    // some ops
}

One/pure reference variable(s) with non-void return.

float helper_ret_ref(float& x){
    return (x = x * x);
}

results in the following conversion:

float func(float x, float y){
    // some ops
    y = helper_ret_ref(x);
    // some ops
}

TO:

float func(float x, float y){
    // some ops
    x = x * x;
    y = x;
    // some ops
}

Pass by value with non void return (usual case).

float helper_ret_val(float x){
    return (x = x * x);
}

results in the following conversion:

float func(float x, float y){
    // some ops
    y = helper_ret_val(x);
    // some ops
}

TO:

float func(float x, float y){
  // some ops
  // pass by val so make a copy for x, use the temp value from this point on.
  float _temp_x = x;
  _temp_x = _temp_x * _temp_x;
  y = _temp_x;
  // some ops; use x as usual beyond this point, destruct temp.
}

Mixed call with void return.

void helper_void_mixed(float& x,  float y){
    x = x * y;
}

results in the following conversion:

float func(float x, float y){
    // some ops
    helper_void_mixed(x, y);
    // some ops
}

TO:

float func(float x, float y){
  // some ops
  // pass by val so make a copy for y, use the temp value from this point on for y.
  // pass by ref for x so use x as is.
  float _temp_y = y;
  x = x * _temp_y;
  // some ops; use y as usual beyond this point, destruct temp_y.
}

Mixed call with non-void return.

float helper_ret_mixed(float& x,  float y){
    x = x * x;
    y = x;
    return y;
}

results in the following conversion:

float func(float x, float y){
    // some ops
    y = helper_ret_mixed(x, y);
    // some ops
}

TO:

float func(float x, float y){
  // some ops
  // pass by val so make a copy for y, use the temp value from this point on for y.
  // pass by ref for x so use x as is.
  float _temp_y = y;
  x = x * x;
  _temp_y = x;
  // return assign here:
  y = _temp_y;
  // some ops; use y as usual beyond this point, destruct temp_y.
}

These comparisons test the logic: passing the same array_ref for reference parameters (and a fresh temporary one for pass-by-value parameters) produces correct results in every case.

Another "peculiar" case is a function with multiple reference parameters and a void return. One option is to inline such functions into the caller and then continue the differentiation as usual. Another is to retain the tape types and differentiate the function normally: we assume the void function actually has a non-void return type (the return type of the target function) and returns a dummy constant of that type. For example, consider the following:

void helperR(float& x, float& y){
    x = x*x;
    y += x;
}

float randomR(float x, float y){
    helperR(x, y);
    return x + y;
}

Now, let's take two cases:

  1. If we inline the function directly:

Transformed functions:

float randomR(float x, float y){
    x = x*x;
    y += x;
    return x + y;
}

Derivative Produced:

void randomR_grad(float x, float y, clad::array_ref<float> _d_x, clad::array_ref<float> _d_y) {
    float _t0;
    float _t1;
    _t1 = x;
    _t0 = x;
    x = _t1 * _t0;
    y += x;
    float randomR_return = x + y;
    goto _label0;
  _label0:
    {
        * _d_x += 1;
        * _d_y += 1;
    }
    {                                      // Block of interest starts here
        float _r_d1 = * _d_y;
        * _d_y += _r_d1;
        * _d_x += _r_d1;
        * _d_y -= _r_d1;
        * _d_y;
    }
    {
        float _r_d0 = * _d_x;
        float _r0 = _r_d0 * _t0;
        * _d_x += _r0;
        float _r1 = _t1 * _r_d0;
        * _d_x += _r1;
        * _d_x -= _r_d0;
        * _d_x;
    }                                         // Block of interest ends here.
}
  2. By using the dummy return:

Transformed Functions:

// Here we make sure the "dummy" return type is the same as the target function return type.
float helperR(float& x, float& y){
    x = x*x;
    y += x;
    // Here we return 0, or zero equivalent of the return type.
    // Or just return nothing! that works too.
    return 0;
}

float randomR(float x, float y){
    // dummy temp value to make sure clad differentiates this right now...
    float temp = helperR(x, y);
    return x + y;
}

Derivative produced:

void helperR_grad(float &x, float &y, clad::array_ref<float> _d_x, clad::array_ref<float> _d_y) {
    float _t0;
    float _t1;
    _t1 = x;
    _t0 = x;
    x = _t1 * _t0;
    y += x;
    float helperR_return = 0;
    goto _label0;
  _label0:
    ;
    {                                    // See how this block is completely identical to the inlined one we saw before?
        float _r_d1 = * _d_y;
        * _d_y += _r_d1;
        * _d_x += _r_d1;
        * _d_y -= _r_d1;
        * _d_y;
    }
    {
        float _r_d0 = * _d_x;
        float _r0 = _r_d0 * _t0;
        * _d_x += _r0;
        float _r1 = _t1 * _r_d0;
        * _d_x += _r1;
        * _d_x -= _r_d0;
        * _d_x;
    } 
}
void randomR_grad(float x, float y, clad::array_ref<float> _d_x, clad::array_ref<float> _d_y) {
    float _t0;
    float _t1;
    float _d_temp = 0;
    _t0 = x;
    _t1 = y;
    float temp = helperR(_t0, _t1);
    float randomR_return = x + y;
    goto _label0;
  _label0:
    {
        * _d_x += 1;
        * _d_y += 1;
    }
    {
        // float _grad0 = 0.F; -- No need
        // float _grad1 = 0.F; -- No need
        // here we need to make copies x and y again because these temps have been changed before.
        helperR_grad(_t0, _t1, _d_x, _d_y);
        float _r0 = _d_temp * _grad0; // _d_temp is unused, so this is 0; noop
        * _d_x += _r0;                // addition with 0; noop
        float _r1 = _d_temp * _grad1; // also 0; noop
        * _d_y += _r1;                // also addition with 0; noop
    }
}

Now, you can see that the two cases above produce identical adjoint blocks. It is just a choice of how we want to implement this.

Again, I have not thought this through exhaustively or tested it out, so we probably need some more debate here, but that is my two cents!

P.S. Excuse any typos!

Originally posted by @grimmmyshini in #247 (comment)

parth-07 added a commit to parth-07/clad that referenced this issue Mar 1, 2022
The pullback-function approach allows reverse-mode automatic differentiation to "continue" into callees when required.
This allows correctly computing derivatives when arguments are passed by reference or by pointer.

This commit also modifies custom gradient functions to custom pullback functions.

Closes vgvassilev#281, Closes vgvassilev#386, Closes vgvassilev#387
vgvassilev pushed a commit that referenced this issue Mar 1, 2022
Closes #281, Closes #386, Closes #387